hchang 12 Aug 2022
Why use it? Because it's good to use.
The parts I understand so far:
shuffle
```python
shuffle(
    buffer_size, seed=None, reshuffle_each_iteration=None, name=None
)
```
Randomly shuffles the elements of this dataset.
This dataset fills a buffer with buffer_size elements, then randomly samples elements from this buffer, replacing the selected elements with new elements. For perfect shuffling, a buffer size greater than or equal to the full size of the dataset is required.
For instance, if your dataset contains 10,000 elements but buffer_size is set to 1,000, then shuffle will initially select a random element from only the first 1,000 elements in the buffer. Once an element is selected, its space in the buffer is replaced by the next (i.e. 1,001-st) element, maintaining the 1,000 element buffer.
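To see the effect of an undersized buffer (a minimal sketch; the sizes are arbitrary), note that an element cannot be emitted until it has entered the buffer, so early outputs can only come from the front of the dataset:

```python
import tensorflow as tf

# 10,000 elements but a buffer of only 100: the stream is only locally shuffled.
ds = tf.data.Dataset.range(10_000).shuffle(buffer_size=100)
print(list(ds.take(10).as_numpy_iterator()))
# All ten values are < 110: the k-th output is drawn from a buffer that has
# only been filled from roughly the first 100 + k input elements so far.
```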
reshuffle_each_iteration controls whether the shuffle order should be different for each epoch. In TF 1.X, the idiomatic way to create epochs was through the repeat transformation:
```python
dataset = tf.data.Dataset.range(3)
dataset = dataset.shuffle(3, reshuffle_each_iteration=True)
dataset = dataset.repeat(2)
# [1, 0, 2, 1, 2, 0]
```
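For contrast (this example is also from the docs; the exact permutation shown is illustrative), `reshuffle_each_iteration=False` reuses one shuffled order across repetitions:

```python
dataset = tf.data.Dataset.range(3)
dataset = dataset.shuffle(3, reshuffle_each_iteration=False)
dataset = dataset.repeat(2)
# [1, 0, 2, 1, 0, 2]
```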
| Args | |
|---|---|
| `buffer_size` | A `tf.int64` scalar `tf.Tensor`, representing the number of elements from this dataset from which the new dataset will sample. |
| `seed` | (Optional.) A `tf.int64` scalar `tf.Tensor`, representing the random seed that will be used to create the distribution. See `tf.random.set_seed` for behavior (see the sketch below the table). |
| `reshuffle_each_iteration` | (Optional.) A boolean, which if true indicates that the dataset should be pseudorandomly reshuffled each time it is iterated over. (Defaults to `True`.) |
| `name` | (Optional.) A name for the tf.data operation. |
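A quick reproducibility sketch (the printed permutation is illustrative): pinning the op-level `seed` and turning off reshuffling should make the order stable both across iterations and across runs.

```python
ds = tf.data.Dataset.range(5)
ds = ds.shuffle(5, seed=42, reshuffle_each_iteration=False)
print(list(ds.as_numpy_iterator()))  # some fixed permutation of 0..4
print(list(ds.as_numpy_iterator()))  # the same permutation again
```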
batch
```python
batch(
    batch_size,
    drop_remainder=False,
    num_parallel_calls=None,
    deterministic=None,
    name=None
)
```
Combines consecutive elements of this dataset into batches.
```python
dataset = tf.data.Dataset.range(8)
dataset = dataset.batch(3)
list(dataset.as_numpy_iterator())
# [array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]

dataset = tf.data.Dataset.range(8)
dataset = dataset.batch(3, drop_remainder=True)
list(dataset.as_numpy_iterator())
# [array([0, 1, 2]), array([3, 4, 5])]
```
The components of the resulting element will have an additional outer dimension, which will be batch_size (or N % batch_size for the last element if batch_size does not divide the number of input elements N evenly and drop_remainder is False). If your program depends on the batches having the same outer dimension, you should set the drop_remainder argument to True to prevent the smaller batch from being produced.
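One concrete consequence (a minimal sketch): `drop_remainder=True` is what gives the outer dimension a static size in `element_spec`, which matters for code that requires fully static shapes.

```python
import tensorflow as tf

print(tf.data.Dataset.range(8).batch(3).element_spec)
# TensorSpec(shape=(None,), dtype=tf.int64, name=None)

print(tf.data.Dataset.range(8).batch(3, drop_remainder=True).element_spec)
# TensorSpec(shape=(3,), dtype=tf.int64, name=None)
```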
| Args | |
|---|---|
| `batch_size` | A `tf.int64` scalar `tf.Tensor`, representing the number of consecutive elements of this dataset to combine in a single batch. |
| `drop_remainder` | (Optional.) A `tf.bool` scalar `tf.Tensor`, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch. |
| `num_parallel_calls` | (Optional.) A `tf.int64` scalar `tf.Tensor`, representing the number of batches to compute asynchronously in parallel. If not specified, batches will be computed sequentially. If the value `tf.data.AUTOTUNE` is used, then the number of parallel calls is set dynamically based on available resources (see the sketch below the table). |
| `deterministic` | (Optional.) When `num_parallel_calls` is specified, if this boolean is specified (`True` or `False`), it controls the order in which the transformation produces elements. If set to `False`, the transformation is allowed to yield elements out of order to trade determinism for performance. If not specified, the `tf.data.Options.deterministic` option (`True` by default) controls the behavior. |
| `name` | (Optional.) A name for the tf.data operation. |
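A sketch of the parallel variant using the knobs from the table above (the dataset and batch size here are arbitrary): `tf.data.AUTOTUNE` picks the parallelism, and `deterministic=False` allows batches to be yielded out of order for throughput.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(1000)
ds = ds.batch(32, num_parallel_calls=tf.data.AUTOTUNE, deterministic=False)
for b in ds.take(2):
    print(b.shape)  # (32,)
```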
repeat
```python
repeat(
    count=None, name=None
)
```
Repeats this dataset so each original value is seen count times.
```python
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
dataset = dataset.repeat(3)
list(dataset.as_numpy_iterator())
# [1, 2, 3, 1, 2, 3, 1, 2, 3]
```
| Args | |
|---|---|
| `count` | (Optional.) A `tf.int64` scalar `tf.Tensor`, representing the number of times the dataset should be repeated. The default behavior (if `count` is `None` or `-1`) is for the dataset to be repeated indefinitely (see the sketch below the table). |
| `name` | (Optional.) A name for the tf.data operation. |
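Since `count=None` means "repeat forever", the usual pattern (a minimal sketch) is to bound the stream yourself, e.g. with `take`:

```python
ds = tf.data.Dataset.range(3).repeat()       # count=None -> infinite stream
print(list(ds.take(7).as_numpy_iterator()))  # [0, 1, 2, 0, 1, 2, 0]
```

Note that ordering matters when combining with `shuffle`: `shuffle(...).repeat(...)` reshuffles at each epoch boundary, while `repeat(...).shuffle(...)` mixes elements across epoch boundaries.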
I think I more or less understand scan and map too.