Time series dataset generation in TensorFlow


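I find tf.keras.utils.timeseries_dataset_from_array difficult to use for two reasons: 1) it pairs each window with the target at the window's first index, so passing the same array as data and targets gives the first number in the window as the y value (as opposed to the last), and 2) it yields only one target entry per window, so it can't directly use multiple future values as y.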

So I wrote a trivial code example that anyone can easily customize. In the code below, given [0, 1, 2, 3, …, 98, 99], we use a window of size 5 (sequence_length) to predict the next 2 values (days_to_predict), so the first pair is X = [0, 1, 2, 3, 4] and y = [5, 6].
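To make the pairing concrete on a tiny input, here is a plain-Python sketch (no TensorFlow) that uses ten values, a window of 3, and 2 values to predict. The helper name make_windows is made up for illustration and is not part of any library.

```python
def make_windows(values, sequence_length, days_to_predict):
    """Pair each input window with the values that immediately follow it.

    Hypothetical helper for illustration only.
    """
    pairs = []
    # Last start index that still leaves room for a full window plus target.
    last_start = len(values) - sequence_length - days_to_predict
    for start in range(last_start + 1):
        split = start + sequence_length
        x = values[start:split]
        y = values[split:split + days_to_predict]
        pairs.append((x, y))
    return pairs


pairs = make_windows(list(range(10)), sequence_length=3, days_to_predict=2)
for x, y in pairs:
    print(x, "->", y)
```

The first pair is [0, 1, 2] -> [3, 4] and the last is [5, 6, 7] -> [8, 9]. The tf.data pipeline builds the same kind of pairs, just lazily and in batches.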

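import numpy as np
import tensorflow as tf
from tensorflow.data import Dataset

# Toy series 0..99 with a trailing feature axis: shape (100, 1)
X = np.arange(100)
X = X[:, np.newaxis]

sequence_length = 5   # input window size
days_to_predict = 2   # number of future values used as y
batch_size = 4        # example value; adjust as needed

X_tensor = tf.convert_to_tensor(X)

# Inputs: sliding windows of length sequence_length. The last days_to_predict
# values are dropped so that every window has a full target after it.
X_ds = Dataset.from_tensor_slices(X_tensor[:-days_to_predict]).window(
    sequence_length, shift=1, drop_remainder=True).flat_map(
        lambda x: x.batch(sequence_length))

# Targets: shift the series left by sequence_length (roll, then cut off the
# wrapped-around tail) and take sliding windows of length days_to_predict.
y_ds = Dataset.from_tensor_slices(
    tf.roll(X_tensor[:, 0], -sequence_length, axis=0)[:-sequence_length]).window(
        days_to_predict, shift=1, drop_remainder=True).flat_map(
            lambda x: x.batch(days_to_predict))

# Pair each input window with its target, then batch the pairs.
ds = Dataset.zip((X_ds, y_ds.batch(1))).batch(batch_size, drop_remainder=True)

# X batches have shape (batch_size, sequence_length, 1);
# y batches have shape (batch_size, 1, days_to_predict).
for batch in ds:
    print(batch[0], batch[1])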

