Recurrent Neural Networks
A recurrent neural network (RNN) is a type of artificial neural network used to process time series and other sequential data. Unlike traditional feedforward networks, RNNs carry a state from one step to the next, so they can evaluate the current input in light of historical information. This makes them very effective at modeling temporal dependencies and sequential relationships. An RNN processes successive data points in a loop. At each step, it combines the previous state with the current input to produce an output and the next state. That state is passed to the next step, and the process repeats for as many steps as the sequence has.
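The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration with a single scalar state, not how Keras implements it; the function names and the hand-picked weights `w_x`, `w_h`, and `b` are assumptions made for the example, whereas a real network would learn them.

```python
import math


def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent step: mix the current input with the previous state."""
    # Hypothetical fixed weights; training would normally set these.
    h_t = math.tanh(w_x * x_t + w_h * h_prev + b)
    y_t = h_t  # in a simple RNN, the output at each step is the new state
    return y_t, h_t


def run_rnn(sequence, h0=0.0):
    """Feed a sequence through the cell, carrying the state between steps."""
    h = h0
    outputs = []
    for x_t in sequence:
        y_t, h = rnn_step(x_t, h)
        outputs.append(y_t)
    return outputs, h


# Only the first input is non-zero, yet later outputs are non-zero too,
# because the state carries information forward.
outputs, final_state = run_rnn([1.0, 0.0, 0.0])
print(outputs, final_state)
```

The second and third outputs depend only on the carried state, which is the "memory" the text refers to.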
The key component of an RNN is its memory, held in what is called the latent (or hidden) state. The latent state summarizes the information observed in previous time steps and is combined with the current input to produce a new latent state and an output. In this way, the RNN evaluates previous information in addition to the current input. RNNs are used in many different applications, especially time series analysis, natural language processing, speech recognition, and machine translation. For example, to forecast a time series, you can use an RNN that reads a window of historical data points and predicts future values from it.
Two common RNN variants, LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit), add gating mechanisms that let the network retain and use historical information more effectively than a simple RNN. In conclusion, RNNs are a type of neural network well suited to processing sequential information. Thanks to their memory, they can evaluate past information and model temporal dependencies. Using RNNs in time series analysis and other sequential data problems, it is possible to predict future values or recognize patterns in sequential data.
For example, if we have a window size of 30 time steps and we batch the windows in groups of four, the input shape will be 4 by 30 by 1, and at each time step the memory cell input will be a 4 by 1 matrix. The cell also takes the state matrix from the previous step as input. In the first step this state is zero; for every later step it is the output of the memory cell at the previous step. Apart from the state vector, the cell also returns an output value Y. If the memory cell consists of three neurons, the output matrix at each step will be 4 by 3, because the incoming batch size is four and the number of neurons is three. So the full output of the layer is three-dimensional, in this case 4 by 30 by 3: four is the batch size, 30 is the number of time steps, and three is the number of units.
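To check the shapes in this walkthrough, here is a small NumPy sketch of a recurrent layer run step by step. It is only an illustration under stated assumptions: the weights `W_x`, `W_h`, and `b` are random placeholders rather than trained values, and the tanh update mirrors a simple RNN cell rather than reproducing the Keras internals.

```python
import numpy as np

batch_size, window_size, n_features, n_units = 4, 30, 1, 3

rng = np.random.default_rng(0)
inputs = rng.normal(size=(batch_size, window_size, n_features))  # 4 x 30 x 1

# Hypothetical, randomly initialised weights (training would learn these).
W_x = rng.normal(size=(n_features, n_units))  # input -> units
W_h = rng.normal(size=(n_units, n_units))     # previous state -> units
b = np.zeros(n_units)

state = np.zeros((batch_size, n_units))  # the state is zero at the first step
step_outputs = []
for t in range(window_size):
    x_t = inputs[:, t, :]                         # 4 x 1 input at this step
    state = np.tanh(x_t @ W_x + state @ W_h + b)  # 4 x 3 output and new state
    step_outputs.append(state)

layer_output = np.stack(step_outputs, axis=1)     # 4 x 30 x 3

print(inputs.shape, x_t.shape, state.shape, layer_output.shape)
# (4, 30, 1) (4, 1) (4, 3) (4, 30, 3)
```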
Consider an RNN with two recurrent layers, where the first has return_sequences=True. It outputs a sequence, which is fed to the next layer. The next layer does not have return_sequences set to True, so it outputs only the final step. Notice the input_shape: it is set to [None, 1]. TensorFlow assumes that the first dimension is the batch size and that it can have any size at all, so you don't need to define it. The next dimension is the number of time steps, which we can set to None, meaning the RNN can handle sequences of any length. The last dimension is 1 because we're using a univariate time series. (The code below reaches the same shape by declaring input_shape=[None] and letting a Lambda layer add the last dimension with tf.expand_dims.) If we set return_sequences to True on all recurrent layers, they all output sequences and the dense layer receives a sequence as its input. Keras handles this by applying the same dense layer independently at each time step. In a diagram it might look like multiple dense layers, but it is the same one reused at each time step. This gives us what is called a sequence-to-sequence RNN: it is fed a batch of sequences and returns a batch of sequences of the same length. The dimensionality of the input and output may not always match; the output's depends on the number of units in the memory cell. Now let's return to a two-layer RNN whose second layer does not return sequences, so that its output goes to a single dense layer.
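The difference between the two settings can be seen from the shapes alone. The NumPy sketch below stands in for the Keras layers: `rnn_outputs` is random data playing the role of a recurrent layer's per-step outputs, and `W_dense` and `b_dense` are hypothetical weights for a dense layer with one unit. It is meant only to show how one set of dense weights is reused at every time step, not to reproduce Keras itself.

```python
import numpy as np

rng = np.random.default_rng(1)
batch_size, n_steps, n_units = 4, 30, 3

# Stand-in for a recurrent layer's outputs: (batch, time steps, units).
rnn_outputs = rng.normal(size=(batch_size, n_steps, n_units))

# A single set of dense weights with one output unit (placeholder values).
W_dense = rng.normal(size=(n_units, 1))
b_dense = np.zeros(1)

# return_sequences=True on the last recurrent layer:
# the same dense weights are applied independently at every time step.
seq_to_seq = rnn_outputs @ W_dense + b_dense    # shape (4, 30, 1)

# return_sequences left at its default on the last recurrent layer:
# only the final step reaches the dense layer.
last_step = rnn_outputs[:, -1, :]               # shape (4, 3)
seq_to_vector = last_step @ W_dense + b_dense   # shape (4, 1)

print(seq_to_seq.shape, seq_to_vector.shape)
```

Because the weights are shared, the last time step of the sequence-to-sequence output equals the sequence-to-vector output.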
# Build the model
import tensorflow as tf

model_tune = tf.keras.models.Sequential([
    # Add a feature dimension: (batch, time) -> (batch, time, 1)
    tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1),
                           input_shape=[None]),
    tf.keras.layers.SimpleRNN(40, return_sequences=True),
    tf.keras.layers.SimpleRNN(40),
    tf.keras.layers.Dense(1),
    # Scale the output up toward the range of the series values
    tf.keras.layers.Lambda(lambda x: x * 100.0)
])

# Print the model summary
model_tune.summary()

# Set the learning rate scheduler: start at 1e-8, multiply by 10 every 20 epochs
lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-8 * 10**(epoch / 20))

# Initialize the optimizer
optimizer = tf.keras.optimizers.SGD(momentum=0.9)

# Set the training parameters
model_tune.compile(loss=tf.keras.losses.Huber(), optimizer=optimizer)

# Train the model (`dataset` is assumed to be a windowed tf.data.Dataset prepared beforehand)
history = model_tune.fit(dataset, epochs=100, callbacks=[lr_schedule])
The code above trains an RNN with two layers of 40 cells each. A LearningRateScheduler callback adjusts the learning rate: in each epoch it raises the learning rate a little, starting from 1e-8 and multiplying it by 10 every 20 epochs. I also introduced a new loss function called Huber. The Huber function is a loss function that is less sensitive to outliers than squared error and is worth a try, as this data can be a bit noisy. If I run this for 100 epochs and measure the loss in each epoch, I find that my optimum learning rate for stochastic gradient descent is between about 1e-6 and 1e-5.
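To see which learning rates the sweep actually visits, the schedule's lambda can be evaluated on its own. The sketch below copies the formula from the callback; the function name `scheduled_lr` and its parameter names are made up for this illustration.

```python
def scheduled_lr(epoch, base_lr=1e-8, epochs_per_decade=20):
    """Learning rate at a given epoch: base_lr grows tenfold every 20 epochs."""
    return base_lr * 10 ** (epoch / epochs_per_decade)


# Print the learning rate at a few epochs of the 100-epoch sweep.
for epoch in (0, 20, 40, 60, 80, 99):
    print(f"epoch {epoch:3d}: lr = {scheduled_lr(epoch):.2e}")
```

The sweep covers 1e-8 at epoch 0 up to roughly 9e-4 at epoch 99, so the range of 1e-6 to 1e-5 corresponds to epochs 40 to 60. Plotting the recorded loss against these learning rates is one way to read off where training is most stable.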
With LSTM
# Reset states generated by Keras
tf.keras.backend.clear_session()

# Build the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1),
                           input_shape=[None]),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Lambda(lambda x: x * 100.0)
])

# Set the learning rate
learning_rate = 2e-5

# Set the optimizer
optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9)

# Set the training parameters
model.compile(loss=tf.keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])

# Train the model
history = model.fit(dataset, epochs=100)
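Both models are compiled with the Huber loss, and the LSTM model also reports mean absolute error (MAE). As a rough illustration of why Huber suits noisy data, here is a plain-Python sketch of the loss with a threshold of 1.0, which is the default `delta` of `tf.keras.losses.Huber`. The helper names and the sample values are invented for this example.

```python
def huber(error, delta=1.0):
    """Quadratic for small errors, linear for large ones."""
    abs_err = abs(error)
    if abs_err <= delta:
        return 0.5 * error ** 2
    return delta * (abs_err - 0.5 * delta)


def mean_loss(y_true, y_pred, per_point):
    """Average a per-point loss over paired true and predicted values."""
    return sum(per_point(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values: three close predictions and one large outlier.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 14.0]

huber_loss = mean_loss(y_true, y_pred, huber)
mse_loss = mean_loss(y_true, y_pred, lambda e: e ** 2)
mae = mean_loss(y_true, y_pred, abs)

print(f"Huber: {huber_loss:.3f}  MSE: {mse_loss:.3f}  MAE: {mae:.3f}")
```

The single outlier dominates the squared-error average, while the Huber loss grows only linearly with it, which is what "less sensitive to outliers" means in practice.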
Reference
Laurence Moroney, DeepLearning.AI TensorFlow Developer Professional Certificate
DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute these slides for commercial purposes. You may make copies of these slides and use or distribute them for educational purposes as long as you cite DeepLearning.AI as the source of the slides.