Recurrent Neural Networks (RNNs)
The problem with time (processing a sequence with a feedforward network):
- Requires buffering the whole sequence before processing
- Forces fixed-length sequences
- Cannot represent relative temporal structure independently of absolute position
Why are standard feedforward neural networks poorly suited for sequence learning tasks?
Because sequences can have variable lengths, whereas a feedforward network expects a fixed-size input; it would also need an excessive number of parameters for long inputs, and it cannot share learned features across different positions in the sequence.
An RNN tracks context by maintaining a hidden state at each time step. Passing the hidden state from one time step to the next creates a feedback loop, and the hidden state acts as a memory that stores information about previous inputs. At each time step, the RNN processes the current input together with the hidden state that summarizes the previous time steps, allowing the network to use earlier information when processing the current input and producing the new output.
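A minimal sketch of this update, assuming a vanilla (Elman-style) RNN cell with a tanh activation; the names `rnn_step`, `W_xh`, `W_hh`, `b_h` and the shapes are illustrative, not taken from any particular library:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One time step of a vanilla RNN: the new hidden state mixes the
    current input (x_t @ W_xh) with the memory carried over from the
    previous step (h_prev @ W_hh)."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Hypothetical shapes: a 3-dimensional input and a 4-dimensional hidden state.
rng = np.random.default_rng(0)
x_t    = rng.standard_normal(3)
h_prev = np.zeros(4)                           # initial memory is empty
W_xh   = rng.standard_normal((3, 4)) * 0.1
W_hh   = rng.standard_normal((4, 4)) * 0.1
b_h    = np.zeros(4)
h_t = rnn_step(x_t, h_prev, W_xh, W_hh, b_h)   # new hidden state, shape (4,)
```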
The network shares the same weight parameters across time: the same weight matrices are applied at every time step.
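A rough sketch of what this weight sharing looks like when the recurrence is unrolled over a whole sequence (again with illustrative names, building on the vanilla cell above): the same matrices appear inside the loop at every step, so the parameter count does not grow with sequence length.

```python
import numpy as np

def rnn_forward(xs, h0, W_xh, W_hh, W_hy, b_h, b_y):
    """Unroll the RNN over a sequence of input vectors xs. The same
    weight matrices are reused at every time step."""
    h, hs, ys = h0, [], []
    for x_t in xs:                                   # one input vector per time step
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)     # shared recurrent weights
        ys.append(h @ W_hy + b_y)                    # shared output weights
        hs.append(h)
    return hs, ys
```

Because the weights live outside the loop, the same parameters handle a sequence of length 5 or length 500.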
BPTT differs from standard backpropagation in that it sums the error gradients over every time step, because the same parameters are reused at each step; a feedforward network has no such summation, since it does not share parameters across its layers.
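A rough NumPy sketch of BPTT for the vanilla RNN above, assuming a squared-error loss at every time step (all names and the loss choice are illustrative): note how each shared weight's gradient is accumulated with `+=` across time steps instead of being computed once per layer.

```python
import numpy as np

def bptt(xs, targets, h0, W_xh, W_hh, W_hy, b_h, b_y):
    """Backpropagation through time for a vanilla RNN. Because the same
    weights are reused at every step, their gradients are summed over all
    time steps."""
    # ---- forward pass, keeping the hidden states needed for the backward pass ----
    hs, ys = [h0], []
    for x_t in xs:
        h = np.tanh(x_t @ W_xh + hs[-1] @ W_hh + b_h)
        hs.append(h)
        ys.append(h @ W_hy + b_y)

    # ---- backward pass: walk the sequence in reverse, accumulating gradients ----
    dW_xh, dW_hh, dW_hy = np.zeros_like(W_xh), np.zeros_like(W_hh), np.zeros_like(W_hy)
    db_h, db_y = np.zeros_like(b_h), np.zeros_like(b_y)
    dh_next = np.zeros_like(h0)              # gradient flowing back from the future

    for t in reversed(range(len(xs))):
        dy = ys[t] - targets[t]              # dL_t/dy_t for squared error
        dW_hy += np.outer(hs[t + 1], dy)     # summed over time, not overwritten
        db_y  += dy
        dh = dy @ W_hy.T + dh_next           # from this step's output AND from step t+1
        da = dh * (1.0 - hs[t + 1] ** 2)     # tanh derivative
        dW_xh += np.outer(xs[t], da)
        dW_hh += np.outer(hs[t], da)
        db_h  += da
        dh_next = da @ W_hh.T                # pass gradient back to step t-1

    return dW_xh, dW_hh, dW_hy, db_h, db_y
```

The accumulated gradients would then feed an ordinary gradient-descent update of the shared weights.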