Definition
A technique that normalizes the inputs across the features within a layer to stabilize and accelerate neural network training. Unlike batch normalization, it operates on each training example independently rather than across a batch.
Detailed Explanation
Layer normalization computes the mean and variance used for normalization from all neurons in a layer, separately for each training example. Because it does not rely on batch statistics, it is particularly effective for recurrent neural networks and transformers, where those statistics can be unstable (for example, with small batches or variable sequence lengths). The technique helps reduce training time and improve model stability.
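The computation can be sketched in a few lines of NumPy; this is a minimal illustration, and the function name, parameter names, and epsilon value below are illustrative rather than taken from any particular library.

import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each example over its feature dimension, then scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)    # per-example mean over the features
    var = x.var(axis=-1, keepdims=True)      # per-example variance over the features
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize independently of the batch
    return gamma * x_hat + beta              # learnable scale (gamma) and shift (beta)

# Example: a batch of 2 examples with 4 features each
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0]])
gamma = np.ones(4)   # scale, typically initialized to ones
beta = np.zeros(4)   # shift, typically initialized to zeros
print(layer_norm(x, gamma, beta))

Note that each row is normalized using only its own statistics, which is why the result does not change with batch size or batch composition.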
Use Cases
Transformer architectures
Recurrent neural networks
Deep neural networks
Natural language processing models