Tags adam1 adamw1 analysis1 article1 batch-norm1 bayesian1 bengio1 book-review3 books1 deep learning1 discussion-group1 dropout1 embeddings1 essays1 gradient descent1 humanities1 language models1 layer-norm1 literature1 momentum1 neural networks4 nlp1 normalization1 optimization1 overfitting1 poem1 regularization1 rms-norm1 sgd1 short-stories1 training1 training-stability1 uncertainty1 year-in-books3