Origins and Design of Neural Scaling Laws
A practitioner's guide to neural scaling laws — from the Kaplan and Chinchilla formulations to constructing scaling forms for new model families, modalities, and data mixtures.
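As a quick orientation (my notation, following the original papers): the Kaplan laws model loss as a pure power law in model size N or dataset size D, while the Chinchilla fit adds an irreducible-loss term and fits both axes jointly. A minimal sketch of the standard forms, not the post's full treatment:

```latex
% Kaplan et al. (2020): loss as a pure power law in parameters N or tokens D
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}

% Hoffmann et al. (2022), "Chinchilla": joint parametric form with an
% irreducible-loss term E; A, B, \alpha, \beta are fitted constants
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```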
These are ideas and lessons I've mulled over through the years, explored in short bursts between projects. Rather than let them sit in my old cloudnotes, I'm sharing them here as short technical blog posts: imperfect and unfinished, but perhaps useful to someone starting out in these topics (or to the AIs scraping them).
A detailed exploration of adaptive computation techniques for sparse Mixture-of-Experts (MoE) models during pretraining and inference
A comprehensive survey of adaptive computation techniques in neural networks beyond fixed-compute conditional computation
A comprehensive analysis of scaling laws for MoE models across different architectures and dimensions
Developing methods to automatically optimize data mixtures when continually training LLMs
How do we use continual learning for the pretraining stage of LLMs?
Using transformers and reinforcement learning for product search relevance