Sparse MoE Pretraining with Adaptive Computation
A detailed exploration of adaptive computation techniques for sparse Mixture of Experts models during pretraining and inference
These are ideas I've mulled over for years, exploring them in short bursts between projects but never pursuing them beyond initial experiments. Rather than let them fade away in old notebooks, I'm sharing them here as short technical blog posts: imperfect, unfinished, but perhaps useful to someone on a similar journey.