Sparse MoE Pretraining with Adaptive Computation
A detailed exploration of adaptive computation techniques for sparse Mixture of Experts models during pretraining and inference
These are ideas I've mulled over for years, exploring them in short bursts between projects but never pursuing them beyond initial experiments. Rather than let them fade away in old notebooks, I'm sharing them here as short technical blog posts: imperfect, unfinished, but perhaps useful to someone on a similar journey.