Soumajyoti Sarkar

Background

I am an Applied Scientist at Amazon AGI Foundations, where I am part of a team focused on understanding how to scale language model pretraining. My own research centers on developing new techniques for scaling up pretraining through conditional computation (sparse mixture-of-experts models), adaptive compute for mitigating inference inefficiencies, and understanding how scaling laws behave in these regimes. Along those lines, I also work on algorithm-system co-design for architectures that not only scale along the compute axis but whose compute optimality can be practically achieved on modern accelerators.

Prior to this, I was part of the AI Research and Education (AIRE) group at AWS, working on foundation models for structured knowledge grounding and on training text embedding models that scale in distributed training environments. Before that, I worked as both an ML engineer and a researcher on search and recommendations at Twitter in San Francisco. On the Tweet Search Ranking team, I prototyped and deployed Twitter's first content-based search relevance model, which used explicit user survey feedback in Twitter's Search service.

I obtained my PhD in computer science from Arizona State University, Tempe, where I was advised by Paulo Shakarian. My thesis focused on measuring the impact of social network interactions using observational and experimental studies. During my graduate studies, I spent summers at Amazon A9 in Palo Alto and Nokia Bell Labs in New Jersey. I completed my undergraduate studies at the Indian Institute of Engineering Science and Technology (IIEST), Shibpur. I live in and work remotely from San Francisco, California.

Research

My research interests span large-scale machine learning, including data- and model-efficient pretraining, scaling in the data-limited, infinite-compute regime, and distributed ML optimization. In ancient times, I worked in computational social science and its applications in search and recommendation systems. I also enjoy doing independent research at the intersection of economics and machine learning, mainly on decision making in peer lending platforms and reinforcement learning.

You can find an updated list of published papers and preprints on my Google Scholar page. Feel free to reach out by email about anything related to my research, paper reviews, or potential collaborations. For more detail on my past work, please see the Interests and Unpublished sections in the navigation bar.

News

Check out the technical report on the Amazon Nova 2024 models.

Check out our recent papers on contrastive learning and sparse pretraining: EMC^2: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence, accepted at ICML 2024, and Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning (code and models to be released).