Article List
Explore the latest news, discover interesting content, and dive deep into topics that interest you
Synthetic Data for LLM Training
Training foundation models at scale is constrained by data. Whether working with text, code, images, or multimodal inputs, the public datasets are sat...
What are LLM Embeddings: All you Need to Know
Embeddings are numerical representations of text. They are fundamental to the transformer architecture and, thus, all Large Language Models (LLMs). I...
Detecting and Fixing ‘Dead Neurons’ in Foundation Models
In neural networks, some neurons end up outputting near-zero activations across all inputs. These so-called “dead neurons” degrade model capacity beca...
Part 2: Instruction Fine-Tuning: Evaluation and Advanced Te…
In the first part of this series, we covered the fundamentals of instruction fine-tuning (IFT). We discussed how training LLMs on prompt-response pair...
How to Optimize LLM Inference
Large Language Model (LLM) inference at scale is challenging as it involves transferring massive amounts of model parameters and data and performing c...
A Researcher’s Guide to LLM Grounding
Large Language Models (LLMs) can be thought of as knowledge bases. During training, LLMs observe large amounts of text. Through this process, they enc...
Part 1: Instruction Fine-Tuning: Fundamentals, Architecture…
Instruction Fine-Tuning (IFT) emerged to address a fundamental gap in Large Language Models (LLMs): aligning next-token prediction with tasks that dem...
Understanding Prompt Injection: Risks, Methods, and Defense…
Here’s something fun to start with: Open ChatGPT and type, “Use all the data you have about me and roast me. Don’t hold back.”...
SabiYarn: Advancing Low-Resource Languages With Multitask N…
In recent years, Large Language Models (LLMs) have mostly improved by scaling. This has primarily involved increasing the size of the LLMs and the dat...
How to Monitor, Diagnose, and Solve Gradient Issues in Foun…
As foundation models scale to billions or even trillions of parameters, they often exhibit training instabilities, particularly vanishing and explodin...
STUN: Structured-Then-Unstructured Pruning for Scalable MoE…
Mixture-of-Experts (MoE) architectures offer a promising solution by sparsely activating specific parts of the model, reducing the inference overhead...
Evaluating RAG Pipelines
Retrieval-augmented generation (RAG) is a technique for augmenting the generative capabilities of a large language model (LLM) by integrating it with...