Article List
Explore the latest news, discover interesting content, and dive deep into topics that interest you
AlignEval: Building an App to Make Evals Easy, Fun, and Aut…
Look at and label your data, build and evaluate your LLM-evaluator, and optimize it against your labels....
Weights & Biases LLM-Evaluator Hackathon - Hackathon Judge
Being a human judge at the Weights & Biases LLM-as-a-Judge Hackathon...
Building the Same App Using Various Web Frameworks
FastAPI, FastHTML, Next.js, SvelteKit, and thoughts on how coding assistants influence builders' choices....
Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-…
Use cases, techniques, alignment, finetuning, and critiques of LLM-evaluators....
How to Interview and Hire ML/AI Engineers
What to interview for, how to structure the phone screen, interview loop, and debrief, and a few tips....
AI Engineer 2024 Keynote - What We Learned from a Year of L…
Special double-feature closing keynote from the 6 authors of the hit O'Reilly article on Applied LLMs....
Netflix PRS 2024 - Applying LLMs to Recommendation Experien…
Challenges and lessons from deploying LLM experiences: evals, scalability, guardrails....
Prompting Fundamentals and How to Apply them Effectively
Structured input/output, prefilling, n-shot prompting, chain-of-thought, reducing hallucinations, etc....
What We've Learned From A Year of Building with LLMs
From the tactical nuts & bolts to the operational day-to-day to the long-term business strategy....
Building an AI Coach to Help Tame My Monkey Mind
Building an AI coach with speech-to-text, text-to-speech, an LLM, and a virtual number....
Task-Specific LLM Evals that Do & Don't Work
Evals for classification, summarization, translation, copyright regurgitation, and toxicity....
Don't Mock Machine Learning Models In Unit Tests
How unit testing machine learning code differs from typical software practices...
How to Generate and Use Synthetic Data for Finetuning
Overcoming the bottleneck of human annotations in instruction-tuning, preference-tuning, and pretraining....
Language Modeling Reading List (to Start Your Paper Club)
Some fundamental papers and a one-sentence summary for each; start your own paper club!...
2023 Year in Review
An expanded charter, lots of writing and speaking, and finally learning to snowboard....
Push Notifications: What to Push, What Not to Push, and How…
Sending helpful & engaging pushes, filtering annoying pushes, and finding the frequency sweet spot....
Out-of-Domain Finetuning to Bootstrap Hallucination Detecti…
How to use open-source, permissive-use data and collect fewer labeled samples for our tasks....
Reflections on AI Engineer Summit 2023
The biggest deployment challenges, backward compatibility, multi-modality, and SF work ethic....