Article List

Explore the latest news, discover interesting content, and dive deep into topics that interest you.

Open Source AI Research

Reward Hacking Research Update

Interim report on ongoing work on reward hacking...

3 months ago Blog on Ele…
41 words 1 min
Open Source AI Research

Pretraining Data Filtering for Open-Weight AI Safety

Announcing Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs...

4 months, 4 weeks ago Blog on Ele…
99 words 1 min
Open Source AI Research

Attention Probes

Adding attention to linear probes...

5 months, 1 week ago Blog on Ele…
29 words 1 min
Open Source AI Research

Research Update: Applications of Local Volume Measurement

Research update on applying local volume measurement to downstream tasks...

6 months, 2 weeks ago Blog on Ele…
65 words 1 min
Open Source AI Research

Studying inductive biases of random networks via local volu…

In this post, we will study inductive biases of the parameter-function map of random neural networks using star domain volume estimates. This builds o...

7 months ago Blog on Ele…
495 words 1 min
Open Source AI Research

The Common Pile v0.1

Announcing the Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text...

7 months ago Blog on Ele…
75 words 1 min
Open Source AI Research

Product Key Memory Sparse Coders

Using Product Key Memories to encode sparse coder features...

7 months, 1 week ago Blog on Ele…
50 words 1 min
Open Source AI Research

SAEs trained on the same data don’t learn the same features

In this post, we show that when two TopK SAEs are trained on the same data, with the same batch order but with different random initializations, there...

1 year ago Blog on Ele…
429 words 1 min
Open Source AI Research

Partially rewriting an LLM in natural language

Using interpretations of SAE latents to simulate activations....

1 year, 2 months ago Blog on Ele…
54 words 1 min
Open Source AI Research

Third-party evaluation to identify risks in LLMs’ training …

An overview of the minetester and preliminary work...

1 year, 2 months ago Blog on Ele…
43 words 1 min
Open Source AI Research

Mechanistic Anomaly Detection Research Update 2

Interim report on ongoing work on mechanistic anomaly detection...

1 year, 2 months ago Blog on Ele…
55 words 1 min
Open Source AI Research

RLHF and RLAIF in GPT-NeoX

GPT-NeoX now supports post-training thanks to a collaboration with SynthLabs....

1 year, 3 months ago Blog on Ele…
68 words 1 min
Open Source AI Research

The Practitioner's Guide to the Maximal Update Parameteriza…

Exploring the implementation details of muTransfer...

1 year, 3 months ago Blog on Ele…
45 words 1 min
Open Source AI Research

Mechanistic Anomaly Detection Research Update

Interim report on ongoing work on mechanistic anomaly detection...

1 year, 5 months ago Blog on Ele…
55 words 1 min
Open Source AI Research

Open Source Automated Interpretability for Sparse Autoencod…

Building and evaluating an open-source pipeline for auto-interpretability...

1 year, 5 months ago Blog on Ele…
66 words 1 min
Open Source AI Research

Experiments in Weak-to-Strong Generalization

Writing up results from a recent project...

1 year, 6 months ago Blog on Ele…
34 words 1 min
Open Source AI Research

Free Form Least-Squares Concept Erasure Without Oracle Conc…

Achieving even more surgical edits than LEACE without concept labels at inference time....

1 year, 6 months ago Blog on Ele…
75 words 1 min
Open Source AI Research

VINC-S: Closed-form Optionally-supervised Knowledge Elicita…

Writing up results from a project from Spring 2023...

1 year, 7 months ago Blog on Ele…
42 words 1 min