#56

Goodfire

Mechanistic interpretability research lab building tools to reverse-engineer neural network internals. Uses sparse autoencoders (SAEs) to discover features and activation steering to intervene on them, exposing model representations. Products include Ember (feature-level API for LLM intervention) and Silico (model design environment for debugging and steering behavior during training). Applied to LLMs and scientific foundation models.

Topics

POST_TRAINING

DEVELOPMENT

LLMS

Subtopics

ALIGNMENT

RLHF

FINE TUNING

Links