Notes from a talk I recently gave on feature superposition (a microscopic phenomenon) and neural scaling laws (a macroscopic phenomenon).
Guiding Question For My Research
Can feature superposition (a microscopic phenomenon) explain neural scaling laws (a macroscopic phenomenon)?
Specific Questions For This Talk
Consider Transformer models of fixed depth but different widths.
- Do different models store the same features, just arranged differently?
- OR: Do larger models store more features, giving them greater capability?
- In either case, can we see the effect on scaling laws?
Outline
- Background
- Neural Scaling Laws
- Feature Superposition
- This work
- Feature Importance and Universality
- Relationship with Scaling Laws
Background
Neural Scaling Laws
Neural scaling laws describe how model performance improves with scale (Kaplan et al., 2020; Hoffmann et al., 2022; and others).
Definition: Neural Network Features

- Activations can be decomposed over an (overcomplete) basis of features.
- For token j, we can write the activations as a sum over features i (written out below).
- Each feature direction d_i has a specific interpretation.
- f_i(x) ≥ 0 is the activation of feature i.
- For a given token, only a few features are active.

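Written out in the notation above (a sketch; the bias term b is an assumption carried over from the usual SAE decomposition, not something stated explicitly in these notes):

x_j ≈ b + Σ_i f_i(x_j) · d_i,   with f_i(x_j) ≥ 0 and only a few f_i(x_j) > 0
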
Example: the Golden Gate Bridge feature of Claude 3 Sonnet (Templeton et al., 2024). Clamping this feature to 10x its maximum activation value changes the model's behavior: Claude starts identifying itself as the Golden Gate Bridge.


Definition: Superposition

- Neural networks store more features than the number of available dimensions.
- Hence, some features interfere with others.
- Intuitively, larger models perform better as they have more “capacity”: they can store more features without interference (Elhage et al., 2022).

This talk: Make this more precise.
How are features learned?

Reconstruct activations using autoencoders and let the decomposition be sparse — Sparse Autoencoders (SAEs) (Bricken et al., 2023; Cunningham et al., 2023).
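
A minimal sketch of such an SAE, assuming PyTorch, a ReLU encoder, and an L1 sparsity penalty (the exact architecture and hyperparameters behind the talk are not specified in these notes):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: encode activations into many non-negative latents f_i(x),
    decode back using feature directions d_i (the columns of decoder.weight)."""
    def __init__(self, d_model: int, n_latents: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_latents)
        self.decoder = nn.Linear(n_latents, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # f_i(x) >= 0
        x_hat = self.decoder(f)           # b + sum_i f_i(x) d_i
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that drives most f_i(x) to zero,
    # so each token is explained by only a few active features.
    return ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().mean()
```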

New Work
Feature Importance
Can we define a notion of importance for the features of a real model?
- Important features should be more universal across models of different widths.
- Important features may be learned early in training. (Not answered today, but I have some observations.)
- Hence, scaling laws could be studied from the perspective of feature importance.
Proposal (Definition): Feature Importance

Let the importance of feature i be its maximum activation value over a large dataset.
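
A minimal sketch of this definition, assuming the SAE activations arrive as NumPy batches (the array shapes and names are illustrative):

```python
import numpy as np

def feature_importance(activation_batches):
    """activation_batches: iterable of arrays of shape (n_tokens, n_features)
    holding SAE feature activations f_i(x) over chunks of a large dataset.
    The importance of feature i is its maximum activation over the whole dataset,
    accumulated batch by batch so the dataset never has to fit in memory."""
    running_max = None
    for acts in activation_batches:
        batch_max = acts.max(axis=0)
        running_max = batch_max if running_max is None else np.maximum(running_max, batch_max)
    return running_max
```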
Experiments: Transformer Models

- Train four 8-layer models with varying embedding dimensions: 128, 256, 512, and 768.
- Train SAEs on the MLP outputs of the 6th layer, all with 24,768 latents.
- Measure the loss with the SAE reconstructions spliced into each model (see the sketch below).

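A minimal sketch of that splicing step, assuming PyTorch and that the layer-6 MLP module can be reached as something like model.blocks[5].mlp; the attribute path and the next-token loss below are assumptions, not the exact code behind the talk:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def loss_with_sae_spliced(model, sae, tokens, mlp_module):
    """Run the model with the chosen MLP output replaced by its SAE
    reconstruction, and return the resulting next-token loss."""
    def splice(module, inputs, output):
        x_hat, _ = sae(output)   # SAE reconstruction of the MLP output
        return x_hat             # returning a value overrides the module's output

    handle = mlp_module.register_forward_hook(splice)
    try:
        logits = model(tokens)   # (batch, seq, vocab)
        loss = F.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),
            tokens[:, 1:].reshape(-1),
        )
    finally:
        handle.remove()
    return loss.item()
```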

Are important features more universal?
Measure of universality: activation similarity. Roughly, how much do the activations of two features correlate?
More mathematically:
- Assign to each feature a vector of length X of its activations.
- Compute the Pearson correlation between the features of one model and those of another (a sketch follows below).
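
A minimal sketch of this measure, assuming NumPy and that both models' SAE activations were collected on the same tokens (the variable names are mine):

```python
import numpy as np

def max_activation_similarity(acts_a, acts_b):
    """acts_a: (n_tokens, n_features_a), acts_b: (n_tokens, n_features_b) hold the
    SAE feature activations of two models on the same tokens. For each feature of
    model A, return the highest Pearson correlation it attains with any feature of B."""
    def standardize(a):
        a = a - a.mean(axis=0)
        return a / (a.std(axis=0) + 1e-8)   # epsilon guards against dead features

    za, zb = standardize(acts_a), standardize(acts_b)
    corr = za.T @ zb / za.shape[0]          # (n_features_a, n_features_b) Pearson matrix
    return corr.max(axis=1)
```

For SAEs with ~25k latents each, the full correlation matrix gets large, so in practice it would be computed in blocks; the sketch keeps it in one shot for clarity.
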
Results:

Plot: Relative Feature Importance (x-axis) vs. Maximum Activation Similarity (y-axis)
- Fitted equation: y = 1 - e^(-b(x-c)) (fitting sketch below)
- r = 0.7002

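A minimal sketch of how such a saturating curve could be fit, assuming SciPy; the arrays x (relative feature importance) and y (maximum activation similarity) are placeholders for the per-feature data:

```python
import numpy as np
from scipy.optimize import curve_fit

def saturating_similarity(x, b, c):
    # y = 1 - exp(-b (x - c)): similarity approaches 1 as relative importance grows
    return 1.0 - np.exp(-b * (x - c))

# x: relative feature importance, y: maximum activation similarity (one point per feature)
# (b, c), _ = curve_fit(saturating_similarity, x, y, p0=(1.0, 0.0))
```
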
Table of correlation coefficients r for each pair of model widths (rows vs. columns):
| width \ width | 128 | 256 | 512 | 768 |
|---|---|---|---|---|
| 128 | - | 0.7002 | 0.7096 | 0.6975 |
| 256 | 0.7216 | - | 0.7152 | 0.7136 |
| 512 | 0.7352 | 0.7026 | - | 0.6646 |
| 768 | 0.7640 | 0.7751 | 0.7735 | - |
New Insight #1
Important features tend to be more universal amongst models of fixed depth and various widths.
Relationship Between Feature Importance and Scaling Laws

- Larger models perform better for any fixed number of features.
- Reason: less interference.
- In fact, the differences between models grow as you pack in more features.
- From the same number of features, larger models gain more capability.

But do they also store more features?

Yes! Larger models also pack more features!

New Insight #2
Larger models store more features. But even at a fixed, smaller number of features, they extract more performance.
Answer to Our Questions
Consider Transformer models of fixed depth but different widths.
It’s a combination of both!
Neural scaling laws receive contributions from both factors:
- Larger models extract more from a similar set of features.
- Larger models also store more features.
Quantifying the contribution from both is an interesting problem.
Some Limitations
- We ignored any inductive biases SAEs bring to the family of features learned.
- We ignored the part of the original loss that is not recovered by the SAE reconstructions.
- We studied only 4 models of varying widths.
References
- Kaplan et al. (2020) - Scaling Laws for Neural Language Models
- Hoffmann et al. (2022) - Training Compute-Optimal Large Language Models
- Bricken et al. (2023) - Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
- Cunningham et al. (2023) - Sparse Autoencoders Find Highly Interpretable Features in Language Models
- Elhage et al. (2022) - Toy Models of Superposition
- Templeton et al. (2024) - Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet