Attention Optimization
Testing how many attention heads can be removed without losing quality
Status: running
Model: GPT-2 (117M)
Experiments: 6 total · 2 running
Hypotheses: 4 active
Research Progress
Overall completion: 58%
Literature: ✓ complete
Hypotheses: ✓ complete
Experiments: ● in progress
Analysis: — pending
Synthesis: — pending
Active Hypotheses
Up to 40% of heads can be removed with minimal impact (87% conf.)
Lower layers are more sensitive to pruning (61% conf.)
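The hypotheses above are typically tested by masking individual heads rather than physically removing them. A minimal sketch of that idea on a toy multi-head self-attention layer (all weights, dimensions, and the `head_mask` convention here are illustrative assumptions, not GPT-2's actual implementation):

```python
import numpy as np

def multihead_attention(x, Wq, Wk, Wv, Wo, n_heads, head_mask):
    """Toy multi-head self-attention with a per-head keep/drop mask.

    head_mask[h] == 0.0 zeroes head h's output, simulating pruning.
    Illustrative only; real experiments would mask heads inside GPT-2.
    """
    T, d = x.shape
    dh = d // n_heads
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Split into heads: (n_heads, T, dh)
    q = q.reshape(T, n_heads, dh).transpose(1, 0, 2)
    k = k.reshape(T, n_heads, dh).transpose(1, 0, 2)
    v = v.reshape(T, n_heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)   # (n_heads, T, T)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)         # softmax over keys
    out = weights @ v                                 # (n_heads, T, dh)
    out *= head_mask[:, None, None]                   # zero out pruned heads
    out = out.transpose(1, 0, 2).reshape(T, d)        # re-concatenate heads
    return out @ Wo

rng = np.random.default_rng(0)
d, n_heads, T = 8, 4, 5
Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) * 0.1 for _ in range(4))
x = rng.standard_normal((T, d))

full = multihead_attention(x, Wq, Wk, Wv, Wo, n_heads, np.ones(n_heads))
# Drop half the heads (50% pruning), keeping the first two:
pruned = multihead_attention(x, Wq, Wk, Wv, Wo, n_heads,
                             np.array([1.0, 1.0, 0.0, 0.0]))
```

Comparing model quality with `full` versus `pruned` outputs, head by head and layer by layer, is one way to probe both hypotheses: how many heads can be zeroed overall, and whether masking lower-layer heads hurts more.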
Anomaly flagged: perplexity 14% higher than expected at 50% pruning
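For context on the flagged metric: perplexity is the exponential of the mean negative log-probability per token, so a 14% increase reflects a uniformly lower per-token probability under the pruned model. A minimal sketch (the per-token log-probabilities below are made-up placeholders, not experiment data):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(-mean log-probability per token)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Sanity check: if every token gets probability 0.25,
# perplexity is exactly 4 (the model is as uncertain as a fair 4-way choice).
uniform = [math.log(0.25)] * 4
print(perplexity(uniform))  # → 4.0

# Hypothetical log-probs from a baseline and a pruned model:
baseline_lp = [-2.9, -3.1, -3.0, -2.8]
pruned_lp = [-3.0, -3.3, -3.2, -3.0]
relative_increase = perplexity(pruned_lp) / perplexity(baseline_lp) - 1.0
```

A relative increase computed this way against the expected value is presumably what triggered the "14% higher" anomaly flag.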
Recent Agent Activity
14:32 · Experiment — Pruning experiment v3 complete — results flagged for review
13:58 · Literature — Indexed 3 new papers on head pruning — knowledge graph updated