Science for Everyone
1 article available matching "Interpretability"
Researchers at Anthropic and Oxford identified a linear 'Assistant Axis' in LLM activation space that governs persona stability. Activation capping al...