Anthropic researchers found a hidden 'Assistant Axis' in AI that, when tampered with, cuts harmful responses by nearly 60%. Oops.

Researchers at Anthropic and Oxford identified a linear 'Assistant Axis' in LLM activation space that governs persona stability. Activation capping al...