Anthropic reverses Claude Fable's invisible anti-distillation guardrail
After researcher backlash, Anthropic apologized for silently degrading suspected distillation queries and will visibly route them to Claude Opus 4.8 instead.
Anthropic has apologized for using an invisible safeguard that silently altered and degraded Claude Fable 5 responses when the system suspected a model-distillation attempt. Researchers and competitors argued that the undisclosed intervention could corrupt evaluations and legitimate development work. Anthropic now says those queries will visibly fall back to Claude Opus 4.8, similar to how Fable handles some high-risk biology, chemistry, and cybersecurity requests. The company said invisible safeguards were easier to target narrowly and ship quickly but acknowledged that the tradeoff was wrong. The reversal is important because model providers increasingly intervene at inference time, and users need to know when the output comes from a different model or has been modified by a safety system.
Key details: June 11, 2026; invisible anti-distillation safeguard reversed; queries will visibly fall back to Claude Opus 4.8; Anthropic apologized after researcher backlash.