AI pushbacks | The Verge + Anthropic | Jun 11, 2026

Anthropic reverses Claude Fable's invisible anti-distillation guardrail

After researcher backlash, Anthropic apologized for silently degrading suspected distillation queries and will visibly route them to Claude Opus 4.8 instead.

Anthropic has apologized for using an invisible safeguard that silently altered and degraded Claude Fable 5 responses when the system suspected a model-distillation attempt. Researchers and competitors argued that the undisclosed intervention could corrupt evaluations and legitimate development work. Anthropic now says those queries will visibly fall back to Claude Opus 4.8, similar to how Fable handles some high-risk biology, chemistry, and cybersecurity requests. The company said invisible safeguards were easier to target narrowly and ship quickly but acknowledged that the tradeoff was wrong. The reversal is important because model providers increasingly intervene at inference time, and users need to know when the output comes from a different model or has been modified by a safety system.

Key details: June 11, 2026; invisible anti-distillation safeguard reversed; queries will visibly fall back to Claude Opus 4.8; Anthropic apologized after researcher backlash.

