Researchers show coding-agent attacks can be spread across pull requests
An arXiv paper studies distributed attacks against persistent-state AI control systems, showing how small malicious changes can evade monitors when split across pull requests.
Read more
An arXiv paper titled Distributed Attacks in Persistent-State AI Control studies how attackers can target coding agents that retain state across tasks and repositories. The authors show that malicious behavior can be spread across multiple pull requests and appear benign in isolation, creating high evasion rates against single monitors. They report that a four-monitor ensemble reduced gradual-attack evasion from 93% to 47%, but did not eliminate the risk.
Key details: The paper studies persistent-state coding-agent control systems, Distributed attacks can split malicious behavior across pull requests, A four-monitor ensemble cut gradual-attack evasion from 93% to 47%.
Why it matters: Agent security has to account for long-lived state and multi-step attack paths, not just single suspicious prompts or commits.