Research & papers · arXiv · Jun 6, 2026

PACE tries to stop self-improving agents from p-hacking themselves

PACE replaces greedy self-modification with sequential statistical tests, which the preprint reports sharply reduces false and harmful agent updates while cutting evaluation cost by about 18%.

PACE addresses a quiet failure mode in self-evolving agents: repeatedly accepting a prompt, skill, or workflow change whenever a small test score rises can amount to adaptive p-hacking. The proposed Paired Anytime-valid Commit Evaluation (PACE) gate compares each candidate change with the incumbent on identical examples and commits only when the sequentially accumulated evidence becomes strong enough. In experiments with Qwen2.5 agents on GSM8K, SVAMP, and ARC-Challenge, where a real improvement was mixed with noisy proposals, greedy acceptance produced 30% to 42% false commits and 10% to 33% harmful edits. PACE reportedly accepted the genuine improvement and almost nothing else while reducing evaluation cost by about 18%. This is a preprint covering a limited set of models and tasks, but it identifies statistical acceptance rules as a core safety and reliability component for agents allowed to modify themselves.
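To make the idea of a paired, anytime-valid commit gate concrete, here is a minimal Python sketch. It is not the procedure from the preprint, whose exact test is not described in this brief. It swaps in a simple fixed-fraction betting test on paired outcomes, and the function name `paired_sequential_gate` and the parameters `alpha` and `bet` are hypothetical. Each example is scored for both incumbent and candidate, the difference d is in {-1, 0, +1}, and a wealth process is multiplied by 1 + bet * d. If the candidate is no better than the incumbent on average, this wealth is a nonnegative supermartingale, so by Ville's inequality the chance it ever reaches 1/alpha is at most alpha, no matter when evaluation stops.

```python
from typing import Iterable, Tuple


def paired_sequential_gate(
    pairs: Iterable[Tuple[bool, bool]],
    alpha: float = 0.05,
    bet: float = 0.5,
) -> Tuple[bool, int, float]:
    """Illustrative paired sequential commit gate (assumed design, not the paper's).

    pairs yields (incumbent_correct, candidate_correct) for the SAME example.
    Returns (commit, examples_used, final_wealth).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    if not 0.0 < bet < 1.0:
        raise ValueError("bet must be in (0, 1) so wealth stays positive")

    threshold = 1.0 / alpha  # Ville's inequality: P(wealth ever >= 1/alpha) <= alpha
    wealth = 1.0
    used = 0
    for incumbent_ok, candidate_ok in pairs:
        used += 1
        # Paired difference on one shared example: -1, 0, or +1.
        d = int(candidate_ok) - int(incumbent_ok)
        # Bet a fixed fraction of wealth that the candidate is better.
        wealth *= 1.0 + bet * d
        if wealth >= threshold:
            # Enough evidence: commit and stop spending evaluation budget.
            return True, used, wealth
    # Evidence never crossed the threshold: keep the incumbent.
    return False, used, wealth


if __name__ == "__main__":
    # Candidate fixes every example the incumbent gets wrong.
    clear_win = [(False, True)] * 50
    print(paired_sequential_gate(clear_win))

    # Candidate and incumbent trade wins evenly: a noisy, non-improving proposal.
    noise = [(False, True), (True, False)] * 25
    print(paired_sequential_gate(noise))
```

In this sketch a greedy rule would accept the noisy proposal whenever its running score happened to be ahead, while the gate only commits once the accumulated wealth crosses the threshold. Stopping at that point is also one plausible way a sequential gate can spend fewer evaluations than a fixed-size test.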

Key details: arXiv:2606.08106; June 6, 2026; Paired Anytime-valid Commit Evaluation; greedy acceptance produced 30% to 42% false commits; greedy acceptance produced 10% to 33% harmful edits; about 18% lower evaluation cost.
