Research & papers | arXiv | Jun 30, 2026

AxDafny paper tests agentic code generation against formal verification

A new arXiv paper introduces AxDafny, a verifier-guided repair framework for generating Dafny code and proof artifacts, and reports strong gains over a GPT-5.5 baseline.

An arXiv paper introduces AxDafny, a framework for agentic code generation in Dafny where the system must produce both executable code and proof artifacts that pass verification. The authors also introduce LiveCodeBench-Pro-Dafny, a 250-problem benchmark translated into Dafny with formal specifications and verifier-based evaluation. They report that AxDafny substantially improves verification success over baseline GPT-5.5 performance and reaches 92.7% verification success on DafnyBench.
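To make "verifier-guided repair" concrete, here is a minimal Python sketch of the general pattern: propose a candidate, run a verifier, and feed any errors back into the next attempt. It is an illustration under stated assumptions, not the paper's implementation. The names `repair_loop`, `toy_propose`, and `toy_verify` are hypothetical, the proposer stands in for a model call, and the verifier is a toy string check standing in for a real Dafny verification run.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class VerifyResult:
    ok: bool
    errors: List[str] = field(default_factory=list)


def repair_loop(
    propose: Callable[[List[str]], str],
    verify: Callable[[str], VerifyResult],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Generic propose/verify/repair loop (hypothetical, not AxDafny's code).

    Returns (verified_program, rounds_used), or (None, max_rounds) on failure.
    """
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = propose(feedback)   # a model call in a real system
        result = verify(candidate)      # a real system would run the Dafny verifier
        if result.ok:
            return candidate, round_no
        feedback = result.errors        # verifier errors guide the next attempt
    return None, max_rounds


def toy_verify(program: str) -> VerifyResult:
    # Toy stand-in: "verification" passes only if an invariant is mentioned.
    if "invariant" in program:
        return VerifyResult(ok=True)
    return VerifyResult(ok=False, errors=["loop proof obligation not discharged"])


def toy_propose(feedback: List[str]) -> str:
    # Toy stand-in: the first draft omits the proof hint; the repair adds it.
    if feedback:
        return "DRAFT + invariant"
    return "DRAFT"


program, rounds = repair_loop(toy_propose, toy_verify)
print(program, rounds)
```

The point of the pattern is that a candidate only counts as a success when the verifier accepts it, so a plausible-looking but unproven program is never reported as solved.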

Key details:
- Submitted to arXiv on June 30, 2026
- Studies agentic verified code generation in Dafny
- Introduces AxDafny and LiveCodeBench-Pro-Dafny
- LiveCodeBench-Pro-Dafny includes 250 programming problems with formal specifications
- AxDafny reports 92.7% verification success on DafnyBench

Why it matters: Formal verification is one of the cleanest ways to measure whether coding agents actually produce correct programs, not just plausible patches.

