OpenAI introduces LifeSciBench for applied life-science work
OpenAI introduced LifeSciBench, an expert-written and expert-reviewed benchmark for realistic life-science research tasks across workflows such as evidence handling, experimental design, validation, translation, and scientific communication.
OpenAI introduced LifeSciBench, a benchmark meant to evaluate whether AI systems can help with the messy, multi-step work of life-science research rather than only answering clean biology questions. The benchmark comprises 750 expert-authored tasks, 1,062 task artifacts, and 19,020 rubric criteria, and it involved 173 scientist contributors and 453 expert reviewers. Tasks span seven workflows and seven biological domains, and many require models to interpret figures, PDFs, tables, sequence files, molecular structures, or conflicting evidence.
Key details: Published June 17, 2026, LifeSciBench includes 750 expert-authored tasks; the benchmark includes 1,062 task artifacts and 19,020 rubric criteria; tasks span seven workflows and seven biological domains.
Why it matters: LifeSciBench pushes AI evaluation toward the judgment-heavy tasks scientists actually do, rather than leaderboard-style biology recall.