AI Brief

Research & papers

43 curated stories.

Nvidia · Nvidia publishes the first AgentPerf infrastructure results
Anthropic · Anthropic survey finds AI adoption outpacing public trust
Google DeepMind · DeepMind and partners commit up to $10M to multi-agent safety
The Register · AI memory and personalization can increase sycophancy
arXiv / SciAgentArena researchers · SciAgentArena tests AI agents on real scientific research workflows
TechCrunch · AI memory tools can degrade model performance
Associated Press + MIT · MIT turns hidden hand motion into robot-training data
The Register + Checkmarx · AI-heavy teams ship vulnerable code at 3.4 times the rate
TechRadar + Linux Foundation · European employers expect AI to increase tech hiring
arXiv · ActProbe spots robot-policy failures before they become visible
arXiv · A self-evolving scientific agent discovers an interpretable fluid controller
Unite.AI + SCAMMER4U researchers · Web agents leak sensitive data even after recognizing scams
arXiv · PACE tries to stop self-improving agents from p-hacking themselves
The Register + Forrester · Forrester finds enterprise agents still trapped in pilot mode
ServiceNow AI / Hugging Face · ServiceNow expands EVA-Bench for real enterprise voice-agent workflows
AgentScout / arXiv tracker · Agent research week centers on self-evolving agents and governance
arXiv · MIRAI tries to predict which research will matter years later
Anthropic · Anthropic maps AI-enabled cyber abuse across 832 banned accounts
Nature Communications · SuperARC benchmark says frontier models remain far from its AGI target
Nature / Pediatric Research · Nature paper tests multimodal AI for a dangerous neonatal disease
University of Toronto · Researchers demonstrate an AI worm that adapts as it spreads
ChatPaper / arXiv · ClinEnv benchmarks LLM agents as attending physicians over full inpatient stays
ChatPaper / arXiv · MOC paper targets message quality inside multi-agent AI systems
EurekAlert / IRB Barcelona · IRB Barcelona uses generative AI to design cell-selective molecules
OpenClaw / Hugging Face · OpenClaw releases a security dataset for agent skills
Nature Machine Intelligence · Nature Machine Intelligence paper links climate modes with machine learning
VentureBeat / arXiv · MeMo proposes memory models as an alternative to noisy enterprise RAG
arXiv · MAVEN shows tool-calling agents still struggle to generalize
arXiv · EHRBench scales clinical-decision testing for medical LLMs
VentureBeat / arXiv · AutoTTS uses an AI-designed controller to cut reasoning-token use 69.5%
EurekAlert / Annals of Family Medicine · AI-assisted ultrasound helps Shanghai GPs spot carotid plaque
VentureBeat + Datacurve · DeepSWE challenges coding-agent leaderboards and benchmark trust
arXiv · AgentHijack tests computer-use agents against ordinary UI disruption
Howard University + AWS coverage · Howard University launches an AWS-powered AI network
OpenAI · OpenAI model disproves a decades-old discrete geometry conjecture
Google DeepMind · Google DeepMind highlights Antigravity 2.0 in its May research slate
Google DeepMind · Google DeepMind publishes Co-Scientist and opens Hypothesis Generation
AgentScout / arXiv tracker · Recent arXiv AI-agent papers focus on GUI agents, memory, and multi-agent systems
arXiv · Collider-Bench measures whether agents can reproduce LHC analyses
arXiv · Qwen-Scope gives developers sparse-feature tools for Qwen models
Hugging Face Papers / arXiv · SkillRet benchmarks skill retrieval for LLM agents
arXiv · Foresight Arena proposes an on-chain benchmark for forecasting agents
arXiv · MLR-Bench tests whether AI agents can do open-ended ML research