Research & papers

94 curated stories.

arXiv · Paper argues deployment rules shape multi-agent AI safety · Jul 9, 2026
arXiv · SkillCenter paper proposes a source-grounded skill library for AI agents · Jul 9, 2026
arXiv · Paper maps recursive self-improvement into autonomous research loops · Jul 9, 2026
arXiv · Paper finds deterministic gates can catch tool-agent policy failures · Jul 9, 2026
arXiv · Paper finds early signals for doomed LLM agent runs · Jul 7, 2026
arXiv · DataGovBench tests LLM data analysis on government open data · Jul 7, 2026
arXiv · Paper tests evidence-linked multi-agent AI on biopsy reports · Jul 7, 2026
arXiv · X-FEMR paper proposes token-level explanations for EHR foundation models · Jul 7, 2026
MarkTechPost · NVIDIA HORIZON uses git worktrees to iterate hardware-design agents · Jul 4, 2026
MarkTechPost · NVIDIA ASPIRE turns robot debugging into a reusable skill library · Jul 3, 2026
arXiv · Researchers show coding-agent attacks can be spread across pull requests · Jul 2, 2026
arXiv · Researchers propose real-time online safety monitoring for LLM outputs · Jul 2, 2026
arXiv · ReContext improves long-context reasoning by replaying evidence · Jul 2, 2026
arXiv · Study finds AI agents say different things off the record · Jul 2, 2026
arXiv · Paper says constraints can make coding-agent review scale better · Jul 2, 2026
arXiv · Researchers propose hardware-enforced coordination for autonomous AI · Jul 2, 2026
arXiv · PACE proxy benchmark predicts agentic evaluation scores at under 1% of full cost · Jul 2, 2026
MarkTechPost · Alibaba Page Agent controls web interfaces through the DOM · Jul 2, 2026
arXiv · Vera benchmark finds high attack success against production agent frameworks · Jul 2, 2026
arXiv · Microsoft study tracks command-line AI coding-agent adoption · Jul 2, 2026
Google Research · Google AMIE research moves from diagnosis to disease management · Jul 1, 2026
arXiv · AxDafny paper tests agentic code generation against formal verification · Jun 30, 2026
arXiv · MARS paper uses text refusal directions to improve multimodal model safety · Jun 30, 2026
arXiv · ProtoPilot paper shows self-evolving agents for wet-lab protocols · Jun 30, 2026
arXiv · WorldEvolver paper targets self-evolving world models for agents · Jun 30, 2026
arXiv · Researchers identify entity-binding failures in tool-using agents · Jun 30, 2026
arXiv · New paper asks whose side an AI agent is on · Jun 30, 2026
arXiv · Data-centre paper frames AI infrastructure as a many-body problem · Jun 30, 2026
arXiv · Researchers propose an immune-system model for agent safety · Jun 28, 2026
arXiv · Tandem reinforcement learning links language and action for agents · Jun 28, 2026
arXiv · Researchers show prompt injection can manipulate AI resume screening · Jun 26, 2026
arXiv · New benchmark asks when combining frontier models actually helps · Jun 26, 2026
arXiv · Researchers argue benchmarks miss much of collective model capability · Jun 25, 2026
The Register · Study warns medical diagnosis AIs can reveal training-data membership · Jun 24, 2026
arXiv · New paper maps how AI changes enterprise software user roles · Jun 24, 2026
The Verge · The Atlantic makes AI music-training datasets searchable · Jun 21, 2026
arXiv · LedgerAgent proposes structured state for policy-adherent agents · Jun 19, 2026
arXiv · Multi-LCB extends LiveCodeBench across programming languages · Jun 19, 2026
arXiv · Paper probes what safety-aligned LLMs learn from mixed compliance · Jun 19, 2026
arXiv · TxBench-PP tests AI agents on preclinical pharmacology · Jun 18, 2026
arXiv · SciRisk-Bench targets AI-for-science safety evaluation · Jun 18, 2026
Financial Times · FT says AI medical tools matched or surpassed doctors · Jun 17, 2026
arXiv · Red-team paper finds sustained attacks still break frontier Anthropic models · Jun 16, 2026
OpenAI · OpenAI details deployment simulation for model-risk forecasting · Jun 16, 2026
Financial Times · Study finds Mistral more vulnerable to Russian disinformation · Jun 16, 2026
arXiv · Paper argues U.S. controls helped accelerate China open AI ecosystems · Jun 15, 2026
The Guardian · UK Nerve Lab uses AI to map how screen time affects children · Jun 13, 2026
Nvidia · Nvidia publishes the first AgentPerf infrastructure results · Jun 13, 2026
Anthropic · Anthropic survey finds AI adoption outpacing public trust · Jun 12, 2026
arXiv · ClinHallu diagnoses stage-by-stage hallucinations in medical AI · Jun 12, 2026
arXiv · AgentCyberRange tests frontier AI systems in realistic cyber ranges · Jun 12, 2026
arXiv · AFFORDANCE20Q probes AI reasoning about physical properties · Jun 12, 2026
arXiv · Universal Manipulation Exoskeleton releases physical-AI data stack · Jun 12, 2026
Google DeepMind · DeepMind and partners commit up to $10M to multi-agent safety · Jun 11, 2026
The Register · AI memory and personalization can increase sycophancy · Jun 11, 2026
arXiv / SciAgentArena researchers · SciAgentArena tests AI agents on real scientific research workflows · Jun 11, 2026
TechCrunch · AI memory tools can degrade model performance · Jun 10, 2026
Associated Press + MIT · MIT turns hidden hand motion into robot-training data · Jun 10, 2026
The Register + Checkmarx · AI-heavy teams ship vulnerable code at 3.4 times the rate · Jun 9, 2026
TechRadar + Linux Foundation · European employers expect AI to increase tech hiring · Jun 8, 2026
arXiv · ActProbe spots robot-policy failures before they become visible · Jun 7, 2026
arXiv · A self-evolving scientific agent discovers an interpretable fluid controller · Jun 7, 2026
Unite.AI + SCAMMER4U researchers · Web agents leak sensitive data even after recognizing scams · Jun 6, 2026
arXiv · PACE tries to stop self-improving agents from p-hacking themselves · Jun 6, 2026
The Register + Forrester · Forrester finds enterprise agents still trapped in pilot mode · Jun 5, 2026
ServiceNow AI / Hugging Face · ServiceNow expands EVA-Bench for real enterprise voice-agent workflows · Jun 4, 2026
AgentScout / arXiv tracker · Agent research week centers on self-evolving agents and governance · Jun 4, 2026
arXiv · MIRAI tries to predict which research will matter years later · Jun 4, 2026
Anthropic · Anthropic maps AI-enabled cyber abuse across 832 banned accounts · Jun 3, 2026
Nature Communications · SuperARC benchmark says frontier models remain far from its AGI target · Jun 3, 2026
Nature / Pediatric Research · Nature paper tests multimodal AI for a dangerous neonatal disease · Jun 3, 2026
University of Toronto · Researchers demonstrate an AI worm that adapts as it spreads · Jun 2, 2026
ChatPaper / arXiv · ClinEnv benchmarks LLM agents as attending physicians over full inpatient stays · Jun 2, 2026
ChatPaper / arXiv · MOC paper targets message quality inside multi-agent AI systems · Jun 2, 2026
EurekAlert / IRB Barcelona · IRB Barcelona uses generative AI to design cell-selective molecules · Jun 2, 2026
OpenClaw / Hugging Face · OpenClaw releases a security dataset for agent skills · Jun 1, 2026
Nature Machine Intelligence · Nature Machine Intelligence paper links climate modes with machine learning · Jun 1, 2026
VentureBeat / arXiv · MeMo proposes memory models as an alternative to noisy enterprise RAG · May 30, 2026
arXiv · MAVEN shows tool-calling agents still struggle to generalize · May 29, 2026
arXiv · EHRBench scales clinical-decision testing for medical LLMs · May 29, 2026
VentureBeat / arXiv · AutoTTS uses an AI-designed controller to cut reasoning-token use 69.5% · May 29, 2026
EurekAlert / Annals of Family Medicine · AI-assisted ultrasound helps Shanghai GPs spot carotid plaque · May 27, 2026
VentureBeat + Datacurve · DeepSWE challenges coding-agent leaderboards and benchmark trust · May 27, 2026
arXiv · AgentHijack tests computer-use agents against ordinary UI disruption · May 25, 2026
Howard University + AWS coverage · Howard University launches an AWS-powered AI network · May 21, 2026
OpenAI · OpenAI model disproves a decades-old discrete geometry conjecture · May 20, 2026
Google DeepMind · Google DeepMind highlights Antigravity 2.0 in its May research slate · May 20, 2026
Google DeepMind · Google DeepMind publishes Co-Scientist and opens Hypothesis Generation · May 19, 2026
AgentScout / arXiv tracker · Recent arXiv AI-agent papers focus on GUI agents, memory, and multi-agent systems · May 14, 2026
arXiv · Collider-Bench measures whether agents can reproduce LHC analyses · May 13, 2026
arXiv · Qwen-Scope gives developers sparse-feature tools for Qwen models · May 12, 2026
Hugging Face Papers / arXiv · SkillRet benchmarks skill retrieval for LLM agents · May 7, 2026
arXiv · Foresight Arena proposes an on-chain benchmark for forecasting agents · May 1, 2026
arXiv · MLR-Bench tests whether AI agents can do open-ended ML research · May 26, 2025