Kosmos: An AI Scientist for Autonomous Discovery
Ludovico Mitchener∗,1,†, Angela Yiu∗,1, Benjamin Chang∗,1,2, Mathieu Bourdenx3,4,5, Tyler Nadolski1, Arvis
Sulovari1, Eric C. Landsness5,6, Dániel L. Barabási7,8, Siddharth Narayanan1, Nicky Evans9, Shriya Reddy10,
Martha Foiani3,4, Aizad Kamal6, Leah P. Shriver11,12,13, Fang Cao10, Asmamaw T. Wassie1, Jon M. Laurent1,
Edwin Melville-Green1, Mayk Caldas1, Albert Bou1, Kaleigh F. Roberts14, Sladjana Zagorac15, Timothy C.
Orr6, Miranda E. Orr6,16, Kevin J. Zwezdaryk17,18,19, Ali E. Ghareeb1, Laurie McCoy1, Bruna Gomes10,
Euan A. Ashley10, Karen E. Duff3,4,5, Tonio Buonassisi9,20, Tom Rainforth2, Randall J. Bateman5,6, Michael
Skarlinski1, Samuel G. Rodriques1,7,, Michaela M. Hinks1,, Andrew D. White1,7,
Abstract
Data-driven scientific discovery requires iterative cycles of literature search, hypothesis generation, and data
analysis. Substantial progress has been made towards AI agents that can automate scientific research, but all
such agents remain limited in the number of actions they can take before losing coherence, thus limiting the
depth of their findings. Here we present Kosmos, an AI scientist that automates data-driven discovery. Given an
open-ended objective and a dataset, Kosmos runs for up to 12 hours performing cycles of parallel data analysis,
literature search, and hypothesis generation before synthesizing discoveries into scientific reports. Unlike prior
systems, Kosmos uses a structured world model to share information between a data analysis agent and a
literature search agent. The world model enables Kosmos to coherently pursue the specified objective over
200 agent rollouts, collectively executing an average of 42,000 lines of code and reading 1,500 papers per run.
Kosmos cites all statements in its reports with code or primary literature, ensuring its reasoning is traceable.
Independent scientists found 79.4% of statements in Kosmos reports to be accurate, and collaborators reported
that a single 20-cycle Kosmos run performed the equivalent of 6 months of their own research time on
average. Furthermore, collaborators reported that the number of valuable scientific findings generated scales
linearly with Kosmos cycles (tested up to 20 cycles). We highlight seven discoveries made by Kosmos that
span metabolomics, materials science, neuroscience, and statistical genetics. Three discoveries independently
reproduce findings from preprinted or unpublished manuscripts that were not accessed by Kosmos at runtime,
while four make novel contributions to the scientific literature.
1Edison Scientific Inc., San Francisco, CA, USA
2University of Oxford, Oxford, UK
3UK Dementia Research Institute at University College London, London, UK
4Queen Square Institute of Neurology, University College London, London, UK
5Consortium for Biomedical Research and Artificial Intelligence in Neurodegeneration (C-BRAIN)
6Department of Neurology, Washington University School of Medicine, St. Louis, MO, USA
7FutureHouse Inc., San Francisco, CA, USA
8Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
9Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, USA
10Division of Cardiovascular Medicine, Stanford University, CA, USA
11Department of Chemistry, Washington University, St. Louis, MO, USA
12Center for Mass Spectrometry and Metabolic Tracing, Washington University, St. Louis, MO, USA
13Division of Nutritional Science and Obesity Medicine, Washington University, St. Louis, MO, USA
14Department of Pathology and Immunology, Washington University, St. Louis, MO, USA
15Growth Factors, Nutrients and Cancer Group, Molecular Oncology Programme, Spanish National Cancer Research Center,
Madrid, Spain
16St. Louis VA Medical Center, St Louis, MO, USA
17Department of Microbiology and Immunology, Tulane University School of Medicine, New Orleans, LA, USA
18Tulane Center for Aging, Tulane University School of Medicine, New Orleans, LA, USA
19Tulane Brain Institute, Tulane University School of Medicine, New Orleans, LA, USA
20Centre d’Electronique et de Microtechnique (CSEM), Neuchâtel, Switzerland
These authors contributed equally.
These authors jointly supervise work at Edison.
These authors jointly supervised this work.
Correspondence to {andrew,sam}@edisonscientific.com
arXiv:2511.02824v2 [cs.AI] 5 Nov 2025
1 Introduction
Data-driven discovery consists of iterative steps of literature search, hypothesis generation, and data analysis to draw
new scientific conclusions from datasets. Due to their proficiency in programming and interdisciplinary reasoning,
large language model (LLM) agents have the potential to automate data-driven discovery across domains.
Robin [1], a system we previously reported, performs automated cycles of literature search and data analysis to
propose evidence-based hypotheses, but has limited context sharing between its agents and is primarily tailored for
therapeutics development. Sakana’s AI Scientist [2] autonomously forms hypotheses, iteratively conducts computational
experiments, and writes and reviews manuscripts about its results, but remains limited to machine learning
research. Google’s AI co-scientist [3] conducts iterative cycles of reasoning to generate scientific hypotheses, but
does not perform or analyze experiments. The Virtual Lab [4] successfully designed novel nanobodies that neutralize
SARS-CoV-2, and may be extensible to other domains, but lacks exploratory data analysis capabilities.
Here, we present Kosmos, an AI scientist that automates data-driven discovery across a wide range of scientific
disciplines. Given an open-ended objective and a dataset, Kosmos performs iterative cycles of parallel data analysis,
literature search, and hypothesis generation, and summarizes its discoveries in scientific reports (Figure 1a). At each
cycle, Kosmos launches several parallel instances of two general-purpose Edison Scientific agents, a data analysis
agent [5] and a literature search agent [6], with each instance assigned to a specific task that is aligned with the
end objective. Kosmos shares and synthesizes information among these agents by continuously updating a structured
world model, which enables Kosmos to execute an average of 42,000 lines of code across 166 data analysis agent
rollouts and read 1,500 full-length scientific papers across 36 literature review agent rollouts per run (Figure 1b).
This is a 9.8x increase in code generation compared to Robin. Consolidating information in the world model further
allows every claim in a Kosmos scientific report to be directly linked to the data analysis or source from which it
originated, ensuring that Kosmos’ reasoning is traceable.
We report seven discoveries made by Kosmos: three reproduce findings from preprinted or unpublished manuscripts,
while the remaining four make novel contributions to the scientific literature. Each
discovery is derived from a unique data type and field and is corroborated by independent analysis from a domain
expert. Among the examples below, Kosmos identified a novel, clinically relevant mechanism of neuronal aging, and
generated novel statistical evidence that high circulating levels of superoxide dismutase 2 (SOD2) may causally reduce
myocardial fibrosis in humans. Together, these discoveries illustrate a system that can autonomously reproduce, refine,
and generate data-driven discoveries with the rigor and transparency essential for advancing scientific understanding.
2 Results
2.1 Kosmos system and architecture
The core advancement in Kosmos is the use of a structured world model to manage the output of a large number
of agents running in parallel. Kosmos is initiated with a research objective and a dataset, which are specified by a
scientist. Kosmos attempts to complete the research objective by using LLMs, data analysis agents, literature search
agents, and the world model to perform iterative discovery cycles. In each cycle, Kosmos executes up to ten literature
search and analysis tasks, and subsequently updates the world model with summaries of the task outputs. Kosmos
then queries the world model to propose literature search and data analysis tasks to be completed in the next cycle.
This context management strategy allows Kosmos to explore many different research avenues simultaneously, and
run for eight times as many iterations as existing systems [1, 2, 7]. Once Kosmos believes it has completed the
research objective, it synthesizes key discoveries into three or four scientific reports. Each statement and figure in
the report cites either a publication found by the literature search agent or a Jupyter notebook created by the data
analysis agent.
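The cycle described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the Kosmos implementation: the `WorldModel` class, its `propose_tasks`, `update`, and `is_complete` methods, and the `run_task` stub are all hypothetical names, and the agents are replaced by placeholder functions. Only the cap of ten tasks per cycle and the pattern of updating the world model with task summaries are taken from the text.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

MAX_TASKS_PER_CYCLE = 10  # "up to ten literature search and analysis tasks"


@dataclass
class Task:
    kind: str    # "data_analysis" or "literature_search"
    prompt: str


@dataclass
class WorldModel:
    """Hypothetical stand-in for the structured world model."""
    objective: str
    summaries: list = field(default_factory=list)

    def update(self, task, summary, source):
        # Store the source (notebook or paper) with each summary so that
        # later report statements can cite where they came from.
        self.summaries.append({"task": task, "summary": summary, "source": source})

    def propose_tasks(self, cycle):
        # Placeholder policy (assumption): alternate the two task kinds.
        # A real system would query an LLM with the accumulated summaries.
        kinds = ["data_analysis", "literature_search"]
        return [
            Task(kinds[i % 2], f"cycle {cycle}, task {i}: {self.objective}")
            for i in range(MAX_TASKS_PER_CYCLE)
        ]

    def is_complete(self, cycle, max_cycles):
        # Placeholder stopping rule (assumption): stop after a fixed number of cycles.
        return cycle >= max_cycles


def run_task(task):
    """Stub agent returning (summary, source); real agents run code or read papers."""
    source = "notebook-placeholder" if task.kind == "data_analysis" else "paper-placeholder"
    return f"result of {task.prompt}", source


def run_discovery(objective, max_cycles=20):
    world = WorldModel(objective)
    cycle = 0
    while not world.is_complete(cycle, max_cycles):
        tasks = world.propose_tasks(cycle)[:MAX_TASKS_PER_CYCLE]
        with ThreadPoolExecutor() as pool:  # tasks within a cycle run in parallel
            results = list(pool.map(run_task, tasks))
        for task, (summary, source) in zip(tasks, results):
            world.update(task, summary, source)
        cycle += 1
    return world
```

Calling `run_discovery("some objective", max_cycles=3)` returns a world model holding 30 summaries, each paired with a source label. The design point the sketch tries to capture is that agents never talk to each other directly: all shared context passes through the world model between cycles.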
To assess the overall accuracy of Kosmos reports, we first extracted 102 statements from three representative reports
and determined whether the statements originated from the scientific literature, a data analysis, or an interpretation
between the two. We then asked expert scientist evaluators to classify the accuracy of each statement as “Supported”
or “Refuted”. The scientist evaluators were instructed to classify the statement as “Supported” if they could reproduce
the statement with their own analysis or find support for the statement in the literature, or “Refuted” if their analyses
or literature search produced a different result (see Methods). The original code or cited paper(s) supporting the
statements were not made available to the evaluating scientist during this process. Overall, 79.4% of the statements in
these reports were accurate, with differing results by type: 85.5% of the data analysis-based statements were reproducible,
82.1% of literature review-based statements were validated with primary sources, and 57.9% of synthesis (interpretation)
statements were accurate (Figure 1c).
[Figure 1 image. Panel contents recoverable from the extracted text:
a) Input (research objective: "Identify validated Type 2 diabetes protective mechanisms"; dataset: annotated multi-omics
data), the Kosmos world model with its data analysis and literature review agents, and output (Discoveries 1 to 3 and an
established finding).
b) Lines of code / papers read: Kosmos 42,500 ± 7,280 / 1,500 ± 1,120; Robin 4,310 ± 344 / 1,530 ± 289; Finch 301 ± 36 /
0 ± 0; PaperQA2 0 ± 0 / 33 ± 29.
c) Accuracy (%): data analysis 85% (n = 55), literature review 82% (n = 28), interpretation 58% (n = 19), overall 79%
(n = 102).
d) Predicted expert time (months) for the runs behind Figs. 2 to 8 and their average.
e) Estimated expert time (months) versus Kosmos cycle number.
f) Number of valuable findings versus Kosmos cycle number.
g) Novelty: completely novel (n = 1), largely novel (n = 2), moderately novel (n = 5), not novel (n = 0). Reasoning depth:
high (n = 4), moderate (n = 4), shallow (n = 0), none (n = 0).]
Figure 1: Kosmos workflow and performance. a) Overall workflow for Kosmos. (left) Kosmos is provided with an initial
dataset and broad research objective specified by a scientist. (middle) The Kosmos world model coordinates data analysis and
literature search agents to identify key discoveries. (right) Each discovery is presented in a scientific report. b) Statistics for
lines of code written and papers read for an average run across scientific agents [5, 6]. c) Accuracy of Kosmos across data
analysis, literature review, and interpretation statement types, as evaluated by expert scientists from 102 statements across
three representative reports. d) Predicted equivalent expert time for the seven Kosmos runs described in this report, assuming
it takes an expert 15 minutes to read a full paper and 2 hours to complete a Jupyter notebook of data analysis at 174 hours
of work per month. e) Equivalent expert time, as estimated by leading academic groups, required to achieve the findings of
Kosmos at cycles 5, 10, and 20, showing scaling in expert-equivalent research time with Kosmos runtime. Shaded
region denotes ±1 SD. f) The number of valuable findings generated at different steps of a Kosmos run scales with Kosmos
runtime, as estimated by leading academic groups. Shaded region denotes ±1 SD. g) Academic groups’ evaluations of valuable
findings from Kosmos cycle 20 indicate moderate to complete novelty (top) and high to moderate reasoning depth (bottom).
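As a consistency check on the accuracy figures reported above, the per-type results can be recombined into the overall value. The sample sizes (55, 28, and 19 statements) come from Figure 1c. The numbers of supported statements (47, 23, and 11) are not reported in the text; they are inferred here as the integer counts that reproduce the stated percentages, so they should be read as an assumption.

```python
# (supported, total) per statement type.
# Totals are from Figure 1c; supported counts are inferred, not reported.
counts = {
    "data analysis": (47, 55),
    "literature review": (23, 28),
    "interpretation": (11, 19),
}


def accuracy(supported, total):
    """Percentage of statements classified as supported."""
    return 100.0 * supported / total


per_type = {name: round(accuracy(s, n), 1) for name, (s, n) in counts.items()}
total_supported = sum(s for s, _ in counts.values())
total_statements = sum(n for _, n in counts.values())
overall = round(accuracy(total_supported, total_statements), 1)

print(per_type)  # {'data analysis': 85.5, 'literature review': 82.1, 'interpretation': 57.9}
print(overall)   # 79.4
```

Under these inferred counts, 81 of 102 statements are supported, which matches the reported 79.4%. The overall figure is a count-weighted average, so it sits closer to the data analysis accuracy than to the interpretation accuracy.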
We then evaluated the time it would take for a human scientist to complete the work that Kosmos performs in
an individual run. We first calculated this value by tallying the number of data analyses and papers included in a
given Kosmos run and estimating the time it would take a human researcher to complete the same number of tasks.
We estimated each Kosmos run performs approximately 4.1 expert-months of research (n=6, σ=0.85; Figure 1d),
assuming an expert scientist takes 15 minutes to read a full paper and 2 hours to complete a data analysis task using
a Jupyter notebook [8] at 174 hours of work per month.
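The estimate can be reproduced with simple arithmetic. The sketch below plugs in the run averages quoted in the Introduction (1,500 papers read and 166 data analysis agent rollouts) and assumes one Jupyter notebook per data analysis rollout, which the text does not state explicitly. The function and constant names are illustrative.

```python
HOURS_PER_PAPER = 0.25     # 15 minutes to read a full paper
HOURS_PER_NOTEBOOK = 2.0   # 2 hours per data analysis notebook
HOURS_PER_MONTH = 174.0    # working hours per month


def expert_months(papers_read, notebooks):
    """Convert counts of papers and notebooks into expert-months of work."""
    hours = papers_read * HOURS_PER_PAPER + notebooks * HOURS_PER_NOTEBOOK
    return hours / HOURS_PER_MONTH


# Assumption: one notebook per data analysis rollout (166 on average).
estimate = expert_months(papers_read=1500, notebooks=166)
print(f"{estimate:.1f} expert-months")  # 4.1 expert-months
```

With these inputs the total is 375 + 332 = 707 hours, or about 4.1 months, in line with the average reported above. Individual runs differ because their paper and notebook counts differ.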
To obtain an orthogonal estimate of the amount of work done by Kosmos, we collaborated with leading academic
groups to evaluate Kosmos at cycles 5, 10, and 20 of its run. These groups estimated that the findings from a 20-cycle
Kosmos run would have taken them 6.14 months of research to complete (n=7, σ=2.49), considerably more than the
expert time we estimated for the same runs (Figure 1e). Furthermore, they report that expert-equivalent research time
scales with Kosmos runtime, roughly doubling from cycle 5 to cycle 20 (Figure 1e). Similarly, when asked about the
number of valuable findings that Kosmos generated across the run, experts report that the number of valuable findings
scales with the number of cycles in the Kosmos run (Figure 1f). Lastly, when asked about Kosmos’ reasoning depth
and novelty, expert scientists report that valuable findings from cycle 20 demonstrate high to moderate reasoning
depth and moderate to complete novelty (Figure 1g). Together, these results suggest that scientific output scales
with computational investment.
In the following sections, we present seven studies illustrating discoveries made by Kosmos during these collaborations.
We group these Kosmos discoveries into the following categories:
1. Two runs that reproduce existing discoveries that were either unpublished, or published after the cutoff of the
relevant language models and not accessed by Kosmos at runtime.
2. One run that reproduces a published finding not accessed by Kosmos at runtime using independent reasoning.
3. Two runs that establish additional, novel support for existing discoveries.
4. One run that independently develops a new analytical method.
5. One run that makes a novel, clinically relevant discovery not previously identified by human researchers.
The prompts and datasets given to Kosmos for each of these discoveries can be found in Supplementary Information
1. The Kosmos reports describing these discoveries can be viewed at the links in Supplementary Table 1. In the main
figures of this report, any plots or results generated by Kosmos are highlighted in blue “Kosmos” sections. Plots
generated by human scientists to validate Kosmos’ results are highlighted in orange “Human Validation” sections.
The Jupyter notebooks Kosmos wrote to generate these plots or results are linked in the figure captions. We improved
the legibility of these plots for publication, but did not change the plot content.
2.2 Kosmos replicates human findings from different fields
2.2.1 Discovery 1: Nucleotide metabolism as the dominant pathway altered under hypothermic conditions
in brain
We tested whether Kosmos could reproduce an unpublished discovery using metabolomics data (Figure 2a). The data
used in this run was originally collected to help identify the metabolic mechanisms underlying cooling-induced neuro-
protection. While the Kosmos run was performed on the unpublished dataset, this work has since been preprinted [9].
Specifically, the original experiment investigated whether activating a specific brain circuit that regulates body tem-
perature in mice could induce a controlled, torpor-like state that protects the brain from injury. Using chemogenetic
tools, Kamal et al. selectively activated kappa opioid receptor–expressing (KOR+) neurons in the medial preoptic
area (POA), a key center for thermoregulation [9].