
Kosmos: An AI Scientist for Autonomous Discovery

Ludovico Mitchener∗,1,†, Angela Yiu∗,1, Benjamin Chang∗,1,2, Mathieu Bourdenx3,4,5, Tyler Nadolski1, Arvis Sulovari1, Eric C. Landsness5,6, Dániel L. Barabási7,8, Siddharth Narayanan1, Nicky Evans9, Shriya Reddy10, Martha Foiani3,4, Aizad Kamal6, Leah P. Shriver11,12,13, Fang Cao10, Asmamaw T. Wassie1, Jon M. Laurent1, Edwin Melville-Green1, Mayk Caldas1, Albert Bou1, Kaleigh F. Roberts14, Sladjana Zagorac15, Timothy C. Orr6, Miranda E. Orr6,16, Kevin J. Zwezdaryk17,18,19, Ali E. Ghareeb1, Laurie McCoy1, Bruna Gomes10, Euan A. Ashley10, Karen E. Duff3,4,5, Tonio Buonassisi9,20, Tom Rainforth2, Randall J. Bateman5,6, Michael Skarlinski1, Samuel G. Rodriques1,7,‡, Michaela M. Hinks1,†, Andrew D. White1,7,‡

Abstract

Data-driven scientific discovery requires iterative cycles of literature search, hypothesis generation, and data analysis. Substantial progress has been made towards AI agents that can automate scientific research, but all such agents remain limited in the number of actions they can take before losing coherence, thus limiting the depth of their findings. Here we present Kosmos, an AI scientist that automates data-driven discovery. Given an open-ended objective and a dataset, Kosmos runs for up to 12 hours performing cycles of parallel data analysis, literature search, and hypothesis generation before synthesizing discoveries into scientific reports. Unlike prior systems, Kosmos uses a structured world model to share information between a data analysis agent and a literature search agent. The world model enables Kosmos to coherently pursue the specified objective over 200 agent rollouts, collectively executing an average of 42,000 lines of code and reading 1,500 papers per run. Kosmos cites all statements in its reports with code or primary literature, ensuring its reasoning is traceable. Independent scientists found 79.4% of statements in Kosmos reports to be accurate, and collaborators reported that a single 20-cycle Kosmos run performed the equivalent of 6 months of their own research time on average. Furthermore, collaborators reported that the number of valuable scientific findings generated scales linearly with Kosmos cycles (tested up to 20 cycles). We highlight seven discoveries made by Kosmos that span metabolomics, materials science, neuroscience, and statistical genetics. Three discoveries independently reproduce findings from preprinted or unpublished manuscripts that were not accessed by Kosmos at runtime, while four make novel contributions to the scientific literature.

1 Edison Scientific Inc., San Francisco, CA, USA
2 University of Oxford, Oxford, UK
3 UK Dementia Research Institute at University College London, London, UK
4 Queen Square Institute of Neurology, University College London, London, UK
5 Consortium for Biomedical Research and Artificial Intelligence in Neurodegeneration (C-BRAIN)
6 Department of Neurology, Washington University School of Medicine, St. Louis, MO, USA
7 FutureHouse Inc., San Francisco, CA, USA
8 Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
9 Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, USA
10 Division of Cardiovascular Medicine, Stanford University, CA, USA
11 Department of Chemistry, Washington University, St. Louis, MO, USA
12 Center for Mass Spectrometry and Metabolic Tracing, Washington University, St. Louis, MO, USA
13 Division of Nutritional Science and Obesity Medicine, Washington University, St. Louis, MO, USA
14 Department of Pathology and Immunology, Washington University, St. Louis, MO, USA
15 Growth Factors, Nutrients and Cancer Group, Molecular Oncology Programme, Spanish National Cancer Research Center, Madrid, Spain
16 St. Louis VA Medical Center, St. Louis, MO, USA
17 Department of Microbiology and Immunology, Tulane University School of Medicine, New Orleans, LA, USA
18 Tulane Center for Aging, Tulane University School of Medicine, New Orleans, LA, USA
19 Tulane Brain Institute, Tulane University School of Medicine, New Orleans, LA, USA
20 Centre d’Electronique et de Microtechnique (CSEM), Neuchâtel, Switzerland

∗These authors contributed equally.

‡These authors jointly supervise work at Edison.

†These authors jointly supervised this work.

Correspondence to {andrew,sam}@edisonscientific.com

arXiv:2511.02824v2 [cs.AI] 5 Nov 2025

1 Introduction

Data-driven discovery consists of iterative steps of literature search, hypothesis generation, and data analysis to draw new scientific conclusions from datasets. Due to their proficiency in programming and interdisciplinary reasoning, large language model (LLM) agents have the potential to automate data-driven discovery across domains.

Robin [1], a system we previously reported, performs automated cycles of literature search and data analysis to propose evidence-based hypotheses, but has limited context sharing between its agents and is primarily tailored for therapeutics development. Sakana’s AI Scientist [2] autonomously forms hypotheses, iteratively conducts computational experiments, and writes and reviews manuscripts about its results, but remains limited to machine learning research. Google’s AI co-scientist [3] conducts iterative cycles of reasoning to generate scientific hypotheses, but does not perform or analyze experiments. The Virtual Lab [4] successfully designed novel nanobodies that neutralize SARS-CoV-2 and may be extensible to other domains, but lacks exploratory data analysis capabilities.

Here, we present Kosmos, an AI scientist that automates data-driven discovery across a wide range of scientific disciplines. Given an open-ended objective and a dataset, Kosmos performs iterative cycles of parallel data analysis, literature search, and hypothesis generation, and summarizes its discoveries in scientific reports (Figure 1a). In each cycle, Kosmos launches several parallel instances of two general-purpose Edison Scientific agents, a data analysis agent [5] and a literature search agent [6], and assigns each instance a specific task aligned with the end objective. Kosmos shares and synthesizes information among these agents by continuously updating a structured world model, which enables it to execute an average of 42,000 lines of code across 166 data analysis agent rollouts and to read 1,500 full-length scientific papers across 36 literature review agent rollouts per run (Figure 1b). This is a 9.8x increase in code generation compared to Robin. Consolidating information in the world model also allows every claim in a Kosmos scientific report to be linked directly to the data analysis or source from which it originated, so that Kosmos’ reasoning is traceable.
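The traceability property described above can be pictured as a world model in which no entry is stored without a pointer to its origin. The Python sketch below is only an illustration of that idea, not Kosmos' actual data structure: the classes `Source`, `Finding`, and `WorldModel`, the `cite` and `by_kind` methods, and the placeholder statements and references are all hypothetical.

```python
from dataclasses import dataclass, field


# Hypothetical provenance record: a finding comes either from a data analysis
# notebook or from a paper located by the literature search agent.
@dataclass(frozen=True)
class Source:
    kind: str  # "notebook" or "paper" (illustrative labels)
    ref: str   # a notebook path or paper identifier (placeholder values)


@dataclass(frozen=True)
class Finding:
    statement: str
    source: Source


@dataclass
class WorldModel:
    findings: list[Finding] = field(default_factory=list)

    def add(self, statement: str, kind: str, ref: str) -> Finding:
        """Store a finding together with its provenance; reject unknown source kinds."""
        if kind not in ("notebook", "paper"):
            raise ValueError(f"unknown source kind: {kind}")
        finding = Finding(statement, Source(kind, ref))
        self.findings.append(finding)
        return finding

    def by_kind(self, kind: str) -> list[Finding]:
        """Return all findings that came from one kind of source."""
        return [f for f in self.findings if f.source.kind == kind]

    @staticmethod
    def cite(finding: Finding) -> str:
        """Render a report statement with its citation attached."""
        return f"{finding.statement} [{finding.source.kind}: {finding.source.ref}]"


# Usage with placeholder content.
wm = WorldModel()
f = wm.add("Claim A", "notebook", "nb_01.ipynb")
g = wm.add("Claim B", "paper", "paper_42")
print(WorldModel.cite(f))  # Claim A [notebook: nb_01.ipynb]
```

Because every statement enters the model through `add`, any statement rendered into a report can be traced back to a notebook or paper, which is the property the text attributes to the world model.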

We report seven discoveries made by Kosmos: three reproduce findings from preprinted or unpublished manuscripts, while the remaining four make novel contributions to the scientific literature. Each discovery is derived from a different data type and field and is corroborated by independent analysis from a domain expert. Among the examples below, Kosmos identified a novel, clinically relevant mechanism of neuronal aging and generated novel statistical evidence that high circulating levels of superoxide dismutase 2 (SOD2) may causally reduce myocardial fibrosis in humans. Together, these discoveries illustrate a system that can autonomously reproduce, refine, and generate data-driven discoveries with the rigor and transparency essential for advancing scientific understanding.

2 Results

2.1 Kosmos system and architecture

The core advancement in Kosmos is the use of a structured world model to manage the outputs of a large number of agents running in parallel. Kosmos is initiated with a research objective and a dataset, both specified by a scientist. It attempts to complete the research objective by using LLMs, data analysis agents, literature search agents, and the world model to perform iterative discovery cycles. In each cycle, Kosmos executes up to ten literature search and data analysis tasks and then updates the world model with summaries of the task outputs. Kosmos then queries the world model to propose the literature search and data analysis tasks for the next cycle. This context management strategy allows Kosmos to explore many different research avenues simultaneously and to run for eight times as many iterations as existing systems [1, 2, 7]. Once Kosmos believes it has completed the research objective, it synthesizes key discoveries into three or four scientific reports. Each statement and figure in a report cites either a publication found by the literature search agent or a Jupyter notebook created by the data analysis agent.
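The discovery cycle described above (query the world model for tasks, run up to ten tasks in parallel, write summaries back) can be sketched as a simple loop. This is a minimal sketch under stated assumptions: the agents are stand-in callables that return text summaries, `propose` stands in for the LLM query against the world model, and an empty proposal is used as a hypothetical stopping signal. The names `Task`, `WorldModel`, `run_discovery`, and `MAX_TASKS_PER_CYCLE` are illustrative and do not come from the Kosmos implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable

MAX_TASKS_PER_CYCLE = 10  # "up to ten literature search and analysis tasks" per cycle


@dataclass
class Task:
    kind: str         # "analysis" or "literature"
    description: str


@dataclass
class WorldModel:
    objective: str
    summaries: list[str] = field(default_factory=list)

    def update(self, new_summaries: list[str]) -> None:
        self.summaries.extend(new_summaries)


def run_discovery(
    objective: str,
    agents: dict[str, Callable[[Task], str]],
    propose: Callable[[WorldModel], list[Task]],
    n_cycles: int,
) -> WorldModel:
    world = WorldModel(objective)
    for _ in range(n_cycles):
        # Query the world model for the next batch of tasks, capped per cycle.
        tasks = propose(world)[:MAX_TASKS_PER_CYCLE]
        if not tasks:
            break  # hypothetical stand-in for "Kosmos believes it is done"
        # Run this cycle's tasks in parallel; each agent returns a text summary.
        with ThreadPoolExecutor(max_workers=MAX_TASKS_PER_CYCLE) as pool:
            summaries = list(pool.map(lambda t: agents[t.kind](t), tasks))
        world.update(summaries)
    return world


# Usage with stub agents (placeholders for the real analysis and search agents).
def stub_agent(task: Task) -> str:
    return f"summary: {task.description}"


def stub_propose(world: WorldModel) -> list[Task]:
    step = len(world.summaries)
    return [Task("analysis", f"analysis {step}"), Task("literature", f"search {step}")]


world = run_discovery(
    "example objective",
    {"analysis": stub_agent, "literature": stub_agent},
    stub_propose,
    n_cycles=3,
)
print(len(world.summaries))  # 6
```

A thread pool suits this shape because agent calls mostly wait on model or tool responses; `pool.map` also returns results in task order, so each summary lines up with the task that produced it.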

[Figure 1 panels: (a) workflow: input (research objective, e.g. “Identify validated Type 2 diabetes protective mechanisms”; dataset, e.g. “Annotated multi-omics data”), Kosmos world model with Data Analysis Agent and Literature Review Agent, output (Discoveries 1–3 and an established finding); (b) lines of code and papers read for Kosmos, Robin, Finch, and PaperQA2; (c) accuracy (%) by statement type; (d) predicted expert time (months) for the runs in Figs. 2–8; (e) estimated expert time (months) vs. Kosmos cycle number; (f) number of valuable findings vs. Kosmos cycle number; (g) novelty and reasoning depth.]

Figure 1: Kosmos workflow and performance. a) Overall workflow for Kosmos. (left) Kosmos is provided with an initial dataset and a broad research objective specified by a scientist. (middle) The Kosmos world model coordinates data analysis and literature search agents to identify key discoveries. (right) Each discovery is presented in a scientific report. b) Lines of code written and papers read in an average run, compared across scientific agents [5, 6]. c) Accuracy of Kosmos across data analysis, literature review, and interpretation statement types, as evaluated by expert scientists on 102 statements from three representative reports. d) Predicted equivalent expert time for the seven Kosmos runs described in this report, assuming an expert takes 15 minutes to read a full paper and 2 hours to complete a Jupyter notebook of data analysis, at 174 hours of work per month. e) Equivalent expert time, estimated by leading academic groups as the time they would need to achieve the findings Kosmos produced at cycles 5, 10, and 20, scales with Kosmos runtime. Shaded region denotes ±1 SD. f) The number of valuable findings generated at different steps of a Kosmos run, as estimated by leading academic groups, scales with Kosmos runtime. Shaded region denotes ±1 SD. g) Academic groups’ evaluations of valuable findings from Kosmos cycle 20 indicate moderate to complete novelty (top) and high to moderate reasoning depth (bottom).

To assess the overall accuracy of Kosmos reports, we first extracted 102 statements from three representative reports and determined whether each statement originated from the scientific literature, from a data analysis, or from an interpretation combining the two. We then asked expert scientist evaluators to classify each statement as “Supported” or “Refuted”. Evaluators were instructed to classify a statement as “Supported” if they could reproduce it with their own analysis or find support for it in the literature, and as “Refuted” if their analyses or literature search produced a different result (see Methods). The original code or cited paper(s) supporting the statements were not made available to the evaluating scientists during this process. Overall, 79.4% of the statements in the reports were accurate, with differing results by type: 85.5% of data analysis-based statements were reproducible, 82.1% of literature review-based statements were validated with primary sources, and 57.9% of interpretation (synthesis) statements were accurate (Figure 1c).
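As an arithmetic check on these accuracy figures, the per-type percentages can be converted back into counts. The counts below (47/55, 23/28, 11/19) are back-calculated from the reported percentages and n values rather than stated in the text, and `pct` is a hypothetical helper. The sketch also shows that the overall 79.4% is the pooled rate across all 102 statements, not the unweighted mean of the three per-type rates (which would be about 75.2%).

```python
# Counts back-calculated from the reported percentages and n values
# (an assumption: the text reports percentages, not raw counts).
counts = {
    "data analysis": (47, 55),
    "literature review": (23, 28),
    "interpretation": (11, 19),
}


def pct(supported: int, total: int) -> float:
    """Percentage of statements rated Supported, rounded to one decimal."""
    return round(100 * supported / total, 1)


for kind, (k, n) in counts.items():
    print(f"{kind}: {pct(k, n)}% (n = {n})")

# Pooled (micro-averaged) accuracy over all statements.
total_supported = sum(k for k, _ in counts.values())
total_n = sum(n for _, n in counts.values())
print(f"overall: {pct(total_supported, total_n)}% (n = {total_n})")

# Unweighted mean of the per-type rates, for comparison.
macro = sum(100 * k / n for k, n in counts.values()) / len(counts)
print(f"unweighted mean of types: {macro:.1f}%")
```

Pooling weights each statement equally, so the large, accurate data analysis group pulls the overall figure above the simple average of the three categories.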

We then evaluated how long it would take a human scientist to complete the work that Kosmos performs in an individual run. We first calculated this value by tallying the number of data analyses and papers included in a given Kosmos run and estimating the time a human researcher would need to complete the same number of tasks. We estimate that each Kosmos run performs approximately 4.1 expert-months of research (n=6, σ=0.85; Figure 1d), assuming an expert scientist takes 15 minutes to read a full paper and 2 hours to complete a data analysis task in a Jupyter notebook [8], at 174 hours of work per month.
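The tally-based estimate is simple arithmetic. The sketch below applies the stated rates (15 minutes per paper, 2 hours per notebook, 174 hours per month) to the average run described in the Introduction (166 data analysis rollouts, 1,500 papers). Treating each data analysis rollout as one notebook is an assumption made here for illustration, and `expert_months` is a hypothetical helper name.

```python
PAPER_HOURS = 0.25     # 15 minutes to read one full paper
NOTEBOOK_HOURS = 2.0   # 2 hours to complete one data analysis notebook
HOURS_PER_MONTH = 174  # working hours per month


def expert_months(n_notebooks: int, n_papers: int) -> float:
    """Expert-months needed to reproduce a run's analyses and reading."""
    hours = n_notebooks * NOTEBOOK_HOURS + n_papers * PAPER_HOURS
    return hours / HOURS_PER_MONTH


# Average run: 166 data analysis rollouts, 1,500 papers.
# Assumption: one data analysis rollout corresponds to one notebook.
print(round(expert_months(166, 1500), 2))  # 4.06
```

This gives 707 hours, or roughly 4.06 expert-months, in line with the reported average of approximately 4.1.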

To obtain an independent estimate of the amount of work done by Kosmos, we collaborated with leading academic groups to evaluate Kosmos at cycles 5, 10, and 20 of its run. These groups estimated that the findings from a 20-cycle Kosmos run would have taken them 6.14 months of research to complete (n=7, σ=2.49), substantially higher than our tally-based estimate for their runs (Figure 1e). Furthermore, they report that expert-equivalent research time scales with Kosmos runtime, roughly doubling from cycle 5 to cycle 20 (Figure 1e). Similarly, when asked about the number of valuable findings Kosmos generated across the run, experts report that this number scales with the number of cycles in the run (Figure 1f). Lastly, when asked about Kosmos’ reasoning depth and novelty, expert scientists report that valuable findings from cycle 20 demonstrate high to moderate reasoning depth and moderate to complete novelty (Figure 1g). Together, these results suggest that scientific output scales with computational investment.

In the following sections, we present seven studies illustrating discoveries made by Kosmos during these collaborations. We group these discoveries into the following categories:

1. Two runs that reproduce existing discoveries that were either unpublished or published after the knowledge cutoff of the relevant language models, and that were not accessed by Kosmos at runtime.
2. One run that uses independent reasoning to reproduce a published finding not accessed by Kosmos at runtime.
3. Two runs that establish additional, novel support for existing discoveries.
4. One run that independently develops a new analytical method.
5. One run that makes a novel, clinically relevant discovery not previously identified by human researchers.

The prompts and datasets given to Kosmos for each of these discoveries can be found in Supplementary Information 1. The Kosmos reports describing these discoveries can be viewed at the links in Supplementary Table 1. In the main figures of this report, any plots or results generated by Kosmos are highlighted in blue “Kosmos” sections, and plots generated by human scientists to validate Kosmos’ results are highlighted in orange “Human Validation” sections. The Jupyter notebooks Kosmos wrote to generate these plots or results are linked in the figure captions. We improved the legibility of these plots for publication but did not change their content.

2.2 Kosmos replicates human findings from different fields

2.2.1 Discovery 1: Nucleotide metabolism as the dominant pathway altered under hypothermic conditions in brain

We tested whether Kosmos could reproduce an unpublished discovery using metabolomics data (Figure 2a). The data used in this run were originally collected to help identify the metabolic mechanisms underlying cooling-induced neuroprotection. Although the Kosmos run was performed on the unpublished dataset, this work has since been preprinted [9]. Specifically, the original experiment investigated whether activating a specific brain circuit that regulates body temperature in mice could induce a controlled, torpor-like state that protects the brain from injury. Using chemogenetic tools, Kamal et al. selectively activated kappa opioid receptor–expressing (KOR+) neurons in the medial preoptic area (POA), a key center for thermoregulation [9].
