AI Tooling / Methodology

This page describes the AI Tooling / Methodology column of the Papers.md matrix — general-purpose AI methods, foundation models, agent frameworks, benchmarks, and infrastructure that could be applied across many cellular-agriculture problems but haven’t yet been deployed against any specific one. As of late 2026 this is the highest-traffic column in the matrix, reflecting both the rapid expansion of AI-for-biology research and CAAIL’s curatorial decision to track these foundational methods even when their cell-ag application is implicit rather than demonstrated.

The boundary between this column and the applied columns is intentionally pragmatic: a paper that introduces a general method and demonstrates it on a cell-ag problem belongs in the applied column for that problem; a paper that introduces a general method without a cell-ag deployment belongs here. As the column has grown, it has been subdivided into seven method-architecture clusters, each occupying its own matrix row. The sections below mirror those clusters, followed by a short section for AI Tooling-column refs that live in other method rows and a closing list of agent software that has no companion paper and is catalogued in Software.md. Closely paired is the AI Evaluation & Benchmarking column, which catalogues papers whose primary contribution is an evaluation suite, benchmark, or eval methodology for the tools in this column.
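
The boundary rule above is simple enough to write down as code. The sketch below is illustrative only: the `Paper` fields and the `assign_column` helper are hypothetical names invented here, not part of any CAAIL tooling, and the final "unassigned" branch is an assumption for papers the rule does not cover.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paper:
    ref: int                               # matrix reference number (hypothetical field)
    introduces_general_method: bool
    cell_ag_problem: Optional[str] = None  # applied column name if a deployment is shown


def assign_column(paper: Paper) -> str:
    """Return the matrix column for a paper under the boundary rule."""
    if paper.cell_ag_problem is not None:
        # A demonstrated cell-ag deployment wins: file under that problem.
        return paper.cell_ag_problem
    if paper.introduces_general_method:
        # General method without a cell-ag deployment: this column.
        return "AI Tooling / Methodology"
    # Assumption: anything else is left for a curator to place by hand.
    return "unassigned"


method_only = Paper(ref=44, introduces_general_method=True)
applied = Paper(ref=0, introduces_general_method=True,
                cell_ag_problem="Bioprocess & Scale-Up")
print(assign_column(method_only))
print(assign_column(applied))
```

The deployment check comes first on purpose, so a general method that is also demonstrated on a cell-ag problem never lands in this column.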

Foundation Models for Biology

Pretrained transformer models — distinct from agents in that they predict or generate without invoking external tools or planning multi-step workflows. They are the substrate downstream agentic systems build on, and the cluster where the boundary between “method paper” and “applied paper” is most often pragmatic rather than principled.

#42 OmicsLM (Sypetkowski et al. 2026) — a multimodal LLM connecting quantitative transcriptomic profiles with natural-language biological tasks; a candidate substrate for any cell-ag work involving omics interpretation, particularly in cellular-engineering contexts where scRNA-seq is already the dominant readout (refs 5, 8, 9, 12, 13).
#88 SpectraLLM (Su et al. 2026) — an LLM pretrained and fine-tuned to reason over multi-spectral data (IR, Raman, UV-Vis, NMR, MS) in a shared language space for end-to-end molecular structure prediction; the spectral analogue of OmicsLM for sensomics workflows (see also SensoryPrediction.md).
#91 NP Foundation Model (Ding et al. 2026, Nat Machine Intelligence) — a pretrained foundation model for small-molecule natural products; substrate for downstream tasks ranging from media-component prediction to flavor-precursor discovery.
#92 TranscriptFormer (Pearce et al. 2026, Science) — a generative cell-atlas model trained on cells from 12 species spanning 1.5 billion years of evolution; the closest existing cross-species transcriptomic foundation model, with direct cell-ag utility for translating biological knowledge between bovine, porcine, chicken, salmonid, and other livestock cells where annotated reference data is sparse (the per-species pages in Datasets/ collect that cell-ag data substrate). Catalogued at Foundation Models × Cellular Engineering rather than × AI Tooling because its primary cell-ag relevance is direct: cross-species transcriptomic reasoning is the cellular-engineering substrate. See also the TranscriptFormer software entry.
#224 ProCyon (Queen et al. 2025, Zitnik lab) — a multimodal foundation model unifying protein sequence, structure, natural-language phenotype text, and small-molecule structure in one latent space (bridged by natural language), trained on PROCYON-INSTRUCT, a dataset of 33M+ protein–phenotype instruction examples; it retrieves proteins by mechanism-of-action or disease context and generates open-ended phenotype descriptions, a candidate substrate for annotating cell-ag-relevant proteins (growth factors, ECM components, differentiation regulators) from sequence and structure.

Scientific Literature & Discovery Agents

Agents focused on the scientific research process itself — retrieving and synthesizing literature, generating hypotheses, drafting papers. They are not domain-specific to biology, but the literature, methods, and ideation patterns relevant to cell-ag are squarely within their operating envelope.

#44 PaperQA (Lála et al. 2023, FutureHouse) — canonical retrieval-augmented agent for answering questions over scientific literature with verified citations.
#46 PaperQA2 (Skarlinski et al. 2024, FutureHouse) — extends PaperQA; the authors report synthesis of scientific knowledge that matches or exceeds subject-matter experts on literature-research tasks.
#45 The AI Scientist (Lu et al. 2024, Sakana AI) — presented by its authors as the first end-to-end framework for fully automated open-ended discovery, covering idea generation, experiment design, and paper drafting.
#47 The AI Scientist-v2 (Yamada et al. 2025, Sakana AI) — successor that uses agentic tree search and targets workshop-level papers.
#53 ARIEL (Liu et al. 2026) — biomedical AI research assistant with expert-involved learning.
#153 AI Co-Scientist (Gottweis et al. 2026, Nature) — a Gemini-based multi-agent system whose six specialized agents (Generation, Reflection, Ranking, Evolution, Proximity, Meta-review) generate and tournament-refine novel hypotheses under scientist-in-the-loop supervision; validated on drug repurposing, target discovery, and antimicrobial resistance.
#154 Robin (Ghareeb et al. 2026, Nature) — a “lab-in-the-loop” multi-agent system automating both hypothesis generation and data analysis for experimental biology, coordinating PaperQA2-based literature agents (Crow, Falcon) with a data-analysis agent (Finch); it proposed and experimentally confirmed a repurposed drug candidate for dry age-related macular degeneration.
#166 ERA (Aygün et al. 2026, Nature) — an LLM-plus-tree-search system that writes expert-level empirical analysis software to maximize a measurable quality metric, drawing ideas from the literature; in bioinformatics it discovered single-cell batch-integration methods that outperformed the top human methods on a public benchmark.
#221 IP-RAR (Feng et al. 2025, GigaScience) — a retrieval-augmented “deep-thinking” pipeline that builds a stratified biomedical knowledge graph (BioStrataKG) and applies integrated-then-progressive reasoning with a self-reflection step; on its own BioCDQA cross-document benchmark it improves document-retrieval F1 by ~20% and answer accuracy by ~25% over prior methods — a literature-mining pattern for surfacing cross-document associations (e.g. media-component → phenotype) in cell-ag literature.
#222 cmbagent-VLM (Gandhi et al. 2025) — extends a multi-agent autonomous-discovery system with vision-language-model steering: a VLM-as-judge scores agent-generated plots against dynamically generated rubrics and iteratively debugs them, alongside a scientific-anomaly-detection loop. Demonstrated in cosmology and astrochemistry (pass@1 of 0.7–0.8 versus 0.2–0.3 for code-only baselines), but the “VLM judges figures to self-correct exploratory analysis” pattern transfers to any data-driven cell-ag analysis loop.
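
The judge-and-revise pattern described for cmbagent-VLM (#222) can be sketched in a few lines. This is a minimal illustration under loud assumptions, not the paper's implementation: the "VLM judge" is replaced by plain Python checks on a plot specification, the "coding agent" by a hard-coded patch function, and every name here (`judge`, `revise`, `judge_and_revise`, the rubric keys, the example labels) is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Rubric: criterion name -> check applied to a plot specification (a plain dict).
Rubric = Dict[str, Callable[[dict], bool]]


def judge(plot: dict, rubric: Rubric) -> Tuple[float, List[str]]:
    """Stand-in for the VLM judge: score a plot and list the failed criteria."""
    failed = [name for name, check in rubric.items() if not check(plot)]
    score = 1.0 - len(failed) / len(rubric)  # assumes a non-empty rubric
    return score, failed


def revise(plot: dict, failed: List[str]) -> dict:
    """Stand-in for the coding agent: patch the spec for each failed criterion."""
    fixed = dict(plot)
    for name in failed:
        if name == "has_axis_labels":
            fixed["xlabel"], fixed["ylabel"] = "time (h)", "viable cell density"
        elif name == "has_title":
            fixed["title"] = "Growth curve"
    return fixed


def judge_and_revise(plot: dict, rubric: Rubric,
                     threshold: float = 1.0, max_rounds: int = 3):
    """Alternate judging and revising until the score passes or rounds run out."""
    history = []
    for _ in range(max_rounds):
        score, failed = judge(plot, rubric)
        history.append(score)
        if score >= threshold:
            break
        plot = revise(plot, failed)
    return plot, history


rubric: Rubric = {
    "has_axis_labels": lambda p: bool(p.get("xlabel")) and bool(p.get("ylabel")),
    "has_title": lambda p: bool(p.get("title")),
}
final_plot, scores = judge_and_revise({"series": [0.2, 0.5, 1.1]}, rubric)
print(scores)  # → [0.0, 1.0]
```

In a real loop the rubric would be generated per task and the judge would inspect the rendered figure; the control flow (score, collect failures, patch, re-score, stop at a threshold or round limit) is the part that carries over.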

General-Purpose Biomedical Agents

Broad biomedical-reasoning agents designed to handle multiple sub-domains within drug discovery, biomarker discovery, or generalist biomedical research. Distinct from the domain-specific cluster below in that they pitch themselves as horizontal platforms rather than vertical tools.

#40 TxAgent (Gao et al. 2025, Zitnik lab) — AI agent for therapeutic reasoning across a universe of 211 biomedical tools.
#49 Biomni (Huang et al. 2025, Stanford SNAP / Leskovec) — general-purpose biomedical AI agent across drug discovery, genomics, and clinical analysis.
#94 BRAD (Pickard et al. 2025, Bioinformatics) — automatic biomarker discovery and enrichment.
#95 OLAF (Riffle et al. 2025) — Open Life Science Analysis Framework for conversational bioinformatics; agent-pipe-router architecture.
#96 STELLA (Jin et al. 2025) — self-evolving LLM agent for biomedical research.
#98 BioMANIA (Dong et al. 2024) — a framework that wraps Python bioinformatics tools as conversational chatbots (one chatbot per tool) to simplify data analysis.
#223 BioScientist Agent (Zhang et al. 2025) — an end-to-end, KG-augmented multi-agent framework for drug repurposing and mechanism-of-action elucidation: a variational graph auto-encoder predicts drug–disease links, an adversarial actor–critic reinforcement-learning module traverses a 6.3M-node biomedical knowledge graph to recover mechanistic paths, and an LLM layer generates causal reports; the VGAE-plus-RL-path-plus-LLM-report architecture is retargetable to cell-ag hypothesis generation (gene → pathway → phenotype).

Chemistry / Synthesis Agents

Foundational LLM-agent patterns from chemistry. Coscientist and ChemCrow share a common template (LLM + curated chemistry tools + autonomous experiment loop) and have been the dominant reference for tool-augmented LLM design in the broader biomedical-agent literature; ether0 is a chemistry reasoning model rather than a tool-using agent.

#70 Coscientist (Boiko et al. 2023, Nature) — GPT-4 autonomous chemistry research system; the foundational pattern for tool-augmented LLMs in the natural sciences.
#71 ChemCrow (Bran et al. 2024, Nat Machine Intelligence) — GPT-4 + 18 chemistry tools for synthesis planning, drug discovery, and materials design.
#161 ether0 (Narayanan et al. 2025, FutureHouse) — a 24-billion-parameter chemistry reasoning model post-trained with reinforcement learning on verifiable chemistry problems; it reasons in natural language and emits molecular structures as SMILES, with direct relevance to media-component, scaffold-chemistry, and flavor-molecule design.

Domain-Specific Biomedical Agents

Agents specialized for a single biomedical task or sub-domain. Narrower scope than the general-purpose cluster above, but often the most directly transferable to cell-ag because the underlying tasks (sequencing analysis, perturbation prediction, metabolic-model reasoning, spatial biology) are also the tasks cell-ag workflows depend on.

#43 Lee NGS (Lee et al. 2025) — agentic AI for NGS downstream analysis (differential-expression workflows for users without computational backgrounds).
#50 Talk2Biomodels (Wehling et al. 2025, BMC Bioinformatics) — agent for kinetic biological models (SBML); the kinetic-modeling member of the AIAgents4Pharma Talk2X family.
#51 Medea (Sui et al. 2026, Zitnik lab) — omics AI agent for therapeutic discovery.
#54 ClockBase Agent (Ying et al. 2025) — autonomous agent mining methylation/RNA-seq data for aging interventions.
#56 SpatialAgent (Wang et al. 2025) — autonomous agent for spatial biology research.
#66 Lila (Singh et al. 2023, Carbonell group) — automated scientist for microbial strain design.
#68 D2Cell (Li et al. 2024) — an LLM pipeline that mines metabolic-engineering literature into a knowledge base and feeds a hybrid deep-learning / genome-scale-model target predictor, with a retrieval-augmented chatbot over the resulting database.
#69 KinModGPT (Maeda & Kurata 2023) — GPT-driven generation of SBML kinetic models from natural-language texts.
#167 Talk2Biomodels / Talk2KnowledgeGraphs (Singh et al. 2025) — the workshop paper introducing the LangGraph-based agents that give natural-language access to SBML systems-biology models (the BioModels database) and biomedical knowledge graphs; the published core of the AIAgents4Pharma Talk2X family alongside the later #50 Talk2Biomodels.
#90 GenCellAgent (Yu et al. 2026) — a training-free multi-agent framework that routes cellular images to the best segmentation tool and supports text-guided segmentation of structures existing models miss; an imaging and phenotyping agent for reading out engineered cells (discussed for its cell-characterization use under Cellular Engineering).
#126 scBaseCount (Youngblut et al. 2025, Arc Institute) — an AI agent that discovers and uniformly reprocesses public scRNA-seq data into a self-updating repository of more than 500 million cells across 27 organisms; data-curation infrastructure that lowers the entry barrier for any single-cell modeling, including for livestock species.

Sibling refs in this row’s × Cellular Engineering cell — cell-engineering-specific agentic work — include #93 CellForge (Tang et al. 2026, agentic design of virtual cell models) and #97 PerTurboAgent (Hao et al. 2025, self-planning agent for sequential Perturb-seq).

Robot Scientists & Lab Automation

Closed-loop autonomous research systems integrating LLM agents with wet-lab execution. The cluster spans more than two decades of “Robot Scientist” work from the King group (Adam → Eve → Genesis), extended by recent LLM-era multi-agent systems.

#182 The Robot Scientist (King et al. 2004, Nature) — the original autonomous “Robot Scientist”: it abductively generates functional-genomics hypotheses and uses active learning to select cost-optimal yeast deletion-mutant experiments, matching human experiment-selection performance — the founding reference for this cluster’s Adam → Eve → Genesis lineage.
#64 Genesis (Tiukova et al. 2024, King group) — third-generation robot scientist for systems biology.
#65 AutonoMS (Brunnsåker et al. 2025) — agentic AI integrated with scientific knowledge for laboratory validation in systems biology.
#62 BioMARS (Qiu et al. 2025) — a fully-robotic multi-agent system (Biologist, Technician, and Inspector agents built on LLMs and vision-language models, driving dual-arm robotics) that autonomously performs cell passaging and culture, matching or exceeding manual performance and outperforming conventional strategies when optimizing retinal-pigment-epithelial differentiation (discussed for its bioprocess use under Bioprocess & Scale-Up).
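
The idea behind the Robot Scientist entry (#182), using active learning to pick experiments that discriminate between hypotheses at low cost, can be illustrated with a greedy heuristic. The sketch below is a simplification chosen for clarity, not the selection algorithm from King et al.: it ranks experiments by expected information gain per unit cost. The hypotheses, priors, costs, and predicted outcomes are made-up toy values, and all function names are hypothetical.

```python
import math
from typing import Dict, Iterable


def entropy(probs: Iterable[float]) -> float:
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(priors: Dict[str, float],
                              predicts: Dict[str, bool]) -> float:
    """Expected entropy reduction over hypotheses from running one experiment.

    priors:   hypothesis -> prior probability
    predicts: hypothesis -> outcome that hypothesis predicts (e.g. growth or not)
    """
    h_before = entropy(priors.values())
    h_after = 0.0
    for outcome in set(predicts.values()):
        # Hypotheses consistent with this outcome, and the chance of seeing it.
        consistent = [p for h, p in priors.items() if predicts[h] == outcome]
        p_outcome = sum(consistent)
        if p_outcome == 0:
            continue
        h_after += p_outcome * entropy(p / p_outcome for p in consistent)
    return h_before - h_after


def select_experiment(priors: Dict[str, float], experiments: Dict[str, dict]) -> str:
    """Pick the experiment with the best expected information gain per unit cost."""
    return max(
        experiments,
        key=lambda name: expected_information_gain(priors, experiments[name]["predicts"])
        / experiments[name]["cost"],  # assumes strictly positive costs
    )


# Toy example: three candidate hypotheses and three candidate knockout experiments.
priors = {"H1": 0.5, "H2": 0.25, "H3": 0.25}
experiments = {
    "knockout_A": {"cost": 1.0, "predicts": {"H1": True, "H2": False, "H3": False}},
    "knockout_B": {"cost": 4.0, "predicts": {"H1": True, "H2": False, "H3": True}},
    "knockout_C": {"cost": 0.5, "predicts": {"H1": True, "H2": True, "H3": True}},
}
print(select_experiment(priors, experiments))  # → knockout_A
```

`knockout_C` is the cheapest option but cannot separate any hypotheses, so its gain is zero; `knockout_B` is informative but expensive. The ratio is one simple way to trade those off.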

Sibling refs in this row’s × Bioprocess & Scale-Up cell — applied multi-agent lab automation for cell and organoid manufacturing — include #61 Agentic Lab (Wang et al. 2025). See also MetabolicModeling.md for the broader closed-loop FBA / strain-design context.

Agent Infrastructure (Frameworks, KGs, Protocols)

Substrate platforms — agent frameworks, biomedical knowledge-graph backends, and tool-orchestration protocols — that downstream agents are built on. These are the papers that don’t introduce a specific agent but introduce the scaffolding that bespoke agent projects have historically had to reinvent.

#41 ToolUniverse (Gao et al. 2025) — an ecosystem for building AI scientists on top of any open- or closed-weight model; companion to TxAgent (#40) and the rest of the Zitnik-lab agent stack.
#48 BioCypher (Lobentanzer et al. 2025, Nat Biotech) — knowledge-graph platform purpose-built for biomedical applications of LLMs.
#67 MCP Servers for biology (Ruscone et al. 2025, Saez-Rodriguez lab) — Model Context Protocol server implementations as AI-biology interfaces (NeKo, MaBoSS, PhysiCell).
#133 BioContextAI (Kuehl et al. 2025, Nat Biotech) — an open community registry, with reference servers, for the Model Context Protocol in biomedicine, standardizing how agentic assistants discover and compose tool servers; its Knowledgebase MCP exposes more than 15 biomedical resources.
#160 Aviary (Narayanan et al. 2024, FutureHouse) — an extensible “gymnasium” for training language agents on language decision processes, with scientific environments for molecular cloning, literature QA, and protein stability; small open-source models trained in these environments matched human experts and frontier LLMs at far lower inference cost.
#162 SciAtlas (Qiao et al. 2026) — a large multidisciplinary academic knowledge graph (43M+ papers, 157M entities, 3B triplets) with a neuro-symbolic “tri-path” retrieval algorithm, pitched as a topological substrate for automated-research agents.
#219 ESCARGOT (Matsumoto et al. 2025, Bioinformatics) — an LLM agent combining a dynamic, Python-executable Graph of Thoughts with biomedical knowledge graphs queried by Cypher (with a vector-database fallback); on the AlzKB Alzheimer’s knowledge graph it reached ~88% on open-ended one-hop questions versus ~4% for a vanilla LLM, running deterministic operations as real Python rather than hallucinating them — a hallucination-resistant KG-reasoning pattern a cell-ag knowledge graph could adopt.
#220 PloverDB (Glen et al. 2025, Bioinformatics) — a fully in-memory, pure-Python platform that serves Biolink-compliant knowledge graphs as TRAPI-standard web APIs, roughly 3× faster than a Neo4j-backed equivalent; infrastructure for standing up a cell-ag biomedical knowledge graph as a queryable, standards-compliant API that agents like ESCARGOT can call.
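
The last two entries describe a pairing: an in-memory knowledge graph that answers one-hop queries, and an agent that falls back to approximate lookup when an exact graph query fails. The sketch below illustrates that pairing only in spirit. It uses neither PloverDB, TRAPI, Cypher, nor a vector database; the fallback is a string-similarity match from the standard library standing in for vector search, the `TinyKG` class and `answer` helper are hypothetical, and the two triples are a toy graph.

```python
from collections import defaultdict
from difflib import get_close_matches
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TinyKG:
    """A toy in-memory triple store indexed for one-hop lookups."""

    def __init__(self, triples: List[Triple]):
        self._out = defaultdict(list)
        for subj, pred, obj in triples:
            self._out[(subj, pred)].append(obj)
        self._nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})

    def one_hop(self, subj: str, pred: str) -> List[str]:
        """Exact lookup: objects linked to `subj` by `pred`."""
        return list(self._out.get((subj, pred), []))

    def resolve(self, name: str) -> Optional[str]:
        """Exact node match first; otherwise the closest node name, if any."""
        if name in self._nodes:
            return name
        # Stand-in for a vector-database fallback: fuzzy string matching.
        matches = get_close_matches(name, self._nodes, n=1, cutoff=0.6)
        return matches[0] if matches else None


def answer(kg: TinyKG, subj: str, pred: str) -> Tuple[Optional[str], List[str]]:
    """Resolve the subject (exact, then fuzzy) and run a deterministic one-hop query."""
    node = kg.resolve(subj)
    if node is None:
        return None, []
    return node, kg.one_hop(node, pred)


kg = TinyKG([("FGF2", "binds", "FGFR1"), ("IGF1", "binds", "IGF1R")])
print(answer(kg, "FGF2", "binds"))   # exact match
print(answer(kg, "FGF-2", "binds"))  # resolved through the fuzzy fallback
```

Returning the resolved node alongside the results lets a caller see when the fallback substituted a different name, which matters if the approximate match is wrong.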

Other AI methodology in the AI Tooling column

A handful of AI Tooling-column papers live outside the LLM/agent taxonomy above:

#52 BioMedReasoner (Mulyadi et al. 2025, NeurIPS 2025 AI4Science Workshop) — multi-hop reasoning via path-based relational learning on biomedical knowledge graphs (lives in the GNN row).
#63 Pandi et al. (2022, Nat Comms) — versatile active-learning workflow for optimization of genetic and metabolic networks (lives in the Active Learning row).
#197 Epicure (Radzikowski & Chen 2026) — skip-gram (Metapath2Vec) food-ingredient embeddings trained on 4.14M recipes across seven languages plus a FlavorDB ingredient–compound graph, supporting ingredient pairing and sensory/cuisine navigation; a representation-learning method (in the Deep Learning row) most relevant to flavor design (see also Sensory Prediction).
#4 D-GEX (Chen et al. 2016, Bioinformatics) — a multi-task neural network that infers the expression of ~9,500 target genes from ~1,000 landmark genes, a cost-reduction strategy for profiling cell lines (in the Deep Learning row); also discussed under Cellular Engineering for single-cell characterization.
#198 ropls (Thévenot et al. 2015, J. Proteome Research) and #199 mixOmics (Rohart et al. 2017, PLoS Computational Biology) — the canonical R/Bioconductor packages for multivariate chemometrics (PCA / PLS / OPLS-DA) and sparse multi-omics feature selection and integration, respectively (the Chemometrics row); general-purpose statistical workhorses for any LC-MS/NMR metabolomics or multi-omics analysis in cell-ag — spent-media profiling, off-flavor discrimination, condition–phenotype association — and the multivariate engines behind several sensory and bioprocess workflows.
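
Embedding-based ingredient pairing, as in the Epicure entry (#197), comes down to nearest-neighbor search in a learned vector space. The sketch below shows only that retrieval step with cosine similarity. The three-dimensional vectors are invented toy values, not Epicure embeddings, and `cosine` and `nearest` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(embeddings: Dict[str, Sequence[float]], query: str, k: int = 2) -> List[str]:
    """Return the k ingredients whose embeddings are most similar to the query's."""
    q = embeddings[query]
    ranked = sorted(
        ((cosine(q, vec), name) for name, vec in embeddings.items() if name != query),
        reverse=True,
    )
    return [name for _, name in ranked[:k]]


# Toy vectors for illustration only; real embeddings would be learned from data.
toy_embeddings = {
    "tomato":   [0.9, 0.1, 0.0],
    "basil":    [0.8, 0.3, 0.1],
    "vanilla":  [0.0, 0.2, 0.9],
    "cinnamon": [0.1, 0.3, 0.8],
}
print(nearest(toy_embeddings, "tomato", k=1))   # → ['basil']
print(nearest(toy_embeddings, "vanilla", k=1))  # → ['cinnamon']
```

A linear scan like this is fine for a few thousand ingredients; larger vocabularies would normally use an approximate nearest-neighbor index instead.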

A growing number of AI-agent tools are released as open-source platforms or commercial products without a separate companion paper. They don’t appear in the matrix (which catalogues papers, not tools) but are catalogued in Software.md, and they fill important gaps in the agent landscape:

K-Dense-AI — co-scientist ecosystem combining the commercial K-Dense Web platform with an open-source stack of Agent Skills (scientific-agent-skills, 120+ skills wrapping scientific Python libraries including cobrapy, pyopenms, scanpy/scvi-tools, rdkit/datamol), the k-dense-byok desktop client, the mimeo skill-generation tool, and claude-scientific-writer. Directly composable with Claude Code, Cursor, and other code-execution agents.
Superpowers (obra/superpowers) — one of the most-starred general-purpose Claude Code skill collections (Jesse “obra” Vincent). Domain-agnostic skills for planning, debugging, code review, and execution that compose cleanly with the scientific skill packs above.
Skill Seekers — meta-tool that converts documentation websites, GitHub repos, and PDFs into Claude AI skills with automatic conflict detection. The closest existing automation for the pattern an AI-augmented cell-ag lab needs as it scales: turn a new wet-lab protocol, bioinformatics package, or GitHub library into a skill that the lab’s agents can call directly.
AI Research Skills Library (orchestra-research/AI-research-SKILLs) — open-source skills library covering AI research and ML engineering infrastructure (vLLM, Megatron, GRPO, HuggingFace). Relevant for cell-ag teams building or fine-tuning their own biology foundation models or running large-scale agentic workflows.
AIAgents4Pharma — beyond the published Talk2Biomodels (#50, with the earlier workshop paper #167), the platform hosts three sibling Talk2X agents (Talk2KnowledgeGraphs, which is co-introduced in #167, plus Talk2Cells and Talk2Scholars) that share infrastructure but lack standalone papers at time of curation.
Seqera AI / Co-Scientist — Seqera Cloud’s AI assistant for Nextflow pipeline authoring, debugging, and workflow-run analysis.
Dotmatics Luma — commercial lab-orchestration platform connecting laboratory instruments, data systems, and AI assistance; representative of the commercial-tooling layer cell-ag startups increasingly evaluate as they scale beyond bench-scale workflows.