CAAIL

AI Agents & Foundation Models

The fastest-moving corner of the library: general-purpose AI agents, biological foundation models, and the initiatives building toward predictive “virtual cells.” Because CAAIL organizes its catalogs by resource type, this theme is deliberately spread across several files — the agent frameworks and foundation models live in Software.md, the research papers in Papers.md, the evaluation suites in Datasets/Benchmarks.md. This page is the connective hub: it owns the ecosystem-and-initiatives view of the area, and points to the canonical home of every other facet.

Virtual Cell Initiative & Single-Cell Foundation Models

This section collects the companion landing pages, blog posts, and challenge announcements for the virtual-cell initiative: Arc Institute’s foundation-model program (State, Stack), the broader CZ Virtual Cells Platform, and the open Virtual Cell Challenge. The conceptual framing for this cluster is in Papers.md ref #128 (Bunne et al. 2024, Cell) and ref #129 (Roohani et al. 2025, Cell). The foundation models themselves live in the Foundation Models rows of Papers.md, which are split by training paradigm (next-token prediction, masked language modeling, LM + biological priors, cell-state & perturbation prediction), and in the corresponding entries in Software.md.

Agent frameworks & foundation models in Software.md

The open-source tools themselves are catalogued under AI Agents & Foundation Models in Software.md. That section spans two families. The first is general-purpose agent frameworks and tool ecosystems: ToolUniverse, Biomni, AIAgents4Pharma, BRAD, CellForge, PaperQA, BioContextAI, and the broader MCP-server layer. The second is single-cell foundation models: Geneformer, scGPT, scBERT, scFoundation, UCE, TranscriptFormer, Cell2Sentence, and Arc’s State. New tools are added there, in their type-correct home; this hub does not duplicate the catalog.

In the Papers matrix

Within the Papers matrix, this area is reachable in two ways: through the LLMs / AI Agents method row (agentic systems with tool use, retrieval, and reasoning) and through the AI Tooling / Methodology column (general-purpose methods and agents not yet tied to a specific cell-ag application). The AI Tooling / Methodology research-area page gives the deeper narrative for that column.

Evaluation & benchmarks

How well these models and agents actually perform is tracked separately. The benchmark datasets, including the foundation-model-relevant ProteinGym, LAB-Bench, and BixBench, are catalogued in Datasets/Benchmarks.md. The evaluation methodology, assay-level work such as AssayBench, and the AI Evaluation & Benchmarking matrix column are covered on the AI Evaluation & Benchmarking research-area page. Not every benchmark there is specific to AI agents (many are general or domain benchmarks), which is why evaluation keeps its own home rather than folding into this page.

Talks & onboarding