Research and Deployment of AI Agents for Biology
benchmarks.bio
We build verifiable benchmarks that measure how frontier AI agents perform on messy, real-world biological data — across spatial, single-cell, epigenomics, and preclinical pharmacology. Every result is reproducible and published openly.
Benchmarking AI Agents on Long-Horizon Single-Cell Biology
arXiv · scBench-Long
We introduce scBench-Long, a benchmark for long-horizon single-cell biology in which agents must recover scientific conclusions from raw or near-raw data without prescribed methods. The benchmark contains 21 evaluations spanning melanoma CD8 T-cell reactivity, regulatory inference, human–monkey chimera development, KRAS-driven lung tumor aging, and lethal COVID-19 lung pathology. Across 1,068 completed trajectories, the strongest model–harness pair passes just 16/63 runs (25.4%).
agent.bio
We deploy frontier agents into real scientific work — a public, interactive sandbox already supporting 40+ solution providers and 300+ biopharmas and R&D labs worldwide. Point it at your data, watch it analyze, and put the latest models to work in your lab.

Put AI to work on the hardest biology data.
- Understand how today's best models handle your data
- Build rigorous benchmarks for real-world biology
- Work directly with our research and engineering team