Latch Biosecurity

Scaling AI in Biology Responsibly

Helping frontier AI labs evaluate and audit their systems for biosecurity risk.

Why now

AI is learning biology fast.
Safety has to scale with it.

Latch builds the measurement layer for biosecurity: the benchmarks, audits, and red-teaming that tell you whether your models meaningfully raise biological risk, and by how much.

Benchmarks

Benchmarks and evaluations of AI models across the biosecurity-relevant capabilities that matter, from advancing biological threats to strengthening biodefense. We refresh them over time to prevent saturation.

View the leaderboard

Audits

Independent, pre-deployment assessments of the biological risk a frontier model poses, run to feed directly into your responsible scaling policy.

Request an audit

Red-teaming

Targeted adversarial testing of a model’s bio-capabilities: pushing on the exact edges a benchmark surfaces.

Agentic KYC

Autonomous identity and legitimacy checks on applicants, so gene synthesis companies and frontier AI labs know exactly who they’re granting access to.

Our Background

We have built biosecurity tools that run in the real world, from a national surveillance pipeline to frontier AI evaluations.

Our algorithms took the fastest-analysis prize in DARPA's Bio-Attribution Challenge, our benchmarks stress-test frontier AI models for biosecurity risk, and our research has pioneered machine-learning methods that both predict viral evolution and reveal how AI models represent it.

Selected papers · Year

Paper · 01 / 09 · 2026

BioSecBench-Refusal: A paired metric for performance and alignment in agentic biosecurity risk assessment

Edwin H. Wintermute, Harmon Bhasin, Christina M. Agapakis, Dianzhuo Wang, Evan Seeyave, Arjun Banerjee, Daniel Fulop, Matthew C. Watson, Adam J. Meyer, Sandrine Boissel, Jens H. Kuhn, Rishi Jain, Noah D. Taylor, Helena Shomar, Patrick M. Boyle, Kenny Workman

arXiv

A paired benchmark measuring both capability and caution in AI agents applied to biology: 61 legitimate 'Routine' tasks adapted from published literature alongside 46 'Red-Team' tasks that conceal a biosecurity hazard inside a realistic research scenario. Across 16 model-harness configurations, refusal rates ranged from 7% to 74% on Routine tasks and from 1% to 62% on Red-Team tasks. Many systems refused legitimate work as often as concealed threats, and most refusals were triggered by provider API filters before the model could reason. Released as a tool for developers to calibrate capability against caution for agentic biotech R&D.

Latch · American Wetware

Read paper
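As a rough illustration of the paired metric described in the abstract above, the sketch below computes a refusal rate separately for a Routine set and a Red-Team set. It is not the paper's scoring code: the record layout, the `routine` / `red_team` labels, the `refusal_rates` helper, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy records of (task_set, refused). NOT data from the paper.
RESULTS = [
    ("routine", False), ("routine", True), ("routine", False), ("routine", False),
    ("red_team", True), ("red_team", False), ("red_team", True), ("red_team", True),
]


def refusal_rates(results):
    """Return the fraction of refused tasks for each task set."""
    counts = defaultdict(lambda: [0, 0])  # task_set -> [refused, total]
    for task_set, refused in results:
        counts[task_set][0] += int(refused)
        counts[task_set][1] += 1
    return {name: refused / total for name, (refused, total) in counts.items()}


rates = refusal_rates(RESULTS)
# Gap between the two rates: a small or negative gap would suggest a system
# refuses legitimate work about as often as concealed hazards.
gap = rates["red_team"] - rates["routine"]
print(rates, gap)
```

Reporting the two rates side by side, rather than a single score, is what lets a reader see whether caution is being applied to the right tasks.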

Build the future of life sciences responsibly.

Talk to us