Lean community, Coq, Isabelle/HOL.
AlphaProof (DeepMind), LEGO-Prover.
Sakana AI (AI Scientist), Aemon (YC W26).
Xanadu (PennyLane), Qiskit (IBM).
Formal verification via Lean produces machine-checked proofs, not probabilistic outputs. Once a research team's proof library is built on Cajal's framework, migrating means re-proving everything from scratch. Work at the intersection of LLM generation and formal verification draws on a narrow talent pool, one that Cajal's team already occupies.
Uses LLM autoformalization into verified Lean 4 code, multi-agent collaborative proof search, and synthetic proof generation to drive a self-improving training flywheel.
Crowdsourced human-preference benchmarking platform for LLMs and generative AI models.
Neutral third-party evaluation becomes critical infrastructure as model proliferation outpaces any single lab's ability to grade itself credibly.
Catches AI agent failures before users see them by stress-testing across text, voice, and images.
AI agents are shipping to production faster than anyone can test them. Ashr generates synthetic users that stress-test agents across text, voice, and images before real users hit the failure modes.
Evaluates and certifies AI agents for safe deployment with red teaming and formal guarantees.
Red teaming and guardrails exist as separate tools. Cascade combines them into one platform with adaptive scaffolding that learns from production runs, already deployed across legal reasoning and customer support agents. The CEO researched graph reasoning and agentic safety at UC Berkeley's BAIR Lab.
Lets model builders inspect and steer AI behavior inside the latent space to catch failures.
Most AI safety tools work on model outputs. Envariant operates inside the latent space itself, detecting hallucinations and drift at the representation level before they surface in outputs. Its beta SDK has launched with applications in text LLMs, robotic agents, and protein models.