Luthien

Safety = Trust = Power

CEO · Scott Wofford

$4.5B¹

Profit generated at Amazon

CTO · Jai Dhyani

Shipped Language Models to

2B users

¹ Profit: all figures are based on A/B experiment results, annualized, and include long-term sales/profit based on synthetic controls ($1.6B) and the NPV of future cash flows at a 10% discount rate ($3.5B). Jai Dhyani: Resume · RE-Bench paper · The Chart (METR). Scott Wofford: Resume.

Chart reconstructed from METR, "Measuring AI ability to complete long tasks" (Horizon v1.1), rendered on a linear time-horizon axis. Model positions are approximate.

12-hour autonomy: Claude Code documentation, Anthropic. Chart: METR Horizon v1.1 (linear axis, reconstructed).

45 seconds

average Claude Code turn length¹

¹ 45-second babysitting cadence: Anthropic, "Measuring Agent Autonomy".

93

user interviews

Developer problems

Has your Claude Code ever...

Deleted important code

Ignored instructions

Impersonated the user to answer its own questions

Leaked secrets

Cheated on tests

Hidden errors

Failed to read documentation

Only partially completed tasks

Failed to update docs

Written 12 versions of the same function

Added 6 layers of pointless abstractions

Insisted on 'backwards compatibility' for a one-off script only you use

Copied existing mistakes instead of correcting them

Used pip instead of uv even though you've told it to use uv like 40 times

Sources: Luthien's 93 user interviews (34 recorded, plus brief event conversations) · Twitter · Reddit · Hacker News · GitHub Issues · Cursor Forum · 204 data points

luthien.cc/frustrations

Leadership problem

"We have a citizen developer issue right now: giving [all 5,000 employees] access to Claude Code or Codex. How do we ensure that we have the minimum right things in place?"

Kris Kimmerle

VP, AI Risk & Governance

Future of Work?

...babysitting several AIs...

The future of work,

if you can trust your AIs

Inc.

Technology

An AI Ran a Simulated Vending Machine Business. It Lied, Cheated, and Extorted Its Way to the Top

Anthropic's newest version of Claude displayed cutthroat tactics—and also revealed some big, weird feelings.

By Ben Sherry, Staff Reporter · @benlucassherry · Feb 6, 2026

Source: andonlabs.com/blog/openai-gpt-5-5-vending-bench

Luthien's architecture

luthien-proxy: request pipeline

🧑‍💻 You ⇔ 💻 Claude Code ⇔ Luthien ⇔ ☁️ Anthropic API
                              |
                       logs every request and response.   ~5-15ms
                       you can configure rules/policies to
                       modify or block certain responses or requests:
                              |
                              |-- did it do what I asked?
                              |-- did it follow CLAUDE.md?
                              +-- did it do something suspicious?

Luthien can call a separate model to check whether each response follows your rules. The check runs alongside normal requests, adding almost no delay. When Claude Code compacts its context or starts a new session, Luthien still remembers, so your rules stay enforced from first prompt to last.

Uses your existing Anthropic subscription via OAuth. Luthien never sees your API key and there are no extra charges.
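A minimal sketch of the pipeline described above, under loud assumptions: `PolicyProxy`, `Verdict`, and `forbid_pip` are hypothetical names invented for illustration and are not Luthien's actual API. The upstream API is simulated by a plain function, the policy is a string check rather than a call to a separate model, and the check runs inline rather than alongside the request.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


# A policy sees the conversation so far plus the newest response text.
Policy = Callable[[list[str], str], Verdict]


def forbid_pip(history: list[str], response: str) -> Verdict:
    """Hypothetical rule: the project uses uv, so block `pip install`."""
    if "pip install" in response:
        return Verdict(False, "use uv instead of pip")
    return Verdict(True)


class PolicyProxy:
    """Sits between the client and the upstream API (both simulated here)."""

    def __init__(self, upstream: Callable[[str], str], policies: list[Policy]):
        self.upstream = upstream
        self.policies = policies
        self.log: list[tuple[str, str]] = []  # every raw request/response pair
        self.history: list[str] = []  # held by the proxy, not the client

    def handle(self, request: str) -> str:
        response = self.upstream(request)
        self.log.append((request, response))
        for policy in self.policies:
            verdict = policy(self.history, response)
            if not verdict.allowed:
                # Replace the response instead of forwarding it unchanged.
                response = f"[blocked by policy: {verdict.reason}]"
                break
        self.history.extend([request, response])
        return response


proxy = PolicyProxy(lambda req: "Run: pip install requests", [forbid_pip])
print(proxy.handle("install requests"))
```

Because the history lives in the proxy object rather than in the client, it is unaffected when the client compacts its own context, which is the property the slide points at.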

Architecture Evolution

Six places where you can put LLM monitoring: LLM API, Agent scaffold, Code review, Global execution, Local execution, Cyber threat detection system

today

Luthien solves this.

A Fully Customizable Real-Time
Manager for Every AI in Your Org

What makes Luthien different from existing Gateways, Guardrails, and Observability Platforms?

Deep interventions

Not just accept, block, or replace.

Live stream modification

Not just response replacement.

Org-wide context

Not just single-turn rules.

Seamless API integration

Not just lossy wrappers.

What Makes Luthien Different?

Feature comparison

	Gateways	Guardrails	Post-hoc Code Review	Luthien
Live conversation observability	✗	✗	✗	✓¹
Block in real-time	Partial²	✓	✗	✓
Fully customizable: modify, insert or run arbitrary logic mid-stream	✗	✗	✗	✓³
Open source	Partial⁴	Partial⁵	✗	✓⁶
Org-wide multi-conversation context	✗	✗	Partial	✓

¹ Luthien sees the full multi-turn conversation in real time as it streams; competitors operate per-request, on the prompt-or-final-response boundary, or post-hoc.

² LiteLLM exposes custom guardrails as Python pre/post hooks — limited blocking only; cannot modify, insert, or apply mid-stream policies.

³ Luthien can modify, insert into, and apply mid-stream policies against agent activity, and run arbitrary logic against live conversation context.

⁴ LiteLLM core proxy is open source; the enterprise control plane is closed.

⁵ Guardrails AI ships an open-source validator library; the policy management layer is closed.

⁶ Luthien is fully open source at github.com/LuthienResearch/luthien-proxy under an MIT-compatible license.

March 24, 2026 · 10:39 UTC

litellm was compromised by a supply-chain attack.

For 40 minutes, installing litellm meant losing everything.

Source: LiteLLM security update (Mar 2026); Snyk analysis.

No safety infrastructure

Claude Code

user: "set up the dev environment"

↓

Anthropic API

generates tool call: uv sync

↓

Claude Code

runs uv sync

↓

OWNED

install runs, .pth payload fires

Helicone + Portkey + Guardrails

Claude Code

user: "set up the dev environment"

↓

Infra

portkey routes

✓ guardrails

helicone logged

↓

Anthropic API

generates tool call: uv sync

↓

Infra

portkey routes

✓ guardrails

helicone logged

↓

Claude Code

runs uv sync

↓

OWNED

install runs, .pth payload fires

↓

Claude Code

tool result returns: package list

↓

Infra

guardrails flags litellm==1.82.8 (post-hoc)

helicone logged

With Luthien

Claude Code

user: "set up the dev environment"

↓

Anthropic API

generates tool call: uv sync

↓

Luthien

rewrites in flight: uv sync → uv sync --dry-run

before it reaches Claude Code

↓

Claude Code

runs the dry-run; resolved packages return to Luthien

↓

Luthien

matches litellm==1.82.8 against live CVE feed

block install
alert users and agents across org
loop in security
dispatch agent for org-wide investigation
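The two Luthien steps above (rewrite the tool call, then check the resolved packages) can be sketched as follows. This is an illustration only: `KNOWN_BAD` stands in for a live advisory feed, the function names are hypothetical, and the `name==version` text format assumed for the dry-run result is a simplification rather than the exact output of `uv`.

```python
import shlex

# Hypothetical advisory data: (package, version) pairs known to be compromised.
KNOWN_BAD = {("litellm", "1.82.8")}


def rewrite_tool_call(command: str) -> str:
    """Turn `uv sync` into `uv sync --dry-run` so nothing is installed yet."""
    parts = shlex.split(command)
    if parts[:2] == ["uv", "sync"] and "--dry-run" not in parts:
        parts.append("--dry-run")
    return shlex.join(parts)


def parse_pins(resolved: str) -> set[tuple[str, str]]:
    """Extract name==version pins from whitespace-separated text."""
    pins = set()
    for token in resolved.split():
        name, sep, version = token.partition("==")
        if sep and name and version:
            pins.add((name.lower(), version))
    return pins


def decide(resolved: str) -> tuple[str, list[str]]:
    """Return ("block", matches) if any resolved pin is in the advisory set."""
    hits = sorted(f"{n}=={v}" for n, v in parse_pins(resolved) & KNOWN_BAD)
    return ("block" if hits else "allow", hits)


print(rewrite_tool_call("uv sync"))
print(decide("requests==2.32.3 litellm==1.82.8"))
```

The alerting and investigation steps on the slide would hang off the `"block"` branch; they are left out here because the deck does not describe how they are implemented.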

How? LIVE DEMO

"""Anthropic-native request processing pipeline.

This module provides a dedicated processing pipeline for Anthropic API
requests, using the native Anthropic types throughout without converting
to OpenAI format. This preserves Anthropic-specific features like
extended thinking, tool use patterns, and prompt caching.

Span Hierarchy
--------------
The pipeline creates a structured span hierarchy for observability:

    anthropic_transaction_processing (root)
    ├── process_request
    ├── process_response
    │   ├── policy_execute
    │   └── send_upstream (zero or more backend calls)
    └── send_to_client (non-streaming)
"""

We worry about complex details like this so your policies Just Work.

luthien-proxy/src/luthien_proxy/pipeline/anthropic_processor.py
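To make the span hierarchy in the docstring concrete, here is a toy sketch that reproduces the same nesting with a standard-library context manager. The `Tracer` class and `process` function are hypothetical stand-ins written for this illustration; the real module presumably uses a proper tracing library, which the excerpt does not show.

```python
from contextlib import contextmanager


class Tracer:
    """Toy tracer that records (depth, name) for each span it opens."""

    def __init__(self) -> None:
        self.depth = 0
        self.spans: list[tuple[int, str]] = []

    @contextmanager
    def span(self, name: str):
        self.spans.append((self.depth, name))
        self.depth += 1
        try:
            yield
        finally:
            self.depth -= 1


def process(tracer: Tracer, backend_calls: int = 1) -> None:
    """Open spans in the order and nesting the docstring describes."""
    with tracer.span("anthropic_transaction_processing"):
        with tracer.span("process_request"):
            pass
        with tracer.span("process_response"):
            with tracer.span("policy_execute"):
                pass
            for _ in range(backend_calls):  # zero or more backend calls
                with tracer.span("send_upstream"):
                    pass
        with tracer.span("send_to_client"):
            pass


tracer = Tracer()
process(tracer)
for depth, name in tracer.spans:
    print("  " * depth + name)
```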

In a few years, all knowledge workers will be using Claude Code, Cowork or similar agentic AI tools.³

Claude Code revenue, 0–9 months¹

$2.5B

Projected Growth of AI Coding Agents²

$14.6B

by 2033

Agentic AI TAM by 2030³

$155B

¹ Claude Code ARR: Anthropic Series G announcement, Feb 2026. ² AI code assistant market: SNS Insider AI Code Assistant Market Report (15.31% CAGR to 2033). ³ Bank of America Institute, "On the clock: Agentic AI in the workplace" (Sept 2025).

Developer Feedback

Users are highly motivated

"I saw the website and immediately thought, 'I want that.' It's like CLAUDE.md that actually works." Nicolas Mesa, Cofounder & CTO, Veleiro AI

"I could do maybe 30–50% more parallel threads… this is about freeing up cognitive resources." Elvis Sikora, AI Engineer, CloudWalk

"At least as valuable as Datadog ($20K/yr)… on the order of $10K–$100K/yr, depending on the customer." Sami Jawhar, Agent Wrangler, Trajectory Labs

"If this problem were solved, it would 2x my task completion efficiency." Matthew Handzel, Stealth AI Safety Startup

Executive Feedback

"You guys are solving a very important problem. It's a no-brainer that companies would want a proxy that monitors and restricts Claude Code traffic. I heard from higher-ups at Capital One that they're considering such tooling."

"Enterprise AI usage monitoring is going to be very important in the near future."

VP, 3,500-person legal tech company

Why this market grows

Mistake Rate × Tokens = Risk

↓ 10x

Mistake Rate

↑ 100x

Tokens

↑ 10x

Total Risk
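The slide's arithmetic written out, assuming each multiplier is relative to today's value (the subscript 0 is notation introduced here, not taken from the deck):

```latex
\text{Risk}_{\text{future}}
  = \left(\tfrac{1}{10}\,\text{Mistake Rate}_0\right) \times \left(100\,\text{Tokens}_0\right)
  = 10 \times \left(\text{Mistake Rate}_0 \times \text{Tokens}_0\right)
  = 10 \times \text{Risk}_0
```

Under these assumed multipliers, a tenfold drop in mistake rate is outpaced by a hundredfold rise in token volume, so total risk still rises tenfold.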

Defensibility

Won't Anthropic build this?

"I think the space of AI risk mitigations is large and full of tricky details, and I am excited about people exploring mitigations like external API-level monitors."

Fabian Roger

AI Control Researcher

Doesn't make sense for labs to build provider-agnostic tooling.

Fabian Roger, Anthropic AI Control Researcher, private correspondence, Apr 14, 2026. Shared with permission.

The Playbook

Do things that don't scale.

Supabase

Turned down million-dollar enterprise contracts to protect developer focus.

Self-serve Postgres-based backend with tiered pricing.

$5B

Oct 2025 Series E

HashiCorp

Hand-built Terraform configs for each early customer.

Terraform module registry and HashiCorp Cloud Platform.

$14B

Dec 2021 IPO

Hand-built solutions for AI failures.

AI control layer as a service.

TBD

¹ Supabase valuation: Supabase Series E announcement, Oct 2025. ² HashiCorp acquisition: IBM acquired HashiCorp for ~$6.4B in 2024; peak market cap reached $14B at Dec 2021 IPO. hashicorp.com.

Traction

Sales pipeline

No outbound or sales needed yet

93 orgs qualified

20 problem discoveries

14 live trials

4 LOIs signed $340K-$600K

Traction ← zoom in on the funnel

We asked the smartest people we know to trial our product.
They're proactively proposing improvements.

"We absolutely need this [i.e. Luthien]."

Kris Kimmerle · VP, AI Risk & Governance

Live trial Tue May 6.
Pilot planned. Target deployment:

2,000 devs

Pioneered AI control research.

Buck (CEO): 2,935 Google Scholar citations.

Co-discovered alignment faking with Anthropic.

$330K–$500K LOI

Signed Fri Apr 10.
15–20 individuals.

Building RL environments for frontier labs.

Sami built Legion, an autonomous dev swarm.

12 Luthien PRs since pilot started Sun Apr 12.

$60K ARR

Signed Tue Apr 14.

¹ RealPage: live trial Tue May 6, 2026 with Kris Kimmerle (VP, AI Risk & Governance) and Yashwanth (Principal AI Engineer). Pilot SOW being scoped at RealPage, ~2,000 devs targeted. ² Redwood Research: LOI signed Fri Apr 10, 2026, $330K–$500K, 15–20 individuals. Pilot in progress. ³ Trajectory Labs: contract signed Tue Apr 14, 2026, $60K ARR. 12 Luthien PRs submitted by Sami Jawhar since pilot started Sun Apr 12, 2026 (verify on GitHub).

The Ask