halt.

v1.4 Now in public beta · streaming traces shipped →

Stop your AI before
it does something stupid.

Halt is the observability and intervention layer for production AI agents. See every step, kill bad runs in flight, and replay any execution down to the token.

Start free Watch 90-sec demo

Free tier · no card required

SDKs: Python · TS · Go · Rust

Self-host or cloud

halt · support-agent-prod · us-east-1

status: any model: gpt-4o cost > $0.20 latency > 8s + filter

trace · cost · latency

tr_8a2f refund_lookup → stripe.charges.retrieve · $0.012 · 412 ms
tr_8a2e summarize_ticket → tools.search_kb · $0.041 · 1.2 s
tr_8a2d draft_response → recursive call (depth 4) · $0.18 · 9.4 s
tr_8a2c classify_intent → low-confidence (0.42) · $0.008 · 320 ms
tr_8a2b execute_refund → guardrail.amount_exceeded · $0.024 · 182 ms
tr_8a2a refund_lookup → stripe.customers.list · $0.011 · 388 ms
tr_8a29 summarize_ticket → tools.search_kb · $0.038 · 1.1 s

Trusted by teams shipping AI to production

Glasswing Northwind Foxtail Lattice/AI Helix Labs Pebble Quarry Mercer Stack

Three primitives

Trace it. Stop it. Replay it.

Three primitives wired into your SDK. No proxies, no sidecars, no schema rewrites. Drop in one import and opinionated defaults handle the rest.

Trace

Every prompt, tool call, retrieval, and token — laid out as a structured tree. Search by intent, not by log line.

▾ support_agent.run  2.4s
├─ classify_intent  320ms
├─ search_kb  680ms
│  ├─ embed.query  42ms
│  └─ vector.search (k=8)  120ms
├─ draft_response  1.1s
│  └─ llm.complete (gpt-4o)  980ms
└─ execute_refund · halted  8ms
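To make the tree above concrete, here is a minimal sketch of how a trace could be modeled as nested spans and rendered as text. `Span` and `render` are hypothetical names chosen for illustration, not part of the Halt SDK, and the durations are copied from the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """One node in a trace tree (hypothetical structure, not Halt's schema)."""
    name: str
    duration_ms: int
    children: list[Span] = field(default_factory=list)


def render(span: Span, prefix: str = "", is_last: bool = True, is_root: bool = True) -> list[str]:
    """Return the tree as text lines, one per span, with box-drawing branches."""
    if is_root:
        lines = [f"{span.name} {span.duration_ms}ms"]
        child_prefix = ""
    else:
        branch = "└─ " if is_last else "├─ "
        lines = [f"{prefix}{branch}{span.name} {span.duration_ms}ms"]
        # Keep a vertical rule going only while more siblings follow.
        child_prefix = prefix + ("   " if is_last else "│  ")
    for i, child in enumerate(span.children):
        last = i == len(span.children) - 1
        lines.extend(render(child, child_prefix, last, is_root=False))
    return lines


trace = Span("support_agent.run", 2400, [
    Span("classify_intent", 320),
    Span("search_kb", 680, [
        Span("embed.query", 42),
        Span("vector.search", 120),
    ]),
    Span("draft_response", 1100, [Span("llm.complete", 980)]),
    Span("execute_refund", 8),
])

print("\n".join(render(trace)))
```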

Stop

Define guardrails in code or in the UI. When an execution crosses a line, Halt severs the run mid-flight in under 80 ms.

depth > 3

cost / run < $0.25

tools allowed

depth_exceeded — draft_response → recursive call (4 / 3)
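The rules above can be read as a check that runs before each step. The sketch below is one way to express that idea in plain Python. It is not the Halt SDK's guardrail API: `Guardrails`, `check`, the allowed tool set, and the rule names other than `depth_exceeded` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Guardrails:
    """Limits mirroring the example rules (field names are hypothetical)."""
    max_depth: int = 3
    max_cost_usd: float = 0.25
    allowed_tools: frozenset = frozenset({"tools.search_kb", "stripe.charges.retrieve"})


def check(rules: Guardrails, depth: int, cost_usd: float, tool: str) -> Optional[str]:
    """Return the name of the first violated rule, or None if the step may proceed."""
    if depth > rules.max_depth:
        return "depth_exceeded"
    if cost_usd >= rules.max_cost_usd:
        return "cost_exceeded"  # hypothetical rule name
    if tool not in rules.allowed_tools:
        return "tool_not_allowed"  # hypothetical rule name
    return None


rules = Guardrails()
# The draft_response step from the example: recursion depth 4 against a limit of 3.
print(check(rules, depth=4, cost_usd=0.18, tool="tools.search_kb"))
# A shallow, cheap call to an allowed tool passes every rule.
print(check(rules, depth=2, cost_usd=0.04, tool="tools.search_kb"))
```

Running the check before the tool call, rather than after, is what lets a violation stop the side effect instead of merely reporting it.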

Replay

Re-run any past trace against new prompts, models, or tools. Diff the outputs side-by-side. Ship the change with proof.

tr_8a2d · gpt-4o · step 23 / 47 · 0:48 / 1:42
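As an illustration of the diff step, the sketch below compares an original output with a replayed one using Python's standard `difflib`. How Halt computes its diffs is not described here, so treat this as a stand-in. The helper name `diff_outputs` and both sample outputs are made up.

```python
import difflib


def diff_outputs(old: str, new: str, old_label: str, new_label: str) -> str:
    """Return a unified diff of two model outputs, compared line by line."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
    )
    return "\n".join(lines)


# Made-up outputs for illustration; a real replay would load them from stored traces.
original = "Refund approved.\nAmount: $120"
replayed = "Refund approved.\nAmount: $12"

print(diff_outputs(original, replayed, "tr_8a2d (original)", "tr_8a2d (replayed)"))
```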

In the product

Every surface tuned for one job: keep the agent honest.

Filters

Slice 10M traces in a query.

A structured filter language that maps 1:1 to SQL. Save common slices as views — your team sees the same picture you do.

model = "gpt-4o"

cost.usd > 0.20

tool.name in ["stripe.*", "db.write"]

guardrail.fired = true

+ add filter
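To show what a 1:1 mapping to SQL could look like, here is a small sketch that compiles the four filters above into a parameterized WHERE clause. The column names, the `to_sql` helper, and the choice to turn `*` into a SQL `LIKE` wildcard are all assumptions. Halt's actual filter grammar and storage schema are not documented here.

```python
# Hypothetical mapping from filter fields to column names.
COLUMNS = {
    "model": "model",
    "cost.usd": "cost_usd",
    "tool.name": "tool_name",
    "guardrail.fired": "guardrail_fired",
}


def to_sql(filters: list[tuple[str, str, object]]) -> tuple[str, list]:
    """Compile (field, op, value) filters into a WHERE clause plus bound parameters.

    Field names are checked against an allow-list because they are placed in
    the SQL text; values are always passed as parameters.
    """
    clauses, params = [], []
    for field, op, value in filters:
        if field not in COLUMNS:
            raise ValueError(f"unknown field: {field}")
        column = COLUMNS[field]
        if op in ("=", ">", "<"):
            clauses.append(f"{column} {op} ?")
            params.append(value)
        elif op == "in":
            parts = []
            for item in value:
                if "*" in item:
                    # Assumed glob semantics: "*" matches any run of characters.
                    parts.append(f"{column} LIKE ?")
                    params.append(item.replace("*", "%"))
                else:
                    parts.append(f"{column} = ?")
                    params.append(item)
            clauses.append("(" + " OR ".join(parts) + ")")
        else:
            raise ValueError(f"unsupported operator: {op}")
    return " AND ".join(clauses), params


where, params = to_sql([
    ("model", "=", "gpt-4o"),
    ("cost.usd", ">", 0.20),
    ("tool.name", "in", ["stripe.*", "db.write"]),
    ("guardrail.fired", "=", True),
])
print(where)
print(params)
```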

Spend

Token economics, hourly.

Per-agent, per-customer, per-feature. Set hard caps and Halt enforces them at the SDK boundary.

00:00 · 12:00 · now

$184.20 today

$0.027 per run avg

↓ 38% vs last week
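A hard cap boils down to refusing any charge that would push a key over its limit. The sketch below shows that logic with an in-memory tracker. `SpendCap` and `try_charge` are hypothetical names, and a production version would likely use decimal or integer amounts and shared storage rather than floats in a dict.

```python
class SpendCap:
    """Tracks spend per key (agent, customer, feature) against one hard cap."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent: dict[str, float] = {}

    def try_charge(self, key: str, cost_usd: float) -> bool:
        """Record the cost and return True, or return False if it would exceed the cap."""
        current = self.spent.get(key, 0.0)
        if current + cost_usd > self.cap_usd:
            return False
        self.spent[key] = current + cost_usd
        return True


cap = SpendCap(cap_usd=0.25)
print(cap.try_charge("support.v3", 0.18))  # fits under the cap
print(cap.try_charge("support.v3", 0.10))  # would bring the total to 0.28, so it is refused
```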

Evaluation

Score every run.

Built-in evaluators for faithfulness, toxicity, and tool-correctness — or BYO function.

Faithfulness

Tool correctness

Latency budget

PII leak

Cost budget

Hallucination
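A bring-your-own evaluator can be as small as a function from a run's output to a score. The sketch below is a deliberately crude PII check that only looks for email addresses. The function name, the 0 to 100 scale, and the signature are assumptions, since the evaluator interface is not specified here.

```python
import re

# Rough email pattern; good enough for a sketch, not a complete PII detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def pii_leak_score(output: str) -> int:
    """Score 100 when no email address appears in the output, 0 otherwise."""
    return 0 if EMAIL.search(output) else 100


print(pii_leak_score("Your refund is on its way."))
print(pii_leak_score("Contact jane@example.com for details."))
```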

Webhooks

Wire it into your incident loop.

Slack, PagerDuty, Linear, or a plain HTTP endpoint. Stream new guardrail events to the same place your humans live.

POST https://hooks.slack.com/services/T0XX...

event: guardrail.fired

rule: "depth_exceeded"

trace: "tr_8a2d"

agent: "support.v3"

action: "halted"

→ 204 No Content · 38ms
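On the receiving side of a plain HTTP endpoint, handling such an event might look like the sketch below. It assumes the payload arrives as JSON with the field names shown above (`event`, `rule`, `trace`, `agent`, `action`). The `handle_event` helper and the message format are hypothetical.

```python
import json


def handle_event(body: bytes) -> str:
    """Turn a guardrail webhook payload into a one-line alert, or "" if irrelevant."""
    event = json.loads(body)
    if event.get("event") != "guardrail.fired":
        return ""
    return f"[{event['agent']}] {event['rule']} on {event['trace']}: {event['action']}"


# Sample payload built from the fields in the example above.
sample = json.dumps({
    "event": "guardrail.fired",
    "rule": "depth_exceeded",
    "trace": "tr_8a2d",
    "agent": "support.v3",
    "action": "halted",
}).encode()

print(handle_event(sample))
```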

Drop-in

Three lines, then you're traced.

Halt wraps your model client. Anything that goes through it gets a trace, a guardrail check, and a kill switch — automatically.

# pip install halt-sdk
from halt import Halt
from openai import OpenAI

client = Halt.wrap(OpenAI(), project="support-prod")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "refund order 8214"}],
    tools=tools,  # your existing list of tool definitions
)
# every call now traced. guardrails enforced. cost tracked.
        

// npm i @halt/sdk
import { Halt } from "@halt/sdk";
import OpenAI from "openai";

const client = Halt.wrap(new OpenAI(), { project: "support-prod" });

const res = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "refund order 8214" }],
  tools, // your existing array of tool definitions
});
        

# send a trace from any language
curl https://ingest.halt.dev/v1/traces \
  -H "Authorization: Bearer $HALT_KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent":"support.v3","model":"gpt-4o","input":"refund 8214"}'
        

halt init → project ready · ingest endpoint live · sample trace received in 1.2s

Why Halt

APMs were built for stateless services. Agents aren't stateless.

Traditional observability tells you something broke after the fact. By then your agent has refunded the wrong customer, looped through $40 of tokens, or written to prod.

Without Halt today

You see failures in retrospect. Datadog ships the trace 30 seconds late. The bad action already happened.

"Stop" means redeploy. The only kill switch is shipping a new version and waiting for it to roll out.

Cost is a monthly surprise. You learn about the $14k token bill from your CFO, not your dashboard.

Reproducing a bug is forensic work. Stitch together logs, prompts, retrieval payloads, and tool outputs by hand.

With Halt now

✓ Intervene before the side effect. Guardrails fire pre-tool-call. The bad write never reaches your database.

✓ One click to halt, global or scoped. Kill one trace, one customer, one rule, or the whole agent. No deploy.

✓ Hard caps that bite. Per-agent and per-tenant spend limits enforced at the SDK boundary in milliseconds.

✓ Replay any run, any time. Re-run an old trace against a new model. Diff the outputs. Decide with data.

Pricing

Pay for what you trace. Stop runs for free.

No per-event billing. No retention games. Halt's guardrails and replay are included on every tier — because killing a bad run shouldn't cost extra.

Free

For side-projects and proofs of concept.

$0 / forever

Start free

500 traces / month
7-day retention
Unlimited guardrails & stops
1 project · 1 seat
Community Slack

Team

For startups in production.

$49 / seat / month

Start 14-day trial

1M traces / month included
90-day retention
Replay & diff against any model
Webhook + Slack + PagerDuty
Per-agent & per-tenant spend caps
Role-based access · audit log

Enterprise

For regulated and high-volume teams.

Custom

Talk to sales

Unlimited traces & retention
SSO (SAML / OIDC) · SCIM
VPC peering or self-hosted
SOC 2 Type II · HIPAA · DPA
Dedicated success engineer
99.99% uptime SLA

Stop your next failure
before it ships.

Drop in the SDK and watch traces stream in. If the first guardrail saves you a postmortem, the rest of the year is free.

No credit card · 500 traces/mo free · 14-day Team trial