halt.
v1.4 · Now in public beta · Streaming traces shipped

Stop your AI before
it does something stupid.

Halt is the observability and intervention layer for production AI agents. See every step, kill bad runs in flight, and replay any execution down to the token.

Free tier · no card required
SDKs: Python · TS · Go · Rust
Self-host or cloud
halt · support-agent-prod · us-east-1
status: any · model: gpt-4o · cost > $0.20 · latency > 8s
id       trace                                       cost    latency
tr_8a2f  refund_lookup → stripe.charges.retrieve     $0.012  412 ms
tr_8a2e  summarize_ticket → tools.search_kb          $0.041  1.2 s
tr_8a2d  draft_response → recursive call (depth 4)   $0.18   9.4 s
tr_8a2c  classify_intent → low-confidence (0.42)     $0.008  320 ms
tr_8a2b  execute_refund → guardrail.amount_exceeded  $0.024  182 ms
tr_8a2a  refund_lookup → stripe.customers.list       $0.011  388 ms
tr_8a29  summarize_ticket → tools.search_kb          $0.038  1.1 s
Trusted by teams shipping AI to production
Glasswing Northwind Foxtail Lattice/AI Helix Labs Pebble Quarry Mercer Stack
Three primitives

Trace it. Stop it. Replay it.

Three primitives wired into your SDK. No proxies, no sidecars, no schema rewrites. Drop in one import; opinionated defaults handle the rest.

Trace

Every prompt, tool call, retrieval, and token — laid out as a structured tree. Search by intent, not by log line.

support_agent.run           2.4s
├─ classify_intent          320ms
├─ search_kb                680ms
│ ├─ embed.query            42ms
│ └─ vector.search (k=8)    120ms
├─ draft_response           1.1s
│ └─ llm.complete (gpt-4o)  980ms
└─ execute_refund (halted)  8ms
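The tree above is easiest to picture as nested spans, each with a name, a duration, and ordered children. A minimal sketch of that data structure plus a renderer, assuming a hypothetical `Span` dataclass and `render` helper; this illustrates the idea and is not Halt's trace schema.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One step of an agent run (hypothetical shape, not Halt's schema)."""
    name: str
    duration_ms: float
    children: list["Span"] = field(default_factory=list)


def render(root: Span) -> list[str]:
    """Render a span tree with box-drawing connectors, one line per span."""
    lines = [f"{root.name}  {root.duration_ms:g}ms"]

    def walk(children: list[Span], prefix: str) -> None:
        for i, child in enumerate(children):
            is_last = i == len(children) - 1
            # The last child closes its branch; earlier siblings keep the rule open.
            connector = "└─" if is_last else "├─"
            lines.append(f"{prefix}{connector} {child.name}  {child.duration_ms:g}ms")
            walk(child.children, prefix + ("   " if is_last else "│  "))

    walk(root.children, "")
    return lines


# Sample durations taken from the tree above.
run = Span("support_agent.run", 2400, [
    Span("classify_intent", 320),
    Span("search_kb", 680, [Span("embed.query", 42), Span("vector.search", 120)]),
    Span("draft_response", 1100, [Span("llm.complete", 980)]),
    Span("execute_refund", 8),
])
print("\n".join(render(run)))
```

Keeping children as an ordered list preserves execution order, which is what lets the tree double as a timeline.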

Stop

Define guardrails in code or in the UI. When an execution crosses a line, Halt severs the run mid-flight in under 80 ms.

depth > 3
cost / run < $0.25
tools allowed
depth_exceeded · draft_response → recursive call (depth 4 / limit 3)
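The panel's three rules (recursion depth, cost per run, allowed tools) map naturally onto checks that run before each tool call. A minimal sketch, assuming hypothetical `RunState`, `Halted`, and `check_before_tool_call` names and a made-up tool allowlist; Halt's real rule engine and SDK surface may look different.

```python
from dataclasses import dataclass


class Halted(Exception):
    """Raised to sever a run when a guardrail fires (hypothetical)."""

    def __init__(self, rule: str, detail: str) -> None:
        super().__init__(f"{rule}: {detail}")
        self.rule = rule


@dataclass
class RunState:
    depth: int       # current recursion depth of the agent
    cost_usd: float  # spend accumulated so far in this run


# The three rules from the panel above, as hypothetical constants.
MAX_DEPTH = 3
MAX_COST_PER_RUN_USD = 0.25
ALLOWED_TOOLS = {"refund_lookup", "search_kb", "draft_response"}


def check_before_tool_call(state: RunState, tool: str) -> None:
    """Evaluate each rule before the tool runs; raise on the first violation."""
    if state.depth > MAX_DEPTH:
        raise Halted("depth_exceeded", f"{tool} (depth {state.depth} / limit {MAX_DEPTH})")
    if state.cost_usd >= MAX_COST_PER_RUN_USD:
        raise Halted("cost_exceeded", f"${state.cost_usd:.2f} of ${MAX_COST_PER_RUN_USD:.2f}")
    if tool not in ALLOWED_TOOLS:
        raise Halted("tool_not_allowed", tool)


try:
    check_before_tool_call(RunState(depth=4, cost_usd=0.18), "draft_response")
except Halted as err:
    print(err)  # depth_exceeded: draft_response (depth 4 / limit 3)
```

Raising before the tool executes is what keeps a side effect such as a refund from happening at all; a fuller version would also record the fired rule on the trace.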

Replay

Re-run any past trace against new prompts, models, or tools. Diff the outputs side-by-side. Ship the change with proof.

tr_8a2d · gpt-4o · step 23 / 47 · 0:48 / 1:42
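Replay is possible because a trace keeps each step's inputs alongside the output seen at the time. A minimal sketch of the idea, assuming a recorded trace is a list of (name, prompt, output) steps and a model is any function from prompt to text; `Step`, `replay`, and `candidate_model` are hypothetical, and the side-by-side diff is approximated with the standard library's `difflib.unified_diff`.

```python
import difflib
from typing import Callable, NamedTuple


class Step(NamedTuple):
    """One recorded step of a trace (hypothetical shape)."""
    name: str    # span name, e.g. "draft_response"
    prompt: str  # what the model saw when the trace was recorded
    output: str  # what the model returned at the time


def replay(trace: list[Step], model: Callable[[str], str]) -> list[str]:
    """Re-run each recorded prompt through `model` and diff against the recorded outputs."""
    before = [f"{s.name}: {s.output}" for s in trace]
    after = [f"{s.name}: {model(s.prompt)}" for s in trace]
    return list(difflib.unified_diff(before, after, "recorded", "replayed", lineterm=""))


trace = [
    Step("classify_intent", "classify: refund order 8214", "refund"),
    Step("draft_response", "draft reply: refund order 8214", "Refund issued for order 8214."),
]


def candidate_model(prompt: str) -> str:
    """Stand-in for a new model that agrees on intent but words the reply differently."""
    if prompt.startswith("classify:"):
        return "refund"
    return "Refund queued for review: order 8214."


print("\n".join(replay(trace, candidate_model)))
```

A real replay would also need to stub tool results from the recording, so re-running an old trace never repeats a side effect.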
In the product

Every surface tuned for one job: keep the agent honest.

Filters

Slice 10M traces in a query.

A structured filter language that maps 1:1 to SQL. Save common slices as views — your team sees the same picture you do.

model = "gpt-4o"
cost.usd > 0.20
tool.name in ["stripe.*", "db.write"]
guardrail.fired = true
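To make "maps 1:1 to SQL" concrete, here is one way the four filters above could compile to a parameterized `WHERE` clause. A sketch under assumptions: the `compile_filters` helper, the field-to-column mapping, and the table layout are hypothetical, a trailing `*` inside an `in` list is treated as a `LIKE` prefix match, and the sample rows are invented. It runs against an in-memory SQLite database.

```python
import sqlite3

# Hypothetical mapping from filter fields to SQL columns. Whitelisting
# columns keeps user input out of the SQL text.
COLUMNS = {
    "model": "model",
    "cost.usd": "cost_usd",
    "tool.name": "tool_name",
    "guardrail.fired": "guardrail_fired",
}
COMPARISONS = {"=", "!=", ">", ">=", "<", "<="}


def compile_filters(filters: list[tuple[str, str, object]]) -> tuple[str, list[object]]:
    """Turn (field, op, value) filters into a parameterized WHERE clause, ANDed together."""
    clauses: list[str] = []
    params: list[object] = []
    for field, op, value in filters:
        col = COLUMNS[field]  # KeyError for unknown fields
        if op == "in":
            parts = []
            for v in value:
                if v.endswith("*"):
                    # Trailing glob becomes a LIKE prefix match.
                    parts.append(f"{col} LIKE ?")
                    params.append(v[:-1] + "%")
                else:
                    parts.append(f"{col} = ?")
                    params.append(v)
            clauses.append("(" + " OR ".join(parts) + ")")
        elif op in COMPARISONS:
            clauses.append(f"{col} {op} ?")
            params.append(value)
        else:
            raise ValueError(f"unsupported operator: {op}")
    return " AND ".join(clauses), params


filters = [
    ("model", "=", "gpt-4o"),
    ("cost.usd", ">", 0.20),
    ("tool.name", "in", ["stripe.*", "db.write"]),
    ("guardrail.fired", "=", True),
]
where, params = compile_filters(filters)

# Invented rows, only to show the clause executes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traces (id TEXT, model TEXT, cost_usd REAL, tool_name TEXT, guardrail_fired INTEGER)")
conn.executemany("INSERT INTO traces VALUES (?, ?, ?, ?, ?)", [
    ("tr_1", "gpt-4o", 0.31, "stripe.refunds.create", 1),
    ("tr_2", "gpt-4o", 0.05, "stripe.charges.retrieve", 1),
    ("tr_3", "gpt-4o", 0.42, "tools.search_kb", 1),
    ("tr_4", "gpt-4o", 0.27, "db.write", 0),
])
rows = conn.execute(f"SELECT id FROM traces WHERE {where}", params).fetchall()
print(where)
print(rows)  # [('tr_1',)]
```

Values travel as bound parameters and column names come from a fixed mapping, so a filter string never becomes SQL text. A production version would also escape `%` and `_` inside `LIKE` patterns.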
Spend

Token economics, hourly.

Per-agent, per-customer, per-feature. Set hard caps and Halt enforces them at the SDK boundary.

Chart: hourly spend, 00:00 to now
$184.20 today
$0.027 per run (avg)
↓ 38% vs last week
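A hard cap enforced at the SDK boundary means the check runs in-process, before a request is sent. A minimal sketch, assuming a hypothetical `SpendLedger` keyed by (agent, tenant) and a per-call cost estimate supplied by the caller; Halt's actual enforcement mechanism isn't described on this page. `Decimal` avoids float drift when summing small dollar amounts.

```python
from collections import defaultdict
from decimal import Decimal

Key = tuple[str, str]  # (agent, tenant)


class BudgetExceeded(Exception):
    """Raised before a call that would push a key past its cap (hypothetical)."""


class SpendLedger:
    """In-process spend tracker checked before each model call (hypothetical)."""

    def __init__(self, caps_usd: dict[Key, Decimal]) -> None:
        self.caps = caps_usd
        self.spent: defaultdict[Key, Decimal] = defaultdict(Decimal)

    def charge(self, agent: str, tenant: str, estimate_usd: Decimal) -> None:
        """Reserve the estimated cost, or refuse the call if it would exceed the cap."""
        key = (agent, tenant)
        cap = self.caps.get(key)
        if cap is not None and self.spent[key] + estimate_usd > cap:
            raise BudgetExceeded(f"{agent}/{tenant}: ${self.spent[key]} spent, cap ${cap}")
        self.spent[key] += estimate_usd


ledger = SpendLedger({("support.v3", "acme"): Decimal("0.10")})
for attempt in range(1, 5):
    try:
        ledger.charge("support.v3", "acme", Decimal("0.027"))
        print(f"call {attempt}: allowed")
    except BudgetExceeded as err:
        print(f"call {attempt}: blocked ({err})")
```

Reserving an estimate before the call and reconciling with the billed token count afterward is one common pattern; this sketch shows only the reservation step.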
Evaluation

Score every run.

Built-in evaluators for faithfulness, toxicity, and tool correctness, or bring your own scoring function.

Faithfulness        94
Tool correctness    88
Latency budget      62
PII leak            100
Cost budget         34
Hallucination       91
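A bring-your-own evaluator can be as small as a function from output text to a score. A sketch, assuming a hypothetical contract where evaluators return an integer from 0 to 100 (higher is better, matching the PII leak score above); `pii_leak` and `score_run` are illustrative names, and the regexes catch only obvious email addresses and US SSN-formatted numbers.

```python
import re
from typing import Callable

Evaluator = Callable[[str], int]  # run output -> score from 0 to 100 (hypothetical contract)

# Deliberately simple patterns for illustration; not a real PII detector.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def pii_leak(output: str) -> int:
    """100 means no pattern matched; each distinct kind of match costs 50 points."""
    hits = sum(1 for pattern in PII_PATTERNS.values() if pattern.search(output))
    return max(0, 100 - 50 * hits)


def score_run(output: str, evaluators: dict[str, Evaluator]) -> dict[str, int]:
    """Apply every evaluator to one run's output."""
    return {name: evaluate(output) for name, evaluate in evaluators.items()}


print(score_run("Refund issued for order 8214.", {"PII leak": pii_leak}))
print(score_run("Emailed jane@example.com, SSN 123-45-6789.", {"PII leak": pii_leak}))
```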
Webhooks

Wire it into your incident loop.

Slack, PagerDuty, Linear, or a plain HTTP endpoint. Stream new guardrail events to the same place your humans live.

POST https://hooks.slack.com/services/T0XX...
event:  "guardrail.fired"
rule:   "depth_exceeded"
trace:  "tr_8a2d"
agent:  "support.v3"
action: "halted"
200 OK · 38ms
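Slack's incoming webhooks expect a JSON body with a `text` field, so if you relay guardrail events to Slack yourself through a plain HTTP endpoint, the structured event needs a small formatting step. A sketch, assuming the event fields shown in the sample; `to_slack_message` is a hypothetical helper, not part of Halt's SDK, and sending the POST is left to your HTTP client.

```python
import json


def to_slack_message(event: dict[str, str]) -> str:
    """Format a guardrail event as the JSON body for a Slack incoming webhook (hypothetical helper)."""
    text = (
        f"Guardrail `{event['rule']}` fired on {event['agent']} "
        f"(trace {event['trace']}); action: {event['action']}"
    )
    # Slack incoming webhooks accept a JSON object with a "text" field.
    return json.dumps({"text": text})


event = {
    "event": "guardrail.fired",
    "rule": "depth_exceeded",
    "trace": "tr_8a2d",
    "agent": "support.v3",
    "action": "halted",
}
print(to_slack_message(event))
```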
Drop-in

Three lines, then you're traced.

Halt wraps your model client. Anything that goes through it gets a trace, a guardrail check, and a kill switch — automatically.

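# pip install halt-sdk
from halt import Halt
from openai import OpenAI

client = Halt.wrap(OpenAI(), project="support-prod")

# Your existing tool definitions (OpenAI function-calling format)
tools = [{
    "type": "function",
    "function": {
        "name": "refund_lookup",
        "parameters": {"type": "object", "properties": {"order_id": {"type": "string"}}},
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "refund order 8214"}],
    tools=tools,
)
# every call now traced. guardrails enforced. cost tracked.
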
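// npm i @halt/sdk
import { Halt } from "@halt/sdk";
import OpenAI from "openai";

const client = Halt.wrap(new OpenAI(), { project: "support-prod" });

// Your existing tool definitions (OpenAI function-calling format)
const tools = [{
  type: "function" as const,
  function: {
    name: "refund_lookup",
    parameters: { type: "object", properties: { order_id: { type: "string" } } },
  },
}];

const res = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "refund order 8214" }],
  tools,
});
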
# send a trace from any language
curl https://ingest.halt.dev/v1/traces \
  -H "Authorization: Bearer $HALT_KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent":"support.v3","model":"gpt-4o","input":"refund 8214"}'
halt init → project ready · ingest endpoint live · sample trace received in 1.2s
Why Halt

APMs were built for stateless services. Agents aren't stateless.

Traditional observability tells you something broke after the fact. By then your agent has refunded the wrong customer, looped through $40 of tokens, or written to prod.

Without Halt today
× You see failures in retrospect. Datadog ships the trace 30 seconds late. The bad action already happened.
× "Stop" means redeploy. The only kill switch is shipping a new version and waiting for it to roll out.
× Cost is a monthly surprise. You learn about the $14k token bill from your CFO, not your dashboard.
× Reproducing a bug is forensic work. You stitch together logs, prompts, retrieval payloads, and tool outputs by hand.
With Halt now
Intervene before the side effect. Guardrails fire before each tool call, so the bad write never reaches your database.
One click to halt, global or scoped. Kill one trace, one customer, one rule, or the whole agent. No deploy.
Hard caps that bite. Per-agent and per-tenant spend limits, enforced at the SDK boundary in milliseconds.
Replay any run, any time. Re-run an old trace against a new model. Diff the outputs. Decide with data.
Pricing

Pay for what you trace. Stop runs for free.

No per-event billing. No retention games. Halt's guardrails and replay are included on every tier — because killing a bad run shouldn't cost extra.

Free
For side-projects and proofs of concept.
$0 / forever
Start free
  • 500 traces / month
  • 7-day retention
  • Unlimited guardrails & stops
  • 1 project · 1 seat
  • Community Slack
Team
For startups in production.
$49 / seat / month
Start 14-day trial
  • 1M traces / month included
  • 90-day retention
  • Replay & diff against any model
  • Webhook + Slack + PagerDuty
  • Per-agent & per-tenant spend caps
  • Role-based access · audit log
Enterprise
For regulated and high-volume teams.
Custom
Talk to sales
  • Unlimited traces & retention
  • SSO (SAML / OIDC) · SCIM
  • VPC peering or self-hosted
  • SOC 2 Type II · HIPAA · DPA
  • Dedicated success engineer
  • 99.99% uptime SLA

Stop your next failure
before it ships.

Drop in the SDK and watch traces stream in. If the first guardrail saves you a postmortem, the rest of the year is free.

No credit card · 500 traces/mo free · 14-day Team trial