Full-chain tracing, real-time evaluation, AI-powered debugging, and human-in-the-loop annotation — all built into the Agent OS at zero extra cost.
OBSERVABILITY TOOLKIT
From production tracing to offline regression testing, every tool you need to understand, evaluate, and debug your agents — built in, not bolted on.
Every decision, visible
Visualize every decision, tool call, and model response for every step of every agent mission. Drill into token usage, latency, and cost at each node.
LLM-as-Judge, real-time
Real-time scoring of production traffic across multiple quality dimensions: hallucination, reasoning quality, safety, and more. Catch regressions the moment they appear.
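As a rough sketch of how real-time judge scoring could be wired up, the snippet below scores one trace across several dimensions and flags it when any score falls below a threshold. The dimension names, threshold, and injected `judge` callable are illustrative assumptions; in production the judge would wrap an actual LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Illustrative scoring dimensions (names are assumptions, not the product's schema).
DIMENSIONS = ("hallucination", "reasoning_quality", "safety")

@dataclass
class JudgeResult:
    scores: Dict[str, float]  # dimension -> score in [0, 1]
    flagged: bool             # True if any dimension fell below the threshold

def score_trace(trace_text: str,
                judge: Callable[[str, str], float],
                threshold: float = 0.7) -> JudgeResult:
    """Score one production trace across all dimensions and flag regressions.

    `judge(trace_text, dimension)` would call an LLM judge and return a
    score in [0, 1]; it is injected here so the logic is testable offline.
    """
    scores = {dim: judge(trace_text, dim) for dim in DIMENSIONS}
    return JudgeResult(scores=scores,
                       flagged=any(s < threshold for s in scores.values()))

# Stub judge standing in for a real model call.
stub = lambda text, dim: 0.9 if dim != "safety" else 0.5
result = score_trace("example agent trace", stub)
```

Injecting the judge as a callable keeps the flagging logic deterministic and unit-testable, independent of any model provider.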
Ship with confidence
Run offline evaluation and regression test suites before every deploy. Compare versions side-by-side to ensure updates never degrade quality.
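A side-by-side regression gate can be sketched in a few lines, assuming each version's offline suite produces a per-case score map. The case names and tolerance parameter below are hypothetical:

```python
def regression_check(baseline, candidate, tolerance=0.0):
    """Compare per-case eval scores of a candidate version against the
    deployed baseline; return the cases where quality regressed."""
    return [case for case in baseline
            if candidate.get(case, 0.0) < baseline[case] - tolerance]

# Illustrative scores: case_2 regresses, so the deploy should be blocked.
regressed = regression_check(
    baseline={"case_1": 0.90, "case_2": 0.80},
    candidate={"case_1": 0.92, "case_2": 0.70},
)
```

A deploy pipeline would fail the build whenever `regressed` is non-empty, which is what "updates never degrade quality" amounts to operationally.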
AI-powered failure analysis
An AI agent that automatically analyzes your traces, clusters failure patterns, and surfaces emerging issues before they become systemic problems.
Ask your traces a question
Query your trace data with natural language. Ask "Why did this agent retry three times?" and get a structured answer with the exact spans that caused the issue.
Human-in-the-loop review
Low-scoring traces are automatically routed to human reviewers. Build a continuous quality feedback loop that makes your agents smarter over time.
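The routing step can be sketched as a priority queue keyed on judge score, so the worst traces reach human reviewers first. The field names and threshold here are illustrative:

```python
import heapq

def route_for_review(traces, threshold=0.7):
    """Route traces whose judge score fell below the threshold into a
    priority review queue, lowest score first."""
    queue = []
    for trace in traces:
        if trace["score"] < threshold:
            heapq.heappush(queue, (trace["score"], trace["id"]))
    return [heapq.heappop(queue) for _ in range(len(queue))]

reviews = route_for_review([
    {"id": "t1", "score": 0.9},   # passes, never queued
    {"id": "t2", "score": 0.3},   # worst trace, reviewed first
    {"id": "t3", "score": 0.6},
])
```

Reviewer labels and corrections would then feed back into the evaluation dataset, closing the continuous quality loop described above.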
Debug and iterate online
Online prompt debugging and version comparison. Test prompt variations side-by-side, compare outputs, and iterate without redeploying your agents.
Data-driven optimization
Run A/B experiments comparing different prompts, models, and strategies. Statistical significance testing ensures you ship the variation that actually performs better.
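Under the hood, comparing two variants' success rates reduces to a standard two-proportion z-test. A self-contained sketch with made-up counts:

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test: is variant B's success rate
    significantly different from variant A's?"""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Illustrative counts: variant B wins 870/1000 vs A's 820/1000.
z, p = two_proportion_z_test(success_a=820, n_a=1000, success_b=870, n_b=1000)
ship_b = p < 0.05  # significant at the 5% level, so ship B
```

With these counts the difference is significant (p ≈ 0.002); with smaller samples the same 5-point gap could easily be noise, which is the point of gating ships on significance.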
Monitor what matters
Pre-built monitoring dashboards for cost, latency, and error rates with configurable alert thresholds. Know the health of every agent at a glance.
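A configurable threshold check of this kind can be sketched in a few lines; the metric and threshold names below are placeholders, not the product's actual configuration schema:

```python
def check_thresholds(metrics, thresholds):
    """Return the names of metrics that crossed their configured alert threshold."""
    return [name for name, value in metrics.items()
            if name in thresholds and value > thresholds[name]]

alerts = check_thresholds(
    {"p95_latency_ms": 2400, "error_rate": 0.01, "cost_usd_per_hour": 3.2},
    {"p95_latency_ms": 2000, "error_rate": 0.05},
)
# Only p95 latency crossed its threshold; cost has no threshold configured.
```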
DEEP DIVE 01
An AI agent that watches your AI agents. It continuously analyzes trace data, clusters failure patterns, and proactively surfaces emerging issues — so you fix problems before users notice them.
DEEP DIVE 02
Stop reading JSON traces manually. Ask questions in plain English — "Why did this mission fail?" — and get a structured root-cause analysis with the exact spans, tool calls, and model decisions that led to the issue.
DEEP DIVE 03
The bridge between automated evaluation and human judgment. Low-scoring traces are automatically routed to a review queue where domain experts label, correct, and feed data back into your evaluation pipeline — creating a continuous improvement loop.
DATA FLOW
Every agent mission flows through a six-stage observability pipeline — from trace collection to continuous optimization.
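The copy names only the first and last stages, so the intermediate stage names below are hypothetical; the sketch just illustrates the shape of a linear six-stage trace-processing pipeline:

```python
# Hypothetical stage names: only the endpoints ("trace collection" and
# "continuous optimization") are named in the source material.
STAGES = [
    "trace_collection", "real_time_evaluation", "offline_regression",
    "failure_analysis", "human_review", "continuous_optimization",
]

def run_pipeline(mission_trace, handlers):
    """Pass one agent mission's trace through each stage handler in order."""
    artifact = mission_trace
    for stage in STAGES:
        artifact = handlers[stage](artifact)
    return artifact

# Demo handlers that just record which stages ran, in order.
handlers = {stage: (lambda a, s=stage: a + [s]) for stage in STAGES}
visited = run_pipeline([], handlers)
```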
INTEGRATIONS
Export traces and metrics to your existing observability infrastructure via OpenTelemetry, or use the built-in dashboards and alerts to monitor everything in one place.
Native OTel support — send traces and metrics to any OTel-compatible backend
Anomaly detection with PagerDuty, Webhook, and Slack integrations
Pre-built cost, latency, and error rate views with configurable thresholds
Works alongside Datadog, New Relic, Grafana, and other APM tools