# How obs-unified compares (/docs/comparison)
Every claim about a third-party product on this page is anchored to a public vendor page (URL + quoted phrase). Vendor pricing and feature scope change frequently. The [obsunified.com comparison table](https://obsunified.com#compare) on the landing page footnote-links to the anchors in this document, so updating this page updates the linked footnotes.
## What this page is [#what-this-page-is]
A long-form companion to the comparison table on [obsunified.com](https://obsunified.com#compare). The landing-page table is intentionally terse — each cell footnote-links into this page for the underlying claim, source URL, and a representative quoted phrase. Use this page to:
* Verify any claim that appears on the landing page.
* Read the per-vendor profile in one place.
* See what was deliberately **not** compared (Scope, below).
## Scope [#scope]
We compared **nine third-party tools** along **ten capability axes**:
* **In scope:** Datadog, Sentry, PostHog, Honeycomb, New Relic, Grafana Cloud (LGTM stack), SigNoz, Uptrace, HyperDX.
* **Capability axes:** hosting model, pricing model, distributed traces / APM, structured logs, AI / LLM observability, session replay, product analytics, alerting, cross-signal correlation, data ownership.
* **Out of scope:** infrastructure-only tools (Prometheus, Zabbix, Nagios), incident-response platforms (PagerDuty, Opsgenie), single-signal frontend tools (LogRocket, FullStory, Highlight), legacy enterprise-only suites (Splunk Observability Cloud, Dynatrace, AppDynamics, Elastic Observability). These are interesting but cover only a slice of what obs-unified covers, and including them would inflate the "—" column count without adding signal.
We deliberately excluded **performance benchmarks** (ingest throughput, query latency, agent overhead) — those depend so heavily on deployment shape that any cross-vendor number would be misleading.
## Methodology [#methodology]
1. For each vendor, the lead capabilities page or pricing page on the vendor's own domain was fetched (docs.datadoghq.com, sentry.io, posthog.com, honeycomb.io, docs.newrelic.com, grafana.com, signoz.io, uptrace.dev, hyperdx.io / clickhouse.com).
2. A factual claim was extracted with a quoted phrase that supports it.
3. Each claim got an anchor id (e.g. `src-dd-pricing`) that the landing-page table footnotes into.
4. Negative claims ("vendor does not have X") are flagged when the evidence is absence-of-marketing-page rather than an explicit denial — see vendor sections for which negatives are inferential.
5. Capability scoping is **non-judgmental**: a vendor scoring "—" on session replay isn't worse than one scoring "✓" — it just doesn't sell that capability.
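One way to picture the record that steps 2 to 4 produce is a small typed structure. The sketch below is illustrative only: the `Claim` class, its field names, and both helper functions are hypothetical and are not taken from the obs-unified codebase. Only the `src-dd-pricing` id and the `/docs/comparison` path come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One sourced claim about a vendor (hypothetical shape)."""
    anchor_id: str             # e.g. "src-dd-pricing", the example id from step 3
    source_url: str            # public vendor page the claim was taken from
    quote: str                 # supporting quoted phrase; empty if none was found
    inferential: bool = False  # True when the evidence is the absence of a page


def footnote_href(claim: Claim, page: str = "/docs/comparison") -> str:
    """Build the link a landing-page footnote would point at."""
    return f"{page}#{claim.anchor_id}"


def lacks_quote_and_flag(claim: Claim) -> bool:
    """Review helper: surface claims with neither a quote nor an inferential flag."""
    return claim.quote == "" and not claim.inferential


dd_pricing = Claim(
    anchor_id="src-dd-pricing",
    source_url="https://www.datadoghq.com/pricing/",
    quote="$15 Per host, per month",
)
print(footnote_href(dd_pricing))
```

Keeping the inferential flag on the record itself means a reviewer can list every negative that rests on absence of evidence without re-reading the vendor sections.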
## TL;DR [#tldr]
> The nine vendors split into three clusters:
>
> * **SaaS-only full-stack** (Datadog, New Relic) — comprehensive coverage, opaque cost model, no data-residency option beyond region choice.
> * **Single-signal-led** (Sentry on errors, PostHog on product analytics, Honeycomb on traces, Uptrace on traces) — strong in one area, partial elsewhere, all now growing into adjacent signals (Sentry has logs + AI agents; PostHog has logs + LLM analytics; Honeycomb has agent observability EA).
> * **OSS-first OTel-native** (Grafana, SigNoz, Uptrace, HyperDX) — self-host friendly, OTLP-native; HyperDX is the only one in this cluster that ships session replay. Product analytics is absent across all four.
>
> obs-unified sits in the **OSS-first** cluster, and is the only self-hostable tool in this comparison that ships **session replay (rrweb), LLM observability, *and* product analytics** alongside traces/logs/metrics. Datadog also covers all three, but only as SaaS. That makes obs-unified the only self-hostable option here that doesn't force you to bolt on PostHog or LogRocket on the side.
## Comparison criteria [#comparison-criteria]
The ten capability axes used in the landing-page table:
1. **Hosting model** — can you run the vendor's backend in your own infrastructure, or only consume it as SaaS?
2. **Pricing model** — what dimension is the bill metered on (host, GB, event, session, user)?
3. **Traces / APM** — distributed tracing, with attention to whether OTLP is a first-class ingest path or a translation layer.
4. **Structured logs** — log ingestion, parsing, search, and explicit trace correlation.
5. **AI / LLM observability** — tracing of prompts/responses/tools/cost/eval for LLM-backed applications, as a vendor-shipped product (not a community plug-in).
6. **Session replay** — DOM-level recording of browser sessions (typically rrweb-based).
7. **Product analytics** — funnels, retention, cohorts, journey analysis over user-behavior events.
8. **Alerting** — alert rule types and primary signals the rules can fire on.
9. **Cross-signal correlation** — vendor's stated mechanism for pivoting between signal types.
10. **Data ownership** — where customer telemetry physically lives, and what residency / self-host options exist.
***
## obs-unified [#obs-unified]
Self-hosted on Cloudflare Workers + D1 + R2 (or Node + Postgres + S3 via the [storage interface](https://github.com/obs-unified/obs-unified/blob/main/rfcs/0008-storage-interface.md)). Ships traces, logs, metrics, session replay (rrweb), AI/LLM observability, product analytics, alerts, and profiles in one stack, with a single telemetry graph that agents can traverse from a user action to the backend trace, logs, replay, AI cost, and CPU profile. The [Connected rail](/docs/what-to-expect#the-connected-rail) is the human-facing version of that graph. Free; you pay your own infra bill.
The rest of this page is the third-party comparison.
***
## Datadog [#datadog]
The SaaS-only full-stack incumbent.
### Hosting model [#hosting-model-]
SaaS-only. Customers pick a regional "site" (US1/US3/US5/EU1/AP1/AP2/Gov); data cannot cross sites and there is no general-purpose self-host or BYOC option for the Datadog backend itself.
* Source: [Datadog Sites](https://docs.datadoghq.com/getting_started/site/)
* > *"Datadog offers different sites throughout the world ... you cannot share data across sites."*
* Note: Datadog Observability Pipelines runs in customer infra, but only as a [VM-deployed pre-processor](https://www.datadoghq.com/architecture/op-vm-deployment/) — not a self-hostable backend.
### Pricing model [#pricing-model-]
Per-host for Infra ($15–$23/host/mo) and APM ($31–$40/host/mo); per-ingested/indexed GB for logs ($0.10/GB ingest, $1.70 per million indexed events); per-session for RUM; per-committer for code coverage; per-investigation for Bits AI SRE ($500 per 20 investigations).
* Source: [Pricing | Datadog](https://www.datadoghq.com/pricing/)
* > *"$15 Per host, per month"* (Infra Pro, annual), *"$0.10 Per ingested or scanned GB, per month"* (Log Ingestion).
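To make the mix of meters concrete, here is a rough monthly estimate built from the list prices quoted above. It is a sketch under stated assumptions: the workload numbers are hypothetical, the low end of each per-host range is used, and RUM, code coverage, Bits AI SRE, discounts, and commitments are all ignored.

```python
from decimal import Decimal

# List prices quoted on this page (low end of each range); they change often.
INFRA_PER_HOST = Decimal("15")
APM_PER_HOST = Decimal("31")
LOG_INGEST_PER_GB = Decimal("0.10")
LOG_INDEX_PER_MILLION = Decimal("1.70")


def datadog_monthly_estimate(hosts: int, log_gb: int, indexed_events: int) -> Decimal:
    """Rough monthly total: Infra + APM on every host, plus log ingest and indexing."""
    host_cost = hosts * (INFRA_PER_HOST + APM_PER_HOST)
    ingest_cost = log_gb * LOG_INGEST_PER_GB
    index_cost = Decimal(indexed_events) / Decimal(1_000_000) * LOG_INDEX_PER_MILLION
    return host_cost + ingest_cost + index_cost


# Hypothetical workload: 10 hosts, 500 GB of logs, 200 million indexed events.
estimate = datadog_monthly_estimate(hosts=10, log_gb=500, indexed_events=200_000_000)
print(f"${estimate:.2f} per month")
```

In this hypothetical workload the indexed-event charge is several times the ingest charge, which is why the split between ingestion and indexing matters for cost control.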
### Distributed traces / APM [#distributed-traces--apm-]
Datadog APM provides distributed tracing. The default/recommended path is the Datadog Agent plus Datadog tracing SDKs. OTLP ingest is supported, but the direct OTLP traces endpoint is **still in Preview** as of May 2026, with access granted on request through a Customer Success Manager.
* Source: [Datadog OTLP Intake Endpoint](https://docs.datadoghq.com/opentelemetry/setup/otlp_ingest/)
* > *"OTLP traces intake endpoint (in Preview): To request access for use, contact your Customer Success Manager."*
### Structured logs [#structured-logs-]
Yes. Ingestion is decoupled from indexing for cost control; trace correlation is explicit.
* Source: [Log Management](https://docs.datadoghq.com/logs/)
* > *"Connect your logs and traces to gain observability into your applications."*
### AI / LLM observability [#ai--llm-observability-]
Yes — a dedicated **LLM Observability** product traces prompts/responses/tools, tracks tokens/latency/cost, and runs evaluations including hallucination + prompt-injection + sensitive-data checks. Metered per LLM span.
* Source: [LLM Observability | Datadog](https://www.datadoghq.com/product/llm-observability/)
* > *"Trace every request across prompts, model responses, retrieval steps, and tool calls"*.
### Session replay [#session-replay-]
Yes — Session Replay ships as part of Real User Monitoring with link-out to backend traces.
* Source: [Session Replay | Datadog](https://www.datadoghq.com/product/real-user-monitoring/session-replay/)
* > *"jump from session replays to backend traces for full-stack visibility."*
### Product analytics [#product-analytics-]
Yes — **Product Analytics** is a distinct product within the Digital Experience suite (funnels, Sankey journeys, cohort/retention) sharing an SDK with RUM. (Up from "not shipped" in 2024.)
* Source: [Product Analytics | Datadog](https://www.datadoghq.com/product/product-analytics/)
* > *"a single SDK for RUM and Product Analytics"*.
### Alerting [#alerting-]
Broad catalog: Metric, Anomaly, Forecast, Outlier, Change, Log, APM, Error Tracking, RUM, Synthetic, SLO, Composite, **Watchdog** (ML-based), Audit Trail, Database, Data Observability, CI, Cloud Cost, Network/NetFlow, Service Check.
* Source: [Monitor Types](https://docs.datadoghq.com/monitors/types/)
### Cross-signal correlation [#cross-signal-correlation-]
Marketed as a core differentiator; tag-based correlation across logs/metrics/traces.
* Source: [Datadog Platform](https://www.datadoghq.com/product/)
* > *"Navigate seamlessly between logs, metrics, and request traces."*
***
## Sentry [#sentry]
Errors-first, growing into the full observability tent.
### Hosting model [#hosting-model--1]
SaaS (sentry.io) plus a Docker-Compose self-hosted distribution explicitly scoped to low-volume / proof-of-concept deployments.
* Source: [getsentry/self-hosted](https://github.com/getsentry/self-hosted)
* > *"Sentry, feature-complete and packaged up for low-volume deployments and proofs-of-concept."*
### Pricing model [#pricing-model--1]
Tier-priced base plan (Team $26/mo, Business $80/mo, Enterprise custom) with included quotas for errors / spans / replays / profiling; usage above quota is pay-as-you-go on per-unit dimensions (per-GB for logs and application metrics, per-hour for profiling, per-monitor for crons, per-alert for uptime).
* Source: [Pricing | Sentry](https://sentry.io/pricing/)
* > *"Logs +$0.50/GB additional"*, *"Continuous Profiling +$0.0315/hr"*.
### Distributed traces / APM [#distributed-traces--apm--1]
Yes — Performance Monitoring with distributed tracing. OTLP traces and logs are supported via per-project DSN-derived endpoints (open beta); **OTLP metrics are not supported**.
* Source: [OpenTelemetry Protocol (OTLP) | Sentry](https://docs.sentry.io/concepts/otlp/)
* > *"Sentry can ingest OpenTelemetry traces and logs via OTLP endpoints. ... Sentry does not support OTLP metrics at this time."*
### Structured logs [#structured-logs--1]
Yes — a structured logs product that auto-links to active traces.
* Source: [Logs | Sentry](https://docs.sentry.io/product/explore/logs/)
* > *"send text-based log information from your applications, whether frontend or backend, to Sentry"* — *"searchable, trace-connected, and viewable alongside your errors."*
### AI / LLM observability [#ai--llm-observability--1]
Yes — **AI Agent Monitoring** auto-captures agent runs / tool calls / model interactions. **Seer** is a separately-billed AI debugger that drafts root-cause analyses and autofix proposals over Sentry telemetry.
* Source (Agent Monitoring): [AI Agent Monitoring | Sentry](https://docs.sentry.io/product/insights/agents/)
* > *"automatically collect information about agent runs, tool calls, model interactions, and errors across your entire AI pipeline."*
* Source (Seer): [Seer | Sentry](https://sentry.io/product/seer/)
* > *"Sentry's debugging agent ... reads the stack trace, traces the root cause through your codebase, and drafts a fix."*
### Session replay [#session-replay--1]
Yes — Session Replay is GA for web and mobile (Android, iOS, React Native).
* Source: [Session Replay | Sentry](https://docs.sentry.io/product/explore/session-replay/)
* > *"video-like reproductions of user interactions ... browser-based applications and certain native mobile platforms, such as Android, iOS, and React Native."*
### Product analytics [#product-analytics--1]
Not offered as a product. The closest in-platform capability is Discover / Dashboards / Trace Explorer over telemetry. **Inferential negative** — Sentry's [product index](https://sentry.io/welcome/) does not list a product-analytics SKU.
### Alerting [#alerting--1]
Issue alerts on projects + Monitors, delivered via notifications, ticketing, webhooks, or integrations. Uptime monitoring and cron monitoring are billable line items in their own right.
* Source: [Alerts | Sentry](https://docs.sentry.io/product/alerts/)
* > *"An alert can send notifications, create tickets, call webhooks, or use other integrations — for issues coming from projects or Monitors."*
### Cross-signal correlation [#cross-signal-correlation--1]
The trace ID is the join key across signals; logs are explicitly positioned as "trace-connected."
* Source: [Tracing | Sentry](https://docs.sentry.io/concepts/key-terms/tracing/)
* > *"The trace ID connects all the actions that take place, starting from the moment a user performs an action on the frontend ... all the way through to the actions this triggers across your application and services."*
***
## PostHog [#posthog]
Product-analytics-led, aggressively growing into observability adjacencies.
### Hosting model [#hosting-model--2]
Cloud (US + EU) is the recommended deployment. Open-source self-host is positioned for hobbyist/small deployments only: it receives no commercial support and has no access to paid-plan features, and the previous Kubernetes Helm chart has been **sunset**.
* Source: [Self-host PostHog](https://posthog.com/docs/self-host)
* > *"All paid-plan features are Cloud-only."* / *"New deployments of PostHog's paid open source product using Kubernetes are no longer supported."*
* Supporting: [Self-host disclaimer](https://posthog.com/docs/self-host/open-source/disclaimer) — *"unlikely to scale past a couple 100ks events without significant effort."*
### Pricing model [#pricing-model--2]
Per-unit usage-based per product. Product Analytics from $0.00005/event; Session Replay $0.005/web recording; LLM Analytics from $0.00006/event; Logs from $0.25/GB; Error Tracking from $0.00037/exception; Feature Flags, Surveys, Data Warehouse, Pipelines, Workflows each have their own SKU.
* Source: [PostHog pricing](https://posthog.com/pricing)
### Distributed traces / APM [#distributed-traces--apm--2]
No general-purpose APM / distributed tracing. The only "traces" PostHog ships are LLM-scoped (a collection of LLM generations and spans for a single user-LLM interaction).
* Source: [LLM Analytics Traces](https://posthog.com/docs/llm-analytics/traces)
* > *"Traces are a collection of generations and spans that capture a full interaction between a user and an LLM."*
### Structured logs [#structured-logs--2]
Yes — **Logs went GA on 2026-01-29**, OTLP-compatible (ingests via standard OpenTelemetry SDKs, no PostHog package required), priced from $0.25/GB.
* Source: [Logs | PostHog](https://posthog.com/docs/logs)
* > *"a powerful logging solution that works with the OpenTelemetry Protocol (OTLP)"*
* Supporting: [Logs GA blog](https://posthog.com/blog/logs-ga) — *"Logs is generally available, and it lives in the same place as your errors, session replays, and product data."*
### AI / LLM observability [#ai--llm-observability--2]
Yes — **LLM Analytics** captures conversations, model performance, spans, costs, latency, and traces as PostHog events.
* Source: [LLM Analytics](https://posthog.com/llm-analytics)
* > *"Track conversations, model performance, spans, costs, latency, and traces in LLM applications"* — *"all as regular PostHog events."*
### Session replay [#session-replay--2]
Yes — built on rrweb.
* Source: [Session replay ingestion](https://posthog.com/docs/how-posthog-works/recordings-ingestion)
* > *"We use `rrweb` to collect "snapshot data" from the browser."*
### Product analytics [#product-analytics--2]
Flagship product. Autocapture + jump-from-graph-to-recording is the headline pivot.
* Source: [Product Analytics](https://posthog.com/product-analytics)
* > *"you can jump from a graph to a session recording to visually see why something happened"* — autocapture *"tracks every click and pageview automatically."*
### Alerting [#alerting--2]
Alerts are scoped to **insights (trends only)** — fixed-threshold, relative-change, and anomaly-detection modes. Notifications via in-app, email, Slack, and webhooks. Not a general alerting engine over logs/errors/LLM-cost thresholds.
* Source: [Alerts | PostHog](https://posthog.com/docs/alerts)
* > *"Alerts enable you to monitor your insights and get notified when something important changes."* / *"alerts are supported on all trends."*
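The first two alert modes reduce to simple comparisons on a trend series. The sketch below illustrates the idea only and is not PostHog's implementation: the function names, the signature, and the 25% example limit are assumptions, and anomaly detection is left out.

```python
from typing import Optional


def fixed_threshold_breached(value: float, lower: Optional[float] = None,
                             upper: Optional[float] = None) -> bool:
    """Fire when the latest value leaves the [lower, upper] band."""
    if lower is not None and value < lower:
        return True
    if upper is not None and value > upper:
        return True
    return False


def relative_change_breached(previous: float, current: float, max_change: float) -> bool:
    """Fire when the period-over-period change exceeds max_change (0.25 means 25%)."""
    if previous == 0:
        return current != 0  # any movement off zero counts as a change
    return abs(current - previous) / abs(previous) > max_change


# Hypothetical daily-signups trend; the last two points are compared.
signups = [120, 118, 80]
print(fixed_threshold_breached(signups[-1], lower=100))
print(relative_change_breached(signups[-2], signups[-1], max_change=0.25))
```

Both checks read a single series, which matches the scoping above: the alert watches an insight, not raw logs or errors.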
### Cross-signal correlation [#cross-signal-correlation--2]
Everything (analytics events, replays, errors, logs, LLM traces) lands in the same event store; the pitch is pivoting from a chart into the underlying session recording.
* Source: [Logs GA blog](https://posthog.com/blog/logs-ga)
* > *"Logs is generally available, and it lives in the same place as your errors, session replays, and product data."*
***
## Honeycomb [#honeycomb]
Tracing-first SaaS that pioneered the "wide events" data model.
### Hosting model [#hosting-model--3]
SaaS, plus — as of **November 19, 2025** — **Honeycomb Private Cloud**: a customer-hosted or Honeycomb-managed deployment that runs exclusively in the customer's **AWS account** (no other clouds, no on-prem).
* Source: [Honeycomb Private Cloud](https://www.honeycomb.io/platform/private-cloud)
* > *"The power of Honeycomb, in your AWS environment."*
* Supporting: [Private Cloud deployment models](https://docs.honeycomb.io/private-cloud/deployment-models)
### Pricing model [#pricing-model--3]
Event-volume based: Free up to 20M events/mo; Pro from $130/mo for up to 1.5B events/mo; Enterprise starts from a base allowance of 10B events/year.
* Source: [Honeycomb Pricing](https://www.honeycomb.io/pricing)
### Distributed traces / APM [#distributed-traces--apm--3]
OTLP-native — accepts OTLP over gRPC, HTTP/protobuf, and HTTP/JSON.
* Source: [Send Data with OpenTelemetry](https://docs.honeycomb.io/send-data/opentelemetry/)
* > *"Honeycomb supports receiving telemetry data via OpenTelemetry's native protocol, OTLP, over gRPC, HTTP/protobuf, and HTTP/JSON."*
### Structured logs [#structured-logs--3]
No separate logs product — logs are modeled as **structured events** in the same wide-event store used for traces.
* Source: [Events, Metrics, and Logs](https://docs.honeycomb.io/get-started/basics/observability/concepts/events-metrics-logs)
* > *"a structured log can easily turn into a structured event."*
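The quoted idea, that a structured log can become a structured event, can be shown with a few lines of standard-library Python. This is a generic illustration and not Honeycomb's ingest code: the function name, the dotted-key convention, and the sample log line are all assumptions.

```python
import json
from typing import Any, Dict


def log_line_to_event(line: str) -> Dict[str, Any]:
    """Flatten one JSON log line into a single wide event (flat key/value map)."""
    event: Dict[str, Any] = {}

    def flatten(prefix: str, value: Any) -> None:
        if isinstance(value, dict):
            for key, inner in value.items():
                flatten(f"{prefix}.{key}" if prefix else key, inner)
        else:
            event[prefix] = value

    flatten("", json.loads(line))
    return event


# Hypothetical log line with one nested object.
line = ('{"level": "error", "msg": "checkout failed", '
        '"http": {"status": 502, "route": "/pay"}, "trace_id": "abc123"}')
print(log_line_to_event(line))
```

Once nested fields are flattened into columns, the same query that groups spans by `http.status` can group log-derived events too, which is the point of sharing one store.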
### AI / LLM observability [#ai--llm-observability--3]
Launched **2026-05-12**: **Agent Observability** (Agent Timeline, Canvas Agent, Skills, Auto-investigations) built on OpenTelemetry GenAI semantic conventions v1.40.0. As of May 2026, Agent Timeline is in **Early Access** (GA expected June 2026).
* Source: [Honeycomb Launches Agent Observability](https://www.honeycomb.io/blog/honeycomb-launches-agent-observability-full-visibility-agentic-workflows)
* > Agent Timeline connects *"every LLM call, tool invocation, agent handoff, and downstream system impact in real time."*
### Session replay [#session-replay--3]
Not offered. Their [Frontend Observability page](https://www.honeycomb.io/platform/frontend-observability) lists Core Web Vitals, errors, traces, and user journeys — but not session replay. Honeycomb has publicly [positioned against](https://www.honeycomb.io/blog/redefining-rum-comparative-gap-analysis-existing-tools) session-replay-led RUM.
### Product analytics [#product-analytics--3]
Not offered. **Inferential negative** — there is no product-analytics page on honeycomb.io; the closest [Analyze](https://www.honeycomb.io/analyze) page positions analytics around telemetry, not user behavior.
### Alerting [#alerting--3]
**Triggers** (threshold alerts on queries), **BubbleUp** (outlier dimensional analysis), Anomaly Detection. Notifications via PagerDuty, Slack, Microsoft Teams, webhooks, and email.
* Source: [Triggers](https://docs.honeycomb.io/notify/alert/triggers/)
* Supporting: [BubbleUp](https://www.honeycomb.io/platform/bubbleup)
### Cross-signal correlation [#cross-signal-correlation--3]
A single unified wide-event store: logs/traces/metrics share the same query substrate.
* Source: [Log Analytics](https://www.honeycomb.io/platform/log-analytics)
* > *"Use one unified tool to manage logs, traces, and metrics."*
### Data residency [#data-residency-]
SaaS in AWS us-east-1 (US) or AWS eu-west-1 (EU); Private Cloud in the customer's AWS account across US, European, or APAC regions.
* Source: [Data Residency in Europe](https://www.honeycomb.io/blog/honeycomb-launches-data-residency-europe)
***
## New Relic [#new-relic]
The original SaaS APM, now usage-priced.
### Hosting model [#hosting-model--4]
SaaS only across two regions (US and EU); accounts are pinned to a region at creation and data cannot be moved between them. No documented self-host / BYOC.
* Source: [Choose your data center](https://docs.newrelic.com/docs/accounts/accounts-billing/account-setup/choose-your-data-center/)
* > *"Customer Data from existing New Relic accounts cannot be transferred or shared across regions."*
### Pricing model [#pricing-model--4]
Usage-based: 100 GB/mo free ingest, then $0.40/GB (Original) or $0.60/GB (Data Plus), **plus** per-user seat fees (Core $49/user/mo; Full Platform up to $349/user/mo on Pro with an annual commitment).
* Source: [New Relic Pricing](https://newrelic.com/pricing)
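The ingest half of that bill is a free allowance followed by a flat per-GB rate. The sketch below covers only that part, using the rates quoted above; seat fees are deliberately left out, and the assumption that the 100 GB allowance applies to both data options is ours.

```python
from decimal import Decimal

FREE_GB_PER_MONTH = 100
RATE_ORIGINAL = Decimal("0.40")   # per GB beyond the free allowance
RATE_DATA_PLUS = Decimal("0.60")  # per GB beyond the free allowance


def new_relic_ingest_cost(gb_ingested: int, data_plus: bool = False) -> Decimal:
    """Monthly ingest charge only; per-user seat fees are not included."""
    billable_gb = max(0, gb_ingested - FREE_GB_PER_MONTH)
    rate = RATE_DATA_PLUS if data_plus else RATE_ORIGINAL
    return billable_gb * rate


# Hypothetical month with 900 GB ingested.
print(new_relic_ingest_cost(900))
print(new_relic_ingest_cost(900, data_plus=True))
```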
### Distributed traces / APM [#distributed-traces--apm--4]
Yes. **Native OTLP** is the recommended ingest path for OpenTelemetry data; regional endpoints (`otlp.nr-data.net` US, `otlp.eu01.nr-data.net` EU) over HTTP or gRPC.
* Source: [New Relic OTLP endpoint](https://docs.newrelic.com/docs/opentelemetry/best-practices/opentelemetry-otlp/)
* > *"New Relic supports native OTLP ingest and recommends it as the preferred method for sending OpenTelemetry data."*
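Because accounts are pinned to a region, an exporter has to target the matching endpoint. The sketch below builds the standard OpenTelemetry exporter environment variables for either region. The host names come from this page; the `https` scheme without an explicit port, the `api-key` header name, and the function itself are assumptions to check against the New Relic docs before use.

```python
from typing import Dict

# Regional OTLP hosts named above.
NEW_RELIC_OTLP_HOSTS = {
    "US": "otlp.nr-data.net",
    "EU": "otlp.eu01.nr-data.net",
}


def new_relic_otlp_env(region: str, license_key: str, use_grpc: bool = False) -> Dict[str, str]:
    """Build OpenTelemetry exporter variables for one account region."""
    if region not in NEW_RELIC_OTLP_HOSTS:
        raise ValueError(f"unknown New Relic region: {region!r}")
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"https://{NEW_RELIC_OTLP_HOSTS[region]}",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc" if use_grpc else "http/protobuf",
        # Header name is an assumption; the key value here is a placeholder.
        "OTEL_EXPORTER_OTLP_HEADERS": f"api-key={license_key}",
    }


env = new_relic_otlp_env("EU", license_key="YOUR_LICENSE_KEY")
for name, value in env.items():
    print(f"{name}={value}")
```

Rejecting unknown regions up front avoids silently sending EU account data to the US endpoint, where it would not be accepted for that account.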
### Structured logs [#structured-logs--4]
Yes. Ingestion via APM agents, infrastructure agent, Fluentd/Fluent Bit/Logstash/Kubernetes, OTel Collector, or direct HTTP Log API. Server-side Grok parsing turns unstructured strings into queryable attributes.
* Source: [Get started with log management](https://docs.newrelic.com/docs/logs/get-started/get-started-log-management/)
* Supporting: [Parsing log data](https://docs.newrelic.com/docs/logs/ui-data/parsing/)
### AI / LLM observability [#ai--llm-observability--4]
Yes — **AI Monitoring** is positioned as "APM for AI," with end-to-end visibility into latency/cost/quality across supported vendors (OpenAI, Bedrock, DeepSeek).
* Source: [Introduction to AI monitoring](https://docs.newrelic.com/docs/ai-monitoring/intro-to-ai-monitoring/)
* > *"AI monitoring is our solution for application monitoring (APM) for AI. ... end-to-end visibility into performance, cost, and quality of supported models."*
### Session replay [#session-replay--4]
Yes — a Pro / Pro+SPA browser-agent feature (agent v1.260.0+); DOM-based (not screen video); 8-day retention; PII masked by default.
* Source: [Session replay | New Relic](https://docs.newrelic.com/docs/browser/browser-monitoring/browser-pro-features/session-replay/get-started/)
### Product analytics [#product-analytics--4]
No standalone product-analytics SKU. Behavioral capabilities (Sankey navigation paths, drop-offs, engagement) are folded into [Browser Monitoring](https://newrelic.com/platform/browser-monitoring) + ad-hoc NRQL.
### Alerting [#alerting--4]
NRQL-based conditions are the recommended path; conditions are organized within policies.
* Source: [Create NRQL alert conditions](https://docs.newrelic.com/docs/alerts/create-alert/create-alert-condition/create-nrql-alert-conditions/)
* > *"We recommend creating an alert using a NRQL alert condition."*
### Cross-signal correlation [#cross-signal-correlation--4]
**NRQL** is the SQL-like query language that spans every telemetry type (events, metric timeslices, dimensional metrics, spans, logs).
* Source: [Introduction to NRQL](https://docs.newrelic.com/docs/nrql/get-started/introduction-nrql-new-relics-query-language/)
* > *"The New Relic Query Language (NRQL) is a powerful tool you can use to query and understand nearly any type of data."*
***
## Grafana Cloud (LGTM stack) [#grafana-cloud-lgtm-stack]
Multi-product OSS-led platform: **G**rafana + **L**oki + **T**empo + **M**imir + Pyroscope + Faro + Alloy + Beyla + OnCall + k6.
### Hosting model [#hosting-model--5]
Grafana Cloud (SaaS) or self-host any of the LGTM components as open source. All ten projects listed above are maintained first-party by Grafana Labs.
* Source: [Grafana Open Source Projects](https://grafana.com/oss/)
* Supporting (self-hosting): each project has its own OSS repo under `github.com/grafana/{loki,tempo,mimir,pyroscope,faro,alloy,beyla}`.
### Pricing model [#pricing-model--5]
Free tier + Pro from $19/mo + usage-based per-unit charges: Metrics $6.50 per 1k series; Logs/Traces/Profiles $0.05/GB process + $0.40/GB write + $0.10/GB retain; Frontend Observability $0.75 per 1k sessions; Grafana Assistant $20/active AI user. Enterprise from $25k/year.
* Source: [Grafana Cloud Pricing](https://grafana.com/pricing/)
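The three per-GB charges stack, so the effective log rate is their sum when a gigabyte passes through every stage. The sketch below works that out alongside the metrics meter. It rests on several assumptions: every GB is processed, written, and retained once; included usage and the free tier are ignored; and the workload numbers are hypothetical.

```python
from decimal import Decimal

PRO_BASE = Decimal("19")                 # Pro plan, per month
PROCESS_PER_GB = Decimal("0.05")
WRITE_PER_GB = Decimal("0.40")
RETAIN_PER_GB = Decimal("0.10")
METRICS_PER_1K_SERIES = Decimal("6.50")


def grafana_cloud_estimate(log_gb: int, active_series: int) -> Decimal:
    """Rough monthly total for logs plus metrics on the Pro plan."""
    per_gb = PROCESS_PER_GB + WRITE_PER_GB + RETAIN_PER_GB  # 0.55 if all stages apply
    metrics_cost = Decimal(active_series) / 1000 * METRICS_PER_1K_SERIES
    return PRO_BASE + log_gb * per_gb + metrics_cost


# Hypothetical workload: 200 GB of logs and 10,000 active series.
print(grafana_cloud_estimate(log_gb=200, active_series=10_000))
```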
### Distributed traces / APM [#distributed-traces--apm--5]
**Tempo** for traces; Grafana Cloud accepts OTLP for metrics, logs, and traces.
* Source: [Grafana Cloud OTLP](https://grafana.com/docs/grafana-cloud/send-data/otlp/)
* > *"Grafana Labs supports the ingestion of metrics, logs, and traces through OTLP into Grafana Cloud."*
### Structured logs [#structured-logs--5]
**Loki** — "a horizontally scalable, highly available, multi-tenant log aggregation system using the same powerful data model as Prometheus."
* Source: [Grafana Open Source](https://grafana.com/oss/) → Loki
### AI / LLM observability [#ai--llm-observability--5]
**Inferential negative.** Grafana Cloud ships **Grafana Assistant** ($20/active AI user) — but Assistant is a GenAI-powered helper for navigating Grafana itself, not an LLM observability product for tracing customers' LLM applications. We could not find a dedicated Grafana LLM-observability product page; mark as "Assistant only."
* Source (for Assistant pricing existence): [Grafana Cloud Pricing](https://grafana.com/pricing/)
### Session replay [#session-replay--5]
Not offered. **Faro** (Frontend Observability) captures Core Web Vitals, errors, logs, and client-side traces — but not session replay.
* Source: [Frontend Observability](https://grafana.com/docs/grafana-cloud/monitor-applications/frontend-observability/)
* > *"automatically captures real user performance metrics, errors, logs, and client-side traces."* (No mention of session replay.)
### Product analytics [#product-analytics--5]
Not offered. **Inferential negative** — no product on the Grafana platform page targets funnels / retention / behavior.
### Alerting [#alerting--5]
**Grafana Alerting** — unified alert rules across data sources (metrics, logs, multi-dimensional).
* Source: [Alerting overview](https://grafana.com/docs/grafana-cloud/alerting-and-irm/alerting/)
### Cross-signal correlation [#cross-signal-correlation--5]
Per-data-source plumbing — trace-to-logs / metrics-to-traces (exemplars) / split-pane views in Grafana Explore. Correlation is configured per-data-source rather than implicit in a single store.
* Source: [Grafana Open Source](https://grafana.com/oss/) (platform-level framing)
***
## SigNoz [#signoz]
OTel-native OSS alternative.
### Hosting model [#hosting-model--6]
SigNoz Cloud (SaaS) and self-hosted open-source — mix-and-match supported.
* Source: [SigNoz/signoz on GitHub](https://github.com/SigNoz/signoz)
* > *"Open-Source - you can use open-source, our cloud service or a mix of both based on your use case."*
### Pricing model [#pricing-model--6]
Pay-as-you-go: $0.30/GB ingested for traces and logs; $0.10 per million samples for metrics. Selectable retention (15 days–1 year for traces/logs; 1–13 months for metrics).
* Source: [SigNoz Pricing](https://signoz.io/pricing/)
* > *"$0.3/GB ingested"* (traces and logs); *"$0.1/mil samples"* (metrics).
### Distributed traces / APM [#distributed-traces--apm--6]
OTLP-native — core pitch. Built directly on the OpenTelemetry Collector.
* Source: [Distributed Tracing | SigNoz](https://signoz.io/distributed-tracing/)
* > *"Auto-instrument your applications with OpenTelemetry across all major languages and frameworks."*
### Structured logs [#structured-logs--6]
Yes — Logs Explorer with attribute filters, multiple view modes, aggregation operators.
* Source: [Logs | SigNoz](https://signoz.io/docs/userguide/logs/)
### AI / LLM observability [#ai--llm-observability--6]
Yes — dedicated **LLM Observability** surface tracing agent workflows, token usage, cost; auto-instruments OpenAI, Anthropic, Bedrock, LangChain, LlamaIndex, CrewAI.
* Source: [LLM Observability | SigNoz](https://signoz.io/llm-observability/)
### Session replay [#session-replay--6]
Not offered. Maintainers have publicly stated it is not on their near-term roadmap.
* Source: [Discussion #3846](https://github.com/SigNoz/signoz/discussions/3846)
* > *"As of now, this not in our near term roadmap (next 3-4 months)."*
### Product analytics [#product-analytics--6]
Not offered. **Inferential negative** — no product-analytics surface on signoz.io.
### Alerting [#alerting--6]
Five alert types: metric-based, log-based, trace-based, exceptions-based, anomaly-based.
* Source: [Alerts Management](https://signoz.io/docs/userguide/alerts-management/)
### Cross-signal correlation [#cross-signal-correlation--6]
OTel SDKs auto-inject `trace_id` + `span_id` into log records; UI exposes "Go to related logs" from a trace.
* Source: [Correlate Traces and Logs](https://signoz.io/docs/traces-management/guides/correlate-traces-and-logs/)
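Once every log record carries `trace_id` and `span_id`, finding the logs for a trace is a keyed lookup. The sketch below shows that join on plain dictionaries. It is a generic illustration and not SigNoz code: the function names and sample records are hypothetical, and a real backend would run the equivalent filter as a query.

```python
from collections import defaultdict
from typing import Dict, List, Optional

LogRecord = Dict[str, str]


def index_logs_by_trace(logs: List[LogRecord]) -> Dict[str, List[LogRecord]]:
    """Group log records by trace_id, skipping records without trace context."""
    by_trace: Dict[str, List[LogRecord]] = defaultdict(list)
    for record in logs:
        trace_id = record.get("trace_id")
        if trace_id:
            by_trace[trace_id].append(record)
    return by_trace


def related_logs(by_trace: Dict[str, List[LogRecord]], trace_id: str,
                 span_id: Optional[str] = None) -> List[LogRecord]:
    """Return the logs for a trace, optionally narrowed to one span."""
    records = by_trace.get(trace_id, [])
    if span_id is None:
        return list(records)
    return [r for r in records if r.get("span_id") == span_id]


# Hypothetical log records; the last one was emitted outside any trace.
logs = [
    {"trace_id": "t1", "span_id": "s1", "body": "request received"},
    {"trace_id": "t1", "span_id": "s2", "body": "db timeout"},
    {"trace_id": "t2", "span_id": "s9", "body": "cache hit"},
    {"body": "startup complete"},
]
index = index_logs_by_trace(logs)
print([r["body"] for r in related_logs(index, "t1")])
```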
### Data ownership [#data-ownership-]
Self-hosted: ClickHouse cluster behind a SigNoz-flavored OTel Collector. Cloud: US, EU, and India regions.
* Source: [Architecture | SigNoz](https://signoz.io/docs/architecture/)
***
## Uptrace [#uptrace]
OTel + ClickHouse APM, OSS-first.
### Hosting model [#hosting-model--7]
Self-host (open-source, AGPL-3.0) **or** Uptrace Cloud. ClickHouse for telemetry + PostgreSQL for metadata.
* Source: [uptrace/uptrace on GitHub](https://github.com/uptrace/uptrace)
* > *"Uptrace uses OpenTelemetry framework to collect data and ClickHouse database to store it. It also requires PostgreSQL database to store metadata such as metric names and alerts."*
### Pricing model [#pricing-model--7]
Per-GB ingest: traces $0.10/GB, logs $0.10/GB, metrics $0.025 per million datapoints. 50 GB/mo free; 28-day default retention. **No per-seat fees.**
* Source: [Uptrace Pricing](https://uptrace.dev/pricing)
* > *"50 GB of traces, logs, and metrics free every month."*
### Distributed traces / APM [#distributed-traces--apm--7]
OTLP-native. (The internal [obs-unified Uptrace migration doc](https://github.com/obs-unified/obs-unified/blob/main/docs/comparison/uptrace.md) notes Uptrace supports OTLP/gRPC and OTLP/HTTP; obs-unified is OTLP/HTTP-only as of this writing.)
* Source: [Uptrace product](https://uptrace.dev/)
* > *"OpenTelemetry-native observability platform"* — *"Uptrace unifies traces, metrics, and logs in a single platform."*
### Structured logs [#structured-logs--7]
Yes. Logs ingested via OTel; integrated with trace context.
* Source: [Uptrace product page](https://uptrace.dev/) — *"unifies traces, metrics, and logs."*
### AI / LLM observability [#ai--llm-observability--7]
Not offered. **Inferential negative** — no LLM observability product is listed on uptrace.dev as of May 2026.
### Session replay [#session-replay--7]
Not offered.
### Product analytics [#product-analytics--7]
Not offered.
### Alerting [#alerting--7]
**Metric monitors** and **Error monitors**. Notification channels: email, Slack, Mattermost, Telegram, Microsoft Teams, PagerDuty, Opsgenie, AlertManager, webhooks.
* Source: [Uptrace Alerting](https://uptrace.dev/features/alerting)
### Cross-signal correlation [#cross-signal-correlation--7]
Traces, metrics, logs in one ClickHouse-backed store, queried via UQL.
* Source: [Uptrace product](https://uptrace.dev/)
***
## HyperDX [#hyperdx]
ClickHouse-backed OSS observability; acquired by ClickHouse Inc. in March 2025 and now also offered as a managed component inside ClickHouse Cloud ("ClickStack").
### Hosting model [#hosting-model--8]
Three paths: (a) self-hosted OSS on your own ClickHouse cluster (MIT-licensed), (b) standalone HyperDX Cloud at hyperdx.io, (c) managed inside ClickHouse Cloud as part of ClickStack. The OSS project remains actively maintained after the acquisition.
* Source: [ClickHouse acquires HyperDX](https://clickhouse.com/blog/clickhouse-acquires-hyperdx-the-future-of-open-source-observability)
* > *"HyperDX Cloud will continue serving and onboarding new customers"* and *"the open-source project remains actively maintained and developed."*
* Supporting: [hyperdxio/hyperdx](https://github.com/hyperdxio/hyperdx) (MIT licensed); [ClickStack in ClickHouse Cloud](https://clickhouse.com/blog/announcing-clickstack-in-clickhouse-cloud) (2025-08-06).
### Pricing model [#pricing-model--8]
Three tiers: Free ($0/mo, 3 GB/mo, 3-day retention, 1 user); Starter ($20/mo flat, 50 GB/mo included, 30-day retention, unlimited users, $0.40/GB overage, $0.40 per 100 DPM); Enterprise (custom, adds SAML SSO).
* Source: [HyperDX Pricing](https://www.hyperdx.io/pricing)
* > *"Includes 50 GB/mo," "$0.40 per additional 1 GB," "$0.40 per 100 metrics (1 DPM)," "Unlimited Users, Flat Rate."*
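
The Starter tier's flat fee plus overage can be sketched as a small function. This is an illustration only: it covers ingest volume and deliberately ignores the separate metrics charge, and the function name is hypothetical.

```python
# Starter-tier figures quoted in this section.
STARTER_FLAT_USD = 20.0
STARTER_INCLUDED_GB = 50.0
STARTER_OVERAGE_USD_PER_GB = 0.40


def estimate_hyperdx_starter_usd(ingest_gb):
    """Flat fee plus per-GB overage beyond the included volume (metrics ignored)."""
    overage_gb = max(0.0, ingest_gb - STARTER_INCLUDED_GB)
    return round(STARTER_FLAT_USD + overage_gb * STARTER_OVERAGE_USD_PER_GB, 2)
```

Below the included 50 GB the bill stays at the flat fee; every GB above it adds the overage rate.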
### Distributed traces / APM [#distributed-traces--apm--8]
OpenTelemetry-native — accepts OTLP over both HTTP (`https://in-otel.hyperdx.io`) and gRPC (`in-otel.hyperdx.io:4317`) for traces, logs, and metrics.
* Source: [OpenTelemetry | HyperDX Docs](https://www.hyperdx.io/docs/install/opentelemetry)
* > *"HyperDX accepts telemetry directly from OpenTelemetry code instrumentation or collectors."*
### Structured logs [#structured-logs--8]
Yes — full-text search, native JSON parsing, live tail, and a "Log Patterns" feature that clusters related logs, all on top of ClickHouse.
* Source: [hyperdxio/hyperdx](https://github.com/hyperdxio/hyperdx)
* > *"An open source observability platform unifying session replays, logs, metrics, traces and errors powered by ClickHouse and OpenTelemetry."*
### AI / LLM observability [#ai--llm-observability--8]
No first-party LLM observability product. HyperDX is a documented OTLP destination for **OpenLLMetry** (Traceloop), **OpenLIT**, and **Mirascope** — so you get LLM tracing/token-cost via OTel-based instrumentation rather than a vendor-shipped LLM product.
* Source: [LLM Observability with HyperDX and OpenLLMetry](https://www.traceloop.com/docs/openllmetry/integrations/hyperdx)
* > HyperDX is *"an open source observability platform that natively supports OpenTelemetry"*; integration sets `TRACELOOP_BASE_URL=https://in-otel.hyperdx.io`.
### Session replay [#session-replay--8]
Yes — browser-side session replay via the HyperDX OTel browser SDK, **automatically linked** to the corresponding logs and traces. (hyperdx.io does not name the session-replay engine. Other sources report it as rrweb-based; this page does not assert that and describes it only as "industry-standard browser session replay.")
* Source: [HyperDX](https://www.hyperdx.io/)
* > *"Automatically link session replays with backend logs and traces"*; *"Unify Session Replays, Logs, Traces, Metrics and Errors."*
### Product analytics [#product-analytics--8]
Not offered. **Confirmed absent** — no funnels, retention, or user-journey analytics on the docs nav, marketing site, or pricing page.
* Source: [HyperDX Docs](https://www.hyperdx.io/docs)
### Alerting [#alerting--8]
Search-based and dashboard-chart-based alerts; threshold + duration + check-interval configurable (1m–1d). Notifications via Slack, Email, PagerDuty, or Slack Webhook.
* Source: [Alerts | HyperDX Docs](https://www.hyperdx.io/docs/alerts)
* > *"Set the threshold, duration, and notification method for the alert (Slack, Email, PagerDuty or Slack Webhook)."*
### Cross-signal correlation [#cross-signal-correlation--8]
**Auto-linking across signals** is the marketed differentiator: session replays, frontend events, backend traces, and logs share IDs, so users can pivot between them in one click.
* Source: [HyperDX](https://www.hyperdx.io/)
* > *"Trace every request from a user's browser and phone to your backend servers and async workers, automatically."*
### Data ownership [#data-ownership--1]
Self-hosted: data lives entirely in your own ClickHouse cluster. Cloud: hosted in the United States; **no published EU region** as of this review.
* Source: [DeploySentinel DPA](https://www.hyperdx.io/terms/dpa)
* > *"DeploySentinel (HyperDX) will host and process Customer Personal Data in the United States."*
* Supporting: [OSS vs Cloud](https://www.hyperdx.io/docs/oss-vs-cloud)
***
## Discussion [#discussion]
This section is editorial (analysis, not vendor-cited facts). Where it points at vendor capabilities, the relevant anchor is linked back to the cited claim above.
### Storage architecture as a cost-and-flexibility lever [#storage-architecture-as-a-cost-and-flexibility-lever]
The vendors split along their primary storage backend, and that split shows up in pricing and operational shape:
* **ClickHouse-backed open-source platforms** ([SigNoz](#src-sn-residency), [Uptrace](#src-up-hosting), [HyperDX](#src-hx-hosting)) inherit columnar compression and high-cardinality query performance from ClickHouse, which is what lets their per-GB ingest pricing ([SigNoz $0.30/GB](#src-sn-pricing), [Uptrace $0.10/GB](#src-up-pricing)) undercut per-host SaaS by an order of magnitude.
* **Proprietary backends** ([Datadog](#src-dd-pricing), [New Relic](#src-nr-pricing)) meter on dimensions that map to their internal cost model (per host, per indexed event, per GB ingest + per user) and bill at higher per-unit rates.
* **Wide-event store** ([Honeycomb](#src-hc-pricing)) is a third shape — single event store with event-volume pricing.
* **PostHog** uses ClickHouse internally for events but exposes per-product SKUs ([per event, per recording, per LLM span](#src-ph-pricing)) rather than per-GB; storage shape and pricing shape are decoupled here.
* **obs-unified** deliberately does **not** use ClickHouse. The default backend is **SQLite via Cloudflare D1**, with a Postgres adapter via the [storage interface](https://github.com/obs-unified/obs-unified/blob/main/rfcs/0008-storage-interface.md). The trade-off: D1 caps at \~100M hot rows per project (then archive sweep or move to Postgres), in exchange for zero operational cost on the Workers tier and a single-image local deploy.
The right backend depends on what you're optimizing for. ClickHouse pays off for high-cardinality search across billions of rows. SQLite/D1 pays off for projects whose hot-data ceiling is bounded and who want a single-binary or single-Worker deploy. No one shape is universally better.
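
To make the "bounded hot-data ceiling" concrete, here is a minimal back-of-envelope sketch. It uses the approximate 100M hot-row figure quoted above; the `retention_days` parameter is a hypothetical stand-in for whatever archive sweep you configure, and the function name is illustrative rather than part of obs-unified.

```python
import math

# Approximate D1 hot-row ceiling per project, as quoted above.
D1_HOT_ROW_CEILING = 100_000_000


def days_until_hot_row_ceiling(rows_per_day, retention_days=None,
                               ceiling=D1_HOT_ROW_CEILING):
    """Days of ingest before the hot set reaches the ceiling.

    Returns None when the ceiling is never reached: either nothing is
    ingested, or a retention window (hypothetical archive sweep) keeps the
    steady-state hot set below the ceiling.
    """
    if rows_per_day <= 0:
        return None
    if retention_days is not None and rows_per_day * retention_days < ceiling:
        return None
    return math.ceil(ceiling / rows_per_day)
```

A project writing one million rows a day with no sweep would reach the ceiling in about 100 days; the same project with a 30-day sweep would stay well below it.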
### Two product shapes: suites vs. graphs [#two-product-shapes-suites-vs-graphs]
The vendors fall into two product shapes:
* **Multi-product suites** ([Datadog](#src-dd-correlation), [Sentry](#src-se-correlation), [PostHog](#src-ph-correlation), [New Relic](#src-nr-correlation), [Grafana Cloud](#src-gr-correlation)) — each signal has its own product UI and SKU under a common brand. Correlation works *within* the suite (Datadog by tag, Sentry by trace\_id, PostHog by event store, New Relic by NRQL, Grafana per-data-source). Strong on breadth, integration ecosystem, and enterprise controls.
* **Unified telemetry graphs** ([SigNoz](#src-sn-correlation), [HyperDX](#src-hx-correlation), [Uptrace](#src-up-correlation), [Honeycomb](#src-hc-correlation), obs-unified) — all signals are nodes in one store and one UI. Pivots are first-class because there's nothing to pivot *between*. Stronger on cross-signal flow at the cost of fewer per-product surface features.
Neither shape is wrong. Big organizations with established team boundaries (frontend RUM team vs. SRE vs. data team) often map cleanly to a suite. Smaller teams that own a vertical slice (backend → frontend → AI) often get more from a graph.
### What's actually convergent [#whats-actually-convergent]
A real trend over the past 18 months: nearly every vendor compared here has added LLM observability, structured logs, or both. (Uptrace, which lists no LLM observability product, is the exception on the LLM side.) The vendor lines are blurring:
* **Sentry** shipped Structured Logs ([cited above](#src-se-logs)) and AI Agent Monitoring + Seer ([cited](#src-se-llm)).
* **PostHog** shipped Logs (GA 2026-01-29, [cited](#src-ph-logs)) and LLM Analytics ([cited](#src-ph-llm)).
* **Honeycomb** shipped Agent Observability in Early Access (2026-05-12, [cited](#src-hc-llm)).
* **Datadog** shipped Product Analytics ([cited](#src-dd-pa)) and LLM Observability ([cited](#src-dd-llm)).
* **SigNoz**, **New Relic**, **Grafana Cloud**, and **HyperDX** all shipped LLM observability or LLM-instrumentation paths in the same window.
obs-unified's positioning is the bet that the *graph shape* (one identity chain, one store, one agent-traversable telemetry graph) is the right organizing structure for that convergence — not that any specific vendor is wrong. Most of the vendors above will persist; the design hypothesis is just that convergence rewards graph shape over suite shape, and that "we own our data plane" is a multiplier on that.
### Buyer guidance [#buyer-guidance]
The factual table at the top is the answer to "what does each vendor do." This is the answer to "which one should I pick":
* If you need **enterprise controls, a wide integration ecosystem, and quick deployment** with a per-host cost model your finance team has already approved → **Datadog** or **New Relic**.
* If you need **errors + replay + traces** but don't want to think about infra → **Sentry**.
* If you need **product analytics + replay** and observability is secondary → **PostHog**.
* If you need **deep trace analysis** and you'll build the rest yourself → **Honeycomb**.
* If you want **OSS, OTel-native, and ClickHouse-shaped pricing** → **SigNoz**, **HyperDX**, or **Uptrace** (mostly differ on UX taste; HyperDX is the only one with session replay; Uptrace lists the lowest per-GB ingest rate of the three; SigNoz has the most alert types).
* If you want a **fully composable LGTM stack** you can scale independently → **Grafana Cloud** (or self-host the components).
* If you want **everything-in-one (traces, logs, metrics, replay, LLM, analytics) self-hosted on your infra and built for agentic debugging** → **obs-unified**, with the trade-offs in the next section.
***
## Where obs-unified fits [#where-obs-unified-fits]
obs-unified is the only tool in this comparison that is **simultaneously self-hostable, OTLP-native, and covers session replay + LLM observability + product analytics in one stack**. The closest neighbors are:
* **HyperDX** — closest in *graph shape*: OSS, OTel-native, auto-linked replay + logs + traces. Differs in backend (ClickHouse vs. SQLite/D1), in LLM coverage (HyperDX inherits via OpenLLMetry rather than shipping LLM eval first-party), and in product analytics (absent from HyperDX).
* **PostHog** — same multi-signal breadth (now that they ship logs + LLM analytics), but Cloud-only for paid features; primary scaffolding is product-analytics-shaped, not trace-shaped.
* **SigNoz** — same OSS + OTel-native posture, but no session replay, no product analytics, no LLM cost tracking with eval (LLM observability is trace-shaped only).
* **Grafana Cloud** — same OSS-friendly multi-component posture, but no session replay, no product analytics, no LLM observability for customer apps.
The trade-offs working against obs-unified versus this set:
* **Maturity** — early. Production deployments are limited; storage scale is bounded by the SqlDb adapter you pick (D1 caps at \~100M hot rows; Postgres is fine to billions).
* **OTLP/gRPC** — not yet (OTLP/HTTP only as of this writing). Migrate gRPC exporters to HTTP or front with an OTel Collector.
* **SSO / RBAC** — single-password auth today (see [Auth & multi-tenancy gap](https://github.com/obs-unified/obs-unified/blob/main/docs/comparison/uptrace.md#auth-multi-tenancy-governance) in the Uptrace comparison).
* **Free-form dashboards / query language** — Analyses are LLM-narrative-shaped, not panel-shaped. There's no PromQL/NRQL/UQL equivalent.
* **Generic infra metrics** — the Resources dashboard is Cloudflare-shaped first, with Linux-host mode added via the OTel `hostmetricsreceiver`. Non-Linux non-Cloudflare deployments have less curated coverage.
These are the right axes to weigh obs-unified on. If any of the above are blockers, one of the nine third-party tools above is likely the right pick.
## Refresh schedule [#refresh-schedule]
Last full review: **2026-05-19** (this document). Next scheduled review: **2026-08-19** (quarterly). On each refresh, every cited URL is re-fetched and its quoted phrase re-checked; any claim where the cited evidence has changed is updated or removed.
# Examples (/docs/examples)
Use this page to pick the right starting point. Runnable examples are things you can clone, scaffold, or run locally. Reference examples are docs and snippets to copy into an existing app.
## Fastest paths [#fastest-paths]
| Goal | Start here | Type |
| ----------------------------------------------- | ----------------------------------------------------------- | ----------------- |
| First run from a fresh checkout | [Getting started](/docs/getting-started) | Guide |
| Run everything from one local image | `Dockerfile.local` via `pnpm local:image && pnpm local:run` | Runnable image |
| Try obs-unified with realistic traffic | `demo/` | Runnable demo |
| Scaffold a new React + Hono app | `obs-unified create`, choose React + Vite + Hono | Runnable template |
| Add obs-unified to an existing React + Hono app | [React + Hono walkthrough](/docs/instrumenting) | Walkthrough |
| Add obs-unified to an existing Python Flask app | [Python Flask walkthrough](/docs/instrument-python-flask) | Walkthrough |
| Add browser analytics only | [Analytics SDK](/docs/sdks#obs-unifiedanalytics-sdk) | Reference |
| Add TypeScript backend telemetry | [Telemetry SDK](/docs/sdks#obs-unifiedtelemetry-sdk) | Reference |
| Instrument Python, JVM, or .NET | [Language recipes](#language-recipes) | Recipes |
## Runnable examples [#runnable-examples]
| Example | What it shows | Run / entry point |
| -------------------------------------- | --------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------- |
| `Dockerfile.local` | All-in-one local image with Postgres, collector, dashboard, file blob storage, and seed data. | `pnpm local:image && pnpm local:run` |
| `packages/cli/templates/react-vite/` | React + Vite frontend, Hono Node API, `AnalyticsProvider`, backend spans, and click-to-trace propagation. | `obs-unified create my-app`, choose React + Vite + Hono |
| `packages/cli/templates/vanilla-ts/` | Browser-only Vite + TypeScript analytics. | `obs-unified create my-app`, choose Vanilla TypeScript |
| `packages/cli/templates/hono-workers/` | Hono on Cloudflare Workers with backend telemetry wiring. | `obs-unified create my-api`, choose Hono on Workers |
| `apps/obs-demo/` | AI calls, RAG, tool calls, session tracking, and evaluation scenarios. | `pnpm dev`, then `curl http://127.0.0.1:8787/api/demo/run-all` |
| `apps/collector-node/` | Standalone Node collector with Postgres + MinIO. | `docker compose up -d` from `apps/collector-node` |
| `demo/` | OpenTelemetry Astronomy Shop feeding obs-unified with polyglot microservice traffic. | `pnpm demo:setup`, `pnpm demo:preflight`, `pnpm demo:up` |
## SDK examples [#sdk-examples]
| Runtime | Example | Notes |
| -------------------- | -------------------------------- | -------------------------------- |
| Node.js / TypeScript | `sdks/node/examples/basic.ts` | First-party Node SDK usage |
| Node.js / TypeScript | `sdks/node/examples/smoke.mjs` | Lightweight SDK smoke path |
| Go | `sdks/go/examples/basic/main.go` | Go SDK init and span conventions |
| Rust | `sdks/rust/examples/basic.rs` | Rust SDK init and helper usage |
## Instrumentation guides [#instrumentation-guides]
| App shape | Guide |
| ------------------------------ | ------------------------------------------------------------- |
| React + Hono | [Instrumenting your app](/docs/instrumenting) |
| Python + Flask | [Python Flask instrumentation](/docs/instrument-python-flask) |
| Browser / React analytics | [Analytics SDK](/docs/sdks#obs-unifiedanalytics-sdk) |
| TypeScript backend / Workers | [Telemetry SDK](/docs/sdks#obs-unifiedtelemetry-sdk) |
| Deeper backend instrumentation | [SDK API reference](/docs/sdk-reference) |
| Profiling / pprof | `docs/howto/profiling.md` in the repo |
| eBPF / host metrics | `docs/howto/ebpf.md` in the repo |
## Language recipes [#language-recipes]
| Runtime | Recipe |
| ----------------------------- | -------------------------- |
| Python | `docs/recipes/python.md` |
| JVM / Java / Kotlin | `docs/recipes/jvm.md` |
| .NET | `docs/recipes/dotnet.md` |
| Go | `sdks/go/README.md` |
| Rust | `sdks/rust/README.md` |
| SDK skeleton for contributors | `sdks/_template/README.md` |
## Migration examples [#migration-examples]
| Source | Guide |
| -------------------------- | -------------------------------- |
| Sentry | `docs/migrate/from-sentry.md` |
| PostHog | `docs/migrate/from-posthog.md` |
| Honeycomb | `docs/migrate/from-honeycomb.md` |
| Old `@obs/*` package scope | `docs/migrate/from-obs-scope.md` |
## Verification [#verification]
After wiring any example, verify the collector and browser CORS path:
```bash
obs-unified doctor http://localhost:8790 --origin http://localhost:5173
```
For browser examples, the origin should match the app you are testing. For the Astronomy Shop demo, use `http://localhost:8080`.
# Getting started (/docs/getting-started)
This guide gets you from a fresh checkout to unified observability in the dashboard. The goal is not just to send a trace or a log; it is to see multiple signal types connected through the same collector and identity chain.
There are two decisions:
1. **How do you want to run obs-unified?** Use the all-in-one Docker image, or install and run the repo locally.
2. **What data do you want to look at?** Use seeded sample data, the Astronomy Shop demo, or telemetry from your own app.
## Choose how to run [#choose-how-to-run]
| Runtime path | Use when | Starts |
| ---------------------------- | ---------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| **Option 1 — Docker image** | You want the quickest first run with the fewest host dependencies. | Postgres, collector, dashboard, filesystem blob store, and sample data in one container. |
| **Option 2 — local install** | You want to edit code, run dev servers, or inspect internals while using the repo. | Local collector, demo API, and Vite dashboard from the workspace. |
## Choose what to observe [#choose-what-to-observe]
| Data path | Use when | Start here |
| ----------------------- | -------------------------------------------------------------- | --------------------------------------------------------------- |
| **Seeded sample data** | You want populated dashboards immediately. | Built into Option 1, or run `pnpm run seed` with Option 2. |
| **Astronomy Shop demo** | You want realistic microservice traffic and service-map edges. | Run the OpenTelemetry demo against your local collector. |
| **Your own app** | You want to validate obs-unified against a real application. | Add SDKs or OpenTelemetry exporters pointing at your collector. |
## Prerequisites [#prerequisites]
* **Docker**: required for the Docker image and Astronomy Shop demo.
* **Node.js 22+ and pnpm 10+**: required for local install and repo scripts.
## Option 1 — Docker image [#option-1--docker-image]
This packages the full local stack into one container. It is the lowest-friction first run.
Build the image locally:
```bash
pnpm local:image
```
Start the container:
```bash
pnpm local:run
```
Or use Docker directly:
```bash
docker build -f Dockerfile.local -t obs-unified/local:dev .
docker run --rm -p 5173:5173 -p 8790:8790 obs-unified/local:dev
```
Access the interfaces:
| Interface | URL |
| ---------------- | ----------------------- |
| Dashboard | `http://localhost:5173` |
| Collector ingest | `http://localhost:8790` |
Default local credentials:
| Credential | Value |
| ------------------ | ---------------- |
| Ingest write key | `dev-ingest-key` |
| Dashboard password | `e2e-test-pass` |
To persist local database and blob storage state across container restarts:
```bash
docker volume create obs-unified-local-db
docker volume create obs-unified-local-blobs
docker run --rm \
  -p 5173:5173 \
  -p 8790:8790 \
  -v obs-unified-local-db:/var/lib/postgresql \
  -v obs-unified-local-blobs:/data \
  obs-unified/local:dev
```
Verify the full first-run path:
```bash
pnpm smoke:local-image
```
The smoke test builds the image, boots a fresh container, seeds data, and verifies collector health, dashboard HTML, and login from outside Docker.
## Option 2 — local install [#option-2--local-install]
Use this when you are modifying dashboard code, collector code, or SDK packages.
```bash
git clone https://github.com/obs-unified/obs-unified.git
cd obs-unified
pnpm install
pnpm run setup
pnpm run dev
```
The local dev stack exposes:
| Service | URL |
| --------- | ----------------------- |
| Dashboard | `http://localhost:5173` |
| Demo API | `http://localhost:8787` |
| Collector | `http://localhost:8790` |
## Seeded sample data [#seeded-sample-data]
Use this when you want populated dashboards immediately.
* Docker image: sample data is seeded automatically on startup.
* Local install: run the seeder in a second terminal.
```bash
pnpm run seed
```
Expected result: Traces, Logs, AI Calls, Usage, and Issues show sample data. Replays require browser interaction, so open the dashboard's Playground tab and trigger a replay-producing interaction once.
## Astronomy Shop demo [#astronomy-shop-demo]
Use this when you want realistic microservice traffic and service-map edges. It runs the official OpenTelemetry Astronomy Shop demo and points its exporters at your local obs-unified collector.
This path assumes the collector is already running. For local development, start it with:
```bash
pnpm dev:collector
```
Prepare the upstream demo services:
```bash
pnpm demo:setup
pnpm demo:preflight
```
Launch the microservices:
```bash
pnpm demo:up
```
Access URLs:
| Interface | URL |
| ------------------ | ----------------------- |
| Dashboard | `http://localhost:5173` |
| Shop web interface | `http://localhost:8080` |
The load generator takes roughly 30 seconds to begin driving traffic. Once running, Traces, Service Maps, Issues, Logs, and Metrics populate dynamically from active microservice calls.
Stop and remove the compose stack:
```bash
pnpm demo:down
```
## Your own app [#your-own-app]
Use this when you want to validate obs-unified against a real application. Your app sends telemetry to whichever collector you started above.
| Target | Setup guide |
| ------------------------------ | -------------------------------------------------------------- |
| React/Vite frontend + Hono API | [React + Hono walkthrough](/docs/instrumenting) |
| Python Flask API | [Python Flask walkthrough](/docs/instrument-python-flask) |
| Browser-only application | [Analytics SDK reference](/docs/sdks#obs-unifiedanalytics-sdk) |
| TypeScript backend | [Telemetry SDK reference](/docs/sdks#obs-unifiedtelemetry-sdk) |
| Polyglot recipes | [Examples](/docs/examples#language-recipes) |
The common wiring is:
1. Deploy or run an accessible collector endpoint.
2. Install the appropriate client or server SDK package.
3. Configure `OBS_COLLECTOR_URL` and a write-only ingest key.
4. Ensure your API's CORS policy allows the `x-obs-interaction` request header.
5. Verify connectivity:
```bash
obs-unified doctor http://localhost:8790 --origin http://localhost:5173
```
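
The CORS requirement in step 4 can also be checked by hand. The sketch below is not what `obs-unified doctor` does internally (that is not documented here); it is a hypothetical helper that inspects the headers of a preflight (`OPTIONS`) response, however you obtained them, and reports whether a browser at the given origin would be allowed to send `x-obs-interaction`.

```python
def preflight_allows_interaction_header(response_headers, origin,
                                        header_name="x-obs-interaction"):
    """Return True if preflight response headers permit `header_name` from `origin`."""
    # HTTP header names are case-insensitive, so normalize them first.
    headers = {name.lower(): value for name, value in response_headers.items()}

    allowed_origin = headers.get("access-control-allow-origin", "")
    if allowed_origin not in ("*", origin):
        return False

    allowed_headers = {
        item.strip().lower()
        for item in headers.get("access-control-allow-headers", "").split(",")
    }
    # "*" acts as a wildcard only for requests made without credentials.
    return "*" in allowed_headers or header_name.lower() in allowed_headers
```

A response that echoes the dashboard origin but omits `x-obs-interaction` from `Access-Control-Allow-Headers` would return `False`, which is the usual symptom when browser requests lose their interaction id.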
## Standalone Node collector variant [#standalone-node-collector-variant]
The all-in-one Docker image is the recommended first-run Docker path. If you only want the collector service backed by Postgres and MinIO/S3, use the standalone Node collector:
```bash
cd apps/collector-node
docker compose up -d
docker compose logs -f collector
```
# Welcome (/docs)
obs-unified is unified observability for every signal, built for agentic debugging. Traces, logs, AI calls, usage events, replays, alerts, profiles, and analyses flow into **one collector**, share **one identity chain**, and appear in **one telemetry graph** agents can traverse from user action to backend trace, logs, replay, AI cost, and CPU profile.
Self-hosting is the deployment model, not the whole pitch: the default hosted path runs on Cloudflare Workers with D1 and R2, while the Node collector path uses Postgres and S3-compatible blob storage.
## What's in this documentation [#whats-in-this-documentation]
* [Getting started](/docs/getting-started) — choose Docker or local dev, then choose seeded data, Astronomy Shop, or your own app.
* [Installation](/docs/installation) — the shortest current install path for the all-in-one local image and the editable local repo.
* [Examples](/docs/examples) — runnable demos, scaffold templates, SDK examples, recipes, and migration guides.
* [SDKs](/docs/sdks) — `@obs-unified/analytics-sdk` (browser, click + interaction propagation), `@obs-unified/telemetry-sdk` (Node/Workers), plus Go and Rust.
* [SDK API reference](/docs/sdk-reference) — compact method map for init, interaction stamping, LLM/tool spans, and Cloudflare wrappers.
* [Instrumenting](/docs/instrumenting) — concrete React + Worker examples that produce signals correlated end-to-end.
* [Python Flask instrumentation](/docs/instrument-python-flask) — OpenTelemetry setup for a Flask service.
* [What to expect](/docs/what-to-expect) — the click-to-CPU scenarios the platform was built around, with the rail's role at every hop.
* [Production operations](/docs/ops/production) — reverse proxy, Postgres tuning, storage retention, and Kubernetes notes.
## The unified-stack promise [#the-unified-stack-promise]
```text
user_id → session_id → interaction_id → trace_id → span_id
```
That identity skeleton is the unifying layer. The SDKs propagate it, the collector stores it, and the dashboard plus agent-readable docs use it to connect product behavior, backend execution, AI cost, logs, replay, alerts, profiles, and analyses. Read the [SDKs](/docs/sdks) page for which ID is minted where, and [What to expect](/docs/what-to-expect) for the journeys it makes possible.
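
One way to picture the identity chain is as a record in which each signal fills in only the ids it knows, and two signals connect wherever they carry the same value. The sketch below is a hypothetical model for illustration; the class and function names are not part of the obs-unified SDKs or storage schema.

```python
from dataclasses import dataclass
from typing import Optional

# Field order mirrors the chain shown above.
CHAIN_FIELDS = ("user_id", "session_id", "interaction_id", "trace_id", "span_id")


@dataclass(frozen=True)
class IdentityChain:
    """The ids a single signal (usage event, span, log, ...) carries."""
    user_id: Optional[str] = None
    session_id: Optional[str] = None
    interaction_id: Optional[str] = None
    trace_id: Optional[str] = None
    span_id: Optional[str] = None


def shared_ids(a, b):
    """Names of the chain ids that both records carry with the same value."""
    return [
        name for name in CHAIN_FIELDS
        if getattr(a, name) is not None and getattr(a, name) == getattr(b, name)
    ]


# A browser click knows the user, session, and interaction ...
click = IdentityChain(user_id="u1", session_id="s1", interaction_id="i1")
# ... while a backend span knows the interaction, trace, and span.
span = IdentityChain(interaction_id="i1", trace_id="t1", span_id="sp1")
```

Here `shared_ids(click, span)` links the two records through `interaction_id`, which is the hop that lets a frontend event reach a backend trace.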
The click-to-CPU path is deliberately precise: the browser mints `interaction_id`, the backend stamps it onto spans/logs/AI calls, and profiles join through the `trace_id` those spans belong to. In shorthand:
```text
frontend action → interaction_id → trace_id → profile_trace_index → CPU/off-CPU profile
```
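
The same path can be sketched as two lookups over in-memory data. This is an illustration of the join order only: the span dictionaries and the shape of `profile_trace_index` (a mapping from `trace_id` to profile ids) are assumptions, not the collector's actual schema.

```python
def profiles_for_interaction(interaction_id, spans, profile_trace_index):
    """Follow interaction_id -> trace_id -> profile ids.

    `spans` is a list of dicts carrying "trace_id" and "obs.interaction.id";
    `profile_trace_index` maps a trace_id to a list of profile ids. Both
    shapes are hypothetical.
    """
    # Hop 1: every trace that contains a span stamped with this interaction.
    trace_ids = []
    for span in spans:
        trace_id = span.get("trace_id")
        if span.get("obs.interaction.id") == interaction_id and trace_id not in trace_ids:
            trace_ids.append(trace_id)

    # Hop 2: every profile indexed under those traces.
    profiles = []
    for trace_id in trace_ids:
        profiles.extend(profile_trace_index.get(trace_id, []))
    return profiles
```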
This documentation describes the platform as it ships on `main` today. For the fastest first run, use the all-in-one Docker image from [Getting started](/docs/getting-started).
# Installation (/docs/installation)
This page is the shortest install path. For a guided decision tree across runtime and data options, start with [Getting started](/docs/getting-started). If you only want to instrument a service with an SDK, jump to [SDKs](/docs/sdks).
## Prerequisites [#prerequisites]
* **Docker** for the all-in-one local image and the Astronomy Shop demo.
* **Node.js 22+ and pnpm 10+** for local repo development and scripts.
* A POSIX shell such as bash or zsh.
## Fastest first run: Docker image [#fastest-first-run-docker-image]
Build and run the local image:
```bash
pnpm local:image
pnpm local:run
```
This starts Postgres, the collector, the dashboard, a filesystem blob store, and seeded sample data in one container.
| Interface | URL |
| ---------------- | ----------------------- |
| Dashboard | `http://localhost:5173` |
| Collector ingest | `http://localhost:8790` |
Default local credentials:
| Credential | Value |
| ------------------ | ---------------- |
| Ingest write key | `dev-ingest-key` |
| Dashboard password | `e2e-test-pass` |
Run the local image smoke test when you want a full outside-the-container verification:
```bash
pnpm smoke:local-image
```
## Editable local install [#editable-local-install]
Use this when you are changing collector, dashboard, or SDK code.
```bash
git clone https://github.com/obs-unified/obs-unified.git
cd obs-unified
pnpm install
pnpm run setup
pnpm run dev
```
The local dev stack exposes:
| Service | Port | Purpose |
| ----------- | ------ | ----------------------------------------- |
| `collector` | `8790` | OTLP ingest and dashboard read API |
| `web` | `5173` | React dashboard |
| `demo` | `8787` | Synthetic demo API for end-to-end testing |
## Seed sample data [#seed-sample-data]
The Docker image seeds sample data automatically. For the editable local install, run:
```bash
pnpm run seed
```
This populates traces, logs, AI calls, usage events, issues, users, and alert rules. Replays require a real browser interaction, so use the dashboard Playground tab to capture one.
## Astronomy Shop demo [#astronomy-shop-demo]
For realistic microservice traffic and service-map edges:
```bash
pnpm demo:setup
pnpm demo:preflight
pnpm demo:up
```
Open the demo shop at `http://localhost:8080` and the dashboard at `http://localhost:5173`.
Stop it with:
```bash
pnpm demo:down
```
## Standalone Node collector [#standalone-node-collector]
If you only want the collector service with Postgres and MinIO/S3:
```bash
cd apps/collector-node
docker compose up -d
docker compose logs -f collector
```
## Production deployment [#production-deployment]
For production deployment details, see [Production operations](/docs/ops/production). At minimum, set a real `INGEST_KEY`, set `DASHBOARD_PASSWORD`, point `VITE_OBS_COLLECTOR_URL` at your collector, configure CORS for browser origins, and run migrations before accepting traffic.
Do not use `ALLOW_UNAUTHENTICATED=true` or the default local credentials in production.
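
A pre-deploy check along these lines can catch the most common mistakes. This is a minimal sketch, not a shipped obs-unified command: the function name is hypothetical, and it assumes the default local credentials listed earlier correspond to the `INGEST_KEY` and `DASHBOARD_PASSWORD` settings.

```python
# Default local values from the credentials table above. Mapping them to
# these setting names is an assumption made for illustration.
DEFAULT_LOCAL_CREDENTIALS = {
    "INGEST_KEY": "dev-ingest-key",
    "DASHBOARD_PASSWORD": "e2e-test-pass",
}


def production_config_problems(env):
    """Return a list of human-readable problems found in `env` (a dict of settings)."""
    problems = []
    for name, local_default in DEFAULT_LOCAL_CREDENTIALS.items():
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif value == local_default:
            problems.append(f"{name} still uses the default local value")
    if env.get("ALLOW_UNAUTHENTICATED", "").strip().lower() == "true":
        problems.append("ALLOW_UNAUTHENTICATED must not be true in production")
    if not env.get("VITE_OBS_COLLECTOR_URL"):
        problems.append("VITE_OBS_COLLECTOR_URL is not set")
    return problems
```

Calling it with `dict(os.environ)` in a deploy script and failing on a non-empty result is one way to use it.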
# Python Flask instrumentation (/docs/instrument-python-flask)
Use the Python OpenTelemetry SDK when instrumenting Flask services. obs-unified accepts standard OTLP/HTTP, so Python services can send traces and logs to the same collector as the first-party TypeScript SDKs.
## Install packages [#install-packages]
```bash
python -m pip install \
  opentelemetry-api \
  opentelemetry-sdk \
  opentelemetry-exporter-otlp-proto-http \
  opentelemetry-instrumentation-flask \
  opentelemetry-instrumentation-requests \
  flask
```
## Configure tracing [#configure-tracing]
```py
import os

from flask import Flask, request
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

collector_url = os.environ.get("OBS_COLLECTOR_URL", "http://localhost:8790")
ingest_key = os.environ["OBS_INGEST_KEY"]  # required; raises KeyError if unset

provider = TracerProvider(
    resource=Resource.create({
        "service.name": "flask-api",
        "service.version": "1.0.0",
    })
)
# Batch spans and export them to the collector over OTLP/HTTP.
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint=f"{collector_url}/v1/traces",
            headers={"authorization": f"Bearer {ingest_key}"},
        )
    )
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("flask-api")

app = Flask(__name__)
# Instrument inbound Flask requests and outbound `requests` calls.
FlaskInstrumentor().instrument_app(app)
RequestsInstrumentor().instrument()
```
## Stamp browser interaction ids [#stamp-browser-interaction-ids]
When a request comes from a browser instrumented with `@obs-unified/analytics-sdk`, the request carries `x-obs-interaction`. Copy that value onto the active span as `obs.interaction.id`.
```py
@app.before_request
def stamp_interaction_id():
    interaction_id = request.headers.get("x-obs-interaction")
    if not interaction_id:
        return
    span = trace.get_current_span()
    if span and span.is_recording():
        span.set_attribute("obs.interaction.id", interaction_id)
```
## Create child spans [#create-child-spans]
```py
@app.get("/api/recommendations")
def recommendations():
    with tracer.start_as_current_span("recommendations.query") as span:
        span.set_attribute("db.system", "postgresql")
        rows = load_recommendations()
        return {"items": rows}
```
## Verify [#verify]
Start your collector, set the environment, then call the Flask route from an instrumented browser or with a manual header:
```bash
export OBS_COLLECTOR_URL=http://localhost:8790
export OBS_INGEST_KEY=dev-ingest-key
flask --app app run --port 5000
curl -H 'x-obs-interaction: demo_click_123' http://localhost:5000/api/recommendations
```
The trace should appear in the Traces tab with `obs.interaction.id`, allowing the Connected rail to pivot back to the originating browser interaction when one exists.
# Instrumenting your app (/docs/instrumenting)
This page walks through wiring `@obs-unified/analytics-sdk` and `@obs-unified/telemetry-sdk` into a real two-tier app — a React frontend that calls a Worker backend — so a single user click produces a usage event, a span, a log, and (optionally) an AI call, all carrying the same `interaction_id`.
## What "end-to-end correlation" means here [#what-end-to-end-correlation-means-here]
```text
[Browser]                                    [Server]
─────────                                    ────────
click button
  → mint interaction_id
  → push usage event ─────────────────────→  /v1/usage (collector)
  → fetch("/api/checkout")
      headers["x-obs-interaction"] = id
                   │
                   ▼
                                             stampInteractionFromRequest(span, req)
                                             span.attributes["obs.interaction.id"] = id
                                               → child spans inherit
                                               → logs from this handler inherit
                                               → AI calls in this handler inherit
                                               → export span ─────→ /v1/traces (collector)
```
Every signal that flows out of the server while that request is in flight carries the same `interaction_id`. The dashboard's [Connected rail](/docs/what-to-expect) reads that column to surface "the click that caused this trace" in one click.
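The handoff in the diagram can be sketched as a pure function. This is a simplified model rather than the SDK's implementation of `stampInteractionFromRequest`; the function name `stampInteraction` and the plain-object header and attribute shapes are assumptions made for illustration.

```ts
// Simplified model of the header-to-attribute handoff (not the SDK's code).
const INTERACTION_HEADER = "x-obs-interaction";
const INTERACTION_ATTRIBUTE = "obs.interaction.id";

type SpanAttributes = Record<string, string>;

// Copy the interaction id from inbound headers onto a span's attributes.
// Returns the attributes unchanged when the header is absent or blank.
function stampInteraction(
  headers: Record<string, string | undefined>,
  attributes: SpanAttributes,
): SpanAttributes {
  // HTTP header names are case-insensitive, so normalize before comparing.
  const match = Object.entries(headers).find(
    ([name]) => name.toLowerCase() === INTERACTION_HEADER,
  );
  const id = match?.[1]?.trim();
  return id ? { ...attributes, [INTERACTION_ATTRIBUTE]: id } : attributes;
}

const stamped = stampInteraction(
  { "X-Obs-Interaction": "demo_click_123" },
  { "url.path": "/api/checkout" },
);
console.log(stamped[INTERACTION_ATTRIBUTE]); // "demo_click_123"
```

A request without the header leaves the span untouched, which is the case the dashboard reports as server-originated work.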
## Frontend (React + Vite) [#frontend-react--vite]
### 1. Install + wrap [#1-install--wrap]
```tsx
// src/main.tsx
import { AnalyticsProvider, AnalyticsErrorBoundary } from "@obs-unified/analytics-sdk/react";
import { createRoot } from "react-dom/client";
import { App } from "./App";
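createRoot(document.getElementById("root")!).render(
  // Pass your collector URL and ingest key as AnalyticsProvider props.
  <AnalyticsProvider>
    <AnalyticsErrorBoundary fallback={<p>Something crashed.</p>}>
      <App />
    </AnalyticsErrorBoundary>
  </AnalyticsProvider>,
);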
```
By default `AnalyticsProvider` installs **Mode A auto-correlation**: clicks/submits/keydowns mint `interaction_id` and `window.fetch` is patched to inject `x-obs-interaction`. No per-button wiring needed for the happy path.
### 2. Identify your user [#2-identify-your-user]
```tsx
import { useEffect } from "react";
import { useAnalytics } from "@obs-unified/analytics-sdk/react";
export function App() {
const { identify } = useAnalytics();
useEffect(() => {
const user = readCurrentUser();
if (user) {
identify(user.id, {
email: user.email,
plan: user.plan,
role: user.role,
});
}
}, [identify]);
return <>{/* your application's routes and pages */}</>;
}
```
After this call, the dashboard's user-detail page (`/#/users/`) shows this user's sessions, traces, AI calls, and replay.
### 3. Track meaningful interactions explicitly [#3-track-meaningful-interactions-explicitly]
Auto-tracked clicks give you a usage event for every DOM click, which is noisy. For business-meaningful events, track explicitly:
```tsx
const { trackInteraction } = useAnalytics();
const onSubmit = async () => {
const res = await fetch("/api/checkout", { method: "POST" });
trackInteraction("checkout_submitted", {
status: res.status,
cartValue: cart.total,
});
};
```
### 4. Mode B for async work [#4-mode-b-for-async-work]
```tsx
import { useAnalytics } from "@obs-unified/analytics-sdk/react";
export function DebouncedSearchBox() {
const { withInteraction } = useAnalytics();
const onChange = withInteraction(
debounce(async (query: string) => {
// Even after the debounce delay, this fetch carries the
// click's interaction_id because withInteraction snapshotted
// it at call time.
await fetch(`/api/search?q=${encodeURIComponent(query)}`);
}, 300),
);
return <input onChange={(e) => onChange(e.target.value)} />;
}
```
## Backend (Cloudflare Worker / Hono) [#backend-cloudflare-worker--hono]
### 1. Two middlewares [#1-two-middlewares]
```ts
// src/index.ts
import { Hono } from "hono";
import { cors } from "hono/cors";
import {
createRequestSpan,
initObservability,
runWithSpan,
stampInteractionFromRequest,
flushLogs,
flushAICalls,
} from "@obs-unified/telemetry-sdk";
const app = new Hono<{ Bindings: Env }>();
// CORS — explicitly allow the obs headers or the browser will strip
// them via preflight.
app.use(
"*",
cors({
origin: ["https://app.example.com"],
credentials: true,
allowHeaders: ["Content-Type", "Authorization", "X-Obs-Session-Id", "x-obs-interaction"],
}),
);
// Bootstrap the SDK once per request.
app.use("*", async (c, next) => {
initObservability({
collectorUrl: c.env.OBS_COLLECTOR_URL,
apiKey: c.env.OBS_INGEST_KEY,
serviceName: "checkout-api",
});
await next();
});
// Root span + interaction stamping.
app.use("*", async (c, next) => {
const span = createRequestSpan("checkout-api", `${c.req.method} ${c.req.path}`);
span.setAttribute("http.request.method", c.req.method);
span.setAttribute("url.path", c.req.path);
stampInteractionFromRequest(span, c.req.raw);
const sessionId = c.req.header("x-obs-session-id");
if (sessionId) span.setAttribute("session.id", sessionId);
try {
await runWithSpan(span, () => next());
span.setAttribute("http.response.status_code", c.res.status);
span.setStatus(c.res.status >= 400 ? 2 : 1);
} catch (err) {
span.setStatus(2, err instanceof Error ? err.message : String(err));
throw err;
} finally {
span.end();
await exportSpan(c.env, span);
await Promise.all([flushLogs(), flushAICalls()]).catch(() => {});
}
});
```
### 2. Child spans + logs in your handlers [#2-child-spans--logs-in-your-handlers]
```ts
import { withChildSpan, createLogger } from "@obs-unified/telemetry-sdk";
const log = createLogger("checkout-api");
app.post("/api/checkout", async (c) => {
const { cart } = await c.req.json(); // the handler below reads cart.total
const user = await withChildSpan("db.query.user", async (child) => {
child.setAttribute("db.system", "postgres");
return await db.query("SELECT * FROM users WHERE id = $1", [c.var.userId]);
});
log.info("Charging payment", { userId: user.id, total: cart.total });
const charge = await withChildSpan("payment.charge", async (child) => {
child.setAttribute("stripe.amount", cart.total);
return await stripe.charges.create({ amount: cart.total });
});
return c.json({ chargeId: charge.id });
});
```
Both the child spans and the log inherit `interaction_id` from the root span. Everything ends up correlated.
### 3. AI calls [#3-ai-calls]
If your backend invokes an LLM, instrument it as an OpenInference-typed span:
```ts
import { setAISessionContext } from "@obs-unified/telemetry-sdk";
setAISessionContext({ sessionId, userId });
const response = await openai.chat.completions.create({ /* ... */ });
// The SDK's OpenAI wrapper (or your own typed helper) emits a span
// with openinference.span.kind=LLM, llm.cost_usd, llm.token_count.*, etc.
```
AI calls also land in the `ai_calls` denormalized table that the dashboard's AI tab and the Connected rail's "AI calls in this trace" section both read.
## What you should see [#what-you-should-see]
After wiring both ends and hitting a route, open the dashboard:
1. **Usage tab** — your tracked interaction appears
2. **Traces tab** — the root span + child spans, with `obs.interaction.id` in attributes
3. **Logs tab** — your `log.info("Charging payment", …)` row, joined to the trace
4. **AI Calls tab** — the LLM span (if any), joined to the same trace
5. **Replay tab** — the user's session, with the interaction listed under "Interactions in this session" — clicking it opens the trace that the click caused
If "Click that caused this trace" on the span detail rail shows the absence text (`Server-originated work — not bound to a user click`), the most common causes are:
* The browser SDK isn't installed or `installAutoCorrelate` was disabled
* CORS preflight is stripping `x-obs-interaction` — add it to `allowHeaders`
* The server isn't calling `stampInteractionFromRequest()` on the root span
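To check the CORS cause without a browser, you can inspect the `Access-Control-Allow-Headers` value returned by an `OPTIONS` preflight. The helper below is a hypothetical sketch (the name `preflightAllowsHeader` is not part of any SDK). It follows the Fetch rule that the `*` wildcard only applies to requests made without credentials, which matters here because the backend example uses `credentials: true`.

```ts
// Returns true when a preflight response's Access-Control-Allow-Headers value
// permits the given request header. Header names compare case-insensitively.
function preflightAllowsHeader(
  allowHeaders: string | null,
  header: string,
  credentialed = true,
): boolean {
  if (!allowHeaders) return false;
  const allowed = allowHeaders.split(",").map((h) => h.trim().toLowerCase());
  // "*" is only a wildcard for requests sent without credentials.
  if (!credentialed && allowed.includes("*")) return true;
  return allowed.includes(header.toLowerCase());
}

console.log(preflightAllowsHeader("Content-Type, Authorization", "x-obs-interaction")); // false
console.log(preflightAllowsHeader("Content-Type, X-Obs-Interaction", "x-obs-interaction")); // true
```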
## Going further [#going-further]
* **Multiple services**: each service initializes the SDK with its own `serviceName`. The W3C Trace Context `traceparent` header propagates trace context across service boundaries; `x-obs-interaction` propagates the click-scoped key. Native `fetch` does not copy inbound request headers onto outbound calls, so pass them explicitly when fanning out:
```ts
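// Forward only the headers that are present; an empty traceparent is invalid.
const forwarded: Record<string, string> = {};
for (const name of ["x-obs-interaction", "traceparent"]) {
  const value = c.req.header(name);
  if (value) forwarded[name] = value;
}
await fetch(downstreamUrl, { headers: forwarded });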
```
* **Node.js (non-Workers)**: same SDK, just call `initObservability` once at startup. The `flushLogs` / `flushAICalls` calls become periodic background tasks instead of per-request flushes.
* **eBPF agents**: drop in [Beyla](https://grafana.com/oss/beyla/) pointed at the obs-unified collector's `/v1/traces` and it'll emit spans tagged `telemetry.sdk.name=beyla`. The dashboard's Service Map tab filters by `SDK | eBPF | ALL`.
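When debugging cross-service propagation, it can help to validate the `traceparent` value a downstream service actually received. The parser below is a minimal sketch of the W3C Trace Context version `00` format (`version-traceid-parentid-flags`); the `TraceParent` type and `parseTraceparent` name are illustrative and not part of the obs SDKs.

```ts
interface TraceParent {
  traceId: string;
  parentSpanId: string;
  sampled: boolean;
}

// Parse a version-00 traceparent header; returns null for malformed values.
function parseTraceparent(header: string): TraceParent | null {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header.trim());
  if (!match) return null;
  const [, traceId, parentSpanId, flags] = match;
  // All-zero trace or span ids are invalid in the Trace Context format.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return null;
  // Bit 0 of the flags byte is the "sampled" flag.
  return { traceId, parentSpanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

const parsed = parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
console.log(parsed?.traceId); // "4bf92f3577b34da6a3ce929d0e0e4736"
```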
# Production operations (/docs/ops/production)
This page covers production deployment patterns for the standalone Node collector backed by Postgres and S3-compatible object storage. The Cloudflare Workers path remains supported, but the Node collector is the clearest fit for container platforms and private infrastructure.
## Network and reverse proxy [#network-and-reverse-proxy]
The recommended layout places the collector behind TLS, keeps OTLP ingest routes reachable, and protects dashboard/internal query routes with your normal auth boundary.
| Route | Backend | Purpose |
| ------------- | ---------------------------------- | -------------------------- |
| `/v1/*` | Collector on `:8790` | Public OTLP and SDK ingest |
| `/internal/*` | Collector on `:8790` | Dashboard query APIs |
| `/` | Dashboard static server on `:5173` | React dashboard |
### Nginx [#nginx]
```nginx
server {
listen 443 ssl http2;
server_name obs.my-app.com;
ssl_certificate /etc/letsencrypt/live/obs.my-app.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/obs.my-app.com/privkey.pem;
gzip on;
gzip_types application/json application/x-protobuf text/plain text/css;
location /v1/ {
proxy_pass http://127.0.0.1:8790;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
client_max_body_size 10m;
}
location /internal/ {
proxy_pass http://127.0.0.1:8790;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
location / {
proxy_pass http://127.0.0.1:5173;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# No try_files here: it resolves against the local filesystem, not the proxied upstream.
}
}
```
### Caddy [#caddy]
```text
obs.my-app.com {
    reverse_proxy /v1/* 127.0.0.1:8790 {
        header_up Host {host}
        header_up X-Real-IP {remote_host}
    }
    reverse_proxy /internal/* 127.0.0.1:8790 {
        header_up Host {host}
        header_up X-Real-IP {remote_host}
    }
    reverse_proxy /* 127.0.0.1:5173 {
        header_up Host {host}
        header_up X-Real-IP {remote_host}
    }
}
```
## Postgres tuning [#postgres-tuning]
Each collector instance uses a persistent client-side pool. Start with the default `PG_POOL_MAX=10` for low to moderate traffic. For high-throughput environments, scale to `30` or `50`, making sure the total across replicas stays below the database server's `max_connections`.
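The budget check is simple arithmetic: replicas multiplied by `PG_POOL_MAX` must stay under `max_connections`. The sketch below makes that explicit. The `reserved` headroom (for migrations, admin sessions, and other clients) is an assumption of this example, not a collector setting, and the function names are hypothetical.

```ts
// Sketch only: `reserved` is an assumed safety margin, not a collector option.
function fitsConnectionBudget(
  poolMax: number,
  replicas: number,
  maxConnections: number,
  reserved = 10,
): boolean {
  return poolMax * replicas + reserved <= maxConnections;
}

// Largest PG_POOL_MAX each replica can use without exceeding the budget.
function maxPoolPerReplica(maxConnections: number, replicas: number, reserved = 10): number {
  if (replicas < 1) throw new RangeError("replicas must be at least 1");
  return Math.max(0, Math.floor((maxConnections - reserved) / replicas));
}

// Two replicas against a server with max_connections = 100:
console.log(fitsConnectionBudget(30, 2, 100)); // true: 60 + 10 <= 100
console.log(fitsConnectionBudget(50, 2, 100)); // false: 100 + 10 > 100
console.log(maxPoolPerReplica(100, 2)); // 45
```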
Set a statement timeout so expensive dashboard or analysis queries cannot tie up database connections indefinitely:
```bash
PG_STATEMENT_TIMEOUT=30000
```
## Object storage retention [#object-storage-retention]
Replay chunks and pprof profiles live in S3-compatible object storage. Match bucket lifecycle retention to the database retention window. For a 72-hour hot-debugging window, use a three-day expiration rule.
```json
{
"Rules": [
{
"ID": "AutoDeleteOldTelemetryBlobs",
"Status": "Enabled",
"Filter": {
"Prefix": ""
},
"Expiration": {
"Days": 3
}
}
]
}
```
Apply it with:
```bash
aws s3api put-bucket-lifecycle-configuration \
--bucket obs-unified-storage-bucket \
--lifecycle-configuration file://lifecycle-policy.json
```
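If the database retention window is configured in hours, the lifecycle rule can be derived from it instead of maintained by hand. The sketch below rounds up to whole days so blobs never expire before the rows that reference them. `lifecycleForRetention` is a hypothetical helper; the JSON shape mirrors the policy shown above.

```ts
interface LifecycleConfiguration {
  Rules: Array<{
    ID: string;
    Status: "Enabled" | "Disabled";
    Filter: { Prefix: string };
    Expiration: { Days: number };
  }>;
}

// Build an S3 lifecycle policy whose expiration covers the retention window.
function lifecycleForRetention(retentionHours: number, prefix = ""): LifecycleConfiguration {
  if (!(retentionHours > 0)) throw new RangeError("retentionHours must be positive");
  // Lifecycle expiration is expressed in whole days, so round up.
  const days = Math.ceil(retentionHours / 24);
  return {
    Rules: [
      {
        ID: "AutoDeleteOldTelemetryBlobs",
        Status: "Enabled",
        Filter: { Prefix: prefix },
        Expiration: { Days: days },
      },
    ],
  };
}

const policy = lifecycleForRetention(72);
console.log(JSON.stringify(policy, null, 2)); // Expiration.Days is 3
```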
## Kubernetes shape [#kubernetes-shape]
Run collectors as stateless replicas against managed Postgres and S3-compatible storage.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: obs-config
  namespace: observability
data:
  BLOB_STORE: "s3"
  S3_REGION: "us-east-1"
  S3_BUCKET: "my-production-obs-bucket"
  PG_POOL_MAX: "30"
  PORT: "8790"
---
apiVersion: v1
kind: Secret
metadata:
  name: obs-secrets
  namespace: observability
type: Opaque
data:
  DATABASE_URL: cG9zdGdyZXM6Ly91c2VyOnBhc3NAcGctaG9zdDo1NDMyL29ic191bmlmaWVk
  S3_ACCESS_KEY_ID: QUtJQVhYWFhYWFhYWFhYWFhYWFg=
  S3_SECRET_ACCESS_KEY: c2VjcmV0LWtleS12YWx1ZS1nb2VzLWhlcmU=
  INGEST_KEY: bXktc2VjdXJlLXdyaXRlLWtleQ==
  DASHBOARD_PASSWORD: bXktc3VwZXItc2VjdXJlLXBhc3N3b3Jk
```
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: obs-collector
  namespace: observability
spec:
  replicas: 2
  selector:
    matchLabels:
      app: obs-collector
  template:
    metadata:
      labels:
        app: obs-collector
    spec:
      containers:
        - name: collector
          image: obs-unified/collector:latest
          ports:
            - containerPort: 8790
          envFrom:
            - configMapRef:
                name: obs-config
            - secretRef:
                name: obs-secrets
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
```
## Operational checks [#operational-checks]
Before opening ingest traffic:
1. Run database migrations for the target collector storage.
2. Confirm `/health` responds through the reverse proxy.
3. Verify CORS from the browser origin with `obs-unified doctor`.
4. Send one synthetic trace, one log, and one usage event.
5. Confirm dashboard login and internal query routes are not publicly exposed without your intended auth.
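For step 4, a synthetic trace can be built by hand as OTLP/JSON. The sketch below only constructs the payload. The service name, span name, and `synthetic` attribute are placeholders, and the helper names are hypothetical. In OTLP/JSON, trace and span ids are hex strings and 64-bit timestamps are encoded as decimal strings.

```ts
// Random lowercase hex string of the given byte length.
function randomHex(bytes: number): string {
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return out;
}

// Build a one-span OTLP/JSON trace export request.
function syntheticTracePayload(serviceName: string, nowMs: number = Date.now()) {
  // Milliseconds to nanoseconds: append six zeros (avoids 64-bit overflow).
  const toNanos = (ms: number): string => `${Math.trunc(ms)}000000`;
  return {
    resourceSpans: [
      {
        resource: {
          attributes: [{ key: "service.name", value: { stringValue: serviceName } }],
        },
        scopeSpans: [
          {
            scope: { name: "preflight-check" },
            spans: [
              {
                traceId: randomHex(16), // 32 hex characters
                spanId: randomHex(8), // 16 hex characters
                name: "synthetic.preflight",
                kind: 1, // SPAN_KIND_INTERNAL
                startTimeUnixNano: toNanos(nowMs),
                endTimeUnixNano: toNanos(nowMs + 5),
                attributes: [{ key: "synthetic", value: { boolValue: true } }],
              },
            ],
          },
        ],
      },
    ],
  };
}

const payload = syntheticTracePayload("preflight-check", 1700000000000);
console.log(JSON.stringify(payload, null, 2));
```

POST the serialized payload to `/v1/traces` with `content-type: application/json` and the bearer ingest key, then confirm the span shows up in the Traces tab.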
# SDK API reference (/docs/sdk-reference)
This page is a centralized, searchable cheat sheet for the browser SDK (`@obs-unified/analytics-sdk`) and server-side SDKs (`@obs-unified/telemetry-sdk`, Go, and Rust).
## SDK initialization [#sdk-initialization]
### Browser / client-side [#browser--client-side]
Initialize the browser SDK to track page views, click interactions, frontend exceptions, and optional DOM session replay.
```tsx
import { AnalyticsProvider } from "@obs-unified/analytics-sdk/react";
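function Root() {
  return (
    // Pass your collector URL and ingest key as AnalyticsProvider props.
    <AnalyticsProvider>{/* your application tree */}</AnalyticsProvider>
  );
}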
```
For non-React hosts:
```ts
import { installAutoCorrelate, UsageTracker } from "@obs-unified/analytics-sdk";
const tracker = new UsageTracker({
collectorUrl: "https://obs.my-app.com",
apiKey: "your-public-ingest-key",
});
installAutoCorrelate({ tracker });
```
### Backend / server-side [#backend--server-side]
Initialize the standard telemetry exporter pipeline once at application startup.
```ts
import { initObservability } from "@obs-unified/telemetry-sdk";
initObservability({
collectorUrl: "https://obs.my-app.com",
apiKey: process.env.OBS_INGEST_KEY!,
serviceName: "checkout-api",
serviceVersion: "1.2.0",
flushIntervalMs: 5000,
});
```
## Click-to-trace context [#click-to-trace-context]
### Server-side HTTP ingress [#server-side-http-ingress]
Extract `x-obs-interaction` and bind it onto the active trace context.
```ts
import {
createRequestSpan,
runWithSpan,
stampInteractionFromRequest,
} from "@obs-unified/telemetry-sdk";
const span = createRequestSpan("my-service", `${req.method} ${req.path}`);
stampInteractionFromRequest(span, req);
runWithSpan(span, () => {
// Child spans and logs created here inherit the interaction id.
});
```
### Client-side async continuity [#client-side-async-continuity]
Auto-correlation handles synchronous handlers and shallow `await` chains. For debounces, timers, and state-machine ticks, capture and restore the interaction context.
```ts
import {
currentInteractionId,
withInteractionContext,
} from "@obs-unified/analytics-sdk";
const clickId = currentInteractionId();
setTimeout(() => {
withInteractionContext(clickId!, () => {
fetch("/api/long-running-task");
});
}, 300);
```
## AI and LLM span tracking [#ai-and-llm-span-tracking]
### High-level event tracking [#high-level-event-tracking]
```ts
import { trackAICall } from "@obs-unified/telemetry-sdk";
trackAICall({
modelName: "claude-3-5-sonnet-20241022",
provider: "anthropic",
callType: "chat",
promptTokens: 250,
completionTokens: 180,
latencyMs: 1400,
totalCostUsd: 0.0035,
isError: false,
});
```
### OpenInference-style spans [#openinference-style-spans]
```ts
import { startLLMSpan, startToolSpan } from "@obs-unified/telemetry-sdk";
const llmSpan = startLLMSpan("user-query-completion", {
modelName: "gpt-4o",
provider: "openai",
inputPrompt: "What is the capital of France?",
});
try {
const result = await myLLMApiCall();
llmSpan.setAttributes({
"output.value": result.text,
"llm.token_count.prompt": result.prompt_tokens,
"llm.token_count.completion": result.completion_tokens,
});
} finally {
llmSpan.end();
}
const toolSpan = startToolSpan("database-search-tool", {
toolName: "pg-vector-search",
toolInput: "paris coordinates",
});
try {
const data = await searchDb();
toolSpan.setAttributes({ "output.value": JSON.stringify(data) });
} finally {
toolSpan.end();
}
```
## Cloudflare Workers wrappers [#cloudflare-workers-wrappers]
Import these from `@obs-unified/telemetry-sdk/cloudflare` to avoid pulling heavy ambient type dependencies into Node.js environments.
```ts
import { wrapD1, wrapFetch, wrapR2 } from "@obs-unified/telemetry-sdk/cloudflare";
const db = wrapD1(env.DATABASE);
const bucket = wrapR2(env.STORAGE_BUCKET, { bucketName: "blobs" });
const telemetryFetch = wrapFetch(globalThis.fetch);
```
## Cross-language mapping [#cross-language-mapping]
| Concept / action | TypeScript SDK | Go SDK | Rust SDK |
| ------------------ | ---------------------------------------- | ------------------------------------ | ------------------------------------------- |
| Exporter init | `initObservability(config)` | `obs.Init(ctx, config)` | `obs_unified::init(config)` |
| HTTP stamp | `stampInteractionFromRequest(span, req)` | `obs.StampInteraction(ctx, r)` | `obs_unified::stamp_interaction(span, req)` |
| LLM tracking span | `startLLMSpan(name, options)` | `obs.StartLLMSpan(ctx, name, opts)` | `obs_unified::start_llm_span(name, opts)` |
| Tool tracking span | `startToolSpan(name, options)` | `obs.StartToolSpan(ctx, name, opts)` | `obs_unified::start_tool_span(name, opts)` |
| Project id | `extraHeaders: { "x-project-id": id }` | `obs.SetProjectID(ctx, id)` | `obs_unified::set_project_id(id)` |
| Capture exception | `annotateErrorSpan(span, error)` | `obs.AnnotateError(ctx, span, err)` | `obs_unified::annotate_error(span, err)` |
# SDKs (/docs/sdks)
obs-unified ships first-party SDKs for **three backend languages** plus a browser SDK, all sharing one identity-propagation contract.
For a compact method-by-method cheat sheet, see [SDK API reference](/docs/sdk-reference).
## Backend SDKs [#backend-sdks]
| Language | Install | Where it runs |
| -------------- | ---------------------------------------------------------- | ----------------------------------------- |
| **TypeScript** | `pnpm add @obs-unified/telemetry-sdk` from GitHub Packages | Node.js · Bun · Deno · Cloudflare Workers |
| **Go** | `go get github.com/obs-unified/obs-unified/sdks/go@latest` | Any Go 1.22+ binary |
| **Rust** | Git dependency from `sdks/rust` | Any Rust 1.75+ binary (Tokio runtime) |
All three expose the **same API surface**, so muscle memory ports across runtimes:
| Concept | TypeScript | Go | Rust |
| ---------- | ---------------------------------------- | ------------------------------------ | ------------------------------------------- |
| Init | `initObservability(config)` | `obs.Init(ctx, cfg)` | `obs_unified::init(cfg)` |
| HTTP stamp | `stampInteractionFromRequest(span, req)` | `obs.StampInteraction(ctx, r)` | `obs_unified::stamp_interaction(span, req)` |
| LLM span   | `startLLMSpan(name, opts)`               | `obs.StartLLMSpan(ctx, name, opts)`  | `obs_unified::start_llm_span(name, opts)`   |
| Tool span  | `startToolSpan(name, opts)`              | `obs.StartToolSpan(ctx, name, opts)` | `obs_unified::start_tool_span(name, opts)`  |
| Project ID | `extraHeaders: { "x-project-id": id }`   | `obs.SetProjectID(ctx, id)`          | `obs_unified::set_project_id(id)`           |
These are deliberately **thin wrappers around OpenTelemetry**: one-line collector init, OpenInference-shaped helpers so LLM/tool spans render in the dashboard's AI tab, project-id propagation for multi-tenant, and the loop-guard header for services that ingest their own telemetry. HTTP server/client and database instrumentation come from the OTel ecosystem of your language (`@opentelemetry/instrumentation-http`, `otelhttp`, `tower-http` + `tracing-opentelemetry`, etc.).
If your backend language isn't listed, any OTLP-compatible OpenTelemetry SDK works as a fallback — you just lose the `x-obs-interaction` click-to-trace stitching unless you set the header yourself.
## Browser SDK [#browser-sdk]
| Package | Where it runs | What it does |
| ---------------------------- | -------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| `@obs-unified/analytics-sdk` | Browser (React or vanilla) | Tracks usage events, identifies users, mints **interaction\_id** on every click, auto-injects `x-obs-interaction` on outbound `fetch`/XHR |
This one is browser-only by definition — there's no parallel Go/Rust analytics SDK because that surface area is the web.
All four packages live in the obs-unified monorepo. The TypeScript packages publish to GitHub Packages under `@obs-unified/*`; GitHub Packages requires authentication even for public packages. The Go SDK is consumed straight from the public `sdks/go` module path. The Rust SDK is currently consumed as a Git dependency until a crates.io release is cut.
### GitHub Packages setup [#github-packages-setup]
```bash
pnpm config set @obs-unified:registry https://npm.pkg.github.com
pnpm login --scope=@obs-unified --auth-type=legacy --registry=https://npm.pkg.github.com
```
Use your GitHub username and a classic personal access token with `read:packages`.
## The identity skeleton [#the-identity-skeleton]
```text
user_id → session_id → interaction_id → trace_id → span_id
```
The browser SDK owns the left half (user / session / interaction). The server SDK owns the right half (trace / span). They meet at the **`x-obs-interaction` request header** that the browser sets on outbound fetches and the server stamps onto the resulting span.
Once a signal carries `interaction_id`, the dashboard's [Connected rail](/docs/what-to-expect) can pivot from any entity to "the click that caused this trace" in one click.
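The pivot itself is a join on `interaction_id`. The sketch below models it over in-memory arrays. The `UsageEvent` and `Span` shapes and the `clickThatCausedTrace` name are assumptions for illustration and do not describe the collector's schema.

```ts
interface UsageEvent {
  interactionId: string;
  sessionId: string;
  userId: string;
  name: string;
}

interface Span {
  traceId: string;
  spanId: string;
  attributes: Record<string, string | undefined>;
}

// Find the usage event whose interaction id was stamped onto the given trace.
// Returns undefined for server-originated traces with no stamped span.
function clickThatCausedTrace(
  traceId: string,
  spans: Span[],
  events: UsageEvent[],
): UsageEvent | undefined {
  const interactionId = spans
    .filter((span) => span.traceId === traceId)
    .map((span) => span.attributes["obs.interaction.id"])
    .find((id) => id !== undefined);
  if (!interactionId) return undefined;
  return events.find((event) => event.interactionId === interactionId);
}

const spans: Span[] = [
  { traceId: "t1", spanId: "a", attributes: { "obs.interaction.id": "click_5" } },
  { traceId: "t2", spanId: "b", attributes: {} },
];
const events: UsageEvent[] = [
  { interactionId: "click_5", sessionId: "s1", userId: "user-42", name: "checkout_submitted" },
];

console.log(clickThatCausedTrace("t1", spans, events)?.name); // "checkout_submitted"
console.log(clickThatCausedTrace("t2", spans, events)); // undefined
```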
For CPU and off-CPU work, `interaction_id` is the entry key, not the profiler's storage key. Profiles are indexed by `trace_id`; when a profiled service labels samples with trace IDs, the path becomes:
```text
interaction_id → trace_id → profile_trace_index → CPU/off-CPU profile
```
## @obs-unified/analytics-sdk [#obs-unifiedanalytics-sdk]
### Install [#install]
```bash
pnpm config set @obs-unified:registry https://npm.pkg.github.com
pnpm login --scope=@obs-unified --auth-type=legacy --registry=https://npm.pkg.github.com
pnpm add @obs-unified/analytics-sdk
```
For React hosts the SDK exports a provider component; for other hosts call the lower-level `installAutoCorrelate()` directly.
### React quick start [#react-quick-start]
```tsx
import { AnalyticsProvider, AnalyticsErrorBoundary } from "@obs-unified/analytics-sdk/react";
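export function Root() {
  return (
    // Pass your collector URL and ingest key as AnalyticsProvider props.
    <AnalyticsProvider>
      <AnalyticsErrorBoundary fallback={<p>Something crashed.</p>}>
        {/* your application tree */}
      </AnalyticsErrorBoundary>
    </AnalyticsProvider>
  );
}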
```
The provider:
* Mints a `session_id` on first mount and keeps it until the session has been inactive for \~30 minutes
* Tracks page views automatically on `pushState` / `popstate`
* Captures uncaught errors and unhandled promise rejections
* Installs the **Mode A** auto-correlator: a global capture-phase listener on `click`, `submit`, `keydown` that mints a fresh `interaction_id`, plus a global `fetch` + `XMLHttpRequest` wrapper that injects the `x-obs-interaction` header on outbound requests
* Provides `useAnalytics()` hook for explicit calls
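The session behavior in the first bullet can be modeled as a small state transition: any activity inside the idle window keeps the current id, and activity after the window mints a new one. This is a sketch under stated assumptions (an exact 30-minute window and an injected `mintId` function), not the provider's actual code.

```ts
const SESSION_IDLE_MS = 30 * 60 * 1000;

interface SessionState {
  id: string;
  lastActivityMs: number;
}

// Return the session to use for an activity happening at `nowMs`.
function touchSession(
  prev: SessionState | null,
  nowMs: number,
  mintId: () => string,
): SessionState {
  if (prev !== null && nowMs - prev.lastActivityMs <= SESSION_IDLE_MS) {
    // Still inside the idle window: keep the id, refresh the timestamp.
    return { id: prev.id, lastActivityMs: nowMs };
  }
  // First activity, or the previous session went idle: start a new session.
  return { id: mintId(), lastActivityMs: nowMs };
}

let counter = 0;
const mintId = (): string => `session_${++counter}`;

const first = touchSession(null, 0, mintId);
const same = touchSession(first, 29 * 60 * 1000, mintId);
const rolled = touchSession(same, same.lastActivityMs + 31 * 60 * 1000, mintId);
console.log(first.id, same.id, rolled.id); // session_1 session_1 session_2
```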
### Inside a component [#inside-a-component]
```tsx
import { useAnalytics } from "@obs-unified/analytics-sdk/react";
export function CheckoutButton() {
const { trackInteraction, identify, withInteraction } = useAnalytics();
const onSubmit = withInteraction(async () => {
// fetch() inside here automatically carries x-obs-interaction
const res = await fetch("/api/checkout", { method: "POST" });
trackInteraction("checkout_submitted", { status: res.status });
});
return <button onClick={onSubmit}>Checkout</button>;
}
```
### Identifying users [#identifying-users]
```ts
identify("user-42", { email: "alice@example.com", plan: "pro" });
```
Calls `POST /v1/identify` on the collector with `userId`, `visitorId`, `email`, `name`, `properties`. The endpoint upserts into `user_profiles` and uses `MIN(...)` to preserve the earliest `first_seen_at` on conflict.
For historical imports / backfills, you can also pass an optional `firstSeenAt` ISO timestamp:
```ts
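await fetch(`${collector}/v1/identify`, {
  method: "POST",
  // Declare the JSON body so the collector parses it correctly.
  headers: { "content-type": "application/json" },
  body: JSON.stringify({
    userId: "user-42",
    visitorId: "vis-abc",
    email: "alice@example.com",
    firstSeenAt: "2026-01-15T08:30:00Z",
  }),
});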
```
Future timestamps are rejected silently — clock-skewed clients can't poison the table.
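The two rules above (keep the earliest `first_seen_at`, ignore future timestamps) can be modeled in memory as follows. This is a sketch, not the collector's SQL. It assumes that "rejected silently" means the supplied timestamp is discarded and the current time is used instead, and `upsertFirstSeen` is a hypothetical name.

```ts
// userId -> first_seen_at in epoch milliseconds.
type FirstSeenTable = Map<string, number>;

function upsertFirstSeen(
  table: FirstSeenTable,
  userId: string,
  firstSeenAtIso: string | undefined,
  nowMs: number,
): void {
  const parsed = firstSeenAtIso === undefined ? NaN : Date.parse(firstSeenAtIso);
  // Assumption: missing, unparseable, or future timestamps fall back to "now".
  const candidate = Number.isNaN(parsed) || parsed > nowMs ? nowMs : parsed;
  const existing = table.get(userId);
  // Equivalent of MIN(existing, incoming) on conflict.
  table.set(userId, existing === undefined ? candidate : Math.min(existing, candidate));
}

const table: FirstSeenTable = new Map();
const now = Date.parse("2026-02-01T00:00:00Z");

upsertFirstSeen(table, "user-42", "2026-01-15T08:30:00Z", now); // backfilled value stored
upsertFirstSeen(table, "user-42", "2026-01-20T00:00:00Z", now); // later value ignored
upsertFirstSeen(table, "user-42", "2030-01-01T00:00:00Z", now); // future value ignored
console.log(new Date(table.get("user-42")!).toISOString()); // 2026-01-15T08:30:00.000Z
```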
### Mode B: explicit interaction context [#mode-b-explicit-interaction-context]
Auto-correlation (Mode A) handles synchronous handlers and shallow `await` chains. Deeper async flows (debounces, setTimeout-queued work, state-machine transitions) escape the microtask cascade. For those, use `withInteractionContext`:
```ts
import { withInteractionContext, currentInteractionId } from "@obs-unified/analytics-sdk";
// Capture at click time
const id = currentInteractionId();
// Re-enter the context wherever the work actually happens
setTimeout(() => {
withInteractionContext(id!, () => {
fetch("/api/long-running"); // carries the click's interaction_id
});
}, 500);
```
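Why deferred work loses the id is easier to see in a toy model. The sketch below keeps the active interaction in a module-level variable that is only set while a handler runs. It is a deliberately simplified stand-in, not the SDK's mechanism: it models synchronous handlers only, and `currentId` and `withContext` are illustrative names.

```ts
let activeInteractionId: string | undefined;

function currentId(): string | undefined {
  return activeInteractionId;
}

// Run `fn` with `id` as the active interaction, then restore the previous one.
function withContext<T>(id: string, fn: () => T): T {
  const previous = activeInteractionId;
  activeInteractionId = id;
  try {
    return fn();
  } finally {
    activeInteractionId = previous;
  }
}

// Deferred tasks stand in for setTimeout or debounce callbacks.
const deferred: Array<() => string | undefined> = [];

withContext("click_5", () => {
  const captured = currentId(); // snapshot at click time
  // Without re-entering the context, the deferred task sees no id.
  deferred.push(() => currentId());
  // Re-entering with the captured id restores it (the Mode B pattern).
  deferred.push(() => (captured ? withContext(captured, currentId) : undefined));
});

// The handler has returned, so the active id is gone when these run.
const [lost, restored] = deferred.map((run) => run());
console.log(lost, restored); // undefined click_5
```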
## @obs-unified/telemetry-sdk [#obs-unifiedtelemetry-sdk]
### Install [#install-1]
```bash
pnpm config set @obs-unified:registry https://npm.pkg.github.com
pnpm login --scope=@obs-unified --auth-type=legacy --registry=https://npm.pkg.github.com
pnpm add @obs-unified/telemetry-sdk
```
### Cloudflare Worker / Hono quick start [#cloudflare-worker--hono-quick-start]
```ts
import {
createRequestSpan,
initObservability,
runWithSpan,
stampInteractionFromRequest,
flushLogs,
flushAICalls,
} from "@obs-unified/telemetry-sdk";
app.use("*", async (c, next) => {
initObservability({
collectorUrl: c.env.OBS_COLLECTOR_URL,
apiKey: c.env.OBS_INGEST_KEY,
serviceName: "checkout-api",
});
await next();
});
app.use("*", async (c, next) => {
const span = createRequestSpan("checkout-api", `${c.req.method} ${c.req.path}`);
span.setAttribute("http.request.method", c.req.method);
// RFC 0004 — closes the click-to-trace loop. No-op if header is missing.
stampInteractionFromRequest(span, c.req.raw);
try {
await runWithSpan(span, () => next());
span.setStatus(c.res.status >= 400 ? 2 : 1);
} finally {
span.end();
await exportSpan(c.env, span);
await Promise.all([flushLogs(), flushAICalls()]);
}
});
```
The same primitives work in plain Node.js: replace `c.env` with `process.env` and the Hono context with a Node request handler.
### Child spans [#child-spans]
```ts
import { withChildSpan } from "@obs-unified/telemetry-sdk";
const items = await withChildSpan("db.query.items", async (child) => {
child.setAttribute("db.system", "postgres");
child.setAttribute("db.statement", "SELECT * FROM items");
return await db.query("SELECT * FROM items");
});
```
Child spans inherit the parent's trace context and interaction id automatically.
### Logs [#logs]
```ts
import { createLogger } from "@obs-unified/telemetry-sdk";
const logger = createLogger("checkout-api");
logger.info("Cart loaded", { cartId, itemCount: items.length });
logger.error("Stripe webhook signature failed", { traceId, error });
```
Logs are flushed at request end via `flushLogs()`. If a span is active, the log inherits `trace_id`, `span_id`, `session_id`, and `interaction_id`.
### AI calls [#ai-calls]
The SDK exposes typed helpers for OpenInference-style spans, such as the retriever span below:
```ts
import { setAISessionContext, startRetrieverSpan } from "@obs-unified/telemetry-sdk";
setAISessionContext({ sessionId, userId });
await startRetrieverSpan("vector.search", { input: query }, async (span) => {
span.setAttribute("retriever.k", 10);
return await db.vectorSearch(query);
});
```
For lower-level LLM and tool spans, use `startLLMSpan()` and `startToolSpan()` as shown in [SDK API reference](/docs/sdk-reference#ai-and-llm-span-tracking).
### Cloudflare Workers bindings [#cloudflare-workers-bindings]
Workers services can wrap platform bindings from the Cloudflare-specific entry point:
```ts
import { wrapD1, wrapFetch, wrapR2 } from "@obs-unified/telemetry-sdk/cloudflare";
const db = wrapD1(env.DATABASE);
const bucket = wrapR2(env.STORAGE_BUCKET, { bucketName: "blobs" });
const telemetryFetch = wrapFetch(globalThis.fetch);
```
## Wire-format compatibility [#wire-format-compatibility]
The collector accepts standard OTLP/HTTP at:
* `POST /v1/traces` — protobuf or JSON OTLP traces
* `POST /v1/logs` — protobuf or JSON OTLP logs
So any OTel-compatible producer (the OpenTelemetry Collector, OpenInference instrumentations, the Beyla eBPF agent) works alongside the obs SDKs. The obs SDKs add the `x-obs-interaction` header and the `obs.interaction.id` span attribute automatically. Native OTel SDKs do not, so the [click-to-trace pivot](/docs/what-to-expect#scenario-a--alert--trace--flame-graph--cohort--session--replay) depends on either the obs SDKs or on stamping the attribute yourself.
# What to expect (/docs/what-to-expect)
obs-unified is designed around one promise: it is **built for agentic debugging, with one telemetry graph that agents can traverse from user action to backend trace, logs, replay, AI cost, and CPU profile**. The dashboard's ≤2-click rail is the human-facing version of that same graph. This page walks through what the dashboard actually surfaces once instrumentation is in place.
## The Connected rail [#the-connected-rail]
Every detail page in the dashboard mounts a right-side rail with four sections:
```text
┌─ Connected — span ─┐
│ │
│ Up: │
│ Trace │
│ Parent trace │
│ │
│ Across: │
│ Other spans │
│ Logs in trace │
│ AI calls │
│ │
│ Down: │
│ Profiles │
│ │
│ Related: │
│ Click that │
│ caused this │
│ trace │
│ → click_5 │
│ │
└────────────────────┘
```
* **Up** — the parent entity (trace ← span, session ← usage event, etc.)
* **Across** — sibling signals sharing the same identity key (other spans in the same trace, logs from the same session)
* **Down** — derived data (pprof profile for a trace, off-CPU profile for a span)
* **Related** — non-identity-based neighbors (the click that caused this trace, alerts firing on this service)
When a section has no neighbors, the rail renders an **informative-absence** message explaining *why* — never a silent empty section. The platform's contract is that "no data" should always tell you what's missing and how to populate it.
## Scenario A — alert → trace → flame graph → cohort → session → replay [#scenario-a--alert--trace--flame-graph--cohort--session--replay]
The headline product test. From a paged alert:
| Step | What you see | What you click | RFCs |
| ---- | ------------------------------------------------------------------------------- | -------------------------------------------- | ---------------- |
| 1 | Alert detail with bound Analysis narrative + exemplar traces | Slowest exemplar trace | 0002, 0006 |
| 2 | Trace waterfall, self-time bars, ⚠ UNINSTRUMENTED + 🔥 PROFILES badges | 🔥 badge on the slow span | 0005, 0006, 0007 |
| 3 | Flame graph filtered to this trace's samples (server-side filter, smaller blob) | "Other traces sampled in this profile (243)" | 0007 |
| 4 | Cohort: all traces touched by this profile, with user attribution | A user from the cohort | 0007, 0006 |
| 5 | Session timeline: user's page views, clicks, traces side-by-side | An rrweb event | 0004, 0006 |
| 6 | Replay scrubbed to the click + Connected rail: "Trace caused by this click" | Closes the loop back to step 2's trace | 0004, 0006 |
Six clicks across the entire platform. The platform's claim is that every neighbor at every step is on the rail.
## Interaction ID to CPU [#interaction-id-to-cpu]
The browser SDK mints a single `interaction_id` for a frontend action and injects it as `x-obs-interaction` on outbound requests. Backend SDKs copy that value onto the active span as `obs.interaction.id`, and correlated logs or AI calls inherit it from the span context.
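The propagation described above can be sketched as two small functions, one per side of the request. This is a minimal illustration rather than the obs SDK API: only the header name `x-obs-interaction` and the attribute name `obs.interaction.id` come from this page, while `withInteractionHeader` and `spanAttributesFromHeaders` are hypothetical helpers.

```typescript
const INTERACTION_HEADER = "x-obs-interaction"; // header name from this page
const INTERACTION_ATTR = "obs.interaction.id"; // span attribute name from this page

// Browser side (hypothetical helper): attach the interaction ID to outbound headers.
function withInteractionHeader(
  headers: Record<string, string>,
  interactionId: string,
): Record<string, string> {
  return { ...headers, [INTERACTION_HEADER]: interactionId };
}

// Backend side (hypothetical helper): derive span attributes from incoming headers.
// Header names are matched case-insensitively, as HTTP header names are.
function spanAttributesFromHeaders(
  headers: Record<string, string>,
): Record<string, string> {
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() === INTERACTION_HEADER && value !== "") {
      return { [INTERACTION_ATTR]: value };
    }
  }
  // No header: the request was not caused by a browser click.
  return {};
}
```

A request without the header yields no attribute, which is the case the rail later reports as an absent originating click.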
CPU and off-CPU profiles are joined through traces rather than storing `interaction_id` directly on every sample. If profiling is enabled and samples are labeled with trace IDs, the dashboard can follow:
```text
frontend click → interaction_id → backend trace_id → profile_trace_index → CPU/off-CPU profile
```
That is the accurate version of "one ID from frontend to CPU": one interaction ID anchors the user action, and the trace it caused carries the investigation into profiling data.
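The join chain can be illustrated with in-memory rows. The sketch assumes that spans record the interaction ID and that `profile_trace_index` maps trace IDs to profiles; the row shapes (`SpanRow`, `ProfileTraceIndexRow`) and the function `profilesForInteraction` are hypothetical, and the real storage schema may differ.

```typescript
// Hypothetical row shapes; only the index name and the join order come from this page.
type SpanRow = { traceId: string; interactionId?: string };
type ProfileTraceIndexRow = {
  traceId: string;
  profileId: string;
  kind: "cpu" | "off-cpu";
};

// Follow interaction_id -> trace_id -> profile_trace_index -> profiles.
function profilesForInteraction(
  interactionId: string,
  spans: SpanRow[],
  index: ProfileTraceIndexRow[],
): ProfileTraceIndexRow[] {
  // Step 1: traces caused by this interaction.
  const traceIds = new Set(
    spans
      .filter((s) => s.interactionId === interactionId)
      .map((s) => s.traceId),
  );
  // Step 2: profiles whose samples were labeled with one of those trace IDs.
  return index.filter((row) => traceIds.has(row.traceId));
}
```

Note that the profile rows never mention the interaction ID; the trace ID is the only key the two sides share.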
## Scenario B — AI cost spike → user → session → trace [#scenario-b--ai-cost-spike--user--session--trace]
A different entry point exercising the same identity skeleton:
1. **AI dashboard** shows a cost spike (`SPANS OVER TIME` chart peaks). The Sessions view ranks the heavy spender at the top by cost.
2. **Click the `👤 user-id` chip** on the heavy spender's row → user detail page.
3. **User detail page** shows the user's `Identity` card + a Connected rail with "Latest session", "Recent traces", "Recent AI calls". For a session with N traces and M AI calls, the rail collapses them into a single link labeled with those counts.
4. **Click "Latest session"** → Replay tab scoped to that session, showing the session's interactions linked to their traces.
5. **Click an interaction** → trace waterfall for the trace that click caused. Connected rail's "Click that caused this trace" closes the loop back to the originating click.
The seed (`pnpm seed`) plants a "Heavy Spender (seed)" user with 8–9 high-cost claude-3-5-haiku calls, so this walkthrough is reproducible without generating real AI traffic.
## Scenario C — futex contention via off-CPU flame graph [#scenario-c--futex-contention-via-off-cpu-flame-graph]
Validates the kernel-level layer:
1. Trace shows an unexplained pause inside a span (no child spans, on-CPU profile shows little activity).
2. Rail's "Down → 🔥 off-CPU profile" leads to an icicle flame graph that surfaces `futex_wait_queue` ↑ `pthread_mutex_lock` ↑ `inventory_pool::checkout` taking 84% of off-CPU time.
3. Root cause: a single pool-wide mutex serializing every checkout.
This scenario currently runs only against the docker-compose demo with [Beyla](https://grafana.com/oss/beyla/) feeding pprof. The dashboard code paths are live; the synthetic seed doesn't generate pprof blobs.
## Per-tab walkthrough [#per-tab-walkthrough]
| Tab | What's there | Key rail pivots |
| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------- |
| **Health** | Tier-0 analysis tiles (error top offenders, latency outliers, log anomaly summary) with optional LLM narrative | Click a tile → Investigations page with the analysis detail |
| **Timeline** | Per-session lane of usage / span / log events, grouped by `interaction_id` | Click an event → trace or replay |
| **Service Map** | Service-to-service edges with SDK / eBPF source filter | Click an edge → traces between those services |
| **Logs** | Histogram + by-service / by-severity breakdown, filterable | Click a log → log detail with rail surfacing parent trace |
| **Investigations** | List of analyses + per-analysis detail page with narrative + evidence + connected rail | Rail's "Cited traces" → trace detail |
| **Traces** | Trace list with inline waterfall expansion, self-time visualization, ⚠ + 🔥 badges, span detail drawer | Click a span row → rail with "Click that caused this trace" |
| **Issues** | Trace-level issue grouping by error fingerprint | Click an issue → trace |
| **AI Calls** | Two views — Spans (typed LLM/TOOL/RETRIEVER spans) and Sessions (multi-turn conversation rendering with cost + tokens). User chips are clickable. | Click `👤 user-id` → user detail page |
| **Replays** | Session list + rrweb player + per-session interactions panel | Click an interaction → trace it caused |
| **Alerts** | Alert rules + recent firings + bound analyses | Click an alert → bound Analysis → exemplar traces |
| **Usage** | Page views, interactions, top paths, by-country breakdown | Click a session row → timeline |
| **Resources** | Cloudflare worker resource panels + (when populated) Linux host metrics | Click a host → host detail |
| **Projects** | Multi-project routing (ingest keys, dashboard auth) | n/a |
## When you should expect informative absence [#when-you-should-expect-informative-absence]
The rail is honest about what's missing. You'll see explicit "—" messages when:
* **No `interaction_id` on a span** — the trace wasn't caused by a browser click (cron, queue consumer, retry). The "Originating click" section explains this.
* **No pprof profile** — the producing service hasn't wired `startProfiler()` or an eBPF agent. The Down section explains how to populate it.
* **No rrweb replay** — the session had no real browser to capture chunks. The Replay tab tells you to visit `/playground` and click "Start replay" to capture one.
* **No identity links on alerts or analyses** — alerts and analyses don't carry identity columns; they relate by topic, not identity. The rail's `Related` section explains that this is by design.
These are part of the design — empty data should always be explained, never silent.
## Production deployment caveats [#production-deployment-caveats]
* The migration runner has a `--remote` mode; the first run against a partially migrated production DB needs a manual backfill (see [Installation](/docs/installation#production-deployment)).
* The analyses cron runs every minute and uses a 90s claim/lease so that long-running LLM narrative passes cannot overlap (RFC 0002 Stage 4 follow-up).
* The pprof receiver returns 422 on decode failure (corrupted blobs surface to the agent instead of landing silently in R2).
* The connected-routes endpoint returns 400 on unknown entity kinds (catches client-side URL building bugs).
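The claim/lease caveat above can be sketched as a pure function over a lease record. This is an assumption-laden illustration: the `Lease` shape and `tryClaim` are hypothetical, and the only value taken from this page is the 90-second lease length.

```typescript
const LEASE_MS = 90 * 1000; // 90s lease, per the caveat above

// Hypothetical lease record; the real storage schema is not described on this page.
type Lease = { holder: string; expiresAt: number };

// Try to claim the lease at time `now` (milliseconds).
// Returns the new lease, or null if an earlier run still holds a live lease.
function tryClaim(current: Lease | null, holder: string, now: number): Lease | null {
  if (current !== null && current.expiresAt > now) {
    return null; // a previous run is still inside its lease window: skip this tick
  }
  return { holder, expiresAt: now + LEASE_MS };
}
```

With a one-minute schedule, a run that claims at t=0 blocks the tick at t=60s (the lease lasts until t=90s) and the tick at t=120s claims again. In a real deployment the check and the write would need to be a single atomic operation in the backing store, for example a conditional update; this sketch does not show that part.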
## Recent reliability behavior [#recent-reliability-behavior]
The May 31, 2026 updates tightened several user-visible dashboard paths:
* Telemetry and AI dashboards abort stale loaders, so quick filter/tab changes do not let older responses overwrite newer views.
* Replay chunk loading is paginated, so long sessions load progressively instead of depending on one large response.
* Live-tail streams enforce project isolation end-to-end.
* Connected rail scenario tests now cover trace, replay, and service-map pivots more directly.
These are not new navigation concepts, but they make the rail and dashboard flows behave more predictably under realistic traffic.
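The stale-loader behavior can be illustrated with the standard `AbortController` API. The wrapper below is a minimal sketch of the general technique, not the dashboard's implementation; `latestOnly` is a hypothetical name, and the loader is assumed to accept an `AbortSignal`.

```typescript
// Hypothetical helper: starting a new load aborts the previous one,
// and a stale load resolves to undefined instead of overwriting newer data.
function latestOnly<A, R>(
  loader: (arg: A, signal: AbortSignal) => Promise<R>,
): (arg: A) => Promise<R | undefined> {
  let inFlight: AbortController | null = null;
  return async (arg: A): Promise<R | undefined> => {
    inFlight?.abort(); // cancel the stale request, if any
    const controller = new AbortController();
    inFlight = controller;
    try {
      const result = await loader(arg, controller.signal);
      // Drop the result if a newer load started while this one was pending.
      return controller.signal.aborted ? undefined : result;
    } catch (err) {
      if (controller.signal.aborted) return undefined; // stale load: ignore its error
      throw err;
    }
  };
}
```

A caller would pass the signal on to `fetch` so the network request is cancelled as well, and would update the view only when the returned value is not `undefined`.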