Last modified October 7, 2025

Advanced TraceQL tutorial

TraceQL is Grafana Tempo’s native query language designed specifically for distributed trace analysis. Unlike traditional log or metrics queries, TraceQL lets you search and filter across the complex, hierarchical structure of traces and spans.

This tutorial builds on the concepts from data exploration and assumes you have traces flowing into your Giant Swarm observability platform.

Prerequisites

Before starting this tutorial:

  • Tracing enabled: Your cluster must have tracing enabled (contact your account engineer)
  • Instrumented applications: Your applications should be sending traces to otlp-gateway.kube-system.svc
  • Grafana access: You can access traces from your Grafana instance using the Tempo data source

Understanding TraceQL basics

Trace structure hierarchy

Traces consist of spans organized in a tree structure:

  • Trace: The complete journey of a request through your system
  • Span: Individual operations within the trace (service calls, database queries, etc.)
  • Root span: The entry point of the trace
  • Child spans: Operations triggered by parent spans

TraceQL syntax fundamentals

TraceQL queries follow this basic pattern:

{span.attribute = "value"}

Basic elements:

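  • Scope selectors: span. and resource. prefix attributes; intrinsic fields such as name, duration, status, kind, and traceDuration take no prefix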
  • Attribute names: The specific attribute you want to filter on
  • Operators: =, !=, >, <, >=, <=, =~ (regex match), !~ (negated regex match)
  • Values: Strings, numbers, or durations
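
As a rough illustration of how these elements fit together, the following Python sketch assembles a single-condition filter from a scope, an attribute name, an operator, and a value. The helper name traceql_filter is hypothetical and not part of any Tempo or Grafana library; it only formats strings.

```python
# Hypothetical helper: formats one TraceQL condition; not a Tempo API.
OPERATORS = {"=", "!=", ">", "<", ">=", "<=", "=~", "!~"}


def traceql_filter(scope: str, attribute: str, operator: str, value) -> str:
    """Return a filter such as {resource.service.name = "user-service"}."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    # Intrinsic fields (for example status) take no scope, so scope may be "".
    field = f"{scope}.{attribute}" if scope else attribute
    if isinstance(value, str):
        # String values are double-quoted; escape backslashes and quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        rendered = f'"{escaped}"'
    else:
        # Numbers are written bare.
        rendered = str(value)
    return "{" + f"{field} {operator} {rendered}" + "}"


print(traceql_filter("resource", "service.name", "=", "user-service"))
print(traceql_filter("span", "http.status_code", ">=", 400))
```

Building queries this way is mostly useful in scripts or tests; in Grafana you would normally type the query directly.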

Essential TraceQL queries

Finding traces by service

Query traces involving a specific service:

{resource.service.name = "user-service"}

Find traces involving multiple services:

{resource.service.name = "user-service"} && {resource.service.name = "payment-service"}

Filtering by operation name

Find specific operations within services. The span name is an intrinsic field, so it is written as name without the span. prefix:

{name = "GET /api/users"}

Use regex for pattern matching:

{name =~ "GET /api/.*"}

Performance-based filtering

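Find slow traces. Durations are written with a unit suffix such as ms or s, and traceDuration is the intrinsic for the duration of the whole trace:

{traceDuration > 5s}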

Find traces with errors:

{status = error}

Combine conditions:

{traceDuration > 2s && status = error}
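
Where the braces go changes the meaning: conditions joined inside one pair of braces must all hold on the same span, while separate braces joined with && only need to match somewhere in the same trace. A small Python sketch makes the two shapes explicit. The helper names same_span and same_trace are hypothetical and exist only for this illustration.

```python
# Hypothetical helpers that show the two ways of combining conditions.

def same_span(*conditions: str) -> str:
    """All conditions must hold on one span: {a && b}."""
    return "{" + " && ".join(conditions) + "}"


def same_trace(*conditions: str) -> str:
    """Each condition may match a different span of the trace: {a} && {b}."""
    return " && ".join("{" + condition + "}" for condition in conditions)


# One span that is both an error and an HTTP 5xx response.
print(same_span("status = error", "span.http.status_code >= 500"))

# A trace that touches both services, on any of its spans.
print(same_trace(
    'resource.service.name = "user-service"',
    'resource.service.name = "payment-service"',
))
```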

Advanced filtering techniques

HTTP-specific queries

Find traces for specific HTTP methods:

{span.http.method = "POST"}

Filter by HTTP status codes:

{span.http.status_code >= 400}

Find slow HTTP requests:

{span.http.method = "GET" && duration > 1s}

Database operation analysis

Find database queries:

{span.db.system = "postgresql"}

Analyze slow database operations:

{span.db.system = "postgresql" && duration > 500ms}

Find specific database operations:

{span.db.operation = "SELECT" && span.db.name = "users"}

Custom attribute filtering

Many applications add custom attributes to spans. Filter using these:

{span.custom.user_id = "12345"}

Find traces for specific customers or tenants:

{span.tenant.id = "customer-abc"}

Performance analysis patterns

Identifying bottlenecks

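Find the slowest traces. A TraceQL filter cannot compare against a percentile, so use an explicit threshold, for example the p95 latency you observe in your metrics:

{traceDuration > 5s}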
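
One way to pick such a threshold is to compute a percentile from a sample of trace durations and insert the result into the query. The sketch below uses the nearest-rank method; the sample durations and both function names are hypothetical, and it assumes you have already exported durations in milliseconds from somewhere else.

```python
# Sketch: derive a slow-trace threshold from sampled durations (milliseconds).

def nearest_rank_percentile(values_ms: list[int], percent: int) -> int:
    """Return the nearest-rank percentile of values_ms (percent in 1..100)."""
    if not values_ms:
        raise ValueError("values_ms must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ordered = sorted(values_ms)
    # Ceiling of percent * n / 100 using integer arithmetic only.
    rank = -(-percent * len(ordered) // 100)
    return ordered[rank - 1]


def slow_trace_query(threshold_ms: int) -> str:
    """Build a filter for traces slower than the given threshold."""
    return "{traceDuration > " + f"{threshold_ms}ms" + "}"


# Hypothetical sample: 20 traces with durations of 1 ms to 20 ms.
sample_ms = list(range(1, 21))
p95 = nearest_rank_percentile(sample_ms, 95)
print(slow_trace_query(p95))
```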

Find error spans in a service you suspect of a high error rate:

{resource.service.name = "payment-service" && status = error}

Capacity planning queries

Find traces that contain many spans from one service. In a search query, count() must be compared against a value:

{resource.service.name = "api-gateway"} | count() > 10

Analyze request patterns:

{span.http.method = "POST" && span.http.route = "/api/orders"}

Combining TraceQL with other observability data

Correlating with metrics

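Use trace IDs and time windows to correlate traces with metrics:

  1. Find problematic traces with TraceQL
  2. Note the trace IDs and the time window in which they occurred
  3. Inspect the related PromQL metrics over that window; where exemplars are enabled, the trace ID attached to an exemplar links a metric sample back to its trace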

Correlating with logs

Link traces to logs using trace correlation:

  1. Find traces in Tempo with TraceQL
  2. Use the trace ID in LogQL: {namespace="my-app"} | json | trace_id="abc123"
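
The LogQL lookup in step 2 can be generated from a trace ID, as in this minimal sketch. It assumes, like the example above, that logs are JSON and carry the ID in a trace_id field; the function name is hypothetical.

```python
import re


def logql_for_trace(namespace: str, trace_id: str) -> str:
    """Build the LogQL query from step 2 for a given trace ID."""
    # Trace IDs are hexadecimal strings of up to 32 characters.
    if not re.fullmatch(r"[0-9a-fA-F]{1,32}", trace_id):
        raise ValueError(f"not a hexadecimal trace ID: {trace_id!r}")
    # Doubled braces produce literal { and } in an f-string.
    return f'{{namespace="{namespace}"}} | json | trace_id="{trace_id}"'


print(logql_for_trace("my-app", "abc123"))
```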

Service graph exploration

Tempo generates service graphs from trace data. Use TraceQL to understand service interactions:

Analyze service dependencies

Find spans in other services that are called directly by a specific service, using the child operator >:

{resource.service.name = "api-gateway"} > {resource.service.name != "api-gateway"}

Identify service communication patterns

Find cross-service calls. A client span whose direct child is a server span usually marks a call that crosses a service boundary:

{kind = client} > {kind = server}

Troubleshooting workflows

Debugging failed requests

  1. Find error traces:

    {status = error && resource.service.name = "payment-service"}
    
  2. Narrow down by time. TraceQL has no time filter of its own, so keep the query unchanged and set the range (for example, Last 1 hour) in Grafana’s time picker:

    {status = error && resource.service.name = "payment-service"}
    
  3. Analyze error patterns:

    {status = error} | by(statusMessage)
    

Performance regression analysis

Compare performance across time periods:

{resource.service.name = "user-service" && trace.duration > 2s}

Then use Grafana’s time range controls to compare different periods.
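
To compare like with like, it helps to pick two adjacent windows of equal length. The following sketch computes such a pair; the function name and the example timestamp are hypothetical, and the resulting bounds would be entered in the time picker by hand.

```python
from datetime import datetime, timedelta, timezone


def comparison_windows(end: datetime, length: timedelta):
    """Return (previous, current) windows, each a (start, end) tuple."""
    current = (end - length, end)
    previous = (end - 2 * length, end - length)
    return previous, current


# Hypothetical example: compare the last hour with the hour before it.
end = datetime(2025, 10, 7, 12, 0, tzinfo=timezone.utc)
previous, current = comparison_windows(end, timedelta(hours=1))
print("previous:", previous[0].isoformat(), "to", previous[1].isoformat())
print("current: ", current[0].isoformat(), "to", current[1].isoformat())
```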

Dependency impact analysis

Find traces affected by a specific service issue:

{resource.service.name = "database-service" && status = error} 
&& {resource.service.name = "user-service"}

Advanced TraceQL features

Aggregation functions

Count matching spans over time. This uses TraceQL metrics, which must be enabled in your Tempo installation:

{resource.service.name = "api-service"} | count_over_time()

Calculate latency percentiles over time (also a TraceQL metrics query):

{resource.service.name = "api-service"} | quantile_over_time(duration, .95)

Structural queries

Find traces with specific span relationships, here a database-query span that is a direct child of a user-lookup span:

{name = "user-lookup"} > {name = "database-query"}

Query trace topology, here server spans in traces whose root span belongs to api-gateway:

{rootServiceName = "api-gateway" && kind = server}

Best practices

Query optimization

  • Start specific: Begin with service or operation names before adding duration filters
  • Use time ranges: Always specify time ranges to improve query performance
  • Limit results: Keep the result limit low for exploratory queries, using the limit in Grafana’s query options; TraceQL itself has no limit operator
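
When Tempo is queried over HTTP rather than through Grafana, the time range and the result limit travel as request parameters next to the query, not inside it. The sketch below builds such a request URL for Tempo's /api/search endpoint. The base URL and timestamps are hypothetical, and direct API access may not be exposed in your installation.

```python
from urllib.parse import urlencode


def tempo_search_url(base_url: str, query: str, start: int, end: int,
                     limit: int = 20) -> str:
    """Build a Tempo search URL; start and end are Unix epoch seconds."""
    params = {"q": query, "start": start, "end": end, "limit": limit}
    # urlencode percent-encodes the braces and spaces of the TraceQL query.
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


# Hypothetical host and time range; adjust to your environment.
print(tempo_search_url(
    "http://tempo.example:3200",
    "{status = error}",
    start=1759831200,
    end=1759834800,
    limit=100,
))
```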

Common patterns

  1. Error investigation workflow:

    {status = error} | by(resource.service.name) | count() > 1
    
  2. Performance analysis workflow:

    {traceDuration > 5s} | quantile_over_time(duration, .95) by (resource.service.name)
    
  3. Service dependency mapping:

    {resource.service.name = "my-service" && kind = client} > {kind = server}
    

Avoiding common mistakes

  • Don’t over-filter: Too many conditions can return no results
  • Mind the hierarchy: Remember that traces have spans, not the other way around
  • Use appropriate time ranges: Excessively long time ranges can cause queries to time out

Integration with Grafana dashboards

Creating trace panels

  1. Add a traces panel to your dashboard
  2. Configure the Tempo data source
  3. Write TraceQL queries in the query editor
  4. Use template variables for dynamic filtering

Linking from metrics

Create drill-down links from metric panels:

  1. Add data links to metric panels
  2. Include trace ID variables in the link
  3. Link directly to trace view in Grafana

Next steps

Now that you understand TraceQL:

  • Explore service graphs: Use Tempo’s service graph feature to visualize dependencies
  • Set up trace-derived metrics: Configure Tempo’s metrics-generator for alerting
  • Create trace dashboards: Build custom visualizations for your distributed systems
  • Integrate with alerts: Use metrics generated from traces for proactive monitoring

For more complex scenarios, consider exploring Grafana’s TraceQL documentation and the OpenTelemetry instrumentation guides.