Alert management

Learn how to manage alerts in the Giant Swarm observability platform, including alert rules, routing, and the alerting pipeline.

Alert management is crucial for any observability solution. The Giant Swarm observability platform provides comprehensive alerting capabilities that help you monitor your infrastructure and applications proactively.

Alerting consists of two main concepts: the alerting pipeline (how to send alerts, to whom, and what to send) and alert rules (what to alert on).

For detailed information about alerting, visit the official Grafana documentation.

How alerting works

The alerting pipeline supports multi-tenancy, so we recommend getting familiar with our multi-tenancy concept first.

The alerting pipeline

(Figure: the alerting pipeline)

The alerting pipeline is straightforward. The Loki and Mimir rulers evaluate alerting rules and send alerts to the Mimir Alertmanager. The Mimir Alertmanager (a multi-tenant-aware Alertmanager) routes those alerts to configured receivers.

Configure Alertmanager for your tenants using our alert routing documentation.
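
To give a sense of what such a configuration looks like, here is a minimal sketch in the standard upstream Alertmanager format, routing critical alerts to a separate receiver. The receiver names, Slack channel, and credentials are placeholders; the alert routing documentation describes how the configuration is applied for your tenant.

    route:
      receiver: team-default                  # fallback receiver for all alerts
      group_by: [alertname, cluster]
      routes:
        # Send critical alerts to the on-call receiver instead
        - receiver: team-oncall
          matchers:
            - severity="critical"
    receivers:
      - name: team-default
        slack_configs:
          - channel: '#team-alerts'                              # placeholder channel
            api_url: https://hooks.slack.com/services/REPLACE_ME # placeholder webhook URL
      - name: team-oncall
        opsgenie_configs:
          - api_key: REPLACE_ME                                  # placeholder API key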

Understanding alerting and recording rules

The platform supports two types of rules that power your monitoring strategy:

Alerting rules

Alerting rules define conditions that trigger notifications when specific issues occur in your infrastructure or applications. They use Prometheus (PromQL) or Loki (LogQL) expressions to evaluate your data and fire alerts when thresholds are met.

Key characteristics:

  • Metric-based alerts: Monitor infrastructure metrics like CPU usage, memory consumption, or response times
  • Log-based alerts: Watch for specific patterns, errors, or anomalies in application logs
  • Flexible conditions: Set duration requirements before alerts fire to reduce noise
  • Rich context: Include labels for routing and annotations for human-readable information
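
For illustration, a metric-based alerting rule in the standard Prometheus rule-group format might look like the following sketch. The metrics, threshold, and runbook URL are examples only.

    groups:
      - name: node.alerts
        rules:
          - alert: NodeFilesystemAlmostFull
            # PromQL expression evaluated against your metrics
            expr: node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"} < 0.05
            # Condition must hold for 15 minutes before the alert fires, reducing noise
            for: 15m
            labels:
              severity: warning                 # used by Alertmanager for routing
            annotations:
              summary: "Filesystem on {{ $labels.instance }} has less than 5% space left"
              runbook_url: https://example.com/runbooks/node-filesystem-almost-full  # illustrative link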

Recording rules

Recording rules pre-compute frequently needed or expensive expressions, saving the results as new time series. This improves query performance and enables you to create custom business metrics by combining multiple data sources.

Use recording rules to:

  • Improve dashboard performance by pre-calculating complex aggregations
  • Create custom metrics that combine multiple sources into business indicators
  • Simplify complex queries by breaking them into manageable components
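
For example, a recording rule that pre-aggregates per-namespace CPU usage into a new time series might look like this sketch (the metric and rule names are illustrative):

    groups:
      - name: node.recording
        rules:
          # Pre-compute the aggregation once, so dashboards query the
          # resulting series instead of re-running the expensive expression
          - record: namespace:container_cpu_usage_seconds:rate5m
            expr: sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))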

Loading alerting and recording rules

(Figure: loading recording and alerting rules)

The platform lets you create and load both types of rules into:

  • Mimir ruler: For metric-based alerts and recording rules
  • Loki ruler: For log-based alerts

You can deploy rules from both management clusters and workload clusters through our Grafana Alloy agents. The system automatically handles multi-tenant isolation and provides scoping mechanisms to prevent conflicts across clusters.

All rules must include the observability.giantswarm.io/tenant label to specify which tenant they belong to.
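
As a sketch, assuming rules are packaged as PrometheusRule resources (the format picked up by the Grafana Alloy agents for both rulers), a log-based alert carrying the required tenant label could look like this. The name, namespace, tenant value, and LogQL expression are placeholders; the alert rules documentation below is the authoritative reference.

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: my-team-log-alerts                 # illustrative name
      namespace: my-namespace                  # illustrative namespace
      labels:
        # Required: assigns the rule to its tenant
        observability.giantswarm.io/tenant: my-team
    spec:
      groups:
        - name: log.alerts
          rules:
            - alert: HighErrorLogRate
              # LogQL expression evaluated by the Loki ruler
              expr: 'sum(rate({app="my-app"} |= "error" [5m])) > 10'
              for: 10m
              labels:
                severity: warning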

Learn how to create and deploy your own rules using our alert rules documentation.

Alerting features in Grafana

Access alerting configuration and monitoring in the Alerting section of your installation’s Grafana.

(Figure: the Grafana alerting section)

The alerting section provides:

  • Alert rules: All alerting and recording rules currently available, filterable by state (firing, pending). Use the “see graph” link to jump to an explore page with the alert’s expression pre-filled
  • Contact points: Configured integrations (like Opsgenie or Slack) for sending alerts, including notification templates for formatting
  • Notification policies: Alert routing that defines how alerts reach contact points based on matching criteria
  • Silences: Current silences and their states, along with affected alerts. Create immediate silences through the UI or manage them via CRDs for GitOps workflows
  • Active notifications: Currently firing alerts with notification states
  • Settings: General Alertmanager instance settings and current configuration

Alert management components

The platform supports comprehensive alert management through:

  • Alert rules: Define conditions that trigger notifications when issues occur
  • Alert routing: Configure how alerts are delivered to different teams and channels through Alertmanager
  • Silences: Temporarily suppress alerts during maintenance or known issues using both CRD-based and Grafana UI approaches (a sketch follows this list)
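
As a rough sketch only, a CRD-based silence could look like the following. The apiVersion, kind, and matcher fields are assumptions based on the public silence-operator project; check the silences documentation for the authoritative schema.

    apiVersion: monitoring.giantswarm.io/v1alpha1   # assumption: silence-operator API group/version
    kind: Silence
    metadata:
      name: planned-maintenance-my-cluster          # illustrative name
    spec:
      matchers:
        # Keep matchers specific to avoid over-silencing
        - name: cluster_id                          # illustrative label
          value: my-cluster
          isRegex: false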

Multi-tenant alerting

Each tenant manages their own alerting configuration independently, ensuring:

  • Isolated alert rules and routing per team
  • Secure access to notification channels
  • Independent alerting policies
  • Customizable alert templates

Best practices

  • Use meaningful alert names and descriptions
  • Set appropriate severity levels for proper routing
  • Include runbook links in alert annotations
  • Test alerts in non-production environments first
  • Review and update alert rules regularly
  • Configure silences for planned maintenance using CRDs for better auditability
  • Use specific silence matchers to avoid over-silencing alerts
  • Document silence reasons with meaningful comments

Getting started

  1. Set up your tenant: Ensure you have a Grafana Organization configured
  2. Create alert rules: Define alert rules for your applications and infrastructure
  3. Configure routing: Set up alert routing so alerts reach the right teams and channels
  4. Test and monitor: Use Grafana’s alerting interface to monitor rule performance and alert delivery

Next, explore the individual alert management components to build a comprehensive monitoring strategy.

Alert management works best when integrated with other observability capabilities:

  • Data management: Explore and analyze the data that drives your alerts through advanced querying and visualization tools
  • Multi-tenancy configuration: Understand how tenant isolation ensures your alerts and configurations remain secure and properly scoped
  • Data import and export: Integrate external systems with your alerting pipeline by importing logs from external sources and exporting alert data for external analysis