Ingest metrics into the Observability Platform

How to make new metrics available in the Observability Platform in a self-service way.

By default, all Giant Swarm clusters are equipped with the Prometheus Operator and a set of Prometheus shards running in agent mode. These shards are monitoring agents that collect critical cluster and workload metrics and forward them to a central Grafana Mimir instance running on the management cluster.

No two workloads are the same, especially in the way they expose their metrics, so the Observability Platform’s monitoring configuration needs to be flexible. That’s why it’s based on the ServiceMonitor and PodMonitor Custom Resource Definitions (CRDs) provided by the Prometheus Operator.

Those allow you to:

  • Define where the metrics you want to ingest are exposed (for example, the container or port)
  • Transform metrics before ingesting them (for example dropping unneeded data, adding extra labels)

You can learn more about the ServiceMonitor and PodMonitor CRDs by checking the Prometheus Operator API Docs.
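
As a hedged illustration of the second point above, an endpoint in a ServiceMonitor can attach an extra static label to everything it scrapes via the relabelings field. The label name and value below are made-up placeholders:

endpoints:
- port: web
  relabelings:
  - action: replace            # Add a static label to every target scraped by this endpoint
    targetLabel: environment   # Placeholder label name
    replacement: production    # Placeholder label value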

Prerequisites

Before you start to ingest data from a running container, you need to make sure that your application is already instrumented to export metrics.
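
Assuming your application already serves Prometheus metrics over HTTP (for example on /metrics), it helps to give the container port a name so that a service and a ServiceMonitor can reference it later. Here is a hedged sketch of the relevant fragment of a Deployment’s pod template; the names, image, and port number are placeholders:

containers:
- name: my-service
  image: my-registry/my-service:1.0.0  # Placeholder image
  ports:
  - name: metrics        # Named port serving the /metrics endpoint
    containerPort: 8080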

Keep in mind that ingesting new metrics into the Observability Platform comes with a cost. The resource consumption of the central Mimir instance grows with the volume of metrics it has to handle, so ingesting more metrics also leads to higher resource consumption of the Observability Platform overall.

You can check the resource usage related to your ServiceMonitor and PodMonitor in the ServiceMonitors Overview dashboard in your installation’s Grafana.
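
One way to keep that cost down, sketched here under the assumption that you don’t need the Go runtime metrics your app exposes, is to drop them with metricRelabelings on the ServiceMonitor endpoint before they are forwarded; the metric name pattern is a placeholder:

endpoints:
- port: web
  metricRelabelings:
  - action: drop               # Drop matching samples so they are never ingested
    sourceLabels: [__name__]
    regex: go_gc_duration_.*   # Placeholder pattern for metrics you don't need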

Creating a ServiceMonitor

Here is an example showing how to create a ServiceMonitor. This one targets a service named my-service in the monitoring namespace, and will route alerts to team my-team. The manifest should look similar for any workload, as long as you have a service that exposes the app’s metrics.
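
For reference, here is a hedged sketch of a service that the ServiceMonitor below would match; the port number and pod selector are placeholders:

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/instance: my-service  # Matched by the ServiceMonitor's selector
  name: my-service
  namespace: monitoring
spec:
  ports:
  - name: web            # Named service port referenced by the ServiceMonitor endpoint
    port: 8080
    targetPort: metrics  # Named container port on the pods
  selector:
    app: my-service      # Placeholder pod selector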

The bare minimum for a ServiceMonitor looks like this:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    ## This label is important as it is required for the Prometheus Agent to discover it.
    ## The team name should be the name of your internal team.
    application.giantswarm.io/team: my-team
    app.kubernetes.io/instance: my-service
  name: my-service
  namespace: monitoring
spec:
  endpoints:
  - interval: 60s   # Scrape the target every 60s
    path: /metrics  # Path that exposes the metrics
    port: web       # Named port on the service that exposes the metrics
    relabelings: [] # Any potential metric transformation that you want to apply to your metrics.
  selector:         # Label selector that matches your service
    matchLabels:
      app.kubernetes.io/instance: my-service

Whether you are using Helm charts or GitOps with Kustomize, put the ServiceMonitor CR next to your app and apply it in the same way. Once it’s applied, check the ServiceMonitors Overview or ServiceMonitors Details dashboards, or search for the newly ingested metrics in your installation’s Grafana, to make sure that your containers are being scraped by the new monitoring agents.
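
With Kustomize, for instance, that can be as simple as listing the manifest next to your other resources; the file names here are placeholders:

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml
- servicemonitor.yaml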

Warning: The ServiceMonitor needs to be labeled with application.giantswarm.io/team: <YOUR-TEAM-NAME> for the Prometheus Agent to be able to discover it and start collecting metrics.

ServiceMonitor vs. PodMonitor

A ServiceMonitor should cover most of your monitoring use cases, but on rare occasions a container doesn’t need a service to run, and it doesn’t make sense to create one just for the sake of monitoring it. That’s when the PodMonitor comes into play. You can find a few other examples where a PodMonitor makes sense in this discussion in the Prometheus Operator project.
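
For completeness, here is a minimal PodMonitor following the same pattern as the ServiceMonitor above. This is a hedged sketch: my-workload and the port name are placeholders, and the team label is assumed to be required just like for ServiceMonitors:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  labels:
    ## Assumed to be required for the Prometheus Agent to discover the PodMonitor.
    application.giantswarm.io/team: my-team
  name: my-workload
  namespace: monitoring
spec:
  podMetricsEndpoints:
  - interval: 60s
    path: /metrics
    port: metrics       # Named container port that exposes the metrics
  selector:
    matchLabels:
      app.kubernetes.io/instance: my-workload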

