Ingest metrics into the Observability Platform

How to make new metrics available in the Observability Platform in a self-service way.

By default, all Giant Swarm clusters are equipped with the Prometheus Operator and a set of Prometheus shards running in agent mode. These shards are monitoring agents that collect critical cluster and workload metrics and forward them to a central Grafana Mimir instance running on the management cluster.

No two workloads are the same, especially in the way they expose their metrics, so the Observability Platform’s monitoring configuration needs to be flexible. That’s why it’s based on the ServiceMonitor and PodMonitor Custom Resource Definitions (CRDs) provided by the Prometheus Operator.

Those allow you to:

  • Define where the metrics you want to ingest are exposed (for example, the container or port)
  • Transform metrics before ingesting them (for example dropping unneeded data, adding extra labels)

You can learn more about the ServiceMonitor and PodMonitor CRDs by checking the Prometheus Operator API Docs.
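
As a hedged illustration of the second point above, an endpoint in a ServiceMonitor can attach an extra static label to everything it scrapes via the relabelings field. The label name and value below are made-up placeholders:

endpoints:
- port: web
  relabelings:
  - action: replace            # Add a static label to every target scraped by this endpoint
    targetLabel: environment   # Placeholder label name
    replacement: production    # Placeholder label value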

Prerequisites

Before you start to ingest data from a running container, you need to make sure that your application is already instrumented to export metrics.
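
Assuming your application already serves Prometheus metrics over HTTP (for example on /metrics), it helps to give the container port a name so that a service and a ServiceMonitor can reference it later. Here is a hedged sketch of the relevant fragment of a Deployment’s pod template; the names, image, and port number are placeholders:

containers:
- name: my-service
  image: my-registry/my-service:1.0.0  # Placeholder image
  ports:
  - name: metrics        # Named port serving the /metrics endpoint
    containerPort: 8080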

Keep in mind that ingesting new metrics into the Observability Platform comes with a cost. The resource consumption of the central Mimir instance grows with the volume of metrics it has to handle, so ingesting more metrics also leads to higher resource consumption of the Observability Platform overall.

You can check the resource usage related to your ServiceMonitor and PodMonitor in the ServiceMonitors Overview dashboard in your installation’s Grafana.
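
One way to keep that cost down, sketched here under the assumption that you don’t need the Go runtime metrics your app exposes, is to drop them with metricRelabelings on the ServiceMonitor endpoint before they are forwarded; the metric name pattern is a placeholder:

endpoints:
- port: web
  metricRelabelings:
  - action: drop               # Drop matching samples so they are never ingested
    sourceLabels: [__name__]
    regex: go_gc_duration_.*   # Placeholder pattern for metrics you don't need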

Creating a ServiceMonitor

Here is an example showing how to create a ServiceMonitor. This one targets a service named my-service in the monitoring namespace, and will route alerts to team my-team. The manifest should look similar for any workload, as long as you have a service that exposes the app’s metrics.
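
For reference, here is a hedged sketch of a service that the ServiceMonitor below would match; the port number and pod selector are placeholders:

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/instance: my-service  # Matched by the ServiceMonitor's selector
  name: my-service
  namespace: monitoring
spec:
  ports:
  - name: web            # Named service port referenced by the ServiceMonitor endpoint
    port: 8080
    targetPort: metrics  # Named container port on the pods
  selector:
    app: my-service      # Placeholder pod selector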

The bare minimum for a ServiceMonitor looks like this:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    ## This label is important as it is required for the Prometheus Agent to discover it.
    ## The team name should be the name of your internal team.
    application.giantswarm.io/team: my-team
    app.kubernetes.io/instance: my-service
  name: my-service
  namespace: monitoring
spec:
  endpoints:
  - interval: 60s   # Scrape the target every 60s
    path: /metrics  # Path that exposes the metrics
    port: web       # Named port on the service that exposes the metrics
    relabelings: [] # Any potential metric transformation that you want to apply to your metrics.
  selector:         # Label selector that matches your service
    matchLabels:
      app.kubernetes.io/instance: my-service

Whether you are using Helm charts or GitOps with Kustomize, put the ServiceMonitor CR next to your app and apply it in the same way. Once it’s applied, check the ServiceMonitors Overview or ServiceMonitors Details dashboards, or search for the newly ingested metrics in your installation’s Grafana, to make sure that your containers are being scraped by the new monitoring agents.
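
With Kustomize, for instance, that can be as simple as listing the manifest next to your other resources; the file names here are placeholders:

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml
- servicemonitor.yaml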

Warning: The ServiceMonitor needs to be labeled with application.giantswarm.io/team: <YOUR-TEAM-NAME> for the Prometheus Agent to be able to discover it and start collecting metrics.

ServiceMonitor vs. PodMonitor

A ServiceMonitor should cover most of your monitoring use cases, but on rare occasions a container doesn’t need a service to run, and it doesn’t make sense to create one just for the sake of monitoring it. That’s when the PodMonitor comes into play. You can find a few other examples where a PodMonitor makes sense in this discussion in the Prometheus Operator project.
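
For completeness, here is a minimal PodMonitor following the same pattern as the ServiceMonitor above. This is a hedged sketch: my-workload and the port name are placeholders, and the team label is assumed to be required just like for ServiceMonitors:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  labels:
    ## Assumed to be required for the Prometheus Agent to discover the PodMonitor.
    application.giantswarm.io/team: my-team
  name: my-workload
  namespace: monitoring
spec:
  podMetricsEndpoints:
  - interval: 60s
    path: /metrics
    port: metrics       # Named container port that exposes the metrics
  selector:
    matchLabels:
      app.kubernetes.io/instance: my-workload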

