Ingest metrics into the Observability Platform
How to make new metrics available in the Observability Platform in a self-service way.
By default, all Giant Swarm clusters are equipped with the Prometheus Operator and a set of Prometheus shards running in agent mode. These shards are monitoring agents that collect and forward critical cluster and workload metrics to a central Grafana Mimir instance running on the management cluster.
No workload is the same, especially in the way it exposes its metrics, so the Observability Platform’s monitoring configuration needs to be flexible. That’s why it’s based on the ServiceMonitor and PodMonitor Custom Resource Definitions (CRDs) provided by the Prometheus Operator.
Those allow you to:
- Define where the metrics you want to ingest are (for example the container or port)
- Transform metrics before ingesting them (for example dropping unneeded data, adding extra labels)
You can learn more about the ServiceMonitor and PodMonitor CRDs by checking the Prometheus Operator API Docs.
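For example, a single endpoint entry can both drop series you don’t need and attach an extra static label before ingestion. The fragment below is a minimal sketch and belongs under `spec.endpoints` of a ServiceMonitor (a full manifest is shown in the next section); the port name, metric pattern, and label values are placeholders:

```yaml
# Sketch: one endpoints entry under spec.endpoints of a ServiceMonitor.
# Port name, metric pattern, and label values are placeholders.
endpoints:
  - port: web                        # named port on the service exposing the metrics
    path: /metrics
    metricRelabelings:
      - sourceLabels: [__name__]
        regex: go_gc_.*              # drop metric series you don't need
        action: drop
      - action: replace              # add an extra static label to every series
        targetLabel: environment
        replacement: production
```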
Prerequisites
Before you start to ingest data from a running container, you need to make sure that your application is already instrumented to export metrics.
Keep in mind that ingesting new metrics into the Observability Platform comes with a cost. The resource consumption of the central Mimir is related to the amount of metrics it has to handle. This means ingesting more metrics also leads to higher resource consumption of the Observability Platform overall.
You can check the resource usage related to your ServiceMonitor and PodMonitor in the ServiceMonitors Overview dashboard in your installation’s Grafana.
Creating a ServiceMonitor
Here is an example showing how to create a ServiceMonitor. This one targets a service named `my-service` in the `monitoring` namespace, and will route alerts to team `my-team`. The manifest should look similar for any workload, as long as you have a service that exposes the app’s metrics.
The bare minimum for a ServiceMonitor looks like this:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    ## This label is important as it is required for the Prometheus Agent to discover it.
    ## The team name should be the name of your internal team.
    application.giantswarm.io/team: my-team
    app.kubernetes.io/instance: my-service
  name: my-service
  namespace: monitoring
spec:
  endpoints:
    - interval: 60s    # Scrape the target every 60s
      path: /metrics   # Path that exposes the metrics
      port: web        # Named port on the service that exposes the metrics
      relabelings: []  # Any potential metric transformation that you want to apply to your metrics
  selector:            # Label selector that matches your service
    matchLabels:
      app.kubernetes.io/instance: my-service
```
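The `selector` and the named `port` above assume a Service along the following lines. This is only a sketch to show how the pieces fit together; the port number is a placeholder:

```yaml
# Sketch: a Service the ServiceMonitor above could select.
# The port number is a placeholder; the labels match the ServiceMonitor's selector.
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: monitoring
  labels:
    app.kubernetes.io/instance: my-service   # matched by the ServiceMonitor's selector
spec:
  selector:
    app.kubernetes.io/instance: my-service   # selects the application's pods
  ports:
    - name: web          # named port referenced by the ServiceMonitor endpoint
      port: 8080
      targetPort: 8080
```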
Whether you are using Helm charts or GitOps with Kustomize, place the ServiceMonitor CR next to your app’s manifests and apply it in the same way. Once it’s applied, you can check the ServiceMonitors Overview or ServiceMonitors Details dashboards, or search for the newly ingested metrics in your installation’s Grafana, to make sure that your containers are being scraped by the monitoring agents.
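With Kustomize, for example, the ServiceMonitor manifest is simply listed as another resource (a sketch; the file names are placeholders):

```yaml
# kustomization.yaml (sketch): the ServiceMonitor ships alongside the app's other manifests.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
  - servicemonitor.yaml
```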
Warning: The ServiceMonitor needs to be labeled with `application.giantswarm.io/team: <YOUR-TEAM-NAME>` for the Prometheus Agent to be able to discover it and start collecting metrics.
ServiceMonitor vs. PodMonitor
A ServiceMonitor should cover most of your monitoring use cases, but on rare occasions a container doesn’t need a service to run, and it doesn’t make sense to create one just for the sake of monitoring it. That’s when the PodMonitor comes into play. You can find a few other examples where a PodMonitor makes sense in this discussion in the Prometheus Operator project.
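For reference, a minimal PodMonitor looks much like a ServiceMonitor, except that it selects pods directly and defines its scrape targets under `podMetricsEndpoints`. The sketch below uses placeholder names and assumes the same team label convention applies:

```yaml
# Sketch: a minimal PodMonitor for a workload that runs without a Service.
# Names and labels are placeholders; the team label convention is assumed to apply here as well.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  labels:
    application.giantswarm.io/team: my-team
  name: my-workload
  namespace: monitoring
spec:
  podMetricsEndpoints:
    - interval: 60s
      path: /metrics
      port: web                 # named container port that exposes the metrics
  selector:
    matchLabels:
      app.kubernetes.io/instance: my-workload
```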