Observability

  • Changed

    • Upgraded chart dependency to kube-prometheus-stack-58.3.0
      • kube-state-metrics upgraded from 2.10.0 to 2.12.0
      • prometheus upgraded from 2.50.1 to 2.51.2
      • prometheus-node-exporter upgraded from 1.7.0 to 1.8.0
      • prometheus-operator upgraded from 0.71.2 to 0.73.2, which adds Scrape Class support
  • Changed

    • Change “Worker node utilization” dashboard to “Node utilization”, which also allows analyzing data for control plane nodes.
  • Added

    • Add dashboard “Worker node utilization”.
  • Changed

    • Updated “In-cluster container registry (Zot)” dashboard to use metric kubelet_volume_stats_used_bytes for storage used.
  • Changed

    • Upgrade mimir chart to 5.3.0 and mimir to 2.12.0
  • Changed

    • Move node-problem-detector to be AWS-only.

  • Fixed

    • Fix Grafana Cloud service-level dashboard for the case of duplicate cluster names across different installations.
    • Fix invalid datasource variable name in Mimir cost estimate dashboard.
  • Fixed

    • Fix disk usage graphs in the “Mimir / Writes resources” dashboard.
  • Changed

    • Improved many details on the dashboard “In-cluster container registry (Zot)”.
    • Change net-exporter dashboard ownership from turtles to cabbage.
    • Change cluster-total.json dashboard ownership from turtles to cabbage.
    • Change namespace-by-pod.json dashboard ownership from turtles to cabbage.
    • Change namespace-by-workload.json dashboard ownership from turtles to cabbage.
    • Change pod-total.json dashboard ownership from turtles to cabbage.
    • Change workload-total.json dashboard ownership from turtles to cabbage.
    • Update “Ingress NGINX Controller Connection Distribution” dashboard file to schema version 39.
    • Update “Giant Swarm / Kubernetes Persistent Volumes” dashboard file to replace old graph panels with new time series panels.
    • Update “Security: Falco Dashboard” dashboard file to replace old graph panel with new time series panel and old table panel with new table panel.

  • Fixed

    • Fix disk usage graphs in the “Mimir / Reads resources” dashboard.

  • Removed

    • Remove “Microstorage” dashboard.
  • Added

    • Add a CAPA aggregated error logs dashboard.