Observability

  • Changed

    • Update the “In-cluster container registry (Zot)” dashboard to use the kubelet_volume_stats_used_bytes metric for storage used.
  • Changed

    • Upgrade the Mimir chart to 5.3.0 and Mimir to 2.12.0.
  • Changed

    • Move node-problem-detector to be AWS-only.

    Fixed

    • Fix the Grafana Cloud service-level dashboard when duplicate cluster names exist in different installations.
    • Fix an invalid datasource variable name in the Mimir cost estimate dashboard.
  • Fixed

    • Fix the disk usage graphs in the Mimir / Writes resources dashboard.
  • Changed

    • Improve various details of the “In-cluster container registry (Zot)” dashboard.
    • Change net-exporter dashboard ownership from turtles to cabbage.
    • Change cluster-total.json dashboard ownership from turtles to cabbage.
    • Change namespace-by-pod.json dashboard ownership from turtles to cabbage.
    • Change namespace-by-workload.json dashboard ownership from turtles to cabbage.
    • Change pod-total.json dashboard ownership from turtles to cabbage.
    • Change workload-total.json dashboard ownership from turtles to cabbage.
    • Update “Ingress NGINX Controller Connection Distribution” dashboard file to schema version 39.
    • Update “Giant Swarm / Kubernetes Persistent Volumes” dashboard file to replace old graph panels with new time series panels.
    • Update “Security: Falco Dashboard” dashboard file to replace the old graph panel with a new time series panel and the old table with a new table panel.

    Fixed

    • Fix the disk usage graphs in the Mimir / Reads resources dashboard.

    Removed

    • Remove “Microstorage” dashboard.
  • Added

    • Add a CAPA aggregated error logs dashboard.
  • Fixed

    • Fix the “All Dex requests” panel showing “No data” by increasing the query interval to 2m.
  • Changed

    • Remove the app and namespace labels from the nginx graphs in the prometheus - remotewrite dashboard.

    Fixed

    • Fix the CPU throttling panel in the Prometheus dashboard.
  • Fixed

    • Fix and update Flux Control Plane dashboard in various ways.

    Changed

    • Update the private Zot dashboard to reflect a namespace change, and apply some minor fixes.
  • Changed

    • Update the CoreDNS Cilium network policy to allow egress traffic to k8s-dns-node-cache pods.