Highlights for the week ending 2024-10-31

General

  • security-bundle version 1.9.0 introduces breaking changes. When upgrading to this version with Falco enabled, the Falco App may fail to upgrade due to a breaking change in the upstream chart. To complete the upgrade seamlessly, disable and then re-enable the Falco App by setting apps.falco.enabled=[false|true] in the security-bundle user values ConfigMap.

Observability

  • dashboards version 3.26.0

    • Introduced “Loki - Slow Queries” dashboard for enhanced query performance insights.
    • Transferred ownership from BigMac to Shield for better team alignment.
    • Resynced alloy, loki, and mimir mixins from upstream to ensure feature parity.
  • logging-operator version 0.14.0

    • Default logging agent switched to Alloy, replacing Promtail for improved performance.
  • kube-prometheus-stack-app version 12.0.0

    • Updated chart dependency to kube-prometheus-stack-65.1.1.
    • Upgraded prometheus-operator from 0.75.0 to 0.77.1.
    • Prometheus upgraded from 2.53.0 to 2.54.1.
    • Grafana upgraded from 8.2.0 to 8.5.0.
    • Thanos ruler upgraded from 0.35.1 to 0.36.1.
    • Prometheus-node-exporter upgraded from 1.8.1 to 1.8.2.
    • Removed legacy in-house SLO framework to streamline integrations.
  • prometheus-operator-crd version 12.0.0

    • Upgraded CRDs chart from 13.0.2 (prometheus-operator 0.75.2) to 15.0.0 (prometheus-operator 0.77.1). See upstream changelog for more details.
  • prometheus-meta-operator version 4.81.0

    • Created new monitoring-agent inhibitions based on existing prometheus-agent configurations for tool-agnostic monitoring.
    • Added customer label to OpsGenie alerts to enhance alert specificity.
  • prometheus-rules version 4.23.0

    • Renamed all prometheus-agent related inhibitions to monitoring-agent inhibitions for clarity.
    • Standardized inhibition alert naming: InhibitionPrometheusAgentFailing and InhibitionPrometheusAgentShardsMissing.
    • Corrected statefulset.rules naming to avoid overwriting deployment.rules.
    • Adjusted KubeletVolumeSpaceTooLow alert threshold to only trigger when space is critically low, relying on node-problem-detector otherwise.
    • Updated aggregation:giantswarm:cluster_release_version expression to include Cluster API clusters.
    • Updated InhibitionControlPlaneUnhealthy for all Cluster API clusters, not just MCs.
    • Added alert for StatefulsetNotSatisfiedAtlas.
    • Updated alloy-app to 0.6.1, including an upgrade to upstream version 1.4.2 and a CiliumNetworkPolicy fix for clustering.
  • oauth2-proxy-app version 3.0.2

    • Implemented NetworkPolicy to allow traffic to oauth2-proxy.
    • Removed cert-manager ingress annotations to resolve ingress validation issues.
  • observability-bundle version 1.8.0

    • Upgraded prometheus-agent from v0.6.9 to v0.7.0.
    • Added extraArgs to enable features like WAL truncation.
    • Upgraded kube-prometheus-stack from 61.0.0 to 65.1.1.
    • Updated prometheus-operator CRDs from 0.73.0 to 0.75.0.
    • Prometheus-operator upgraded from 0.75.0 to 0.77.1.
    • Prometheus upgraded from 2.53.0 to 2.54.1.
    • Grafana upgraded from 8.2.0 to 8.5.0.
    • Thanos ruler upgraded from 0.35.1 to 0.36.1.
    • Prometheus-node-exporter upgraded from 1.8.1 to 1.8.2.
    • Added missing depends on annotations for alloy-metrics and alloy-logs to ensure correct deployment order.

Security

  • kyverno-policies-connectivity version 0.6.1

    • Added /tmp emptyDir volume to workload cluster IP Job.
  • falco-app version 0.9.1

    • Introduced feature gates for enabling/disabling individual Falco components.
  • starboard-exporter version 0.8.0

    • Added Vertical Pod Autoscaler (VPA) configuration, enabled by default for optimized resource usage.
    • Disabled logger development mode to enhance stability.
    • Disabled PodSecurityPolicy by default.
    • Exposed port 8081 for health/liveness probes.
  • trivy-app version 0.13.0

    • Updated Trivy to upstream version v0.56.1 for enhanced security scanning.
    • Disabled PSPs.
  • trivy-operator-app version 0.10.2

    • Aligned Trivy versions between Trivy operator and the upstream project to v0.56.1.
  • security-bundle version 1.9.0

    • Updated kyverno (app) to v0.18.1.
    • Updated kyverno-crds (app) to v1.12.0.
    • Updated kyverno-policies (app) to v0.21.0.
    • Updated starboard-exporter (app) to v0.8.0.
    • Updated trivy-operator (app) to v0.10.2.
    • Updated trivy (app) to v0.13.0.
    • Updated falco (app) to v0.9.1.

Connectivity

  • dns-operator-route53 version 0.10.0
    • Added optional --role-arn flag to specify the role ARN to assume when interacting with Route53.

Fleet management

  • app-admission-controller version 0.26.2

    • Extended the /healthz endpoint to verify certificate validity and allow Kubernetes liveness probes to manage restarts if errors occur.
  • app-operator version 6.11.2

    • Updated dependencies to ensure compatibility and security.