Observability

  • Added

    • Add VerticalPodAutoscaler support for index-cache, metadata-cache, and results-cache.

    Changed

    • Change team annotation in Chart.yaml to OpenContainers format (io.giantswarm.application.team).
    • CI: Test with mimir disabled.

    Fixed

    • Disable all mimir objects if mimir is disabled.
  • Fixed

    • Disable Loki Scaled Objects if Loki is disabled.
  • Changed

    • Upgrade grafana chart: 10.3.1 => 10.5.15
    • Upgrade grafana: 12.3.0 => 12.3.1
  • Changed

    • Update Network Traffic Analysis dashboards
      • Set variables to refresh on time window change
      • Change queries to use avg_over_time to get a smoother graph over longer time ranges
      • Sort by Mean rather than Last in graphs to match what’s shown in the tables
      • Use 7 days as default time window on overview dashboard
      • Add Mean column next to Last on graphs
      • Add cluster variable description
  • Changed

    • Bump kube-mixins to 1.4.1

    Removed

    • Remove the unused home dashboard
  • Added

    • Add HTTPRouteFilter support for Gateway API routes.

    Changed

    • Refactor loki-gateway HTTPRoute template to use loki’s templating for naming.
  • Added

    • Add Crossplane AWS support for automated S3 bucket provisioning.
  • Added

    • Add Crossplane support for AWS S3 bucket provisioning with the following resources:
      • S3 Buckets for mimir, ruler, and alertmanager storage
      • BucketLifecycleConfiguration for automatic object expiration
      • BucketPublicAccessBlock for security hardening
      • BucketPolicy enforcing SSL/TLS connections
      • IAM Roles with IRSA (IAM Roles for Service Accounts) support
    • Crossplane resources support dynamic AWS account ID and OIDC provider discovery from cluster CRs
    • Tags from AWSCluster CR are automatically merged with user-provided tags
    • Observe-only mode for safe migration of existing resources
  • Added

    • Add Network Traffic Analysis Overview dashboard

    Changed

    • Improve Network Traffic Analysis Dashboard Performance
      • [performance] Re-use panel queries, which greatly reduces the number of queries made to the backend and eases maintenance
      • [performance] Change Pie Charts query types from range to instant
      • [performance] Set maximum data points to 500 and minimum interval to 2 minutes
      • [ux] Change destination pie charts to only show top 10 destinations
      • [ux] Change the top list panels to use pagination rather than only showing top 10 elements
      • [ux] Change bottom graphs to use stacked lines and add a list of values and a total count
      • [ux] Remove the per-namespace section, which was merely a duplicate of the top one with an additional namespace filter. All panels now have a namespace filter that defaults to all namespaces, which keeps the old behavior of the top panels while also allowing the behavior of the bottom ones.
      • [ux] Add links between both Network Traffic Analysis dashboards
      • [ux] Add “Include non-namespaced” toggle to filter/include non-namespaced network traffic
      • [ux] Improve documentation panel
      • [ux] Add annotations for CiliumNetworkPolicies events
      • [ux] Set legend to “unknown” when no value is found
      • [ux] Show percentage and all values in tooltip on destination panels
      • [maintenance] Move subnets regex to a constant
    • NGINX Ingress controller dashboard: rework variables
      • Remove app selector
      • Remove namespace selector
      • Add ingress namespace selector

    Removed

    • Remove logging-operator related data as it is now deprecated.
  • Changed

    • Fix HTTPRoute template.
    • Add HTTPRouteFilter.