Highlights

  • Highlights for the week ending 2024-12-12

    Observability

    • Grafana App v2.18.0

      • Improved security by blocking default access to certain endpoints (/swagger, /metrics, and /api/health).
      • Upgraded Grafana for a better user experience and new features (now at version 8.6.0).
    • Prometheus Rules v4.30.0

      • New alert added to help identify issues with KubeadmConfig configurations.
      • Reduced unnecessary alerts during tests by ignoring certain HelmReleases.
      • Added new alerts to quickly detect and resolve karpenter issues.
      • Expanded alert timing for PromtailRequestsErrors to reduce false positives (now 25 minutes).
    • Observability Operator v0.10.0

      • Integrated Mimir, Alertmanager for enhanced alerting.
      • Enhanced multi-tenant support within Grafana organizations.
      • Fixed an issue that prevented Grafana from starting by ensuring config persistence.

    Fleet Management

    • Kube Downscaler App v0.4.0
      • Introduced a new Cilium network policy template for improved network management.

    Security

    • Kyverno Policies v0.21.1

      • Enhanced visibility with the addition of the application.giantswarm.io/team label.
    • Event Exporter App v2.0.0

      • Transitioned to a new, supported image source for better stability and support.
  • Highlights for the week ending 2024-11-28

    Observability

    • logging-operator version 0.16.0

      • Introducing enhanced visibility with Kubernetes events logging in Alloy, allowing for better monitoring of your environments.
      • Improved security by adding support for Private Certificate Authorities (CAs) in Alloy logs.
      • More control with the new events-logger option, enabling tailored logging settings through the observability-bundle ConfigMap.
      • Simplified management with the Grafana-Agent configuration now templated, and user privacy respected by disabling usage data reporting.
      • Enhanced reliability with improved logging configuration tests.
    • prometheus-rules version 4.27.0

      • Get precise alerts with new rules distinguishing between production and non-production clusters, ensuring focused monitoring.
      • Expanded monitoring capabilities with new alerts for critical components like cloud-provider-controller, cilium, coredns, and vertical-pod-autoscaler-crd.
      • Improved alerting for system reliability, including Shield pod restarts and Mimir ruler failures.
      • Enhanced usability with fixes to dashboard links and more robust template testing.

    Continuous deployment

    • app-exporter version 1.0.0
      • Gain deeper insights with a new cluster_id field in app metrics, enhancing traceability by linking metrics to specific clusters.
      • Aligned with the latest standards by removing outdated Kubernetes support, simplifying your configuration for modern environments.
      • Streamline your deployment processes with updated pipeline tools and simplified Helm values.
      • Ensure consistency and reliability in your deployments with updated label values and the removal of unnecessary dependencies.

    Docs

    We’re excited to announce the launch of our new Docs Hub, featuring up-to-date documentation for the Giant Swarm platform, now fully ready for Cluster API. While we’ve preserved our vintage documentation in a dedicated folder for reference, all content in the general documentation section is now accurate and current. We are committed to expanding and enhancing our documentation further in the coming weeks, and we warmly welcome all customer feedback to help us improve and complete our resources.

  • Highlights for the week ending 2024-11-21

    Observability

    • Alloy app version 0.7.0

      • We’ve upgraded the Alloy base chart to version 0.10.0, bringing Alloy itself to version 1.5.0. This update includes the latest features and improvements for enhanced performance and stability.
    • Alloy gateway app version 0.2.0

      • The Alloy gateway app now uses Alloy version 1.5.0, which includes important fixes for clustering with Cilium Network Policies.
    • Fluent log-shipping app version 5.3.1

      • We’ve fixed an issue with the fluent-bit image by adding the missing auditd libraries, enabling the use of ausearch for more comprehensive auditing capabilities.
    • Logging operator version 0.15.2

      • This update ensures compatibility with the latest Alloy logs by supporting the new secret mechanism, designed to work with Alloy 0.4.0 and the observability bundle 1.6.0. Additionally, if your deployment supports it, Vertical Pod Autoscaling (VPA) will be enabled for Alloy.
    • Dashboards version 3.26.1

      • We’ve added a new “Mimir / Continuous Test” dashboard and improved the “Management Cluster Overview” dashboard for better monitoring insights.
    • Observability bundle version 1.9.0

      • The latest bundle now includes Alloy version 1.5.0 with new event logging capabilities. We’ve also upgraded various components: alloy-logs and alloy-metrics to version 0.7.0, kube-prometheus-stack to 66.2.1, and other key monitoring tools like Prometheus, Grafana, and kube-state-metrics to their latest versions for enhanced observability.
    • Observability operator version 0.9.0

      • This release introduces new features for managing Grafana organizations, including their creation and configuration. It also addresses installation issues with the latest Alloy Metrics release and improves test reliability by updating Python dependencies and configuring required secrets.
    • Kube Prometheus stack app version 13.0.1

      • We’ve updated our chart dependencies to the latest versions, including kube-prometheus-stack 66.2.1 and Prometheus Operator 0.78.1, along with Grafana’s upgrade to 8.6.0. These updates bring improved performance and new features to your monitoring stack.

    Connectivity

    Continuous deployment

    • external-secrets version 0.11.1

      • Updated the image version to v0.10.5 to resolve an issue where authenticating to Kubernetes with client certificates failed.
    • zot version 2.0.1

      • Fixed duplicate entry in ServiceMonitor resources.

    Security

  • Highlights for the week ending 2024-10-31

    General

    • security-bundle version 1.9.0 introduces breaking changes. When upgrading to this version with Falco enabled, the Falco App may fail to upgrade due to a breaking change in the upstream chart. To complete the upgrade seamlessly, disable and then re-enable the Falco App by setting apps.falco.enabled=[false|true] in the security-bundle user values ConfigMap.
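
The disable-then-enable workaround above can be sketched as two successive revisions of the user values. This is a minimal Python illustration, not an official procedure: the ConfigMap name and namespace are hypothetical, and it assumes the user values are stored as a YAML string under a values key in the ConfigMap data.

```python
import json


def falco_user_values(enabled: bool) -> dict:
    """Build a user-values ConfigMap manifest that sets apps.falco.enabled."""
    values = {"apps": {"falco": {"enabled": enabled}}}
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            # Hypothetical name and namespace; use the ones from your installation.
            "name": "security-bundle-user-values",
            "namespace": "org-example",
        },
        # Assumption: values are read from the "values" key.
        # JSON is a subset of YAML, so this string is also valid YAML.
        "data": {"values": json.dumps(values, indent=2)},
    }


# Revision 1 disables Falco; revision 2 re-enables it after the upgrade.
upgrade_steps = [falco_user_values(False), falco_user_values(True)]

for step in upgrade_steps:
    print(json.dumps(step, indent=2))
```

Each printed manifest would be applied in turn, waiting for the bundle to reconcile between the two revisions.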

    Observability

    • dashboards version 3.26.0

      • Introduced “Loki - Slow Queries” dashboard for enhanced query performance insights.
      • Transferred ownership from BigMac to Shield for better team alignment.
      • Resynced alloy, loki, and mimir mixins from upstream to ensure feature parity.
    • logging-operator version 0.14.0

      • Default logging agent switched to Alloy, replacing Promtail for improved performance.
    • kube-prometheus-stack-app version 12.0.0

      • Updated chart dependency to kube-prometheus-stack-65.1.1.
      • Upgraded prometheus-operator from 0.75.0 to 0.77.1.
      • Prometheus upgraded from 2.53.0 to 2.54.1.
      • Grafana upgraded from 8.2.0 to 8.5.0.
      • Thanos ruler upgraded from 0.35.1 to 0.36.1.
      • Prometheus-node-exporter upgraded from 1.8.1 to 1.8.2.
      • Removed legacy in-house SLO framework to streamline integrations.
    • prometheus-operator-crd version 12.0.0

      • Upgraded CRDs chart from 13.0.2 (prometheus-operator 0.75.2) to 15.0.0 (prometheus-operator 0.77.1). See upstream changelog for more details.
    • prometheus-meta-operator version 4.81.0

      • Created new monitoring-agent inhibitions based on existing prometheus-agent configurations for tool-agnostic monitoring.
      • Added customer label to OpsGenie alerts to enhance alert specificity.
    • prometheus-rules version 4.23.0

      • Renamed all prometheus-agent related inhibitions to monitoring-agent inhibitions for clarity.
      • Standardized inhibition alert naming: InhibitionPrometheusAgentFailing and InhibitionPrometheusAgentShardsMissing.
      • Corrected statefulset.rules naming to avoid overwriting deployment.rules.
      • Adjusted KubeletVolumeSpaceTooLow alert threshold to only trigger when space is critically low, relying on node-problem-detector otherwise.
      • Updated aggregation:giantswarm:cluster_release_version expression to include Cluster API clusters.
      • Updated InhibitionControlPlaneUnhealthy for all Cluster API clusters, not just MCs.
      • Added alert for StatefulsetNotSatisfiedAtlas.
      • Updated alloy-app to 0.6.1, including an upgrade to upstream version 1.4.2 and a CiliumNetworkPolicy fix for clustering.
    • oauth2-proxy-app version 3.0.2

      • Implemented NetworkPolicy to allow traffic to oauth2-proxy.
      • Removed cert-manager ingress annotations to resolve ingress validation issues.
    • observability-bundle version 1.8.0

      • Upgraded prometheus-agent from v0.6.9 to v0.7.0.
      • Added extraArgs to enable features like WAL truncation.
      • Upgraded kube-prometheus-stack from 61.0.0 to 65.1.1.
      • Updated prometheus-operator CRDs from 0.73.0 to 0.75.0.
      • Prometheus-operator upgraded from 0.75.0 to 0.77.1.
      • Prometheus upgraded from 2.53.0 to 2.54.1.
      • Grafana upgraded from 8.2.0 to 8.5.0.
      • Thanos ruler upgraded from 0.35.1 to 0.36.1.
      • Prometheus-node-exporter upgraded from 1.8.1 to 1.8.2.
      • Added missing depends-on annotations for alloy-metrics and alloy-logs to ensure correct deployment order.

    Security

    • kyverno-policies-connectivity version 0.6.1

      • Added /tmp emptyDir volume to workload cluster IP Job.
    • falco-app version 0.9.1

      • Introduced feature gates for enabling/disabling individual Falco components.
    • starboard-exporter version 0.8.0

      • Added Vertical Pod Autoscaler (VPA) configuration, enabled by default for optimized resource usage.
      • Disabled logger development mode to enhance stability.
      • Disabled PodSecurityPolicy by default.
      • Exposed port 8081 for health/liveness probes.
    • trivy-app version 0.13.0

      • Updated Trivy to upstream version v0.56.1 for enhanced security scanning.
      • Disabled PSPs.
    • trivy-operator-app version 0.10.2

      • Aligned Trivy versions between Trivy operator and the upstream project to v0.56.1.
    • security-bundle version 1.9.0

      • Updated kyverno (app) to v0.18.1.
      • Updated kyverno-crds (app) to v1.12.0.
      • Updated kyverno-policies (app) to v0.21.0.
      • Updated starboard-exporter (app) to v0.8.0.
      • Updated trivy-operator (app) to v0.10.2.
      • Updated trivy (app) to v0.13.0.
      • Updated falco (app) to v0.9.1.

    Connectivity

    • dns-operator-route53 version 0.10.0
      • Added optional --role-arn flag to specify the role ARN to assume when interacting with Route53.

    Fleet management

    • app-admission-controller version 0.26.2

      • Extended the /healthz endpoint to verify certificate validity and allow Kubernetes liveness probes to manage restarts if errors occur.
    • app-operator version 6.11.2

      • Updated dependencies to ensure compatibility and security.
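
The certificate-aware /healthz check described for app-admission-controller 0.26.2 can be illustrated with a short sketch. This is not the controller's actual code; the function name and return shape are hypothetical, and it only shows the idea of failing the health check once the serving certificate is outside its validity window.

```python
from datetime import datetime, timedelta, timezone


def healthz(cert_not_before, cert_not_after, now=None):
    """Return (status_code, message) for a certificate-aware health check."""
    now = now or datetime.now(timezone.utc)
    if now < cert_not_before:
        return 500, "certificate not yet valid"
    if now >= cert_not_after:
        return 500, "certificate expired"
    return 200, "ok"


# Hypothetical validity window for the serving certificate.
not_before = datetime(2024, 1, 1, tzinfo=timezone.utc)
not_after = datetime(2025, 1, 1, tzinfo=timezone.utc)

print(healthz(not_before, not_after, now=datetime(2024, 10, 31, tzinfo=timezone.utc)))
print(healthz(not_before, not_after, now=not_after + timedelta(days=1)))
```

A Kubernetes HTTP liveness probe treats any status code from 200 up to (but not including) 400 as healthy, so returning 500 here lets the kubelet restart the container.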
  • Highlights for the week ending 2024-10-10

    Observability

    • alloy-gateway-app version 0.1.0

      • Deploys an extra instance of Grafana Alloy that acts as an observability gateway, so that logs from outside your clusters can be ingested into the Giant Swarm managed Loki running on your management cluster.
    • alloy-app version 0.6.1

      • Upgraded alloy container image to version 1.4.2.
      • Upgraded upstream chart from 0.7.0 to 0.9.1 - see changelog for more information.
      • Fixed CiliumNetworkPolicy to allow clustering mode.
        • Bump Chart appVersion to v1.4.2.
        • Fix circleci config.
        • Add PodLogs as helm chart template.
        • Upgrade Alloy upstream chart from 0.7.0 to 0.9.1.
        • This bumps the version of Alloy from 1.3.1 to 1.4.2.
        • Some debug metrics for components have changed.
        • Helm chart changes, see Alloy Helm chart v0.9.0 CHANGELOG
        • Fix CiliumNetworkPolicy to allow outgoing traffic to other nodes when running Alloy in clustering mode
    • grafana-app version 2.16.3

      • Upgraded grafana container image from 11.1.3 to 11.2.1.
      • Upgraded upstream chart from 8.3.4 to 8.5.2
      • Fix CI jobs generating new releases.
      • Upgrade grafana chart: 8.3.4 => 8.5.2
      • Upgrade grafana: 11.1.3 => 11.2.1
    • loki-app version 0.25.2

      • Upgraded upstream chart from 6.12.0 to 6.16.0 - see changelog for more information.
    • observability-bundle version 1.7.0

      • Upgrade to Alloy v1.4.2, which fixes a bug with component reload/evaluation and keeps the component in the latest upstream version.
      • Fixes an issue with CiliumNetworkPolicy that prevented Alloy from running in clustering mode.
    • observability-operator version 0.6.1

      • Fix CI jobs generating new releases
    • oauth2-proxy-app version 2.14.0

      • Add new configuration flags needed to let JWT token through.
      • Upgrade oauth2-proxy container image tag to v7.7.0
      • Remove oauth2-proxy from catalogs that are not related to the control plane.
    • prometheus-rules version 4.18.0

      • Add alerting rule for Loki missing logs at ingestion

    Security

    • kyverno-app version 0.18.1
      • Update Kyverno to upstream version v1.12.6.
      • Update kyverno-policy-reporter to upstream version v2.20.2.

    Fleet management

    • app-operator version 6.11.1

      • Retain a list of finalizers of Chart CR when updating it.
      • Update PolicyExceptions to v2 and failover to v2beta1.
    • zot version 2.0.0

      • Update zot to the latest version v2.1.1.
      • Update all dependencies.
  • Highlights for the week ending 2024-09-26

    Observability

    • dashboards version 3.24.0

      • Updated Alertmanager dashboard to show related logs.
      • Add Loki mixins dashboards update script.
      • Update Mimir mixins dashboards via script.
      • Fix Alloy mixin tags.
    • alloy-app version 0.5.2 introduces the following changes:

      • Add a helm chart templating test to the ci pipeline.
      • Add tests with ats in the CI pipeline.
      • Push alloy as a gateway component in collections.
    • kyverno-policies-observability version 0.5.0

      • Remove the policy for ServiceMonitor and PodMonitor relabelling schemas as we no longer need the enforcement.
    • fluent-logshipping-app version 5.2.2

      • Fix the Nginx Parser based on the upstream parser.
    • logging-operator version 0.12.1

      • Fix usage of structured metadata for clusters before v20.
      • Move high cardinality values into structured metadata.
      • Add Kubernetes audit log resource label, filename label, and output stream label.
      • Rename the node_name label into node to match the metric label.
    • loki-app version 0.24.0

      • Add “manual e2e” testing procedure.
      • Add PR message template referring to the manual testing procedure.
    • observability-bundle version 1.6.2:

      • Fixed alloyMetrics catalog
    • observability-operator version 0.6.0:

      • Require observability-bundle >= 1.6.2 for Alloy monitoring agent support; this is due to the incorrect alloyMetrics catalogue in observability-bundle
      • Fix invalid Alloy config due to missing comma on external labels
      • Disable logger development mode to avoid panicking; use zap as a logger.
      • Fix CircleCI release pipeline.
      • Add manual e2e testing procedure and script.
    • prometheus-meta-operator version 4.79.0:

      • Remove unused #alert and #alert-test-installation slack integration.
    • prometheus-rules version 4.15.2:

      • Update MimirHPAReachedMaxReplicas operation recipe link
      • Fix aggregation rule of the slo:current_burn_rate:ratio slo.
      • Remove aggregation of slo:period_error_budget_remaining:ratio, as this value can be easily computed and creates a lot of time series in Grafana Cloud.
      • Add aggregations for SLO metrics to export them to the Grafana cloud
      • Add MimirHPAReachedMaxReplicas alert to detect when Mimir’s HPAs have reached maximum capacity.
      • Added dashboards to several Mimir alerts
      • Change IRSAACMCertificateExpiringInLessThan60Days to IRSAACMCertificateExpiringInLessThan45Days. The ACM certificate is renewed 60 days before expiration, so the previous 60-day alert could fire prematurely.
    • tekton-dashboard-loki-proxy version 0.4.0:

      • Change app.giantswarm.io/* labels to application.giantswarm.io/
      • Update Golang to v1.23.1

    Cluster management

    • aws-pod-identity-webhook version 1.17.0:

      • Fix VPA being ineffective due to referring to a non-existing Deployment name
    • aws-crossplane-cluster-config-operator version 0.3.0

      • Configure the Crossplane ProviderConfig to use the CAPA controller role directly without going through a middleman. For this to work, the CAPA controller must have the correct trust policy granting access to the Crossplane provider’s service account.
      • Write a value oidcDomains to the config map containing all service account issuer domains, as defined by the new aws.giantswarm.io/irsa-trust-domains annotation on the AWSCluster. The primary domain is still written to value oidcDomain.
    • cluster version 1.4.1

      • Remove deprecation message for customNodeLabels and customNodeTaints, because they are not deprecated.
      • Allow configuring kube-controller-manager --node-cidr-mask-size flag.
      • Chart: Support multiple service account issuers. Change providerIntegration.controlPlane.kubeadmConfig.clusterConfiguration.apiServer.serviceAccountIssuer to plural providerIntegration.controlPlane.kubeadmConfig.clusterConfiguration.apiServer.serviceAccountIssuers and render them in the specified order as --service-account-issuer parameters for the API server.
      • Only add the customNodeLabels value to the kubelet node-labels argument in the KubeadmConfig when customNodeLabels is defined.
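
The ordered rendering of multiple service account issuers in cluster version 1.4.1 can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the chart renders this in its Helm templates, the helper name is hypothetical, the issuer URLs are made up, and the issuers are treated here as plain strings, which may not match the chart's actual value schema.

```python
def render_issuer_flags(issuers: list[str]) -> list[str]:
    """Render one --service-account-issuer flag per issuer, preserving order."""
    return [f"--service-account-issuer={issuer}" for issuer in issuers]


# Hypothetical issuer URLs, listed in the order they should be rendered.
service_account_issuers = [
    "https://irsa.new.example.com",
    "https://irsa.old.example.com",
]

flags = render_issuer_flags(service_account_issuers)
print(" ".join(flags))
```

Order matters because, in upstream Kubernetes, the first issuer is used when generating new tokens, while all listed issuers are accepted when validating existing ones.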

    Connectivity

    Security

    • kyverno-policies-dx version 0.5.1

      • Use Enforce and Audit validationFailureAction.
    • kyverno-policies-ux version 0.7.3

      • cluster-names now targets Cluster by GVK
      • Use Enforce validationFailureAction.
    • kyverno-app version 0.18.0

      • Update Kyverno to the upstream version v1.12.5.
    • kyverno-crds version 1.12.0

      • Update Kyverno CRDs to Kyverno v1.12.
    • kyverno-policies version 0.21.0

      • Update to upstream Kyverno Policies version 1.12.5.
      • Don’t push to vsphere-app-collection, capz-app-collection, capa-app-collection or cloud-director-app-collection. We started to consume kyverno-policies from security-bundle.
  • Highlights for the week ending 2024-07-25

    Observability

    • alloy-app version 0.3.0 introduces the following changes:

      • Add kyverno policy exception for run as non root.
      • Upgrade alloy upstream chart from 0.4.0 to 0.5.1.
      • This bumps the version of alloy from 1.2.0 to 1.2.1.
    • logging-operator version 0.7.0 adds support for Alloy as a logging agent. It also adds a --logging-agent flag to toggle between Promtail and Alloy.
    • loki-app version 0.21.0 upgrades upstream chart from 6.6.4 to 6.7.1 - see changelog for more information. The loki version goes from 3.0.0 to 3.1.0.
    • object-storage-operator version 0.8.0 introduces the following changes:

      • ReclaimPolicy added in the Bucket CR to manage the data clean up (retain or delete).
      • Add a finalizer on the Azure secret to prevent its deletion.
      • Empty all the objects in the S3 bucket in case of bucket deletion.
    • observability-bundle version 1.5.1 upgrades prometheus-operator-crd to 11.0.1. In addition, version 1.5.0 introduces the following changes:

      • Add alloy v0.3.0 as alloy-logs.
      • prometheus-operator will not check promql syntax for prometheusRules that are labelled application.giantswarm.io/prometheus-rule-kind: loki
    • observability-operator version 0.3.0 deletes monitoring resources if monitoring is disabled at the installation or cluster level using the giantswarm.io/monitoring label.
    • prometheus-operator-crd version 11.0.1 adds helm.sh/resource-policy: keep annotation to all CRDs to avoid deletion during Helm operations.
    • prometheus-rules version 4.8.0 moves alloy to the monitoring namespace. Version 4.7.0 introduces the following changes:

      • Support for loki rules to management clusters in alloy config.
      • grafana datasource for MC loki ruler.
      • Make dns-operator-azure capz only.
      • Fix PromtailDown alert to fire only when the node is ready.
    • kube-downscaler-app version 0.3.0 pushes the kube-downscaler app to all collections, and version 0.2.0 adds an enabled field in the values to disable the whole chart if needed.
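
The helm.sh/resource-policy: keep annotation added to all CRDs in prometheus-operator-crd 11.0.1 can be illustrated with a short Python sketch. This is not how the chart itself applies the annotation (that happens in its templates); the helper name is hypothetical, and the manifest is a trimmed-down example.

```python
import copy

KEEP_ANNOTATION = "helm.sh/resource-policy"


def with_keep_policy(manifest: dict) -> dict:
    """Return a copy of the manifest annotated so Helm does not delete it."""
    annotated = copy.deepcopy(manifest)
    annotations = annotated.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[KEEP_ANNOTATION] = "keep"
    return annotated


# Trimmed-down CRD manifest used purely for illustration.
crd = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    "metadata": {"name": "prometheusrules.monitoring.coreos.com"},
}

annotated_crd = with_keep_policy(crd)
print(annotated_crd["metadata"]["annotations"])
```

With this annotation in place, Helm skips deleting the resource when an uninstall, upgrade, or rollback would otherwise remove it, which protects the custom resources that depend on the CRD.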

    Authentication and Authorization

    • dex-app version 1.42.11 brings the following changes:

      • Default ingress.tls.clusterIssuer values to letsencrypt-giantswarm.
      • Update cert-manager.io/cluster-issuer annotation to use default.
    • teleport-kube-agent-app version 0.9.2 introduces podAntiAffinity so that teleport-kube-agent pods run on different control-plane nodes. It also increases the number of replicas to 3 for better high availability.

    Connectivity

    • k8s-dns-node-cache-app version v2.8.1 fixes an issue with app-exporter metrics on Cluster API installations by removing provider-specific restrictions. All app-exporter metrics are now available on all providers.

    Security

    • kyverno-policies-connectivity version 0.6.0 introduces the following changes:

      • Update kubectl container image to version v1.26.0 for WorkloadCluster Ip Job.
      • Increase pod and container SecurityContext settings for WorkloadCluster Ip Job.
      • Execute kubectl apply with --server-side=true --field-manager='kubectl-client-side-apply' --force-conflicts flags in WorkloadCluster Ip Job.
      • Remove unused tests under helm directory.
    • security-bundle version 1.8.0 introduces the following changes:

      • Add kyverno-crds app to handle Kyverno CRD install.
      • Update kyverno (app) to v0.17.15. This version disables the CRD install job in favor of kyverno-crds App.
    • kyverno-app version 0.17.15 brings the following changes:

      • Set VPA max 6 CPU / 24Gi memory and adjust default requests/limits for reports-controller.
      • Set VPA max 4 CPU / 8Gi memory and adjust default requests/limits for background-controller.
      • Set starting CPU limit of request+25% for cleanup-controller.
      • Disable Kyverno CRDs install Job in favor of kyverno-crds App.
    • kyverno-crds version 1.11.1 removes unpopulated labels and fixes the team label.

    Cluster management

    Docs

  • Highlights for the week ending Feb 15 2024

    Apps

    • The dex-k8s-authenticator component is now deprecated and disabled by default because the upstream project is no longer maintained. We advise switching to kubectl gs login for access. Please reach out if you need any support regarding the access mechanism.
    • external-dns-app version v.3.1.0 removes the default namespace filter configuration. This was a relic from the time when nginx-ingress was bound to the kube-system namespace, and the restriction has now been lifted.
    • flux-app version v1.3.1 corrects installation issues from the v1.2.0 release where, in certain scenarios, controllers were unable to start because PSPs were still available on the clusters. This version of the app also improves monitoring of the flux controllers. Customers who are using the v1.2.0 release should upgrade to this new version at their earliest convenience. Please reach out if you need any support regarding the upgrade.

    Docs

  • Highlights for the week ending Feb 01 2024

    Apps

    • flux-app version v1.2.0 introduces two changes. The first is the update to flux version v2.1.2; please see the upstream release notes. The changes include overall improvements without breaking changes. The second is that we are dropping all PSPs from the install.yaml in favor of PSS. Additionally, we updated all security policies to satisfy the kyverno checks.

    Docs

  • Highlights for the week ending Dec 21 2023

    Observability

    • Logging for workload clusters is now enabled by default
      • You can access those logs via your installation’s Grafana
      • Logs are available for
        • All CAPA workload clusters
        • AWS workload clusters from 19.3.0 onwards
      • Available logs:
        • Pod logs from giantswarm and kube-system namespaces
        • Kubernetes API server audit logs
        • Systemd unit logs
      • Documentation: https://handbook.giantswarm.io/docs/observability/loki-usage/

This part of our documentation refers to our vintage product. The content may no longer be valid for our current product. Please check our new documentation hub for the latest state of our docs.