Workload Cluster Releases for CAPA

  • This release introduces several changes that are required for Vintage to CAPA migration use cases.

    The most notable change is that auditd is now disabled by default. If you actively use this feature, set global.components.auditd.enabled to true in the Cluster App user values before the upgrade, as shown in the example below.
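
    For example, a minimal Cluster App user values snippet that keeps auditd enabled could look like the following (a sketch: the field path is taken from this release note, while the surrounding file structure depends on your setup):

    ```yaml
    # Cluster App user values (sketch)
    global:
      components:
        auditd:
          enabled: true
    ```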

    Changes compared to v28.1.1

    Components

    • cluster-aws from v1.3.0 to v1.3.2

    cluster-aws v1.3.0…v1.3.2

    Added

    • Chart: Add global.connectivity.network.pods.nodeCidrMaskSize to schema.
    • Chart: Allow enabling auditd through global.components.auditd.enabled.
    • Chart: Support multiple service account issuers.
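
    For context, the upstream kube-apiserver accepts the --service-account-issuer flag multiple times: the first issuer is used to sign new tokens, while tokens from all listed issuers are accepted during validation, which enables non-disruptive issuer rotation. An illustrative flag set follows; how cluster-aws surfaces this in its values schema is not shown in this release note:

    ```yaml
    # Illustrative kube-apiserver arguments only (not the chart's exact rendering)
    - --service-account-issuer=https://new-issuer.example.com  # used for signing
    - --service-account-issuer=https://old-issuer.example.com  # still accepted for validation
    ```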

    Changed

    • Chart: Update cluster to v1.0.1.
      • Allow enabling the auditd service through global.components.auditd.enabled.
      • Support multiple service account issuers.
      • Allow configuring the kube-controller-manager --node-cidr-mask-size flag (see the sketch below).
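
    A minimal sketch of setting the pod node CIDR mask size through the Cluster App user values (the field path is taken from this release note; the value is an example):

    ```yaml
    global:
      connectivity:
        network:
          pods:
            nodeCidrMaskSize: 25  # example; passed to kube-controller-manager --node-cidr-mask-size
    ```
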
  • This release fixes an issue where certain apps installed during or before v25 would break due to API removals when upgrading to v29.

    Changes compared to v28.1.0

    Apps

    • security-bundle from v1.7.0 to v1.7.2

    security-bundle v1.7.0…v1.7.2

    Changed

    • Update trivy-operator (app) to v0.9.1.
    • Update kyverno (app) to v0.17.14.
    • Update starboard-exporter (app) to v0.7.11.
  • Changes compared to v29.0.0

    Apps

    • cert-exporter from v2.9.1 to v2.9.2
    • node-exporter from v1.19.0 to v1.20.0
    • observability-bundle from v1.5.2 to v1.6.1
    • security-bundle from v1.8.0 to v1.8.1

    cert-exporter v2.9.1…v2.9.2

    Added

    • Chart: Add VPA and resources configuration for deployment and daemonset. (#382)

    node-exporter v1.19.0…v1.20.0

    Changed

    • Synced with upstream chart v4.38.0 (node-exporter 1.8.2).

    observability-bundle v1.5.2…v1.6.1

    Added

    • Add alloy v0.4.0 as alloyMetrics.

    Changed

    • Disable usage reporting to Grafana Labs by:
      • Bumping alloyLogs and alloyMetrics to v0.4.1.
      • Bumping grafanaAgent to v0.4.6.
    • Bump alloyLogs to v0.4.0.
    • Rename the alloy-logs app to the camel-cased alloyLogs.

    security-bundle v1.8.0…v1.8.1

    Changed

    • Update trivy-operator (app) to v0.9.1.
  • This release updates the apps and components, keeping them up to date with the latest v25 and v26 releases. It also brings improvements to container registry usage.

    Change details compared to CAPA 27.0.0

    cluster-aws 1.3.0

    Changed

    • All workload clusters will, by default, use the Zot registry as a pull-through cache for Azure Container Registry.

    cert-manager 3.7.9

    Fix

    • Remove quotes from the acme-http01-solver-image argument. The quotes were included when looking up the image, which caused an error.

    Update

    • Improve container security by setting runAsGroup and runAsUser to values greater than zero for all deployments.
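
    For illustration, a non-root pod securityContext of the kind this change describes (the UID/GID values are examples, not the chart's exact settings):

    ```yaml
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000   # example non-zero UID
      runAsGroup: 1000  # example non-zero GID
    ```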

    containerlinux 3815.2.5


    cilium 0.25.1

    Changed

    • Fix a regression by setting the policy BPF map size (policyMapMax) back to 65536 from 16384 (see the note below).
    • Upgrade cilium to v1.15.6.
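
    For reference, the upstream Cilium Helm chart exposes this limit as bpf.policyMapMax (an assumption based on the upstream chart; verify against your chart version):

    ```yaml
    bpf:
      policyMapMax: 65536  # maximum number of entries in endpoint policy BPF maps
    ```
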
  • This release updates the cluster-aws Helm chart, which brings improvements to container registry usage.

    Change details compared to CAPA 25.1.0

    cluster-aws 1.3.0

    Changed

    • All workload clusters will, by default, use the Zot registry as a pull-through cache for Azure Container Registry.
  • This release updates the apps and components, keeping them up to date with the latest v25 release. It also brings improvements to container registry usage.

    Change details compared to CAPA 26.0.0

    cluster-aws 1.3.0

    Changed

    • All workload clusters will, by default, use the Zot registry as a pull-through cache for Azure Container Registry.

    cert-manager 3.7.9

    Fix

    • Remove quotes from the acme-http01-solver-image argument. The quotes were included when looking up the image, which caused an error.

    Update

    • Improve container security by setting runAsGroup and runAsUser to values greater than zero for all deployments.

    containerlinux 3815.2.5


    cilium 0.25.1

    Changed

    • Fix a regression by setting the policy BPF map size (policyMapMax) back to 65536 from 16384.
    • Upgrade cilium to v1.15.6.
  • This release updates the components, keeping them up to date with the Vintage AWS v20.1.x series. Several improvements for the Vintage to CAPA migration have also been included.

    Change details compared to CAPA 25.0.0

    cluster-aws 1.1.0

    Fixed

    • Fixed the China IRSA suffix.

    Added

    • Add the Management Cluster name as a tag to the AWS resources created by CAPA.
    • Add the node pool name as a tag to the AWS resources associated with the node pool.

    Changed

    • Update cluster chart to v0.35.0.

    cert-manager 3.7.9

    Fix

    • Remove quotes from the acme-http01-solver-image argument. The quotes were included when looking up the image, which caused an error.

    Update

    • Improve container security by setting runAsGroup and runAsUser to values greater than zero for all deployments.

    containerlinux 3815.2.5


    cilium 0.25.1

    Changed

    • Fix a regression by setting the policy BPF map size (policyMapMax) back to 65536 from 16384.
    • Upgrade cilium to v1.15.6.
  • We are happy to announce our first Cluster API for AWS (CAPA) release v25.

    This is the first CAPA release supported by Giant Swarm. It is available on CAPA Management Clusters and will serve as the first release that Vintage workload clusters are upgraded to.

    CAPA benefits

    Each existing customer using the Giant Swarm Vintage AWS product has been given a presentation about the benefits of CAPA. We have gathered the most crucial high-level advantages in the list below:

    • Easier cluster management: complete, production-ready clusters packaged as a Helm chart, and a more GitOps-oriented approach
    • Better visibility of subresources, clear API
    • More transparent upgrades
    • Improved cluster configuration validation
    • Management clusters capable of deploying clusters in different regions
    • Flexible network configuration
    • Automatic certificate renewal in place for worker nodes
    • Exposing more features based on the upstream implementations and contributions
    • Karpenter solution integrated into workload clusters out of the box
    • More configurable registries with credentials and mirrors

    Important changes

    Besides the benefits listed above, we have also presented the changes that are introduced with CAPA. Here is a summary of the most important points:

    • Flat DNS structure - Brings more flexibility into workload cluster management across regions. The flat DNS structure no longer includes the Management Cluster name, but uses a more generic DNS domain prefixed with the cluster ID. The advantage is that the Workload Cluster is no longer tied to the region or the Management Cluster, which extends the possibilities for any future migrations. Both the old and the new DNS names are available during and after the migration, but the new DNS name must be used after the migration. We will synchronize with customers on phasing out the old DNS after the migrations, per management cluster.
    • Only the public DNS zone is supported - Private hosted zones are no longer created. However, both zones (public and private) will be available during and after the migration. We will work with each customer to eventually remove the private zones. Ingresses that depend on the cluster base domain must be updated after the migration.
    • Cluster apps moved to org- namespaces - This change simplifies RBAC and enables GitOps-managed clusters to be created with a pre-defined set of applications.
    • One set of controllers - In contrast to Vintage, where each release had a set of independent operators, the CAPA solution manages all releases with a single set of controllers. This results in fewer components running on the management clusters, lowering the cost of operations and speeding up any required hotfixes.
    • GP2 volumes not supported - The majority of customers are already using gp3 volumes. As we refresh the infrastructure, the deprecated kubernetes.io/aws-ebs provisioner, which creates gp2 volumes, will also be removed.
    • Node pool defaults are changed - With CAPA, unless explicitly configured otherwise, node pools now share a single subnet by default. Migrated Vintage clusters will not be affected. With this change, the node limit is determined by the cluster VPC size, and additional VPC CIDRs can be added. In contrast to Vintage, CAPA also allows node pools to be configured and deployed together with the cluster object by default.
    • Teleport - Giant Swarm will now utilize Teleport for Kubernetes API and direct node access. It strengthens security and simplifies both regulatory compliance and the network topology. This service is only available to Giant Swarm engineers, while customers can obtain audit logs for any operations and logins performed by Giant Swarm support.

    Vintage to CAPA Migration

    The migration itself is a fully automated process run by Giant Swarm engineers using a migration-cli that handles all infrastructure as well as workload migrations. The migration experience should be the same as a usual workload cluster upgrade, where a rollout of the nodes takes place.

    Prior to running the tool, a new Management Cluster based on the CAPA solution has to be created in order to make full use of the CAPI lifecycle management as well as the infrastructure. CAPA brings a new Custom Resource structure for the Workload Cluster definition. Hence, for the period of the workload cluster migration, any customer automation managing the cluster, such as GitOps, has to be disabled. After the migration, customers will have to adopt the new structure and adjust the aforementioned automation.

    As Giant Swarm manages the cloud infrastructure, no action is needed from customers for the migration itself. We have aimed to match the Vintage features as closely as possible, introducing improvements where needed.

    One of the many improvements is the deprecation of the k8s-initiator application, which allowed the customization of some parts of the Kubernetes environment to cater for customer needs. This tool, however, allowed a great deal of freedom in the form of arbitrary Bash scripts run by the tool itself. We reviewed the use cases that customers have implemented, exposed certain settings in CAPA, and prepared a migration plan for those features as well as for any future customization. The most important part for each customer is to prepare the {cluster_name}-migration-configuration YAML file, representing the k8s-initiator app features used, which will then be consumed by the migration-cli and populated into the Cluster charts for future use.

    Your Account Engineer will provide you with a detailed checklist to go over prior to migration.

    New components with CAPA

    (NEW) capi-node-labeler 0.5.0

    (NEW) aws-pod-identity-webhook 1.16.0

    (NEW) teleport-kube-agent 0.9.0

    (NEW) cluster-aws 1.0.0

    (NEW) cilium-crossplane-resources 0.1.0

    Change details compared to Vintage AWS 20.1.1

    cloud-provider-aws (formerly aws-cloud-controller-manager-app) 1.25.14-gs3

    Changed

    • Reduce minimum CPU and memory requests.

    cert-manager 3.7.6

    Added

    • Added Vertical Pod Autoscaler support for cainjector pods.

    cilium 0.24.0

    Added

    • Cilium ENI mode for CAPA becomes usable with these changes.
      • Add security group tag filter for pod network.
      • Select subnets from secondary VPC CIDRs.
    • Upgrade cilium to v1.15.4.

    cluster-autoscaler 1.27.3-gs9

    Added

    • Node Group Auto Discovery for CAPA MachinePools (see the sketch after this list).
    • Add service account annotations as value.
    • Added service monitor.
    • Add configurable node.nodeSelector in values.
    • Add configurable node.caBundlePath in values.
    • Repository: Add ABS & ATS.
    • Helpers: Add fullname template.
    • Helpers: Add application.giantswarm.io/team label.
    • Deployment: Tolerate control-plane.
    • Add possibility to use egress proxy.
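
    For reference, upstream cluster-autoscaler discovers Cluster API node groups via its --node-group-auto-discovery flag. A minimal sketch of the relevant container arguments (cluster name and namespace are illustrative, not the chart's exact rendering):

    ```yaml
    args:
      - --cloud-provider=clusterapi
      # Discover node groups (e.g. MachinePools) of cluster "example" in namespace "org-example":
      - --node-group-auto-discovery=clusterapi:namespace=org-example,clusterName=example
    ```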

    Changed

    • Update cluster-autoscaler to version 1.27.3.
    • Change the ScaleDownUtilizationThreshold default from 0.5 to 0.7.
    • Replace condition for PSP CR installation.
    • Configure gsoci.azurecr.io as the default container image registry.
    • Change helm value managementCluster from object to string.
    • Chart: Make PSS compliant.
    • Chart: Respect .Release.Name & .Release.Namespace.
    • Values: Move balancingIgnoreLabels to configmap.
    • Values: Rename clusterAutoscalerResources to resources.
    • Values: Rename verticalPodAutoscaler to autoscaling.
    • Helpers: Rename templates.
    • Helpers: Keep -app suffix in name template.
    • Deployment: Improve nodeSelector compatibility.
    • Improve proxy settings.
    • Reduce minimum CPU and memory requests.

    Removed

    • Values: Remove port & serviceType.
    • Helpers: Remove k8s-addon label.
    • RBAC: Remove PSP role & binding.
    • Deployment: Remove replicas.
    • Deployment: Remove imagePullPolicy.
    • Deployment: Remove aws-use-static-instance-list workaround.
    • Policy Exception: Remove -exceptions suffix.
    • Service: Replace by pod monitor.

    k8s-dns-node-cache 2.6.2

    Changed

    • Reduce CPU requests.

    net-exporter 1.19.0

    Added

    • Add /blackbox endpoint.

    security-bundle 1.7.0

    Added

    • Add cloudnative-pg, edgedb, and reports-server apps (disabled).

    Changed

    • Update starboard-exporter (app) to v0.7.10.
    • Update kyverno (app) to v0.17.13.
    • Update trivy (app) to v0.12.0.
    • Update trivy-operator (app) to v0.9.0.
    • Update cloudnative-pg (app) to v0.0.5.

    vertical-pod-autoscaler 5.2.2

    Changed

    • Chart: Update Helm release vertical-pod-autoscaler to v9.8.0.
    • Chart: Update appVersion and README.md.
    • Chart: Update appVersion and README.md, VPA v1.1.2.
