Workload cluster releases for Azure

  • This workload cluster release contains fixes for how azure-operator handles multi-tenant service principal secrets, a prerequisite for migrating from VPN Gateway to VNET peering. It also extends the azure-scheduled-events app to handle additional Azure VMSS events.

    Please contact your Solution Architect to check whether this release is necessary for your use case.

    Change details

    azure-operator 5.5.3

    Fixed

    • Fix wrong setup of multi-account service principals.

    kubernetes 1.19.10

    API Change

    • Fixes using server-side apply with APIService resources (#100713, @kevindelgado) [SIG API Machinery, Apps, Scheduling and Testing]
    • Regenerate protobuf code to fix CVE-2021-3121 (#100515, @joelsmith) [SIG API Machinery, Auth, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Node and Storage]

    Feature

    • Kubernetes is now built using go1.15.10 (#100520, @cpanato) [SIG Cloud Provider, Instrumentation, Release and Testing]

    Bug or Regression

    • Fixed a bug where a high churn of events was causing master instability by reducing the maximum number of objects (events) attached to a single etcd lease. (#100450, @mborsz) [SIG API Machinery and Instrumentation]
    • Fixed a race condition on API server startup ensuring previously created webhook configurations are effective before the first write request is admitted. (#95783, @roycaihw) [SIG API Machinery]
    • Fixes a data race issue in the priority and fairness API server filter (#100669, @tkashem) [SIG API Machinery]
    • Kubectl: Fixed panic when describing an ingress backend without an API Group (#100542, @eddiezane) [SIG CLI]
    • Reverts breaking change to inline AzureFile volumes in v1.19.7-v1.19.9; referenced secrets are now correctly searched for in the same namespace as the pod as in previous releases. (#100398, @andyzhangx) [SIG Cloud Provider and Storage]
    • The endpointslice mirroring controller mirrors endpoints annotations and labels to the generated endpoint slices. It also ensures that updates to any of these fields on the endpoints are mirrored. The well-known annotation endpoints.kubernetes.io/last-change-trigger-time is skipped and not mirrored. (#100443, @aojea) [SIG Apps, Network and Testing]
    • The maximum number of ports allowed in EndpointSlices has been increased from 100 to 20,000 (#99795, @robscott) [SIG Network]

    Dependencies

    Added

    Nothing has changed.

    Changed

    • github.com/gogo/protobuf: v1.3.1 → v1.3.2
    • github.com/kisielk/errcheck: v1.2.0 → v1.5.0
    • github.com/yuin/goldmark: v1.1.27 → v1.2.1
    • golang.org/x/sync: cd5d95a → 67f06af
    • golang.org/x/tools: c1934b7 → 113979e
    • golang.org/x/xerrors: 9bdfabe → 5ec99f8
    • sigs.k8s.io/structured-merge-diff/v4: v4.0.1 → v4.0.3

    Removed

    Nothing has changed.

    containerlinux 2765.2.2

    Security fixes

    Bug Fixes

    • GCE: The old interface name ens4v1 which was replaced by eth0 due to a broken udev rule was restored, but now as alternative interface name, and eth0 will stay the primary name for consistency across cloud environments. (init#38)

    Changes

    • The virtio network interfaces got predictable interface names as alternative interface names, and thus these names can also be used to match for a specific interface in case there is more than one and the eth0 and eth1 name assignment is not stable. (init#38)

    Updates

    azure-scheduled-events 0.4.0

    Added

    • React to Preempt, Reboot and Redeploy events to drain nodes properly.

    Change

    • Use the NotBefore field from the event to calculate drain timeout.
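
The two azure-scheduled-events 0.4.0 changes above can be illustrated with a minimal Python sketch. It is not the app's actual implementation. It assumes the event is a dictionary shaped like an Azure Scheduled Events entry, that NotBefore is an RFC 1123 timestamp, and that an empty NotBefore means the event has already started. The function name and the safety_margin parameter are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Event types named in the changelog entry above.
DRAIN_EVENT_TYPES = {"Preempt", "Reboot", "Redeploy"}


def drain_timeout_seconds(event, now, safety_margin=30.0):
    """Return the seconds available for draining, or None if the event is ignored.

    `safety_margin` (hypothetical) leaves time to approve the event
    after the drain has finished.
    """
    if event.get("EventType") not in DRAIN_EVENT_TYPES:
        return None
    not_before = event.get("NotBefore")
    if not not_before:
        # Assumption: an empty NotBefore means the event has already started.
        return 0.0
    deadline = parsedate_to_datetime(not_before)
    remaining = (deadline - now).total_seconds() - safety_margin
    return max(remaining, 0.0)


if __name__ == "__main__":
    sample = {"EventType": "Reboot", "NotBefore": "Mon, 19 Sep 2016 18:29:47 GMT"}
    now = datetime(2016, 9, 19, 18, 19, 47, tzinfo=timezone.utc)
    print(drain_timeout_seconds(sample, now))
```

Deriving the timeout from NotBefore, rather than using a fixed value, keeps the drain inside the window Azure grants for that specific event.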
  • This is a bug fix release that involves the external-dns and chart-operator apps.

    Upgrade from 14.1.3 to 14.1.4 will only roll the apps.

    Change details

    external-dns 2.3.0

    Changed

    • Change default annotation filter to match the one we use for the nginx ingress controller.

    Added

    • Add sidecar container for provider: aws to periodically validate IAM credential accessibility (#76)

    chart-operator 2.12.0

    Changed

    • Set docker.io as the default registry
    • Pass RESTMapper to helmclient to reduce the number of REST API calls.
    • Updated Helm to v3.5.3.
  • This release improves draining of the nodes with fixes on azure-scheduled-events app. Upgrade from 14.0.1 to 14.0.2 will only roll the apps.

    Change details

    azure-scheduled-events 0.3.0

    Fixed

    • Ensure to wait long enough when draining a node before considering the node drained.

    Changed

    • Change drain timeout to 15 minutes.
  • This release increases the Azure Events Termination timeout from 5 to 15 minutes for a better upgrade experience while workloads are moved to new nodes. The draining process of the nodes has been improved as well.

    Change details

    azure-operator 5.5.2

    Changed

    • Increase VMSS termination events timeout to 15 minutes.

    Fixed

    • Avoid logging errors when trying to create the workload cluster k8s client and cluster is not ready yet.

    app-operator 3.2.1

    Security

    • Restrict ingress to only expose the status endpoint.

    azure-scheduled-events 0.3.0

    Fixed

    • Ensure to wait long enough when draining a node before considering the node drained.

    Changed

    • Change drain timeout to 15 minutes.
  • This release increases the Azure Events Termination timeout from 5 to 15 minutes for a better upgrade experience while workloads are moved to new nodes.

    Change details

    azure-operator 5.3.1

    Changed

    • Increase VMSS termination events timeout to 15 minutes.

    azure-scheduled-events 0.2.2

    Added

    • Remove the Node from Kubernetes API server right before approving the termination event.

    Fixed

    • Keep looping on events if one loop errors out.
    • Disable helm hook for new installations of the chart.
  • This is a bugfix release to resolve a few bugs related to the cluster autoscaler. We strongly suggest upgrading any 14.x workload cluster to this release to ensure the cluster autoscaler feature works properly.

    Warning: to avoid downtime in ingress-based workloads, before upgrading to this release it is important to ensure your cluster is running a recent version (1.14.0 or newer) of the Nginx Ingress Controller app. Please get in touch with your Solution Engineer before upgrading if you have any concerns.

    Change details

    azure-operator 5.5.1

    Fixed

    • Fix a race condition when upgrading node pools with 0 replicas.
    • Fix Upgrading condition for node pools with autoscaler enabled.

    Added

    • Add new handler that creates AzureClusterIdentity CRs and the related Secrets out of Giant Swarm’s credential secrets.
    • Ensure AzureCluster CR has the SubscriptionID field set.
    • Reference Spark CR as bootstrap reference from the MachinePool CR.
    • Ensure node pools min size is applied immediately when changed.

    azure-scheduled-events 0.2.2

    Fixed

    • Disable helm hook for new installations of the chart.
  • This is a bugfix release to resolve a few bugs related to the cluster autoscaler. We strongly suggest upgrading any 14.x workload cluster to this release to ensure the cluster autoscaler feature works properly.

    Warning: to avoid downtime in ingress-based workloads, before upgrading to this release it is important to ensure your cluster is running a recent version (1.14.0 or newer) of the Nginx Ingress Controller app. Please get in touch with your Solution Engineer before upgrading if you have any concerns.

    Change details

    azure-operator 5.5.0

    Added

    • Add new handler that creates AzureClusterIdentity CRs and the related Secrets out of Giant Swarm’s credential secrets.
    • Ensure AzureCluster CR has the SubscriptionID field set.
    • Reference Spark CR as bootstrap reference from the MachinePool CR.
    • Ensure node pools min size is applied immediately when changed.

    Fixed

    • Avoid blocking the whole AzureConfig handler on cluster creation because we can’t update the StorageClasses.
    • Avoid overriding the NP size when the scaling is changed by autoscaler.

    kubernetes 1.19.8

    API Change

    • Kubernetes is now built using go1.15.8 (#99093, @cpanato) [SIG Cloud Provider, Instrumentation, Release and Testing]

    Bug or Regression

    • Aggregate errors when putting vmss (#98350, @nilo19) [SIG Cloud Provider]
    • Avoid marking node as Ready until node has synced with API servers at least once (#97996, @ehashman) [SIG Node]
    • Cleanup subnet in frontend IP configs to prevent huge subnet request bodies in some scenarios. (#98288, @nilo19) [SIG Cloud Provider]
    • Fix CSI-migrated inline EBS volumes failing to mount if their volumeID is prefixed by aws:// (#96821, @wongma7) [SIG Storage]
    • Fix azure file migration issue (#97877, @andyzhangx) [SIG Auth, Cloud Provider and Storage]
    • Fix the description of command line flags that can override --config (#98873, @changshuchao) [SIG Scheduling]
    • Fix to recover CSI volumes from certain dangling attachments (#96617, @yuga711) [SIG Apps and Storage]
    • Fixed a bug that prevented the kubelet from starting on Btrfs. (#98015, @gjkim42) [SIG Node]
    • Fixed a bug where aggregator_unavailable_apiservice metrics were reported for deleted apiservices. (#96421, @dgrisonnet) [SIG API Machinery and Instrumentation]
    • Fixed provisioning of Cinder volumes migrated to CSI when StorageClass with AllowedTopologies was used. (#98311, @jsafrane) [SIG Storage]
    • Fixes a panic in the disruption budget controller for PDB objects with invalid selectors (#98776, @ialidzhikov) [SIG Apps]
    • Kubeadm: get k8s CI version markers from k8s infra bucket (#98836, @hasheddan) [SIG Cluster Lifecycle and Release]
    • Kubelet should ignore cgroup driver check on Windows node. (#98385, @pacoxu) [SIG Node]
    • Performance regression #97685 has been fixed (#98432, @tkashem) [SIG API Machinery]
    • Static pods will be deleted gracefully. (#98103, @gjkim42) [SIG Node]
    • Truncates a message if it hits the NoteLengthLimit when the scheduler records an event for the pod that indicates the pod has failed to schedule. (#98715, @carlory) [SIG Scheduling]
    • Warning about using a deprecated volume plugin is logged only once. (#96751, @jsafrane) [SIG Storage]

    Other (Cleanup or Flake)

    • Kubeadm: change the default image repository for CI images from ‘gcr.io/kubernetes-ci-images’ to ‘gcr.io/k8s-staging-ci-images’ (#97087, @SataQiu) [SIG Cluster Lifecycle]
    • Resolves flakes in the Ingress conformance tests due to conflicts with controllers updating the Ingress object (#98430, @liggitt) [SIG Network and Testing]

    Dependencies

    Added

    Nothing has changed.

    Changed

    Removed

    Nothing has changed.

    app-operator 3.2.0

    Added

    • Include apiVersion, restrictions.compatibleProviders in appcatalogentry CRs.

    Changed

    • Limit the number of AppCatalogEntry per app.
    • Delete legacy finalizers on app CRs.
    • Reconciling appCatalog CRs only if pod is unique.

    Fixed

    • Updating status as cordoned if app CR has cordoned annotation.

    external-dns 2.1.0

    Added

    • Allow the sync policy to be configured. (#60)
    • Supports customisation of the txt-owner-id (whilst still defaulting for default apps). (#60)
    • Supports dry-run mode and warns the user if enabled. (#60)

    Changed

    • Rework the way the txt prefix is generated (whilst still defaulting for default apps). (#60)
    • Rework how the annotation filter value is generated (whilst still defaulting for default app). (#60)

    azure-scheduled-events 0.2.0

    Added

    • Remove the Node from Kubernetes API server right before approving the termination event.

    Fixed

    • Keep looping on events if one loop errors out.
  • This is the first workload cluster release to support spot VMs on Azure. Please refer to Giant Swarm Azure Spot documentation for more information.

    With this release the volumeBindingMode for the following storageClasses will be changed to WaitForFirstConsumer:

    • default
    • managed-premium
    • managed-standard

    We made this change to allow Kubernetes to make better decisions about which availability zone to create new Azure disks in, based on where pods are scheduled.

    This will not affect any existing PersistentVolume, but it will affect future provisioning. If the old behavior is preferable, you will need to create an additional StorageClass. Refer to the official documentation for more details and get in touch with your solution engineer if you have any doubts.
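
As an illustration of the additional StorageClass mentioned above, the following Python sketch builds a manifest that keeps the previous immediate binding behavior and prints it as JSON. The class name, the in-tree kubernetes.io/azure-disk provisioner, and the parameters are assumptions made for this example. The settings that azure-operator uses for its own StorageClasses are not listed in these notes, so compare against an existing class in your cluster before relying on it.

```python
import json


def immediate_storage_class(name, storage_account_type):
    """Build a StorageClass manifest that binds volumes immediately.

    Assumption: the in-tree Azure disk provisioner with managed disks.
    """
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "kubernetes.io/azure-disk",
        "parameters": {
            "kind": "Managed",
            "storageaccounttype": storage_account_type,
        },
        # The pre-5.4.0 behavior: provision the disk as soon as the PVC is created.
        "volumeBindingMode": "Immediate",
    }


# "managed-premium-immediate" is a hypothetical name chosen for this example.
manifest = immediate_storage_class("managed-premium-immediate", "Premium_LRS")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, since kubectl accepts JSON manifests as well as YAML.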

    Change details

    azure-operator 5.4.0

    Changed

    • Changed StorageClasses volumeBindingMode to WaitForFirstConsumer.
    • Simplified the upgrade process by leveraging automated draining of nodes.

    Added

    • Added spot instances support for node pools.

    containerlinux 2605.12.0

    Security fixes

    Bug fixes

    • /etc/iscsi/initiatorname.iscsi is generated by the iscsi-init service (#321)
    • Prevent iscsiadm buffer overflow (#318)

    Changes

    • Revert to building docker and containerd with go1.13 instead of go1.15. This reduces the SIGURG log spam (Issue #315 PR #774)
    • The containerd socket is now available in the default location (/run/containerd/containerd.sock) and also as a symlink in the previous location (/run/docker/libcontainerd/docker-containerd.sock) (#771)
    • With the iscsi update, the service unit has changed from iscsid to iscsi (#791)
    • AWS Pro: include scripts to facilitate setup of EKS workers (#794).
    • Missed from earlier notes: with the previous open-iscsi update to 2.1.2, the service unit name changed from iscsid to iscsi (#682)

    Updates

    chart-operator 2.9.0

    Added

    • Use diff key when logging differences between the current and desired release.

    Fixed

    • Stop updating Helm release if it has failed the previous 5 attempts.
  • This is the first workload cluster release to support Kubernetes 1.19 on Azure.

    Following recently added support by Azure, this release also enables Availability Zones for all workload clusters running in the Germany West Central region.

    Starting from this release, Azure workload clusters include by default a new application named azure-scheduled-events that leverages the Azure scheduled events feature to automatically drain a Kubernetes node when the underlying virtual instance is about to be terminated. This ensures the workload running on the node is handled gracefully when the cluster autoscaler scales down a node pool.

    Please note that with version 1.19 there are a few breaking changes in the Kubernetes APIs. Please refer to the upstream documentation and feel free to get in touch with your solutions engineer if you have any concerns.

    Change details

    kubernetes 1.19.7

    Please check the official changelog for what’s new in kubernetes 1.19.

    app-operator 3.0.0

    Changed

    • Enable mutating and validating webhooks in app-admission-controller for tenant app CRs.

    Added

    • Make resync period configurable for use in integration tests.
    • Pause App CR reconciliation when it has app-operator.giantswarm.io/paused=true annotation.
    • Print difference between the current chart and desired chart.
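
The pause annotation described above can be checked with a few lines of Python. This is a sketch of the check only, assuming the App CR is available as a plain dictionary; it does not reproduce app-operator's code, and the function name is hypothetical.

```python
PAUSED_ANNOTATION = "app-operator.giantswarm.io/paused"


def is_paused(app_cr):
    """Return True when the App CR carries the paused=true annotation."""
    annotations = app_cr.get("metadata", {}).get("annotations") or {}
    return annotations.get(PAUSED_ANNOTATION) == "true"


if __name__ == "__main__":
    app = {"metadata": {"name": "my-app", "annotations": {PAUSED_ANNOTATION: "true"}}}
    print(is_paused(app))
```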

    cluster-operator 0.23.22

    Added

    • Check existence of chart tarball for release CR apps in catalog.

    azure-operator 5.3.0

    Added

    • Enable Azure termination events for Node Pool VMSSes.
    • Enable availability zones on Germany West Central.

    containerlinux 2605.11.0

    Security fixes:

    Bug fixes:

    • networkd: avoid managing MAC addresses for veth devices (kinvolk/init#33)
    • The sysctl net.ipv4.conf.*.rp_filter is set to 0 for the Cilium CNI plugin to work (Flatcar#181)
    • Package downloads in the developer container now use the correct URL again (Flatcar#298)

    Changes:

    • The sysctl default config file is now applied under the prefix 60 which allows for custom sysctl config files to take effect when they start with a prefix of 70, 80, or 90 (baselayout#13)
    • Containerd CRI plugin got enabled by default, only the containerd socket path needs to be specified as kubelet parameter for Kubernetes 1.20 to use containerd instead of Docker (Flatcar#283)
    • For users with a custom update server a machine alias setting in update-engine allows to give human-friendly names to client instances (update-engine#8)
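
The sysctl prefix change above relies on configuration files being applied in lexical order of their file names, with later files overriding earlier ones. The Python sketch below models that ordering. The file names and values are hypothetical and the real default file name is not given in these notes.

```python
def effective_sysctl(files):
    """Merge sysctl settings in lexical file-name order; later files win.

    `files` maps a file name to a dict of sysctl keys and values.
    """
    merged = {}
    for name in sorted(files):
        merged.update(files[name])
    return merged


if __name__ == "__main__":
    files = {
        "60-default.conf": {"net.ipv4.conf.all.rp_filter": "0"},  # hypothetical default file
        "70-custom.conf": {"net.ipv4.conf.all.rp_filter": "1"},   # hypothetical custom file
    }
    print(effective_sysctl(files))
```

A custom file with a prefix lower than 60 would be applied before the default file and so would be overridden by it, which is why the notes recommend 70, 80, or 90.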

    Updates:

    etcd 3.4.14

    Please check the official changelog for the list of changes included in this release.

    cluster-autoscaler 1.19.1

    Updated to the 1.19.1 version that is the suggested upstream version to run for 1.19 Kubernetes clusters.

    cert-exporter 1.6.0

    Added

    • Add exceptions in NetworkPolicies to allow DNS to work correctly through port 53.

    external-dns 1.6.0

    Changed

    • Upgrade upstream external-dns from v0.7.4 to v0.7.6.
  • This is a bug fix release aimed at solving a bug that was affecting CIDR selection during workload clusters creation.

    There is no need to upgrade existing workload clusters to this release.

    Change details

    azure-operator 5.2.1

    Fixed

    • Ensure the management cluster’s network space is never used for workload clusters.
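
The kind of check this fix implies can be sketched with Python's standard ipaddress module: pick the first subnet of a pool that does not overlap any reserved range, such as the management cluster's network. The function name, the pool, and the example ranges are hypothetical, and this is not azure-operator's allocation logic.

```python
import ipaddress


def first_free_subnet(pool, prefix_len, reserved):
    """Return the first subnet of `pool` that overlaps none of `reserved`.

    Returns None when every candidate subnet collides with a reserved range.
    """
    pool_net = ipaddress.ip_network(pool)
    reserved_nets = [ipaddress.ip_network(r) for r in reserved]
    for candidate in pool_net.subnets(new_prefix=prefix_len):
        if not any(candidate.overlaps(r) for r in reserved_nets):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical ranges: the first /16 belongs to the management cluster.
    print(first_free_subnet("10.0.0.0/8", 16, ["10.0.0.0/16"]))
```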