Workload cluster releases for AWS

  • This release fixes an issue that can cause an IP conflict to occur in certain situations when a node pool is created.

    Change details

    aws-operator 9.3.1-ipam

    • Fix IPAM conflicts when creating a node pool
  • This release provides a bug fix for the external-dns-app.

    Warning: The nginx app needs to be updated to v1.14.0+ because a new version of external-dns is included in this release.

    Change details

    external-dns 2.3.0

    Changed

    • Change default annotation filter to match the one we use for the nginx ingress controller.

    Added

    • Add sidecar container for provider: aws to periodically validate IAM credential accessibility (#76)
  • This release provides security and bug fixes for various components.

    Warning: The nginx app needs to be updated to v1.14.0+ because a new version of external-dns is included in this release.

    Change details

    cluster-operator 3.6.0

    Fixed

    • Fix cluster status computation to correctly display rollbacks, version changes and multiple updates.

    Added

    • Add unit tests for cluster status computation
    • Check existence of chart tarball for release CR apps in catalog.
    • Add vertical pod autoscaler support.
    • Add appversionlabel resource to update version labels for optional app CRs.

    app-operator 3.2.1

    Added

    • Include apiVersion, restrictions.compatibleProviders in appcatalogentry CRs.

    Changed

    • Limit the number of AppCatalogEntry per app.
    • Delete legacy finalizers on app CRs.
    • Reconciling appCatalog CRs only if pod is unique.

    Fixed

    • Updating status as cordoned if app CR has cordoned annotation.

    kubernetes 1.19.9

    API Change

    • Kubernetes is now built using go1.15.8 (#99093, @cpanato) [SIG Cloud Provider, Instrumentation, Release and Testing]

    Feature

    • Add a new flag to set priority for the kubelet on Windows nodes so that workloads cannot overwhelm the node, thereby disrupting the kubelet process. (#96157, @ravisantoshgudimetla) [SIG Node and Windows]

    Failing Test

    • Fix handling of special characters in the volume path on Windows (#99137, @yujuhong) [SIG Storage]
    • Resolves an issue running Ingress conformance tests on clusters which use finalizers on Ingress objects to manage releasing load balancer resources (#96742, @spencerhance) [SIG Network and Testing]

    Bug or Regression

    • Count pod overhead against an entity’s ResourceQuota (#99600, @gjkim42) [SIG API Machinery and Node]
    • EndpointSlice controller is now less likely to emit FailedToUpdateEndpointSlices events. (#100114, @robscott) [SIG Apps and Network]
    • EndpointSliceMirroring controller is now less likely to emit FailedToUpdateEndpointSlices events. (#100144, @robscott) [SIG Apps and Network]
    • Fixed bug that caused cAdvisor to incorrectly detect single-socket multi-NUMA topology. (#99209, @iwankgb) [SIG Node]
    • Fixes kubelet to retrieve number of sockets from cAdvisor MachineInfo, instead of assuming it to be equal to number of NUMA nodes. (#99771, @iwankgb) [SIG Node]
    • Fixing a bug where a failed node may not have the NoExecute taint set correctly (#98140, @CKchen0726) [SIG Apps and Node]
    • Kubelet now cleans up orphaned volume directories automatically (#95301, @lorenz) [SIG Node and Storage]
    • Resolves spurious Failed to list *v1.Secret or Failed to list *v1.ConfigMap messages in kubelet logs. (#99538, @liggitt) [SIG Auth and Node]
    • Using NUMA nodes instead of sockets for CPU manager hints. (#99276, @iwankgb) [SIG Node]
    • We will no longer automatically delete all data when a failure is detected during creation of the volume data file on a CSI volume. Now we will only remove the data file and volume path. (#96021, @huffmanca) [SIG Storage]
    • Aggregate errors when putting vmss (#98350, @nilo19) [SIG Cloud Provider]
    • Avoid marking node as Ready until node has synced with API servers at least once (#97996, @ehashman) [SIG Node]
    • Cleanup subnet in frontend IP configs to prevent huge subnet request bodies in some scenarios. (#98288, @nilo19) [SIG Cloud Provider]
    • Fix CSI-migrated inline EBS volumes failing to mount if their volumeID is prefixed by aws:// (#96821, @wongma7) [SIG Storage]
    • Fix azure file migration issue (#97877, @andyzhangx) [SIG Auth, Cloud Provider and Storage]
    • Fix the description of command line flags that can override --config (#98873, @changshuchao) [SIG Scheduling]
    • Fix to recover CSI volumes from certain dangling attachments (#96617, @yuga711) [SIG Apps and Storage]
    • Fixed a bug that prevented the kubelet from starting on Btrfs. (#98015, @gjkim42) [SIG Node]
    • Fixed a bug where aggregator_unavailable_apiservice metrics were reported for deleted apiservices. (#96421, @dgrisonnet) [SIG API Machinery and Instrumentation]
    • Fixed provisioning of Cinder volumes migrated to CSI when StorageClass with AllowedTopologies was used. (#98311, @jsafrane) [SIG Storage]
    • Fixes a panic in the disruption budget controller for PDB objects with invalid selectors (#98776, @ialidzhikov) [SIG Apps]
    • Kubeadm: get k8s CI version markers from k8s infra bucket (#98836, @hasheddan) [SIG Cluster Lifecycle and Release]
    • Kubelet should ignore cgroup driver check on Windows node. (#98385, @pacoxu) [SIG Node]
    • Performance regression #97685 has been fixed (#98432, @tkashem) [SIG API Machinery]
    • Static pods will be deleted gracefully. (#98103, @gjkim42) [SIG Node]
    • Truncates a message if it hits the NoteLengthLimit when the scheduler records an event for the pod that indicates the pod has failed to schedule. (#98715, @carlory) [SIG Scheduling]
    • Warning about using a deprecated volume plugin is logged only once. (#96751, @jsafrane) [SIG Storage]
    • AcceleratorStats will be available in the Summary API of kubelet when cri_stats_provider is used. (#97017, @ruiwen-zhao) [SIG Node]
    • Exposes and sets a default timeout for the SubjectAccessReview client for DelegatingAuthorizationOptions (#95910, @p0lyn0mial) [SIG API Machinery and Cloud Provider]
    • Fix the panic when kubelet registers if a node object already exists with no Status.Capacity or Status.Allocatable (#96297, @SataQiu) [SIG Node]
    • Fixed FibreChannel volume plugin corrupting filesystems on detach of multipath volumes. (#97013, @jsafrane) [SIG Storage]
    • Fixed a bug in kubelet that saturated CPU utilization after containerd was restarted. (#97176, @hanlins) [SIG Node]
    • Remove ready file and its directory (which is created during volume SetUp) during emptyDir volume TearDown. (#95770, @jingxu97) [SIG Storage]
    • Volumebinding: report UnschedulableAndUnresolvable status instead of an error when PVC not found (#96850, @cofyc) [SIG Scheduling and Storage]
    • Bump node-problem-detector version to v0.8.5 to fix OOM detection with Linux kernels 5.1+ (#96716, @tosi3k) [SIG Cloud Provider, Scalability and Testing]
    • Fix a bug where the DefaultPreemption plugin is disabled when using a (legacy) scheduler policy. (#96472, @Huang-Wei) [SIG Scheduling and Testing]
    • Fix bug in JSON path parser where an error occurs when a range is empty (#95933, @brianpursley) [SIG API Machinery]
    • Fix memory leak in kube-apiserver when the underlying time goes back and forth. (#96266, @chenyw1990) [SIG API Machinery]
    • Fix pull image error from multiple ACRs using azure managed identity (#96355, @andyzhangx) [SIG Cloud Provider]
    • Fix: resize Azure disk issue when it’s in attached state (#96705, @andyzhangx) [SIG Cloud Provider]
    • Fixed a bug that prevented kubectl from validating CRDs with schema using x-kubernetes-preserve-unknown-fields on object fields. Fix kubectl SchemaError on CRDs with schema using x-kubernetes-preserve-unknown-fields on array types. (#96562, @gautierdelorme) [SIG API Machinery and Testing]
    • Fixes an issue with the max-in-flight API server filter where the filter could take a long time to process an incoming request if it had been a long time since the last request. (#96282, @staebler) [SIG API Machinery]
    • HTTP/2 connection health check is enabled by default in all Kubernetes clients. The feature should work out-of-the-box. If needed, users can tune the feature via the HTTP2_READ_IDLE_TIMEOUT_SECONDS and HTTP2_PING_TIMEOUT_SECONDS environment variables. The feature is disabled if HTTP2_READ_IDLE_TIMEOUT_SECONDS is set to 0. (#96778, @caesarxuchao) [SIG API Machinery, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation and Node]
    • Kubeadm: Fixes a kubeadm upgrade bug that could cause a custom CoreDNS configuration to be replaced with the default. (#97016, @rajansandeep) [SIG Cluster Lifecycle]
    • Kubeadm: fix CoreDNS migration so that it is triggered when there are new default configs during kubeadm upgrade (#96970, @pacoxu) [SIG Cluster Lifecycle]
    • Metric names for CSI and flexvolume drivers will include the driver name as well as the CSI plugin name. (#96477, @mattcary) [SIG Instrumentation and Storage]
    • New Azure instance types now have correct max data disk count information. (#94340, @ialidzhikov) [SIG Cloud Provider and Storage]
    • Resolves a regression in 1.19+ with workloads targeting deprecated beta os/arch labels getting stuck in NodeAffinity status on node startup. (#96810, @liggitt) [SIG Node]
    • Volume binding: report UnschedulableAndUnresolvable status instead of an error when bound PVs not found (#96291, @cofyc) [SIG Apps, Scheduling and Storage]
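    The HTTP/2 health check item in the list above names two environment variables and states that a read idle timeout of 0 disables the feature. A minimal Python sketch of that rule follows. The default values of 30 and 15 seconds, the helper names, and the fallback to the default on invalid input are assumptions made for illustration; they are not taken from these release notes.

```python
import os

# Assumed defaults, not stated in the release notes; check your client version.
DEFAULT_READ_IDLE_TIMEOUT_SECONDS = 30
DEFAULT_PING_TIMEOUT_SECONDS = 15


def _int_from_env(env, name, default):
    """Return env[name] as a non-negative int, or the default if unset or invalid.

    Falling back to the default on invalid input is a choice of this sketch.
    """
    raw = env.get(name, "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value >= 0 else default


def http2_health_check_settings(env=None):
    """Interpret the two tuning variables as the release note describes them."""
    env = os.environ if env is None else env
    read_idle = _int_from_env(
        env, "HTTP2_READ_IDLE_TIMEOUT_SECONDS", DEFAULT_READ_IDLE_TIMEOUT_SECONDS
    )
    ping = _int_from_env(
        env, "HTTP2_PING_TIMEOUT_SECONDS", DEFAULT_PING_TIMEOUT_SECONDS
    )
    return {
        # A read idle timeout of 0 disables the health check.
        "enabled": read_idle != 0,
        "read_idle_timeout_seconds": read_idle,
        "ping_timeout_seconds": ping,
    }
```

    Calling http2_health_check_settings() without arguments reads the current process environment, which can help when checking what a client pod would pick up.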

    Other (Cleanup or Flake)

    • Kubeadm: change the default image repository for CI images from ‘gcr.io/kubernetes-ci-images’ to ‘gcr.io/k8s-staging-ci-images’ (#97087, @SataQiu) [SIG Cluster Lifecycle]
    • Resolves flakes in the Ingress conformance tests due to conflicts with controllers updating the Ingress object (#98430, @liggitt) [SIG Network and Testing]
    • Fix Azure file share not deleted issue when the namespace is deleted (#97417, @andyzhangx) [SIG Cloud Provider and Storage]
    • Fix counting error in service/nodeport/loadbalancer quota check (#97828, @pacoxu) [SIG API Machinery and Network]
    • Fix missing cadvisor machine metrics. (#97006, @lingsamuel) [SIG Node]
    • Fix: azure file latency issue for metadata-heavy workloads (#97082, @andyzhangx) [SIG Cloud Provider and Storage]
    • Fixed bug in CPUManager with race on container map access (#97427, @klueska) [SIG Node]
    • GCE Internal LoadBalancer sync loop will now release the ILB IP address upon sync failure. An error in ILB forwarding rule creation will no longer leak IP addresses. (#97740, @prameshj) [SIG Cloud Provider and Network]
    • Kubeadm: avoid detection of the container runtime for commands that do not need it (#97848, @pacoxu) [SIG Cluster Lifecycle]
    • Client-go header logging (at verbosity levels >= 9) now masks Authorization header contents (#95316, @sfowl) [SIG API Machinery]

    Dependencies

    Added

    Nothing has changed.

    Changed

    • github.com/google/cadvisor: v0.37.0 → v0.37.5
    • sigs.k8s.io/apiserver-network-proxy/konnectivity-client: v0.0.9 → v0.0.15
    • golang.org/x/net: ab34263 → 69a7880
    • golang.org/x/sys: ed371f2 → 5cba982

    Removed

    Nothing has changed.

    cert-manager 2.4.3

    Changed

    • Set docker.io as the default registry
    • Made CRD install Job backoffLimit configurable (and increased the default value). (#129)

    Added

    • Enabled configuration of certificate Secret deletion when the parent Certificate is deleted. (#127)

    external-dns 2.2.2

    Changed

    • Set docker.io as the default registry

    Fixed

    • Adds additional options required for vmware installations. (#74)

    Added

    • Add crd source if the provider is vmware. (#72)

    cert-exporter 1.6.1

    Changed

    • Set docker.io as the default registry

    Added

    • Add exceptions in NetworkPolicies to allow DNS to work correctly through port 53.

    chart-operator 2.13.0

    Changed

    • giantswarm-critical PriorityClass only managed when E2E.
    • Set docker.io as the default registry
    • Pass RESTMapper to helmclient to reduce the number of REST API calls.
    • Updated Helm to v3.5.3.
    • Deploy giantswarm-critical PriorityClass when it’s not found.

    Added

    • Updating namespace metadata using namespaceConfig in Chart CRs.
    • Pause Chart CR reconciliation when it has chart-operator.giantswarm.io/paused=true annotation.
    • Use diff key when logging differences between the current and desired release.
    • Add support for skip CRD flag when installing Helm releases.
    • Added last reconciled timestamp as metrics.
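    The pause annotation listed above can be checked on a Chart CR manifest before expecting chart-operator to act on it. The Python sketch below works on a manifest that is already loaded as a dictionary. The function name and the example CR are hypothetical; only the annotation key and value come from the changelog entry.

```python
PAUSED_ANNOTATION = "chart-operator.giantswarm.io/paused"


def is_reconciliation_paused(chart_cr):
    """Return True if the Chart CR carries the paused annotation set to "true"."""
    metadata = chart_cr.get("metadata") or {}
    annotations = metadata.get("annotations") or {}
    return annotations.get(PAUSED_ANNOTATION) == "true"


# Hypothetical Chart CR, reduced to the fields this check reads.
example_chart = {
    "kind": "Chart",
    "metadata": {
        "name": "my-app",
        "annotations": {PAUSED_ANNOTATION: "true"},
    },
}
```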

    Fixed

    • Stop updating Helm release if it has failed the previous 5 attempts.
    • Only create VPA if autoscaling API group is present.

    kiam 1.7.1

    Changed

    • Set docker.io as the default registry

    metrics-server 1.2.2

    Changed

    • Set docker.io as the default registry

    node-exporter 1.7.2

    Changed

    • Set docker.io as the default registry

    coredns 1.4.1

    Changed

    • Set docker.io as the default registry
    • Update coredns to upstream version 1.8.0.
    • Update coredns to upstream version 1.7.1 (including changes introduced in version 1.7.0).
    • Update coredns to upstream version 1.6.9.

    Added

    • Added monitoring annotations and common labels.

    kube-state-metrics 1.3.1

    Changed

    • Set docker.io as the default registry

    cluster-autoscaler 1.19.2

    Not found

    net-exporter 1.9.3

    Changed

    • Set docker.io as the default registry
    • Update kubectl image to v1.18.8.
  • This release provides security and bug fixes for various components.

    Change details

    aws-operator 9.3.9

    Fixed

    • Added CNI CIDR to internal ELB Security Group
    • Added new Flatcar AMI identifiers
    • Added China Flatcar AMI identifiers

    cluster-operator 3.6.0

    Fixed

    • Fix cluster status computation to correctly display rollbacks, version changes and multiple updates.

    Added

    • Add vertical pod autoscaler support
    • Add appversionlabel resource to update version labels for optional app CRs
    • Check existence of chart tarball for release CR apps in catalog
    • Add unit tests for cluster status computation

    app-operator 3.2.0

    Added

    • Add printer columns for Version, Last Deployed and Status to chart CRD in tenant clusters.
    • Use validation logic from the app library.
    • Include restrictions data from app metadata files in appcatalogentry CRs.
    • Include apiVersion, restrictions.compatibleProviders in appcatalogentry CRs.

    Changed

    • Using values service from the app library.
    • Updated Helm to v3.4.2.
    • Enable mutating and validating webhooks in app-admission-controller for tenant app CRs.
    • Limit the number of AppCatalogEntry per app.
    • Delete legacy finalizers on app CRs.
    • Reconciling appCatalog CRs only if pod is unique.

    Fixed

    • Reuse clients in clients resource when app CR uses inCluster.
    • Updating status as cordoned if app CR has cordoned annotation.

    aws-cni 1.7.8

    containerlinux 2605.12.0

    Security fixes

    Bug fixes

    • Enabled missing systemd services (#191, PR #612)
    • Fixed Docker torcx image unpacking error on machines with less than ~600 MB total RAM (#32)
    • Solved adcli Kerberos Active Directory incompatibility (#194)
    • Fixed the makefile path when building kernel modules with the developer container (#195)
    • Removed the /etc/portage/savedconfig/ folder that contained a dump of the firmware config (flatcar-linux/coreos-overlay#613)
    • Ensured that the /etc/coreos to /etc/flatcar symlink always exists, relevant for the Container Linux Config transpiler (ct) when specifying directives for update: or locksmith: while also reformatting the rootfs (baselayout PR#7)
    • network: Restore KeepConfiguration=dhcp-on-stop (kinvolk/init#30)
    • Added systemd-tmpfiles directives for /opt and /opt/bin to ensure that the folders have correct permissions even when /opt/ was once created by containerd (Flatcar#279)
    • Make the automatic filesystem resizing more robust against a race and add more logging (kinvolk/init#31)
    • Allow inactive network interfaces to be bound to a bonding interface, by encoding additional configuration for systemd-networkd-wait-online (afterburn PR #10)
    • Do not configure ccache in Jenkins (scripts PR #100)
    • Azure: Exclude bonded SR-IOV network interfaces with newer drivers from networkd (in addition to the old drivers) to prevent them being configured instead of just the bond interface (init PR#29, bootengine PR#19)
    • The sysctl net.ipv4.conf.*.rp_filter is set to 0 for the Cilium CNI plugin to work (Flatcar#181)
    • Package downloads in the developer container now use the correct URL again (Flatcar#298)
    • networkd: avoid managing MAC addresses for veth devices (kinvolk/init#33)
    • /etc/iscsi/initiatorname.iscsi is generated by the iscsi-init service (#321)
    • Prevent iscsiadm buffer overflow (#318)

    Changes

    • GCE: Improved oslogin support and added shell aliases to run a Python Docker image (PR #592)
    • Update-engine now detects rollbacks and reports them as errors to the update server (PR#6)
    • The zstd tools were added (version 1.4.4)
    • The kernel config CONFIG_PSI was set to support Pressure Stall Information; see https://facebookmicrosites.github.io/psi/docs/overview for more information (Flatcar#162)
    • The kernel config CONFIG_BPF_JIT_ALWAYS_ON was set to use the BPF just-in-time compiler by default for faster execution
    • The kernel config CONFIG_POWER_SUPPLY was set
    • The kernel configs CONFIG_OVERLAY_FS_METACOPY and CONFIG_OVERLAY_FS_REDIRECT_DIR were set. With the first, overlayfs only copies up metadata when a metadata-specific operation like chown/chmod is performed; the full file is copied up later when the file is opened for write operations. With the second, which is equivalent to setting “redirect_dir=on” in the kernel command-line, overlayfs will copy up the directory first before the actual content (Flatcar#170).
    • Remove unnecessary kernel module nf-conntrack-ipv4 (overlay PR#649)
    • Compress kernel modules with xz (overlay PR#628)
    • Add containerd-runc-shim-v* binaries required by kubelet custom CRI endpoints (overlay PR#623)
    • Equinix Metal (Packet): Exclude unused network interfaces from networkd, disregard the state of the bonded interfaces for the network-online.target and only require the bond interface itself to have at least one active link instead of routable which requires both links to be active (afterburn PR#10)
    • QEMU: Use flatcar.autologin kernel command line parameter for auto login on the console (Flatcar #71)
    • The sysctl default config file is now applied under the prefix 60 which allows for custom sysctl config files to take effect when they start with a prefix of 70, 80, or 90 (baselayout#13)
    • The containerd CRI plugin is now enabled by default. Only the containerd socket path needs to be specified as a kubelet parameter for Kubernetes 1.20 to use containerd instead of Docker (Flatcar#283)
    • For users with a custom update server, a machine alias setting in update-engine allows giving human-friendly names to client instances (update-engine#8)
    • Revert to building docker and containerd with go1.13 instead of go1.15. This reduces the SIGURG log spam (Issue #315 PR #774)
    • The containerd socket is now available in the default location (/run/containerd/containerd.sock) and also as a symlink in the previous location (/run/docker/libcontainerd/docker-containerd.sock) (#771)
    • With the iscsi update, the service unit has changed from iscsid to iscsi (#791)
    • AWS Pro: include scripts to facilitate setup of EKS workers (#794).
    • Missed from earlier notes: with the previous open-iscsi update to 2.1.2, the service unit name changed from iscsid to iscsi (#682)
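    The sysctl prefix change in the list above relies on configuration files being applied in lexical order of their names, with later files overriding earlier ones. The Python sketch below illustrates why a custom file with prefix 70, 80, or 90 takes effect over the default at prefix 60. The file names and the vm.swappiness values are hypothetical; the rp_filter value of 0 is the one mentioned in the bug fixes above.

```python
def effective_sysctl(files):
    """Merge sysctl settings from several files.

    Files are applied in lexical order of their names, so a setting in a
    later file replaces the same setting from an earlier file.
    """
    merged = {}
    for name in sorted(files):
        merged.update(files[name])
    return merged


# Hypothetical file names and contents, for illustration only.
files = {
    "60-default.conf": {"net.ipv4.conf.all.rp_filter": "0", "vm.swappiness": "60"},
    "90-custom.conf": {"vm.swappiness": "10"},
}
```

    With these inputs the custom value for vm.swappiness wins, while a custom file named with a prefix below 60 would be overridden by the default file.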

    Updates

    etcd 3.4.14

    See code changes and v3.4 upgrade guide for any breaking changes.

    Package clientv3

    etcd server

    • Fix server panic when force-new-cluster flag is enabled in a cluster which had a learner node.

    Package netutil

    tools/etcd-dump-metrics

    Go

    kubernetes 1.18.17

    Feature

    • Add a new flag to set priority for the kubelet on Windows nodes so that workloads cannot overwhelm the node, thereby disrupting the kubelet process. (#96158, @ravisantoshgudimetla) [SIG Node]

    Failing Test

    • Fix handling of special characters in the volume path on Windows (#99138, @yujuhong) [SIG Storage]

    Bug or Regression

    • Count pod overhead against an entity’s ResourceQuota (#99600, @gjkim42) [SIG API Machinery and Node]
    • EndpointSlice controller is now less likely to emit FailedToUpdateEndpointSlices events. (#100146, @robscott) [SIG Apps and Network]
    • Fixing a bug where a failed node may not have the NoExecute taint set correctly (#98943, @CKchen0726) [SIG Apps and Node]
    • Kubelet now cleans up orphaned volume directories automatically (#95301, @lorenz) [SIG Node and Storage]
    • Resolves spurious Failed to list *v1.Secret or Failed to list *v1.ConfigMap messages in kubelet logs. (#99538, @liggitt) [SIG Auth and Node]
    • We will no longer automatically delete all data when a failure is detected during creation of the volume data file on a CSI volume. Now we will only remove the data file and volume path. (#96021, @huffmanca) [SIG Storage]
    • Avoid marking node as Ready until node has synced with API servers at least once (#99034, @ehashman) [SIG Node]
    • Cleanup subnet in frontend IP configs to prevent huge subnet request bodies in some scenarios. (#98290, @nilo19) [SIG Cloud Provider]
    • Fix CSI-migrated inline EBS volumes failing to mount if their volumeID is prefixed by aws:// (#96821, @wongma7) [SIG Storage]
    • Fix azure file migration issue (#97877, @andyzhangx) [SIG Auth, Cloud Provider and Storage]
    • Fix to recover CSI volumes from certain dangling attachments (#96617, @yuga711) [SIG Apps and Storage]
    • Fixed a bug where aggregator_unavailable_apiservice metrics were reported for deleted apiservices. (#96421, @dgrisonnet) [SIG API Machinery and Instrumentation]
    • Fixed provisioning of Cinder volumes migrated to CSI when StorageClass with AllowedTopologies was used. (#98311, @jsafrane) [SIG Storage]
    • Fixes a panic in the disruption budget controller for PDB objects with invalid selectors (#98777, @ialidzhikov) [SIG Apps]
    • Kubeadm: get k8s CI version markers from k8s infra bucket (#98836, @hasheddan) [SIG Cluster Lifecycle and Release]
    • Kubelet should ignore cgroup driver check on Windows node. (#98384, @pacoxu) [SIG Node]
    • TerminationGracePeriodSeconds from pod spec is respected for the mirror pod. Static pods will be deleted gracefully. (#99035, @ehashman) [SIG Node and Testing]
    • Truncates a message if it hits the NoteLengthLimit when the scheduler records an event for the pod that indicates the pod has failed to schedule. (#98715, @carlory) [SIG Scheduling]
    • Warning about using a deprecated volume plugin is logged only once. (#96751, @jsafrane) [SIG Storage]
    • Fix Azure file share not deleted issue when the namespace is deleted (#97417, @andyzhangx) [SIG Cloud Provider and Storage]
    • Fix counting error in service/nodeport/loadbalancer quota check (#97829, @pacoxu) [SIG API Machinery and Network]
    • Fix: azure file latency issue for metadata-heavy workloads (#97082, @andyzhangx) [SIG Cloud Provider and Storage]
    • Fixed bug in CPUManager with race on container map access (#97427, @klueska) [SIG Node]
    • GCE Internal LoadBalancer sync loop will now release the ILB IP address upon sync failure. An error in ILB forwarding rule creation will no longer leak IP addresses. (#97740, @prameshj) [SIG Cloud Provider and Network]
    • Kubeadm: avoid detection of the container runtime for commands that do not need it (#97849, @pacoxu) [SIG Cluster Lifecycle]
    • Cordoned nodes are now deregistered from AWS target groups. (#85920, @hoelzro) [SIG Cloud Provider]
    • Fixed FibreChannel volume plugin corrupting filesystems on detach of multipath volumes. (#97013, @jsafrane) [SIG Storage]
    • Remove ready file and its directory (which is created during volume SetUp) during emptyDir volume TearDown. (#95770, @jingxu97) [SIG Storage]
    • Avoid GCE API calls when initializing GCE CloudProvider for Kubelets. Avoid unnecessary GCE API calls when adding IP aliases or reflecting them in Node object in GCE cloud provider. (#96863, @tosi3k) [SIG Apps, Cloud Provider and Network]
    • Bump node-problem-detector version to v0.8.5 to fix OOM detection with Linux kernels 5.1+ (#96716, @tosi3k) [SIG Cloud Provider, Scalability and Testing]
    • Exposes and sets a default timeout for the SubjectAccessReview client for DelegatingAuthorizationOptions (#96152, @p0lyn0mial) [SIG API Machinery and Cloud Provider]
    • Fix memory leak in kube-apiserver when the underlying time goes back and forth. (#96266, @chenyw1990) [SIG API Machinery]
    • Fix pull image error from multiple ACRs using azure managed identity (#96355, @andyzhangx) [SIG Cloud Provider]
    • Fix: resize Azure disk issue when it’s in attached state (#96705, @andyzhangx) [SIG Cloud Provider]
    • Fixed a bug that prevented kubectl from validating CRDs with schema using x-kubernetes-preserve-unknown-fields on object fields. Fix kubectl SchemaError on CRDs with schema using x-kubernetes-preserve-unknown-fields on array types. (#96563, @gautierdelorme) [SIG API Machinery and Testing]
    • Fixed kubelet creating extra sandbox for pods with RestartPolicyOnFailure after all containers succeeded (#92614, @tnqn) [SIG Node and Testing]
    • Metric names for CSI and flexvolume drivers will include the driver name as well as the CSI plugin name. (#96474, @mattcary) [SIG Instrumentation and Storage]
    • New Azure instance types now have correct max data disk count information. (#94340, @ialidzhikov) [SIG Cloud Provider and Storage]
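    The ResourceQuota item in the list above means that a pod's overhead is now charged to the quota in addition to its container requests. The arithmetic can be sketched in Python as follows. This is a simplified illustration: it works in CPU millicores, ignores init containers, and uses a hypothetical function name.

```python
def pod_cpu_quota_usage_millicores(container_requests, overhead=0):
    """CPU charged against a ResourceQuota for one pod, in millicores.

    Simplified sketch: sums the app container requests and adds the pod
    overhead. Init containers are not considered here.
    """
    return sum(container_requests) + overhead
```

    For example, two containers requesting 250m each in a pod with 100m of overhead count as 600m against the quota, rather than 500m.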

    Other (Cleanup or Flake)

    • Kubeadm: change the default image repository for CI images from ‘gcr.io/kubernetes-ci-images’ to ‘gcr.io/k8s-staging-ci-images’ (#97087, @SataQiu) [SIG Cluster Lifecycle]
    • Client-go header logging (at verbosity levels >= 9) now masks Authorization header contents (#95316, @sfowl) [SIG API Machinery]

    Dependencies

    Added

    Nothing has changed.

    Changed

    Nothing has changed.

    Removed

    Nothing has changed.

    cert-manager 2.4.3

    Changed

    • Set docker.io as the default registry
    • Made CRD install Job backoffLimit configurable (and increased the default value). (#129)
    • Made backoffLimit for clusterissuer job configurable. (#125)
    • Updated clusterissuer subchart API groups to cert-manager.io/v1. (#124)
    • Update to upstream v1.1.0. (#119)

    Added

    • Enabled configuration of certificate Secret deletion when the parent Certificate is deleted. (#127)

    external-dns 2.2.2

    Changed

    • Set docker.io as the default registry
    • Rework the way the txt prefix is generated (whilst still defaulting for default apps). (#60)
    • Rework how the annotation filter value is generated (whilst still defaulting for default app). (#60)
    • Only template Secret if both required values are present in values.yaml. (#53)
    • Reworked the App to prepare it for customer use. (#49)
      • General:
        • Pushes the app to the giantswarm app catalog.
        • Uses Helm release namespace.
        • Uses the release name for resource naming to avoid conflicts.
        • Added a values schema to catch incorrect values.
        • Generally makes the chart easier to use (fully documented values file).
      • external-dns options:
        • Allows customisation of the txt registry prefix.
        • Allows configuration of synchronisation interval.
        • Filter resources to reconcile via annotations.
      • AWS-specific:
        • Allows the user to provide an IAM role to use.
        • Allows the user to provide the list of domains for external-dns to manage.
        • Allows configuration of batch size.
        • Allows configuration of CNAME instead of ALIAS records.
        • Allows configuration of the AWS zone type to update.
    • Upgrade upstream external-dns from v0.7.4 to v0.7.6.

    Fixed

    • Adds additional options required for vmware installations. (#74)
    • Ensure CNAMEs are always used when AWS access is external. (#62)
    • Revert location of AWS API credentials in values.yaml. (#57)

    Added

    • Add crd source if the provider is vmware. (#72)
    • Allow the sync policy to be configured. (#60)
    • Supports customisation of the txt-owner-id (whilst still defaulting for default apps). (#60)
    • Supports dry-run mode and warns the user if enabled. (#60)

    cert-exporter 1.6.1

    Changed

    • Set docker.io as the default registry
    • Check ca.crt expiries in TLS secrets. (#109)

    Added

    • Add exceptions in NetworkPolicies to allow DNS to work correctly through port 53.
    • Add new metric (cert_exporter_secret_not_after) which tracks expiry of TLS certificates stored in Kubernetes secrets. (#92)

    chart-operator 2.12.0

    Changed

    • Set docker.io as the default registry
    • Pass RESTMapper to helmclient to reduce the number of REST API calls.
    • Updated Helm to v3.5.3.
    • Deploy giantswarm-critical PriorityClass when it’s not found.

    Added

    • Updating namespace metadata using namespaceConfig in Chart CRs.
    • Pause Chart CR reconciliation when it has chart-operator.giantswarm.io/paused=true annotation.
    • Use diff key when logging differences between the current and desired release.
    • Add support for skip CRD flag when installing Helm releases.
    • Added last reconciled timestamp as metrics.
    • Print difference between current release and desired release.
    • Add Vertical Pod Autoscaler support.

    Fixed

    • Stop updating Helm release if it has failed the previous 5 attempts.
    • Only create VPA if autoscaling API group is present.

    kiam 1.7.1

    Changed

    • Set docker.io as the default registry

    Added

    • Add taint tolerations for kiam agent and kiam server.

    metrics-server 1.2.2

    Changed

    • Set docker.io as the default registry
    • Push app to control plane catalogs
    • Updated metrics-server version to 0.4.1

    node-exporter 1.7.2

    Changed

    • Set docker.io as the default registry
    • Use the domain registry from installation values if it is present.
  • This release provides support for Kubernetes 1.19 on AWS.

    Please note that with version 1.19 there are a few breaking changes in the Kubernetes APIs. Please refer to the upstream documentation and feel free to get in touch with your solutions engineer with any concerns you might have.

    Warning: The nginx app needs to be updated to v1.14.0+ because a new version of external-dns is included in this release.

    Change details

    kubernetes 1.19.4

    Bug or Regression

    • An issue preventing the volume expand controller from annotating the PVC with volume.kubernetes.io/storage-resizer when the PVC StorageClass is already updated to the out-of-tree provisioner is now fixed. (#94489, @ialidzhikov) [SIG API Machinery, Apps and Storage]
    • Cloud node controller: handle empty providerID from getProviderID (#95452, @nicolehanjing) [SIG Cloud Provider]
    • Disable watchcache for events (#96052, @wojtek-t) [SIG API Machinery]
    • Disabled LocalStorageCapacityIsolation feature gate is honored during scheduling. (#96140, @Huang-Wei) [SIG Scheduling]
    • Fix a bug that Pods with topologySpreadConstraints get scheduled to nodes without required labels. (#95880, @ialidzhikov) [SIG Scheduling]
    • Fix azure disk attach failure for disk size bigger than 4TB (#95463, @andyzhangx) [SIG Cloud Provider]
    • Fix azure disk data loss issue on Windows when unmount disk (#95456, @andyzhangx) [SIG Cloud Provider and Storage]
    • Fixed a bug causing incorrect formatting of kubectl describe ingress. (#94985, @howardjohn) [SIG CLI and Network]
    • Fixed a bug in client-go where new clients with customized Dial, Proxy, GetCert config may get stale HTTP transports. (#95427, @roycaihw) [SIG API Machinery]
    • Fixed a regression which prevented pods with docker/default seccomp annotations from being created in 1.19 if a PodSecurityPolicy was in place which did not allow runtime/default seccomp profiles. (#95990, @saschagrunert) [SIG Auth]
    • Fixed kubelet creating extra sandbox for pods with RestartPolicyOnFailure after all containers succeeded (#92614, @tnqn) [SIG Node and Testing]
    • Fixes high CPU usage in kubectl drain (#95260, @amandahla) [SIG CLI]
    • If we set SelectPolicy MinPolicySelect on scaleUp behavior or scaleDown behavior, Horizontal Pod Autoscaler doesn’t automatically scale the number of pods correctly (#95647, @JoshuaAndrew) [SIG Apps and Autoscaling]
    • Kube-proxy now trims extra spaces found in loadBalancerSourceRanges to match Service validation. (#94107, @robscott) [SIG Network]
    • Kubeadm: add missing “--experimental-patches” flag to “kubeadm init phase control-plane” (#95786, @Sh4d1) [SIG Cluster Lifecycle]

    app-operator 3.1.0

    Changed

    • Enable mutating and validating webhooks in app-admission-controller for tenant app CRs.

    Added

    • Make resync period configurable for use in integration tests.
    • Pause App CR reconciliation when it has app-operator.giantswarm.io/paused=true annotation.
    • Print difference between the current chart and desired chart.
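
    The pause annotation named above can be checked with a few lines of code. The following is only a minimal Python sketch of such a check, assuming the App CR is available as a plain dictionary. The function name is_paused and the example CR are hypothetical, and the real operator logic may differ.

```python
# Annotation key taken from the release note above.
PAUSE_ANNOTATION = "app-operator.giantswarm.io/paused"


def is_paused(resource: dict) -> bool:
    """Return True when the resource carries the pause annotation set to "true"."""
    metadata = resource.get("metadata") or {}
    annotations = metadata.get("annotations") or {}
    return annotations.get(PAUSE_ANNOTATION) == "true"


# Hypothetical App CR, reduced to the fields this check reads.
app_cr = {
    "kind": "App",
    "metadata": {
        "name": "example-app",
        "annotations": {PAUSE_ANNOTATION: "true"},
    },
}

print(is_paused(app_cr))
```

    A reconciliation loop built on this sketch would skip any resource for which is_paused returns True and process it again once the annotation is removed.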

    aws-operator 10.2.0

    Added

    • Allow incoming NFS traffic on node pools for Elastic File System.
    • Add support for tagging AWS resources, managed by the operator, based on the custom resource labels.
    • Add cleanupiamroles resource for detaching third party policies from our IAM roles.
    • Update k8scloudconfig version to v10.0.0 to include change for Kubernetes 1.19.
    • Allow configuration of MINIMUM_IP_TARGET and WARM_IP_TARGET for AWS CNI via annotations on AWSCluster

    Changed

    • Include Account ID in the s3bucket for access logs. This is a breaking change that will put access logs into a new S3 bucket.
    • Change AWS CNI and AWS CNI k8s plugin log verbosity to INFO.
    • Change AWS CNI log file to stdout.
    • Add retry logic for decrypt units to avoid flapping.
    • Use values generated by config-controller to deploy aws-operator instead of catalog values.
    • Use giantswarm/config versions matching v1.x.x major.
    • Start updating tcnp CF stack only when tccpn CF stack is already updated. This ensures that master nodes are updated before worker nodes.

    etcd 3.4.14

    • Fix server panic when force-new-cluster flag is enabled in a cluster which had a learner node.

    aws-cni 1.7.8

    containerlinux 2605.12.0

    Security fixes

    Bug fixes

    • /etc/iscsi/initiatorname.iscsi is generated by the iscsi-init service (#321)
    • Prevent iscsiadm buffer overflow (#318)

    Changes

    • Revert to building docker and containerd with go1.13 instead of go1.15. This reduces the SIGURG log spam (Issue #315 PR #774)
    • The containerd socket is now available in the default location (/run/containerd/containerd.sock) and also as a symlink in the previous location (/run/docker/libcontainerd/docker-containerd.sock) (#771)
    • With the iscsi update, the service unit has changed from iscsid to iscsi (#791)
    • AWS Pro: include scripts to facilitate setup of EKS workers (#794).
    • Missed from earlier notes: with the previous open-iscsi update to 2.1.2, the service unit name changed from iscsid to iscsi (#682)

    Updates

    chart-operator 2.6.0

    Added

    • Print difference between current release and desired release.

    Changed

    • Updated Helm to v3.4.2.

    cluster-autoscaler 1.19.1

    Changed

    • Updated cluster-autoscaler to version 1.19.1.

    cert-manager 2.4.1

    Changed

    • Made backoffLimit for clusterissuer job configurable. (#125)
    • Updated clusterissuer subchart API groups to cert-manager.io/v1. (#124)

    cert-exporter 1.5.0

    Changed

    • Check ca.crt expiries in TLS secrets. (#109)

    chart-operator 2.7.1

    Fixed

    • Only create VPA if autoscaling API group is present.

    kiam 1.7.0

    • Add taint tolerations for kiam agent and kiam server.

    metrics-server 1.2.1

    • Push app to control plane catalogs

    node-exporter 1.7.1

    Changed

    • Use the domain registry from installation values if it is present.

    external-dns 2.1.1

    Added

    • Allow the sync policy to be configured.
    • Supports customisation of the txt-owner-id (whilst still defaulting for default apps).
    • Supports dry-run mode and warns the user if enabled.

    Changed

    • Reworked the App to prepare it for customer use. (#49)
      • General:
        • Pushes the app to the giantswarm app catalog.
        • Uses Helm release namespace.
        • Uses the release name for resource naming to avoid conflicts.
        • Added a values schema to catch incorrect values.
        • Generally makes the chart easier to use (fully documented values file).
      • external-dns options:
        • Allows customisation of the txt registry prefix.
        • Allows configuration of synchronisation interval.
        • Filter resources to reconcile via annotations.
      • AWS-specific:
        • Allows the user to provide an IAM role to use.
        • Allows the user to provide the list of domains for external-dns to manage.
        • Allows configuration of batch size.
        • Allows configuration of CNAME instead of ALIAS records.
        • Allows configuration of the AWS zone type to update.
  • This release fixes an issue that causes ImagePullBackOff errors when new nodes are becoming ready.

    Change details

    app-operator 2.7.0

    Added

    • Secure the webhook with token value from control plane catalog.
    • Adding webhook URL as annotation into chart CRs.
    • Added Status update endpoint.
    • Watch secrets referenced in app CRs to reduce latency when applying config changes.
    • Create appcatalogentry CRs for public app catalogs.
    • Watch configmaps referenced in app CRs to reduce latency when applying config changes.

    Changed

    • Update apiextensions to v3 and replace CAPI with Giant Swarm fork.

    Fixed

    • Use resourceVersion of configmap for comparison instead of listing option.

    aws-operator 9.3.1-fix

    Changed

    • Remove explicit registry pull limits defaulting to less restrictive upstream settings.

    chart-operator 2.5.1

    Added

    • Validate the cache in helmclient to avoid state requests when pulling tarballs.
    • Call status webhook with token values.
    • Call status webhook when webhook annotation is present.

    Fixed

    • Fix comparison of last deployed and revision optional fields in status resource.
    • Set memory limit and reduce requests.
    • Update apiextensions to v3 and replace CAPI with Giant Swarm fork.

    Removed

    • Remove chartmigration resource as migration from chartconfig to chart CRs is complete.

    cluster-operator 3.4.1

    Added

    • Add functionality to template catalog into apps depending on release CR.

    Changed

    • Update apiextensions, k8sclient, and operatorkit dependencies.
    • Update github workflows.

    Fixed

    • Allow annotations from current app CR to remain.
  • This release provides support for Kubernetes 1.18 on AWS.

    Change details

    aws-operator 9.3.4

    Changed

    • Make it mandatory to configure alike instances via e.g. the installations repo.
    • Fix naming and logs for terminate-unhealthy-node feature.
    • Update k8scloudconfig version to v9.3.0 to include change for kubelet pull QPS and kubelet cgroup.
    • Add vertical pod autoscaler support.
    • Do not return NAT gateways in state deleting and deleted to avoid problems with recreating clusters with same ID.

    aws-cni 1.7.6

    • Improvement - Avoid detaching EFA ENIs
    • Improvement - Add t4g instance type
    • Improvement - Add p4d.24xlarge instance type
    • Improvement - Update calico to v3.16.2
    • Improvement - Update readme on stdout support for plugin log file
    • Bug - Make p3dn.24xlarge examples more realistic
    • Bug - Make sure we have space for a trunk ENI
    • Bug - Update README for DISABLE_TCP_EARLY_DEMUX
    • Bug - Update p4 instance limits

    kubernetes 1.18.12

    Design

    • Prevent logging of docker config contents if file is malformed (#95347, @sfowl) [SIG Auth and Node]

    Bug or Regression

    • Do not fail sorting empty elements. (#94666, @soltysh) [SIG CLI]
    • Ensure getPrimaryInterfaceID does not panic when network interfaces for Azure VMSS are null (#94801, @nilo19) [SIG Cloud Provider]
    • Fix bug where loadbalancer deletion gets stuck because of missing resource group #75198 (#93962, @phiphi282) [SIG Cloud Provider]
    • Fix detach azure disk issue when vm not exist (#95177, @andyzhangx) [SIG Cloud Provider]
    • Fix etcd_object_counts metric reported by kube-apiserver (#94818, @tkashem) [SIG API Machinery]
    • Fix network_programming_latency metric reporting for Endpoints/EndpointSlice deletions, where we don’t have correct timestamp (#95363, @wojtek-t) [SIG Network and Scalability]
    • Fix scheduler cache snapshot when a Node is deleted before its Pods (#95154, @alculquicondor) [SIG Scheduling]
    • Fix the cloudprovider_azure_api_request_duration_seconds metric buckets to correctly capture the latency metrics. Previously, the majority of the calls would fall in the “+Inf” bucket. (#95375, @marwanad) [SIG Cloud Provider and Instrumentation]
    • Fix: azure disk resize error if source does not exist (#93011, @andyzhangx) [SIG Cloud Provider]
    • Fix: detach azure disk broken on Azure Stack (#94885, @andyzhangx) [SIG Cloud Provider]
    • Fixed a bug where improper storage and comparison of endpoints led to excessive API traffic from the endpoints controller (#94934, @damemi) [SIG Apps, Network and Testing]
    • Gracefully delete nodes when their parent scale set went missing (#95289, @bpineau) [SIG Cloud Provider]
    • Kubeadm: warn but do not error out on missing “ca.key” files for root CA, front-proxy CA and etcd CA, during “kubeadm join --control-plane” if the user has provided all certificates, keys and kubeconfig files which require signing with the given CA keys. (#94988, @neolit123) [SIG Cluster Lifecycle]

    Other (Cleanup or Flake)

    • Masks ceph RBD adminSecrets in logs when logLevel >= 4 (#95245, @sfowl) [SIG Storage]

    Dependencies

    Added

    Nothing has changed.

    Changed

    Nothing has changed.

    Removed

    Nothing has changed.

    calico 3.15.3

    Other changes

    • Add FelixConfiguration parameters to explicitly allow encapsulated packets from workloads. libcalico-go #1302 (@doublek)
    • Respect explicit configuration for drop rules for encapsulated packets originating from workloads. felix #2487 (@doublek)

    cluster-operator 3.4.1

    Added

    • Add functionality to template catalog into apps depending on release CR.

    Changed

    • Update apiextensions, k8sclient, and operatorkit dependencies.
    • Update github workflows.

    Fixed

    • Allow annotations from current app CR to remain.

    app-operator 2.7.0

    Added

    • Secure the webhook with token value from control plane catalog.

    chart-operator 2.5.1

    Added

    • Validate the cache in helmclient to avoid state requests when pulling tarballs.
    • Call status webhook with token values.

    Fixed

    • Update apiextensions to v3 and replace CAPI with Giant Swarm fork.
    • Fix comparison of last deployed and revision optional fields in status resource.
    • Set memory limit and reduce requests.

    kube-state-metrics 1.3.0

    Changed

    • Change the Kubernetes Deployment name to include the app version.

    cert-exporter 1.3.0

    Added

    • Add Network Policy.

    Changed

    • Remove hostNetwork and hostPID capabilities.

    net-exporter 1.9.2

    Changed

    • Updated backward incompatible Kubernetes dependencies to v1.18.5.

    Fixed

    • Fixed indentation problem with the daemonset template.

    node-exporter 1.7.0

    Changed

    • Change the Kubernetes Daemonset name to include the app version.

    external-dns 1.5.0

    Changed

    • Upgrade upstream external-dns from v0.7.3 to v0.7.4.

    cluster-autoscaler 1.18.3

    Changed

    • Updated cluster-autoscaler to version 1.18.3.

    cert-manager 2.3.3

    Changed

    • Schedule hook Jobs on master nodes. (#106)
  • This release offers the possibility to configure the subnet size of Network Pools, as well as the size and wait time of batches during tenant cluster upgrades. More details about the upgrade improvements can be found in our Fine-tuning upgrade disruption on AWS guide.

    Change details

    aws-cni 1.7.5

    • Bug - Match primary ENI IP correctly (#1247, @mogren)

    aws-operator 9.3.1

    Changed

    • Update dependencies to next major versions.

    Fixed

    • During a deletion of a cluster, ignore volumes that are mounted to an instance in a different cluster.

    Added

    • Annotation alpha.aws.giantswarm.io/metadata-v2 to enable AWS Metadata API v2
    • Annotation alpha.aws.giantswarm.io/aws-subnet-size to customize subnet size of Control Plane and Node Pools
    • Annotation alpha.aws.giantswarm.io/update-max-batch-size to configure max batch size in ASG update policy on cluster or machine deployment CR.
    • Annotation alpha.aws.giantswarm.io/update-pause-time to configure pause between batches in ASG update on cluster or machine deployment CR.
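
    The annotations listed above all share the alpha.aws.giantswarm.io/ prefix. As a rough illustration, the Python sketch below builds a JSON merge patch body that sets two of them. The helper name build_annotation_patch is hypothetical, and the example values (2 and PT5M) are assumptions about the accepted formats rather than documented defaults.

```python
import json

# Prefix shared by the annotations listed in the release note above.
ANNOTATION_PREFIX = "alpha.aws.giantswarm.io/"


def build_annotation_patch(settings: dict) -> str:
    """Build a JSON merge patch that sets the given annotations on a CR.

    Kubernetes annotation values must be strings, so every value is
    converted with str() before it is placed in the patch.
    """
    annotations = {
        ANNOTATION_PREFIX + key: str(value) for key, value in settings.items()
    }
    return json.dumps({"metadata": {"annotations": annotations}}, sort_keys=True)


# Hypothetical values: the accepted formats are assumptions, not documented here.
patch = build_annotation_patch(
    {
        "update-max-batch-size": 2,
        "update-pause-time": "PT5M",
    }
)

print(patch)
```

    The resulting string could be applied to the cluster or machine deployment CR with a merge patch, for example through kubectl patch with --type merge.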

    cert-manager 2.3.2

    Added

    • Added values.schema.json for validation of default values. (#90)
    • Made cert-manager version configurable. (#91)

    Changed

    • Updated cert-manager to v1.0.4. (#95)
    • Update RBAC API versions. (#84)

    Fixed

    • Updated app version in Chart.yaml metadata to v1.0.3. (#91)
  • This patch release prevents an issue with QPS (Queries per Second) limits introduced by Docker Hub. It also solves a corner case during etcd mounting.

    This minor release also contains two alpha features to terminate unhealthy nodes and to use the new AWS metadata API v2. Both only work when the cluster CR is annotated properly.

    Change details

    aws-operator 9.2.0

    Fixed

    • Fix dockerhub QPS by using paid user token for pulls.
    • Remove dependency on var-lib-etcd.automount to avoid dependency cycle on new systemd.

    Added

    • Add terminate-unhealthy-node alpha feature to automatically terminate bad and unhealthy nodes in a Cluster.
    • Add alpha.giantswarm.io/aws-metadata-v2 annotation to enable AWS Metadata API v2.