Workload cluster releases for KVM

  • This release includes changes in calico and etcd that should lead to better performance.

    Change details

    calico 3.20.1

    Bug fixes

    • Updated ubi base images and CentOS repos to stop CVE false positives from being reported. node #1214
    • calico/node now writes logs to /var/log/calico within the container by default, in addition to stdout. node #1134
    • Improve error message for conflicting routes in CNI plugin cni-plugin #1164
    • Bugfix: a blackhole routing table containing only No-OIF / InterfaceNone routes clobbered all other routes in the same routing table, because specifying netlink.RT_FILTER_OIF with a netlink.Route{LinkIndex: 0} returns all link routes matching the remaining applicable filter (netlink.RT_FILTER_TABLE / Table 254). felix #2995
    • Fix slow performance when updating a Kubernetes namespace when there are many Pods (and in turn, slow startup performance when there are many namespaces). felix #2967
    • Fixes a benign error caused by attempting to delete the same IPAMBlock twice. kube-controllers #822

    etcd 3.5.0

    etcd server

    • Fix corruption bug in defrag.
    • Fix quorum protection logic when promoting a learner.
    • Improve peer corruption checker to work when peer mTLS is enabled.
    • Log [CLIENT-PORT]/health checks on the server side.
    • Log successful etcd server-side health checks at debug level.
    • Improve compaction performance when the latest index is greater than 1 million.
    • Add a log entry when etcdserver fails to apply a command.
    • Improve count-only range performance.
    • Remove redundant storage restore operation to shorten the startup time.
    • Fix deadlock bug in mvcc.
    • Fix inconsistency between WAL and server snapshot.
    • Improve logging around snapshot send and receive.
    • Push down the RangeOptions.limit argument into the index tree to reduce memory overhead.
    • Improve the runtime.FDUsage call pattern to reduce object allocations, memory usage and CPU usage.
    • Improve mvcc.watchResponse channel memory usage.
    • Log expensive request info in UnaryInterceptor.
    • Improve healthcheck by using v3 range request and its corresponding timeout.
    • Fix server panic in slow writes warnings.
    • Reduce memory allocation by around 30% by logging the range response size without marshalling.

    kvm-operator 3.18.3

    • Fix Calico mount errors.

    kubernetes 1.21.5

    Feature

    • Kubernetes is now built with Golang 1.16.8 (#104906) [SIG Cloud Provider, Instrumentation, Release and Testing]

    Bug or Regression

    • Fix NodeAuthenticator tests in dualstack (#104840) [SIG Auth and Testing]
    • Fix: skip case sensitivity when checking Azure NSG rules. Fix: ensure InstanceShutdownByProviderID returns false for Azure VMs that are still being created (#104447) [SIG Cloud Provider]
    • Fixed occasional pod cgroup freeze when using cgroup v1 and systemd driver. Fixed “failed to create container … unit already exists” when using cgroup v1 and systemd driver. (#104530) [SIG CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Node, Storage and Testing]
    • Kube-proxy: delete stale conntrack UDP entries for loadbalancer ingress IP. (#104151) [SIG Network]
    • Metrics changes: Fix exposed buckets of scheduler_volume_scheduling_duration_seconds_bucket metric (#100720) [SIG Apps, Instrumentation, Scheduling and Storage]
    • Pass additional flags to subpath mount to avoid flakes in certain conditions (#104347) [SIG Storage]

    Other (Cleanup or Flake)

    • Kube-apiserver: sets an upper-bound on the lifetime of idle keep-alive connections and time to read the headers of incoming requests (#103958) [SIG API Machinery and Node]
  • This release includes changes in calico and etcd that should lead to better performance.

    Change details

    calico 3.20.1

    Bug fixes

    • Updated ubi base images and CentOS repos to stop CVE false positives from being reported.
    • calico/node now writes logs to /var/log/calico within the container by default, in addition to stdout.
    • Improve error message for conflicting routes in CNI plugin.
    • Bugfix: a blackhole routing table containing only No-OIF / InterfaceNone routes clobbered all other routes in the same routing table, because specifying netlink.RT_FILTER_OIF with a netlink.Route{LinkIndex: 0} returns all link routes matching the remaining applicable filter (netlink.RT_FILTER_TABLE / Table 254).
    • Fix slow performance when updating a Kubernetes namespace when there are many Pods (and in turn, slow startup performance when there are many namespaces).
    • Fixes a benign error caused by attempting to delete the same IPAMBlock twice.
    • Fix calico/node failing to set NetworkUnavailable to false for etcd clusters with mismatched node names.
    • Stop ARP traffic being dropped due to RPF check.
    • Disable VXLAN tunnel checksum offload on kernels < v5.7.
    • Improve routing loop prevention to handle when advertising Service LoadBalancer IPs.
    • Retry setting AWS EC2 source/destination check until successful.
    • Install blackhole routes in VXLAN mode.
    • Fix the podIP annotation being incorrectly clobbered for StatefulSet pods.
    • Reinstate logic that falls back to the status of the pod during termination if the pod IP annotation is not set by the Calico CNI plugin.
    • Fix issue with serviceaccount names longer than 63 characters.
    • Fix error parsing pod deletion updates in kube-controllers.
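    The "Disable VXLAN tunnel checksum offload on kernels < v5.7" item amounts to a kernel version gate. Calico's own detection code is not reproduced in these notes; the following is a minimal Python sketch of such a gate, in which parse_kernel_version and should_disable_vxlan_csum_offload are hypothetical helper names and only the major and minor version numbers are compared.

```python
from __future__ import annotations

import re

# Threshold taken from the release note: offload is disabled on kernels older than v5.7.
MIN_KERNEL_WITH_OFFLOAD = (5, 7)


def parse_kernel_version(release: str) -> tuple[int, int]:
    """Return (major, minor) from a kernel release string such as '5.4.0-90-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def should_disable_vxlan_csum_offload(release: str) -> bool:
    """Hypothetical helper: True when the kernel predates v5.7."""
    # Tuple comparison is element-wise, so (5, 10) correctly sorts after (5, 7).
    return parse_kernel_version(release) < MIN_KERNEL_WITH_OFFLOAD
```

    Comparing integer tuples rather than raw strings matters here: as strings, "5.10" < "5.7" is true, which would wrongly treat a 5.10 kernel as older than 5.7.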

    Other changes

    • calico/node now marks nodes with NetworkUnavailable=true on shutdown.
    • Typha now gives newly connected clients an extra grace period to catch up after sending the snapshot. This should reduce the possibility of cyclic disconnects.
    • Added enhanced error logging for IPAM failures.
    • Add IP address garbage collection to kube-controllers.
    • Calico will now release empty IPAM blocks from nodes that no longer need them so they can be used elsewhere.
    • Mount CNI plugin directory into calico/node to enable configuration updates.

    etcd 3.5.0

    etcd server

    • Fix corruption bug in defrag.
    • Fix quorum protection logic when promoting a learner.
    • Improve peer corruption checker to work when peer mTLS is enabled.
    • Log [CLIENT-PORT]/health checks on the server side.
    • Log successful etcd server-side health checks at debug level.
    • Improve compaction performance when the latest index is greater than 1 million.
    • Add a log entry when etcdserver fails to apply a command.
    • Improve count-only range performance.
    • Remove redundant storage restore operation to shorten the startup time.
    • Fix deadlock bug in mvcc.
    • Fix inconsistency between WAL and server snapshot.
    • Improve logging around snapshot send and receive.
    • Push down the RangeOptions.limit argument into the index tree to reduce memory overhead.
    • Improve the runtime.FDUsage call pattern to reduce object allocations, memory usage and CPU usage.
    • Improve mvcc.watchResponse channel memory usage.
    • Log expensive request info in UnaryInterceptor.
    • Improve healthcheck by using v3 range request and its corresponding timeout.
    • Fix server panic in slow writes warnings.
    • Reduce memory allocation by around 30% by logging the range response size without marshalling.

    kubernetes 1.20.11

    Bug or Regression

    • Fix: skip case sensitivity when checking Azure NSG rules. Fix: ensure InstanceShutdownByProviderID returns false for Azure VMs that are still being created (#104448)
    • Kube-proxy: delete stale conntrack UDP entries for loadbalancer ingress IP. (#104152)
    • Metrics changes: Fix exposed buckets of scheduler_volume_scheduling_duration_seconds_bucket metric (#100720) [SIG Apps, Instrumentation, Scheduling and Storage]
    • Pass additional flags to subpath mount to avoid flakes in certain conditions (#104348)
    • When using kubectl replace (or the equivalent API call) on a Service, the caller no longer needs to do a read-modify-write cycle to fetch the allocated values for .spec.clusterIP and .spec.ports[].nodePort. Instead the API server will automatically carry these forward from the original object when the new object does not specify them. (#104674)

    Other (Cleanup or Flake)

    • Kube-apiserver: sets an upper-bound on the lifetime of idle keep-alive connections and time to read the headers of incoming requests (#103958) [SIG API Machinery and Node]

    kvm-operator 3.18.2

    Changed

    • Temporarily disable API server flow control to mitigate etcd memory usage issues.
  • This release upgrades Kubernetes to version 1.21 and containerlinux to 2905.2.3.

    Change details

    kubernetes 1.21.4

    Feature

    • Kubernetes is now built with Golang 1.16.7 (#104201, @cpanato) [SIG Cloud Provider, Instrumentation, Release and Testing]

    Bug or Regression

    • Disable aufs module for gce clusters (#103831, @lizhuqi) [SIG Cloud Provider]
    • Fix kube-apiserver metric reporting for the deprecated watch path of /api//watch/… (#104190, @wojtek-t) [SIG API Machinery and Instrumentation]
    • Fix the code leaking defaulting between unrelated pod instances. (#103284, @kebe7jun) [SIG CLI]
    • Fix: Provide IPv6 support for internal load balancer (#103794, @nilo19) [SIG Cloud Provider]
    • Fix: cleanup outdated routes (#102935, @nilo19) [SIG Cloud Provider]
    • Fix: delete non-existing disk issue (#102083, @andyzhangx) [SIG Cloud Provider]
    • Fix: ignore not a VMSS error for VMAS nodes in reconcileBackendPools (#103997, @nilo19) [SIG Cloud Provider]
    • Fix: return empty VMAS name if using standalone VM (#103470, @nilo19) [SIG Cloud Provider]
    • Fixed a bug where scheduler extenders were not called on preemptions (#103019, @ordovicia) [SIG Scheduling]
    • Fixes an issue cleaning up CertificateSigningRequest objects with an unparseable status.certificate field (#103948, @liggitt) [SIG Apps and Auth]
    • Fixes issue with websocket-based watches of Service objects not closing correctly on timeout (#102541, @liggitt) [SIG API Machinery and Testing]

    Dependencies

    Added

    Nothing has changed.

    Changed

    • sigs.k8s.io/apiserver-network-proxy/konnectivity-client: v0.0.19 → v0.0.22

    Removed

    Nothing has changed.

    containerlinux 2905.2.3

    New Stable release 2905.2.3

    Changes since Stable 2905.2.2

    Security fixes

    Bug Fixes

    Updates

    kvm-operator 3.18.1

  • This is the first release for KVM with Kubernetes 1.20 and Calico 3.19. It also migrates the Calico datastore from etcd to Kubernetes.

    Change details

    calico 3.19.2

    View the changelogs for Calico 3.16 through 3.19:

    cert-exporter 1.8.0

    Added

    • Add new cert_exporter_certificate_cr_not_after metric. This metric exports the status.notAfter field of cert-manager Certificate CRs.

    Changed

    • Remove static certificate source label from cert_exporter_secret_not_after (static value secret) and cert_exporter_not_after (static value file) metrics.
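    cert-manager records status.notAfter as an RFC 3339 timestamp such as 2021-12-01T00:00:00Z. The notes do not state how cert_exporter_certificate_cr_not_after encodes it; assuming it follows the common Prometheus convention of Unix seconds, the conversion and a time-to-expiry calculation look roughly like this minimal sketch (function names are hypothetical, and only the UTC "Z" form without fractional seconds is handled).

```python
from datetime import datetime, timezone
from typing import Optional


def not_after_to_unix(not_after: str) -> float:
    """Convert an RFC 3339 UTC timestamp such as '2021-12-01T00:00:00Z' to Unix seconds."""
    # Assumption: no fractional seconds and a literal 'Z' (UTC) suffix.
    parsed = datetime.strptime(not_after, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def seconds_until_expiry(not_after: str, now: Optional[datetime] = None) -> float:
    """Seconds remaining before the certificate expires (negative once it has expired)."""
    current = now if now is not None else datetime.now(timezone.utc)
    return not_after_to_unix(not_after) - current.timestamp()
```

    An alerting rule would typically compare such a remaining-time value against a threshold, for example a few days, rather than alerting on the raw timestamp.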

    chart-operator 2.19.0

    Removed

    • Remove tillermigration resource now that the Helm 3 migration is complete.

    Added

    • Add releasemaxhistory resource which ensures we retry at a reduced rate when there are repeated failed upgrades.

    Changed

    • Increase memory limit for deploying large charts in workload clusters.
    • Upgrade a failed Helm release even if its version or values have not changed, to handle situations such as failed webhooks where a retry is needed.
    • Prepare Helm values for configuration management.
    • Update architect-orb to v3.0.0.
    • For CAPI clusters: add tolerations to start on NotReady nodes for installing CNI.
    • For CAPI clusters: create giantswarm-critical priority class.
    • For CAPI clusters: use host network to allow installing CNI packaged as an app.

    Fixed

    • Improve status message when helm release has failed max number of attempts.

    coredns 1.6.0

    Changed

    • Make targetCPUUtilizationPercentage in HPA configurable.
    • Update coredns to upstream version 1.8.3.
    • Increase maximum replica count to 50 when using horizontal pod autoscaling.
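    The Horizontal Pod Autoscaler computes its target with the documented rule desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), then clamps the result to the configured bounds. The following simplified Python sketch shows how a configurable CPU target and the new ceiling of 50 replicas interact; it ignores the HPA tolerance band and scale-down stabilization, and its parameter names are illustrative rather than the coredns chart's actual values keys.

```python
import math

MAX_REPLICAS = 50  # new upper bound from the coredns 1.6.0 note


def desired_replicas(current_replicas: int, current_cpu_pct: float,
                     target_cpu_pct: float, min_replicas: int = 1,
                     max_replicas: int = MAX_REPLICAS) -> int:
    """Simplified HPA rule: ceil(current * currentMetric / targetMetric), clamped to [min, max]."""
    if target_cpu_pct <= 0:
        raise ValueError("target_cpu_pct must be positive")
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, raw))
```

    For example, 4 replicas running at 90% CPU against a 60% target scale to 6, while 40 replicas at twice their target would ask for 80 and are capped at 50.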

    kubernetes 1.20.10

    View the major changes since Kubernetes v1.19 here.

    kube-state-metrics 1.4.0

    Changed

    • Migrate to configuration management.
    • Update architect-orb to v4.0.0.

    kvm-operator 3.18.0

    Changed

    • Upgrade k8scloudconfig to v10.8.1 which includes a change to better determine if memory eviction thresholds are crossed.
    • Update for compatibility with Calico v3.19.

    metrics-server 1.4.0

    Changed

    • Migrate to configuration management.
    • Update architect-orb to v4.0.0.

    net-exporter 1.10.3

    Changed

    • Prepare Helm values for configuration management.
    • Update architect-orb to v4.0.0.
    • Allow customizing the DNS service.
    • Only check pod existence on dial errors. Check pod deletion directly by IP instead of listing pods and searching.

    node-exporter 1.8.0

    Changed

    • Migrate to configuration management.
    • Update architect-orb to v4.0.0.
  • This release reverts to Linux kernel 5.4 to mitigate issues with node deadlocks on large clusters with many pods.

  • This release fixes a rare bug encountered when deleting clusters using host volumes.

    Change details

    kvm-operator 3.17.3

    Fixed

    • Avoid panic during deletion of clusters with host volumes.
  • This release introduces a new feature that allows KVM clusters to run behind a proxy. It also adds support for host volumes.

    Change details

    cluster-operator 0.27.1

    Changed

    • Dropped ensuring cluster CRDs from controllers.

    app-operator 4.4.0

    Added

    • Add support for the skip CRD flag when installing Helm releases.
    • Emit events when config maps and secrets referenced in App CRs are updated.

    kvm-operator 3.17.2

    Fixed

    • Remove reference from worker PVs on cluster deletion so they can be reused.

    Added

    • Add flags for proxy settings and propagate them to ignition.

    Changed

    • Reconcile only deployments that are managed by kvm-operator.

    containerlinux 2765.2.4

    Security fixes

    Updates

    etcd 3.4.16

    kubernetes 1.19.11

    API Change

    • We have added a new Priority & Fairness rule that exempts all probes (/readyz, /healthz, /livez) to prevent restarting of “healthy” kube-apiserver instance(s) by kubelet. (#101113, @tkashem) [SIG API Machinery]

    Feature

    • Kubernetes is now built using go1.15.11 (#101197, @cpanato) [SIG Cloud Provider, Instrumentation, Release and Testing]
    • Kubernetes is now built using go1.15.12 (#101846, @cpanato) [SIG Cloud Provider, Instrumentation, Release and Testing]

    Bug or Regression

    • Azurefile: Normalize share name to not include capital letters (#100731, @kassarl) [SIG Cloud Provider and Storage]
    • EndpointSlice IP validation now matches Endpoints IP validation. (#101084, @robscott) [SIG Apps and Network]
    • EndpointSlice controllers are less likely to create duplicate EndpointSlices. (#101764, @aojea) [SIG Apps and Network]
    • Ensure service deleted when the Azure resource group has been deleted (#100944, @feiskyer) [SIG Cloud Provider]
    • Fix panic in JSON logging format caused by missing Duration encoder (#101159, @serathius) [SIG API Machinery, Cluster Lifecycle and Instrumentation]
    • Fix: azure file inline volume namespace issue in csi migration translation (#101235, @andyzhangx) [SIG Apps, Cloud Provider, Node and Storage]
    • Fixed a bug where startupProbe stopped working after a container’s first restart (#101093, @wzshiming) [SIG Node]
    • Fixed port-forward memory leak for long-running and heavily used connections. (#99839, @saschagrunert) [SIG API Machinery and Node]
    • Kubelet: improve the performance when waiting for a synchronization of the node list with the kube-apiserver (#99336, @neolit123) [SIG Node]
    • EndpointSlices are not supported in Linux userspace mode (#101502, @JornShen) [SIG Network]

    Dependencies

    Added

    Nothing has changed.

    Changed

    Nothing has changed.

    Removed

    Nothing has changed.

    cert-exporter 1.6.1

    Changed

    • Set docker.io as the default registry

    chart-operator 2.15.0

    Added

    • Proxy support in helm template.

    kube-state-metrics 1.3.1

    Changed

    • Set docker.io as the default registry

    metrics-server 1.3.0

    Added

    • Added new configuration value extraArgs.

    node-exporter 1.7.2

    Changed

    • Set docker.io as the default registry
  • This release upgrades Kubernetes to 1.19. A summary of relevant changes is included in these release notes. The release also includes other minor component updates summarized below the list of Kubernetes changes.

    Change details

    kubernetes 1.19.9

    Expanded CLI support for debugging workloads and nodes

    SIG CLI expanded on debugging with kubectl to support two new debugging workflows: debugging workloads by creating a copy, and debugging nodes by creating a container in host namespaces. These can be convenient to:

    • Insert a debug container in clusters that don’t have ephemeral containers enabled
    • Modify a crashing container for easier debugging by changing its image, for example to busybox, or its command, for example, to sleep 1d so you have time to kubectl exec.
    • Inspect configuration files on a node’s host filesystem

    EndpointSlices are now enabled by default

    EndpointSlices are an exciting new API that provides a scalable and extensible alternative to the Endpoints API. EndpointSlices track IP addresses, ports, readiness, and topology information for Pods backing a Service.

    In Kubernetes 1.19 this feature is enabled by default, with kube-proxy reading from EndpointSlices instead of Endpoints. Although this is mostly an invisible change, it should result in noticeable scalability improvements in large clusters. It will also enable significant new features in future Kubernetes releases, such as Topology Aware Routing.
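    To make the shape of the API concrete, the following Python sketch builds an illustrative discovery.k8s.io/v1beta1 EndpointSlice (the version served in Kubernetes 1.19) and collects the addresses that are ready for traffic. It is not kube-proxy's code; the Service name, addresses and node names are hypothetical. In practice the EndpointSlice controller creates these objects for each Service.

```python
def example_endpoint_slice() -> dict:
    """Illustrative EndpointSlice for a hypothetical Service named 'example'."""
    return {
        "apiVersion": "discovery.k8s.io/v1beta1",
        "kind": "EndpointSlice",
        "metadata": {
            "name": "example-abc",
            # This label links the slice to the Service it backs.
            "labels": {"kubernetes.io/service-name": "example"},
        },
        "addressType": "IPv4",
        "ports": [{"name": "http", "protocol": "TCP", "port": 80}],
        "endpoints": [
            {
                "addresses": ["10.1.2.3"],
                "conditions": {"ready": True},
                "hostname": "pod-1",
                "topology": {
                    "kubernetes.io/hostname": "node-1",
                    "topology.kubernetes.io/zone": "us-west2-a",
                },
            },
            {
                "addresses": ["10.1.2.4"],
                "conditions": {"ready": False},
                "topology": {"kubernetes.io/hostname": "node-2"},
            },
        ],
    }


def ready_addresses(endpoint_slice: dict) -> list:
    """Collect addresses of endpoints that may receive traffic.

    A missing 'ready' condition means the state is unknown; the API documentation
    advises consumers to treat that as ready, so the default here is True.
    """
    result = []
    for endpoint in endpoint_slice.get("endpoints", []):
        if endpoint.get("conditions", {}).get("ready", True):
            result.extend(endpoint["addresses"])
    return result
```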

    Ingress graduates to General Availability

    SIG Network has graduated the widely used Ingress API to general availability in Kubernetes 1.19. This change recognises years of hard work by Kubernetes contributors, and paves the way for further work on future networking APIs in Kubernetes.

    seccomp graduates to General Availability

    The seccomp (secure computing mode) support for Kubernetes has graduated to General Availability (GA). This feature can be used to increase workload security by restricting the system calls available to a Pod (applying to all of its containers) or to individual containers.
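    With the GA API, the profile is set through the seccompProfile field of a securityContext instead of the older annotations. The following minimal Python sketch builds a Pod manifest that applies the container runtime's default profile to every container in the Pod and prints it as JSON, which kubectl apply -f accepts. The Pod name and image are placeholders.

```python
import json


def pod_with_runtime_default_seccomp(name: str, image: str) -> dict:
    """Pod manifest whose containers all run under the runtime's default seccomp profile."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pod-level profile: applies to every container in the Pod. A
            # container-level securityContext.seccompProfile would override it.
            "securityContext": {"seccompProfile": {"type": "RuntimeDefault"}},
            "containers": [
                {"name": "app", "image": image, "command": ["sleep", "3600"]},
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(pod_with_runtime_default_seccomp("seccomp-demo", "busybox"), indent=2))
```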

    KubeSchedulerConfiguration graduates to Beta

    SIG Scheduling graduates KubeSchedulerConfiguration to Beta. The KubeSchedulerConfiguration feature allows you to tune the algorithms and other settings of the kube-scheduler. You can easily enable or disable specific functionality (contained in plugins) in selected scheduling phases without having to rewrite the rest of the configuration. Furthermore, a single kube-scheduler instance can serve different configurations, called profiles. Pods can select the profile they want to be scheduled under via the .spec.schedulerName field.
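    The profile mechanism can be illustrated with a configuration that serves two profiles from one kube-scheduler: the default behaviour, and a profile with all scoring plugins disabled. The sketch below, in Python, mirrors the style of the upstream profiles example using the kubescheduler.config.k8s.io/v1beta1 API; the profile name no-scoring-scheduler and the Pod are illustrative. The resulting JSON, which is also valid YAML, is the kind of file passed to kube-scheduler via --config.

```python
import json


def scheduler_config() -> dict:
    """Two scheduler profiles served by a single kube-scheduler instance."""
    return {
        "apiVersion": "kubescheduler.config.k8s.io/v1beta1",
        "kind": "KubeSchedulerConfiguration",
        "profiles": [
            # Unchanged default behaviour.
            {"schedulerName": "default-scheduler"},
            # Filtering still runs, but every preScore/score plugin is switched off.
            {
                "schedulerName": "no-scoring-scheduler",
                "plugins": {
                    "preScore": {"disabled": [{"name": "*"}]},
                    "score": {"disabled": [{"name": "*"}]},
                },
            },
        ],
    }


def pod_for_profile(name: str, profile: str) -> dict:
    """A Pod that selects a scheduling profile via .spec.schedulerName."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "schedulerName": profile,
            "containers": [{"name": "app", "image": "busybox", "command": ["sleep", "3600"]}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(scheduler_config(), indent=2))
```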

    Generic ephemeral volumes

    Kubernetes provides volume plugins whose lifecycle is tied to a pod and can be used as scratch space (e.g. the builtin “empty dir” volume type) or to load some data into a pod (e.g. the builtin ConfigMap and Secret volume types or “CSI inline volumes”). The new generic ephemeral volumes alpha feature allows any existing storage driver that supports dynamic provisioning to be used as an ephemeral volume with the volume’s lifecycle bound to the Pod.

    • It can be used to provide scratch storage that is different from the root disk, for example persistent memory, or a separate local disk on that node.
    • All StorageClass parameters for volume provisioning are supported.
    • All features supported with PersistentVolumeClaims are supported, such as storage capacity tracking, snapshots and restore, and volume resizing.
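    A generic ephemeral volume is declared inline in the Pod spec as a volumeClaimTemplate; Kubernetes creates a PersistentVolumeClaim from it that is owned by the Pod and removed with it. As an alpha feature in 1.19 it sits behind the GenericEphemeralVolume feature gate, which these notes do not say is enabled. The following Python sketch builds such a Pod manifest; the storage class name scratch-storage-class and the 1Gi size are placeholders.

```python
import json


def pod_with_generic_ephemeral_volume(storage_class: str, size: str = "1Gi") -> dict:
    """Pod whose scratch volume is dynamically provisioned and lives only as long as the Pod."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "scratch-demo"},
        "spec": {
            "containers": [{
                "name": "app",
                "image": "busybox",
                "command": ["sleep", "3600"],
                "volumeMounts": [{"name": "scratch-volume", "mountPath": "/scratch"}],
            }],
            "volumes": [{
                "name": "scratch-volume",
                # The PVC created from this template is owned by the Pod.
                "ephemeral": {
                    "volumeClaimTemplate": {
                        "spec": {
                            "accessModes": ["ReadWriteOnce"],
                            "storageClassName": storage_class,
                            "resources": {"requests": {"storage": size}},
                        }
                    }
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(pod_with_generic_ephemeral_volume("scratch-storage-class"), indent=2))
```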

    Immutable Secrets and ConfigMaps (beta)

    Secret and ConfigMap volumes can be marked as immutable, which significantly reduces load on the API server if there are many Secret and ConfigMap volumes in the cluster. See ConfigMap and Secret for more information.
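    Immutability is a single top-level field, immutable: true, on the ConfigMap or Secret. The load reduction comes from the kubelet no longer needing to watch such objects for changes. Because the data can then never be edited, changes are usually rolled out by creating a new object under a new name. The Python sketch below shows that pattern; the versioned naming scheme is a hypothetical convention, not something these notes prescribe.

```python
import json


def immutable_config_map(name: str, data: dict) -> dict:
    """ConfigMap manifest that cannot be modified once created."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": data,
        # Once set, the data cannot change and the flag cannot be unset;
        # the object has to be deleted and recreated instead.
        "immutable": True,
    }


def versioned_name(base: str, version: int) -> str:
    """Hypothetical naming scheme for rolling out changes, e.g. 'app-config-v2'."""
    return f"{base}-v{version}"


if __name__ == "__main__":
    cm = immutable_config_map(versioned_name("app-config", 2), {"LOG_LEVEL": "info"})
    print(json.dumps(cm, indent=2))
```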

    Increase the Kubernetes support window to one year

    As of Kubernetes 1.19, bugfix support via patch releases for a Kubernetes minor release has increased from 9 months to 1 year.

    kvm-operator 3.16.0

    Added

    • Add vertical pod autoscaler configuration.
    • Automatically delete WC node pods when NotReady for too long (per-cluster opt-in only).

    Changed

    • Do not drain node pods when the cluster is being deleted, to shorten deletion time and avoid deadlocks.
    • Update for Kubernetes 1.19 compatibility.
    • Update k8s-kvm to v0.4.1 with QEMU v5.2.0 and Flatcar DNS fix.
    • Update k8scloudconfig to use calico-crd-installer.

    Fixed

    • Use managed-by label to check node deployments are deleted before cluster namespace.
    • Remove IPs from endpoints when the corresponding workload cluster node is not ready.

    app-operator 3.2.1

    Security

    • Restrict ingress to only expose the status endpoint.

    chart-operator 2.12.0

    Added

    • Pause Chart CR reconciliation when it has chart-operator.giantswarm.io/paused=true annotation.

    Changed

    • Set docker.io as the default registry.
    • Pass RESTMapper to helmclient to reduce the number of REST API calls.
    • Updated Helm to v3.5.3.
    • Update namespace metadata using namespaceConfig in Chart CRs.
    • Deploy giantswarm-critical PriorityClass when it’s not found.

    coredns 1.4.1

    Changed

    • Set docker.io as the default registry
    • Update coredns to upstream version 1.8.0.
    • Added monitoring annotations and common labels.

    net-exporter 1.10.0

    Changed

    • Add label selector for pods to help lower memory usage.
  • This release upgrades QEMU to version 5.2.0 which results in scheduling improvements and better CPU limits enforcement.

    Change details

    kvm-operator 3.14.2

    Changed

    • Use k8s-kvm:0.4.1 with QEMU 5.2.0.

    app-operator 3.2.0

    Added

    • Include apiVersion, restrictions.compatibleProviders in appcatalogentry CRs.

    Changed

    • Limit the number of AppCatalogEntry per app.
    • Delete legacy finalizers on app CRs.
    • Reconcile appCatalog CRs only if the pod is unique.

    Fixed

    • Update status to cordoned if the app CR has the cordoned annotation.

    cluster-operator 0.24.2

    Changed

    • Migrate to Go modules.
    • Update certs package to v2.0.0.
    • Refactor to use slightly newer dependency versions.

    cert-operator 1.0.1

    Fixed

    • Add list permission for cluster.x-k8s.io.

    chart-operator 2.9.0

    Added

    • Use diff key when logging differences between the current and desired release.

    Fixed

    • Stop updating Helm release if it has failed the previous 5 attempts.
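    chart-operator's actual implementation is not shown in these notes. As a hypothetical sketch, the rule "stop updating a Helm release if it has failed the previous 5 attempts" can be expressed as a check on the release's revision history, ordered oldest first. The status strings follow Helm's release status names (for example deployed, failed, superseded); the helper names are illustrative.

```python
from __future__ import annotations

# Limit taken from the chart-operator 2.9.0 note.
MAX_FAILED_ATTEMPTS = 5


def trailing_failures(history: list[str]) -> int:
    """Count consecutive 'failed' revisions at the end of an oldest-first history."""
    count = 0
    for status in reversed(history):
        if status != "failed":
            break
        count += 1
    return count


def should_attempt_upgrade(history: list[str]) -> bool:
    """Hypothetical gate: keep retrying until five attempts in a row have failed."""
    return trailing_failures(history) < MAX_FAILED_ATTEMPTS
```

    Counting only the trailing run of failures means a single successful revision resets the budget, so older failures do not block future upgrades.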
  • Nodes will be rolled when upgrading to this version.

    This patch release mitigates a DNS issue affecting cluster creation and scaling.