Workload cluster releases for KVM

  • This release updates several preinstalled apps running in the workload cluster to reduce Kubernetes API server load and etcd memory usage in certain installations.

    Change details

    app-operator 5.6.0

    Changed

    • Get tarball URL for chart CRs from index.yaml for better community app catalog support.

    Fixed

    • Fix error handling in chart CR watcher when chart CRD not installed.

    cert-operator 1.3.0

    Changed

    • Use RenewSelf instead of LookupSelf to prevent expiration of Vault token.
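
The difference between the two calls can be illustrated with a toy model. This is only a sketch: `ToyToken`, `survives`, and their parameters are hypothetical names invented here, not the Vault client API or the cert-operator code. It assumes that a lookup only reads the remaining lifetime, while a renewal extends it.

```python
from dataclasses import dataclass


@dataclass
class ToyToken:
    """Toy stand-in for a renewable token; not the Vault client API."""
    ttl: int         # lease length granted on each renewal, in seconds
    expires_at: int  # absolute expiry time, in seconds

    def lookup_self(self, now: int) -> int:
        """Read-only: report the remaining lifetime without changing it."""
        return max(self.expires_at - now, 0)

    def renew_self(self, now: int) -> int:
        """Extend the lease by a full TTL from `now`, if not yet expired."""
        if now >= self.expires_at:
            raise RuntimeError("token already expired")
        self.expires_at = now + self.ttl
        return self.lookup_self(now)


def survives(token: ToyToken, use_renew: bool, ticks: int, step: int) -> bool:
    """Poll the token every `step` seconds; report whether it stays valid."""
    now = 0
    for _ in range(ticks):
        now += step
        if token.lookup_self(now) == 0:
            return False
        if use_renew:
            token.renew_self(now)
    return True
```

In this model, a token that is only looked up periodically expires once its original lease runs out, whereas one that is renewed on each poll stays valid for as long as the polling continues.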

    cert-exporter 2.1.0

    Changed

    • Make exporter’s monitor flags configurable.

    kvm-operator 3.18.5

    Fixed

    • Update k8scloudconfig to v10.16.1 for calico permissions fix.
  • This patch release updates kvm-operator to fix a bug that caused multiple nodes to be upgraded simultaneously.

  • This release includes changes in calico and etcd that should lead to better performance.

    Change details

    calico 3.20.1

    Bug fixes

    • Updated ubi base images and CentOS repos to stop CVE false positives from being reported. (node #1214)
    • calico/node logs write to /var/log/calico within the container by default, in addition to stdout. (node #1134)
    • Improve error message for conflicting routes in CNI plugin. (cni-plugin #1164)
    • Bugfix: blackhole routing table with No-OIF / InterfaceNone-only is clobbering all other routes in the same routing table because if netlink.RT_FILTER_OIF is specified with a netlink.Route{LinkIndex: 0}, it will return all routes using the remaining applicable filter (netlink.RT_FILTER_TABLE / Table 254) link routes. (felix #2995)
    • Fix slow performance when updating a Kubernetes namespace when there are many Pods (and in turn, slow startup performance when there are many namespaces). (felix #2967)
    • Fixes a benign error caused by attempting to delete the same IPAMBlock twice. (kube-controllers #822)

    etcd 3.5.0

    etcd server

    • Fix corruption bug in defrag.
    • Fix quorum protection logic when promoting a learner.
    • Improve peer corruption checker to work when peer mTLS is enabled.
    • Log [CLIENT-PORT]/health check in server side.
    • Log successful etcd server-side health check in debug level.
    • Improve compaction performance when latest index is greater than 1-million.
    • Add log when etcdserver failed to apply command.
    • Improve count-only range performance.
    • Remove redundant storage restore operation to shorten the startup time.
    • Fix deadlock bug in mvcc.
    • Fix inconsistency between WAL and server snapshot.
    • Improve logging around snapshot send and receive.
    • Push down RangeOptions.limit argv into index tree to reduce memory overhead.
    • Improve runtime.FDUsage call pattern to reduce objects malloc of Memory Usage and CPU Usage.
    • Improve mvcc.watchResponse channel Memory Usage.
    • Log expensive request info in UnaryInterceptor.
    • Improve healthcheck by using v3 range request and its corresponding timeout.
    • Fix server panic in slow writes warnings.
    • Reduce around 30% memory allocation by logging range response size without marshal.
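
The limit push-down item in the list above can be pictured with a small sketch. It is not etcd's implementation: the function names are hypothetical, the index is modeled as a sorted Python list rather than a B-tree, and `limit` is assumed to be a positive number.

```python
import bisect
from typing import List


def range_then_truncate(keys: List[str], start: str, end: str, limit: int) -> List[str]:
    """Materialize every key in [start, end) first, then cut to `limit`."""
    matched = [k for k in keys if start <= k < end]
    return matched[:limit]


def range_with_limit_pushdown(keys: List[str], start: str, end: str, limit: int) -> List[str]:
    """Stop walking the sorted index as soon as `limit` keys are collected."""
    out: List[str] = []
    i = bisect.bisect_left(keys, start)  # jump to the first key >= start
    while i < len(keys) and keys[i] < end and len(out) < limit:
        out.append(keys[i])
        i += 1
    return out
```

Both functions return the same keys, but the second never builds the full intermediate list, which is the kind of memory saving the changelog entry describes.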

    kvm-operator 3.18.3

    • Fix calico mount errors

    kubernetes 1.21.5

    Feature

    • Kubernetes is now built with Golang 1.16.8 (#104906) [SIG Cloud Provider, Instrumentation, Release and Testing]

    Bug or Regression

    • Fix NodeAuthenticator tests in dualstack (#104840) [SIG Auth and Testing]
    • Fix: skip case sensitivity when checking Azure NSG rules; ensure InstanceShutdownByProviderID returns false for creating Azure VMs (#104447) [SIG Cloud Provider]
    • Fixed occasional pod cgroup freeze when using cgroup v1 and systemd driver. Fixed “failed to create container … unit already exists” when using cgroup v1 and systemd driver. (#104530) [SIG CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Node, Storage and Testing]
    • Kube-proxy: delete stale conntrack UDP entries for loadbalancer ingress IP. (#104151) [SIG Network]
    • Metrics changes: Fix exposed buckets of scheduler_volume_scheduling_duration_seconds_bucket metric (#100720) [SIG Apps, Instrumentation, Scheduling and Storage]
    • Pass additional flags to subpath mount to avoid flakes in certain conditions (#104347) [SIG Storage]

    Other (Cleanup or Flake)

    • Kube-apiserver: sets an upper-bound on the lifetime of idle keep-alive connections and time to read the headers of incoming requests (#103958) [SIG API Machinery and Node]
  • This release includes changes in calico and etcd that should lead to better performance.

    Change details

    calico 3.20.1

    Bug fixes

    • Updated ubi base images and CentOS repos to stop CVE false positives from being reported.
    • calico/node logs write to /var/log/calico within the container by default, in addition to stdout.
    • Improve error message for conflicting routes in CNI plugin.
    • Bugfix: blackhole routing table with No-OIF / InterfaceNone-only is clobbering all other routes in the same routing table because if netlink.RT_FILTER_OIF is specified with a netlink.Route{LinkIndex: 0}, it will return all routes using the remaining applicable filter (netlink.RT_FILTER_TABLE / Table 254) link routes.
    • Fix slow performance when updating a Kubernetes namespace when there are many Pods (and in turn, slow startup performance when there are many namespaces).
    • Fixes a benign error caused by attempting to delete the same IPAMBlock twice.
    • Fix that calico/node would fail to set NetworkUnavailable to false for etcd clusters with mismatched node names.
    • Stop ARP traffic being dropped due to RPF check.
    • Disable VXLAN tunnel checksum offload on kernels < v5.7.
    • Improve routing loop prevention to handle when advertising Service LoadBalancer IPs.
    • Retry setting AWS EC2 source/destination check until successful.
    • Install blackhole routes in VXLAN mode.
    • Fix that podIP annotation could be incorrectly clobbered for stateful set pods.
    • Reinstates logic that falls back to the status of the pod during termination if the pod IP annotation is not set by the Calico CNI plugin.
    • Fix issue with serviceaccount names larger than 63 characters.
    • Fix error parsing pod deletion updates in kube-controllers.

    Other changes

    • calico/node marks nodes with NetworkUnavailable=true on shutdown.
    • Typha now gives newly connected clients an extra grace period to catch up after sending the snapshot. Should reduce the possibility of cyclic disconnects.
    • Added enhanced error logging for IPAM failures.
    • Add IP address garbage collection to kube-controllers.
    • Calico will now release empty IPAM blocks from nodes that no longer need them so they can be used elsewhere.
    • Mount CNI plugin directory into calico/node to enable configuration updates.

    etcd 3.5.0

    etcd server

    • Fix corruption bug in defrag.
    • Fix quorum protection logic when promoting a learner.
    • Improve peer corruption checker to work when peer mTLS is enabled.
    • Log [CLIENT-PORT]/health check in server side.
    • Log successful etcd server-side health check in debug level.
    • Improve compaction performance when latest index is greater than 1-million.
    • Add log when etcdserver failed to apply command.
    • Improve count-only range performance.
    • Remove redundant storage restore operation to shorten the startup time.
    • Fix deadlock bug in mvcc.
    • Fix inconsistency between WAL and server snapshot.
    • Improve logging around snapshot send and receive.
    • Push down RangeOptions.limit argv into index tree to reduce memory overhead.
    • Improve runtime.FDUsage call pattern to reduce objects malloc of Memory Usage and CPU Usage.
    • Improve mvcc.watchResponse channel Memory Usage.
    • Log expensive request info in UnaryInterceptor.
    • Improve healthcheck by using v3 range request and its corresponding timeout.
    • Fix server panic in slow writes warnings.
    • Reduce around 30% memory allocation by logging range response size without marshal.

    kubernetes 1.20.11

    Bug or Regression

    • Fix: skip case sensitivity when checking Azure NSG rules; ensure InstanceShutdownByProviderID returns false for creating Azure VMs (#104448)
    • Kube-proxy: delete stale conntrack UDP entries for loadbalancer ingress IP. (#104152)
    • Metrics changes: Fix exposed buckets of scheduler_volume_scheduling_duration_seconds_bucket metric (#100720) [SIG Apps, Instrumentation, Scheduling and Storage]
    • Pass additional flags to subpath mount to avoid flakes in certain conditions (#104348)
    • When using kubectl replace (or the equivalent API call) on a Service, the caller no longer needs to do a read-modify-write cycle to fetch the allocated values for .spec.clusterIP and .spec.ports[].nodePort. Instead the API server will automatically carry these forward from the original object when the new object does not specify them. (#104674)
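
The carry-forward behavior described in the last item above can be sketched as follows. This is an illustration only, not the API server's code: `carry_forward_allocations` is a hypothetical helper, Services are modeled as plain dictionaries, and matching ports by name, port, and protocol is an assumption of the sketch.

```python
import copy


def carry_forward_allocations(old: dict, new: dict) -> dict:
    """Return a copy of `new` with unset allocated fields copied from `old`."""
    merged = copy.deepcopy(new)
    old_spec = old.get("spec", {})
    spec = merged.setdefault("spec", {})

    # Keep the already-allocated cluster IP when the new object omits it.
    if not spec.get("clusterIP") and old_spec.get("clusterIP"):
        spec["clusterIP"] = old_spec["clusterIP"]

    # Assumption of this sketch: ports are matched by (name, port, protocol).
    def port_key(p: dict) -> tuple:
        return (p.get("name"), p.get("port"), p.get("protocol", "TCP"))

    old_ports = {port_key(p): p for p in old_spec.get("ports", [])}
    for port in spec.get("ports", []):
        previous = old_ports.get(port_key(port))
        # Only fill in a nodePort the caller left unset.
        if previous and not port.get("nodePort") and previous.get("nodePort"):
            port["nodePort"] = previous["nodePort"]
    return merged
```

A caller that submits a Service without `.spec.clusterIP` or `.spec.ports[].nodePort` gets the previously allocated values back in the merged object, while any value the caller sets explicitly is left untouched.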

    Other (Cleanup or Flake)

    • Kube-apiserver: sets an upper-bound on the lifetime of idle keep-alive connections and time to read the headers of incoming requests (#103958) [SIG API Machinery and Node]

    kvm-operator 3.18.2

    Changed

    • Disable apiserver flow control to mitigate etcd memory usage issues temporarily.
  • This release upgrades Kubernetes to version 1.21 and containerlinux to 2905.2.3.

    Change details

    kubernetes 1.21.4

    Feature

    • Kubernetes is now built with Golang 1.16.7 (#104201, @cpanato) [SIG Cloud Provider, Instrumentation, Release and Testing]

    Bug or Regression

    • Disable aufs module for gce clusters (#103831, @lizhuqi) [SIG Cloud Provider]
    • Fix kube-apiserver metric reporting for the deprecated watch path of /api//watch/… (#104190, @wojtek-t) [SIG API Machinery and Instrumentation]
    • Fix code that leaked defaulting between unrelated pod instances. (#103284, @kebe7jun) [SIG CLI]
    • Fix: Provide IPv6 support for internal load balancer (#103794, @nilo19) [SIG Cloud Provider]
    • Fix: cleanup outdated routes (#102935, @nilo19) [SIG Cloud Provider]
    • Fix: delete non existing disk issue (#102083, @andyzhangx) [SIG Cloud Provider]
    • Fix: ignore not a VMSS error for VMAS nodes in reconcileBackendPools (#103997, @nilo19) [SIG Cloud Provider]
    • Fix: return empty VMAS name if using standalone VM (#103470, @nilo19) [SIG Cloud Provider]
    • Fixed a bug that scheduler extenders are not called on preemptions (#103019, @ordovicia) [SIG Scheduling]
    • Fixes an issue cleaning up CertificateSigningRequest objects with an unparseable status.certificate field (#103948, @liggitt) [SIG Apps and Auth]
    • Fixes issue with websocket-based watches of Service objects not closing correctly on timeout (#102541, @liggitt) [SIG API Machinery and Testing]

    Dependencies

    Added

    Nothing has changed.

    Changed

    • sigs.k8s.io/apiserver-network-proxy/konnectivity-client: v0.0.19 → v0.0.22

    Removed

    Nothing has changed.

    containerlinux 2905.2.3

    New Stable release 2905.2.3

    Changes since Stable 2905.2.2

    Security fixes

    Bug Fixes

    Updates

    kvm-operator 3.18.1

  • This is the first release for KVM with Kubernetes 1.20 and Calico 3.19. It also migrates the Calico datastore from etcd to Kubernetes.

    Change details

    calico 3.19.2

    View the changelogs for Calico 3.16 through 3.19:

    cert-exporter 1.8.0

    Added

    • Add new cert_exporter_certificate_cr_not_after metric. This metric exports the status.notAfter field of cert-manager Certificate CR.
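
As a rough illustration of how a `status.notAfter` value might be turned into a numeric metric, the sketch below converts an RFC 3339 UTC timestamp into Unix seconds. It assumes the metric carries a Unix timestamp and that the field uses the `YYYY-MM-DDTHH:MM:SSZ` form; the function names are hypothetical and this is not the cert-exporter implementation.

```python
from datetime import datetime, timezone


def not_after_to_unix(not_after: str) -> float:
    """Convert a UTC timestamp such as '2021-10-01T00:00:00Z' to Unix seconds."""
    parsed = datetime.strptime(not_after, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def seconds_until_expiry(not_after: str, now_unix: float) -> float:
    """Remaining validity in seconds; negative once the certificate has expired."""
    return not_after_to_unix(not_after) - now_unix
```

An alert rule could then compare the exported value against the current time to flag certificates that are close to expiry.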

    Changed

    • Remove static certificate source label from cert_exporter_secret_not_after (static value secret) and cert_exporter_not_after (static value file) metrics.

    chart-operator 2.19.0

    Removed

    • Remove tillermigration resource now that the Helm 3 migration is complete.

    Added

    • Add releasemaxhistory resource which ensures we retry at a reduced rate when there are repeated failed upgrades.

    Changed

    • Increase memory limit for deploying large charts in workload clusters.
    • Upgrade Helm release when failed even if version or values have not changed to handle situations like failed webhooks where we should retry.
    • Prepare helm values for configuration management.
    • Update architect-orb to v3.0.0.
    • For CAPI clusters:
      • Add tolerations to start on NotReady nodes for installing CNI.
      • Create giantswarm-critical priority class.
      • Use host network to allow installing CNI packaged as an app.

    Fixed

    • Improve status message when helm release has failed max number of attempts.

    coredns 1.6.0

    Changed

    • Make targetCPUUtilizationPercentage in HPA configurable.
    • Update coredns to upstream version 1.8.3.
    • Increase maximum replica count to 50 when using horizontal pod autoscaling.
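
To show how the CPU target and the replica ceiling interact, here is a simplified sketch of the scaling rule given in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric). It leaves out the tolerance band and stabilization behavior of the real autoscaler, and the function name and the `min_replicas=2` default are assumptions made for the example.

```python
import math


def desired_replicas(current_replicas: int,
                     current_cpu_percent: float,
                     target_cpu_percent: float,
                     min_replicas: int = 2,
                     max_replicas: int = 50) -> int:
    """Simplified HPA rule: scale proportionally, then clamp to the bounds."""
    if target_cpu_percent <= 0:
        raise ValueError("target_cpu_percent must be positive")
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(raw, max_replicas))
```

With a 60% target, 10 replicas running at 90% CPU would scale to 15, while 40 replicas at 120% would be capped at the maximum of 50.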

    kubernetes 1.20.10

    View the major changes since Kubernetes v1.19 here.

    kube-state-metrics 1.4.0

    Changed

    • Migrate to configuration management.
    • Update architect-orb to v4.0.0.

    kvm-operator 3.18.0

    Changed

    • Upgrade k8scloudconfig to v10.8.1 which includes a change to better determine if memory eviction thresholds are crossed.
    • Update for compatibility with Calico v3.19.

    metrics-server 1.4.0

    Changed

    • Migrate to configuration management.
    • Update architect-orb to v4.0.0.

    net-exporter 1.10.3

    Changed

    • Prepare helm values for configuration management.
    • Update architect-orb to v4.0.0.
    • Allow customizing the DNS service.
    • Only check pod existence on dial errors. Check pod deletion directly by IP instead of listing pods and searching.

    node-exporter 1.8.0

    Changed

    • Migrate to configuration management.
    • Update architect-orb to v4.0.0.
  • This release reverts to Linux kernel 5.4 to mitigate issues with node deadlocks on large clusters with many pods.

  • This release fixes a rare bug encountered when deleting clusters using host volumes.

    Change details

    kvm-operator 3.17.3

    Fixed

    • Avoid panic during deletion of clusters with host volumes.
  • This release introduces a new feature that allows KVM clusters to run behind a proxy. It also adds support for host volumes.

    Change details

    cluster-operator 0.27.1

    Changed

    • Dropped ensuring cluster CRDs from controllers.

    app-operator 4.4.0

    Added

    • Add support for skip CRD flag when installing Helm releases.
    • Emit events when config maps and secrets referenced in App CRs are updated.

    kvm-operator 3.17.2

    Fixed

    • Remove reference from worker PVs on cluster deletion so they can be reused.

    Added

    • Add flags for proxy settings and propagate them to ignition.

    Changed

    • Reconcile only deployments that are managed by kvm-operator.

    containerlinux 2765.2.4

    Security fixes

    Updates

    etcd 3.4.16

    kubernetes 1.19.11

    API Change

    • We have added a new Priority & Fairness rule that exempts all probes (/readyz, /healthz, /livez) to prevent restarting of “healthy” kube-apiserver instance(s) by kubelet. (#101113, @tkashem) [SIG API Machinery]

    Feature

    • Kubernetes is now built using go1.15.11 (#101197, @cpanato) [SIG Cloud Provider, Instrumentation, Release and Testing]
    • Kubernetes is now built using go1.15.12 (#101846, @cpanato) [SIG Cloud Provider, Instrumentation, Release and Testing]

    Bug or Regression

    • Azurefile: Normalize share name to not include capital letters (#100731, @kassarl) [SIG Cloud Provider and Storage]
    • EndpointSlice IP validation now matches Endpoints IP validation. (#101084, @robscott) [SIG Apps and Network]
    • EndpointSlice controllers are less likely to create duplicate EndpointSlices. (#101764, @aojea) [SIG Apps and Network]
    • Ensure service deleted when the Azure resource group has been deleted (#100944, @feiskyer) [SIG Cloud Provider]
    • Fix panic in JSON logging format caused by missing Duration encoder (#101159, @serathius) [SIG API Machinery, Cluster Lifecycle and Instrumentation]
    • Fix: azure file inline volume namespace issue in csi migration translation (#101235, @andyzhangx) [SIG Apps, Cloud Provider, Node and Storage]
    • Fixed a bug where startupProbe stopped working after a container’s first restart (#101093, @wzshiming) [SIG Node]
    • Fixed port-forward memory leak for long-running and heavily used connections. (#99839, @saschagrunert) [SIG API Machinery and Node]
    • Kubelet: improve the performance when waiting for a synchronization of the node list with the kube-apiserver (#99336, @neolit123) [SIG Node]
    • No support for EndpointSlice in Linux userspace mode (#101502, @JornShen) [SIG Network]

    Dependencies

    Added

    Nothing has changed.

    Changed

    Nothing has changed.

    Removed

    Nothing has changed.

    cert-exporter 1.6.1

    Changed

    • Set docker.io as the default registry

    chart-operator 2.15.0

    Added

    • Proxy support in helm template.

    kube-state-metrics 1.3.1

    Changed

    • Set docker.io as the default registry

    metrics-server 1.3.0

    Added

    • Added new configuration value extraArgs.

    node-exporter 1.7.2

    Changed

    • Set docker.io as the default registry