This release upgrades several components and moves Kubernetes to version 1.24. The upgrade to v19.0.0
involves two major changes for customers: the migration from the AWS VPC CNI to Cilium, and the replacement of Kiam with IAM Roles for Service Accounts (IRSA) for authenticating pods against the AWS API.
The next sections describe the important changes introduced with the new release, their key benefits, what customers can do to prepare, and how to avoid downtime during this crucial upgrade.
Say goodbye to slow network initialization times and hello to lightning-fast performance with Cilium, our new Kubernetes CNI solution!
Switching to Cilium forces us to change the CIDR used to assign IPs to Pods (192.168.0.0/16 by default).
The process is automated for the vast majority of clusters, but if you have set up custom networking settings in your cluster, the upgrade might be blocked by admission controllers. If that is the case, reach out to your SA and you’ll receive guidance on how to proceed with the upgrade. The same applies if you don’t want to stick with the default value and prefer to change it.
The migration from AWS CNI to Cilium allowed us to improve the IP space delegation per WC: starting with the AWS v19 release, customers can use the full range of the pod CIDR instead of losing 25% of it.
Before v19, the default pod CIDR at Giant Swarm was 10.2.0.0/16, and it could be customized in the AWSCluster CR.
This CIDR is added to the VPC CIDRs, so it cannot overlap with other CIDRs in the cluster or with any CIDR in peered VPCs.
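The overlap constraint can be checked ahead of time with Python's standard `ipaddress` module. The sketch below is only an illustration: apart from 10.2.0.0/16, every CIDR and name in it is a hypothetical placeholder, not a value taken from a real installation.

```python
import ipaddress

# 10.2.0.0/16 is the pre-v19 default pod CIDR; all other ranges are hypothetical.
pod_cidr = ipaddress.ip_network("10.2.0.0/16")
other_cidrs = {
    "cluster VPC": ipaddress.ip_network("10.0.0.0/16"),
    "peered VPC A": ipaddress.ip_network("10.2.128.0/17"),
    "peered VPC B": ipaddress.ip_network("172.16.0.0/12"),
}

# Collect the names of all ranges that share at least one address with the pod CIDR.
conflicts = [name for name, net in other_cidrs.items() if pod_cidr.overlaps(net)]
print("Overlapping ranges:", conflicts or "none")
```

Any name reported here would have to be resolved, for example by choosing a different pod CIDR, before the range could be used.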
The pod CIDR space needs to be split into separate, contiguous CIDR blocks, one for each AZ. Since Giant Swarm supports 3 AZs and a CIDR can only be split into a power-of-two number of equal blocks, we need to divide the range into 4 subranges. In the default scenario, we end up with 4 /18 blocks (16k addresses each). One of the /18 blocks is unused, meaning 16k (or 1/4) of the IP addresses are “lost”.
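The arithmetic of this split can be reproduced with Python's standard `ipaddress` module. This is a minimal sketch; which of the four blocks stays unused is an assumption made for illustration (here, the last one).

```python
import ipaddress

pod_cidr = ipaddress.ip_network("10.2.0.0/16")

# Splitting a /16 into /18 blocks yields 4 equally sized subranges.
blocks = list(pod_cidr.subnets(new_prefix=18))

# Assumption: the first three blocks go to the AZs and the last one is unused.
az_blocks, unused_block = blocks[:3], blocks[3]

for index, block in enumerate(az_blocks, start=1):
    print(f"AZ {index}: {block} ({block.num_addresses} addresses)")

lost_fraction = unused_block.num_addresses / pod_cidr.num_addresses
print(f"unused: {unused_block} ({lost_fraction:.0%} of the pod CIDR)")
```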
In v19 and later releases the default pod CIDR is 192.168.0.0/16. As before, the default can be changed by setting a field in the AWSCluster CR.
This CIDR is used across the whole cluster, regardless of AZs, and all addresses in the range can be used. In the default setting, each node is assigned a /25 subnet for the pods running on it. This means we can have as many as 65k pods in a cluster (~595 nodes in theory), and that can be increased by providing a larger pod CIDR space to begin with.
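The capacity figures can be worked out as follows. This is a rough sketch: the 110 pods per node is the Kubernetes-suggested maximum, and the reading that the ~595 figure comes from dividing the address count by that maximum is an interpretation, not something the release notes state. If every node holds exactly one /25 block, the number of blocks gives a second, lower bound.

```python
import ipaddress

pod_cidr = ipaddress.ip_network("192.168.0.0/16")
node_prefix = 25          # per-node subnet size from the default setting
max_pods_per_node = 110   # Kubernetes-suggested maximum

total_addresses = pod_cidr.num_addresses                 # size of the whole pod CIDR
addresses_per_node = 2 ** (32 - node_prefix)             # addresses in one /25
node_blocks = 2 ** (node_prefix - pod_cidr.prefixlen)    # number of /25 blocks in the /16
nodes_at_max_pods = total_addresses // max_pods_per_node

print(f"total pod addresses: {total_addresses}")
print(f"addresses per /25 node block: {addresses_per_node}")
print(f"/25 blocks available: {node_blocks}")
print(f"nodes if each runs {max_pods_per_node} pods: {nodes_at_max_pods}")
```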
With AWS-CNI, IP addresses assigned to pods are actually IP addresses assigned to the node itself.
Depending on the instance type, there is a limit on the number of IP addresses assignable to each instance. In practice, this means clusters using AWS-CNI can run fewer pods per node. With Cilium, we can use the maximum number of pods per node suggested by Kubernetes, which is 110.
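To give a feel for the per-instance limit, the sketch below uses the formula commonly documented for the AWS VPC CNI in its default mode (secondary IPs, no prefix delegation): ENIs × (IPv4 addresses per ENI − 1) + 2. The helper name is hypothetical, the ENI figures are the published EC2 limits for these instance types, and whether a given cluster used exactly this mode is an assumption.

```python
def aws_cni_max_pods(enis: int, ipv4_per_eni: int) -> int:
    """Hypothetical helper: pod limit under the AWS VPC CNI default mode.

    One address per ENI is reserved for the ENI itself; the +2 accounts for
    host-network pods that do not consume a secondary IP.
    """
    return enis * (ipv4_per_eni - 1) + 2


# (ENIs, IPv4 addresses per ENI) for a few common instance types.
instance_limits = {
    "t3.medium": (3, 6),
    "m5.large": (3, 10),
    "m5.xlarge": (4, 15),
}

CILIUM_MAX_PODS = 110  # Kubernetes-suggested maximum used with Cilium

max_pods = {name: aws_cni_max_pods(*limits) for name, limits in instance_limits.items()}
for name, pods in max_pods.items():
    print(f"{name}: {pods} pods with AWS-CNI, {CILIUM_MAX_PODS} with Cilium")
```

For these smaller instance types the AWS-CNI limit sits well below 110; larger instance types have higher limits.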
To ensure that your applications can assume the appropriate IAM roles, you need to add the Cloudfront Domain Alias
to those roles as a trusted entity.
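As a rough illustration of what such a trust relationship can look like, the sketch below builds an IAM trust policy statement in the usual IRSA shape (a federated OIDC principal plus `sts:AssumeRoleWithWebIdentity`). The account ID, domain alias, namespace and service account name are all hypothetical placeholders, and the exact statement required for a given installation is an assumption; check it with your SA.

```python
import json


def irsa_trust_statement(account_id: str, oidc_domain: str,
                         namespace: str, service_account: str) -> dict:
    """Hypothetical helper: trust statement for one service account."""
    return {
        "Effect": "Allow",
        "Principal": {
            "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_domain}"
        },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
            "StringEquals": {
                f"{oidc_domain}:sub": f"system:serviceaccount:{namespace}:{service_account}"
            }
        },
    }


# All values below are placeholders, not real identifiers.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        irsa_trust_statement(
            account_id="123456789012",
            oidc_domain="irsa.example-cluster.example.com",
            namespace="default",
            service_account="my-app",
        )
    ],
}

print(json.dumps(policy, indent=2))
```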
We’re aiming to provide a comprehensive blackbox monitoring tool that can validate internal, DNS and external connectivity.
After its deprecation in v1.20, the dockershim component has been removed from the kubelet.
From v1.24 onwards, you will need to either use one of the other supported runtimes (such as containerd or CRI-O)
or use cri-dockerd if you are relying on Docker Engine as your container runtime.
For more information about ensuring your cluster is ready for this removal, please
see this guide.
Kubernetes is now built with Go 1.19.8 (#117132, @xmudrii) [SIG Release and Testing]
Kubelet TCP and HTTP probes now use networking resources (conntrack entries, sockets, …) more efficiently.
This is achieved by reducing the TIME-WAIT state of the connection to 1 second, instead of the default 60 seconds. This allows the kubelet to free the socket, the conntrack entry and the associated ephemeral port. (#115143, @aojea) [SIG Network and Node]
Kubeadm: use the image registry registry.k8s.io instead of k8s.gcr.io for new clusters. During upgrade, migrate users to registry.k8s.io if they were using the default of k8s.gcr.io. (#113395, @neolit123) [SIG Cloud Provider and Cluster Lifecycle]
Kubernetes is now built with Go 1.19.5 (#115012, @cpanato) [SIG Release and Testing]
A new Priority and Fairness metric ‘apiserver_flowcontrol_work_estimate_seats_samples’ has been added that tracks the estimated seats associated with a request. (#106628, @tkashem)
Add a deprecated cmd flag for the time interval between flushing pods from unschedulable queue to active queue or backoff queue. (#108017, @denkensk)
Add one metric (kubelet_volume_stats_health_abnormal) of volume health state to kubelet (#105585, @fengzixu)
Add the metric container_oom_events_total to kubelet’s cAdvisor metric endpoint. (#108004, @jonkerj)
Added SetTransform to SharedInformer to allow users to transform objects before they are stored. (#107507, @alexzielenski)
Added a proxy-url flag into kubectl config set-cluster. (#105566, @ardaguclu)
Added a metric for measuring end-to-end volume mount timing. (#107006, @gnufied)
Added a new Priority and Fairness metric apiserver_flowcontrol_request_dispatch_no_accommodation_total to track the number of times a request dispatch attempt results in a no-accommodation status due to lack of available seats. (#106629, @tkashem)
Added a path /header?key= to agnhost netexec allowing one to view what the header value is of the incoming request.
Ex:
$ curl -H "X-Forwarded-For: something" 172.17.0.2:8080/header?key=X-Forwarded-For
something
(#107796, @alexanderConstantinescu)
Added completion for kubectl config set-context. (#106739, @kebe7jun)
Added field add_ambient_capabilities to the Capabilities message in the CRI-API. (#104620, @vinayakankugoyal)
Added label selector flag to all kubectl rollout commands. (#99758, @aramperes)
Added more message for no PodSandbox container. (#107116, @yxxhero)
Added prune flag into diff command to simulate apply --prune. (#105164, @ardaguclu)
Added support for btrfs resizing (#108561, @RomanBednar)
Added support for kubectl commands (kubectl exec and kubectl port-forward) via a SOCKS5 proxy. (#105632, @xens)
Adds OpenAPIV3SchemaInterface to DiscoveryClient and its variants for fetching OpenAPI v3 schema documents. (#108992, @alexzielenski)
Allow kubectl to manage resources by filename patterns without the shell expanding it first (#102265, @danielrodriguez)
An alpha flag --subresource has been added to the get, patch, edit and replace kubectl commands to fetch and update the status and scale subresources. (#99556, @nikhita)
Apiextensions_openapi_v3_regeneration_count metric (alpha) will be emitted for OpenAPI V3. (#109128, @Jefftree)
Apply ProxyTerminatingEndpoints to all traffic policies (external, internal, cluster, local). (#108691, @andrewsykim)
CEL regex patterns in x-kubernetes-validations rules are compiled when CRDs are created/updated if the pattern is provided as a string constant in the expression. Any regex compile errors are reported as a CRD create/update validation error. (#108617, @jpbetz)
CRD x-kubernetes-validations rules now support the CEL functions: isSorted, sum, min, max, indexOf, lastIndexOf, find and findAll. (#108312, @jpbetz)
Changes the kubectl --validate flag from a bool to a string that accepts the values {true, strict, warn, false, ignore}
- true/strict - perform validation and error the request on any invalid fields in the object. It will attempt to perform server-side validation if it is enabled on the apiserver, otherwise it will fall back to client-side validation.
- warn - perform server-side validation and warn on any invalid fields (but ultimately let the request succeed by dropping any invalid fields from the object). If validation is not available on the server, perform no validation.
- false/ignore - perform no validation, silently dropping invalid fields from the object. (#108350, @kevindelgado)
Client-go metrics: change bucket distribution for rest_client_request_duration_seconds and rest_client_rate_limiter_duration_seconds from [0.001, 0.002, 0.004, 0.008, 0.016, 0.032, 0.064, 0.128, 0.256, 0.512] to [0.005, 0.025, 0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 15.0, 30.0, 60.0] (#106911, @aojea)
Client-go: add new histogram metric to record the size of the requests and responses. (#108296, @aojea)
CycleState is now optimized for “write once and read many times”. (#108724, @sanposhiho)
Enabled beta feature HonorPVReclaimPolicy by default. (#109035, @deepakkinni)
Env var for additional cli flags used in the csi-proxy binary when a Windows nodepool is created with kube-up.sh (#107806, @mauriciopoppe)
Feature of PreferNominatedNode is graduated to GA. (#106619, @chendave)
In text format, log messages that previously used quoting to prevent multi-line output (for example, text="some \"quotation\", a\nline break") will now be printed with more readable multi-line output without the escape sequences. (#107103, @pohly)
Increase default value of discovery cache TTL for kubectl to 6 hours. (#107141, @mk46)
Introduce policy to allow the HPA to consume the external.metrics.k8s.io API group. (#104244, @dgrisonnet)
Kube-apiserver: Subresources such as status and scale now support tabular output content types. (#103516, @ykakarap)
Kube-apiserver: when merging lists, Server Side Apply now prefers the order of the submitted request instead of the existing persisted object. (#107565, @jiahuif)
Kubeadm: added support for dry running kubeadm reset. The new flag kubeadm reset --dry-run is similar to the existing flag for kubeadm init/join/upgrade and allows you to see what changes would be applied. (#107512, @SataQiu)
Kubeadm: added the flag --experimental-initial-corrupt-check to etcd static Pod manifests to ensure etcd member data consistency (#109074, @neolit123)
Kubeadm: better surface errors during kubeadm upgrade when waiting for the kubelet to restart static pods on control plane nodes (#108315, @Monokaix)
Kubeadm: improve the strict parsing of user YAML/JSON configuration files. Next to printing warnings for unknown and duplicate fields (current state), also print warnings for fields with incorrect case sensitivity - e.g. controlPlaneEndpoint (valid), ControlPlaneEndpoint (invalid). Instead of only printing warnings during init and join also print warnings when downloading the ClusterConfiguration, KubeletConfiguration or KubeProxyConfiguration objects from the cluster. This can be useful if the user has patched these objects in their respective ConfigMaps with mistakes. (#107725, @neolit123)
Kubectl now supports shell completion for the / format for specifying resources.
kubectl now provides shell completion for container names following the --container/-c flag of the exec command.
kubectl’s shell completion now suggests resource types for commands that only apply to pods. (#108493, @marckhouzam)
Kubelet: add kubelet_volume_metric_collection_duration_seconds metrics for volume disk usage calculation duration (#107201, @pacoxu)
Kubelet: the following dockershim related flags are also removed along with dockershim: --experimental-dockershim-root-directory, --docker-endpoint, --image-pull-progress-deadline, --network-plugin, --cni-conf-dir, --cni-bin-dir, --cni-cache-dir, --network-plugin-mtu. (#106907, @cyclinder)
Kubernetes 1.24 bumped the version of golang it is compiled with to go1.18, which introduced significant changes to its garbage collection algorithm. As a result, we observed an increase in memory usage for kube-apiserver in large and heavily loaded clusters of up to ~25% (with the benefit of API call latencies dropping by up to 10x at the 99th percentile). If the memory increase is not acceptable for you, you can mitigate it by setting the GOGC env variable (in our tests, GOGC=63 brings memory usage back to the original value, although the exact value may depend on usage patterns on your cluster). (#108870, @dims)
Kubernetes 1.24 is built with go1.18, which will no longer validate certificates signed with a SHA-1 hash algorithm by default. See https://golang.org/doc/go1.18#sha1 for more details. If you are using certificates like this in admission or conversion (#109024, @stlaz)
Leader Migration is now GA. All new configuration files onwards should use version v1. (#109072, @jiahuif)
Mark AzureDisk CSI migration as GA (#107681, @andyzhangx)
Move volume expansion feature to GA (#108929, @gnufied)
Moving MixedProtocolLBService from alpha to beta (#109213, @bridgetkromhout)
New “field_validation_request_duration_seconds” metric, measures how long requests take, indicating the value of the fieldValidation query parameter and whether or not server-side field validation is enabled on the apiserver (#109120, @kevindelgado)
New feature gate, ServiceIPStaticSubrange, to enable the new strategy in the Service IP allocators, so the IP range is subdivided and dynamically allocated ClusterIP addresses for Services are allocated preferentially from the upper range. (#106792, @aojea)
OpenAPI definitions served by kube-apiserver now include enum types by default. (#108898, @jiahuif)
OpenStack Cinder CSI migration is now GA and switched on by default, Cinder CSI driver must be installed on clusters on OpenStack for Cinder volumes to work (has been since v1.21). (#107462, @dims)
PreFilter extension in the scheduler framework now returns not only status but also PreFilterResult (#108648, @ahg-g)
Promoted graceful shutdown based on pod priority to beta (#107986, @wzshiming)
Removed feature gate SetHostnameAsFQDN. (#108038, @mengjiao-liu)
Removed kube-scheduler insecure flags. You can use --bind-address and --secure-port instead. (#106865, @jonyhy96)
Removed the ImmutableEphemeralVolumes feature gate. (#107152, @mengjiao-liu)
Set PodMaxUnschedulableQDuration as 5 min. (#108761, @denkensk)
Support in-tree PV deletion protection finalizer. (#108400, @deepakkinni)
The .spec.loadBalancerClass field for Services is now generally available. (#107979, @XudongLiuHarold)
The NamespaceDefaultLabelName feature gate, GA since v1.22, is now removed. (#106838, @mengjiao-liu)
The kubectl logs will now warn and default to the first container in a pod. This new behavior brings it in line with kubectl exec. (#105964, @kidlj)
The v1 version of LeaderMigrationConfiguration supports only leases API for leader election. To use formerly supported mechanisms, please continue using v1beta1. (#108016, @jiahuif)
The kubelet now creates an iptables chain named KUBE-IPTABLES-HINT in the mangle table. Containerized components that need to modify iptables rules in the host network namespace can use the existence of this chain to more-reliably determine whether the system is using iptables-legacy or iptables-nft. (#109059, @danwinship)
The output of kubectl describe ingress now includes an IngressClass name if available. (#107921, @mpuckett159)
The scheduler prints info logs when the extender returned an error. (--v>5) (#107974, @sanposhiho)
The script cluster/gce/gci/configure.sh now supports downloading crictl on ARM64 nodes (#108034, @tstapler)
Turn on CSIMigrationAzureFile by default on 1.24 (#105070, @andyzhangx)
Update the k8s.io/system-validators library to v1.7.0 (#108988, @neolit123)
Updated golang.org/x/net to v0.0.0-20211209124913-491a49abca63. (#106949, @cpanato)
Updates kubectl kustomize and kubectl apply -k to Kustomize v4.5.4 (#108994, @KnVerey)
When invoked with -list-images, the e2e.test binary now also lists the images that might be needed for storage tests. (#108458, @pohly)
kubectl config delete-user now supports completion (#107142, @dimbleby)
kubectl create token can now be used to request a service account token, and permission to request service account tokens is added to the edit and admin RBAC roles (#107880, @liggitt)
kubectl version now includes information on the embedded version of Kustomize (#108817, @KnVerey)
Fix missing delete events on informer re-lists to ensure all delete events are correctly emitted and using the latest known object state, so that all event handlers and stores always reflect the actual apiserver state as best as possible (#115901, @odinuge) [SIG API Machinery]
Fix: Route controller should update routes with NodeIP changed (#116360, @lzhecheng) [SIG Cloud Provider]
Kubelet: Fix fs quota monitoring on volumes (#116795, @pacoxu) [SIG Storage]
Fix the regression that introduced 34s timeout for DELETECOLLECTION calls (#115482, @tkashem) [SIG API Machinery]
Fixed bug which caused the status of Indexed Jobs to only be updated when there are newly completed indexes. The completed indexes are now updated if the .status.completedIndexes has values outside of the [0, .spec.completions> range (#115457, @danielvegamyhre) [SIG Apps]
Golang.org/x/net updates to v0.7.0 to fix CVE-2022-41723 (#115789, @liggitt) [SIG API Machinery, Architecture, Auth, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Node, Security and Storage]
The Kubernetes API server now correctly detects and closes existing TLS connections when its client certificate file for kubelet authentication has been rotated. (#115580, @enj) [SIG API Machinery, Node and Testing]
Client-go: fixes potential data races retrying requests using a custom io.Reader body; with this fix, only requests with no body or with string / []byte / runtime.Object bodies can be retried (#113933, @liggitt) [SIG API Machinery]
Do not include preemptor pod metadata in the event message (#115024, @mimowo) [SIG Scheduling]
Failed pods associated with a job with parallelism = 1 are recreated by the job controller honoring exponential backoff delay again. However, for jobs with parallelism > 1, pods might be created without exponential backoff delay. (#115021, @nikhita) [SIG Apps]
Fix a regression that the scheduler always goes through all Filter plugins. (#114526, @Huang-Wei) [SIG Scheduling]
Fix bug in CRD Validation Rules (beta) and ValidatingAdmissionPolicy (alpha) where all admission requests could result in internal error: runtime error: index out of range [3] with length 3 evaluating rule: <rule name> under certain circumstances. (#114865, @jpbetz) [SIG API Machinery]
Fix performance issue when creating large objects using SSA with fully unspecified schemas (preserveUnknownFields). (#111915, @aojea) [SIG API Machinery, Architecture, Auth, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Storage and Testing]
Fixed StatefulSet to show the valid status even if the new replica creation fails. (#112083, @gjkim42) [SIG Apps and Testing]
Fixing issue in Winkernel Proxier - Unexpected active TCP connection drops while horizontally scaling the endpoints for a LoadBalancer Service with External Traffic Policy: Local (#114040, @princepereira) [SIG Network]
Fixing issue with Winkernel Proxier - No ingress load balancer rules with endpoints to support load balancing when all the endpoints are terminating. (#114451, @princepereira) [SIG Network]
Kube-apiserver: bugfix DeleteCollection API fails if request body is non-empty (#113968, @sxllwx) [SIG API Machinery]
Optimizing loadbalancer creation with the help of attribute Internal Traffic Policy: Local (#114466, @princepereira) [SIG Network]
Update the system-validators library to v1.8.0 (#114060, @pacoxu) [SIG Cluster Lifecycle]
[aws] Fixed a bug which reduces the number of unnecessary calls to STS in the event of assume role failures in the legacy cloud provider (#110706, @prateekgogia) [SIG Cloud Provider]
Fix endpoint reconciler not being able to delete the apiserver lease on shutdown (#114138, @aojea) [SIG API Machinery]
Fix for volume reconstruction of CSI ephemeral volumes (#113346, @dobsonj) [SIG Node, Storage and Testing]
Kube-apiserver: resolves possible hung connections using konnectivity network proxy with TCP or UDS HTTP connect configurations (#113862, @jkh52) [SIG API Machinery]
Resolves an issue that causes winkernel proxier to treat stale VIPs as valid (#113567, @daschott) [SIG Network and Windows]
Updates golang.org/x/net to fix CVE-2022-41717 (#114322, @liggitt) [SIG API Machinery, Architecture, Auth, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Node and Storage]
Updates golang.org/x/net to v0.1.1-0.20221027164007-c63010009c80 to resolve CVE-2022-27664 (#113459, @aimuz) [SIG API Machinery, Architecture, Auth, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Node, Release and Storage]
Volumes are no longer detached from healthy nodes after 6 minutes timeout. 6 minute force-detach timeout is used only for unhealthy nodes (node.status.conditions["Ready"] != true). (#110721, @jsafrane) [SIG Apps]
Consider only plugin directory and not entire kubelet root when cleaning up mounts (#112920, @mattcary) [SIG Storage]
Etcd: Update to v3.5.5 (#113099, @mk46) [SIG API Machinery, Cloud Provider, Cluster Lifecycle and Testing]
Fixed a bug where a change in the appProtocol for a Service did not trigger a load balancer update. (#113032, @MartinForReal) [SIG Cloud Provider and Network]
Kube-proxy will restart in case it detects that the Node assigned pod.Spec.PodCIDRs have changed (#113252, @code-elinka) [SIG Cloud Provider, Network, Node and Storage]
Kubelet no longer reports terminated container metrics from cAdvisor (#112963, @bobbypage) [SIG Node]
Kubelet: fix GetAllocatableCPUs method in cpumanager (#113421, @Garrybest) [SIG Node]
Pod logs using --timestamps are not broken up with timestamps anymore. (#113516, @rphillips) [SIG Node]
Allow Label section in vsphere e2e cloudprovider configuration (#112479, @gnufied) [SIG Storage and Testing]
Kube-apiserver: gzip compression switched from level 4 to level 1 to improve large list call latencies in exchange for higher network bandwidth usage (10-50% higher). This increases the headroom before very large unpaged list calls exceed request timeout limits. (#112399, @shyamjvs) [SIG API Machinery]
Kube-apiserver: resolved a regression that treated 304 Not Modified responses from aggregated API servers as internal errors (#112528, @liggitt) [SIG API Machinery]
Kubeadm: allow RSA and ECDSA format keys in preflight check (#112535, @SataQiu) [SIG Cluster Lifecycle]
Fix an ephemeral port exhaustion bug caused by improper connection management that occurred when a large number of objects were handled by kubectl while exec auth was in use. (#112337, @enj) [SIG API Machinery and Auth]
Fix problem in updating VolumeAttached in node status (#112304, @xing-yang) [SIG Apps]
Kube-apiserver: redirect responses are no longer returned from backends by default. Set --aggregator-reject-forwarding-redirect=false to continue forwarding redirect responses. (#112331, @enj) [SIG API Machinery]
UserName check for ‘ContainerAdministrator’ is now case-insensitive if runAsNonRoot is set to true on Windows. (#112211, @PushkarJ) [SIG Node, Testing and Windows]
Fix JobTrackingWithFinalizers when a pod succeeds after the job is considered failed, which led to API conflicts that blocked finishing the job. (#111664, @alculquicondor) [SIG Apps and Testing]
Fix memory leak in the job controller related to JobTrackingWithFinalizers (#111722, @alculquicondor) [SIG Apps]
Fix memory leak on kube-scheduler preemption (#111803, @amewayne) [SIG Scheduling]
Fixed potential scheduler crash when scheduling with unsatisfied nodes in PodTopologySpread. (#111511, @kerthcet) [SIG Scheduling]
Fixing issue on Windows nodes where HostProcess containers may not be created as expected. (#110966, @marosset) [SIG Node and Windows]
If the parent directory of the file specified in the --audit-log-path argument does not exist, Kubernetes now creates it. (#111225, @vpnachev) [SIG Auth]
Namespace editors and admins can now create leases.coordination.k8s.io and should use this type for leaderelection instead of configmaps. (#111515, @deads2k) [SIG API Machinery and Auth]
Reduce API server memory when many CRDs are loaded by sharing a single etcd3 client logger across all clients (#111648, @negz) [SIG API Machinery]
Run kubelet, when there is an error exit, print the error log (#110917, @yangjunmyfm192085) [SIG Node]
Fix a bug on endpointslices tests comparing the wrong metrics (#110920, @jluhrsen) [SIG Apps and Network]
Fix a bug that caused the wrong result length when using --chunk-size and --selector together (#110735, @Abirdcfly) [SIG API Machinery and Testing]
Fix bug that prevented the job controller from enforcing activeDeadlineSeconds when set (#110544, @harshanarayana) [SIG Apps]
Fix image pulling failure when IMDS is unavailable in kubelet startup (#110523, @andyzhangx) [SIG Cloud Provider]
Fix printing resources with int64 fields (#110572, @sanchezl) [SIG API Machinery]
Fix unnecessary recreation of placeholder EndpointSlice (#110732, @jluhrsen) [SIG Apps and Network]
Fixed a regression introduced in 1.24.0 where Azure load balancers were not kept up to date with the state of cluster nodes. In particular, nodes that are not in the ready state and are not newly created (i.e. not having the node.cloudprovider.kubernetes.io/uninitialized taint) now get removed from Azure load balancers. (#109931, @ricky-rav) [SIG Cloud Provider]
Kubeadm: fix error adding extra prefix unix:// to CRI endpoints that were missing URL scheme (#110634, @pacoxu) [SIG Cluster Lifecycle]
Kubeadm: fix the bug that configurable KubernetesVersion not respected during kubeadm join (#111021, @SataQiu) [SIG Cluster Lifecycle]
EndpointSlices marked for deletion are now ignored during reconciliation. (#110484, @aryan9600) [SIG Apps and Network]
Fixed a kubelet issue that could result in invalid pod status updates to be sent to the api-server where pods would be reported in a terminal phase but also report a ready condition of true in some cases. (#110479, @bobbypage) [SIG Node and Testing]
Pods will now post their readiness during termination. (#110416, @aojea) [SIG Network, Node and Testing]
The pod phase lifecycle guarantees that terminal Pods, those whose states are Failed or Succeeded, cannot regress and will have all containers stopped. Hence, terminal Pods will never be reachable and should not publish their IP addresses on the Endpoints or EndpointSlices, independently of the Service TolerateUnready option. (#110258, @robscott) [SIG Apps, Network, Node and Testing]
Fix JobTrackingWithFinalizers that:
- was declaring a job finished before counting all the created pods in the status
- was leaving pods with finalizers, blocking pod and job deletions
JobTrackingWithFinalizers is still disabled by default. (#109486, @alculquicondor) [SIG Apps and Testing]
Kubeadm: only taint control plane nodes when the legacy “master” taint is present. This avoids a bug where “kubeadm upgrade” will re-taint a control plane node with the new “control plane” taint even if the user explicitly untainted the node. (#109841, @neolit123) [SIG Cluster Lifecycle]
A node IP provided to kubelet via --node-ip will now be preferred when determining the node’s primary IP and using the external cloud provider (CCM). (#107750, @stephenfin)
A static pod that is rapidly updated was failing to start until the Kubelet was restarted. (#107900, @smarterclayton)
Add one metric (kubelet_volume_stats_health_abnormal) of volume health state to kubelet (#108758, @fengzixu)
Added a new label type to apiserver_flowcontrol_request_execution_seconds metric - it has the following values: - ‘regular’: indicates that it is a non long running request - ‘watch’: indicates that it is a watch request. (#105517, @tkashem)
Added a test to guarantee that conformance clusters require at least 2 untainted nodes. (#106313, @aojea)
Adds PV deletion protection finalizer only when PV reclaimPolicy is Delete for dynamically provisioned volumes. (#109205, @deepakkinni)
Allowed attached volumes to be mounted quicker by skipping exponential backoff when checking for reported-in-use volumes. (#106853, @gnufied)
Allowed useful inclusion of -args $prog_args in KUBE_TEST_ARGS, when doing make test-integration. (#107516, @MikeSpreitzer)
An inefficient lock in EndpointSlice controller metrics cache has been reworked. Network programming latency may be significantly reduced in certain scenarios, especially in clusters with a large number of Services. (#107091, @robscott)
Apiserver will now reject connection attempts to 0.0.0.0/:: when handling a proxy subresource request. (#107402, @anguslees)
Bug: client-go clientset was not defaulting to the user agent, and was using the default golang agent for all the requests. (#108772, @aojea)
Bump sigs.k8s.io/apiserver-network-proxy/konnectivity-client@v0.0.30 to fix a goroutine leak in kube-apiserver when using egress selector with the gRPC mode. (#108437, @andrewsykim)
CEL validation failure returns object type instead of object. (#107090, @cici37)
CRI-API: IPs returned by PodSandboxNetworkStatus are ignored by the kubelet for host-network pods. (#106715, @aojea)
Call NodeExpand on all nodes in case of RWX volumes (#108693, @gnufied)
Changed node staging path for CSI driver to use a PV agnostic path. Nodes must be drained before updating the kubelet with this change. (#107065, @saikat-royc)
Client-go: fixed paged list calls with ResourceVersionMatch set failing once paging kicked in. (#107311, @fasaxc)
Correct event registration for multiple scheduler plugins; this fixes a potential significant delay in re-queueing unschedulable pods. (#109442, @ahg-g)
Etcd: Update to v3.5.3 (#109471, @justaugustus)
Existing InTree AzureFile PVs which don’t have a secret namespace defined will now work properly after enabling CSI migration - the namespace will be obtained from ClaimRef. (#108000, @RomanBednar)
Failure to start a container cannot accidentally result in the pod being considered “Succeeded” in the presence of deletion. (#107845, @smarterclayton)
Fix a race in the timeout handler that could lead to kube-apiserver crashes (#108455, @Argh4k)
Fix container creation errors for pods with cpu requests bigger than 256 cpus (#106570, @odinuge)
Fix issue where the job controller might not remove the job tracking finalizer from pods when deleting a job, or when the pod is orphan (#108752, @alculquicondor)
Fix libct/cg/fs2: fixed GetStats for unsupported hugetlb error on Raspbian Bullseye (#106912, @Letme)
Fix the bug that the outdated services may be sent to the cloud provider (#107631, @lzhecheng)
Fix the overestimated cost of delegated API requests in kube-apiserver API priority & fairness (#109188, @wojtek-t)
Fix to allow fsGroup to be applied for CSI Inline Volumes (#108662, @dobsonj)
Fixed CSI migration of Azure Disk in-tree StorageClasses with topology requirements in Azure regions that do not have availability zones. (#109154, @jsafrane)
Fixed --retries functionality for negative values in kubectl cp (#108748, @atiratree)
Fixed azureDisk parameter lowercase translation issue. (#107429, @andyzhangx)
Fixed azureFile volumeID collision issue in CSI migration. (#107575, @andyzhangx)
Fixed a bug in attachdetach controller that didn’t properly handle kube-apiserver errors leading to stuck attachments/detachments. (#108167, @jfremy)
Fixed a bug that a pod’s .status.nominatedNodeName
is not cleared properly, and thus over-occupied system resources. (#106816, @Huang-Wei)
Fixed a bug that caused credentials in an exec plugin to override the static certificates set in a kubeconfig. (#107410, @margocrawf)
Fixed a bug that could cause panic when a /healthz
request times out. (#107034, @benluddy)
Fixed a bug that out-of-tree plugin is misplaced when using scheduler v1beta3 config (#108613, @Huang-Wei)
Fixed a bug where a partial EndpointSlice
update could cause node name information to be dropped from endpoints that were not updated. (#108198, @liggitt)
Fixed a bug where unwanted fields were being returned from a create --dry-run
: uid and, if generateName was used, name. (#107088, @joejulian)
Fixed a bug where vSphere client connections where not being closed during testing. Leaked vSphere client sessions were causing resource exhaustion during automated testing. (#107337, @derek-pryor)
Fixed a panic when using invalid output format in kubectl create secret command. (#107221, @rikatz)
Fixed a rare race condition handling requests that time out. (#107452, @liggitt)
Fixed a regression in 1.23 that incorrectly pruned data from array items of a custom resource that set x-kubernetes-preserve-unknown-fields: true. (#107688, @liggitt)
Fixed a regression in 1.23 where update requests to previously persisted Service objects that have not been modified since 1.19 can be rejected with an incorrect spec.clusterIPs: Required value error. (#107847, @thockin)
Fixed a regression that could incorrectly reject pods with OutOfCpu errors if they were rapidly scheduled after other pods were reported as complete in the API. The Kubelet now waits to report the phase of a pod as terminal in the API until all running containers are guaranteed to have stopped and no new containers can be started. Short-lived pods may take slightly longer (~1s) to report Succeeded or Failed after this change. (#108366, @smarterclayton)
Fixed bug in TopologyManager for ensuring aligned allocations on machines with more than 2 NUMA nodes (#108052, @klueska)
Fixed bug in error messaging for basic-auth and ssh secret validations. (#106179, @vivek-koppuru)
Fixed detaching CSI volumes from nodes when a CSI driver name has prefix “csi-”. (#107025, @jsafrane)
Fixed duplicate port opening in kube-proxy when --nodeport-addresses is empty. (#107413, @tnqn)
Fixed handling of objects with invalid selectors. (#107559, @liggitt)
Fixed an indexer bug that resulted in incorrect index updates if the number of index values for a given object changed during an update (#109137, @wojtek-t)
Fixed kubectl bug where bash completions don’t work if --context flag is specified with a value that contains a colon. (#107439, @brianpursley)
Fixed performance regression in JSON logging caused by syncing stdout every time error was logged. (#107035, @serathius)
Fixed a regression in CPUManager where it would release exclusive CPUs in app containers inherited from init containers when the init containers were removed. (#104837, @eggiter)
Fixed static pod add and removes restarts in certain cases. (#107695, @rphillips)
Fixed: issue when deleting a non-existent Azure disk. (#107406, @andyzhangx)
Fixed: do not return early in the node informer when there is no change of the topology label. (#108149, @nilo19)
Fixed: removed outdated ipv4 route when the corresponding node is deleted. (#106164, @nilo19)
Fixes bug in CronJob Controller V2 where it would lose track of jobs upon job template labels change. (#107997, @d-honeybadger)
If drainer has nil for Ctx or Client it will error with RunCordonOrUncordon. (#105297, @jackfrancis)
Improved handling of unmount failures when device may be in-use by another container/process. (#107789, @gnufied)
Improved logging when volume times out waiting for attach/detach. (#108628, @RomanBednar)
Improved the rounding of PodTopologySpread scores to offer better scoring when spreading a low number of pods. (#107384, @sanposhiho)
Increase Azure ACR credential provider timeout (#108209, @andyzhangx)
Kube-apiserver: Server Side Apply merge order is reverted to match v1.22 behavior until http://issue.k8s.io/104641 is resolved. (#106660, @liggitt)
Kube-apiserver: ensures the namespace of objects sent to admission webhooks matches the request namespace. Previously, objects without a namespace set would have the request namespace populated after mutating admission, and objects with a namespace that did not match the request namespace would be rejected after admission. (#94637, @liggitt)
Kube-apiserver: removed apf_fd from server logs which could contain data identifying the requesting user (#108631, @jupblb)
Kube-proxy in iptables mode now only logs the full iptables input at -v=9 rather than -v=5. (#108224, @danwinship)
Kube-proxy will no longer hold service node ports open on the node. Users are still advised not to run any listener on node ports range used by kube-proxy. (#108496, @khenidak)
Kubeadm: allow the certs check-expiration command to not require the existence of the cluster CA key (ca.key file) when checking the expiration of managed certificates in kubeconfig files. (#106854, @neolit123)
Kubeadm: during execution of the certs check-expiration command, treat the etcd CA as external if there is a missing etcd CA key file (etcd/ca.key) and perform the proper validation on certificates signed by the etcd CA. Additionally, make sure that the CA for all entries in the output table is included - for both certificates on disk and in kubeconfig files. (#106891, @neolit123)
Kubeadm: fixed a bug related to a warning printed if the KubeletConfiguration resolvConf field value does not match /run/systemd/resolve/resolv.conf (#107785, @chendave)
Kubeadm: fixed a bug when using kubeadm init --dry-run with certificate authority files (ca.key / ca.crt) present in /etc/kubernetes/pki (#108410, @Haleygo)
Kubeadm: fixed a bug where Windows nodes fail to join an IPv6 cluster due to preflight errors (#108769, @SataQiu)
Kubeadm: fixed the bug that kubeadm certs generate-csr command does not remove duplicated SANs (#107982, @SataQiu)
Kubelet now checks “NoExecute” taint/toleration before accepting pods, except for static pods. (#101218, @gjkim42)
Metrics Server image bumped to v0.5.2 (#106492, @serathius)
Modified command line errors (for example, kubectl list -> unknown command) that were printed as a log message with escaped line breaks instead of multi-line plain text, making the error hard to read. (#107044, @pohly)
Modified log messages that were logged with "v":0 in JSON output although they were debug messages with a higher verbosity. (#106978, @pohly)
No (#107769, @liurupeng) [SIG Cloud Provider and Windows]
NodeRestriction admission: nodes are now allowed to update PersistentVolumeClaim status fields resizeStatus and allocatedResources when the RecoverVolumeExpansionFailure feature is enabled. (#107686, @gnufied)
Only extend token lifetimes when --service-account-extend-token-expiration is true and the requested token audiences are empty or exactly match all values for --api-audiences. (#105954, @jyotimahapatra)
Prevent kube-scheduler from nominating a Pod that was already scheduled to a node (#109245, @alculquicondor)
Prevent unnecessary Endpoints and EndpointSlice updates caused by Pod ResourceVersion change (#108078, @tnqn)
Print <default> as the value in case kubectl describe ingress shows default-backend:80 when no default backend is present (#108506, @jlsong01)
Publishing kube-proxy metrics for Windows kernel-mode (#106581, @knabben)
Re-adds response status and headers on verbose kubectl responses (#108505, @rikatz)
Record requests rejected with 429 in the apiserver_request_total metric (#108927, @wojtek-t)
Removed validation if AppArmor profiles are loaded on the local node. This should be handled by the container runtime. (#97966, @saschagrunert)
Replace the url label of rest_client_request_duration_seconds and rest_client_rate_limiter_duration_seconds metrics with a host label to prevent cardinality explosions and keep only the useful information. This is a breaking change required for security reasons. (#106539, @dgrisonnet)
Restored NumPDBViolations info of nodes, when HTTPExtender ProcessPreemption. This info will be used in subsequent filtering steps - pickOneNodeForPreemption (#105853, @caden2016)
Reverted graceful node shutdown to match 1.21 behavior of setting pods that have not yet successfully completed to “Failed” phase if the GracefulNodeShutdown feature is enabled in kubelet. The GracefulNodeShutdown feature is beta and must be explicitly configured via kubelet config to be enabled in 1.21+. This changes 1.22 and 1.23 behavior on node shutdown to match 1.21. If you do not want pods to be marked terminated on node shutdown in 1.22 and 1.23, disable the GracefulNodeShutdown feature. (#106901, @bobbypage)
Reverts the CRI API version surfaced by dockershim to v1alpha2 (#106803, @saschagrunert)
Services with “internalTrafficPolicy: Local” now behave more like “externalTrafficPolicy: Local”. Also, “internalTrafficPolicy: Local, externalTrafficPolicy: Cluster” is now implemented correctly. (#106497, @danwinship)
Sets JobTrackingWithFinalizers, a beta feature, as disabled by default, due to unresolved bug https://github.com/kubernetes/kubernetes/issues/109485 (#109487, @alculquicondor)
Skip re-allocate logic if pod is already removed to avoid panic (#108831, @waynepeking348)
The Service field spec.internalTrafficPolicy is no longer defaulted for Services when the type is ExternalName. The field is also dropped on read when the Service type is ExternalName. (#104846, @andrewsykim)
The ServerSideFieldValidation feature has been reverted to alpha for 1.24. (#109271, @liggitt)
The TopologyAwareHints feature gate is now enabled by default. This will allow users to opt-in to Topology Aware Hints by setting the service.kubernetes.io/topology-aware-hints annotation on a Service. This will not affect any Services without that annotation set. (#108747, @robscott)
The deprecated flag --really-crash-for-testing was removed. (#101719, @SergeyKanzhelev)
The kubelet no longer forcefully closes active connections on heartbeat failures, using the HTTP2 health check mechanism to detect broken connections. Users can force the previous behavior of the kubelet by setting the environment variable DISABLE_HTTP2. (#108107, @aojea)
Fixed a bug where UDP services would trigger unnecessary LoadBalancer updates. The root cause was that a field that does not work for non-TCP protocols was taken into account. Ref: https://github.com/kubernetes-sigs/cloud-provider-azure/pull/1090 (#107981, @lzhecheng)
Topology translation of in-tree vSphere volume to vSphere CSI. (#108611, @divyenpatel)
Updating kubelet permissions check for Windows nodes to see if process is elevated instead of checking if process owner is in Administrators group (#108146, @marosset)
apiserver, if configured to reconcile the kubernetes.default service endpoints, checks if the configured Service IP range matches the apiserver public address IP family, and fails to start if not. (#106721, @aojea)
kubectl version now fails when given extra arguments. (#107967, @jlsong01)