Fleet Management

  • Changed

    • Update to upstream Kyverno Policies version 1.10.3.
  • Changed

    • To facilitate the migration from aws-cni to Cilium, we need to keep the standard network policies in place so that VPA can communicate with the Kubernetes API while the clusters are being upgraded.
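For illustration, a standard (networking.k8s.io) policy of the kind kept in place during the migration might look like the sketch below. The name, namespace, pod labels, and port are assumptions for the example, not the actual shipped policy.

```yaml
# Illustrative sketch only: allow VPA pods egress to the Kubernetes API
# server while the cluster's CNI is being switched over to Cilium.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vpa-egress-to-apiserver   # assumed name
  namespace: kube-system          # assumed namespace
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: vertical-pod-autoscaler  # assumed label
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: TCP
          port: 443   # API server port; may differ per cluster
```

Because standard NetworkPolicy objects are enforced by Cilium as well as by aws-cni, a policy like this keeps working on both sides of the migration.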
  • Changed

    • Updated to upstream chart version 9.3.0, which allows setting topologySpreadConstraints for all 3 VPA components (updater, recommender, and admission controller).
    • Removed the /metrics rule in the updater Cilium network policy because it can cause delays in application availability when switching from a different CNI to Cilium.
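As a sketch of what the new chart option enables, Helm values like the following could spread each component across zones. The value keys and labels are assumptions about the upstream chart's interface, not verified against it.

```yaml
# Illustrative values sketch (key names are assumptions): spread the
# recommender's replicas across availability zones.
recommender:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/component: recommender  # assumed label
# The updater and admission controller would take the same shape under
# their own top-level keys.
```

The constraint is standard Kubernetes scheduling configuration; the chart merely passes it through to each component's pod spec.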
  • Changed

    • Change how login works on CAPA and GCP to use our DNS record for the Kubernetes API when using these providers, rather than the value found in the CAPI CRs.
  • Fixed

    • Fix applying RoleBindingTemplate to multiple namespaces.
  • Changed

    WARNING: this version requires Cilium to run because of the dependency on the CiliumNetworkPolicy CRD

    • Upgrade dependency chart to 9.2.0.
    • Adjusted the resource requests and limits to accommodate larger clusters by default.
    • Adjusted the admission controller to give it more QPS against the API.
    • Adjusted the updater to give it more QPS against the API.
    • Adjusted the recommender to:
      • give it more QPS against the API
      • double its memory in case of an OOMKilled event
      • use the 95th percentile for the CPU usage calculation, which should allow scaling up more precisely to account for spikes in the workload's CPU consumption
      • calculate recommendations only for workloads which have a VPA custom resource, instead of all workloads
    • Removed standard network policies to decrease the maintenance burden.
    • Fixed the Cilium network policy to allow CRD jobs execution.
    • Added a weight to the Cilium network policy for early execution.
    • Disabled VPA for the updater pod; otherwise it keeps getting re-scheduled because its memory consumption varies a lot between reconciling resources and idling.
    • Disabled VPA for the recommender pod, for the same reason.
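The WARNING above refers to the CiliumNetworkPolicy CRD that this version depends on. As a minimal illustration of that resource type, a policy allowing a VPA component egress to the API server could look like this; the name and selector labels are assumptions, not the shipped policy.

```yaml
# Illustrative sketch only: a CiliumNetworkPolicy granting egress to the
# API server via Cilium's built-in kube-apiserver entity.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: vpa-allow-apiserver   # assumed name
  namespace: kube-system      # assumed namespace
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: vertical-pod-autoscaler  # assumed label
  egress:
    - toEntities:
        - kube-apiserver
```

Unlike a standard NetworkPolicy, this CRD only exists on clusters running Cilium, which is why the version carries the warning.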
  • Added

    • Add CR example.
    • Add namespace check to the RoleBindingTemplate controller.
  • Changed

    • Move Flux auth out of the org/cluster namespace controllers and reconcile it via RoleBindingTemplates instead.
  • Removed

    • Remove the write-clusters and write-nodepools cluster roles as they are unused.