Last modified December 16, 2024

Setting up a caching container registry within a cluster using Zot

A registry cache within the cluster can provide several benefits.

  • Availability, in case the upstream registry is experiencing an outage
  • Avoidance of pull limits enforced by certain upstream registries
  • Faster boot times, since images are pulled from within the cluster

Here we explain how to set up a registry using the Zot app provided by Giant Swarm. Zot is an OCI-native container image registry. The Giant Swarm packaged version extends it with opinionated components such as autoscaling, monitoring, and Cilium network policies.

We explain how to deploy apps in our app platform docs.
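For illustration, deploying the app through an App custom resource could look roughly like the following sketch. The organization namespace, version, and config map name are assumptions; depending on your setup, additional fields such as spec.kubeConfig may be required, and the current app version should be taken from the catalog.

apiVersion: application.giantswarm.io/v1alpha1
kind: App
metadata:
  name: zot
  namespace: org-example          # hypothetical organization namespace
spec:
  catalog: giantswarm
  name: zot
  namespace: zot                  # target namespace in the cluster
  version: 1.0.0                  # placeholder; use the current version from the catalog
  userConfig:
    configMap:
      name: zot-user-values       # hypothetical config map holding the Helm values shown below
      namespace: org-example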

Zot configuration

The Zot binary itself is configured as JSON, via the .configFiles.config.json Helm value. The .mountConfig value must be set to true to enable mounting the configuration file.

mountConfig: true
configFiles:
  config.json: |-
    {
      ...
    }
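To make the skeleton above concrete, a minimal configuration could look like the following sketch. The distSpecVersion and rootDirectory values are assumptions; align them with your Zot version and the chart's persistence settings.

mountConfig: true
configFiles:
  config.json: |-
    {
      "distSpecVersion": "1.1.0",
      "storage": {
        "rootDirectory": "/var/lib/registry"
      },
      "http": {
        "address": "0.0.0.0",
        "port": "5000"
      }
    }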

The Zot project provides a full configuration reference with more details.

Caching strategies

You can configure either active or on-demand replication via the Zot configuration file.

  • Active (onDemand: false) means that Zot will actively sync images from the upstream registry, so they are available when needed.
  • On-demand (onDemand: true) means that Zot will only pull the image when it’s requested, and then cache it.

The example below configures docker.io as an on-demand mirror, while the my-registry.example.org registry is set to actively sync all images of my-organization/my-image with tags matching 6.*.

{
  "extensions": {
    "sync": {
      "enable": true,
      "registries": [
        {
          "urls": [
            "https://docker.io/library"
          ],
          "onDemand": true,
          "tlsVerify": true,
          "maxRetries": 3,
          "retryDelay": "5m"
        },
        {
          "urls": [
            "https://my-registry.example.org"
          ],
          "onDemand": false,
          "pollInterval": "5m",
          "tlsVerify": true,
          "maxRetries": 3,
          "retryDelay": "5m",
          "onlySigned": false,
          "content": [
            {
              "prefix": "my-organization/my-image",
              "tags": {
                "regex": "^6.*",
                "semver": true
              }
            }
          ]
        }
      ]
    }
  }
}

Note: For Zot to be used on the cluster, the container runtime has to be pointed at it, for example by configuring containerd on the nodes to use Zot as a mirror for the given upstream registries. The containerd configuration in Giant Swarm clusters is currently subject to change. Please reach out to Giant Swarm support for the latest information.
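As an illustration only, a containerd mirror configuration in the hosts.toml format could look like the following. The in-cluster service address is an assumption; the actual endpoint depends on how Zot is exposed in your cluster.

# /etc/containerd/certs.d/docker.io/hosts.toml
server = "https://registry-1.docker.io"

# Assumed in-cluster service endpoint for Zot; adjust to your deployment
[host."http://zot.zot.svc.cluster.local:5000"]
  capabilities = ["pull", "resolve"]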

For the full mirroring documentation, check the upstream documentation.

Restricting access to container images

Unless configured otherwise, access to the registry is public for anonymous users. This means that all workloads within the cluster can pull images from it.

To restrict access, you have to add configuration. Since Zot supports a variety of authentication mechanisms, we only provide an example here, using Basic authentication. For all the possible authentication methods, please refer to the Zot authentication docs.

To configure Basic authentication, set the .secretFiles Helm value and make sure .mountSecret is set to true, as shown in the example below. The .secretFiles content represents an htpasswd file with one user and password per line.

To generate the user entries, use the htpasswd tool as shown here, where we create two different users named admin and reader:

htpasswd -bBn admin password
htpasswd -bBn reader password

Store the output in the Helm values as shown in the example:

mountSecret: true
secretFiles:
  htpasswd: |-
    admin:$2y$05$.fR2nhyzstpecApibWVQDOg12aeXG8I4Zq8fW/ez8VJJ9zSc8zBQi
    reader:$2y$05$20dysb7065watYOcYZLo/unDEWgB0Lr6SAB7/hcyZVhtvZNkbN8rW    

Finally, enable it via the "http" key in the configuration file:

{
  ...
  "http": {
    "auth": {
      "htpasswd": {
        "path": "/secret/htpasswd"
      }
    },
    "accessControl": {
      "repositories": {
        "**": {
          "policies": [
            {
              "users": [
                "admin"
              ],
              "actions": [
                "read",
                "create",
                "update",
                "delete"
              ]
            },
            {
              "users": [
                "reader"
              ],
              "actions": [
                "read",
              ]
            }
          ]
        }
      }
    },
    "address": "0.0.0.0",
    "port": "5000"
  }
  ...
}

Note how the "policies" key is used to define the access control for the repositories. The "**" key is a wildcard for all repositories. The "actions" key defines the allowed actions for the users.
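To verify the setup, you can query the registry API through a port-forward. The service name and namespace below are assumptions; adjust them to your deployment.

# Forward the Zot service to localhost (service name and namespace assumed)
kubectl -n zot port-forward svc/zot 5000:5000 &

# An unauthenticated request should now be rejected with 401
curl -i http://localhost:5000/v2/_catalog

# With credentials, listing the repositories should succeed
curl -u reader:password http://localhost:5000/v2/_catalog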

Exposing the registry

In some use cases you may want to expose Zot beyond the cluster, for example to workload clusters, so that you manage only a single instance shared across multiple workloads.

To enable the ingress in the Giant Swarm managed chart, use these settings matching your cluster:

ingress:
  enabled: true
  hosts:
    - host: my-registry.example.org
      paths:
        - path: /
  tls:
    - secretName: my-registry-tls
      hosts:
        - my-registry.example.org

For private clusters, the ingress needs to be annotated differently from the default for Cert Manager to generate a proper certificate.

ingress:
  annotations:
    cert-manager.io/cluster-issuer: private-giantswarm
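Clients outside the cluster can then use the exposed endpoint like any other registry. For example, with the on-demand docker.io mirror configured earlier, a pull could look like the sketch below; the exact repository path depends on the sync URLs you configured.

docker login my-registry.example.org -u reader

# With the on-demand mirror, this pull triggers a sync from the
# upstream registry and caches the image in Zot
docker pull my-registry.example.org/alpine:3.20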

Authenticating with the upstream registry

In case you want to cache container images from private registries, Zot needs credentials for accessing them. To provide these credentials, add an entry to the .secretFiles key in the chart values. Here is an example snippet:

secretFiles:
  credentials: |-
    {
      "my-registry.example.org": {
        "username": "my-user",
        "password": "my-token"
      }
    }    

Then configure the sync extension to use that file as the source for authenticating against upstream registries, via the Zot configuration file.

{
  ...
  "extensions": {
    "sync": {
      "credentialsFile": "/secret/credentials"
    }
  }
  ...
}

Configuration recommendations

Depending on your use of Zot, please consider these additional recommendations for your deployment.

Memory constraints

Zot can take up a lot of memory over time. The Giant Swarm packaged app lets you deploy it with a vertical pod autoscaler (VPA). You can set the resource constraints as shown here:

giantswarm:
  verticalPodAutoscaler:
    enabled: true
    maxAllowed:
      # Set this higher than .resources.limits.cpu
      # to stretch limits in case of higher load
      cpu: 750m
      # Set this higher than .resources.limits.memory
      # to stretch limits in case of higher load
      memory: 2048Mi

resources:
  requests:
    # The minimum amount of CPU required for your scenario
    cpu: 300m
    # The minimum amount of memory required for your scenario
    memory: 1024Mi
  limits:
    # The maximum amount of CPU generally required for your scenario
    cpu: 500m
    # The maximum amount of memory generally required for your scenario
    memory: 1536Mi

In case of an out-of-memory kill, the VPA controller will dynamically stretch the limits on the deployment, based on Prometheus metrics, up to what’s set under .giantswarm.verticalPodAutoscaler.maxAllowed.

In certain scenarios it is, in our experience, better and more stable to run with a fixed amount of memory. You can enforce the constraint by setting the memory request and limit to the same value and configuring the Go garbage collector. Don’t forget to disable the VPA in this case!

giantswarm:
  verticalPodAutoscaler:
    enabled: false

resources:
  requests:
    cpu: 350m
    memory: 1024Mi
  limits:
    cpu: 500m
    memory: 1024Mi

env:
  - name: GOGC
    value: "50"
  - name: GOMEMLIMIT
    # Set this to about 80% of the memory limit (here 1024Mi, so roughly 800MiB)
    value: "800MiB"

For more details on this approach, we recommend reading the guide to the Go garbage collector.

Deployment strategies

Some extensions, like search, can create file locks on the data volume mount. With the RollingUpdate strategy, this will cause new pods to fail to start. In such scenarios it’s recommended to set the strategy to Recreate.

strategy:
  type: Recreate

This part of our documentation refers to our vintage product. The content may no longer be valid for our current product. Please check our new documentation hub for the latest state of our docs.