Last modified November 28, 2025

Job management with Kueue

Kueue is a Kubernetes-native system that manages quotas and how jobs consume them. It provides advanced job queueing, resource management, and fair sharing capabilities for batch workloads, machine learning training jobs, and other compute-intensive tasks. Giant Swarm supports Kueue through a managed app that simplifies installation and configuration.

This guide explains how to set up and use Kueue for job management in Giant Swarm workload clusters.

Overview

Kueue addresses the challenges of managing compute resources in multi-tenant Kubernetes environments.

Why use Kueue

Resource quotas and fair sharing: Implement quotas and policies for fair resource distribution among different teams and tenants
Advanced job queueing: Queue jobs based on priorities with different strategies like StrictFIFO and BestEffortFIFO
Resource fungibility: Automatically use alternative resource flavors when preferred resources are unavailable
Preemption support: Allow higher-priority jobs to preempt lower-priority ones when resources are constrained
Gang scheduling support: All-or-nothing scheduling semantics for distributed workloads that require coordinated resource allocation
Multi-cluster functionality: Scale out jobs to different cluster targets to distribute your workloads, reducing costs and improving availability

Core concepts

ClusterQueue: Defines resource quotas and admission policies for a cluster
LocalQueue: Provides a namespace-scoped interface to submit jobs to a ClusterQueue
ResourceFlavor: Represents different types of resources (for example, different instance types, zones)
Workload: Kueue’s representation of a job that needs resources
Cohort: Groups ClusterQueues to enable resource borrowing between them

Prerequisites

Before setting up Kueue, ensure you have:

A Giant Swarm workload cluster
kubectl configured to access your workload cluster
Access to the Giant Swarm platform API for app installation
The JobSet extension installed on the workload cluster (helm install jobset oci://registry.k8s.io/jobset/charts/jobset --version 0.10.1). In the near future it will be deployed automatically as a Kueue dependency.
Basic understanding of Kubernetes batch workloads and resource management

Installation

Kueue is available as a managed app in the Giant Swarm catalog. You can install it using the Giant Swarm app platform.

Install Kueue app

Install the Kueue app using kubectl gs:

kubectl gs template app \
  --catalog=giantswarm \
  --cluster-name=CLUSTER_NAME \
  --organization=ORGANIZATION \
  --name=kueue \
  --target-namespace=kueue-system \
  --version=0.1.0 > kueue.yaml

kubectl apply -f kueue.yaml

Replace CLUSTER_NAME and ORGANIZATION with your actual cluster name and organization.

Verify installation

Check that Kueue components are running:

kubectl get pods -n kueue-system

Expected output:

NAME                                        READY   STATUS    RESTARTS   AGE
kueue-controller-manager-74c8f8c7c4-x7jwz   2/2     Running   0          2m

Verify that Kueue CRDs are installed:

kubectl get crd | grep kueue

Expected output:

clusterqueues.kueue.x-k8s.io                    2025-10-16T10:00:00Z
localqueues.kueue.x-k8s.io                      2025-10-16T10:00:00Z
resourceflavors.kueue.x-k8s.io                  2025-10-16T10:00:00Z
workloads.kueue.x-k8s.io                        2025-10-16T10:00:00Z
...

Configuration

Basic setup

Create a basic Kueue configuration with resource flavors, cluster queue, and local queue:

Step 1: Create resource flavors

Resource flavors represent different types of compute resources:

apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
spec:
  nodeLabels:
    node.kubernetes.io/instance-type: m5.xlarge
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: gpu-flavor
spec:
  nodeLabels:
    node.kubernetes.io/instance-type: g4dn.4xlarge
  tolerations:
  - key: nvidia.com/gpu
    value: "true"
    effect: NoSchedule

Step 2: Create queues

First, create a cluster queue that sets quotas for both flavors:

apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {}  # Allow all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 100
      - name: "memory"
        nominalQuota: 200Gi
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: gpu-flavor
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 4

Then create a local queue in the default namespace that points to it:

apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default
  name: local-queue
spec:
  clusterQueue: cluster-queue

Usage examples

Basic batch job

Create a simple batch job that uses Kueue for scheduling:

apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: local-queue
spec:
  parallelism: 3
  completions: 3
  suspend: true
  template:
    metadata:
      labels:
        kueue.x-k8s.io/queue-name: local-queue
    spec:
      containers:
      - name: sample-workload
        image: gcr.io/k8s-staging-perf-tests/sleep:latest
        args: ["30s"]
        resources:
          requests:
            cpu: 1
            memory: 200Mi
      restartPolicy: Never

Gang scheduling

Kueue supports gang scheduling through its “All-or-Nothing” semantics, ensuring that either all pods in a job are scheduled together or none are scheduled. This is particularly useful for distributed training jobs and coupled workloads.

Basic configuration

First, you need to customize the Kueue configuration to enable the waitForPodsReady setting:

apiVersion: v1
kind: ConfigMap
metadata:
  name: <CLUSTER_NAME>-kueue-userconfig
  namespace: org-<ORGANIZATION>
data:
  values: |
    waitForPodsReady:
      enable: true
      timeout: 10m

Then update the App resource to reference the ConfigMap:

apiVersion: application.giantswarm.io/v1alpha1
kind: App
metadata:
  name: <CLUSTER_NAME>-kueue
  namespace: org-<ORGANIZATION>
spec:
  ...
  userConfig:
    configMap:
      name: <CLUSTER_NAME>-kueue-userconfig
      namespace: org-<ORGANIZATION>

Next, define a new ClusterQueue and LocalQueue for the gang scheduling example:

apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gang-cluster-queue
spec:
  namespaceSelector: {}
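  resourceGroups:
  # Every flavor in a group must set a quota for all of the group's
  # coveredResources, so CPU/memory and GPU go in separate groups.
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 10
      - name: "memory"
        nominalQuota: 20Gi
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: gpu-flavor
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 2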
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default
  name: gang-queue
spec:
  clusterQueue: "gang-cluster-queue"

Now create a JobSet that defines a driver and a set of workers:

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: gang-jobset
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: gang-queue  # Kueue reads the queue name from the JobSet itself
spec:
  replicatedJobs:
  - name: driver
    replicas: 1
    template:
      metadata:
        labels:
          kueue.x-k8s.io/queue-name: gang-queue
      spec:
        parallelism: 1
        completions: 1
        template:
          spec:
            containers:
            - name: driver
              image: busybox:latest
              command: ["sh", "-c", "echo 'Driver job running'; sleep 60"]
              resources:
                requests:
                  cpu: 1
                  memory: 1Gi
            restartPolicy: Never
  - name: workers
    replicas: 3
    template:
      metadata:
        labels:
          kueue.x-k8s.io/queue-name: gang-queue
      spec:
        parallelism: 2
        completions: 2
        template:
          spec:
            containers:
            - name: worker
              image: busybox:latest
              command: ["sh", "-c", "echo 'Worker job running'; sleep 45"]
              resources:
                requests:
                  cpu: 2
                  memory: 2Gi
                  nvidia.com/gpu: 1
            restartPolicy: Never
            tolerations:
            - key: nvidia.com/gpu
              value: "true"
              effect: NoSchedule

Once you submit the JobSet, it creates two ReplicatedJobs: one driver Job with a single pod, and three worker Jobs with two pods each. Together they request 13 CPUs, 13Gi of memory, and 6 GPUs, which exceeds the 10 CPUs and 2 GPUs available in gang-cluster-queue. Kueue therefore doesn't admit the workload, and none of its pods start. If a workload is admitted but its pods aren't all ready before the waitForPodsReady timeout, Kueue evicts the whole group and requeues it. You can relax the requests and see how the Kueue controller schedules all the jobs together when capacity is available.
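
As an illustration of relaxing the requests, the sketch below reduces the workers to a single replica with two pods. The JobSet then asks for 5 CPUs, 5Gi of memory, and 2 GPUs in total (1 CPU and 1Gi for the driver, plus 2 pods with 2 CPUs, 2Gi, and 1 GPU each), which fits within the gang-cluster-queue quota. The name gang-jobset-small is hypothetical, and whether the pods actually start still depends on matching nodes being available in your cluster.

```yaml
# Sketch only: a smaller variant of the JobSet above that fits the quota.
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: gang-jobset-small   # hypothetical name
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: gang-queue
spec:
  replicatedJobs:
  - name: driver
    replicas: 1
    template:
      spec:
        parallelism: 1
        completions: 1
        template:
          spec:
            containers:
            - name: driver
              image: busybox:latest
              command: ["sh", "-c", "echo 'Driver job running'; sleep 60"]
              resources:
                requests:
                  cpu: 1
                  memory: 1Gi
            restartPolicy: Never
  - name: workers
    replicas: 1             # was 3: now 1 Job x 2 pods = 2 GPUs in total
    template:
      spec:
        parallelism: 2
        completions: 2
        template:
          spec:
            containers:
            - name: worker
              image: busybox:latest
              command: ["sh", "-c", "echo 'Worker job running'; sleep 45"]
              resources:
                requests:
                  cpu: 2
                  memory: 2Gi
                  nvidia.com/gpu: 1
            restartPolicy: Never
            tolerations:
            - key: nvidia.com/gpu
              value: "true"
              effect: NoSchedule
```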

Prometheus metrics

Kueue comes with a set of built-in Prometheus metrics to observe the state of jobs and queues. To send them to the Observability platform, enable them in the app configuration at deployment time:

apiVersion: v1
kind: ConfigMap
metadata:
  name: <CLUSTER_NAME>-kueue-userconfig
  namespace: org-<ORGANIZATION>
data:
  values: |
    ...
    enablePrometheus: true

This creates a ServiceMonitor in the Kueue namespace, which instructs the Alloy agent to collect the metrics. Some of these metrics are:

kueue_admitted_workloads_total: Total number of admitted workloads
kueue_pending_workloads: Number of pending workloads per queue
kueue_cluster_queue_resource_usage: Resource usage per cluster queue
kueue_admission_wait_time_seconds: Time workloads wait before admission

You can explore these metrics in the Observability platform UI.
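
If you want to be notified when a queue backs up, one option is an alerting rule on kueue_pending_workloads. The following is a minimal sketch that assumes the Prometheus Operator PrometheusRule CRD is available and that your observability stack loads rules from it. The rule name, alert name, threshold, and duration are all hypothetical, and the cluster_queue label is assumed to be present on the metric in your Kueue version.

```yaml
# Sketch only: alert when a ClusterQueue has many pending workloads.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kueue-pending-workloads    # hypothetical name
  namespace: kueue-system
spec:
  groups:
  - name: kueue.rules
    rules:
    - alert: KueuePendingWorkloadsHigh   # hypothetical alert name
      # Sum pending workloads per ClusterQueue; 10 is an arbitrary threshold
      expr: sum by (cluster_queue) (kueue_pending_workloads) > 10
      for: 15m
      labels:
        severity: warning
      annotations:
        description: "ClusterQueue {{ $labels.cluster_queue }} has had more than 10 pending workloads for 15 minutes."
```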

Advanced features

Cohorts for resource sharing

In multi-team environments, teams with different workload patterns compete for the same resources. Cohorts group ClusterQueues so that one queue can borrow quota that another queue isn't using. This dynamic redistribution improves overall cluster utilization and reduces job wait times.

The following example enables resource borrowing between two cluster queues:

apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-queue
spec:
  namespaceSelector: {}  # Allow all namespaces
  cohort: shared-cohort
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 50
        borrowingLimit: 100
      - name: "memory"
        nominalQuota: 100Gi
        borrowingLimit: 200Gi
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-b-queue
spec:
  namespaceSelector: {}  # Allow all namespaces
  cohort: shared-cohort
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 50
        borrowingLimit: 100
      - name: "memory"
        nominalQuota: 100Gi
        borrowingLimit: 200Gi
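
Borrowing is bounded by what the rest of the cohort leaves unused, not only by borrowingLimit. In the example above, each queue can borrow at most the 50 CPUs that the other queue isn't using. To keep part of a queue's quota reserved even while it's idle, Kueue also offers a lendingLimit field. The sketch below adds a third queue to the cohort. The name team-c-queue and all values are hypothetical, and lendingLimit support is assumed to be enabled in the Kueue version shipped with the app.

```yaml
# Sketch only: a queue that lends just part of its unused quota.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-c-queue          # hypothetical name
spec:
  namespaceSelector: {}       # Allow all namespaces
  cohort: shared-cohort
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 50
        borrowingLimit: 25    # can use up to 75 CPUs when others are idle
        lendingLimit: 20      # lends at most 20 unused CPUs; 30 stay reserved
      - name: "memory"
        nominalQuota: 100Gi
        borrowingLimit: 50Gi
        lendingLimit: 40Gi
```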

Preemption policies

Preemption policies are essential when critical workloads must get resources immediately, even when the cluster is fully utilized by lower-priority jobs. Preemption also helps maintain SLA compliance by keeping business-critical workloads from being blocked by less important tasks.

First, let’s configure the workload priority classes:

apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: high-priority
value: 1000
description: "Priority class for critical jobs"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: low-priority
value: 100
description: "Priority class for non critical jobs"

Configure preemption to allow higher-priority jobs to preempt lower-priority ones:

apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: preemption-queue
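spec:
  namespaceSelector: {}  # Allow all namespaces
  preemption:
    withinClusterQueue: LowerPriority  # Preempt lower-priority workloads in this queue
    reclaimWithinCohort: Any
    borrowWithinCohort:
      policy: LowerPriority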
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 100
      - name: "memory"
        nominalQuota: 200Gi

Now add the priority class label to each job:

  labels:
    ...
    kueue.x-k8s.io/priority-class: high-priority  # or low-priority

Submit the low-priority job first and, once it's scheduled, deploy the high-priority one. You can then observe how the first job is evicted to make room for the second.
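
To try this out, you also need a LocalQueue that points to preemption-queue. The sketch below shows one possible low-priority job. The names preemption-local-queue and low-priority-job, the image, and the sizing are hypothetical. The job requests 60 CPUs in total, so a second copy labeled high-priority would not fit in the 100 CPU quota alongside it.

```yaml
# Sketch only: a LocalQueue for preemption-queue and a low-priority job.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default
  name: preemption-local-queue   # hypothetical name
spec:
  clusterQueue: preemption-queue
---
apiVersion: batch/v1
kind: Job
metadata:
  name: low-priority-job         # hypothetical name
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: preemption-local-queue
    kueue.x-k8s.io/priority-class: low-priority
spec:
  parallelism: 60                # 60 pods x 1 CPU = 60 of the 100 CPU quota
  completions: 60
  suspend: true                  # Kueue unsuspends the job on admission
  template:
    spec:
      containers:
      - name: worker
        image: busybox:latest
        command: ["sh", "-c", "sleep 300"]
        resources:
          requests:
            cpu: 1
            memory: 1Gi
      restartPolicy: Never
```

For the high-priority job, change the job name and set the priority class label to high-priority.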

You can configure more complex scenarios using fair sharing.

Best practices

Resource Planning: Design resource flavors that match your actual node types and availability zones
Queue Organization: Create separate queues for different workload types (batch, ML training, gang-scheduled jobs)
Quota Management: Set realistic quotas based on actual cluster capacity and usage patterns
Gang Scheduling: Use all-or-nothing semantics for distributed workloads that require coordinated scheduling
Monitoring: Implement monitoring for queue depths, admission rates, and resource utilization
Preemption Strategy: Use preemption to balance resource efficiency with job stability
Testing: Test queue configurations in development environments before applying to production
Timeout Configuration: Set appropriate timeouts for gang-scheduled jobs to avoid resource deadlocks
