Scale-to-Zero
With conventional cloud platforms you need to keep at least one instance running at all times to be able to respond to incoming requests. Performing a just-in-time cold boot is too time-consuming and would add a response latency of many seconds or worse. This isn't the case with Unikraft Cloud. Because it's lightweight and based on Unikraft technology, instances on Unikraft Cloud can cold boot within milliseconds while providing the same strong, hardware-level isolation afforded by virtual machines.
Millisecond cold boots allow for low-latency scale-to-zero. As long as no traffic is flowing through your instance, it consumes no resources. When the next connection arrives, Unikraft Cloud transparently cold boots your instance so it can reply. All of this takes a negligible amount of time compared to Internet round-trip times, so it goes unnoticed by your end users.
By default, Unikraft Cloud reduces the network and cloud stack cold start time to a minimum. Some apps nevertheless take a while to initialize (for example, Spring Boot or Puppeteer). If you need to deploy such an app and still want millisecond cold starts, Unikraft Cloud provides stateful scale-to-zero (see below). Please check out this guide for more information on how to set it up.
Policies
The scale-to-zero policy defines under which circumstances Unikraft Cloud puts your instance into standby.
Unikraft Cloud currently supports the following scale-to-zero policies:
| Policy | Description |
|---|---|
| off | Scale-to-zero isn't enabled. The instance keeps running until manually stopped. |
| on | Enables scale-to-zero. When there are no TCP connections or HTTP requests during the cool-down time, the instance goes into standby. |
| idle | Same as on, but also puts the instance into standby when there are established TCP connections that have been inactive during the cool-down time. The connections remain established and incoming packets wake up the instance. Scale-to-zero doesn't happen while there are active HTTP requests (that is, traffic on ports that use the http handler). |
Unikraft Cloud only considers network traffic that goes through its proxy when controlling scale-to-zero. Connections that your app actively establishes to servers on the Internet or within Unikraft Cloud bypass the proxy for maximum performance. The instance will thus go into standby regardless of whether such connections exist or carry data.
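As a sketch, and assuming the --scale-to-zero flag accepts the policy names listed above, selecting the idle policy at deploy time could look like the following (the . stands in for a directory containing your app's Kraftfile):

```bash
# Sketch: deploy an app that holds long-lived TCP connections and select the
# "idle" policy so quiet connections don't keep the instance awake.
kraft cloud deploy --scale-to-zero=idle .
```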
Application support for scale-to-zero
Your app can use scale-to-zero without any modifications. However, there are cases where making your app aware of scale-to-zero makes sense:
- Background Jobs: For example, you want to run background jobs after your app has responded to a request (for example, sending trace information to a logging server). In this case, you may want to temporarily disable scale-to-zero to ensure your instance isn't put to sleep while it's still performing work.
- Long Request Processing: The same is true if your app can have long request processing times. Consider a setup where you use the idle policy with plain TCP connections and configure a cool-down time of 10s. If it then takes your app 15s to process a request before the first response data gets sent, Unikraft Cloud will prematurely scale your instance to zero.
Configuring a longer cool-down time can sometimes be a simple solution. But this isn't possible if the maximum duration of background jobs or request processing phases is unknown. It also means that you have to compromise between cost efficiency and the reliability of your service.
Unikraft Cloud therefore allows your app to temporarily disable scale-to-zero. This way you can have both a short cool-down time and reliable operation, no matter how long things take. To control scale-to-zero from within your app, instances on Unikraft Cloud provide a special file-based interface:
| Path | Description |
|---|---|
| /uk/libukp/scale_to_zero_disable | Allows temporarily disabling scale-to-zero. |
The scale_to_zero_disable pseudo file keeps track of the count of concurrent disable requests.
If the count is 0, scale-to-zero remains active.
Any number larger than 0 means scale-to-zero is temporarily inactive.
Using a count instead of a boolean value makes the interface easy to use with many independent workers: your app's workers can disable scale-to-zero individually by incrementing and decrementing the count without having to synchronize.
Reading the file returns the current count.
The value is prefixed with an equals sign (that is, =X with X being the current count).
Writing to the file modifies the count.
You can use the following strings:
| String | Description |
|---|---|
| + | Increment the current count by one. |
| - | Decrement the current count by one. |
| =X | Set the count to X. |
| +X | Increment the current count by X. |
| -X | Decrement the current count by X. |
Any attempt to write an invalid string to the file returns an EINVAL error.
Any attempt to set a count less than 0 or larger than 2^64 returns an ERANGE error.
Writing to the file doesn't atomically disable scale-to-zero. There can be a short delay (a few milliseconds) until the Unikraft Cloud controller receives the changed value and makes scale-to-zero decisions accordingly. Ensure you configure a cool-down time large enough to accommodate this potential signal delay.
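As an illustration, here is a minimal sketch of using this interface from a shell inside the instance. It assumes a POSIX shell is available and uses run-background-job as a placeholder for your actual work; the path and the + and - strings are the ones documented above.

```bash
S2Z=/uk/libukp/scale_to_zero_disable

printf '+' > "$S2Z"   # increment the disable count; scale-to-zero is now inactive
cat "$S2Z"            # read back the current count, for example "=1"

run-background-job    # placeholder for the work that must not be interrupted

printf '-' > "$S2Z"   # decrement the count; once it drops to 0, scale-to-zero is active again
```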
Stateful scale-to-zero
Some apps have a long cold boot phase or a high first-response latency, for example because they run just-in-time compilation or need to fill caches. Using stateful scale-to-zero can dramatically reduce the response time of such a service. With stateful scale-to-zero, Unikraft Cloud takes a snapshot of the instance state before putting it into standby. When incoming network traffic triggers a wake-up, the snapshot gets loaded and the instance resumes execution where it left off, with caches already warm.
Stateful scale-to-zero can also enable scale-to-zero for apps that need to keep state for functional correctness, even if cold boot times are of no concern.
The time to load an instance snapshot is constant and on the order of a few milliseconds. This is what gets reported in the boot time metrics. The actual memory contents get loaded from the snapshot only on first access. This reduces response latency by loading only what's necessary to process the request. It also means that the first few requests might take longer until all required memory has been loaded back in.
The time to take an instance snapshot during scale-to-zero depends on the amount of memory touched by the app since the last start.
Getting started
You can enable millisecond scale-to-zero via a label in each of the subdirectories' Kraftfiles or with the --scale-to-zero flag in the relevant command-line tool subcommands.
(yaml)
The cool-down flag tells Unikraft Cloud how long the instance must be idle before scaling to zero.
The examples already include values that work for each of them, so you don't have to worry about this setting if you don't want to.
You can disable scale-to-zero either by setting the label to false, or with the --scale-to-zero=off flag.
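For instance, using the flag form mentioned above (with . again standing in for your app's directory):

```bash
# Keep the instance running until it's stopped manually.
kraft cloud deploy --scale-to-zero=off .
```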
Since Unikraft Cloud has scale-to-zero on by default, all you need to do is to start an instance normally:
(bash)
This command will create the NGINX instance with scale-to-zero enabled:
(ansi)
Note that the output of the kraft cloud deploy command initially lists the status as running.
Check the instance's status:
(bash)
You should see output like:
(ansi)
Notice that the state is now set to standby.
At first kraft cloud deploy reports the state as running, but Unikraft Cloud then immediately puts the instance to sleep (more accurately, it stops the instance but keeps its state so it can be started again when needed).
You can also check that scale-to-zero is enabled through the kraft cloud scale command:
(bash)
which outputs:
(ansi)
Note the min size (0) and max size (1) fields: these mean that the service can scale between a maximum of 1 instance and a minimum of 0 instances, that is, scale-to-zero is enabled.
Testing scale-to-zero
Now take this out for a spin.
(bash)
You should get an NGINX response with no noticeable delay.
For fun, try using the following command to see if you can catch the instance's STATE field changing from standby to running:
(bash)
If you curl often enough, you should see the STATE turn to a green running from
time to time:
(ansi)
You can use scale-to-zero in conjunction with autoscale. This ensures that, as traffic dies down and the service scales back in, your last instance gets removed as well, so you're not charged for the service. For more on autoscale, please see the autoscale guide.
Learn more
- The kraft cloud command-line tool reference, and in particular the services and scale subcommands
- Unikraft Cloud's REST API reference, and in particular the section on scale-to-zero.