Scale-to-Zero
With conventional cloud platforms you need to keep at least one instance running at all times to be able to respond to incoming requests. Performing a just-in-time cold boot is too time-consuming and would add a response latency of many seconds or worse. This isn't the case with Unikraft Cloud. Because it's lightweight and based on Unikraft technology, instances on Unikraft Cloud can cold boot within milliseconds while providing the same strong, hardware-level isolation afforded by virtual machines.
Millisecond cold boots allow for low-latency scale-to-zero. As long as no traffic is flowing through your instance, it consumes no resources. When the next connection arrives, Unikraft Cloud transparently cold boots your instance so it can reply. All of this takes a negligible amount of time compared to Internet round-trip times, so it goes unnoticed by your end users.
By default, Unikraft Cloud reduces the network and cloud stack cold start time to a minimum. Some apps nevertheless take a while to initialize (for example, Spring Boot or Puppeteer). If you need to deploy such an app and still want millisecond cold starts, Unikraft Cloud provides stateful scale-to-zero (see below). Please check out this guide for more information on how to set it up.
Policies
The scale-to-zero policy defines under which circumstances Unikraft Cloud puts your instance into standby.
Unikraft Cloud currently supports the following scale-to-zero policies:
| Policy | Description |
|---|---|
| off | Scale-to-zero isn't enabled. The instance keeps running until manually stopped. |
| on | Enables scale-to-zero. When there are no TCP connections or HTTP requests during the cool-down time, the instance goes into standby. |
| idle | Same as on, but also puts the instance into standby when there are established TCP connections that have been inactive during the cool-down time. The connections remain established and incoming packets wake up the instance. Scale-to-zero doesn't happen while there are active HTTP requests (that is, traffic on ports that use the http handler). |
Unikraft Cloud only considers network traffic that goes through its proxy when controlling scale-to-zero. Connections that your app actively establishes to servers on the Internet or within Unikraft Cloud bypass the proxy for maximum performance. The instance will thus go into standby regardless of whether such connections exist or carry data.
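As a sketch, and assuming the --scale-to-zero flag accepts the policy names listed above, selecting the idle policy at deploy time could look like the following (the . stands in for a directory containing your app's Kraftfile):

```bash
# Sketch: deploy an app that holds long-lived TCP connections and select the
# "idle" policy so quiet connections don't keep the instance awake.
kraft cloud deploy --scale-to-zero=idle .
```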
Application support for scale-to-zero
Your app can use scale-to-zero without any modifications. However, there are cases where making your app aware of scale-to-zero makes sense:
- Background Jobs: For example, you want to run background jobs after your app has responded to a request (for example, sending trace information to a logging server). In this case, you may want to temporarily disable scale-to-zero to ensure your instance isn't put to sleep while it's still performing work.
- Long Request Processing: The same is true if your app can have long request processing times. Consider a setup where you use the idle policy with plain TCP connections and configure a cool-down time of 10s. If it then takes your app 15s to process a request before the first response data gets sent, Unikraft Cloud will prematurely scale your instance to zero.
Configuring a longer cool-down time can sometimes be a simple solution. But this isn't possible if the maximum duration of background jobs or request processing phases is unknown. It also means that you have to compromise between cost efficiency and the reliability of your service.
Unikraft Cloud therefore allows your app to temporarily disable scale-to-zero. This way you can have both a short cool-down time and reliable operation, no matter how long things take. To control scale-to-zero from within your app, instances on Unikraft Cloud provide a special file-based interface:
| Path | Description |
|---|---|
| /uk/libukp/scale_to_zero_disable | Allows temporarily disabling scale-to-zero. |
The scale_to_zero_disable pseudo file keeps track of the count of concurrent disable requests.
If the count is 0, scale-to-zero remains active.
Any number larger than 0 means scale-to-zero is temporarily inactive.
Using a count instead of a boolean value makes the interface easy to use with many independent workers: your app's workers can disable scale-to-zero individually by incrementing and decrementing the count without having to synchronize.
Reading the file returns the current count.
The value is prefixed with an equals sign (that is, =X with X being the current count).
Writing to the file modifies the count.
You can use the following strings:
| String | Description |
|---|---|
| + | Increment the current count by one. |
| - | Decrement the current count by one. |
| =X | Set the count to X. |
| +X | Increment the current count by X. |
| -X | Decrement the current count by X. |
Any attempt to write an invalid string to the file returns an EINVAL error.
Any attempt to set a count less than 0 or larger than 2^64 returns an ERANGE error.
Writing to the file doesn't atomically disable scale-to-zero. There can be a short delay (a few milliseconds) until the Unikraft Cloud controller receives the changed value and makes scale-to-zero decisions accordingly. Ensure you configure a cool-down time large enough to accommodate this potential signal delay.
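As an illustration, here is a minimal sketch of using this interface from a shell inside the instance. It assumes a POSIX shell is available and uses run-background-job as a placeholder for your actual work; the path and the + and - strings are the ones documented above.

```bash
S2Z=/uk/libukp/scale_to_zero_disable

printf '+' > "$S2Z"   # increment the disable count; scale-to-zero is now inactive
cat "$S2Z"            # read back the current count, for example "=1"

run-background-job    # placeholder for the work that must not be interrupted

printf '-' > "$S2Z"   # decrement the count; once it drops to 0, scale-to-zero is active again
```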
Stateful scale-to-zero
Some apps have a long cold boot phase or a high first-response latency, for example because they run just-in-time compilation or need to fill caches. Using stateful scale-to-zero can dramatically reduce the response time of such a service. With stateful scale-to-zero, Unikraft Cloud takes a snapshot of the instance state before putting it into standby. When incoming network traffic triggers a wake-up, the snapshot gets loaded and the instance resumes execution where it left off, with caches already warm.
Stateful scale-to-zero can also enable scale-to-zero for apps that need to keep state for functional correctness, even if cold boot times are of no concern.
The time to load an instance snapshot is constant and on the order of a few milliseconds. This is what gets reported in the boot time metrics. The actual memory contents get loaded from the snapshot only on first access. This reduces response latency by loading only what's necessary to process the request. It also means that the first few requests might take longer until all required memory has been loaded back in.
The time to take an instance snapshot during scale-to-zero depends on the amount of memory touched by the app since the last start.
Getting started
You can enable millisecond scale-to-zero via a label in each of the subdirectories' Kraftfiles or with the --scale-to-zero flag in the relevant command-line tool subcommands.
(yaml)
The cool-down flag tells Unikraft Cloud how long the instance must be idle before scaling to zero.
The examples already include values that work for each of them, so you don't have to worry about this setting if you don't want to.
You can disable scale-to-zero either by setting the label to false, or with the --scale-to-zero=off flag.
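For instance, using the flag form mentioned above (with . again standing in for your app's directory):

```bash
# Keep the instance running until it's stopped manually.
kraft cloud deploy --scale-to-zero=off .
```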
Since Unikraft Cloud has scale-to-zero on by default, all you need to do is to start an instance normally:
(bash)
This command will create the NGINX instance with scale-to-zero enabled:
(ansi)
Note that the output of the kraft cloud deploy command initially lists the status as running.
Check the instance's status:
(bash)
You should see output like:
(ansi)
Notice that the state is now set to standby.
At first kraft cloud deploy reports the state as running, but Unikraft Cloud then immediately puts the instance to sleep (more accurately, it stops the instance but keeps its state so it can be started again when needed).
You can also check that scale-to-zero is enabled through the kraft cloud scale command:
(bash)
which outputs:
(ansi)
Note the min size (0) and max size (1) fields: these mean that the service can scale between a maximum of 1 instance and a minimum of 0 instances, that is, scale-to-zero is enabled.
Testing scale-to-zero
Now take this out for a spin.
(bash)
You should get an NGINX response with no noticeable delay.
For fun, try using the following command to see if you can catch the instance's STATE field changing from standby to running:
(bash)
If you curl often enough, you should see the STATE turn to a green running from
time to time:
(ansi)
You can use scale-to-zero in conjunction with autoscale. This ensures that, as traffic dies down and the service scales back in, your last instance gets removed as well, so you're not charged for the service. For more on autoscale, please see the autoscale guide.
Learn more
- The kraft cloud command-line tool reference, and in particular the services and scale subcommands
- Unikraft Cloud's REST API reference, and in particular the section on scale-to-zero.