Instances
This document describes the Unikraft Cloud Instances API (v1) for managing Unikraft instances. An instance is a Unikraft virtual machine running a single instance of your application.
Instance States
An instance can be in one of the following states:
State | Description |
---|---|
stopped | The instance is not running and does not count against live resource quotas. Connections cannot be established. |
starting | The instance is booting up. This usually takes just a few milliseconds. |
running | Your application's main entry point has been reached. |
draining | The instance is draining connections before shutting down. No new connections can be established. |
stopping | The instance is shutting down. |
standby | The instance has scale-to-zero enabled. The instance is not running, but will be automatically started when there are incoming requests. |
These are reported as the instance state by the endpoints.
Stop Reason
To understand why an instance has been stopped or is in the process of shutting down, Unikraft Cloud provides information about the stop reason.
You can retrieve this information via the GET /v1/instances endpoint when an instance is in the draining, stopping, stopped, or standby state.
The stop_reason contains a bitmask that tells you the origin of the shutdown:
Bit | 4 [F] | 3 [U] | 2 [P] | 1 [A] | 0 [K] (LSB) |
---|---|---|---|---|---|
Desc. | This was a force stop [1] | Stop initiated by user [2] | Stop initiated by platform | App exited - exit_code available | Kernel exited - stop_code available |
[1] A forced stop does not give the instance a chance to perform a clean shutdown. Bits 0 [K] and 1 [A] can thus never be set for forced shutdowns. Consequently, there won't be an exit_code or stop_code.
[2] A stop command originating from the user travels through the platform controller. This is why bit 2 [P] is always set as well for user-initiated stops.
For example, the stop_reason will contain the following values in the given scenarios:
Value | Bitmask | Scenario |
---|---|---|
28 | 11100 /FUP-- | Forced user-initiated shutdown. |
15 | 01111 /-UPAK | Regular user-initiated shutdown. The application and kernel have exited. The exit_code and stop_code indicate if the application and kernel shut down cleanly. |
13 | 01101 /-UP-K | The user initiated a shutdown but the application was forcefully killed by the kernel during shutdown. This can be the case if the image does not support a clean application exit or the application crashed after receiving a termination signal. The exit_code won't be present in this scenario. |
7 | 00111 /--PAK | Unikraft Cloud initiated the shutdown, for example, due to scale-to-zero. The application and kernel have exited. The exit_code and stop_code indicate if the application and kernel shut down cleanly. |
3 | 00011 /---AK | The application exited. The exit_code and stop_code indicate if the application and kernel shut down cleanly. |
1 | 00001 /----K | The instance likely experienced a fatal crash and the stop_code contains more information about the cause of the crash. |
0 | 00000 /----- | The stop reason is unknown. |
There can be a short delay of a few milliseconds between the instance reaching the stopped state and the stop_reason being updated (or vice versa).
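The bit layout above can be decoded with a few lines of code. The following is a minimal sketch in Python; the names `StopReason` and `format_stop_reason` are illustrative and not part of the Unikraft Cloud API. It renders a numeric stop_reason in the same five-letter notation used in the scenario table.

```python
from enum import IntFlag


class StopReason(IntFlag):
    """Bit positions taken from the stop_reason table (names are illustrative)."""
    KERNEL = 1 << 0    # [K] kernel exited, stop_code available
    APP = 1 << 1       # [A] application exited, exit_code available
    PLATFORM = 1 << 2  # [P] stop initiated by platform
    USER = 1 << 3      # [U] stop initiated by user
    FORCE = 1 << 4     # [F] force stop


def format_stop_reason(value: int) -> str:
    """Render a stop_reason as a five-letter mask such as '-UPAK'."""
    letters = "FUPAK"  # bit 4 down to bit 0
    return "".join(
        letter if value & (1 << (4 - index)) else "-"
        for index, letter in enumerate(letters)
    )


# Reproduce the scenario table: value, binary form, letter mask.
for value in (28, 15, 13, 7, 3, 1, 0):
    print(value, format(value, "05b"), format_stop_reason(value))

# Individual bits can be tested with the flag type.
reason = StopReason(13)
print(StopReason.USER in reason, StopReason.APP in reason)  # True False
```

Because of the short delay mentioned above, a client that reads stop_reason right after observing the stopped state may want to retry once before treating a value of 0 as final.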
Exit Code
The application exit code is what the application returns upon leaving its main entry point.
The encoding of the exit_code is application specific.
See the documentation of the application for more details.
Usually, an exit_code of 0 indicates success (no failure).
Stop Code
The stop_code is defined by the kernel and has the following encoding irrespective of the application.
Bits | 31 - 24 (8 bits) | 23 - 16 (8 bits) | 15 [T] | 14 - 8 (7 bits) | 7 - 0 (8 bits) |
---|---|---|---|---|---|
Desc. | reserved [1] | errno | shutdown_bit | initlvl | reason |
[1] Reserved bits are set to 0 and should be ignored.
Reason
reason can be any of the following values:
Value | Symbol | Scenario |
---|---|---|
0 | OK | Successful shutdown |
1 | EXP | The system detected an invalid state and actively stopped execution to prevent data corruption |
2 | MATH | An arithmetic CPU error (e.g., division by zero) |
3 | INVLOP | Invalid CPU instruction or instruction error (e.g., wrong operand alignment) |
4 | PGFAULT | Page fault - see errno for further details |
5 | SEGFAULT | Segmentation fault |
6 | HWERR | Hardware error |
7 | SECERR | Security violation (e.g., violation of memory access protections) |
A reason of 0 indicates that the instance was shut down cleanly.
To detect whether the system experienced a crash, it is enough to check reason; all other bits of stop_code can be ignored.
Init Level
initlvl indicates during which initialization or shutdown phase the instance stopped. A level of 127 indicates that the instance was executing the application when it stopped.
Shutdown Bit
shutdown_bit is set when the stop occurred while the system was shutting down.
Error Number
errno is a Linux error code number that provides more detailed information about the root cause.
For example, an out-of-memory (OOM) situation usually results in a page fault PGFAULT(4) with errno being ENOMEM(12). Hence, the stop_code would be 0x000C7F04=818948 and the stop_reason would be ----K(1) if the stop occurred during application execution.
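The field layout of stop_code can be unpacked with shifts and masks. The sketch below is illustrative only: `StopCode`, `decode_stop_code`, and `REASONS` are hypothetical names, while the bit positions and reason symbols are taken from the tables above. It decodes the OOM example value 0x000C7F04.

```python
from typing import NamedTuple

# Symbols from the reason table above.
REASONS = {
    0: "OK", 1: "EXP", 2: "MATH", 3: "INVLOP",
    4: "PGFAULT", 5: "SEGFAULT", 6: "HWERR", 7: "SECERR",
}


class StopCode(NamedTuple):
    errno: int          # bits 23-16
    shutdown_bit: bool  # bit 15
    initlvl: int        # bits 14-8
    reason: int         # bits 7-0


def decode_stop_code(code: int) -> StopCode:
    """Split a 32-bit stop_code into its fields; reserved bits 31-24 are ignored."""
    return StopCode(
        errno=(code >> 16) & 0xFF,
        shutdown_bit=bool((code >> 15) & 0x1),
        initlvl=(code >> 8) & 0x7F,
        reason=code & 0xFF,
    )


decoded = decode_stop_code(0x000C7F04)
print(decoded)  # StopCode(errno=12, shutdown_bit=False, initlvl=127, reason=4)
print(REASONS.get(decoded.reason, "unknown"))  # PGFAULT
print(0x000C7F04)  # 818948
```

Here errno 12 corresponds to ENOMEM, initlvl 127 means the application was executing, and a non-zero reason marks the stop as a crash.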
Restart Policy
When an instance stops either because the application exits or the instance crashes, Unikraft Cloud can auto-restart your instance. Auto-restarts are performed according to the restart policy configured for a particular instance. The policy can have the following values:
Policy | Description |
---|---|
never | Never restart the instance (default). |
always | Always restart the instance when the stop is initiated from within the instance (i.e., the application exits or the instance crashes). |
on-failure | Only restart the instance if it crashes. |
When an instance stops, the stop reason and the configured restart policy are evaluated to decide if a restart should be performed. Unikraft Cloud uses an exponential back-off delay (immediate, 5s, 10s, 20s, 40s, ..., 5m) to slow down restarts in tight crash loops. If an instance runs without problems for 10s the back-off delay is reset and the restart sequence ends.
The restart.attempt reported in GET /v1/instances counts the number of restarts performed in the current sequence.
The restart.next_at field indicates when the next restart will take place if a back-off delay is in effect.
A manual start or stop of the instance aborts the restart sequence and resets the back-off delay.
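To get a feel for the back-off sequence, the following sketch computes the delay before each restart. It rests on assumptions: the text only lists "immediate, 5s, 10s, 20s, 40s, ..., 5m", so the doubling between 40s and the 5 minute cap is inferred, and the function name and 0-based `attempt` index are hypothetical rather than the platform's actual implementation or the exact meaning of restart.attempt.

```python
def backoff_delay_seconds(attempt: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Delay before restart number `attempt` (0-based) in the current sequence.

    Assumption: the delay doubles from `base` until it reaches `cap` (5 minutes).
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    if attempt == 0:
        return 0.0  # the first restart happens immediately
    return min(base * 2 ** (attempt - 1), cap)


print([backoff_delay_seconds(n) for n in range(9)])
# [0.0, 5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]
```

Under this model an instance stuck in a crash loop reaches the 5 minute ceiling after seven restarts, and any run that survives for 10s starts the sequence over.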
Scale-To-Zero
With conventional cloud platforms you need to keep at least one instance running at all times to be able to respond to incoming requests. Performing a just-in-time cold boot is simply too time-consuming and would create a response latency of multiple seconds. This is not the case with Unikraft Cloud. Instances on Unikraft Cloud are able to cold boot within milliseconds, which allows us to perform low-latency scale-to-zero.
To enable scale-to-zero for an instance, it is sufficient to add a scale_to_zero configuration block.
Unikraft Cloud will then put the instance into standby if there is no traffic to your service within the window of a cooldown period.
When new traffic comes in, the instance is automatically started again.
If you have a heavyweight application that takes a long time to cold boot or has bad first-request latency (e.g., with JIT compilation), consider enabling stateful scale-to-zero.
Policy
With the scale-to-zero policy you define under which circumstances Unikraft Cloud should put your instance into standby.
Unikraft Cloud currently supports the following scale-to-zero policies:
Policy | Description |
---|---|
off | Scale-to-zero is not enabled. The instance keeps on running until manually stopped |
on | Scale-to-zero is enabled. When there are no TCP connections or HTTP requests for the duration of the cooldown time, the instance is put into standby |
idle | Same as on, but also puts the instance into standby when there are TCP connections established that have been inactive for the duration of the cooldown time. The connections remain established and incoming packets wake up the instance. Scale-to-zero does not happen while there are active HTTP requests (i.e., traffic on ports that have been marked with the http handler) |
Unikraft Cloud only considers network traffic that is going through its proxy to control scale-to-zero.
Connections that your application actively establishes to servers on the Internet and within Unikraft Cloud bypass the proxy for maximum performance.
The instance will thus be put into standby irrespective of the presence of and data sent over such connections.
See how you can make Unikraft Cloud cooperate with your application in these scenarios in the next section.
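The policy table can be read as a small decision function. The sketch below is a simplified model and not the controller's actual logic: `ProxyView`, its fields, and `may_enter_standby` are hypothetical names, and the only inputs considered are those the proxy can see, as described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProxyView:
    """What the proxy observes for one instance (hypothetical structure)."""
    tcp_idle_seconds: List[float]       # idle time of each established TCP connection
    active_http_requests: int           # in-flight requests on http-handler ports
    seconds_since_last_activity: float  # time since the last connection or request ended


def may_enter_standby(policy: str, view: ProxyView, cooldown: float) -> bool:
    """Model of the off / on / idle policies from the table above."""
    if policy not in ("off", "on", "idle"):
        raise ValueError(f"unknown policy: {policy!r}")
    if policy == "off":
        return False
    if view.active_http_requests > 0:
        return False  # never scale to zero during active HTTP requests
    if not view.tcp_idle_seconds:
        # No connections at all: wait for the cooldown to elapse.
        return view.seconds_since_last_activity >= cooldown
    if policy == "idle":
        # Established connections are tolerated if all of them are inactive.
        return all(idle >= cooldown for idle in view.tcp_idle_seconds)
    return False  # policy "on": any established connection keeps the instance running


print(may_enter_standby("on", ProxyView([20.0], 0, 20.0), cooldown=10.0))    # False
print(may_enter_standby("idle", ProxyView([20.0], 0, 20.0), cooldown=10.0))  # True
```

Outbound connections opened by the application do not appear in this model at all, which mirrors the point that such traffic bypasses the proxy and cannot keep the instance awake.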
Application Support for Scale-To-Zero
Scale-to-zero can be used without any support from your application. However, there are cases where making your application aware of scale-to-zero makes sense.
Background Jobs: For example, you may want to run background jobs after your application has responded to a request (e.g., send trace information to a logging server). In this case, you may want to temporarily disable scale-to-zero to make sure your instance is not put to sleep while still performing work.
Long Request Processing: The same is true if your application can have long request processing times.
Consider a setup where you use the idle policy with plain TCP connections and configure a cooldown time of 10s.
If it takes your application 15s to process a request until the first response data is sent, Unikraft Cloud will prematurely scale your instance to zero [1].
While configuring a longer cooldown time can be a simple solution in some cases, this is not possible if the maximum duration of background jobs or request processing phases is unknown. It also means that you have to compromise between cost efficiency and reliability of your service.
Unikraft Cloud allows your application to temporarily disable scale-to-zero so you can have both a short cooldown phase and reliable operation no matter how long things may take. To control scale-to-zero from within your application, instances on Unikraft Cloud provide a special file-based interface:
File | Description |
---|---|
/uk/libukp/scale_to_zero_disable | Temporarily disables scale-to-zero |
The scale_to_zero_disable pseudo file keeps track of the count of concurrent disable requests.
If the count is 0, scale-to-zero is not disabled [2]; any number larger than 0 means scale-to-zero is temporarily disabled.
Using a count instead of a boolean value gives multiple independent workers of your application the ability to disable scale-to-zero individually by incrementing and decrementing the count without having to synchronize.
Reading the file returns the current count.
The value is prefixed with an equals sign (i.e., =X with X being the current count).
Writing to the file modifies the count.
The following strings are accepted:
String | Description |
---|---|
+ | Increment the current count by one |
- | Decrement the current count by one |
=X | Set the count to X |
+X | Increment the current count by X |
-X | Decrement the current count by X |
Any attempt to write an invalid string to the file returns an EINVAL error.
Any attempt to set a count less than 0 or larger than 2^64 returns an ERANGE error.
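The accepted strings and error conditions can be summarized in a small model. This is a sketch of the semantics described above, not the kernel's implementation: `apply_write` and `MAX_COUNT` are hypothetical names, the upper bound is taken literally from the text, and tolerating surrounding whitespace (such as a trailing newline) is an assumption.

```python
import errno
import re

MAX_COUNT = 2**64  # upper bound as stated in the text above

# An operator (+, -, =) followed by an optional decimal number.
_WRITE_RE = re.compile(r"([+\-=])([0-9]*)")


def apply_write(count: int, text: str) -> int:
    """Return the new count after writing `text`, or raise OSError like the file would."""
    match = _WRITE_RE.fullmatch(text.strip())
    if match is None or (match.group(1) == "=" and not match.group(2)):
        raise OSError(errno.EINVAL, "invalid string")
    op, digits = match.groups()
    amount = int(digits) if digits else 1  # bare '+' or '-' means one
    if op == "=":
        new_count = amount
    elif op == "+":
        new_count = count + amount
    else:
        new_count = count - amount
    if new_count < 0 or new_count > MAX_COUNT:
        raise OSError(errno.ERANGE, "count out of range")
    return new_count


count = 0
for text in ("+", "+2", "-", "=5", "-5"):
    count = apply_write(count, text)
    print(repr(text), "->", count)  # 1, 3, 2, 5, 0
```

Two workers that each write + before starting and - after finishing leave the count at 0 only once both are done, which is why no synchronization between them is needed.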
Writing to the file does not disable scale-to-zero atomically. There can be a short delay (usually a few milliseconds) until the changed value is communicated to the Unikraft Cloud controller, which is responsible for making scale-to-zero decisions. Make sure to configure a cooldown time large enough to accommodate this potential signal delay.
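From inside an application, the increment and decrement can be paired so that the count is always restored, even if the protected work raises an error. The following is a minimal sketch, assuming the pseudo file can be opened and written like a regular file; the helper names `scale_to_zero_disabled` and `parse_count` are hypothetical.

```python
from contextlib import contextmanager

SCALE_TO_ZERO_DISABLE = "/uk/libukp/scale_to_zero_disable"


def parse_count(text: str) -> int:
    """Parse the '=X' form returned when reading the file."""
    text = text.strip()
    if not text.startswith("="):
        raise ValueError(f"unexpected count format: {text!r}")
    return int(text[1:])


@contextmanager
def scale_to_zero_disabled(path: str = SCALE_TO_ZERO_DISABLE):
    """Increment the disable count on entry and decrement it on exit."""
    with open(path, "w") as f:
        f.write("+")
    try:
        yield
    finally:
        # Always release our disable request, even if the work failed.
        with open(path, "w") as f:
            f.write("-")
```

A background job would then wrap its work in `with scale_to_zero_disabled():`. Since the change reaches the controller with a short delay, the block should be entered before the response that ends the request is sent, not after it.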
[1] This is never the case for ports of your service that have the http handler set.
[2] Whether it is actually enabled depends on the instance configuration.
Stateful Scale-To-Zero
If your application has a long cold boot phase or suffers from large first response latency, for example, to run JIT compilation and fill caches, using stateful scale-to-zero can dramatically reduce the response time of your service. With stateful scale-to-zero Unikraft Cloud takes a snapshot of the instance state before putting it into standby. When incoming network traffic triggers a wakeup, the snapshot is loaded and the instance resumes execution where it left off - with caches already warm.
As the name suggests, stateful scale-to-zero can also be used to enable scale-to-zero for applications that need to keep state for functional correctness, even if cold boot times are no concern.
The time to load an instance snapshot is roughly constant and usually in the order of a few milliseconds.
This is what is reported in the various boot time metrics.
However, the actual memory contents are loaded from the snapshot only at first access.
This is to reduce response latency by loading only what is really necessary to process the request at hand.
This means that the first few requests might take longer until all required memory has been brought back.
The time to take an instance snapshot during scale-to-zero depends on the amount of memory touched by the application since it was last started.