
Instances

This document describes the Unikraft Cloud Instances API (v1) for managing Unikraft instances. An instance is a Unikraft virtual machine running a single instance of your application.

Instance States

An instance can be in one of the following states:

| State | Description |
|---|---|
| stopped | The instance is not running and does not count against live resource quotas. Connections cannot be established. |
| starting | The instance is booting up. This usually takes just a few milliseconds. |
| running | Your application's main entry point has been reached. |
| draining | The instance is draining connections before shutting down. No new connections can be established. |
| stopping | The instance is shutting down. |
| standby | The instance has scale-to-zero enabled. The instance is not running, but will be automatically started when there are incoming requests. |

The instance endpoints report these values as the instance state.

Stop Reason

To understand why an instance has been stopped or is in the process of shutting down, Unikraft Cloud provides information about the stop reason. You can retrieve this information via the GET /v1/instances endpoint when an instance is in the draining, stopping, stopped or standby state.

The stop_reason contains a bitmask that tells you the origin of the shutdown:

| Bit | 4 [F] | 3 [U] | 2 [P] | 1 [A] | 0 [K] (LSB) |
|---|---|---|---|---|---|
| Desc. | This was a forced stop¹ | Stop initiated by user² | Stop initiated by platform | App exited - exit_code available | Kernel exited - stop_code available |

¹ A forced stop does not give the instance a chance to perform a clean shutdown. Bits 0 [K] and 1 [A] can thus never be set for forced shutdowns. Consequently, there won't be an exit_code or stop_code.
² A stop command originating from the user travels through the platform controller. This is why bit 2 [P] is also always set for user-initiated stops.

For example, the stop_reason will contain the following values in the given scenarios:

| Value | Bitmask | Scenario |
|---|---|---|
| 28 | 11100 / FUP-- | Forced user-initiated shutdown. |
| 15 | 01111 / -UPAK | Regular user-initiated shutdown. The application and kernel have exited. The exit_code and stop_code indicate if the application and kernel shut down cleanly. |
| 13 | 01101 / -UP-K | The user initiated a shutdown but the application was forcefully killed by the kernel during shutdown. This can be the case if the image does not support a clean application exit or the application crashed after receiving a termination signal. The exit_code won't be present in this scenario. |
| 7 | 00111 / --PAK | Unikraft Cloud initiated the shutdown, for example, due to scale-to-zero. The application and kernel have exited. The exit_code and stop_code indicate if the application and kernel shut down cleanly. |
| 3 | 00011 / ---AK | The application exited. The exit_code and stop_code indicate if the application and kernel shut down cleanly. |
| 1 | 00001 / ----K | The instance likely experienced a fatal crash and the stop_code contains more information about the cause of the crash. |
| 0 | 00000 / ----- | The stop reason is unknown. |
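As a minimal sketch (not an official client library), the five documented bits can be decoded in Python to reproduce the F/U/P/A/K notation of the table above. The helper names stop_reason_flags, has_exit_code and has_stop_code are hypothetical; only the bit positions come from this page.

```python
# Documented stop_reason bits, most significant first: (bit position, letter).
STOP_REASON_BITS = [(4, "F"), (3, "U"), (2, "P"), (1, "A"), (0, "K")]


def stop_reason_flags(stop_reason: int) -> str:
    """Render the five documented bits in F/U/P/A/K notation ('-' = bit not set)."""
    return "".join(
        letter if stop_reason & (1 << bit) else "-"
        for bit, letter in STOP_REASON_BITS
    )


def has_exit_code(stop_reason: int) -> bool:
    """Bit 1 [A]: the application exited and an exit_code is available."""
    return bool(stop_reason & (1 << 1))


def has_stop_code(stop_reason: int) -> bool:
    """Bit 0 [K]: the kernel exited and a stop_code is available."""
    return bool(stop_reason & (1 << 0))


if __name__ == "__main__":
    # Print each example value from the table with its binary form and flags.
    for value in (28, 15, 13, 7, 3, 1, 0):
        print(f"{value:2d} {value:05b} {stop_reason_flags(value)}")
```

Checking has_exit_code before reading exit_code avoids interpreting a missing value, for example in the 13 (-UP-K) scenario.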

There can be a short delay of a few milliseconds between the instance reaching the stopped state and the stop_reason being updated (or vice versa).

Exit Code

The application exit code is what the application returns upon leaving its main entry point. The encoding of the exit_code is application-specific. See the documentation of the application for more details. By convention, an exit_code of 0 indicates success.

Stop Code

The stop_code is defined by the kernel and has the following encoding irrespective of the application.

| Bits | 31 - 24 (8 bits) | 23 - 16 (8 bits) | 15 [T] | 14 - 8 (7 bits) | 7 - 0 (8 bits) |
|---|---|---|---|---|---|
| Desc. | reserved¹ | errno | shutdown_bit | initlvl | reason |

¹ Reserved bits are set to 0 and should be ignored.

Reason: reason can be any of the following values:

| Value | Symbol | Scenario |
|---|---|---|
| 0 | OK | Successful shutdown |
| 1 | EXP | The system detected an invalid state and actively stopped execution to prevent data corruption |
| 2 | MATH | An arithmetic CPU error (e.g., division by zero) |
| 3 | INVLOP | Invalid CPU instruction or instruction error (e.g., wrong operand alignment) |
| 4 | PGFAULT | Page fault - see errno for further details |
| 5 | SEGFAULT | Segmentation fault |
| 6 | HWERR | Hardware error |
| 7 | SECERR | Security violation (e.g., violation of memory access protections) |

A reason of 0 indicates that the instance was shut down cleanly. To detect whether the system experienced a crash, it is sufficient to check reason; all other bits of stop_code can be ignored.

Init Level: initlvl indicates during which initialization or shutdown phase the instance stopped. A level of 127 indicates that the instance was executing the application when it stopped.

Shutdown Bit: shutdown_bit is set when the stop occurred while the system was shutting down.

Error Number: errno is a Linux error code number that provides more detailed information about the root cause.

For example, an out-of-memory (OOM) situation usually results in a page fault PGFAULT (4) with errno being ENOMEM (12). Hence, if the stop occurred during application execution (initlvl 127), the stop_code would be 0x000C7F04 = 818948 and the stop_reason would be ----K (1).
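The field layout above can be checked with a small decoder. This is a sketch under the stated bit layout, not an official SDK; the names StopCode and decode_stop_code are hypothetical, and errno_name uses the errno table of the host running the script, which matches Linux numbering when run on Linux.

```python
import errno as errno_codes
from dataclasses import dataclass

# reason values from the table above
REASONS = {
    0: "OK", 1: "EXP", 2: "MATH", 3: "INVLOP",
    4: "PGFAULT", 5: "SEGFAULT", 6: "HWERR", 7: "SECERR",
}


@dataclass(frozen=True)
class StopCode:
    errno: int      # bits 23-16: Linux error number
    shutdown: bool  # bit 15 [T]: stop happened while shutting down
    initlvl: int    # bits 14-8: phase, 127 = application was running
    reason: int     # bits 7-0: see REASONS

    @property
    def reason_symbol(self) -> str:
        return REASONS.get(self.reason, f"UNKNOWN({self.reason})")

    @property
    def errno_name(self) -> str:
        # Host errno table; falls back to the number if it is unknown.
        return errno_codes.errorcode.get(self.errno, str(self.errno))

    @property
    def crashed(self) -> bool:
        # Only reason matters for crash detection; 0 means a clean shutdown.
        return self.reason != 0


def decode_stop_code(stop_code: int) -> StopCode:
    """Split a stop_code into its fields; reserved bits 31-24 are ignored."""
    return StopCode(
        errno=(stop_code >> 16) & 0xFF,
        shutdown=bool((stop_code >> 15) & 0x1),
        initlvl=(stop_code >> 8) & 0x7F,
        reason=stop_code & 0xFF,
    )


if __name__ == "__main__":
    code = decode_stop_code(818948)  # == 0x000C7F04, the OOM example above
    print(code.reason_symbol, code.errno_name, code.initlvl, code.shutdown)
    # On Linux, prints: PGFAULT ENOMEM 127 False
```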

Restart Policy

When an instance stops either because the application exits or the instance crashes, Unikraft Cloud can auto-restart your instance. Auto-restarts are performed according to the restart policy configured for a particular instance. The policy can have the following values:

| Policy | Description |
|---|---|
| never | Never restart the instance (default). |
| always | Always restart the instance when the stop is initiated from within the instance (i.e., the application exits or the instance crashes). |
| on-failure | Only restart the instance if it crashes. |

When an instance stops, the stop reason and the configured restart policy are evaluated to decide if a restart should be performed. Unikraft Cloud uses an exponential back-off delay (immediate, 5s, 10s, 20s, 40s, ..., 5m) to slow down restarts in tight crash loops. If an instance runs without problems for 10s the back-off delay is reset and the restart sequence ends.
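The back-off sequence can be modeled in a few lines. This is a sketch under assumptions not stated on this page: the delay doubles from 5s and is capped at exactly 5 minutes, a run of 10s or more counts as "without problems", and the attempt counter here starts at 1 (how it maps onto restart.attempt is also an assumption). The function names are hypothetical.

```python
# Assumed model of the documented back-off: immediate, 5s, 10s, 20s, 40s, ..., 5m.
FIRST_DELAY_S = 5
MAX_DELAY_S = 5 * 60
STABLE_RUN_S = 10  # a run at least this long ends the restart sequence


def restart_delay_s(attempt: int) -> int:
    """Delay before the attempt-th restart of a sequence (attempt starts at 1)."""
    if attempt < 1:
        raise ValueError("attempt starts at 1")
    if attempt == 1:
        return 0  # first restart is immediate
    return min(FIRST_DELAY_S * 2 ** (attempt - 2), MAX_DELAY_S)


def next_attempt(previous_attempt: int, ran_for_s: float) -> int:
    """Attempt number for the next restart after the instance stopped again."""
    if ran_for_s >= STABLE_RUN_S:
        return 1  # previous sequence ended; a new one starts without delay
    return previous_attempt + 1


if __name__ == "__main__":
    print([restart_delay_s(n) for n in range(1, 10)])
    # prints: [0, 5, 10, 20, 40, 80, 160, 300, 300]
```

Capping the delay keeps a persistently crashing instance retrying at a bounded rate instead of backing off indefinitely.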

The restart.attempt reported in GET /v1/instances counts the number of restarts performed in the current sequence. The restart.next_at field indicates when the next restart will take place if a back-off delay is in effect.

A manual start or stop of the instance aborts the restart sequence and resets the back-off delay.

Scale-To-Zero

With conventional cloud platforms you need to keep at least one instance running at all times to be able to respond to incoming requests. Performing a just-in-time cold boot is simply too time-consuming and would create a response latency of multiple seconds. This is not the case with Unikraft Cloud. Instances on Unikraft Cloud are able to cold boot within milliseconds, which allows us to perform low-latency scale-to-zero.

To enable scale-to-zero for an instance, it is sufficient to add a scale_to_zero configuration block. Unikraft Cloud will then put the instance into standby if there is no traffic to your service for the duration of a cooldown period. When new traffic arrives, the instance is automatically started again.
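A scale_to_zero block might be built as follows. Only the block name and the policy values (off, on, idle, from the Policy section) come from this page; the cooldown_time_ms and stateful field names, the image field and the image name are assumptions for illustration, so check them against the API reference before use.

```python
import json

POLICIES = ("off", "on", "idle")  # from the Policy section


def scale_to_zero_block(policy: str = "on", cooldown_time_ms: int = 1000,
                        stateful: bool = False) -> dict:
    """Build a scale_to_zero block; field names other than policy are assumed."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy {policy!r}")
    if cooldown_time_ms <= 0:
        raise ValueError("cooldown must be positive")
    return {
        "policy": policy,
        "cooldown_time_ms": cooldown_time_ms,  # assumed field name
        "stateful": stateful,                  # assumed field name
    }


# Hypothetical instance configuration carrying the block.
body = {
    "image": "example/my-app:latest",  # hypothetical image name
    "scale_to_zero": scale_to_zero_block("on", cooldown_time_ms=1000),
}
print(json.dumps(body, indent=2))
```

Validating the policy locally catches typos before a request is sent.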

If you have a heavyweight application that takes long to cold boot or has high first-request latency (e.g., due to JIT compilation), consider enabling stateful scale-to-zero.

Policy

With the scale-to-zero policy you define under which circumstances Unikraft Cloud should put your instance into standby.

Unikraft Cloud currently supports the following scale-to-zero policies:

| Policy | Description |
|---|---|
| off | Scale-to-zero is not enabled. The instance keeps on running until manually stopped. |
| on | Scale-to-zero is enabled. When there are no TCP connections or HTTP requests for the duration of the cooldown time, the instance is put into standby. |
| idle | Same as on, but also puts the instance into standby when there are TCP connections established that have been inactive for the duration of the cooldown time. The connections remain established and incoming packets wake up the instance. Scale-to-zero does not happen while there are active HTTP requests (i.e., traffic on ports that have been marked with the http handler). |
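The difference between on and idle can be expressed as a decision function. This is a deliberately simplified, hypothetical model of the rules in the table, not the controller's actual logic; ProxyObservation and its fields are invented for illustration, and idle_for_s stands for the time since the last traffic seen by the proxy.

```python
from dataclasses import dataclass


@dataclass
class ProxyObservation:
    """Hypothetical snapshot of what the proxy sees for one instance."""
    open_tcp_connections: int
    active_http_requests: int
    idle_for_s: float  # time since the last proxied traffic


def should_enter_standby(policy: str, obs: ProxyObservation,
                         cooldown_s: float) -> bool:
    """Simplified reading of the off/on/idle policies."""
    if policy == "off":
        return False
    if policy not in ("on", "idle"):
        raise ValueError(f"unknown policy {policy!r}")
    if obs.active_http_requests > 0:
        return False  # neither policy scales to zero during HTTP requests
    if obs.idle_for_s < cooldown_s:
        return False  # cooldown has not elapsed yet
    if policy == "on":
        return obs.open_tcp_connections == 0  # 'on' needs all connections closed
    return True  # 'idle' also accepts open but inactive TCP connections
```

For example, two open but silent TCP connections with 12s of inactivity and a 10s cooldown lead to standby under idle but not under on.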

Unikraft Cloud only considers network traffic that goes through its proxy to control scale-to-zero. Connections that your application actively establishes to servers on the Internet and within Unikraft Cloud bypass the proxy for maximum performance. The instance will thus be put into standby regardless of whether such connections exist or carry data.

The next section describes how your application can cooperate with Unikraft Cloud in these scenarios.

Application Support for Scale-To-Zero

Scale-to-zero can be used without any support from your application. However, there are cases where making your application aware of scale-to-zero makes sense.

Background Jobs: Suppose you want to run background jobs after your application has responded to a request (e.g., to send trace information to a logging server). In this case, you may want to temporarily disable scale-to-zero to make sure your instance is not put into standby while it is still performing work.

Long Request Processing: The same is true if your application can have long request processing times. Consider a setup where you use the idle policy with plain TCP connections and configure a cooldown time of 10s. If your application takes 15s to process a request before sending the first response data, Unikraft Cloud will prematurely scale your instance to zero¹.

While configuring a longer cooldown time can be a simple solution in some cases, this is not possible if the maximum duration of background jobs or request processing phases is unknown. It also means that you have to compromise between cost efficiency and reliability of your service.

Unikraft Cloud allows your application to temporarily disable scale-to-zero so you can have both a short cooldown phase and reliable operation no matter how long things may take. To control scale-to-zero from within your application, instances on Unikraft Cloud provide a special file-based interface:

| File | Description |
|---|---|
| /uk/libukp/scale_to_zero_disable | Temporarily disables scale-to-zero |

The scale_to_zero_disable pseudo file keeps track of the count of concurrent disable requests. If the count is 0, scale-to-zero is not disabled²; any number larger than 0 means scale-to-zero is temporarily disabled. Using a count instead of a boolean value gives multiple independent workers of your application the ability to disable scale-to-zero individually by incrementing and decrementing the count without having to synchronize.

Reading the file returns the current count. The value is prefixed with an equals sign (i.e., =X with X being the current count). Writing to the file modifies the count. The following strings are accepted:

| String | Description |
|---|---|
| + | Increment the current count by one |
| - | Decrement the current count by one |
| =X | Set the count to X |
| +X | Increment the current count by X |
| -X | Decrement the current count by X |

Any attempt to write an invalid string to the file returns an EINVAL error. Any attempt to set a count less than 0 or larger than 2^64 returns an ERANGE error.
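To make the accepted strings and error cases concrete, here is an in-memory model of the counter. It is a hypothetical simulation for illustration, not the real pseudo file: DisableCounterModel is an invented name, it accepts only decimal X, it does not model trailing newlines, and the upper bound is taken literally from the text above.

```python
import errno

MAX_COUNT = 2 ** 64  # upper bound as stated in the text above


class DisableCounterModel:
    """In-memory model of the scale_to_zero_disable count (not the real file)."""

    def __init__(self) -> None:
        self.count = 0

    def read(self) -> str:
        return f"={self.count}"  # reads return '=X'

    def write(self, command: str) -> None:
        if command in ("+", "-"):
            op, amount = command, 1
        elif (len(command) >= 2 and command[0] in "+-="
              and command[1:].isascii() and command[1:].isdigit()):
            op, amount = command[0], int(command[1:])
        else:
            raise OSError(errno.EINVAL, f"invalid command {command!r}")
        if op == "+":
            new_count = self.count + amount
        elif op == "-":
            new_count = self.count - amount
        else:
            new_count = amount  # '=X'
        if new_count < 0 or new_count > MAX_COUNT:
            raise OSError(errno.ERANGE, f"count out of range: {new_count}")
        self.count = new_count  # only updated when the command is valid
```

A rejected write leaves the count unchanged, so a worker that decrements too often gets ERANGE instead of silently re-enabling scale-to-zero for others.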

Writing to the file does not disable scale-to-zero instantly. There can be a short delay (usually a few milliseconds) until the change is communicated to the Unikraft Cloud controller, which is responsible for making scale-to-zero decisions. Make sure to configure a cooldown time large enough to accommodate this potential delay.

¹ This is never the case for ports of your service that have the http handler set.
² Whether it is actually enabled depends on the instance configuration.

Stateful Scale-To-Zero

If your application has a long cold boot phase or suffers from large first response latency, for example, to run JIT compilation and fill caches, using stateful scale-to-zero can dramatically reduce the response time of your service. With stateful scale-to-zero Unikraft Cloud takes a snapshot of the instance state before putting it into standby. When incoming network traffic triggers a wakeup, the snapshot is loaded and the instance resumes execution where it left off - with caches already warm.

As the name suggests, stateful scale-to-zero can also be used to enable scale-to-zero for applications that need to keep state for functional correctness, even if cold boot times are no concern.

The time to load an instance snapshot is roughly constant and usually in the order of a few milliseconds. This is what is reported in the various boot time metrics. However, the actual memory contents are loaded from the snapshot only at first access. This reduces response latency by loading only what is really necessary to process the request at hand. As a result, the first few requests might take longer until all required memory has been brought back.

The time to take an instance snapshot during scale-to-zero depends on the amount of memory touched by the application since it was last started.
