Results
The following graph shows that SEV is significantly faster than the other runtimes. At 25 requests, containerd, Gramine, and Kata have almost exactly the same values. A difference becomes visible at 50 requests, where Kata takes 72s, Gramine 75s, containerd 67s, and SEV 43s. At 75 requests, Kata takes 100s, Gramine 99s, containerd 98s, and SEV 75s. From a machine's perspective, even one additional second is a large difference. For an end user who is already waiting 99s for a response, however, one more second hardly matters; the wait is long either way. It should also be noted that not every request took, for example, 99 seconds: the first response was received after 19 seconds, and the 99 seconds are the time until all requests had been processed.
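The distinction between the time to the first response and the time until all requests are processed can be illustrated with a small sketch. It is not the harness used for these benchmarks; `measure_burst` and `fake_request` are hypothetical names, and the simulated request stands in for an HTTP call to the Knative service.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Tuple


def measure_burst(send_request: Callable[[], object], n_requests: int) -> Tuple[float, float]:
    """Send n_requests concurrently.

    Returns (seconds until the first response, seconds until the last response).
    """
    start = time.perf_counter()
    first = None
    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        futures = [pool.submit(send_request) for _ in range(n_requests)]
        for future in as_completed(futures):
            future.result()  # re-raise any error from the request
            if first is None:
                first = time.perf_counter() - start
    total = time.perf_counter() - start
    return first, total


def fake_request() -> str:
    # Hypothetical stand-in for a real request; replace with an HTTP call.
    time.sleep(0.05)
    return "signature"


if __name__ == "__main__":
    first, total = measure_burst(fake_request, 25)
    print(f"first response after {first:.2f}s, all responses after {total:.2f}s")
```

Reporting both values makes it clear whether a long total time reflects every user waiting or only the slowest requests in the burst.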
As expected, due to the small number of cores available to the cVM, its CPU was always fully utilized. This graph also helps explain the relatively high time for containerd in the previous figure. Modern CPUs run at a base frequency when idle or under light load and raise the frequency under heavy load. Since the CPU usage for containerd was very low in all measurements, with its peak at 10%, the CPU was probably running at a much lower frequency than it was for Gramine and Kata. The CPU frequency would therefore have been a very interesting additional benchmark, but it is missing: I noticed this too late and no longer had access to the bare-metal machine. This is not critical, however; it only makes the containerd results in the previous graph somewhat slower. The main goal is to benchmark Gramine and SEV. With 10 requests, Gramine's CPU usage is 34% and Kata's is 58%. Kata therefore uses considerably more CPU, which is a large amount for 14 available cores. From 50 requests onward, the CPU usage of Kata and Gramine is relatively similar.
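The missing frequency measurement could be collected with a sketch like the following. It assumes a Linux x86 host whose `/proc/cpuinfo` exposes a per-core `cpu MHz` field; the function names and the sample text are hypothetical and were not part of the original benchmarks.

```python
import re
from statistics import mean
from typing import List


def parse_cpu_mhz(cpuinfo_text: str) -> List[float]:
    """Extract the per-core 'cpu MHz' values from /proc/cpuinfo-style text."""
    matches = re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, flags=re.MULTILINE)
    return [float(value) for value in matches]


def read_cpu_mhz(path: str = "/proc/cpuinfo") -> List[float]:
    """Read the current per-core frequencies (assumes the field is present)."""
    with open(path) as handle:
        return parse_cpu_mhz(handle.read())


# Hypothetical sample in the layout Linux uses on x86 machines.
SAMPLE = (
    "processor\t: 0\n"
    "cpu MHz\t\t: 1200.000\n"
    "processor\t: 1\n"
    "cpu MHz\t\t: 3400.500\n"
)

if __name__ == "__main__":
    frequencies = parse_cpu_mhz(SAMPLE)
    print(f"cores: {len(frequencies)}, mean frequency: {mean(frequencies):.1f} MHz")
```

Sampling `read_cpu_mhz()` periodically during a run, alongside the CPU usage, would show whether a lightly loaded runtime such as containerd was in fact running at a lower clock.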
The problem with Kata is that, when used with Knative, it does not remove the microVMs after the Knative service is deleted. I started the benchmarks with 75 requests, so the RAM usage in \ref{fig:ram-usage} is constant for Kata because 75 microVMs were already running in the background. A manual shutdown of all microVMs would have been required, which again I noticed too late. Nevertheless, Kata's RAM usage is slightly higher than Gramine's, with a difference of 1.5 GB for 75 instances. Gramine keeps a whole library OS in memory, whereas Kata keeps a kernel and the Kata agent, so Gramine is the lighter of the two. As a reference, 75 instances of the same application require only 3 GB of RAM in containerd and 5 GB in SEV. SEV introduces additional memory structures, such as the Guest Hypervisor Communication Block (GHCB), which may account for the increased memory usage. In addition, Azure uses its own operating system for confidential VMs, so it is not the same as the one on the bare-metal machine.
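To put the totals into per-instance terms, the figures above can be divided by the 75 instances. The sketch below assumes decimal units (1 GB = 1000 MB); `per_instance_mb` is a hypothetical helper, not part of the benchmark tooling.

```python
INSTANCES = 75


def per_instance_mb(total_gb: float, instances: int = INSTANCES) -> float:
    """Convert a total in GB to MB per instance (assumes 1 GB = 1000 MB)."""
    return total_gb * 1000 / instances


if __name__ == "__main__":
    # Totals taken from the text: containerd 3 GB, SEV 5 GB,
    # and a 1.5 GB gap between Kata and Gramine.
    print(f"containerd: {per_instance_mb(3):.1f} MB per instance")
    print(f"SEV: {per_instance_mb(5):.1f} MB per instance")
    print(f"Kata minus Gramine: {per_instance_mb(1.5):.1f} MB per instance")
```

Under this assumption, containerd needs about 40 MB and SEV about 67 MB per instance, and Kata needs about 20 MB more per instance than Gramine.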
Another interesting benchmark measures the cold-start time of a serverless container (\ref{fig:graph_undeployed}). It omits the time needed to create the Knative service and measures only the time until a response is received from the web server. In this context, that is the time the web server needs, on a cold start, to respond with a signature for the same message. This simulates what would happen if already registered users of the wallet were to log in at the same time. As in the previous case, and probably for the same reasons, SEV performs best here. For 25 requests, containerd takes 17s, Kata 25s, Gramine 23s, and SEV 15s, so Kata is 2 seconds slower than Gramine. At 50 requests, by contrast, Kata is the faster of the two, taking 40s versus Gramine's 42s. The biggest difference between Kata and Gramine is at 75 requests, where Kata takes 4 seconds longer than Gramine. Interestingly, the CPU and RAM usage here is almost identical to that in the previous graphs and is therefore not discussed further.
The benchmark results show that starting multiple containers at the same time is extremely slow. If 75 requests each require a cold start, the server takes up to 68 seconds to process all of them. To put this into perspective, a web server that is already running responds within a few milliseconds. With serverless containers, a full web server must be started before a request can be processed, so the time is significantly higher. In addition, since SEV is an AMD technology and SGX is an Intel technology, the same hardware cannot be used for both, which makes comparable benchmarks much more difficult. Close attention would have to be paid to choosing CPUs that achieve comparable values when starting containers and have similar technical specifications. SEV's results are also not directly comparable to SGX's, because SGX offers much stronger isolation: each serverless container resides in its own TEE. Kata performs worse than Gramine, especially at 75 requests. If Kata were additionally protected with SEV, its results should be slightly worse still. At comparable isolation, we can therefore conclude that SEV is slower than Gramine.