Morphisms: Confidential Serverless Containers
Confidential Knative



Secrets are deployed using a KMS as described in Secure Storage. Since serverless containers are mostly deployed in cloud environments, DCAP is used as well. In the figure above, the Provisioning Enclaves communicate with the IMS, not the actual user enclaves; for simplicity, the figure shows the user enclaves communicating with the IMS directly. Multiple enclaves can share the same Provisioning Enclave if they run on the same node. App A has been scaled by the Activator and runs as two identical instances, which allows both instances to access the same secrets in the KMS. The cloud provider sets up the IMS and its databases shown in the figure; Azure, for example, already provides these services.
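As a toy sketch of why the two scaled instances of App A can share secrets: the KMS hands out secrets keyed by the enclave measurement, and identical instances produce identical measurements. All names here (the `kms` type, the `mrenclave-app-a` key) are illustrative, not the project's actual KMS API.

```go
package main

import "fmt"

// Toy model of the measurement-keyed lookup described above. Two
// identical instances of App A run the same code, so they present the
// same measurement and resolve to the same secret.
type kms struct{ secrets map[string]string }

func (k kms) secretFor(measurement string) (string, bool) {
	s, ok := k.secrets[measurement]
	return s, ok
}

func main() {
	k := kms{secrets: map[string]string{"mrenclave-app-a": "db-password"}}
	s1, _ := k.secretFor("mrenclave-app-a") // instance 1 of App A
	s2, _ := k.secretFor("mrenclave-app-a") // instance 2, same measurement
	fmt.Println(s1 == s2)
}
```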

A solution to the problem mentioned in the previous section is to additionally place the Queue Proxy and the Activator inside an enclave. Every component where TLS termination occurs must run inside an enclave; otherwise the traffic is readable in plaintext in RAM. The Istio gateway supports TLS passthrough, so with additional configuration the gateway does not need to be inside an enclave. Since all Queue Proxies share the same code, the KMS cannot distinguish between them, so a wildcard certificate is required to cover all possible routes. In addition, a Queue Proxy does not know which route it has been deployed for, which makes issuing a certificate for a specific route even harder. The Activator and the Queue Proxy must also share the same certificate, because the Activator can be removed from the request path. If they did not share a certificate, the end user would be exposed to the certificate changing at any time, which could cause the browser connection to fail.
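For reference, TLS passthrough on an Istio gateway is configured roughly as follows (the gateway name, namespace, and host are placeholders). With `mode: PASSTHROUGH`, the gateway forwards the encrypted TLS stream unchanged instead of terminating it, so plaintext never appears in the gateway's memory:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: knative-ingress-gateway   # placeholder name
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: PASSTHROUGH         # forward TLS without terminating it
      hosts:
        - "*.example.com"         # placeholder wildcard host
```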

Problems to solve

Another issue arises from TLS termination before the actual message reaches the serverless container. The client needs to be sure that every component where TLS termination occurs is actually running in an enclave. Because the messages are proxied, the client communicates directly with only one enclave at a time. Under this threat model, it cannot be assumed that the routing of the messages is legitimate and free from interference by a cloud administrator. This means the first proxy may be an enclave while the second is not. For example, if the Activator is in the path, it can run in an enclave with benign code, but an attacker can redirect the traffic to a Queue Proxy that is not in an enclave, read the traffic there, and then forward it to the actual confidential serverless container. This allows the attacker to perform passive attacks undetected. Thus, the Activator must always be sure that the Queue Proxy is running inside an enclave. In addition, a Queue Proxy running inside an enclave must ensure that it only routes traffic to the SECS the request is destined for.

Routing from the Queue Proxy to the confidential container uses HTTP over localhost, with the port supplied as an environment variable, since both run in the same pod and share the same network resources. Queue Proxies do not perform any checks and assume that the correct container is in their pod. Since all Queue Proxies run the same code, and therefore have the same measurement, and cannot know which route they are deployed on, there is no way to distinguish between different Queue Proxies. To avoid further increasing the attack surface, we do not rely on information from the Kubernetes API, the Autoscaler, etc. to determine which route a Queue Proxy is deployed on.

To further illustrate the problem, assume the client connects to a malicious confidential serverless container through a Queue Proxy. This can be done by rerouting the traffic between the Queue Proxy and the confidential serverless container. The malicious container can then itself act as a reverse proxy and forward the traffic to the actual destination, allowing it to sniff and manipulate the traffic.
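A minimal sketch of the localhost forwarding described above, to make the trust gap concrete: the Queue Proxy builds a reverse proxy to `127.0.0.1` on a port taken from an environment variable and performs no identity check on the target. The variable name `USER_PORT` and the default `8080` are assumptions for illustration, not a statement about Knative's exact configuration.

```go
package main

import (
	"fmt"
	"net/http/httputil"
	"net/url"
	"os"
)

// userContainerProxy mimics the Queue Proxy's routing step: the host is
// always localhost, the port comes from the environment, and nothing
// verifies that the process listening there is the intended container.
func userContainerProxy() (*httputil.ReverseProxy, string, error) {
	port := os.Getenv("USER_PORT") // assumed variable name
	if port == "" {
		port = "8080" // assumed default
	}
	target := fmt.Sprintf("http://127.0.0.1:%s", port)
	u, err := url.Parse(target)
	if err != nil {
		return nil, "", err
	}
	return httputil.NewSingleHostReverseProxy(u), target, nil
}

func main() {
	_, target, err := userContainerProxy()
	if err != nil {
		panic(err)
	}
	fmt.Println(target)
}
```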

Mutual TLS is not required between the components in the request path: only the receiver of a request needs to be authenticated, since the sender already knows the message in cleartext. Mutual TLS would also require a special interface and thus changes to Knative's code, because regular TLS must still be supported for browsers. As explained below, security in this threat model cannot be guaranteed without changing Knative's code anyway.
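In Go terms, the one-way authentication argued for above is simply a client-side TLS configuration that verifies the server's certificate but presents none of its own (the hostname is a placeholder):

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// clientTLSConfig sketches one-way TLS: the receiver (server) is
// authenticated via ServerName verification, while Certificates stays
// empty, so the sender remains unauthenticated -- no mutual TLS.
func clientTLSConfig(serverName string) *tls.Config {
	return &tls.Config{
		ServerName: serverName, // checked against the server's certificate
	}
}

func main() {
	cfg := clientTLSConfig("app-a.example.com") // placeholder route
	fmt.Println(cfg.ServerName, len(cfg.Certificates))
}
```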

One problem with this architecture is that it increases the attack surface: the code of the Queue Proxy and the Activator must be trusted. Knative is open source and the code for both is publicly available; its Serving component has 5,000 stars on its GitHub repository, so trusting the code can be a valid assumption. Still, blindly trusting someone else's code always carries risk. Both the Activator and the Queue Proxy are small applications and thus less prone to bugs, and since the code is open source, it is verifiable. Nevertheless, should bugs exist in the Activator or Queue Proxy code, they could allow an adversary to read and manipulate all network traffic of a SECS, since all TLS connections to an architectural component are always terminated. Hijacking an Activator may be worse, because an Activator often routes traffic for multiple SECSs, while a Queue Proxy is always associated with exactly one SECS. Even small bugs that do not result in a takeover can allow messages to be manipulated or read.

Confidential Knative with DCAP