By: Valentina Palmiotti, @chompie1337
At Grapl we believe that in order to build the best defensive system we need to deeply understand attacker behaviors. As part of that goal we're investing in offensive security research. Keep up with our blog for new research on high risk vulnerabilities, exploitation, and advanced threat tactics.
This blog post covers attacking a vulnerability in Firecracker, an open source micro-virtual machine (microVM) monitor written in the Rust programming language. It was developed for use in AWS Lambda, a serverless software-as-a-service (SaaS) application hosting service. Firecracker is also used for AWS’s similar Fargate service, which provides a way to run containers without having to manage servers for container orchestration. Due to the risks introduced by multi-tenancy, Firecracker was intentionally designed with security in mind.
In this post, we’ll cover the following topics:
- What is Firecracker?
- Why attack it?
- How does it work?
- Root cause analysis of a memory corruption vulnerability, CVE-2019-18960
- Exploit primitives and analysis of exploitability
- Reflections and takeaways as they relate to security
I had no knowledge of Firecracker (or Rust) prior to conducting this research. My hope is that this post will be useful for those wanting to learn about virtualization, Firecracker, and KVM, and that it provides some clarity on the various layers of virtualization and VM escape exploitation.
Firecracker: What is it?
Firecracker is an open source virtual machine monitor (VMM) created and maintained by Amazon Web Services (AWS). Per Amazon’s website, Firecracker is a “new virtualization and open source technology that enables service owners to operate secure multi-tenant container-based services by combining the speed, resource efficiency, and performance enabled by containers with the security and isolation offered by traditional VMs.”
Firecracker is comparable to QEMU-KVM; they are both VMMs that utilize KVM, a hypervisor built into the Linux kernel. Firecracker was designed to prioritize security and efficiency for serverless workloads, which led to some key design differences from QEMU. Firecracker is much less flexible than QEMU: in order to minimize complexity and attack surface, it forgoes non-essential functionality. QEMU, on the other hand, has had many vulnerabilities arise from complex device implementations.
Technology like Firecracker is of particular interest to Grapl because we’re building a multi-tenant system with customer-provided code execution. Therefore, it is of utmost importance that multi-tenant boundaries cannot be violated. Firecracker is used by AWS to isolate runtimes from each other. Before deciding to use Firecracker in production, we conducted a security review of the product to evaluate whether it was appropriate for our use case. We also wanted to conduct offense-driven research to come up with hardening measures that are effective and worthwhile to implement in our environment. Because Grapl’s use case is specific, unlike AWS, which has to run arbitrary applications, we can enforce more constraints on our application (such as execution time, resource usage, credential limitations, the files available to it, etc.). This research came as a result of our security review.
How Does it Work?
First, I’ll briefly explain how a virtual machine monitor (VMM) uses KVM in general, and then get into the specifics of Firecracker.
Firecracker is a VMM that uses the Linux Kernel’s KVM virtualization infrastructure to provide Linux and OSv microVMs on Linux hosts. On the host, there is one Firecracker process per microVM.
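To make the VMM-to-KVM relationship a little more concrete, here is a minimal sketch of the very first step any KVM-based VMM takes: opening /dev/kvm and querying the API version with an ioctl. This is not Firecracker’s code. The helper name kvm_api_version is hypothetical, and ioctl is declared by hand only to keep the example free of external crates. The request number comes from the Linux kernel’s KVM headers.

```rust
use std::fs::OpenOptions;
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::io::AsRawFd;

// From <linux/kvm.h>: KVMIO is 0xAE and KVM_GET_API_VERSION is _IO(KVMIO, 0x00).
const KVM_GET_API_VERSION: c_ulong = 0xAE00;

extern "C" {
    // The C library's ioctl(2), declared manually for this sketch.
    fn ioctl(fd: c_int, request: c_ulong, ...) -> c_int;
}

/// Hypothetical helper: opens /dev/kvm and asks the kernel for the KVM API version.
/// Returns an error if KVM is unavailable or the caller lacks permission.
fn kvm_api_version() -> io::Result<i32> {
    let kvm = OpenOptions::new().read(true).write(true).open("/dev/kvm")?;
    // SAFETY: the descriptor is valid while `kvm` is alive, and this ioctl
    // takes no pointer argument, so no memory is read or written on our side.
    let ret = unsafe { ioctl(kvm.as_raw_fd(), KVM_GET_API_VERSION, 0 as c_ulong) };
    if ret < 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(ret)
    }
}
```

From here a VMM would go on to issue further ioctls on the same descriptor to create a VM, add vCPUs, map guest memory, and enter the run loop. Those steps are omitted from this sketch.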
Storage is done via a block device rather than file system passthrough, to avoid giving the guest access to the host’s Linux kernel filesystem code, which is complex (and often has exploitable bugs). Firecracker also exposes a REST-based configuration API over a UNIX domain socket.
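As an illustration of what talking to that configuration API looks like, the sketch below builds a plain HTTP PUT request and sends it over a UNIX domain socket using only the Rust standard library. The helper names build_put and send_request are hypothetical, and the socket path is whatever you passed to Firecracker at startup. The /machine-config endpoint and its vcpu_count and mem_size_mib fields are taken from Firecracker’s API specification; check the specification for your version before relying on them.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;

/// Hypothetical helper: formats a minimal HTTP/1.1 PUT request with a JSON body.
fn build_put(path: &str, json_body: &str) -> String {
    format!(
        "PUT {} HTTP/1.1\r\nHost: localhost\r\nAccept: application/json\r\n\
         Content-Type: application/json\r\nContent-Length: {}\r\n\r\n{}",
        path,
        json_body.len(),
        json_body
    )
}

/// Hypothetical helper: sends a raw request over the API socket and returns
/// whatever the server sends back in the first read. A single read is a
/// simplification that is adequate only for short responses.
fn send_request(socket_path: &str, request: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket_path)?;
    stream.write_all(request.as_bytes())?;
    let mut buf = [0u8; 4096];
    let n = stream.read(&mut buf)?;
    Ok(String::from_utf8_lossy(&buf[..n]).into_owned())
}
```

For example, build_put("/machine-config", r#"{"vcpu_count":2,"mem_size_mib":256}"#) produces a request that could be passed to send_request along with the socket path (an assumption here; something like /tmp/firecracker.socket) before the microVM is started.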
There have only been three CVEs registered for Firecracker since its creation, and only one that can potentially lead to RCE on the host. In addition to being an RCE vulnerability, I chose to look at CVE-2019-18960 because it is a memory corruption vulnerability. Being completely new to Rust, I thought it would be worthwhile to examine how memory corruption vulnerabilities can still occur in a memory-safe language.
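As a generic illustration of how memory corruption can creep into Rust, consider the sketch below. It is not the Firecracker code and does not reproduce CVE-2019-18960; the PacketHeader type and both functions are hypothetical. The point is only that an unsafe block which trusts an attacker-controlled length bypasses the bounds checks that safe Rust would otherwise enforce.

```rust
use std::ptr;

/// Hypothetical header whose `len` field is supplied by an untrusted guest.
struct PacketHeader {
    len: u32,
}

/// Unsound sketch: copies `hdr.len` bytes without comparing it to `dst.len()`.
/// If the guest sets `len` larger than the destination, this writes out of bounds.
unsafe fn copy_unchecked(hdr: &PacketHeader, src: *const u8, dst: &mut [u8]) {
    // The (incorrect) safety assumption here is that `hdr.len` always fits in `dst`.
    unsafe { ptr::copy_nonoverlapping(src, dst.as_mut_ptr(), hdr.len as usize) };
}

/// Safe counterpart: validates the untrusted length against both buffers first.
fn copy_checked(hdr: &PacketHeader, src: &[u8], dst: &mut [u8]) -> Result<usize, &'static str> {
    let len = hdr.len as usize;
    if len > src.len() || len > dst.len() {
        return Err("length exceeds buffer");
    }
    dst[..len].copy_from_slice(&src[..len]);
    Ok(len)
}
```

With a 4-byte destination, copy_checked rejects a header claiming 100 bytes, while copy_unchecked would happily write past the end of the buffer. The compiler cannot catch the latter because the unsafe block tells it the programmer has verified the invariant.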
This is interesting, because without this second bug, the previously discussed bug would not be exploitable.
Before beginning to write code, I wanted to first look at what exploit primitives can be constructed with the vulnerability, theoretically. I had some concerns:
a) The out-of-bounds memory that can be read from or written to is limited to a specific area.
b) Runtime mitigations in Rust are restrictive.
In order to evaluate the exploitability of this vulnerability we need to investigate what memory can be accessed with the exploit primitive.
While Firecracker’s design is security focused, there are some hardening measures that can be used to further lock down the attack surface.
First, limit untrusted code to running with the lowest privileges possible. Additionally, hardening the guest operating system and running a fully patched kernel is crucial. Without guest kernel execution, an attacker has no way to exploit the vulnerability covered in this post.
The primary recommendation from the authors of Firecracker is to use jailer, a program designed to isolate the Firecracker process in order to enhance security. In the case of exploiting the discussed vulnerability, a takeover of the Firecracker process yields a restrictive execution environment. An attacker would need to bypass all the restrictions imposed by jailer to escalate privileges and execute outside of the Firecracker process. Read a step by step account of what the jailer program does on startup here.
Security Reflections and Takeaways
These are some of the major security takeaways gleaned from this short research project exploiting Firecracker:
On the Kernel:
Kernel hardening and attack surface reduction are critical, even though they may impose restrictions on use or negatively impact performance. Given a Firecracker vulnerability like the one covered in this post, protecting the guest kernel prevents an attacker from gaining access to the attack surface in the first place. If an attacker did successfully exploit this vulnerability, they would have access to the host and any other VMs executing on that host.
On Firecracker Design:
Although not all exploitable bugs are memory safety bugs, an interesting project for a vulnerability researcher is to search for unsafe blocks in Rust codebases and look for cases where they can be abused. Code comments asserting the safety of these blocks are clues to the assumptions the developer has made, indicating exactly what should be checked. To this end, a researcher might be interested in cargo-geiger, which can help identify unsafe blocks in a codebase as well as in its dependencies.
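For a quick first pass before reaching for a dedicated tool, the idea can be sketched in a few lines of Rust. This is a deliberately naive text scan, not how cargo-geiger works; the function name find_unsafe is hypothetical. It does not handle block comments or string literals, so treat its output as a list of places to start reading, not as an audit.

```rust
/// Hypothetical helper: returns (1-based line number, trimmed line) for every
/// line that uses the `unsafe` keyword outside of a `//` line comment.
fn find_unsafe(src: &str) -> Vec<(usize, String)> {
    src.lines()
        .enumerate()
        .filter_map(|(i, line)| {
            // Ignore anything after `//` so commented-out mentions are skipped.
            let code = line.split("//").next().unwrap_or("");
            // Split on non-identifier characters so `not_unsafe_ident` does not match.
            let hit = code
                .split(|c: char| !(c.is_alphanumeric() || c == '_'))
                .any(|tok| tok == "unsafe");
            if hit {
                Some((i + 1, line.trim().to_string()))
            } else {
                None
            }
        })
        .collect()
}
```

Feeding each source file’s contents to find_unsafe yields the lines worth reviewing by hand, along with any nearby safety comments that document the developer’s assumptions.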
Takeaways for Grapl’s Multi-tenant Architecture
This research was critical to understanding what strategies work best for hardening our multi-tenant architecture. Based on this work, we concluded there should be a focus on hardening the guest operating system. This limits an attacker’s ability to exploit the guest kernel, thus cutting off a considerable attack surface.
Given the difficulty in exploiting Firecracker even with control over the kernel, we feel confident in our solution.
Security research is critical to Grapl as a company. It helps us keep customer data safe, understand the technology we use at a deeper level, and think through advanced attack scenarios. As part of this research we generated ideas for detection logic, areas for further hardening, and more, which feeds back into product development.
Ultimately, we walk away from this research with a very positive view of Firecracker, a much deeper understanding of its internals, and confidence in our mitigations.
My amazing colleagues at Grapl:
Ian Nickles, for his help with instrumentation, Rust, and general research.
Andréa, for her incredible work on the diagrams.
Colin O’Brien, for his help with Rust.
Max Wittek, for his help with Firecracker.