Confidential Computing on IBM Protected Execution Facility

This year’s EuroSys conference included papers with significant contributions to the growing field of Confidential Computing.
This blog post reviews Confidential Computing on IBM Protected Execution Facility (PEF). It follows the paper Confidential Computing for OpenPOWER by Hunt et al., which describes the design and implementation of IBM PEF.
Although IBM PEF was presented earlier at the recent OpenPOWER Summit and the Linux Security Summit, this is the first conference paper that describes it in detail.

Confidential Computing meets OpenPOWER

The POWER9 chip introduced Confidential Computing on IBM Protected Execution Facility. With this step, IBM joined the growing list of vendors that earlier introduced support for confidential computing in their product line.
These include Intel SGX, Intel TDX, AMD SEV, and the recently announced Arm Confidential Compute Architecture, as well as IBM’s own Z Secure Execution system.

The authors stress that the product cycle and legacy architecture of server class platforms influenced the design of PEF. This sets PEF apart from promising “greenfield” architectures such as Keystone.

PEF addresses three key challenges that hinder the broader adoption of TEEs:

  • Balancing between isolation and confidentiality. Existing server class processors make TEEs expensive by bundling confidentiality and isolation.
  • Reusing existing security technologies. Existing TEE implementations introduce many new security components and consequently either require changes to applications or trust in new components.
  • Lifecycle management of secure entities run inside the TEE. This is a crucial aspect missing today, especially in cloud deployments, where the adoption of TEEs promises to unlock new use cases.

The designers of PEF address these challenges and add a new most-privileged execution mode (the Protected Execution Ultravisor). This helps decouple isolation from confidentiality and integrity. It also helps simplify the life cycle of Secure Virtual Machines (SVMs) by removing the need for runtime attestation with the processor vendor.
Note that PEF's security architecture reuses the tried and tested Trusted Platform Module (TPM).

Security Model and Design Goals

The threat model assumed for PEF is similar to the one for SGX.
It assumes an uncompromised platform, where the adversary has limited physical access reflecting the common server maintenance tasks.
Similar to SGX, the designers of PEF exclude side channel attacks and refer to the need for a more comprehensive solution to prevent them.

PEF prevents the exposure of sensitive state from SVMs to both the hypervisor and other SVMs, and it allows users to verify the validity of the TEE. In this context, a computation is valid if no unauthorized party has modified it and no unauthorized system can run it.

Protected Execution Facility

To implement PEF with minimal changes to existing components, such as the hypervisor, the design introduces a new CPU state, the secure state, and new firmware to manage it, the Protected Execution Ultravisor. I can’t help but mention here the famous aphorism that Butler Lampson popularized and attributes to David Wheeler: Another level of indirection can solve all problems in computer science.
Support for the ultravisor is already available in the Linux kernel v5.12.0.

The secure state is the highest privilege state in the POWER architecture. It complements the three pre-existing and mutually exclusive states of the Machine State Register: problem (for applications), privileged non-hypervisor (for OSs) and hypervisor.
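The relationship between these states can be sketched as a small lookup table. The encoding below, three Machine State Register bits named S, HV and PR, is an assumption taken from the Linux kernel's ultravisor documentation rather than from this post, so it should be checked against that document before being relied upon. The function names are hypothetical.

```python
# Sketch: decode the (S, HV, PR) bits of the Machine State Register into a
# privilege state. ASSUMPTION: the table follows the Linux kernel's
# ultravisor documentation; it is not stated in this blog post.
PRIVILEGE_BY_MSR_BITS = {
    # (S, HV, PR): privilege state
    (0, 0, 1): "problem",
    (0, 0, 0): "privileged (OS)",
    (0, 1, 0): "hypervisor",
    (0, 1, 1): "problem (host)",
    (1, 0, 1): "problem (SVM)",
    (1, 0, 0): "privileged (SVM OS)",
    (1, 1, 0): "ultravisor",
    (1, 1, 1): "reserved",
}


def privilege_state(s: int, hv: int, pr: int) -> str:
    """Return the privilege state name for the given MSR bit values."""
    return PRIVILEGE_BY_MSR_BITS[(s, hv, pr)]


def is_secure(s: int) -> bool:
    """In this sketch, a thread is in the secure state exactly when S is set."""
    return s == 1
```

Under this assumed encoding, the ultravisor is the only state with both S and HV set, which is what places it above the hypervisor.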

The ultravisor maintains the isolation of the computation and associated data.
The ultravisor implementation uses 20 direct interfaces (ultracalls) and further uses 6 new hypervisor calls to start, stop and abort SVMs, communicate with the TPM, and perform memory management.
To support hypervisor paging of Normal Virtual Machines (NVMs) and hypervisor dump of SVMs, the ultravisor further supports moving secure page content to insecure memory and back. It encrypts pages in Galois/Counter Mode before moving them.
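The following is a minimal sketch of the page-out and page-in idea, not the ultravisor's implementation. Python's standard library has no AES-GCM, so the sketch swaps in a toy encrypt-then-MAC construction (a SHAKE-256 keystream plus HMAC-SHA256) purely to illustrate the property that matters here: a page that leaves secure memory is both unreadable and tamper-evident. All names, the page size and the key handling are hypothetical.

```python
import hashlib
import hmac
import secrets

PAGE_SIZE = 4096  # hypothetical page size for this sketch


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stream cipher built from SHAKE-256; a stand-in for AES in counter
    # mode, NOT the Galois/Counter Mode that the ultravisor uses.
    return hashlib.shake_256(key + nonce).digest(length)


def page_out(enc_key: bytes, mac_key: bytes, page: bytes) -> tuple[bytes, bytes, bytes]:
    """Encrypt and authenticate a page before it leaves secure memory."""
    nonce = secrets.token_bytes(12)  # fresh nonce for every page-out
    stream = _keystream(enc_key, nonce, len(page))
    ciphertext = bytes(p ^ k for p, k in zip(page, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce, ciphertext, tag


def page_in(enc_key: bytes, mac_key: bytes, nonce: bytes,
            ciphertext: bytes, tag: bytes) -> bytes:
    """Verify the tag first, then decrypt the page back into secure memory."""
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("page was modified while in normal memory")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))
```

The hypervisor only ever handles the nonce, ciphertext and tag. Any modification it makes to them causes `page_in` to fail rather than hand corrupted content back to the SVM.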

Architectural Support for PEF

The secure state, the ultravisor and partitioning the memory into secure and normal memory are the three core architectural changes supporting PEF. These changes create a new boot sequence when PEF is enabled, namely:

  1. Host-boot, the first firmware loaded to initialize the hardware;
  2. OPAL, which stands for OpenPOWER Abstraction Layer, a firmware component that provides hardware related services to the OS after it is booted;
  3. Ultravisor, introduced above;
  4. OPAL, once more;
  5. Host operating system.

While booting the system, OPAL generates a random key and passes it on to the ultravisor.
After communicating the key to the ultravisor, OPAL discards it. From then on, only the platform TPM and the ultravisor know the key, which allows them to communicate over a secure channel.
Thus, while the platform is booting, the trusted computing base (TCB) includes Host-boot, OPAL and the ultravisor; once the ultravisor is initialised, the TCB shrinks to only the ultravisor.
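The key handoff and the shrinking TCB can be modelled in a few lines. This is only a sketch: the class and function names are hypothetical, the 32-byte key size is an arbitrary choice, and the way the TPM receives the key (directly from OPAL) is an assumption, since the text only states that the TPM and the ultravisor end up sharing it.

```python
import secrets


class Component:
    """A platform component that may hold a copy of the channel key."""

    def __init__(self, name: str):
        self.name = name
        self.key = None


def boot_key_handoff():
    """Model OPAL creating, distributing and then discarding the key."""
    opal = Component("OPAL")
    tpm = Component("TPM")
    ultravisor = Component("ultravisor")

    opal.key = secrets.token_bytes(32)  # OPAL generates a random key
    tpm.key = opal.key                  # ASSUMPTION: OPAL provisions the TPM
    ultravisor.key = opal.key           # OPAL passes the key to the ultravisor
    opal.key = None                     # OPAL discards its copy
    return opal, tpm, ultravisor


def trusted_computing_base(ultravisor_initialised: bool) -> set[str]:
    """Return the TCB members before and after ultravisor initialisation."""
    if ultravisor_initialised:
        return {"ultravisor"}
    return {"Host-boot", "OPAL", "ultravisor"}
```

After `boot_key_handoff()` returns, the OPAL object no longer holds the key, which mirrors why OPAL can drop out of the TCB once the ultravisor is running.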

Integrity verification

Since all SVMs start their execution as an NVM, it is essential to verify both their integrity and the integrity of the platform.
In the case of PEF, verifying the platform means determining that it is trusted by the creator of the SVM.
An SVM is considered to maintain integrity if no unauthorized party has modified it and all of its initial parameters remain as the creator of the image specified them.

The enter secure mode (ESM) ultracall, a request to transition into an SVM, performs the verification. The ESM ultracall first copies into secure memory all of the memory associated with the NVM requesting the transition. Thus, the state cannot be modified between verification and execution. It then verifies the platform and the integrity of the SVM.

  • Platform verification consists of verifying that the firmware is in the correct state. Correct state means that the firmware is trusted, the hardware booted with secure boot enabled, and PEF is enabled on the platform. Note that the reference value is protected by the TPM.
  • Integrity of the SVM is checked through local attestation, based on the information contained in the ESM operand, a data structure describing the expected state of the SVM.
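The ordering of these steps, copy first and verify second, can be sketched as follows. This is an illustration under simplifying assumptions: platform verification is reduced to a boolean, and the expected state of the SVM is reduced to a single SHA-256 digest. The real ESM operand is a richer data structure whose layout is not reproduced here, and all names are hypothetical.

```python
import hashlib
import hmac


class VerificationError(Exception):
    """Raised when the platform or the SVM image fails verification."""


def enter_secure_mode(nvm_memory: bytearray, expected_digest: bytes,
                      platform_ok: bool) -> bytes:
    """Sketch of the ESM flow: copy, verify platform, verify integrity."""
    # 1. Copy first, so later writes to nvm_memory cannot change what
    #    was verified (hypothetical stand-in for the move to secure memory).
    secure_copy = bytes(nvm_memory)

    # 2. Platform verification, reduced here to a precomputed boolean.
    if not platform_ok:
        raise VerificationError("platform is not in the expected state")

    # 3. Integrity check against the value the creator of the SVM supplied
    #    (a single digest in this sketch, not the real ESM operand).
    actual = hashlib.sha256(secure_copy).digest()
    if not hmac.compare_digest(actual, expected_digest):
        raise VerificationError("SVM image does not match the expected state")

    return secure_copy
```

Because the check runs on `secure_copy`, modifying `nvm_memory` after the call has no effect on the image that executes as an SVM.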

Figure 1: Layout of the ESM operand.

TPMs are essential in the verification process. The ultravisor uses the TPM API in two cases. One is to establish a secure tunnel through the hypervisor. The other is to acquire the symmetric seed for the ESM operand associated with the ESM ultracall. When the ultravisor needs to use the TPM, it reflects a newly added hypervisor call to KVM.

Performance

PEF has a minimal performance impact on computation. This is primarily because SVMs do not use encryption to protect data in memory; such encryption degrades performance when memory access is not sequential.

Figure 2: SPEC CPU2017 benchmark results. The vertical bars indicate the min and max values of the runs.

However, the story is quite different when it comes to network performance.
The throughput achieved between NVMs running on non-PEF and PEF-enabled firmware is virtually the same, indicating no major impact on the network performance of NVMs.
For small messages, however, throughput with secure VMs is nearly 45% lower than with normal VMs.
This is most likely due to the overhead of the bounce buffers that the I/O path of SVMs uses, as well as the cost of context switching between the SVM and the host. The difference drops to ~10% as the message size grows and the number of context switches decreases.
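One way to see why the gap narrows with message size is a toy cost model in which an SVM pays a fixed extra cost per transaction (context switches) plus a small extra cost per byte (bounce-buffer copies). The function and every parameter value below are hypothetical and are not fitted to the measurements in the paper; only the message sizes are taken from the benchmark description.

```python
def throughput_loss(msg_bytes: int,
                    base_fixed_us: float, base_per_byte_us: float,
                    extra_fixed_us: float, extra_per_byte_us: float) -> float:
    """Fraction of throughput lost by an SVM relative to an NVM.

    Throughput is modelled as 1 / time-per-transaction, so the loss is
    extra_time / (base_time + extra_time).
    """
    base = base_fixed_us + base_per_byte_us * msg_bytes
    extra = extra_fixed_us + extra_per_byte_us * msg_bytes
    return extra / (base + extra)


# Arbitrary illustrative parameters (NOT measured values).
PARAMS = dict(base_fixed_us=20.0, base_per_byte_us=0.01,
              extra_fixed_us=9.0, extra_per_byte_us=0.001)

for size in (90, 270, 512, 4096, 16384):
    print(size, round(throughput_loss(size, **PARAMS), 3))
```

In this model the loss shrinks with message size whenever the fixed extra cost is large relative to the fixed base cost while the per-byte extra cost is small relative to the per-byte base cost, which is the qualitative behaviour described above.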

Figure 3: Network performance results. Each transaction is a request and the corresponding response. Message sizes: 90, 270, 512, 4K, and 16K bytes. Vertical bars indicate the min and max values of the runs.

Limitations and opportunities

The authors of the PEF paper stress that product cycles place constraints and limitations on the design of the new features.
One such limitation is the lack of hardware memory encryption. This allows an adversary to probe memory at boot time and observe the key passed from OPAL to the ultravisor.
On the other hand, such constraints make it possible to anticipate, to some extent, how features will evolve in upcoming iterations.
For example, Transparent Memory Encryption, announced for the upcoming POWER10, protects the confidentiality of memory from physical probing and eliminates this attack vector.

Missing migration support in the current version of the ultravisor is another important limitation. AMD recently announced migration of encrypted VMs, so we can expect that PEF will catch up with the competition.

The lack of dynamic allocation of secure memory (DASM) is another limitation that upcoming product cycles will resolve.
Currently, the absence of DASM increases the size of the ultravisor, which must also implement memory management.
The authors expect hardware support for DASM, memory over-commit and memory sharing between SVMs in the near future. This will further simplify the implementation of the ultravisor.

Final thoughts

IBM PEF is a VM-based confidential computing environment, conceptually similar to AMD SEV and Intel TDX.
Implementation details differ quite a bit. However, such solutions prepare a solid ground for deploying confidential cloud computing on a wider scale. If you need enterprise support for deploying and operating confidential VMs then just contact us!
