Introduction to CXL Type 1

Guided Use Case
[1] and [2] describe QEMU's implementation of dm-crypt for LUKS: every device-mapper target over a physical block device needs a key plus either a crypto accelerator or a software crypto implementation to decrypt the data. We implement a crypto accelerator with CXL Type 1 semantics on top of the virtio-crypto-pci framework. We also want to emulate the crypto device entering a malfunctioning state or being unplugged; in that case the kernel gets back the DMAed data marked with the ATS bit and resumes with the CPU software crypto implementation.
Device emulation
DMA and memory access
Create a CacheMemRegion that maps a specific SPP region. Each region backs one mapping of a set of CXL.cache cachelines on a CXL device.
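As a rough illustration of the address arithmetic such a mapping needs, here is a minimal sketch in C. It relies on Intel SPP write-protecting 4 KiB pages at 128-byte sub-page granularity (32 sub-pages per page). The `CacheMemRegionPage` type, the helper names, and the plain 32-bit bitmap are hypothetical and only for illustration; they are neither the hardware SPP table format nor the project's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES    4096u
#define SUBPAGE_SIZE_BYTES 128u  /* SPP write-protection granularity */
#define SUBPAGES_PER_PAGE  (PAGE_SIZE_BYTES / SUBPAGE_SIZE_BYTES) /* 32 */

/* Hypothetical per-page descriptor, not QEMU's or the project's type. */
typedef struct {
    uint64_t page_base;      /* 4 KiB aligned base address of the page */
    uint32_t write_allowed;  /* bit i set => sub-page i is writable */
} CacheMemRegionPage;

/* Index (0..31) of the 128-byte sub-page that contains addr. */
static unsigned spp_subpage_index(uint64_t addr)
{
    return (unsigned)((addr % PAGE_SIZE_BYTES) / SUBPAGE_SIZE_BYTES);
}

/* Clear the write permission of the sub-page containing addr. */
static void spp_write_protect(CacheMemRegionPage *p, uint64_t addr)
{
    assert(addr / PAGE_SIZE_BYTES == p->page_base / PAGE_SIZE_BYTES);
    p->write_allowed &= ~(UINT32_C(1) << spp_subpage_index(addr));
}

/* Restore the write permission of the sub-page containing addr. */
static void spp_write_allow(CacheMemRegionPage *p, uint64_t addr)
{
    assert(addr / PAGE_SIZE_BYTES == p->page_base / PAGE_SIZE_BYTES);
    p->write_allowed |= UINT32_C(1) << spp_subpage_index(addr);
}

/* Non-zero if the sub-page containing addr is currently writable. */
static int spp_is_writable(const CacheMemRegionPage *p, uint64_t addr)
{
    return (p->write_allowed >> spp_subpage_index(addr)) & 1u;
}
```

With this layout, protecting address 0x1080 in a page based at 0x1000 affects only sub-page 1 (bytes 0x1080 to 0x10FF) and leaves the other 31 sub-pages writable.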
Crypto operations
When the kernel calls a crypto operation, the encrypt/decrypt work is actually offloaded to the Type 1 accelerator through CXL.io, which tells the device to carry out the operation on an arbitrary SPP sub-page. The accelerator first takes ownership of that sub-page in the CacheMemRegion and notifies the host. Eventually, the host gets the SPP's cacheline in the shared state.
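The ownership hand-off described here can be sketched as a small state machine. The following C fragment is only a model under stated assumptions: the `EmuLine` type, the function names, and the XOR "cipher" are hypothetical placeholders (a real accelerator would run an actual cipher), and the MESI transitions shown are one plausible reading of the flow, not the project's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHELINE_BYTES 64

typedef enum { LINE_INVALID, LINE_SHARED, LINE_EXCLUSIVE, LINE_MODIFIED } LineState;

/* Hypothetical model of one emulated cacheline and both agents' views. */
typedef struct {
    uint8_t   data[CACHELINE_BYTES];
    LineState device_state;
    LineState host_state;
    int       host_notified;
} EmuLine;

/* Placeholder transform standing in for a real cipher (assumption). */
static void toy_cipher(uint8_t *buf, size_t len, uint8_t key)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}

/* Device side: take ownership, operate in place, notify the host. */
static void device_crypto_op(EmuLine *l, uint8_t key)
{
    l->host_state = LINE_INVALID;      /* host copy is invalidated */
    l->device_state = LINE_EXCLUSIVE;  /* device owns the line */
    toy_cipher(l->data, CACHELINE_BYTES, key);
    l->device_state = LINE_MODIFIED;   /* line is now dirty on the device */
    l->host_notified = 1;
}

/* Host side: reading the result downgrades both copies to Shared. */
static void host_read_line(EmuLine *l, uint8_t *out)
{
    if (l->device_state == LINE_MODIFIED || l->device_state == LINE_EXCLUSIVE)
        l->device_state = LINE_SHARED; /* write-back is implied in this model */
    l->host_state = LINE_SHARED;
    memcpy(out, l->data, CACHELINE_BYTES);
}
```

After `device_crypto_op` followed by `host_read_line`, both agents hold the line in the Shared state and the host sees the transformed data.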
Cache coherency emulation
struct D2HDataReq {
    D2H DataHeader: 24 bits;
    opcode: 4 bits;
    CXL.cache channel crediting;
}
struct CXLCache {
    data: 64 bytes;
    MESI: 4 bits;
    ATS: 64 bits;
    [D2HDataReq; n]: remaining bits;
}
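One possible concrete C rendering of this layout is sketched below. Several details are assumptions rather than part of the design above: the crediting field is given 4 bits so that a request fits in 32 bits, the whole record is sized to one 128-byte SPP sub-page, a hypothetical `n_reqs` counter tracks queue occupancy, and the field packing relies on the usual GCC/Clang bit-field layout (which the C standard leaves implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

#define CXL_CACHELINE_BYTES 64
#define SPP_SUBPAGE_BYTES   128
#define D2H_QUEUE_SLOTS     12  /* (128 - 80) / 4, see the layout below */

typedef struct {
    uint32_t header : 24; /* D2H DataHeader, 24 bits */
    uint32_t opcode : 4;  /* opcode, 4 bits */
    uint32_t credit : 4;  /* channel crediting; the width is an assumption */
} D2HDataReq;

typedef struct {
    uint8_t    data[CXL_CACHELINE_BYTES]; /* offset 0: cacheline payload */
    uint8_t    mesi;                      /* offset 64: low 4 bits used */
    uint8_t    n_reqs;                    /* offset 65: hypothetical counter */
    uint8_t    pad[6];                    /* explicit padding up to offset 72 */
    uint64_t   ats;                       /* offset 72: ATS field, 64 bits */
    D2HDataReq reqs[D2H_QUEUE_SLOTS];     /* offset 80: remaining space */
} CXLCache;

/* These hold on common ABIs; they are layout assumptions, not guarantees. */
_Static_assert(sizeof(D2HDataReq) == 4, "request should pack into 32 bits");
_Static_assert(sizeof(CXLCache) == SPP_SUBPAGE_BYTES, "record should fill one sub-page");

/* Append a request to the in-line queue; returns -1 when the queue is full. */
static int cxlcache_push_req(CXLCache *c, uint32_t header, uint8_t opcode)
{
    if (c->n_reqs >= D2H_QUEUE_SLOTS)
        return -1;
    D2HDataReq *r = &c->reqs[c->n_reqs++];
    r->header = header & 0xFFFFFFu; /* keep the low 24 bits */
    r->opcode = opcode & 0xFu;      /* keep the low 4 bits */
    r->credit = 0;
    return 0;
}
```

Under these assumptions, the 64-byte payload plus 16 bytes of state leave room for twelve queued requests per sub-page.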
We keep metadata alongside the data, relying on Intel SPP write-protection support to mark accesses to an arbitrary cacheline in the SPP. All transaction descriptions and queuing have to fit in the 64 bytes of residue data in the SPP. Each operation taken from the queue changes the MESI bits and updates the write protection for the sub-page and the root complex, or has other side effects such as switch changes.
Host and device requests are not scheduled in a single FIFO: the host, which needs to see the data, gets higher priority. H2D requests are therefore consumed first, and D2H requests are then handled in FIFO order. All operations go through the interface operations on CXLCache.
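The scheduling rule above (H2D first, then D2H in FIFO order) can be sketched with two small ring buffers. The type and function names here are hypothetical, and the fixed queue capacity of 16 is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stdint.h>

#define QUEUE_CAP 16  /* power of two, so free-running counters wrap cleanly */

typedef enum { DIR_H2D, DIR_D2H } ReqDir;

typedef struct {
    ReqDir   dir;
    uint8_t  opcode;
    uint32_t id;
} EmuReq;

typedef struct {
    EmuReq   slots[QUEUE_CAP];
    unsigned head, tail; /* head == tail means the queue is empty */
} ReqFifo;

typedef struct {
    ReqFifo h2d; /* host-to-device requests, always served first */
    ReqFifo d2h; /* device-to-host requests, served in FIFO order */
} ReqScheduler;

static int fifo_push(ReqFifo *q, EmuReq r)
{
    if (q->tail - q->head == QUEUE_CAP)
        return -1; /* full */
    q->slots[q->tail++ % QUEUE_CAP] = r;
    return 0;
}

static int fifo_pop(ReqFifo *q, EmuReq *out)
{
    if (q->head == q->tail)
        return -1; /* empty */
    *out = q->slots[q->head++ % QUEUE_CAP];
    return 0;
}

/* Route a request to the queue that matches its direction. */
static int sched_submit(ReqScheduler *s, EmuReq r)
{
    return fifo_push(r.dir == DIR_H2D ? &s->h2d : &s->d2h, r);
}

/* Pick the next request: any pending H2D request wins over D2H. */
static int sched_next(ReqScheduler *s, EmuReq *out)
{
    if (fifo_pop(&s->h2d, out) == 0)
        return 0;
    return fifo_pop(&s->d2h, out);
}
```

If two D2H requests are submitted before one H2D request, `sched_next` still returns the H2D request first and then the two D2H requests in their submission order.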
Taking exclusiveness-able
We mark the Transportation ATS bit as exclusiveness-able and copy the cacheline into another map. Once the unplug has been emulated, the cacheline is copied back so that the kernel can keep operating on it and resume the software crypto calculation.
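A minimal sketch of this save-and-restore step follows, assuming a single flag bit in the 64-bit ATS field and one shadow slot per cacheline. The bit position, the `OwnedLine` and `ShadowSlot` types, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHELINE_BYTES    64
#define ATS_EXCLUSIVE_ABLE (UINT64_C(1) << 0) /* hypothetical bit position */

typedef struct {
    uint8_t  data[CACHELINE_BYTES];
    uint64_t ats;
} OwnedLine;

/* Hypothetical entry of the "other map" that holds the saved copy. */
typedef struct {
    uint8_t saved[CACHELINE_BYTES];
    int     valid;
} ShadowSlot;

/* Mark the line and keep a copy before the device starts working on it. */
static void device_take_exclusive(OwnedLine *line, ShadowSlot *shadow)
{
    line->ats |= ATS_EXCLUSIVE_ABLE;
    memcpy(shadow->saved, line->data, CACHELINE_BYTES);
    shadow->valid = 1;
}

/* On emulated unplug, restore the saved copy so software crypto can resume.
 * Returns -1 if there is nothing to restore. */
static int device_unplug(OwnedLine *line, ShadowSlot *shadow)
{
    if (!shadow->valid)
        return -1;
    memcpy(line->data, shadow->saved, CACHELINE_BYTES);
    line->ats &= ~ATS_EXCLUSIVE_ABLE;
    shadow->valid = 0;
    return 0;
}
```

Whatever partial result the device left in the line is discarded on unplug, so the kernel's software path starts again from the original input.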
How to emulate the eviction
We have two proposals:
- Use PEBS to watch for cache evictions of the physical address of an SPP.
- Use sub-page pinning (page_get_fast) to pin a physical address within the last-level cache. [7]
The code is currently being developed at https://github.com/SlugLab/Drywall/; for pitfalls, see https://asplos.dev/wordpress/2023/11/27/intel-sub-page-write-protection-cai-keng/
Alternative implementation
Maybe we should instead limit which addresses CXL.cache operations may touch, write a memory-sanitization-assisted JIT compiler, and use LAM (Linear Address Masking).
References
- https://www.os.ecc.u-tokyo.ac.jp/papers/2021-cloud-ozawa.pdf
- https://people.redhat.com/berrange/kvm-forum-2016/kvm-forum-2016-security.pdf
- https://yhbt.net/lore/all/[email protected]/T/
- https://privatewiki.opnfv.org/_media/dpacc/a_new_framework_of_cryptography_virtio_driver.pdf
- https://github.com/youcan64/spp_patched_qemu
- https://github.com/youcan64/spp_patched_linux
- https://people.kth.se/~farshin/documents/slice-aware-eurosys19.pdf