What's the outcome of Network on Chip
The Cache Home Agent (CHA) consists of an Address Generation Unit (AGU) and an Address Translation Unit (ATU). The AGU is responsible for generating the physical address from the virtual address. The ATU is responsible for translating the physical address to the cache line address. The scheduling mechanism in the CPU includes:
- the fastest route from one CHA to another.
- the scheduler for fetch-exclusive, fetch-shared, and invalidate requests before sharing or upgrading to a directory-based coherence protocol.
- CXL-related: how to save messages for remote fabric accesses?
- Is the optical fabric Intel showed at Hot Chips good enough for in-rack communication?
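For the first item in the list above, here is a minimal sketch of how a route between two CHAs could be computed. It assumes the CHAs sit on a 2D mesh and that packets use dimension-order routing (X first, then Y); the actual topology and routing order on a given CPU may differ. The names `Tile`, `xy_route`, and `hop_count` are hypothetical and exist only for illustration.

```python
from typing import List, Tuple

# Assumption: each CHA is identified by its (column, row) position on a 2D mesh.
Tile = Tuple[int, int]


def xy_route(src: Tile, dst: Tile) -> List[Tile]:
    """Return every tile visited from src to dst, moving along X first, then Y."""
    x, y = src
    path = [src]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step_y
        path.append((x, y))
    return path


def hop_count(src: Tile, dst: Tile) -> int:
    """Minimal number of mesh hops: the Manhattan distance between the tiles."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


if __name__ == "__main__":
    route = xy_route((0, 0), (2, 1))
    print(route, "hops:", hop_count((0, 0), (2, 1)))
```

On a mesh without faults or congestion, any dimension-order route already has the minimal hop count, so "fastest" in practice depends on contention rather than on path length alone.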
Software hint using CXL.io to set the remote caching policy
The software hint tells the remote side which caching policy to use. SDM's hint is a 4-bit vendor-specific field in the CXL.io request to the mailbox that lets the shared cacheline stay there to save RTTs. But it is clearly an abuse of the mailbox: managing individual cachelines through a control flow that throttles the CXL bandwidth is not feasible. I think ZeroPoint's demo, which utilizes the mailbox for compressed memory, is a better use of the mailbox. Programmatically hinting the remote caching policy is a good idea, but I think it is a sophisticated timing problem that requires an observability tool for memory requests.
RoCC or MMIO for accelerators [4]
RoCC is useful for a moderately cohesive accelerator. If the accelerator is not cohesive, it's better to use MMIO. The strongly cohesive part should be embedded into the pipeline of the processor. Most of UCB's work uses RoCC, which is better for abstraction and for generators written in Chisel, but it is not that high-performance, because it sacrifices cycles to stalls for communication.
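To make the RoCC side concrete, the sketch below packs a RoCC command into a 32-bit RISC-V instruction word. The field layout (funct7, rs2, rs1, xd, xs1, xs2, rd, opcode) and the four custom opcodes follow the RoCC custom-instruction format as used by Rocket Chip; the function name `encode_rocc` and its defaults are hypothetical. The core stalls on such an instruction when `xd` is set and the accelerator has not yet returned a value, which is the communication cost mentioned above.

```python
# RISC-V custom-0 .. custom-3 major opcodes used for RoCC commands.
CUSTOM_OPCODES = {0: 0b0001011, 1: 0b0101011, 2: 0b1011011, 3: 0b1111011}


def encode_rocc(funct7: int, rd: int, rs1: int, rs2: int,
                xd: bool = True, xs1: bool = True, xs2: bool = True,
                custom: int = 0) -> int:
    """Pack one RoCC command into a 32-bit instruction word (illustrative helper)."""
    for name, value, width in (("funct7", funct7, 7), ("rd", rd, 5),
                               ("rs1", rs1, 5), ("rs2", rs2, 5)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (funct7 << 25          # accelerator-defined function code
            | rs2 << 20           # second source register
            | rs1 << 15           # first source register
            | int(xd) << 14       # 1 if the accelerator writes rd
            | int(xs1) << 13      # 1 if rs1 is read from the integer register file
            | int(xs2) << 12      # 1 if rs2 is read from the integer register file
            | rd << 7             # destination register
            | CUSTOM_OPCODES[custom])


if __name__ == "__main__":
    # funct7=1 on custom-0 with rd=x12, rs1=x10, rs2=x11
    print(hex(encode_rocc(1, rd=12, rs1=10, rs2=11)))
```

An MMIO accelerator needs no such encoding: the core issues ordinary loads and stores to the device's register addresses, which decouples it from the pipeline at the cost of going through the memory system.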
Problem of using Distributed Shared Memory
It is another distributed system on top of the current infrastructure. The Fabric Manager will manage fault tolerance, load balancing, and data consistency. It still requires software effort on the remote CPU to decode the local host's shared memory primitives. The CXL fault tolerance work from SOSP '23 provides a memory RPC for a single writer and a single reader and maps the ownership idea onto memory requests.
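As a rough illustration of the single-writer single-reader idea, the sketch below implements a ring of fixed-size message slots over a flat buffer. This is not the SOSP '23 design: the class `SpscRing`, its slot layout, and the use of a `bytearray` in place of CXL shared memory are all assumptions made for the example. Ownership is expressed by letting only the writer update `tail` and only the reader update `head`.

```python
from typing import Optional


class SpscRing:
    """Single-writer single-reader ring of fixed-size slots (illustrative only)."""

    def __init__(self, slots: int, slot_size: int):
        if not 2 <= slot_size <= 256:
            raise ValueError("slot_size must be in 2..256 (1 length byte + payload)")
        self.slots = slots
        self.slot_size = slot_size
        self.buf = bytearray(slots * slot_size)  # stand-in for shared memory
        self.head = 0  # messages consumed; owned (written) by the reader only
        self.tail = 0  # messages produced; owned (written) by the writer only

    def send(self, msg: bytes) -> bool:
        """Writer side: copy msg into the next slot, or return False if full."""
        if len(msg) > self.slot_size - 1:
            raise ValueError("message does not fit in one slot")
        if self.tail - self.head == self.slots:
            return False
        off = (self.tail % self.slots) * self.slot_size
        self.buf[off] = len(msg)
        self.buf[off + 1:off + 1 + len(msg)] = msg
        # On real shared memory a flush or fence would be needed here so the
        # payload becomes visible before the new tail does.
        self.tail += 1
        return True

    def recv(self) -> Optional[bytes]:
        """Reader side: return the oldest message, or None if the ring is empty."""
        if self.head == self.tail:
            return None
        off = (self.head % self.slots) * self.slot_size
        length = self.buf[off]
        msg = bytes(self.buf[off + 1:off + 1 + length])
        self.head += 1  # hands the slot back to the writer
        return msg


if __name__ == "__main__":
    ring = SpscRing(slots=2, slot_size=8)
    ring.send(b"ping")
    print(ring.recv())
```

Because each index has exactly one owner, a crash of one side leaves the other side's index intact, which is one reason the single-writer single-reader restriction is attractive for fault tolerance.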
References
- SDM: Sharing-Enabled Disaggregated Memory System with Cache Coherent Compute Express Link
- Demystifying CXL with Genuine Devices
- Intel Optics Graph Processor PoC
- MMIO Peripherals