uManyCore: ISCA23

Writing a microkernel for the village that can offload operators is definitely the right direction. How to define access to the memory node (MN) has long been debated: either a so-called semi-disaggregated MN, as in Yiying Zhang's LegoOS and Clio, or purely one-sided RDMA built on earlier distributed-systems ideas, as in FUSEE.
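To make the trade-off between the two access models concrete, here is a minimal toy sketch, not code from uManyCore, LegoOS, Clio, or FUSEE. It assumes a hypothetical `MemoryNode` class that simply counts network round trips, and compares a linked-list lookup done with one-sided reads against the same lookup offloaded to the memory node as a single operator. All names (`MemoryNode`, `read`, `offload_walk`, `one_sided_walk`, `build_list`) are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    """Toy memory node (hypothetical): address -> (value, next_addr)."""
    cells: dict = field(default_factory=dict)
    round_trips: int = 0  # network round trips served so far

    def read(self, addr):
        """One-sided read: one round trip per call, no MN-side logic."""
        self.round_trips += 1
        return self.cells[addr]

    def offload_walk(self, head, key):
        """Offloaded operator: the whole traversal runs on the MN,
        so the client pays for a single round trip."""
        self.round_trips += 1
        addr = head
        while addr is not None:
            value, nxt = self.cells[addr]
            if value == key:
                return addr
            addr = nxt
        return None


def one_sided_walk(mn, head, key):
    """Client-driven traversal: every hop is a separate one-sided read."""
    addr = head
    while addr is not None:
        value, nxt = mn.read(addr)
        if value == key:
            return addr
        addr = nxt
    return None


def build_list(n):
    """Linked list at addresses 0..n-1; node i holds value i * 10."""
    return {i: (i * 10, i + 1 if i + 1 < n else None) for i in range(n)}


if __name__ == "__main__":
    a = MemoryNode(build_list(8))
    one_sided_walk(a, 0, 50)
    b = MemoryNode(build_list(8))
    b.offload_walk(0, 50)
    print("one-sided round trips:", a.round_trips)
    print("offloaded round trips:", b.round_trips)
```

In this toy model the one-sided lookup costs one round trip per node visited, while the offloaded operator costs one round trip regardless of list length. The sketch ignores MN compute cost, bandwidth, and concurrency, all of which matter in a real system.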

This work leverages hardware metrics from both the NIC and memory to accelerate the request queue through which the village accelerator accesses remote memory. Such an access can be either memory-dependent or a plain bulk load from the memory pool.

However, this work does not provide a distributed kernel paradigm that orchestrates everything the CXL accelerators can fetch from remote memory, nor does it show how hardware hints can guide the distributed kernel and the offloaded operator bytecode.

Reference

  1. https://github.com/dmemsys/FUSEE