Use hardware/software co-designed object-level coherency rather than CXL 3.0 cacheline-level coherency.
Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders
This paper proposes $M^2NDP$, a low-latency, general-purpose NDP offloading architecture. It combines memory-mapped functions ($M^2func$) with memory-mapped μthreading ($M^2\mu thr$).
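To make the memory-mapped function idea concrete, here is a toy Python model, not the paper's implementation. It assumes a reserved address range in which a write is treated as a function launch instead of a data store, and a read from the same address returns the result. `ToyExpander`, `func_base`, `slot_size`, and the addresses are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Optional


class ToyExpander:
    """Toy model of a memory expander with a reserved 'function' address range."""

    def __init__(self, func_base: int, slot_size: int = 64) -> None:
        self.func_base = func_base            # start of the reserved range (hypothetical)
        self.slot_size = slot_size            # one address slot per registered function
        self.memory: Dict[int, int] = {}      # ordinary data memory
        self.funcs: Dict[int, Callable[[int], int]] = {}
        self.results: Dict[int, int] = {}

    def register(self, slot: int, fn: Callable[[int], int]) -> int:
        """Bind fn to a slot and return the address that triggers it."""
        self.funcs[slot] = fn
        return self.func_base + slot * self.slot_size

    def _slot_of(self, addr: int) -> Optional[int]:
        """Return the function slot for addr, or None for ordinary memory."""
        if addr < self.func_base:
            return None
        slot = (addr - self.func_base) // self.slot_size
        return slot if slot in self.funcs else None

    def write(self, addr: int, value: int) -> None:
        slot = self._slot_of(addr)
        if slot is None:
            self.memory[addr] = value                      # plain store
        else:
            self.results[slot] = self.funcs[slot](value)   # store acts as a launch

    def read(self, addr: int) -> int:
        slot = self._slot_of(addr)
        if slot is None:
            return self.memory.get(addr, 0)                # plain load
        return self.results.get(slot, 0)                   # load returns the result


def demo() -> int:
    dev = ToyExpander(func_base=0x1000)
    dev.write(0x10, 7)
    dev.write(0x18, 5)
    # Near-data "kernel": sum the first n 8-byte words starting at 0x10.
    sum_addr = dev.register(
        0, lambda n: sum(dev.memory.get(0x10 + 8 * i, 0) for i in range(n))
    )
    dev.write(sum_addr, 2)      # the host launches the function with an ordinary write
    return dev.read(sum_addr)   # the host fetches the result with an ordinary read
```

In this sketch the host only ever issues plain reads and writes; the device decides from the address whether an access is data or a function call, so no separate command queue or doorbell is modeled.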
ClickHouse IOUring
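- io_uring on the file-system path: instrument it with a BPF hook that carries the io_uring context, so that a piece of batched-read code is dispatched at the lowest layer, the NVMe layer. On the socket path, io_uring with XDP dispatches requests one at a time. This cuts through two layers: the io_uring layer, and the layer where the XDP and XRP hooks are attached.
- From MergeTree to ReplicatedMergeTree: the idea is to use io_uring to batch some of the read requests that arrive over sockets.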
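The difference between one-at-a-time dispatch and batched dispatch can be sketched with a toy model. This is not io_uring and uses no BPF or XDP: `ToyRing` is a hypothetical stand-in that only mimics the pattern of queuing several reads and then submitting them in one call, with `submit_calls` acting as a rough proxy for the number of kernel crossings. It assumes a POSIX system, since it relies on `os.pread`.

```python
import os
import tempfile
from typing import List, Tuple


class ToyRing:
    """Toy submission queue: queue reads, then hand them over in one submit()."""

    def __init__(self, fd: int) -> None:
        self.fd = fd
        self.pending: List[Tuple[int, int]] = []   # queued (offset, length) pairs
        self.submit_calls = 0                      # proxy for kernel crossings

    def prep_read(self, offset: int, length: int) -> None:
        self.pending.append((offset, length))

    def submit(self) -> List[bytes]:
        """Execute every queued read and count this as a single submission."""
        self.submit_calls += 1
        done = [os.pread(self.fd, length, offset) for offset, length in self.pending]
        self.pending.clear()
        return done


def demo() -> Tuple[int, int, bool]:
    requests = [(0, 4), (4, 4), (8, 4), (12, 4)]
    with tempfile.TemporaryFile() as f:
        f.write(b"abcdefghijklmnop")
        f.flush()
        fd = f.fileno()

        # One request per submission.
        single = ToyRing(fd)
        single_out: List[bytes] = []
        for offset, length in requests:
            single.prep_read(offset, length)
            single_out += single.submit()

        # All requests queued first, then one submission.
        batched = ToyRing(fd)
        for offset, length in requests:
            batched.prep_read(offset, length)
        batched_out = batched.submit()

    return single.submit_calls, batched.submit_calls, single_out == batched_out
```

Both paths return the same data; only the number of submissions differs (four versus one for the four requests in `demo`), which is the cost the batching idea in the notes is trying to reduce.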