November 9, 2023 (updated November 13, 2023) by vickieGPT

GPU Slicing Proposal Using CXLMemUring

GPU slicing with NVIDIA Multi-Instance GPU (MIG) is far too coarse-grained: an A100 or H100 can be split into at most seven GPU instances.
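To make the granularity concrete, here is a small Python sketch that picks the smallest MIG profile able to hold a request and reports how much of that slice's compute the request leaves idle. The profile table is an illustrative subset of the A100 40GB MIG profiles (check NVIDIA's MIG User Guide for your GPU and driver). The helper names and the request sizes are hypothetical.

```python
from fractions import Fraction

# Illustrative subset of A100 40GB MIG profiles:
# name -> (compute slices out of 7, memory in GB). Verify against NVIDIA's docs.
A100_40GB_PROFILES = {
    "1g.5gb": (1, 5),
    "2g.10gb": (2, 10),
    "3g.20gb": (3, 20),
    "4g.20gb": (4, 20),
    "7g.40gb": (7, 40),
}
COMPUTE_SLICES = 7


def smallest_fitting_profile(compute_fraction, mem_gb):
    """Return the profile with the fewest compute slices that covers the request."""
    candidates = [
        (g, mem, name)
        for name, (g, mem) in A100_40GB_PROFILES.items()
        if Fraction(g, COMPUTE_SLICES) >= compute_fraction and mem >= mem_gb
    ]
    if not candidates:
        raise ValueError("request does not fit in a single MIG instance")
    _, _, name = min(candidates)  # fewest slices first, then least memory
    return name


def stranded_compute(compute_fraction, mem_gb):
    """Fraction of the allocated instance's compute that the request leaves idle."""
    name = smallest_fitting_profile(compute_fraction, mem_gb)
    g, _ = A100_40GB_PROFILES[name]
    allocated = Fraction(g, COMPUTE_SLICES)
    return name, 1 - compute_fraction / allocated


# Hypothetical request: needs 2% of the GPU's compute and 1 GB of memory.
print(stranded_compute(Fraction(1, 50), 1))
```

For that hypothetical request, even the smallest instance (`1g.5gb`, one seventh of the compute) leaves 43/50 of its compute idle, which is the kind of waste finer-grained slicing aims to avoid.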

In the state of the art, when a GPU serves requests that arrive through a service mesh, kernel launch time and data movement normally account for most of the execution time. Pre-execution with an MLIR JIT can statically optimize away the kernel launch, but each GPU context then has to be run as a coroutine.
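The coroutine idea can be illustrated with a toy Python model: each request's GPU context is a generator that yields whenever it reaches a data-movement or kernel step, and a round-robin scheduler resumes another context instead of blocking. The names `request_context` and `round_robin` and the copy/kernel phases are hypothetical. Nothing here touches a real GPU, MLIR, or CXLMemUring; it only shows the cooperative interleaving.

```python
from collections import deque


def request_context(name, num_kernels):
    """Hypothetical GPU context for one request, modeled as a generator coroutine.

    Each yield records one step and hands control back to the scheduler,
    so another context can make progress while this one waits.
    """
    for k in range(num_kernels):
        yield (name, f"copy-in {k}")  # data movement step
        yield (name, f"kernel {k}")   # pre-compiled kernel; launch setup not modeled


def round_robin(contexts):
    """Resume each context in turn until every one has finished; return the event trace."""
    ready = deque(contexts)
    trace = []
    while ready:
        ctx = ready.popleft()
        try:
            trace.append(next(ctx))
        except StopIteration:
            continue  # this context is done; drop it
        ready.append(ctx)
    return trace


print(round_robin([request_context("A", 2), request_context("B", 1)]))
```

The trace interleaves the two requests (A's copy, B's copy, A's kernel, B's kernel, and so on), so no context holds the device while it waits. Cooperative switching of this kind avoids preemption, at the cost of relying on each context to yield at the right points.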