Cerebros: Evading the RPC Tax in Datacenters

Main Story

RPC plays a crucial role in distributed systems. At Bilibili, for example, a huge number of microservices pay the RPC tax to move data between services. SmartNIC, PIM, and programmable-switch solutions basically offload the computation to compute resources outside the host CPU. Current hardware optimizations focus mainly on the transport layer and rarely consider the whole execution process. Instruction supply is also a problem: all the control paths are injected into the binary that runs on the main OS.

Cerebros is an accelerator that can be attached to the NIC to read incoming RPC messages and hide their sends and receives by overlapping the operations. The CPU affinity logic in the OS does not fit this design.
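To make the benefit of overlapping concrete, here is a toy timing model, not the Cerebros design itself. It assumes every RPC has three stages (receive, compute, send) with fixed costs and unlimited buffering between stages; the function names and the numbers are hypothetical.

```python
def serial_time(n_rpcs, recv, compute, send):
    """Total time when each RPC does recv, compute, send back to back."""
    return n_rpcs * (recv + compute + send)


def overlapped_time(n_rpcs, recv, compute, send):
    """Total time when the three stages run as a pipeline.

    The first RPC pays the full latency; each later RPC adds only the
    slowest stage, because the other two stages overlap with it.
    """
    if n_rpcs == 0:
        return 0
    return (recv + compute + send) + (n_rpcs - 1) * max(recv, compute, send)


# Hypothetical per-stage costs in arbitrary time units.
print(serial_time(4, recv=2, compute=5, send=1))      # 32
print(overlapped_time(4, recv=2, compute=5, send=1))  # 23
```

In this model the sends and receives disappear from the steady-state cost whenever compute is the slowest stage, which is the sense in which they are "hidden".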

Problems

  1. The CAM table that maps each RPC type to the address of the called function can become congested.
  2. More control paths mean more ways for the software to fail.
  3. The reserved memory region has to be preallocated with RPC metadata in the NIC cache, which is wasteful compared with the current DMA buffer.
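One possible reading of problem 1 is capacity pressure: a CAM has a fixed number of entries, so once there are more RPC types than entries, some calls can no longer be dispatched in hardware. The sketch below models only that reading. `ToyCamTable`, its methods, and the software fallback are hypothetical names for illustration and are not taken from the paper.

```python
class ToyCamTable:
    """Fixed-capacity map from RPC type ID to handler address (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = {}

    def install(self, rpc_type, handler_addr):
        """Return True if the mapping fits, False if the table is full."""
        if rpc_type not in self._entries and len(self._entries) >= self.capacity:
            return False
        self._entries[rpc_type] = handler_addr
        return True

    def dispatch(self, rpc_type):
        """Return ('hw', addr) on a hit, ('sw', None) on a miss."""
        addr = self._entries.get(rpc_type)
        if addr is None:
            return ("sw", None)
        return ("hw", addr)


cam = ToyCamTable(capacity=2)
cam.install(rpc_type=1, handler_addr=0x1000)
cam.install(rpc_type=2, handler_addr=0x2000)
fits = cam.install(rpc_type=3, handler_addr=0x3000)  # table already full
print(fits)             # False
print(cam.dispatch(1))  # ('hw', 4096)
print(cam.dispatch(3))  # ('sw', None)
```

With many microservices sharing one NIC, every miss falls back to the software path, which also ties into problem 2: each fallback is one more control path to get right.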

RDMA pitfalls

In baseline commercial implementations such as Mellanox's, the NIC can already bypass the kernel to invoke the network stack; the OS just needs to use its thread register to wait for the QPs (queue pairs) to complete.
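The waiting side of that design can be sketched as a polling loop over a completion queue. This is a pure-Python stand-in, assuming a NIC that appends a completion entry when a work request finishes; `ToyCompletionQueue`, `WorkCompletion`, and `wait_for` are hypothetical names, not a real verbs binding (in C, the analogous call would be `ibv_poll_cq`).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkCompletion:
    wr_id: int   # identifier of the work request that finished
    ok: bool     # True if the transfer succeeded


class ToyCompletionQueue:
    """Stand-in for a NIC completion queue; a real NIC fills it via DMA."""

    def __init__(self):
        self._entries = deque()

    def nic_complete(self, wr_id, ok=True):
        """Simulate the NIC reporting that a work request finished."""
        self._entries.append(WorkCompletion(wr_id, ok))

    def poll(self, max_entries):
        """Return up to max_entries completions without blocking."""
        out = []
        while self._entries and len(out) < max_entries:
            out.append(self._entries.popleft())
        return out


def wait_for(cq, pending, max_polls=1000):
    """Poll until every wr_id in pending has completed; return polls used."""
    pending = set(pending)
    for polls in range(1, max_polls + 1):
        for wc in cq.poll(max_entries=16):
            if not wc.ok:
                raise RuntimeError(f"work request {wc.wr_id} failed")
            pending.discard(wc.wr_id)
        if not pending:
            return polls
    raise TimeoutError("completions did not arrive in time")


cq = ToyCompletionQueue()
cq.nic_complete(1)
cq.nic_complete(2)
print(wait_for(cq, [1, 2]))  # 1
```

No system call appears in the loop, which is the point of kernel bypass: the thread only reads entries that the NIC has already written.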

However, compared with TCP, RDMA is not a good fit for cross-datacenter traffic, and neither is iWARP over the internet. The latency is considerably small compared with the switch protocol computation. Still, the atomicity of RDMA's data transmission primitives can be leveraged between private domains, e.g. for data transmission between the Shanghai and Beijing datacenters.

Comparison with MiniOOO and wBPF

The recent talk by
[Five images, 2022-04-28 14:02:57 to 14:03:33]

For a control path of the same length, building a general-purpose hardware observability model makes a better story than this RPC tax one.