Software-Defined Far Memory in Warehouse-Scale Computers

We design learning-based autotuning to periodically adapt our design to fleet-wide changes without a human in the loop.
- It uses a machine learning algorithm called Gaussian Process (GP) Bandit [17, 21, 39].
We demonstrate that zswap [1], a Linux kernel mechanism that stores memory compressed in DRAM, can be used to implement software-defined far memory that provides tail.
- The control mechanism for far memory in WSCs requires:
  1. tight control over performance slowdowns, to meet defined SLOs;
  2. low CPU overhead, so as to maximize the TCO savings from far memory.
Cold Page Identification Mechanism
- We base this mechanism on prior work [28, 42, 46].
- We design our system to keep the promotion rate below P% of the application's working set size per minute, which serves as a Service Level Objective (SLO) for far memory performance.
- We define the working set size of an application as the total number of pages that are accessed within the minimum cold age threshold (120 s in our system).
- The exact value of P depends on the performance difference between near memory and far memory. For our deployment, we conducted months-long A/B testing at scale with production workloads and empirically determined P to be 0.2%/min.
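
The promotion-rate SLO above reduces to simple arithmetic, sketched below in Python. The function names, the example working set size, and the per-threshold promotion rates are all hypothetical; only P = 0.2%/min and the 120 s minimum threshold come from the notes. The last helper shows one plausible way such an SLO could drive the choice of a cold age threshold (take the lowest candidate whose promotion rate stays under the limit); it is an illustration, not a description of the actual controller.

```python
from typing import Dict, Optional

P_PERCENT_PER_MIN = 0.2  # value reported in the notes for this deployment


def promotion_limit(working_set_pages: int,
                    p_percent: float = P_PERCENT_PER_MIN) -> float:
    """Promotions per minute allowed by the SLO (P% of the working set)."""
    return working_set_pages * p_percent / 100.0


def meets_slo(promotions_per_min: float, working_set_pages: int,
              p_percent: float = P_PERCENT_PER_MIN) -> bool:
    """True if the promotion rate stays below the SLO limit."""
    return promotions_per_min < promotion_limit(working_set_pages, p_percent)


def lowest_threshold_meeting_slo(rate_by_threshold_s: Dict[int, float],
                                 working_set_pages: int) -> Optional[int]:
    """Pick the smallest cold age threshold (seconds) that meets the SLO.

    rate_by_threshold_s is a hypothetical mapping from a candidate threshold
    to the promotions per minute observed (or estimated) at that threshold.
    Returns None if no candidate satisfies the SLO.
    """
    for threshold_s in sorted(rate_by_threshold_s):
        if meets_slo(rate_by_threshold_s[threshold_s], working_set_pages):
            return threshold_s
    return None


# Made-up numbers: a job with a working set of 1,000,000 pages.
example_rates = {120: 5000.0, 240: 2600.0, 480: 1800.0, 960: 400.0}
chosen = lowest_threshold_meeting_slo(example_rates, 1_000_000)
```

A lower threshold marks more pages as cold, so under this assumption the sketch prefers the smallest threshold that still respects the limit.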
Controlling the Cold Age Threshold
- We build a promotion histogram for each job in the OS kernel.
- Our system builds a per-job cold page histogram for a given set of predefined cold age thresholds.
- We use Linux’s memory cgroup (memcg) [2] to isolate jobs in our WSC.
- We use the lzo algorithm to achieve low CPU overhead for compression and decompression.
- We maintain a global zsmalloc arena per machine, with an explicit compaction interface that can be triggered by the node agent when needed.
- Empirically, there are no gains to be derived by storing zsmalloc payloads larger than 2990 bytes (73% of a 4 KiB x86 page), where metadata overhead becomes higher than savings from compressing the page.
- GP Bandit [17, 21, 39] learns the shape of the search space and guides the parameter search towards the optimal point with a minimal number of trials.
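
To make the GP Bandit idea above concrete, here is a minimal pure-Python sketch of one common variant, GP-UCB, over a one-dimensional grid of candidate parameter values. Everything in it is an assumption made for illustration: the notes do not say which acquisition rule, kernel, or parameter space is used, and the toy objective stands in for whatever fleet-wide metric the real autotuner optimizes. The sketch fits a Gaussian process to the trials so far, then tries the untested candidate with the highest upper confidence bound (posterior mean plus `beta` times posterior standard deviation).

```python
import math
from typing import Callable, List, Sequence, Tuple


def rbf(a: float, b: float, length: float = 0.2) -> float:
    """Squared-exponential kernel (an assumed choice of covariance)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def posterior(xs: Sequence[float], ys: Sequence[float], x_new: float,
              noise: float = 1e-4) -> Tuple[float, float]:
    """Zero-mean GP posterior (mean, std) at x_new given trials (xs, ys)."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k = [rbf(x, x_new) for x in xs]
    alpha = solve(K, list(ys))
    v = solve(K, k)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = 1.0 - sum(ki * vi for ki, vi in zip(k, v))  # rbf(x, x) == 1
    return mean, math.sqrt(max(var, 0.0))


def gp_ucb(objective: Callable[[float], float], candidates: Sequence[float],
           trials: int = 8, beta: float = 2.0) -> Tuple[float, float]:
    """Run GP-UCB for a fixed trial budget; return the best (x, y) seen."""
    xs = [candidates[len(candidates) // 2]]  # start from the middle
    ys = [objective(xs[0])]
    while len(xs) < min(trials, len(candidates)):
        untried = [c for c in candidates if c not in xs]

        def ucb(c: float) -> float:
            mean, std = posterior(xs, ys, c)
            return mean + beta * std

        nxt = max(untried, key=ucb)
        xs.append(nxt)
        ys.append(objective(nxt))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]


def toy_objective(x: float) -> float:
    """Hypothetical stand-in metric with a single peak at x = 0.7."""
    return -(x - 0.7) ** 2


grid = [i / 20 for i in range(21)]
best_x, best_y = gp_ucb(toy_objective, grid, trials=8)
```

Each trial here is a single function call; in a fleet setting a trial would presumably be far more expensive, which is why a search method that needs few trials matters.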