- design learning-based autotuning to periodically adapt our design to fleet-wide changes without a human in the loop.
- machine learning algorithm called Gaussian Process (GP) Bandit [17, 21, 39].
- we demonstrate that zswap [1], a Linux kernel mechanism that stores memory compressed in DRAM, can be used to implement software-defined far memory that provides tail.
- The control mechanism for far memory in WSCs requires
- tight control over performance slowdowns to meet defined SLOs
- low CPU overhead so as to maximize the TCO savings from far memory.
- Cold Page Identification Mechanism
- We base this mechanism on prior work [28, 42, 46].
- we design our system to keep the promotion rate below P% of the application’s working set size per minute, which serves as a Service Level Objective (SLO) for far memory performance.
- define the working set size of an application as the total number of pages that are accessed within the minimum cold age threshold (120 s in our system).
- The exact value of P depends on the performance difference between near memory and far memory. For our deployment, we conducted months-long A/B testing at scale with production workloads and empirically determined P to be 0.2%/min.
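The SLO arithmetic in the notes above can be sketched as follows. This is a minimal illustration, not the system's implementation: the function names and the list of page ages are hypothetical, and treating "within the threshold" as a strict less-than is an assumption. Only the 120 s minimum threshold and P = 0.2%/min come from the text.

```python
MIN_COLD_AGE_S = 120      # minimum cold age threshold, from the text
P_PERCENT_PER_MIN = 0.2   # empirically determined SLO target, from the text


def working_set_pages(page_ages_s):
    """Count pages accessed within the minimum cold age threshold.

    page_ages_s: seconds since each page was last accessed (hypothetical input).
    """
    return sum(1 for age in page_ages_s if age < MIN_COLD_AGE_S)


def promotion_budget(ws_pages, p_percent=P_PERCENT_PER_MIN):
    """Maximum promotions per minute that the SLO allows."""
    return ws_pages * p_percent / 100.0


def meets_slo(promotions_per_min, ws_pages):
    """True if the observed promotion rate stays within the budget."""
    return promotions_per_min <= promotion_budget(ws_pages)


# Example: a job whose working set is one million pages.
budget = promotion_budget(1_000_000)
within = meets_slo(1500, 1_000_000)
```

Under these assumptions, a job with a one-million-page working set may promote about 2,000 pages per minute before it violates the SLO.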
- Controlling the Cold Age Threshold: build a promotion histogram for each job in the OS kernel
- our system builds a per-job cold page histogram for a given set of predefined cold age thresholds.
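One way the two histograms could be combined is sketched below. The bucket layout (bucket i counts events whose page age falls between threshold i and the next threshold), the threshold values, and the "lowest threshold that fits the budget" policy are all assumptions made for illustration; the notes do not specify how the system selects the threshold.

```python
def suffix_sums(bucket_counts):
    """totals[i] = sum of bucket_counts[i:], i.e. events with age >= threshold i."""
    totals, running = [], 0
    for count in reversed(bucket_counts):
        running += count
        totals.append(running)
    return totals[::-1]


def pick_threshold(thresholds_s, promo_buckets, cold_buckets, ws_pages,
                   p_percent=0.2):
    """Return (threshold, cold_pages) for the lowest threshold within the SLO.

    promo_buckets: promotions per minute, bucketed by page age at access time.
    cold_buckets:  page counts, bucketed by current page age.
    Returns None when no predefined threshold satisfies the budget.
    """
    budget = ws_pages * p_percent / 100.0
    promo_rates = suffix_sums(promo_buckets)   # rate each threshold would cause
    cold_pages = suffix_sums(cold_buckets)     # memory each threshold would free
    for threshold, rate, pages in zip(thresholds_s, promo_rates, cold_pages):
        if rate <= budget:
            return threshold, pages
    return None


# Hypothetical histograms for one job.
thresholds = [120, 240, 480, 960]
choice = pick_threshold(thresholds,
                        promo_buckets=[30, 12, 5, 2],
                        cold_buckets=[4000, 3000, 2000, 1000],
                        ws_pages=5000)
```

Because the number of cold pages can only shrink as the threshold grows, scanning from the lowest threshold upward returns the candidate that captures the most cold memory while staying inside the promotion budget.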
- We use Linux’s memory cgroup (memcg) [2] to isolate jobs in our WSC.
- We use the lzo algorithm to achieve low CPU overhead for compression and decompression.
- We maintain a global zsmalloc arena per machine, with an explicit compaction interface that can be triggered by the node agent when needed.
- Empirically, there are no gains to be derived by storing zsmalloc payloads larger than 2990 bytes (73% of a 4 KiB x86 page), where metadata overhead becomes higher than savings from compressing the page.
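The payload cutoff can be illustrated with a short sketch. Note the loud assumption: Python's standard library has no lzo binding, so `zlib` is used here purely as a stand-in compressor, and `try_compress_page` is a hypothetical helper rather than a kernel interface. Only the 2990-byte limit and the 4 KiB page size come from the text.

```python
import os
import zlib

PAGE_SIZE = 4096     # 4 KiB x86 page
MAX_PAYLOAD = 2990   # cutoff from the text, about 73% of a page


def try_compress_page(page: bytes):
    """Return the compressed payload if it is worth storing, else None.

    None means the page is rejected and stays uncompressed in near memory.
    """
    if len(page) != PAGE_SIZE:
        raise ValueError("expected exactly one 4 KiB page")
    payload = zlib.compress(page, 1)  # stand-in: the text uses lzo, not zlib
    if len(payload) > MAX_PAYLOAD:
        return None
    return payload


zero_page = bytes(PAGE_SIZE)          # highly compressible
random_page = os.urandom(PAGE_SIZE)   # effectively incompressible

stored = try_compress_page(zero_page)
rejected = try_compress_page(random_page)
```

A zero-filled page compresses to a few dozen bytes and is accepted, while a page of random bytes does not shrink below the cutoff and is rejected.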
- Gaussian Process (GP) Bandit [17, 21, 39] learns the shape of the search space and guides parameter search towards the optimal point with the minimal number of trials.
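To make the idea of a GP bandit concrete, here is a small pure-Python sketch that uses the upper-confidence-bound (UCB) acquisition rule, one common variant. The notes do not say which kernel, acquisition function, or parameters the system uses, so the RBF kernel, `length`, `noise`, `beta`, the candidate grid, and the toy objective are all assumptions chosen for illustration.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel on scalars (assumed kernel choice)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(m):
    """Lower-triangular L with L * L^T == m, for symmetric positive definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def solve_lower_t(low, b):
    """Solve L^T x = b by backward substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / low[i][i]
    return x


def fit(xs, ys, noise=1e-4):
    """Factor the kernel matrix once per round; returns (L, alpha)."""
    k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(k)
    return low, solve_lower_t(low, solve_lower(low, ys))


def predict(xs, low, alpha, x_new):
    """Posterior mean and standard deviation at x_new (zero prior mean)."""
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    var = max(1.0 - sum(vi * vi for vi in v), 0.0)  # rbf(x, x) == 1
    return mean, math.sqrt(var)


def gp_ucb(objective, candidates, trials, beta=2.0):
    """Maximize objective over candidates; returns (best_x, best_y, history)."""
    xs = [candidates[0]]
    ys = [objective(xs[0])]
    while len(xs) < trials:
        untried = [c for c in candidates if c not in xs]
        if not untried:
            break
        low, alpha = fit(xs, ys)

        def ucb(c):
            mean, std = predict(xs, low, alpha, c)
            return mean + beta * std  # optimism: favor high mean or high doubt

        nxt = max(untried, key=ucb)
        xs.append(nxt)
        ys.append(objective(nxt))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], list(zip(xs, ys))


# Toy objective standing in for an expensive fleet-wide experiment.
candidates = [i / 20 for i in range(21)]
best_x, best_y, history = gp_ucb(lambda x: -(x - 0.6) ** 2, candidates, trials=8)
```

Each round refits the Gaussian process to the trials seen so far and then evaluates the candidate with the highest optimistic estimate, which is how the search trades off exploring uncertain regions against exploiting promising ones under a small trial budget.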