NSDI 22 Attendance


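In the end I decided to spend the $200 to see what the world's strongest distributed systems look like. The overall trend in networking seems to be that, now that programmable switches and NICs have compute capability, everyone is trying to offload computation onto them.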

Cluster Resource Management

Efficient Scheduling Policies for Microsecond-Scale Tasks

Multiplexing cores across applications is sensitive to the scheduling policy.

Arachne and Caladan do not perform well on the batch workload when it is co-located with memcached.

At this scale, latency is hurt by higher-level software overheads and kernel interference.

The load-balancing policy used is work stealing.

Load-balancing overheads are emulated to find the best load-balancing policy.

  1. Work stealing is easy to model.

  2. Use policy and signal.
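Work stealing can be sketched as a toy round-based simulation. This is a minimal sketch under my own assumptions, not the paper's implementation: one deque per core, local work popped from the head, and an idle core stealing from the tail of the longest queue (real schedulers often pick a random victim instead).

```python
from collections import deque


def run_work_stealing(queues):
    """Toy work-stealing simulation.

    Each round, every core runs at most one task: from the head of its own
    deque if it has work, otherwise stolen from the tail of the longest deque.
    Returns the (core, task) pairs in execution order.
    """
    queues = [deque(q) for q in queues]
    log = []
    while any(queues):
        for core, q in enumerate(queues):
            if q:
                log.append((core, q.popleft()))  # run local work first
                continue
            # Idle core: choose the victim with the most queued tasks.
            victim = max(range(len(queues)), key=lambda i: len(queues[i]))
            if queues[victim]:
                log.append((core, queues[victim].pop()))  # steal from the tail
    return log


print(run_work_stealing([["a1", "a2", "a3", "a4"], []]))
# [(0, 'a1'), (1, 'a4'), (0, 'a2'), (1, 'a3')]
```

Owner and thief work on opposite ends of the deque, which is the usual way to reduce contention between them.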

A Case for Task Sampling-based Learning for Cluster Job Scheduling

  1. The datasets come from Two Sigma and Alibaba.

implementation details

Starlight: Fast Container Provisioning on the Edge and over the WAN

Mount first, use delta files, and push them later when needed.


  1. The proxy offloads congestion from the registry.
    1. Throughput may not be scalable.
  2. Delta bundle?
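The delta idea can be illustrated by diffing two image versions and shipping only what changed. This is a minimal sketch: representing an image as a hypothetical dict from file path to content digest is my assumption, not Starlight's actual on-disk or wire format.

```python
def delta_bundle(old_image, new_image):
    """Return (files_to_fetch, paths_to_delete) to turn old_image into new_image.

    Both images are hypothetical {path: content_digest} dicts.
    """
    fetch = {
        path: digest
        for path, digest in new_image.items()
        if old_image.get(path) != digest  # new file or changed content
    }
    delete = sorted(path for path in old_image if path not in new_image)
    return fetch, delete


old = {"/bin/app": "sha-1", "/etc/app.conf": "sha-2", "/tmp/cache": "sha-3"}
new = {"/bin/app": "sha-9", "/etc/app.conf": "sha-2", "/lib/libx.so": "sha-4"}
fetch, delete = delta_bundle(old, new)
print(fetch)   # {'/bin/app': 'sha-9', '/lib/libx.so': 'sha-4'}
print(delete)  # ['/tmp/cache']
```

Unchanged files such as `/etc/app.conf` are reused from the worker's existing copy, so only the delta has to cross the WAN.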

Transport Layer - Part 1

PowerTCP: Pushing the Performance Limits of Datacenter Networks


This control scheme is on par with that $\alpha \beta$ network algorithm.

RDMA is Turing complete, we just did not know it yet!

RDMA is Turing complete.

It is mostly a set of clever tricks for emulating the equivalent CPU operations.
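One trick behind the Turing-completeness result, as I understand it, is that RDMA work requests can modify later work requests in the same chain: a compare-and-swap rewrites a later request's opcode, so a no-op becomes a write only when a value matches. The toy model below is plain Python, not the ibverbs API; the `COPY` op, the packed `ctrl` tuple, and all field names are hypothetical simplifications.

```python
def run_chain(memory, wrs):
    """Execute a toy chain of RDMA-like work requests (WRs) in order.

    Each WR has a 'ctrl' word (opcode, tag), like a packed WQE header. CAS
    compares and swaps a *later* WR's ctrl word, so a NOOP turns into a WRITE
    only when a runtime value matches: an if-statement with no CPU involved.
    """
    for wr in wrs:
        opcode, _tag = wr["ctrl"]
        if opcode == "COPY":    # move a memory word into a later WR's field
            wrs[wr["dst"]][wr["field"]] = memory[wr["src"]]
        elif opcode == "CAS":   # swap target ctrl only if it equals (NOOP, expected)
            target = wrs[wr["dst"]]
            if target["ctrl"] == ("NOOP", wr["expected"]):
                target["ctrl"] = ("WRITE", wr["expected"])
        elif opcode == "WRITE":
            memory[wr["addr"]] = wr["value"]
        # "NOOP": do nothing
    return memory


def if_equal_then_write(x, y, addr, value):
    """Build a chain equivalent to: if x == y: memory[addr] = value."""
    memory = {"x": x, addr: 0}
    wrs = [
        {"ctrl": ("COPY", None), "src": "x", "dst": 1, "field": "expected"},
        {"ctrl": ("CAS", None), "dst": 2, "expected": None},
        {"ctrl": ("NOOP", y), "addr": addr, "value": value},
    ]
    return run_chain(memory, wrs)[addr]


print(if_equal_then_write(7, 7, "out", 1))  # 1: branch taken
print(if_equal_then_write(7, 8, "out", 1))  # 0: branch not taken
```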


FlexTOE: Flexible TCP Offload with Fine-Grained Parallelism

The SmartNIC processes packet logic in parallel, and received data is delivered to user space via kernel bypass.
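Processing packets in parallel is only safe for TCP if per-flow order is restored before delivery. The sketch below shows that general pattern, with a thread pool standing in for NIC cores and a reorder buffer keyed by sequence number. It illustrates the idea under my own assumptions (contiguous sequence numbers starting at 0, upper-casing as a stand-in for protocol work); it is not FlexTOE's actual pipeline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process(pkt):
    """Stand-in for per-packet protocol work."""
    seq, payload = pkt
    return seq, payload.upper()


def parallel_then_reorder(packets, workers=4):
    """Process packets concurrently (completion order is arbitrary), then
    release results strictly in sequence order via a reorder buffer."""
    pending, delivered, next_seq = {}, [], 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process, p) for p in packets]
        for fut in as_completed(futures):
            seq, data = fut.result()
            pending[seq] = data
            while next_seq in pending:  # flush every in-order result we have
                delivered.append(pending.pop(next_seq))
                next_seq += 1
    return delivered


print(parallel_then_reorder([(1, "b"), (0, "a"), (2, "c")]))  # ['A', 'B', 'C']
```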

Programmable Switches - Part 1

NetVRM: Virtual Register Memory for Programmable Networks

SwiSh: Distributed Shared State Abstractions for Programmable Switches


Network Troubleshooting and Debugging

Closed-loop Network Performance Monitoring and Diagnosis with SpiderMon

Monitoring data is kept in a time-series-database-like store.

Uses a wait-for graph.
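A wait-for graph records which flow is blocked behind which. A minimal sketch, assuming a hypothetical dict that maps each flow to the set of flows it waits on: walk the edges from a slow (victim) flow, and the flows that wait on nobody are candidate root causes.

```python
def root_causes(wait_for, victim):
    """Return flows reachable from victim via wait-for edges that are not
    themselves waiting on anything. wait_for: {flow: set_of_blocking_flows}."""
    seen, stack, roots = set(), [victim], set()
    while stack:
        flow = stack.pop()
        if flow in seen:
            continue
        seen.add(flow)
        blockers = wait_for.get(flow, set())
        if not blockers and flow != victim:
            roots.add(flow)  # blocked by nobody: a candidate root cause
        stack.extend(blockers)
    return roots


graph = {"A": {"B"}, "B": {"C", "D"}, "C": set()}
print(sorted(root_causes(graph, "A")))  # ['C', 'D']
```

If the victim is blocked but the walk finds no such flow, every flow it reached waits on another one, so the graph must contain a cycle, i.e. a deadlock.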

Collie: Finding Performance Anomalies in RDMA Subsystems

Uses PFC (Priority-based Flow Control) to identify deadlocks.

Testing by vendors is not enough.

Testing has to be done after integration.

Not extensible.


  1. PFC is not sound.
  2. An efficient search algorithm finds the next peak for the following network flows.

Conduct a few tests in the new region.
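The search step can be illustrated with generic simulated annealing over a workload space. This is a sketch of a common technique, not Collie's code: the workload knobs (message size, number of QPs), the neighbor moves, and `anomaly_score` are hypothetical stand-ins for the counter-derived metric a real tool would measure.

```python
import math
import random


def anomaly_score(workload):
    """Hypothetical stand-in for a counter-based metric (e.g. pause frames).
    Peaks at 64 KB messages with 32 QPs."""
    size_kb, qps = workload
    return -((size_kb - 64) ** 2) - (qps - 32) ** 2


def neighbors(workload):
    """Nearby workloads: change message size by 8 KB or the QP count by 4."""
    size_kb, qps = workload
    moves = [(size_kb - 8, qps), (size_kb + 8, qps), (size_kb, qps - 4), (size_kb, qps + 4)]
    return [(s, q) for s, q in moves if 1 <= s <= 128 and 1 <= q <= 64]


def anneal(score, neighbors, start, steps=300, temp=50.0, cooling=0.98, seed=0):
    """Maximize score: always accept improvements, and accept a worse neighbor
    with probability exp(delta / temp) so the search can leave local optima.
    The temperature decays every step."""
    rng = random.Random(seed)
    cur, cur_s = start, score(start)
    best, best_s = cur, cur_s
    for _ in range(steps):
        cand = rng.choice(neighbors(cur))
        cand_s = score(cand)
        delta = cand_s - cur_s
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            cur, cur_s = cand, cand_s
            if cur_s > best_s:
                best, best_s = cur, cur_s
        temp *= cooling
    return best, best_s


print(anneal(anomaly_score, neighbors, start=(1, 1)))
```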

SCALE: Automatically Finding RFC Compliance Bugs in DNS Nameservers

Security and Privacy

Spectrum: High-bandwidth Anonymous Broadcast

Previous Chaum-style designs distribute and aggregate messages, but cannot prevent malicious clients from disrupting the broadcast.

This requires blind message authentication.
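The Chaum-style "distribute and aggregate" idea can be shown with XOR secret sharing, as in a classic DC-net: each client splits its slot into random shares, one per server, and only the XOR of all server aggregates reveals the message. This is a textbook sketch, not Spectrum's protocol. It also shows the weakness these notes point out: nothing stops a malicious client from XORing garbage into the slot, which is what blind message authentication is meant to rule out.

```python
import secrets


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def share(msg, n_servers):
    """Split msg into n XOR shares; any n - 1 of them look uniformly random."""
    shares = [secrets.token_bytes(len(msg)) for _ in range(n_servers - 1)]
    last = msg
    for s in shares:
        last = xor(last, s)
    return shares + [last]


def broadcast_round(client_msgs, n_servers):
    """Each client secret-shares its slot; each server XORs the shares it
    receives; combining all server aggregates yields the XOR of all messages."""
    length = len(client_msgs[0])
    aggregates = [bytes(length)] * n_servers
    for msg in client_msgs:
        for i, s in enumerate(share(msg, n_servers)):
            aggregates[i] = xor(aggregates[i], s)
    result = bytes(length)
    for agg in aggregates:
        result = xor(result, agg)
    return result


# One broadcaster, two cover clients sending all-zero slots.
print(broadcast_round([b"hello", bytes(5), bytes(5)], n_servers=3))  # b'hello'
```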


How do you measure the efficiency of using hardware-counter momentum as the metric for finding the next network peak?

Reliable Distributed Systems

Graham: Synchronizing Clocks by Leveraging Local Clock Properties

Even when a node loses clock synchronization, its local clock can still keep events relatively ordered, but cross-validation with neighboring devices is still needed to reduce drift. Graham characterizes the local clock using commodity sensors present in nearly every server and leverages this data to further improve clock accuracy, increasing Graham's tolerance to failures. Graham reduces the clock drift of a commodity server by up to 2000×, reducing the maximum assumed drift in most situations from 200 ppm to 100 ppb.

Testing and Verification

Automated Verification of Network Function Binaries

Uses symbolic execution to verify network function binaries. The model is a simplified one.

A map cannot represent a regex.
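What symbolic execution buys can be shown on a toy first-match filter: instead of trying every port, run the rules once on a symbolic port and carry a constraint down each branch, ending with one path per (constraint, action). This is a sketch with heavy assumptions: the rules are hypothetical, the only constraint domain is a single port interval, and a real binary verifier needs a full constraint solver. A domain this simple also cannot express regex-style matching, which hints at why a simplified model struggles there.

```python
FULL = (0, 65535)  # every possible destination port


def intersect(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def subtract(a, b):
    """Parts of interval a not covered by interval b (0, 1 or 2 intervals)."""
    out = []
    if b[0] > a[0]:
        out.append((a[0], min(a[1], b[0] - 1)))
    if b[1] < a[1]:
        out.append((max(a[0], b[1] + 1), a[1]))
    return [iv for iv in out if iv[0] <= iv[1]]


def explore(rules, default):
    """Run first-match rules on a symbolic port; return (interval, action) paths."""
    paths, remaining = [], [FULL]
    for cond, action in rules:
        nxt = []
        for iv in remaining:
            hit = intersect(iv, cond)
            if hit:
                paths.append((hit, action))  # branch taken
            nxt.extend(subtract(iv, cond))   # branch not taken
        remaining = nxt
    paths.extend((iv, default) for iv in remaining)
    return paths


rules = [((0, 1023), "drop"), ((8080, 8080), "forward")]
paths = explore(rules, default="forward")
# Property: no port below 1024 is ever forwarded.
print(all(lo >= 1024 for (lo, _hi), action in paths if action == "forward"))  # True
```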

Differential Network Analysis

Programmable Switches - Part 2

Runtime Programmable Switches

IMap: Fast and Scalable In-Network Scanning with Programmable Switches

Uses on-switch logic to implement an Nmap-like scanner with much higher bandwidth.
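High-rate scanners typically avoid hitting one subnet with consecutive probes by walking the address space in a pseudo-random order without per-target state. The sketch uses ZMap's well-known trick of iterating a multiplicative cyclic group modulo a prime; whether IMap generates targets the same way is not stated in these notes, and the tiny prime and generator are illustrative only (ZMap uses p = 2^32 + 15 for IPv4).

```python
def cyclic_targets(p, g, start=1):
    """Yield every index in 0..p-2 exactly once, in scrambled order.

    Walks x -> g * x mod p; when g is a primitive root of the prime p this
    visits all of 1..p-1 before repeating. Subtracting 1 gives a target index.
    Only the current x is kept, so no per-target state is needed.
    """
    x = start
    for _ in range(p - 1):
        yield x - 1
        x = (x * g) % p


print(list(cyclic_targets(11, 2)))  # [0, 1, 3, 7, 4, 9, 8, 6, 2, 5]
```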

Unlocking the Power of Inline Floating-Point Operations on Programmable Switches


Uses CIDR-style (longest-prefix-match) TCAM rules to find the position of the leading 1.
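Floating-point normalization needs the position of the leading 1, and a TCAM with longest-prefix matching can find it in one lookup: install ternary rules such as `1***`, `01**`, `001*`, `0001`, each returning a bit index. The sketch emulates that table in Python with an 8-bit width assumed for illustration; on a switch it would be a match-action stage, not a dict.

```python
def build_prefix_table(width):
    """Rules '1', '01', '001', ... (ternary 1***, 01**, ...), each mapping to
    the bit index of the leading 1."""
    return {"0" * i + "1": width - 1 - i for i in range(width)}


def leading_one(value, width, table):
    """Longest-prefix match of value's bit string against the table."""
    bits = format(value, f"0{width}b")
    for length in range(width, 0, -1):  # longest prefixes first
        if bits[:length] in table:
            return table[bits[:length]]
    return None  # value 0 has no leading 1


table = build_prefix_table(8)
pos = leading_one(0b00010110, 8, table)
print(pos)                      # 4
print(0b00010110 << (7 - pos))  # 176: shifted so the leading 1 becomes the top bit
```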

Reliable Distributed Systems

DispersedLedger: High-Throughput Byzantine Consensus on Variable Bandwidth Networks[WIP]

Uses state machine replication plus Byzantine consensus for fast replication.
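The replica counts behind Byzantine state machine replication follow from standard quorum arithmetic: tolerating f Byzantine replicas needs n ≥ 3f + 1, and any two quorums must overlap in at least f + 1 replicas so that they share an honest one. A small helper makes the numbers concrete; it illustrates textbook BFT sizing, not DispersedLedger's specific protocol.

```python
def bft_sizes(n):
    """Return (f, quorum) for n replicas.

    f is the largest fault count with n >= 3f + 1. The quorum is the smallest
    size q for which any two quorums intersect in at least f + 1 replicas,
    i.e. 2q - n >= f + 1, so q = ceil((n + f + 1) / 2).
    """
    f = (n - 1) // 3
    quorum = (n + f) // 2 + 1  # integer form of ceil((n + f + 1) / 2)
    return f, quorum


for n in (4, 7, 10):
    print(n, bft_sizes(n))  # 4 (1, 3) / 7 (2, 5) / 10 (3, 7)
```

The quorum also never exceeds n - f, so the system can still make progress when all f faulty replicas stay silent.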


Buffer-based End-to-end Request Event Monitoring in the Cloud

Uses buffers as hints about the data's lifecycle.

Missing RPC header and unpredictable location.

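It also requires changing the RPC header, which is quite annoying, but there is no way around it if every device is to be able to read it.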
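Once every hop can read a request ID from the RPC header, buffer events become end-to-end request timings: a request lives from its first buffer allocation to its last buffer free. The event tuple format below is a hypothetical simplification for illustration, not the paper's record layout.

```python
def request_latencies(events):
    """events: (timestamp_us, request_id, buffer_id, kind) tuples, kind being
    'alloc' or 'free'. Returns {request_id: last free - first alloc}."""
    first_alloc, last_free = {}, {}
    for ts, req, _buf, kind in sorted(events):  # process in time order
        if kind == "alloc":
            first_alloc.setdefault(req, ts)
        elif kind == "free":
            last_free[req] = ts  # later frees overwrite earlier ones
    return {req: last_free[req] - first_alloc[req]
            for req in first_alloc if req in last_free}


events = [
    (10, "r1", "b1", "alloc"), (12, "r1", "b2", "alloc"),
    (20, "r2", "b3", "alloc"), (25, "r2", "b3", "free"),
    (30, "r1", "b1", "free"), (45, "r1", "b2", "free"),
]
print(request_latencies(events))  # {'r1': 35, 'r2': 5}
```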


How to diagnose nanosecond network latencies in rich end-host stacks

CDF of memcached request receive latencies with and without profiling. (eBPF is already hard to beat, but they pulled it off with a message-lifetime profiling trick.)

BPF-1 stands for eBPF probing a single function; the other methods are Ftrace, Intel PT, and NSight.

The end host receives requests from the NIC, and the timestamp difference is the network latency. Message profiling is built on top of this, and Intel PT records the system latency.

  1. Time reconciliation
  2. Message profiling
  3. Anomaly disambiguation (via the function call stack: who calls whom)

Operational Track - Part 2

MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters

Cloud Scale Services

Accelerating Collective Communication in Data Parallel Training across Deep Learning Frameworks

Orca: Server-assisted Multicast for Datacenter Networks

Cocktail: A Multidimensional Optimization for Model Serving in Cloud

ISPs and CDNs

C2DN: How to Harness Erasure Codes at the Edge for Efficient Content Delivery

cISP: A Speed-of-Light Internet Service Provider


Starlink is expensive and has high latency.
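The "speed-of-light" framing comes down to propagation arithmetic: light in glass fiber travels at roughly c/1.47, signals through air travel at close to c, and real fiber routes are longer than the straight-line distance. A small calculator; the 1000 km distance, the 1.47 index, and the 1.5× path stretch are illustrative assumptions, not figures from the talk.

```python
C_KM_PER_MS = 299_792.458 / 1000  # speed of light in vacuum, km per ms


def one_way_ms(distance_km, refractive_index=1.0, path_stretch=1.0):
    """One-way propagation delay for a route path_stretch times longer than
    the straight-line distance, through a medium with the given index."""
    return distance_km * path_stretch * refractive_index / C_KM_PER_MS


# Line of sight through air (treating air's index of about 1.0003 as 1.0).
print(round(one_way_ms(1000), 2))                                           # 3.34
# Fiber laid along a route 1.5x longer than the straight line.
print(round(one_way_ms(1000, refractive_index=1.47, path_stretch=1.5), 2))  # 7.36
```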

CloudCluster: Unearthing the Functional Structure of a Cloud Service

Data Center Network Infrastructure

Zeta: A Scalable and Robust East-West Communication Framework in Large-Scale Clouds

They even use XDP.

Aquila: A unified, low-latency fabric for datacenter networks

RDC: Energy-Efficient Data Center Network Congestion Relief with Topological Reconfigurability at the Edge


Isolation Mechanisms for High-Speed Packet-Processing Pipelines

Justitia: Software Multi-Tenancy in Hardware Kernel-Bypass Networks

Tokens are kept separate.

The last step tricks the QP/CQ.
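One plausible reading of "tokens kept separate" is per-tenant rate limiting, where each tenant draws from its own token bucket so a bandwidth-heavy tenant cannot spend another tenant's budget. This is a generic token-bucket sketch under that assumption; Justitia's actual token scheme may differ, and the tenant names and rates are hypothetical.

```python
class TokenBucket:
    """Tokens refill at `rate` per second up to `burst`; a send of `size`
    units is admitted only if that many tokens are available."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def admit(self, size, now):
        # Refill for the time elapsed since the last call, capped at burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


buckets = {"tenant_a": TokenBucket(rate=100, burst=100),
           "tenant_b": TokenBucket(rate=100, burst=100)}
print(buckets["tenant_a"].admit(100, now=0.0))  # True: uses A's full burst
print(buckets["tenant_a"].admit(1, now=0.0))    # False: A is out of tokens
print(buckets["tenant_b"].admit(50, now=0.0))   # True: B's bucket is separate
print(buckets["tenant_a"].admit(50, now=0.5))   # True: A refilled 50 tokens
```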

NetHint: White-Box Networking for Multi-Tenant Data Centers