NSDI 22 Attendance

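In the end I decided to spend the $200 to see what the world's best distributed systems look like. My sense is that the overall trend in networking is this: now that programmable switches and NICs have compute capability, everyone is trying to offload computation to the endpoints.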

Cluster Resource Management

Efficient Scheduling Policies for Microsecond-Scale Tasks


Do multiplexing on sensitive scheduling policies: ideally

Arachne and Caladan do not perform well on batch work when co-located with memcached.


Low latency is undermined by higher-level software and kernel interference.

The policy is work-stealing

Balancing overhead emulation


To find the best load balancing policy.

  1. work-stealing is easy to model.

  2. use policy and signal
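
As a rough illustration of the work-stealing policy mentioned above, here is a minimal single-threaded simulation. It is only a sketch under my own assumptions (round-robin turns, stealing from the most loaded worker, the name `run_work_stealing`), not the scheduler from the paper.

```python
from collections import deque


def run_work_stealing(queues):
    """Simulate work stealing in round-robin turns.

    `queues` is a list of per-worker task lists (hypothetical input format).
    Returns, for each worker, the list of tasks it ended up running.
    """
    workers = [deque(q) for q in queues]
    done = [[] for _ in workers]
    while any(workers):  # an empty deque is falsy
        for i, own in enumerate(workers):
            if own:
                # The owner takes work from the tail of its own queue.
                task = own.pop()
            else:
                # An idle worker steals from the head of the most loaded queue.
                victim = max(range(len(workers)), key=lambda j: len(workers[j]))
                if not workers[victim]:
                    continue  # nothing left to steal this turn
                task = workers[victim].popleft()
            done[i].append(task)
    return done


result = run_work_stealing([[1, 2, 3, 4, 5, 6], [], []])
```

With all six tasks starting on one worker, the two idle workers steal until the load evens out, which is why this policy is easy to model.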

A Case for Task Sampling-based Learning for Cluster Job Scheduling


  1. The dataset is from Two Sigma and Alibaba.

  2. Implementation details


Starlight: Fast Container Provisioning on the Edge and over the WAN







Mount first, using a delta file, and push later when needed.

Q&A

  1. The proxy offloads congestion from the registry.
    1. Throughput may not scale
  2. delta bundle?

Transport Layer - Part 1

PowerTCP: Pushing the Performance Limits of Datacenter Networks

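It uses an energy-dissipation-style metric to drive the congestion control algorithm.

This control method is on par with that $\alpha \beta$ network algorithm.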

RDMA is Turing complete, we just did not know it yet!


RDMA is Turing complete.

Mostly tricky techniques to emulate the equivalent CPU operations.



conditional

FlexTOE: Flexible TCP Offload with Fine-Grained Parallelism


The SmartNIC processes packet logic in parallel, and kernel bypass delivers received data to user space.

Programmable Switches - Part 1

NetVRM: Virtual Register Memory for Programmable Networks






SwiSh: Distributed Shared State Abstractions for Programmable Switches

This paper is impressive.

Network Troubleshooting and Debugging

Closed-loop Network Performance Monitoring and Diagnosis with SpiderMon



Monitoring data is kept in a time-series-database-like store.

Uses a wait-for graph.
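
To make the wait-for graph idea concrete, here is a small generic sketch of how one might walk such a graph from a victim to candidate root causes. The node names and the `root_causes` helper are hypothetical, and this is not claimed to be SpiderMon's actual algorithm.

```python
def root_causes(wait_for, victim):
    """Follow wait-for edges from `victim`.

    `wait_for` maps a node to the nodes it is waiting on (hypothetical format).
    Nodes reachable from the victim that wait on nothing are returned as
    candidate root causes. The `seen` set keeps the walk finite on cycles.
    """
    seen, stack, roots = set(), [victim], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        nexts = wait_for.get(node, [])
        if not nexts:
            roots.add(node)
        stack.extend(nexts)
    return roots


graph = {
    "flowA": ["port1"],
    "port1": ["flowB", "flowC"],
    "flowC": ["port2"],
    "port2": ["flowD"],
}
causes = root_causes(graph, "flowA")
```

Here `flowA` is the victim, and the walk ends at the flows that are not themselves waiting on anything.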

Collie: Finding Performance Anomalies in RDMA Subsystems


PFC to identify Deadlock

Vendor testing is not enough.


Test as an integrated system.



Not extensible


challenges:

  1. PFC is not sound
  2. An efficient search algorithm to find the next peak for the following network flow.





Conduct a few tests on the new region.


SCALE: Automatically Finding RFC Compliance Bugs in DNS Nameservers

Security and Privacy

Spectrum: High-bandwidth Anonymous Broadcast



Previous approach (Chaum): distribute and aggregate. But it can't prevent malicious clients.

Needs blind message authentication.


Homomorphic encryption?




How do you measure the efficiency of using hardware counter momentum as a metric to find the next network peak?

Reliable Distributed Systems

Graham: Synchronizing Clocks by Leveraging Local Clock Properties

Even if the local clock goes down, relative ordering can still be maintained, but cross-validation with neighboring devices is still needed to reduce drift. Graham characterizes the local clock using commodity sensors present in nearly every server and leverages this data to further improve clock accuracy, increasing Graham's tolerance to failures. Graham reduces the clock drift of a commodity server by up to 2000×, reducing the maximum assumed drift in most situations from 200ppm to 100ppb.
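
To get a feel for what 200ppm versus 100ppb means, here is a quick back-of-the-envelope calculation. The helper name `worst_case_drift` and the one-hour window are my own choices for illustration; only the two drift rates come from the abstract.

```python
import math

PPM = 1e-6  # parts per million
PPB = 1e-9  # parts per billion


def worst_case_drift(elapsed_seconds, drift_rate):
    """Upper bound on accumulated clock error after running unsynchronized."""
    return elapsed_seconds * drift_rate


before = 200 * PPM  # assumed maximum drift without Graham
after = 100 * PPB   # assumed maximum drift with Graham

ratio = before / after                      # about 2000, matching the 2000x claim
hour_before = worst_case_drift(3600, before)  # about 0.72 s of error per hour
hour_after = worst_case_drift(3600, after)    # about 0.36 ms of error per hour
```

So a server left unsynchronized for an hour could be off by roughly 0.72 seconds at 200ppm, but only roughly 0.36 milliseconds at 100ppb.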

Testing and Verification

Automated Verification of Network Function Binaries

Uses symbolic execution to verify network function binaries. The model is a simplified version.




The map model cannot represent regex.

Differential Network Analysis








Programmable Switches - Part 2

Runtime Programmable Switches




IMap: Fast and Scalable In-Network Scanning with Programmable Switches

Uses on-switch logic to implement a higher-bandwidth Nmap.










Unlocking the Power of Inline Floating-Point Operations on Programmable Switches


Mantissa




Uses CIDR-style TCAM matching to compute the leading 1.
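
The leading-1 trick can be sketched in software as a table of prefix-style ternary entries, one per bit position. This is only my reading of the idea for an 8-bit value; the table layout and the names `build_table` and `leading_one` are assumptions, not the paper's actual switch program.

```python
WIDTH = 8  # assumed bit width for the illustration


def build_table(width=WIDTH):
    """Build one (value, mask, position) entry per possible leading-1 position.

    Each entry is like a CIDR prefix: it cares about bit `pos` and every bit
    above it, and ignores the bits below.
    """
    table = []
    full = (1 << width) - 1
    for pos in range(width - 1, -1, -1):
        value = 1 << pos                  # bit `pos` set, all higher bits zero
        mask = full & ~((1 << pos) - 1)   # compare only bits at or above `pos`
        table.append((value, mask, pos))
    return table


def leading_one(x, table):
    """Return the position of the most significant 1 bit, or None for zero."""
    for value, mask, pos in table:
        if x & mask == value:
            return pos
    return None


table = build_table()
positions = [leading_one(x, table) for x in (0b00010110, 1, 255, 0)]
```

The entries are mutually exclusive, so a TCAM can evaluate all of them in one lookup instead of looping as the software version does.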


Reliable Distributed Systems

DispersedLedger: High-Throughput Byzantine Consensus on Variable Bandwidth Networks[WIP]

Uses state machine replication plus Byzantine consensus for fast replication.

Troubleshooting

Buffer-based End-to-end Request Event Monitoring in the Cloud

Uses buffers as a lifecycle hint for the data.






Missing RPC header and unpredictable location.

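It also requires modifying the RPC header, which is quite annoying. But to make it readable by all devices, there is no way around it.

What if only some people use it? If one team uses it, then only that team gets visibility.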

How to diagnose nanosecond network latencies in rich end-host stacks

CDF of memcached request receive-latencies with and without profiling. (eBPF is already hard to beat; they pulled it off with a message-lifetime profiling trick.)

BPF-1 stands for eBPF probing a single function; Ftrace, Intel-PT and NSight.


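Yet another work that integrates a pipeline.
The end host receives requests from the NIC, and the timestamp difference is the network latency. Message profiling is built on top of that, and Intel PT records the system latency.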

  1. Time reconciliation
  2. Message profiling
  3. Anomaly disambiguation (Function(Who calls) stack)

Operational Track - Part 2

MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters



Cloud Scale Services

Accelerating Collective Communication in Data Parallel Training across Deep Learning Frameworks

Orca: Server-assisted Multicast for Datacenter Networks


Cocktail: A Multidimensional Optimization for Model Serving in Cloud



ISPs and CDNs

C2DN: How to Harness Erasure Codes at the Edge for Efficient Content Delivery


cISP: A Speed-of-Light Internet Service Provider

Built for high-frequency trading and now actually realized. Perhaps it was published because there is no alpha left in it.


Starlink is expensive and has high latency.

CloudCluster: Unearthing the Functional Structure of a Cloud Service

Data Center Network Infrastructure

Zeta: A Scalable and Robust East-West Communication Framework in Large-Scale Clouds

They even use XDP.




Aquila: A unified, low-latency fabric for datacenter networks







RDC: Energy-Efficient Data Center Network Congestion Relief with Topological Reconfigurability at the Edge





Multitenancy

Isolation Mechanisms for High-Speed Packet-Processing Pipelines





Justitia: Software Multi-Tenancy in Hardware Kernel-Bypass Networks




Tokens are split.

The last step tricks the QP/CQ.

NetHint: White-Box Networking for Multi-Tenant Data Centers