Table of Contents
- Cluster Resource Management
- Transport Layer - Part 1
- Programmable Switches - Part 1
- Security and Privacy
- Reliable Distributed Systems
- Testing and Verification
- Programmable Switches - Part 2
- Reliable Distributed Systems
- Troubleshooting
- Operational Track - Part 2
- Cloud Scale Services
- ISPs and CDNs
- Data Center Network Infrastructure
- Multitenancy
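In the end I decided to spend the $200 to see what the world's strongest distributed systems look like. My impression is that the overall trend in networking is this: now that programmable switches and NICs have compute capability, everyone is trying to offload computation onto them.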
Cluster Resource Management
Efficient Scheduling Policies for Microsecond-Scale Tasks
Do multiplexing on sensitive scheduling policies: ideally
Arachne and Caladan do not perform well on batch work when co-located with memcached.
Achieving low latency is hindered by higher-level software and kernel distractions.
The policy is work-stealing
Balancing overhead emulation
To find the best load balancing policy.
- Work-stealing is easy to model.
- Use policy and signal.
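The work-stealing policy noted above can be sketched as a tiny simulation. This is only an illustration of generic work-stealing, not the paper's scheduler: the function name `run_work_stealing`, the tick-based timing, and the "steal from the longest queue" victim choice are all assumptions made for the example.

```python
from collections import deque


def run_work_stealing(queues):
    """Simulate workers that each own a deque of task costs (in ticks).

    An idle worker first takes from the head of its own deque; if that is
    empty it steals from the tail of the longest deque (hypothetical victim
    choice). Returns (ticks_to_finish, number_of_steals).
    """
    deques = [deque(q) for q in queues]
    remaining = [0] * len(deques)  # ticks left on each worker's current task
    ticks = 0
    steals = 0
    while any(deques) or any(remaining):
        for w in range(len(deques)):
            if remaining[w] == 0:
                if deques[w]:
                    remaining[w] = deques[w].popleft()
                else:
                    victim = max(range(len(deques)), key=lambda v: len(deques[v]))
                    if deques[victim]:
                        remaining[w] = deques[victim].pop()
                        steals += 1
        # Every busy worker makes one tick of progress.
        for w in range(len(deques)):
            if remaining[w] > 0:
                remaining[w] -= 1
        ticks += 1
    return ticks, steals
```

With four unit-cost tasks all queued on one of two workers, `run_work_stealing([[1, 1, 1, 1], []])` finishes in 2 ticks with 2 steals, whereas a single worker needs 4 ticks. A model this small is part of why work-stealing is easy to reason about.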
A Case for Task Sampling-based Learning for Cluster Job Scheduling
- The dataset is from Two Sigma and Alibaba.
- Implementation details
Starlight: Fast Container Provisioning on the Edge and over the WAN
Mount first, using a delta file, and push later.
Q&A
- The proxy is there to offload congestion from the registry.
- Throughput may not be scalable.
- delta bundle?
Transport Layer - Part 1
PowerTCP: Pushing the Performance Limits of Datacenter Networks
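Uses an energy-loss (power) metric as the signal for the congestion control algorithm.
This control approach can rival that $\alpha \beta$ network algorithm.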
RDMA is Turing complete, we just did not know it yet!
RDMA is Turing complete.
It is mostly a set of tricky techniques for emulating the equivalent CPU operations.
Conditionals
FlexTOE: Flexible TCP Offload with Fine-Grained Parallelism
The SmartNIC runs the packet-processing logic in parallel, and packets are delivered to user space via kernel bypass.
Programmable Switches - Part 1
NetVRM: Virtual Register Memory for Programmable Networks
SwiSh: Distributed Shared State Abstractions for Programmable Switches
This paper is awesome.
Network Troubleshooting and Debugging
Closed-loop Network Performance Monitoring and Diagnosis with SpiderMon
Monitoring data is stored in a time-series-database-like store.
Uses a wait-for graph.
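As a rough illustration of what a wait-for graph buys you for diagnosis, the sketch below follows wait-for edges from a victim to the nodes that wait on nothing, which are the natural root-cause candidates. How SpiderMon actually builds and queries its graph is not covered in these notes; the function name `root_blockers`, the dictionary representation, and the flow names are hypothetical.

```python
def root_blockers(wait_for, start):
    """Follow wait-for edges from `start` and return the terminal nodes.

    `wait_for` maps a node to the list of nodes it is waiting on. A node
    with no outgoing edges waits on nothing, so it is a candidate root
    cause. The `seen` set keeps the walk finite even if the graph has cycles.
    """
    seen = {start}
    stack = [start]
    roots = set()
    while stack:
        node = stack.pop()
        successors = wait_for.get(node, [])
        if not successors and node != start:
            roots.add(node)
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return roots


# Hypothetical example: flow_a waits on flow_b, which waits on flow_c and flow_d.
example = {"flow_a": ["flow_b"], "flow_b": ["flow_c", "flow_d"], "flow_c": []}
candidates = root_blockers(example, "flow_a")
```

Here `candidates` contains `flow_c` and `flow_d`, the two flows that nothing else is holding up.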
Collie: Finding Performance Anomalies in RDMA Subsystems
PFC to identify Deadlock
Test by vendor is not enough
Test in an integration
Not extensible
challenges:
- PFC is not sound
- An efficient search algorithm is needed to find the next peak for the following network flow.
Conduct a few tests on the new region.
SCALE: Automatically Finding RFC Compliance Bugs in DNS Nameservers
Security and Privacy
Spectrum: High-bandwidth Anonymous Broadcast
Previous work (Chaum): distribute and aggregate. But it cannot prevent malicious clients.
Need Blind message authentication.
Homomorphic encryption?
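The "distribute and aggregate" idea, and why a malicious client is a problem for it, can be sketched with a generic two-server XOR secret-sharing scheme. This is not Spectrum's protocol; the function names `share`, `xor`, and `aggregate`, the fixed slot layout, and the two-server setup are assumptions made for the illustration.

```python
import secrets


def xor(x: bytes, y: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(i ^ j for i, j in zip(x, y))


def share(message: bytes, slot: int, num_slots: int):
    """Split a write of `message` into `slot` into two XOR shares.

    Each share alone looks random, so neither server learns which slot
    the client wrote to.
    """
    size = len(message)
    plain = [bytes(size)] * num_slots
    plain[slot] = message
    share_a = [secrets.token_bytes(size) for _ in range(num_slots)]
    share_b = [xor(p, a) for p, a in zip(plain, share_a)]
    return share_a, share_b


def aggregate(shares, size: int, num_slots: int):
    """One server XORs together the shares it received from all clients."""
    acc = [bytes(size)] * num_slots
    for s in shares:
        acc = [xor(a, b) for a, b in zip(acc, s)]
    return acc


# Two honest clients write to different slots.
a1, b1 = share(b"hi", 0, 2)
a2, b2 = share(b"yo", 1, 2)
server_a = aggregate([a1, a2], 2, 2)
server_b = aggregate([b1, b2], 2, 2)
published = [xor(x, y) for x, y in zip(server_a, server_b)]
```

Combining the two server totals recovers both messages. A client that submits random bytes in every slot would corrupt all of them, and the servers cannot tell, which is the gap that some form of blind authentication has to close.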
How do you measure the efficiency of using hardware-counter momentum as a metric to find the next network peak?
Reliable Distributed Systems
Graham: Synchronizing Clocks by Leveraging Local Clock Properties
Even when the local clock goes down, relative ordering can still be maintained, but cross-validation against neighboring devices is still needed to reduce drift. Graham characterizes the local clock using commodity sensors present in nearly every server and leverages this data to further improve clock accuracy, increasing its tolerance to failures. Graham reduces the clock drift of a commodity server by up to 2000×, reducing the maximum assumed drift in most situations from 200ppm to 100ppb.
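To put the quoted drift numbers in perspective, here is a small worked calculation. It only restates the 200ppm and 100ppb figures above as worst-case drift over a time interval; the helper name `max_drift_ns` is made up for the example.

```python
def max_drift_ns(rate_ppb: int, seconds: int) -> int:
    """Worst-case clock drift in nanoseconds.

    A drift rate of 1 ppb accumulates 1 ns of error per second.
    """
    return rate_ppb * seconds


BEFORE_PPB = 200 * 1000  # 200 ppm expressed in ppb
AFTER_PPB = 100          # 100 ppb

drift_before = max_drift_ns(BEFORE_PPB, 1)  # 200,000 ns = 200 us per second
drift_after = max_drift_ns(AFTER_PPB, 1)    # 100 ns per second
improvement = BEFORE_PPB // AFTER_PPB       # 2000
```

In other words, a clock left unsynchronized for one second can be off by up to 200 microseconds at 200ppm, but only 100 nanoseconds at 100ppb, which matches the 2000× figure.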
Testing and Verification
Automated Verification of Network Function Binaries
Uses symbolic execution to verify network function binaries. The model is a simplified version.
The map abstraction cannot map regexes.
Differential Network Analysis
Programmable Switches - Part 2
Runtime Programmable Switches
IMap: Fast and Scalable In-Network Scanning with Programmable Switches
Uses on-switch logic to implement a higher-bandwidth Nmap.
Unlocking the Power of Inline Floating-Point Operations on Programmable Switches
Mantissa
Uses the CIDR (longest-prefix-match) TCAM to compute the leading 1.
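The leading-1 trick can be emulated in software with a ternary match table that has one prefix entry per bit position. This is a sketch of the general idea only; the paper's actual table layout is not described in these notes, and `build_table`, `leading_one`, and the 32-bit width are assumptions.

```python
WIDTH = 32  # assumed operand width


def build_table(width: int = WIDTH):
    """Build one ternary entry per bit position, highest position first.

    The entry for position `pos` matches values whose bits above `pos`
    are all zero and whose bit `pos` is one; lower bits are wildcards.
    This is the same shape as a CIDR prefix match.
    """
    table = []
    full = (1 << width) - 1
    for pos in range(width - 1, -1, -1):
        mask = full & ~((1 << pos) - 1)  # care only about bits >= pos
        value = 1 << pos
        table.append((value, mask, pos))
    return table


def leading_one(x: int, table):
    """Return the index of the most significant set bit, or None for zero."""
    for value, mask, pos in table:
        if (x & mask) == value:
            return pos
    return None


TABLE = build_table()
```

For example, `leading_one(0b1010, TABLE)` returns 3. Knowing that position is what lets the mantissa be shifted into normalized form without a loop, using a single table lookup instead.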
Reliable Distributed Systems
DispersedLedger: High-Throughput Byzantine Consensus on Variable Bandwidth Networks[WIP]
Uses state machine replication plus Byzantine consensus for fast replication.
Troubleshooting
Buffer-based End-to-end Request Event Monitoring in the Cloud
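Uses the buffer as a lifecycle hint for the data.
Missing RPC header and unpredictable location.
It also requires modifying the RPC header, which is annoying. But there is no way around it if every device has to be able to read it.
What if only some people adopt it? If one team uses it, only that team gets the visibility.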
How to diagnose nanosecond network latencies in rich end-host stacks
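CDF of memcached request receive-latencies with and without profiling. (eBPF is already hard to beat; they pulled it off with a message-lifetime profiling trick.)
BPF-1 stands for eBPF probing a single function; Ftrace, Intel-PT and NSight.
Yet another work that integrates a pipeline.
The end host receives requests from the NIC, and the difference is the network latency. Message profiling is done on top of that, and Intel PT records the system latency.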
- Time reconciliation
- Message profiling
- Anomaly disambiguation (Function(Who calls) stack)
Operational Track - Part 2
MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters
Cloud Scale Services
Accelerating Collective Communication in Data Parallel Training across Deep Learning Frameworks
Orca: Server-assisted Multicast for Datacenter Networks
Cocktail: A Multidimensional Optimization for Model Serving in Cloud
ISPs and CDNs
C2DN: How to Harness Erasure Codes at the Edge for Efficient Content Delivery
cISP: A Speed-of-Light Internet Service Provider
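It was built for high-frequency trading and has now been realized; perhaps it was released because there is no alpha left in it.
Starlink is expensive and has high latency.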
CloudCluster: Unearthing the Functional Structure of a Cloud Service
Data Center Network Infrastructure
Zeta: A Scalable and Robust East-West Communication Framework in Large-Scale Clouds
They even use XDP.
Aquila: A unified, low-latency fabric for datacenter networks
RDC: Energy-Efficient Data Center Network Congestion Relief with Topological Reconfigurability at the Edge
Multitenancy
Isolation Mechanisms for High-Speed Packet-Processing Pipelines
Justitia: Software Multi-Tenancy in Hardware Kernel-Bypass Networks
Tokens are split.
The last step tricks the QP/CQ.
NetHint: White-Box Networking for Multi-Tenant Data Centers