EuroSys 22 Attendance

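This trip cost me 100 💶. I still wanted to see what the papers of possible future collaborators look like. Extremely annoyingly, I was not given Zoom access at first and only got it two days later.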

    Day 1

    CHEOPS

    This workshop was really boring.

    SLRL: A Simple Least Remaining Lifetime File Eviction policy for HPC multi-tier storage systems


    Predicts file lifetime with a neural network and evicts based on that, rather than using a conventional eviction policy.
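
    A minimal sketch of what "least remaining lifetime" eviction could look like, assuming each file already carries a predicted lifetime. The `FileEntry` fields and the `predicted_lifetime` value are hypothetical stand-ins for the model output; no neural network is involved here and this is not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    name: str
    size: int                  # bytes occupied in the fast tier
    created_at: float          # creation time, seconds
    predicted_lifetime: float  # hypothetical model output, seconds


def remaining_lifetime(f: FileEntry, now: float) -> float:
    # Time the file is still expected to be useful; never negative.
    return max(0.0, f.created_at + f.predicted_lifetime - now)


def evict_least_remaining(files, bytes_needed, now):
    """Pick victims with the smallest remaining lifetime until enough space is freed."""
    victims, freed = [], 0
    for f in sorted(files, key=lambda f: remaining_lifetime(f, now)):
        if freed >= bytes_needed:
            break
        victims.append(f.name)
        freed += f.size
    return victims


files = [
    FileEntry("a.h5", 100, 0.0, 50.0),
    FileEntry("b.h5", 200, 10.0, 500.0),
    FileEntry("c.h5", 150, 20.0, 45.0),
]
victims = evict_least_remaining(files, bytes_needed=200, now=30.0)
print(victims)  # ['a.h5', 'c.h5']
```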

    Data-Aware Compression for HPC using Machine Learning


    Metrics:

    1. Compression Ratio.
    2. Compression Ratio per Time.
    3. Compression Speed
    4. Decompression Speed.
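
    The four metrics above can be measured roughly as follows. This is a sketch using `zlib` from the Python standard library as a stand-in compressor; reading "Compression Ratio per Time" as the ratio divided by compression time is my assumption, and the sample data is made up.

```python
import time
import zlib


def measure(data: bytes, level: int = 6) -> dict:
    """Return the four metrics for one compressor setting on one buffer."""
    t0 = time.perf_counter()
    compressed = zlib.compress(data, level)
    t_comp = max(time.perf_counter() - t0, 1e-9)  # guard against a zero interval

    t0 = time.perf_counter()
    restored = zlib.decompress(compressed)
    t_decomp = max(time.perf_counter() - t0, 1e-9)
    assert restored == data  # the round trip must be lossless

    ratio = len(data) / len(compressed)
    return {
        "compression_ratio": ratio,
        "ratio_per_second": ratio / t_comp,  # assumed meaning of "ratio per time"
        "compression_mb_per_s": len(data) / t_comp / 1e6,
        "decompression_mb_per_s": len(data) / t_decomp / 1e6,
    }


sample = b"temperature=300.0;pressure=1.0;" * 20000  # made-up, highly repetitive data
for name, value in measure(sample).items():
    print(f"{name}: {value:.2f}")
```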

    Accelerating HPC Applications with Asynchronous I/O





    pMEMCPY: Effectively Leveraging Persistent Memory as a Storage Device



    Analysis and Workload Characterization of the CERN EOS Storage System



    Write Once, Read Maybe

    EuroSec

    Look Ma, No Constants: Practical Constant Blinding in GraalVM

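    Implements a defense against control-flow hijacking under a known system configuration in GraalVM (which can be converted to and from LLVM IR), a native-code compiler for the JVM. The method is constant blinding: the blinding is applied at the High-Level Intermediate Representation (HIR), where the JIT encrypts the values held by ConstantNode.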
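
    To make the idea of constant blinding concrete, here is a toy sketch on a made-up stack IR (the opcodes `push_const`, `xor_const`, and `add` are hypothetical and have nothing to do with GraalVM's real HIR). Each constant is replaced by `constant ^ key` plus a runtime XOR with the same key, so an attacker-chosen value no longer appears verbatim in the emitted code.

```python
import secrets

MASK = 0xFFFFFFFF  # the toy machine works on 32-bit values


def blind_constants(program):
    """Rewrite every push_const into a blinded push followed by an unblinding xor."""
    blinded = []
    for op, operand in program:
        if op == "push_const":
            key = secrets.randbelow(MASK) + 1  # random non-zero 32-bit key
            blinded.append(("push_const", (operand ^ key) & MASK))
            blinded.append(("xor_const", key))
        else:
            blinded.append((op, operand))
    return blinded


def run(program):
    """Interpret the toy IR and return the value left on top of the stack."""
    stack = []
    for op, operand in program:
        if op == "push_const":
            stack.append(operand & MASK)
        elif op == "xor_const":
            stack.append(stack.pop() ^ operand)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK)
    return stack.pop()


original = [("push_const", 0xC3C3C3C3), ("push_const", 1), ("add", None)]
blinded = blind_constants(original)
print(hex(run(original)), hex(run(blinded)))  # both compute the same result
```

    The program's behaviour is unchanged; only the immediates that end up in the instruction stream differ between runs.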

    RetTag: Hardware-assisted Return Address Integrity on RISC-V

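    Implements pointer authentication on RISC-V.

    One guy asked what happens if an interrupt arrives between the PAC check and the return; that case is simply not handled. This is where a vendor like Arm, which can change the architecture itself, has the advantage: it could make this an RTM-style operation. A pure compiler implementation can only go this far.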

    Reproducible Research: from Paper to Artifact Evaluation



    Towards Securely Migrating Wasm Enclaves










    Remote attestation protocol.



    On the Effectiveness of Same-Domain Memory Deduplication

    From the group behind RIDL.

    Side-channel defense for memory deduplication on Windows.

    Day 2

    Concurrency

    Building an Efficient Key-Value Store in a Flexible Address Space

    CC spent two years submitting this and it finally got published.

    All kinds of trees need extra indirections to keep data sorted. Ext4 does have insert-range and collapse-range operations, but they come with alignment requirements, shifting costs $O(N)$, and commonly used data indexing mechanisms cannot keep track of shifted contents in an address space.

    FlexSpace is a user-space, log-structured space manager built for write efficiency; every operation works on (partial_offset, length, address) triples. Drawback: an append-write to a file needs to expand the last extent in place or add new extents to the end of the mapping index.

    Addresses are not contiguous, so updating a B+tree pointer turns into modifying the underlying triples.

    The implementation feels like a log-based CoW B+tree variant with optimized syscalls. It also comes with its own GC.
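
    A toy sketch of the insert-range idea on top of a log-structured store. It is deliberately simplified: extents are kept in a flat list of (length, address) pairs with implicit logical offsets, instead of the tree of (partial_offset, length, address) triples described above, so lookups here are linear. The class and method names are hypothetical. The point is only that inserting in the middle shifts later contents by editing the mapping, without moving any data.

```python
class ExtentMap:
    """Toy logical address space backed by an append-only log."""

    def __init__(self):
        self.log = bytearray()  # log-structured backing store
        self.extents = []       # ordered list of (length, log_address)

    def _append_to_log(self, data: bytes) -> int:
        addr = len(self.log)
        self.log += data
        return addr

    def insert(self, offset: int, data: bytes) -> None:
        """Insert data at a logical offset, shifting everything after it."""
        addr = self._append_to_log(data)
        pos = 0
        for i, (length, ext_addr) in enumerate(self.extents):
            if offset < pos + length:
                head = offset - pos  # bytes of this extent that stay in front
                pieces = []
                if head > 0:
                    pieces.append((head, ext_addr))
                pieces.append((len(data), addr))
                pieces.append((length - head, ext_addr + head))
                self.extents[i:i + 1] = pieces  # split the extent in the map only
                return
            pos += length
        if offset != pos:
            raise ValueError("offset beyond end of address space")
        self.extents.append((len(data), addr))

    def read_all(self) -> bytes:
        return b"".join(bytes(self.log[a:a + n]) for n, a in self.extents)


space = ExtentMap()
space.insert(0, b"hello world")
space.insert(5, b",")      # shifts " world" logically; the log is only appended to
print(space.read_all())    # b'hello, world'
print(bytes(space.log))    # b'hello world,'
```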

    Software security

    Sharing is Caring: Secure and Efficient Shared Memory Support for MVEEs

    Multi-Variant Execution Environments (MVEEs) protect legacy software against memory corruption attacks. This paper is mainly about supporting IPC and I/O access under an MVEE.


    handler + injection Redirection check + pointers in SHM


    The divergence can be detected, and the variants are terminated before any data is written to SHM.

    KASLR in the age of MicroVMs.

    In-monitor KASLR, compared with the bootstrap boot that Firecracker implements (decompression and relocation done from the guest kernel), under three settings: no KASLR, function-granular KASLR, and KASLR.
    Didn't Alibaba's oclave container already do something like this?

    Threat model

    Hardening Binaries against More Memory Errors.

    Case introduction

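    Rewrites binaries with a binary JIT. Low-fat pointers and poisoned memory are used to detect OoB accesses, overflows, and so on; in the end the false positives (FP) are sorted out.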


    check batch

    PKRU-Safe: Automatically Locking Down the Heap Between Safe and Unsafe Languages.




    Hardens heap data that propagates between safe and unsafe code in Rust (for C/C++ the LLVM sanitizer is used).

    Nyx-Net: Network Fuzzing with Incremental Snapshots.

    Hypervisor-based snapshot fuzzing: network fuzzing on top of incremental snapshots. It extracts the network I/O diff from the delta and builds the mutators on it.

    Day 3

    TEE

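    Verified Programs Can Party: Optimizing Kernel Extensions via Post-Verification In-Kernel Merging

    Tianyin's paper. It optimizes eBPF programs by transforming indirect jumps into direct jumps, unrolling loops, and saving memory accesses, without loss of security or flexibility. I thought of similar work when I first saw eBPF. Simple instrumentation-style optimizations are still doable. (However, given the limits on the number of maps and on instrumented code, this is quite possibly the only instrumentation that is sound.)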

    retpoline instrument.

    Minimum Viable Device Drivers for ARM TrustZone

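    Uses the TEE to isolate I/O. (Can't an IOMMU do this?) Update 2022.10.23: it needs a secure channel, and there are already many open-source implementations.

    It requires modifying the MMC code, and dealing with hardware quirks or bugs is complicated.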


    The experiments were done on a Raspberry Pi. Quite fitting.

    Performance Evolution of Mitigating Transient Execution Attacks

    Defenses against transient execution attacks (TEA); the kind of work Genkin would probably look down on.

    We consider several security boundaries but not all (e.g., we don’t study the eBPF/kernel boundary).

    WASM has side-channel defenses at the userspace sandbox level, plus the compiler's Speculative Load Hardening.

    One of the things Tianyin's subsequent paper does is fix indirect jumps.

    Intel's mitigations cost a lot of cycles.

    Syscall comparison

    speculative store bypass

    Slashing the Disaggregation Tax in Heterogeneous Data Centers with FractOS.



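    Uses the SmartNIC as TCB and compute resource to connect devices peer-to-peer.


    Peripheral access is made microkernel-style, and GPU access rCUDA-style. All interaction goes through DMA plus requests.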

    stateful operation? Open answer
    Universal? lose generality? Open answer
    Multi-Tenancy? Data distribution.
    SSD to GPU? dataflow/information flow stdlib is currently under development.

    OS Scheduling with Nest: Keeping Tasks Close Together on Warm Cores.



    Kite: Lightweight Critical Service Domains.

    Implements the Unikraft idea inside a microVM.


    Using the raw Xen API, latency is much lower at the same bandwidth. (Maybe high-frequency trading systems need this?)

    Persistent Memory

    Characterizing the Performance of Intel Optane Persistent Memory


    Many of the conclusions were already known.

    SafePM: A Sanitizer for Persistent Memory.


    1. Trip-wire or shadow memory based approach
    2. Object-based approach. Such approaches check all the pointer manipulations to ensure that the resulting pointer is not out-of-bounds with respect to the object it points to.
    3. Pointer-based approach.
    4. ASan: a Shadow Memory-based Approach
      An example
      Sanitizer memory arrangement
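
    A toy model of the shadow-memory approach from item 4, as a sketch. It follows the usual ASan encoding as I understand it (one shadow byte per 8 application bytes: 0 means fully addressable, k in 1..7 means only the first k bytes are addressable) and uses 0xFF for poisoned redzones. `ShadowHeap` and its methods are hypothetical names; real sanitizers instrument loads and stores in the compiler rather than going through a wrapper like this.

```python
GRANULE = 8    # application bytes covered by one shadow byte
REDZONE = 16   # poisoned padding around each object (multiple of GRANULE)
POISONED = 0xFF


class ShadowHeap:
    def __init__(self, size: int):
        self.memory = bytearray(size)
        self.shadow = [POISONED] * (size // GRANULE)  # everything starts poisoned
        self.next_free = 0

    def malloc(self, n: int) -> int:
        start = self.next_free + REDZONE
        rounded = (n + GRANULE - 1) // GRANULE * GRANULE
        if start + rounded + REDZONE > len(self.memory):
            raise MemoryError("toy heap exhausted")
        for off in range(0, rounded, GRANULE):
            usable = min(GRANULE, n - off)
            # 0 = whole granule valid, otherwise only the first `usable` bytes
            self.shadow[(start + off) // GRANULE] = 0 if usable == GRANULE else usable
        self.next_free = start + rounded  # the trailing redzone is shared with the next object
        return start

    def is_addressable(self, addr: int) -> bool:
        s = self.shadow[addr // GRANULE]
        return s == 0 or (s != POISONED and addr % GRANULE < s)

    def store(self, addr: int, value: int) -> None:
        if not self.is_addressable(addr):
            raise IndexError(f"out-of-bounds store at {addr}")
        self.memory[addr] = value


heap = ShadowHeap(256)
p = heap.malloc(10)
heap.store(p + 9, 1)       # last valid byte of the 10-byte object
try:
    heap.store(p + 10, 1)  # one byte past the end, caught by the partial granule
except IndexError as e:
    print(e)
```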

    impl




    Questions: Transactional Asynchronous Abort? Metadata protection? Multi-threading?

    ResPCT: Fast Checkpointing in Non-Volatile Memory for Multi-Threaded Applications.

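    Yet another thing Prof. Wang wanted to do has now been published: the visibility order versus the persistence order of flushes under multi-threading. It is covered in that PM book.

    They use this to build fault tolerance. (The story is not compelling at all.)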

    Day 4

    SSDs & I/O

    p2KVS: a Portable 2-Dimensional Parallelizing Framework to Improve Scalability of Key-value Stores on SSDs.

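    In the distributed systems course, someone worked on fast migration of an LSM tree's SSTables and MemTable but gave no concrete approach. My thinking at the time was something like WiscKey, or migrating a log that can be rolled back, although recovery would be slow. The main problem here is the SSTable structure: there are many pointer and lock issues.

    Parallelism is achieved with batching plus a log.

    This paper identifies the root causes and proposes a queue-based opportunistic batching mechanism to improve the handling efficiency of each worker.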

    Improving Scalability of Database Systems by Reshaping User Parallel I/O.


    Done with a scheduler:

    1. Early intervention (to avoid aggravating cache conflicts, lock contention, and queuing effects)
    2. Hiding the excessive I/O
      weighted round-robin (WRR) policy with P/$\mu$ control (backed by a queueing-theory proof)

    3. Portability and compatibility
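
    The weighted round-robin policy named in item 2 can be sketched generically as below. The queue names and weights are made up for illustration, and the P/$\mu$ control from the paper is not modeled; this only shows the basic WRR dispatch order.

```python
from collections import deque


def weighted_round_robin(queues, weights):
    """Yield (queue_name, request); each round serves up to weights[name] requests per queue.

    Weights are assumed to be positive integers.
    """
    pending = {name: deque(reqs) for name, reqs in queues.items()}
    while any(pending.values()):  # an empty deque is falsy
        for name, q in pending.items():
            for _ in range(weights[name]):
                if not q:
                    break
                yield name, q.popleft()


queues = {
    "foreground": ["f1", "f2", "f3", "f4"],
    "background": ["b1", "b2"],
}
weights = {"foreground": 2, "background": 1}
order = [req for _, req in weighted_round_robin(queues, weights)]
print(order)  # ['f1', 'f2', 'b1', 'f3', 'f4', 'b2']
```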

    BetrFS: A Compleat File System for Commodity SSDs.

    (About 20 people have worked on this project, on and off, for 7 years.)

    $B^\epsilon$-trees (internal Bε-tree nodes have logs for messages; messages are serializable objects that logically describe an operation to be performed on one or more key-value pairs, e.g., update or delete) are used to improve flash file systems on SSDs.

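    The DB stores metadata and data (path -> inode) in two tables. Underneath it is still ext4.

    Why not use an adaptive radix tree? There are optimizations later on.

    The DB talks to the FS through a Simple File layer using direct I/O, in order to remove page copying overheads and double buffering. To avoid modifying the kernel, all read and write requests supply a file offset and a pre-allocated buffer.
    Metadata existence checks use conditional logging to reduce reads.

    Further optimizations: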

    1. Coalescing range delete messages.
    2. Bypassing Bε-tree queries for empty directories.
    3. Removing redundant messages hooks (unlink and evict_inode)
    4. Fully caching readdir with opportunistic inode instantiation.
    5. Revisiting the apply-on-query optimization.(DB Parser execution concat.)

    Cooperative Memory Management
    The kernel maintains the allocations for small-file writes to prevent allocation amplification. The malloc path follows realloc logic.
    Sharing Pages between the VFS and Bε-tree: Zero-copy IO


    The results show substantial improvements in latency and throughput (except against ZFS on OLTP, which is mainly an fsync implementation issue).

    Beating the I/O bottleneck: A case for log-structured virtual disks.

    Another idea that pushes the log structure down to a lower layer.

    We propose a Log-Structured Virtual Disk(LSVD) that couples log-structured approaches at both the cache and storage layer to provide a virtual disk on top of S3-like storage. Both cache and backend stores are order-preserving, enabling LSVD to provide strong consistency guarantees in case of failure.

    Compared with Ceph RADOS Block Device with bcache, it has the following advantages:

    1. Dramatically boost performance
    2. Preserve key properties of today’s virtual disks.
    3. Naturally enables functionality that is challenging with today’s virtual disk implementations.

    The in-memory map from vLBA $\rightarrow$ pLBA points into the log-structured write records on the SSD.
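
    A minimal sketch of such a map over an append-only log of write records, under my own simplifications: the "SSD" is a Python list, the pLBA is just the record index, and the class name is hypothetical. Because the log preserves write order, replaying it after a crash rebuilds the same map.

```python
class LogStructuredDisk:
    """Toy virtual disk: writes append records, a dict maps vLBA -> pLBA."""

    def __init__(self):
        self.log = []            # append-only (vLBA, block) write records
        self.vlba_to_plba = {}   # in-memory map, lost on a crash

    def write(self, vlba: int, block: bytes) -> None:
        plba = len(self.log)     # the next free slot in the log
        self.log.append((vlba, block))
        self.vlba_to_plba[vlba] = plba  # the newest record wins

    def read(self, vlba: int):
        plba = self.vlba_to_plba.get(vlba)
        return None if plba is None else self.log[plba][1]

    def recover(self) -> None:
        """Rebuild the map by replaying the log in write order."""
        self.vlba_to_plba = {}
        for plba, (vlba, _) in enumerate(self.log):
            self.vlba_to_plba[vlba] = plba


disk = LogStructuredDisk()
disk.write(7, b"old")
disk.write(3, b"x")
disk.write(7, b"new")     # an overwrite becomes a sequential append
print(disk.read(7))       # b'new'

disk.vlba_to_plba = {}    # simulate losing the in-memory map
disk.recover()
print(disk.vlba_to_plba)  # {7: 2, 3: 1}
```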

    read cache+Log-structured block store

    wcd mentioned this before: it is a device mapper for block devices.



    In some tests, the bandwidth comes out better than LSVD's.

    Random writes do very well because of the log structure.

    FaaS 1

    Isolating Functions at the Hardware Limit with Virtines.


    light VM isolation for limiting resources

    Fireworks: A Fast, Efficient, and Safe Serverless Framework using VM-level post-JIT Snapshot.


    VMSH: Hypervisor-agnostic Guest Overlays for VMs.














    Jiffy: Elastic Far-Memory for Stateful Serverless Analytics.








    FaaS 2

    FaaSnap: FaaS Made Fast Using Snapshot-based VMs






    Memory Deduplication for Serverless with Medes.








    Misc

    Apt-Get






    Binary JIT? ORC-JIT?