想了一下,这课最好的上法还是顺便学一下soot这个框架,感觉和LLVM有类似也有不同,由于生成的是jvm栈机以及可以调用jni这种ffi related stuff。是故边学边玩。
[Distributed System] Summary
Synchronization
Logic Clock
Lamport Timestamp
a ->b ==> TS(a) < TS(b) The reverse does not hold.
vector lock
Use vector to store the lock with every cores. Follows the reverse of the property.
Ordering
FLP
To prove that random timeouts can avoid live locking.
- Lemma 1: Disjoint schedules are commutative
- Lemma 2: There exists an initial configuration that is bivalent
- Lemma 3: Starting from a bivalent config., there is always another bivalent config. that is reachable
- Theorem (Impossibility of Consensus): There is always a run of events in an asynchronous distributed system such that the group of processes never reach consensus (i.e., stays bivalent all the time)
RPC
- RPC = Remote Procedure Call
- Proposed by Birrell and Nelson in 1984
- Important abstraction for processes to call functions in other processes
- Allows code reuse
- Implemented and used in most distributed systems, including cloud computing systems
- Counterpart in Object-based settings is called RMI (Remote Method Invocation)
- What’s the No. 1 design choice for RPC?
Local Procedure Call (LPC)
The call from one function to another function within the same process
- Uses stack to pass arguments and return values
- Accesses objects via pointers(e.g., C) or by reference(e.g., Java)
LPC has exactly-once semantics
Remote Procedure Call
Call from one function to another function, where caller and callee function reside in different processes
- Function call crosses a process boundary
- Accesses procedures via global references(e.g., Procedure address = IP + port + procedure number)
- Cannot use pointers across processes since a reference address in process P1 may point to a different object in another process P2(We need serialization)
Similarly, RMI (Remote Method Invocation) in Object-based settings
Semantics
- Under failures, hard to guarantee exactly-once semantics
- Function may not be executed if
- Request (call) message is dropped
- Reply (return) message is dropped
- Called process fails before executing called function
- Called process fails after executing called function
- Hard for the caller to distinguish these cases
- Function may be executed multiple times if
- Request (call) message is duplicated
Idempotent OPerations
- Idempotent operations are those that can be repeated multiple times, without any side effects
- Idempotent operations can be used with at-least-once semantics
implementation
- client stub: has same function signature as
callee()
- communication module: Forwards requests and replies to appropriate hosts
- dispatcher: Selects which server stub to forward a request to
- server stub: calls
callee()
, allows it to return a value
ACID
- Atomicity: "all or nothing"
- Consistency: "it looks correct to me"
- Isolation: "as if alone"
- Durability: "survive failures"
Fault in the Two-phase Commit
- What if the database node crashes?
- What if the coordinator crashes?
- Coordinator writes its decision to disk
- When it recovers, read the decision from the disk and send it to replicas (or abort if no decision was made before the crash)
- If the coordinate crashes after preparation, but before broadcasting the decision, other nodes do not know how it has been decided.
- Replicas participating in the transaction cannot commit or abort after responding “ok” to the prepared request
- Algorithm is blocked until coordinator recovers
Consensus Protocol
Replication
Fault Tolerance and Reliable Multicast
P2P System
Provenance:
- Specialized
- light-weight
- easy-to-use software to enable Internet-scale file sharing
Data-intensive Computing: Massive
- Some Examples
- Google MapReduce
- Yahoo Hadoop/PIG
- Data parallel computing
- IBM Research System S
- InfosphereStream product
- Continuous Data stream processing
- Microsoft Dryad/ LINQ
- DAG processing
- SQL query support
- Google MapReduce
- Google Infra
- Distributed file systems: GFS
- Distributed storage: BigTable
- Job scheduler: the work queue
- Parallel computation: MapReduce
- Distributed lock server: chubby
Map Reduce
- A user-specified
map
the function processes a key/value pair to generate a set of intermediate key/value pairs. - A user-specified
reduce
function merges all intermediate values associated with the same intermediate key.
Cloud Computing
Cloud Compute
AI for systems
- AI for IT operations. AIOps combines big data and ML to automate IT operations processes.
- digital economy - increasing complexity of modern application arch.
- no manual intervention
- increased automation and faster remediation
The proliferation of monitoring tools makes analytics challenging
- The use of disparate monitoring tools makes it extremely difficult to obtain end-to-end visibility across the entire business service or application
- 72% of IT organizations rely on up to nine different IT monitoring tools to support the modern application.
Case Study
Use un-supervised data to dig the vast majority of DevOps data.
runtime timeout bug identification tool.
- Combines timeout-related feature selection and runtime anomaly detection to achieve higher precision.
- does not require any application instrumentation for bug detection.
- Evaluates over 19 performance bugs and identifies 18 of them.
Docker container Vulnerability Detection
AIOps @Microsoft
Reference
[Program Analysis] Intermediate Representation
我又来学软工了😄,倒腾java的种种,今天就在乱搞soot,跑一个命令行真脑瘫。不过zhe次不像暑研要写在Phosphor上的ASM visitor,只需处理三地址码的jimple就好。
java -cp "/Users/yiweiyang/.m2/repository/org/soot-oss/soot/4.2.1/soot-4.2.1.jar:/Users/yiweiyang/.m2/repository/org/slf4j/slf4j-api/1.7.5/slf4j-api-1.7.5.jar:/Users/yiweiyang/.m2/repository/org/slf4j/slf4j-log4j12/1.7.5/slf4j-log4j12-1.7.5.jar:/Users/yiweiyang/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/Users/yiweiyang/.m2/repository/de/upb/cs/swt/axml/2.0.0/axml-2.0.0.jar:/Users/yiweiyang/.m2/repository/com/google/guava/guava/19.0/guava-19.0.jar:/Users/yiweiyang/.m2/repository/org/ow2/asm/asm/9.2/asm-9.2.jar:/Users/yiweiyang/.m2/repository/org/ow2/asm/asm-commons/9.2/asm-commons-9.2.jar:/Users/yiweiyang/.m2/repository/org/ow2/asm/asm-tree/9.2/asm-tree-9.2.jar:/Users/yiweiyang/.m2/repository/de/upb/cs/swt/heros/1.2.2/heros-1.2.2.jar:/Users/yiweiyang/.m2/repository/org/ow2/asm/asm-util/9.2/asm-util-9.2.jar" soot.Main --soot-class-path /Users/yiweiyang/project/test_soot/target/classes:/Users/yiweiyang/.sdkman/candidates/java/current/jre/lib/rt.jar Bear
Continue reading "[Program Analysis] Intermediate Representation"
[Program Analysis] Intro
The course from NJU mainly focus on the classical technique to solve the current program analysis proble. The course from PKU is discussing different genre like symbolic execution/ formal methods/ JQF.
IOMMU conflicts with NCCL
[Program Analysis] CFL recheability
A path is considered to connect two nodes A and B, or B is reachable from A. Only if the concatenation of the labels on the edges of the path is a word in a specified context-free language.
MacVM Hypervisor performance
Mac has launched its API for virtulization and KhaosT has wrapped this api into a fully automated tool to install MacOS on Apple Sillicon just like opening sandbox.
The virtulization API is universal in swift.
let vm = VZVirtualMachine(configuration: configuration, queue: .main)
vm.delegate = self
vm.start { [weak self] result in
switch result {
case .success:
self?.document?.isRunning = true
NSLog("Success")
case .failure(let error):
NSLog("Error: \(error)")
}
}
Performance test between M1 and 9700k
# M1 Direct
❯ openssl speed -evp aes-128-cbc aes-256-cbc des-ede3 rsa2048 sha256 (base)
Doing sha256 for 3s on 16 size blocks: 13547385 sha256's in 2.98s
Doing sha256 for 3s on 64 size blocks: 6595521 sha256's in 2.91s
Doing sha256 for 3s on 256 size blocks: 2811715 sha256's in 2.98s
Doing sha256 for 3s on 1024 size blocks: 811209 sha256's in 2.90s
Doing sha256 for 3s on 8192 size blocks: 109612 sha256's in 2.96s
Doing sha256 for 3s on 16384 size blocks: 55935 sha256's in 2.94s
Doing des ede3 for 3s on 16 size blocks: 4252004 des ede3's in 2.99s
Doing des ede3 for 3s on 64 size blocks: 1017455 des ede3's in 2.91s
Doing des ede3 for 3s on 256 size blocks: 272326 des ede3's in 2.99s
Doing des ede3 for 3s on 1024 size blocks: 66040 des ede3's in 2.97s
Doing des ede3 for 3s on 8192 size blocks: 8196 des ede3's in 2.94s
Doing des ede3 for 3s on 16384 size blocks: 4052 des ede3's in 2.93s
Doing aes-256 cbc for 3s on 16 size blocks: 28033078 aes-256 cbc's in 2.95s
Doing aes-256 cbc for 3s on 64 size blocks: 7296104 aes-256 cbc's in 2.97s
Doing aes-256 cbc for 3s on 256 size blocks: 1724393 aes-256 cbc's in 2.88s
Doing aes-256 cbc for 3s on 1024 size blocks: 438037 aes-256 cbc's in 2.84s
Doing aes-256 cbc for 3s on 8192 size blocks: 57285 aes-256 cbc's in 2.93s
Doing aes-256 cbc for 3s on 16384 size blocks: 27726 aes-256 cbc's in 2.92s
Doing aes-128-cbc for 3s on 16 size blocks: 91712957 aes-128-cbc's in 2.89s
Doing aes-128-cbc for 3s on 64 size blocks: 24402793 aes-128-cbc's in 2.90s
Doing aes-128-cbc for 3s on 256 size blocks: 8617690 aes-128-cbc's in 2.97s
Doing aes-128-cbc for 3s on 1024 size blocks: 2327697 aes-128-cbc's in 2.98s
Doing aes-128-cbc for 3s on 8192 size blocks: 246604 aes-128-cbc's in 2.60s
Doing aes-128-cbc for 3s on 16384 size blocks: 157122 aes-128-cbc's in 2.95s
Doing 2048 bits private rsa's for 10s: 8364 2048 bits private RSA's in 9.92s
Doing 2048 bits public rsa's for 10s: 229578 2048 bits public RSA's in 7.99s
OpenSSL 1.1.1l 24 Aug 2021
built on: Tue Nov 23 07:53:43 2021 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: /Users/yiweiyang/project/spack/lib/spack/env/clang/clang -fPIC -O3 -Wall -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -D_REENTRANT -DZLIB -DNDEBUG -I/Users/yiweiyang/project/spack/opt/spack/darwin-monterey-x86_64/apple-clang-13.0.0/zlib-1.2.11-fj3aancm72hxjybw6445d5uuftok6jvr/include
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
des ede3 22753.20k 22377.02k 23316.21k 22769.35k 22837.29k 22658.01k
aes-256 cbc 152043.81k 157222.44k 153279.38k 157940.10k 160163.39k 155569.45k
aes-128-cbc 507753.40k 538544.40k 742804.26k 799852.93k 776992.30k 872639.61k
sha256 72737.64k 145056.13k 241543.30k 286440.70k 303358.62k 311713.96k
sign verify sign/s verify/s
rsa 2048 bits 0.001186s 0.000035s 843.1 28733.2
# M1 hypervisor
test-vm@test-vms-Mac ~ % openssl speed -evp aes-128-cbc des-ede3 rsa2048 sha256
Doing sha256 for 3s on 16 size blocks: 12417802 sha256's in 2.91s
Doing sha256 for 3s on 64 size blocks: 8605571 sha256's in 2.73s
Doing sha256 for 3s on 256 size blocks: 7519183 sha256's in 2.97s
Doing sha256 for 3s on 1024 size blocks: 3766735 sha256's in 2.99s
Doing sha256 for 3s on 8192 size blocks: 675151 sha256's in 2.98s
Doing des ede3 for 3s on 16 size blocks: 4212079 des ede3's in 3.00s
Doing des ede3 for 3s on 64 size blocks: 1079073 des ede3's in 2.99s
Doing des ede3 for 3s on 256 size blocks: 274746 des ede3's in 2.98s
Doing des ede3 for 3s on 1024 size blocks: 71968 des ede3's in 2.99s
Doing des ede3 for 3s on 8192 size blocks: 8420 des ede3's in 2.98s
Doing aes-128-cbc for 3s on 16 size blocks: 43091935 aes-128-cbc's in 2.98s
Doing aes-128-cbc for 3s on 64 size blocks: 11225248 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 256 size blocks: 2726199 aes-128-cbc's in 2.98s
Doing aes-128-cbc for 3s on 1024 size blocks: 673152 aes-128-cbc's in 2.96s
Doing aes-128-cbc for 3s on 8192 size blocks: 82678 aes-128-cbc's in 2.98s
Doing 2048 bit private rsa's for 10s: 4518 2048 bit private RSA's in 9.94s
Doing 2048 bit public rsa's for 10s: 103797 2048 bit public RSA's in 9.94s
LibreSSL 3.3.5
built on: date not available
options:bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: information not available
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
des ede3 22500.93k 23073.39k 23597.93k 24660.00k 23143.85k
aes-128-cbc 231587.58k 239916.76k 234029.06k 233230.99k 227646.27k
sha256 68226.66k 201824.46k 648416.14k 1288982.79k 1855047.46k
sign verify sign/s verify/s
rsa 2048 bits 0.002201s 0.000096s 454.4 10441.9
# 9700K Direct
❯ openssl speed -evp aes-128-cbc aes-256-cbc des-ede3 rsa2048 sha256
Doing sha256 for 3s on 16 size blocks: 21369624 sha256's in 2.86s
Doing sha256 for 3s on 64 size blocks: 11887732 sha256's in 2.92s
Doing sha256 for 3s on 256 size blocks: 5599953 sha256's in 2.97s
Doing sha256 for 3s on 1024 size blocks: 1737136 sha256's in 2.98s
Doing sha256 for 3s on 8192 size blocks: 229175 sha256's in 2.92s
Doing sha256 for 3s on 16384 size blocks: 117287 sha256's in 2.97s
Doing des ede3 for 3s on 16 size blocks: 8288297 des ede3's in 2.99s
Doing des ede3 for 3s on 64 size blocks: 2095278 des ede3's in 2.98s
Doing des ede3 for 3s on 256 size blocks: 525764 des ede3's in 2.98s
Doing des ede3 for 3s on 1024 size blocks: 130040 des ede3's in 2.95s
Doing des ede3 for 3s on 8192 size blocks: 16445 des ede3's in 2.99s
Doing des ede3 for 3s on 16384 size blocks: 8217 des ede3's in 2.98s
Doing aes-256 cbc for 3s on 16 size blocks: 49239507 aes-256 cbc's in 2.98s
Doing aes-256 cbc for 3s on 64 size blocks: 12584820 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 256 size blocks: 3150137 aes-256 cbc's in 2.98s
Doing aes-256 cbc for 3s on 1024 size blocks: 792478 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 8192 size blocks: 99018 aes-256 cbc's in 2.98s
Doing aes-256 cbc for 3s on 16384 size blocks: 49599 aes-256 cbc's in 2.99s
Doing aes-128-cbc for 3s on 16 size blocks: 205033256 aes-128-cbc's in 2.98s
Doing aes-128-cbc for 3s on 64 size blocks: 84091603 aes-128-cbc's in 2.92s
Doing aes-128-cbc for 3s on 256 size blocks: 21937430 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 1024 size blocks: 5512203 aes-128-cbc's in 2.98s
Doing aes-128-cbc for 3s on 8192 size blocks: 689395 aes-128-cbc's in 2.98s
Doing aes-128-cbc for 3s on 16384 size blocks: 344040 aes-128-cbc's in 2.99s
Doing 2048 bits private rsa's for 10s: 24679 2048 bits private RSA's in 9.94s
Doing 2048 bits public rsa's for 10s: 800018 2048 bits public RSA's in 9.94s
OpenSSL 1.1.1m 14 Dec 2021
built on: Thu Feb 10 23:52:55 2022 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: /Volumes/DataCorrupted/spack/lib/spack/env/clang/clang -fPIC -O3 -Wall -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -D_REENTRANT -DZLIB -DNDEBUG -I/Volumes/DataCorrupted/spack/opt/spack/darwin-monterey-skylake/apple-clang-13.0.0/zlib-1.2.11-7yfdtza4upfsjhx6nyigqab6ctnbiqxg/include
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
des ede3 44352.09k 44999.26k 45166.30k 45139.31k 45056.00k 45176.96k
aes-256 cbc 264373.19k 269374.07k 270615.80k 271403.84k 272199.82k 271782.61k
aes-128-cbc 1100849.70k 1843103.63k 1878254.88k 1894126.13k 1895142.23k 1885201.12k
sha256 119550.34k 260553.03k 482689.55k 596921.90k 642945.75k 647013.54k
sign verify sign/s verify/s
rsa 2048 bits 0.000403s 0.000012s 2482.8 80484.7
# 9700K hypervisor
test-vm@macos-12 ~ % openssl speed -evp aes-128-cbc aes-256-cbc des-ede3 rsa2048 sha256
Doing sha256 for 3s on 16 size blocks: 6998286 sha256's in 1.76s
Doing sha256 for 3s on 64 size blocks: 5937478 sha256's in 2.40s
Doing sha256 for 3s on 256 size blocks: 3640021 sha256's in 2.89s
Doing sha256 for 3s on 1024 size blocks: 1276928 sha256's in 2.99s
Doing sha256 for 3s on 8192 size blocks: 172371 sha256's in 2.98s
Doing des ede3 for 3s on 16 size blocks: 6392897 des ede3's in 2.97s
Doing des ede3 for 3s on 64 size blocks: 1840224 des ede3's in 2.97s
Doing des ede3 for 3s on 256 size blocks: 486441 des ede3's in 2.99s
Doing des ede3 for 3s on 1024 size blocks: 122497 des ede3's in 2.99s
Doing des ede3 for 3s on 8192 size blocks: 15081 des ede3's in 2.98s
Doing aes-256 cbc for 3s on 16 size blocks: 24622581 aes-256 cbc's in 2.96s
Doing aes-256 cbc for 3s on 64 size blocks: 6913334 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 256 size blocks: 1779287 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 1024 size blocks: 993353 aes-256 cbc's in 2.98s
Doing aes-256 cbc for 3s on 8192 size blocks: 130072 aes-256 cbc's in 2.99s
Doing aes-128-cbc for 3s on 16 size blocks: 248465144 aes-128-cbc's in 2.96s
Doing aes-128-cbc for 3s on 64 size blocks: 80633266 aes-128-cbc's in 2.98s
Doing aes-128-cbc for 3s on 256 size blocks: 20577300 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 1024 size blocks: 5140306 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 8192 size blocks: 631903 aes-128-cbc's in 2.99s
Doing 2048 bit private rsa's for 10s: 13162 2048 bit private RSA's in 9.97s
Doing 2048 bit public rsa's for 10s: 246512 2048 bit public RSA's in 9.96s
LibreSSL 2.8.3
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: information not available
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
des ede3 34426.72k 39593.28k 41604.84k 41932.50k 41488.42k
aes-256 cbc 133310.22k 148026.58k 152125.43k 341155.78k 356036.10k
aes-128-cbc 1341209.76k 1730061.34k 1759369.75k 1758876.41k 1729626.63k
sha256 63774.17k 158049.71k 322534.94k 437365.11k 473238.50k
sign verify sign/s verify/s
rsa 2048 bits 0.000758s 0.000040s 1320.1 24743.0
高级语言 to LLVM 的解释层
最近在做编译原理课程设计的设计,看了很多到 LLVM 的编译器的想法,同时发现 Rust 类型体操作为黑魔法合集也能带给社区很多新鲜玩意,就把之前设计 Chocopy LLVM 层的一些小想法放在这,上科大的同学想玩可以加个piazza,invite code: CHOCOPY。有一部分参考 High Level Constructs to LLVM_IR, 范型的设计更多参考 rust 和 c。
POPL 22 attendency
1.21
1.22
CoqPL
PriSC
BPF and MDS
trace point autometa
Detection for kernel, maybe useful for metigation of the MDS.
MDS part
spectre
Cats and Rice game
Too strong to reduce the time difference of ARRAY_MAX and non ARRAY_MAX case
another case
spectre v2
spectre v4
FaCT
Towards Understanding Spectre-PHT in Memory Safe Language
SEEC
A weird thing in arm64 of operator << in gcc-11
I'm trying to do some log stuff in a Compiler project. When I'm trying to use the fmt::format
library.
It was safe and sound to run with apple-clang 13, but when it comes to gcc-11 for the following line:
if ((x.second)->is_list_type()) {
LOG(INFO) << fmt::format("{} : [{}]", x.first,
((ClassValueType *)((ListValueType *)x.second)->elementType)->className);
}
LogStream is something like:
class LogStream {
public:
LogStream() { sstream_ = new std::stringstream(); }
~LogStream() = default;
template <typename T> LogStream &operator<<(const T &val) noexcept {
(*sstream_) << val;
return *this;
}
friend class LogWriter;
private:
std::stringstream *sstream_;
};
The operator <<
gets error reading the memory byte from the fmt byte, possibly because the author of GCC is not aware the pointer passed do not fit in the following ldur
style of stream out. On x86 OSX machine, the GCC have some _M_is_leaked()
check in the same line and on Windows MSVC, the line has reported the memory leakage for doubly linked pointer.
The compiled code is:
There's trick to maintain a compiler that have a universal error code output.