The Checked-C and Checked-CUDA

The ASAN and Valgrind have false positives and false negatives. And the runtime overhead is not neglectable in production machines. The following is the practice of ASAN false negative. Other like heap overflow will require DevOps and SDE's mutual efforts.

Checked C

The checked-C proposed by Microsoft based on clang especially for the kernel checks is a good way to prove array access OoB and ptr access(No UAF(this still requires a dynamic may?)) in a formal way during the compilation process. The example can be like below. We see a lot of extra bound annotation:

int main(int argc, nt_array_ptr<char> argv checked[] : count(argc)) {
  int arr checked[5]  = { 0, 1, 2, 3, 4 };
  test(arr, 4);
  return 0;
}

void test(array_ptr<int> p : count(len), int len) {
  // Code originally:
  //   array_ptr<int> r : count(len - 1) = p + 1.
  array_ptr<int> r : count(len - 1) = 
    dynamic_bounds_cast<array_ptr<int>>(p + 1, count(len - 1));
}

Checked C’s annotated checked pointer is binary-compatible with legacy pointers.

The initialization for _Array_ptr<int> can have Bound Declarations: _Array_ptr<int> p : bounds(p, p + len)

  • Indicate memory that can be accessed by pointer p
  • Programmer declares bounds
  • Bounds are expressed in terms of other program variables

The Checked C has multiple Bounds Declaration forms: declare Range/Count/Null/left default. Those are labeled statically. The runtime check will check whether the memory access exceeds the bound with the above annotation.

The Scope will enforce the safety by iter/intra bound type safe/ bound safe separately. The unchecked scope will be marked as _Unchecked, just like rust's unsafe.

_Checked{    int b _Checked[5];    for (int i = 0; i < 5; i++) {        // all references are bounds checked        b[i] = compute(...);        _Unchecked {printf("Value at index %d is %d\n", i, b[i]);}    }}

For the upper definition, we have the following features:

  1. Bounds-safe interfaces: to resolve the interoperability of checked code with library and OS APIs that may contain unchecked code and do incremental addition of Checked C annotation
char *strstr(const char *s1 : itype(_Nt_array_ptr<const char>),                       const char *s2 : itype(_Nt_array_ptr<const char>))          : itype(_Nt_array_ptr<char>);

Just annotate the bound-safe interface in Checked C

  1. Dynamic bounds cast
void process(_Array_ptr<char> buf : count(len), int len) {   // We expect buf to have enough space for at least 12 integers.
   _Array_ptr <int> intbuf : count (12) =   _Dynamic_bounds_cast <_Array_ptr<int>>(buf, count(12));   ...}

The bound can be deduced but still need Dynamic Bounds Cast; the compiler is not smart enough to validate the declared bound.

  1. Bounds widening
    If the upper bound is reached, it can be widened by 1.
  2. Generic types for type safety
 _For_any(T) _Array_ptr<T>
memcpy(restrict _Array_ptr<T> dest : byte_count(n), restrict _Array_ptr<const T> src : byte_count(n),            size_t n) : byte_count(n);

Just add the generic type notation for other types.

  1. Existentially-quantified types
    For soundness, the callback information should be stored in expression.
struct IntSet<struct List *> listSet;_Exists(T, struct IntSet<T>) set =_Pack(listSet, _Exists(T, struct IntSet<T>), struct List *); _Unpack (T) struct IntSet<T> intSet = set; 
  1. Redeclaring bounds and pre/post conditions
    The compiler validates that

    • the arguments at the function call site satisfy the preconditions
    • and the return value of the function satisfies the postconditions
int snprintf(char* restrict s : itype(restrict _Nt_array_ptr<char>) count(n - 1),
    size_t n _Where n > 0,const char* restrict format 
    : itype(restrict _Nt_array_ptr<const char>), ...);

Formal Model of Checked-C

The Decision Procedure defines the formal semantics of pointer and array

Pointer Logic


Memory Axiom 1

Memory Axiom 2 & 3

We can apply the memory axiom to all the operation of pointer.

The memory model should first be illustrated as defined in c++.

Mapping to Array Types

Heap allocator semantics requires that Heap-allocated data structures play an important role in programs and are prone to pointer-related errors. We now illustrate how to model a number of commonly used data structures using pointer logic. Thus we have list and trees etc. for illustrating the heap allocator.


Tree's property:

  1. Reachability predicate online proof for linked Structure. Like reachability algorithm.
  2. Partition the memory with Axioms
  3. Separate the memory with Axioms

Array Logic

Array Logic has an index, we need to make sure that not exceeds the bound

The index theory is existential and universal quantification using Presburger arithmetic and linear arithmetic.

Axiom for Array


Array Reduction for Array Theory.

The implementation in z3 is purely checking the axiom.

Checked-C

All the strlen, malloc, and other memory stdlib are checked safe and added with the runtime semantic information recording. The Semantic is to check the array len or str len at every operation to make sure there's no null ptr dereference or out-of-bound access. The dynamic cast will store the new object with object bound and will do the widening for a certain operation. However, UAF and runtime thread crashes are not able to be checked by the static insertion and runtime library checking without instrumentation on instruction. The tools do not require much runtime check and no state-space exploration in SMT solver but put the efforts on codegen of safe code, but the unchecked part and unsoundness of all memory bugs lower this work.




Check-CUDA

Use BMC to model the Cuda launch kernel and kernel operations. Use Monotonic Partial Order Reduction (MPOR) to reduce state-space explosion.

GKLEE by Utah uses the concolic execution of CUDA programs. Based on the LLVM Byte code, race conditions also symbolically checking the thread. They symbolize tid dependency and thread reduction using partial order reduction and canonical scheduling.


Their symbolize the warp divergence and barrier semantic

Other counterparts contain Prover of User GPU Programs (PUG) analysis.

CIVL is a framework for static analysis and concurrent program verification. It can be used in GPU verification also.

The ESBMC-GPU Infra


The launched kernel, cudaMalloc, and other intrinsics will be illustrated as variable lifecycle with Cuda SIMT semantic. The verification is similar to multithread

The pruning process identifies transitions in which threads access different array locations and outlines its execution using a running example.

The first example thread t1 reaches v1 resulting in a = [0, 0]. Thread t2 runs and reaches v2 resulting in a = [0, 1]. The MPOR algorithm then checks whether v2 does not exist (step 2) and whether transition t1 → t2 is defined as an independent. So the above algorithm first marked t1->t2 dependent, then all possible interleavings after reaching v4 shows that t2->t1 is independent, thus marked redundant. In comparison, the second example, step 4 doesn't hold. Thus only v4 is labeled redundant.

For verification simplicity, it only cast 2 thread analysis.1

Reference

  1. SMT-Based Context-Bounded Model Checking
    for CUDA Programs
  2. https://leodemoura.github.io/files/oregon08.pdf
  3. CHECKED C - Enabling Type Safe C Code
  4. Redesigning Hardware to Support Security: CHERI
  5. https://download.arxiv.org/pdf/2201.13394v1wnload.arxiv.org/pdf/2201.13394v1wnload.arxiv.org/pdf/2201.13394v1