[Parallel Computing] Concurrency bugs

  • In parallel system, high performance achieved by using algorithm with high parallelism, good load balancing, memory locality, etc.
  • But must also ensure multiple concurrent threads / processes operate correctly.
  • Concurrency bugs can arise due to unexpected interleaving of concurrent threads.
  • One of the most difficult issues to deal with in parallel / distributed computing.
    • Bugs occur at random times depending on the interleaving.
    • Bugs don’t occur during testing, but they will eventually occur in system deployed system.
    • Humans have hard time anticipating and resolving concurrency bugs.Concurrency bugs can have very serious consequences.

      Therac-25 radiation therapy system had a concurrency bug that led to radiation overdose and death of several patients.

      Space shuttle aborted 20 minutes before maiden launch due to concurrency bug in its avionics software.

eliminating concurrency bugs

  • Multiple ways, each with pros and cons.
  • Critical sections and locks
    • Prevent processes from accessing a block of code at the same time.
    • Easy to use, effective for some problems.
    • But cause contention, overhead and serialization.
    • Need to decide how much code to lock.
    • Too little, and may still get concurrency bug.
    • Too much, and we lose parallelism and performance.
  • If processes acquire several locks, they need to coordinate to maintain correctness, avoid deadlock.
  • Low priority that acquires a lock can delay high priority thread (priority inversion).
  • Despite these problems, locks are still the most widely used solution.
  • transactional memory
    • A block of code is defined as a transaction, i.e. the block of code either executes atomically or doesn’t execute at all.
    • Keep track of reads and writes done by a transaction. If two concurrent transactions read and write to same memory location, abort one of them, i.e. undo all the changes it made.
    • Two concurrent transactions accessing different memory locations can both commit, i.e. all the changes it made are made permanent.
    • Transactional memory can either be implemented in hardware (HTM) or software (STM).
    • HTM has limits of size and type of transactions it can handle. Implemented in e.g. Intel Haswell, IBM Power8.
    • STM is more flexible, but can be very slow.
  • Write your own concurrent code, without hardware support.
    • Challenging for most programmers. Not scalable in terms of productivity.
    • Correct, efficient algorithms are often research level publications.

Mutual exclution

  • Given n concurrent processes that want to perform a critical section (CS), mutual exclusion can satisfy the following properties.
    • No two processes are in CS at same time.
    • If several processes want to enter the CS, at least one succeeds in finite time (deadlock freedom).
    • If several processes want to enter the CS, every process succeeds in finite time (wait freedom).
  • All (useful) mutex algorithms satisfy first and second properties.
  • Some algorithms satisfy the third property, but have lower performance.

Mutual exclusion algorithms

  • Mutex is provided by locks. But how are locks implemented?
    • Depends on the type of operations the underlying hardware supports.
    • First type of algorithm uses only read / write operations.
    • Second type uses hardware synchronization primitives such as test-and- set (TAS) or compare-and-swap (CAS), provided in most processors.
  • TAS(x) tests if a Boolean variable x is true.
    • If x == false, it sets x to true.
    • Returns x’s value before the TS.
    • All these steps done atomically, without interruption from other threads.
    • getAndSet(x) is like TAS(x), but allows non-Boolean x.
  • CAS(x,v,v’) tests if variable x currently equals v. If so, it sets x to v’. Otherwise, it doesn’t change x. It also returns x’s current value.
    • Again, all this is atomic.
    • Algorithms also depend on a processor’s memory model.
    • Some processors reorder instructions to avoid stalls and obtain higher performance. This can break many lock algorithms.
    • Most lock algorithms assume memory model is sequentially consistent, i.e. the execution order of instructions from different processes is an interleaving of the instructions of each process in program order.

first attempt

  1. image-20200330212904433
  2. image-20200330223832949