[Computer Architecture] Superscalar

Greater Instruction-Level Parallelism (ILP)

  • Multiple issue “superscalar”
    • Replicate pipeline stages ⇒ multiple pipelines
    • Start multiple instructions per clock cycle – CPI < 1, so use Instructions Per Cycle (IPC)
    • E.g., 4GHz 4-way multiple-issue
      • 16 BIPS, peak CPI = 0.25, peak IPC = 4
    • But dependencies reduce this in practice
  • “Out-of-Order” execution
    • Reorder instructions dynamically in hardware to reduce impact of hazards
  • Hyper-threading
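
The 4GHz 4-way numbers above can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are made up for illustration, and the "measured" run (1000 instructions in 400 cycles) is an invented example of dependencies pulling IPC below the peak.

```python
def peak_throughput(clock_hz: float, issue_width: int) -> dict:
    """Peak figures for an ideal multiple-issue core (no hazards, no stalls)."""
    peak_ipc = float(issue_width)          # instructions started per cycle
    peak_cpi = 1.0 / issue_width           # cycles per instruction, below 1
    instr_per_sec = clock_hz * peak_ipc
    return {"ipc": peak_ipc, "cpi": peak_cpi, "bips": instr_per_sec / 1e9}


def effective_ipc(instructions: int, cycles: int) -> float:
    """IPC actually achieved over a run (hypothetical measurement)."""
    return instructions / cycles


peak = peak_throughput(4e9, 4)
print(peak)                       # {'ipc': 4.0, 'cpi': 0.25, 'bips': 16.0}

# Invented run: hazards leave the 4-way core well below its peak of 4.
print(effective_ipc(1000, 400))   # 2.5
```

CPI and IPC are reciprocals, so once more than one instruction can start per cycle it is more natural to quote IPC.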

Pipelining recap

image-20200409104529546

image-20200409104719616

image-20200409104748552

Pipeline complexities explained

GPRs (general-purpose registers) and FPRs (floating-point registers)

  • More than one Functional Unit
  • Floating point execution!
    • Fadd & Fmul: fixed number of cycles; > 1
    • Fdiv: unknown number of cycles!
  • Memory access: unknown number of cycles on a cache miss
  • Issue: Assign instruction to functional unit
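
The effect of multi-cycle functional units on an in-order pipeline can be sketched with a toy issue model. Everything here is an assumption for illustration: the latencies in `LATENCY` are invented, the units are treated as fully pipelined (no structural hazards), only read-after-write dependencies are tracked, and at most one instruction issues per cycle.

```python
# Assumed latencies in cycles; illustrative values, not from any real core.
LATENCY = {"int": 1, "fadd": 3, "fmul": 5}


def in_order_schedule(program):
    """program: list of (unit, dest, sources).

    Returns one (issue_cycle, done_cycle) pair per instruction for a
    single-issue, in-order pipeline that stalls until operands are ready.
    """
    ready_at = {}     # register name -> cycle its value becomes available
    next_issue = 0    # earliest cycle the next instruction may issue
    schedule = []
    for unit, dest, srcs in program:
        operands_ready = max((ready_at.get(r, 0) for r in srcs), default=0)
        issue = max(next_issue, operands_ready)   # stall on RAW hazard
        done = issue + LATENCY[unit]
        ready_at[dest] = done
        schedule.append((issue, done))
        next_issue = issue + 1                    # in order, one per cycle
    return schedule


program = [
    ("fmul", "f1", ["f2", "f3"]),   # f1 = f2 * f3
    ("fadd", "f4", ["f1", "f5"]),   # f4 = f1 + f5, needs f1
    ("int",  "r1", ["r2", "r3"]),   # r1 = r2 + r3, independent
]
print(in_order_schedule(program))   # [(0, 5), (5, 8), (6, 7)]
```

The integer add depends on nothing, yet it cannot issue until cycle 6 because it sits behind the stalled `fadd`. Out-of-order execution targets exactly this kind of stall.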

image-20200410165203353
image-20200410165333359

Summary

image-20200409104937513

Static multiple issue

VLIW: very long instruction word
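
With VLIW, the compiler rather than the hardware decides which instructions issue together. A minimal sketch of that packing step follows, under loud assumptions: `pack_vliw` is a hypothetical helper, every operation takes one cycle, any slot can hold any operation, and only read-after-write dependencies are considered (the example program has no WAR or WAW hazards).

```python
def pack_vliw(program, width=2):
    """Greedily pack instructions into fixed-width bundles.

    program: list of (name, dest, sources) in program order.
    Returns a list of bundles; unused slots are filled with "nop".
    """
    bundles = []           # each bundle is a list of instruction names
    producer_bundle = {}   # register -> index of the bundle that writes it
    for name, dest, srcs in program:
        # An instruction must go in a bundle after all of its producers.
        earliest = max(
            (producer_bundle[r] + 1 for r in srcs if r in producer_bundle),
            default=0,
        )
        slot = earliest
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1                      # bundle full, try the next one
        while len(bundles) <= slot:
            bundles.append([])             # open new bundles as needed
        bundles[slot].append(name)
        producer_bundle[dest] = slot
    return [b + ["nop"] * (width - len(b)) for b in bundles]


program = [
    ("i1", "r1", ["r2", "r3"]),   # r1 = r2 + r3
    ("i2", "r4", ["r1", "r5"]),   # r4 = r1 * r5, needs i1
    ("i3", "r6", ["r7", "r8"]),   # r6 = r7 - r8, independent
    ("i4", "r9", ["r4", "r6"]),   # r9 = r4 + r6, needs i2 and i3
]
print(pack_vliw(program))
# [['i1', 'i3'], ['i2', 'nop'], ['i4', 'nop']]
```

The `nop` slots show the cost of static scheduling: when the compiler cannot find independent work, issue slots in the long instruction word go unused.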

image-20200410165502554

The solution can be easily found

image-20200409143437506

image-20200409144123473

Quiz

[ ] A. In-order processors have a CPI >=1

[x] B. More pipeline stages allow a higher clock frequency

[x] D. OoO pipelines need speculation

[ ] E. superscalar processor can execute