[Computer Architecture] data path

intro - where are we now?

image-20200331101915459

the cpu

Processor - datapath - control

scenario in a RISC-V machine

image-20200331102713392

The datapath and control

image-20200331120145444
image-20200331120240686

Overview

  • problem: a single monolithic block
  • solution: break up the process of executing an instruction into stages
    • smaller stages are easier to design
    • easier to optimize, and adds modularity

five stages

image-20200331102935318

  • Instruction Fetch
    • PC += 4
    • make full use of the memory hierarchy
  • Instruction Decode
    • first, read the opcode to determine instruction type and field lengths
    • second, (at the same time!) read in data from all necessary registers
      • for add, read two registers
      • for addi, read one register
    • third, generate the immediates
  • ALU
    • the real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |)
    • loads and stores also use it, to compute the address
      • lw t0, 40(t1)
      • the address we are accessing in memory = the value in t1 PLUS the value 40
      • so we do this addition in this stage
  • Memory Access
    • actually only the load and store instructions do anything during this stage
    • it's fast, but unavoidable
  • Register Write
    • write the result of some computation into a register
    • idle for stores and branches
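
The five stages above can be sketched in code. Below is a toy Python walk-through of `lw t0, 40(t1)`; the register and memory values are made up for illustration, and the instruction is hand-decoded rather than parsed from real machine code:

```python
# Toy walk-through of the five stages for `lw t0, 40(t1)`.
regs = {"t0": 0, "t1": 100}     # register file (made-up values)
mem = {140: 0xDEADBEEF}         # data memory: address -> word
pc = 0x1000

# 1. Instruction Fetch: read the instruction at PC, then PC += 4
inst = ("lw", "t0", 40, "t1")   # pretend this came from instruction memory
pc += 4

# 2. Instruction Decode: determine the type, read registers, generate the immediate
op, rd, imm, rs1 = inst
rs1_val = regs[rs1]

# 3. ALU: for lw, the "real work" is computing the effective address rs1 + imm
addr = rs1_val + imm            # 100 + 40 = 140

# 4. Memory Access: only loads and stores do anything here
data = mem[addr]

# 5. Register Write: write the loaded word back into rd
regs[rd] = data
```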

misc

image-20200331200948199

image-20200331104008311

memory alignment

image-20200331201039177

datapath elements: state and sequencing

register

image-20200331112454437

instruction level

add

image-20200331105222087image-20200331105422860

image-20200331110018593

time diagram for add

image-20200331111255018

addi

image-20200331120334870

lw

image-20200331202936147

image-20200331203002110

sw - critical path

image-20200331203121826

image-20200331120529637

combined I+S Immediate generation

image-20200331203242961
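
The combined I+S immediate generation amounts to bit extraction plus sign extension. A sketch in Python; the two example encodings in the comments are hand-assembled by me, not taken from the slides:

```python
def sext12(x):
    """Sign-extend a 12-bit value to a Python int."""
    return x - (1 << 12) if x & 0x800 else x

def imm_i(inst):
    # I-type: imm[11:0] comes from inst[31:20]
    return sext12(inst >> 20 & 0xFFF)

def imm_s(inst):
    # S-type: imm[11:5] from inst[31:25], imm[4:0] from inst[11:7];
    # both formats take imm[11:5] from inst[31:25], so combining them is cheap
    return sext12((inst >> 25 & 0x7F) << 5 | inst >> 7 & 0x1F)

print(imm_i(0xFFB00093))   # addi x1, x0, -5  ->  -5
print(imm_s(0xFE552C23))   # sw x5, -8(x10)   ->  -8
```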

branches

image-20200331120922178

image-20200331112154728

image-20200331114914251

pipelining

image-20200331120724483

summary

image-20200331180540752

quiz

image-20200331181726584

for D

image-20200331181714589image-20200331181804796image-20200331181819265

for E

image-20200331182129638

CS 150 is composed of Tsinghua's Principles of Computer Organization (计算机组成原理) plus VLSI.

source: http://www-inst.eecs.berkeley.edu/~cs150/sp13/agenda/

image-20200326172234625

They studied Verilog.

The FSM material is really similar to our homework; actually, I found that their exam problems were given to us as homework.

P.S.: WHY do they have spring break?

image-20200326172618489

image-20200326172816315

They run Linux on an FPGA and implement a CPU, commonly known as "building a machine" (造机).

If I have time, I'll implement the CPU on my own.

A GPU built on the RISC-V architecture would be much better and cooler.

The more I read and learn, the more I find that knowledge in the EECS category cascades nicely. In most cases, a point picked up in one place can be retaught elsewhere within minutes.

After talking with upperclassman 罗浩聪, I gradually see my career heading toward a PhD in architecture or PL, or something in between. But for now, let me finish the two collaborative papers.

[Computer Architecture] Circuit

intro

image-20200325123544978

synchronous digital systems

The hardware of a processor should be synchronous and digital.

The hardware layer of abstraction

image-20200325123706861

switches

image-20200325123737670

The arrow shows the action when the wire changes to 1 (is asserted).

and & or

image-20200325123840940

Transistors

High voltage: \(V_{dd}\); low voltage: \(Gnd\)

PWD

we implement switches using CMOS

CMOS Transistor Network

image-20200325124156574

image-20200325124245875 vs image-20200325124301345

note PWD

material: image-20200325124415901

how a MOSFET operates

image-20200325124509056
image-20200325124827736

CMOS Networks

2 input network

image-20200325141011777

combined logic symbols

image-20200325141058284

example debugging waveform

image-20200325140800590
image-20200325140219499

types of circuits

CL (combinational logic)

SL (sequential logic)

image-20200325140345618

example using a timing diagram

image-20200325140122110

[Computer Architecture] call

Language Execution continuum

An interpreter is a program that executes other programs.

  1. language translation gives us another option
  2. In general, we translate a high-level language to a lower-level language to increase performance

image-20200320143508476

interpretation

  1. Python
  2. asm

Assembler directives

image-20200320143656584

pseudo-instruction replacement

image-20200320143736406

what tail calls are about
{...
 lots of code
 return foo(y);
}
  • it's a recursive call to foo() if this is within foo(), or a call to a different function...
  • for efficiency:
    • evaluate the arguments for foo() in a0-a7
    • restore ra, all callee-saved registers, and sp
    • then call foo() with j or tail
  • when foo() returns, it can return directly to where it needs to return to
    • rather than returning to wherever foo() was called and returning from there
    • this is tail-call optimization

branches

image-20200320144520209

Linking process

image-20200320150133861

image-20200320150156550

Four types of addresses

PC-relative addressing (beq)
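
A minimal sketch of PC-relative addressing, with made-up addresses:

```python
def branch_target(pc, offset):
    """beq-style PC-relative addressing: target = PC + sign-extended offset (bytes)."""
    return pc + offset

# the same offset works no matter where the code is loaded,
# which is what makes PC-relative branches position-independent
print(hex(branch_target(0x1008, -8)))   # branches back two instructions
print(hex(branch_target(0x4008, -8)))   # same offset at a different load address
```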

Absolute addresses in RISC-V

image-20200320150357128

Loading process

image-20200320150929601

Running a Program - CALL (Compiling, Assembling, Linking, and Loading)

Clarifications

Project 1: RISC-V emulator

  1. The RISC-V ISA does not define assembly syntax
    • behave exactly like Venus
  2. all I-Type instructions (including sltiu)
    • do sign-extension
    • (in Venus) the input number is signed, even if written in hex
a good tool

Interpretation

  1. any good reason to interpret machine language in software? a debugger like Venus
  2. translated/compiled code is almost always more efficient and therefore higher-performance:
    • important for many applications, particularly operating systems.

Steps in compiling a C program

Pseudo-instruction Replacement

Producing Machine Language

  • simple case
    • arithmetic, logical, shifts and so on
    • All necessary info is within the instruction already
  • branches can be optimized
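
For the simple case, all the bits can be produced directly from the instruction's fields. A sketch hand-assembling the R-type `add x1, x2, x3` (the expected word, 0x003100B3, is my own hand computation):

```python
# Pack the R-type fields of `add x1, x2, x3` into a 32-bit word:
#   funct7 | rs2 | rs1 | funct3 | rd | opcode
rd, rs1, rs2 = 1, 2, 3
inst = (0b0000000 << 25) | (rs2 << 20) | (rs1 << 15) \
     | (0b000 << 12) | (rd << 7) | 0b0110011
print(hex(inst))   # 0x3100b3
```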

[Computer Architecture] Numbers Notes

big IDEAs

  1. Abstraction
  2. Moore's Law
  3. Principle of Locality/Memory Hierarchy
  4. Parallelism
  5. Performance Measurement and Improvement
  6. Dependability VIA Redundancy

old conventional wisdom

Moore's Law + Dennard Scaling = faster, cheaper, lower-power

signed & unsigned integers

unsigned

e.g. for unsigned int: addresses
0000 0000 0000 0001\(_{two}\) = \(1_{ten}\)
0111 1111 1111 1111\(_{two}\) = \(2^{15}-1\ _{ten}\)

signed

e.g. for signed int: int x,y,z;
1000 0000 0000 0000\(_{two}\) = \(-2^{15}\ _{ten}\)
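
A quick sketch of reading the same 16-bit pattern as unsigned vs. two's-complement signed:

```python
def as_signed16(x):
    """Reinterpret a 16-bit pattern as a two's-complement signed int."""
    return x - (1 << 16) if x & 0x8000 else x

print(as_signed16(0x0001))   # 1: same as unsigned for small values
print(as_signed16(0x7FFF))   # 32767 = 2**15 - 1, the largest positive value
print(as_signed16(0x8000))   # -32768 = -2**15, the most negative value
```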

main idea

want \(\frac{1}{2}\) of the int >=0, \(\frac{1}{2}\) of the int <0

two's complement

basic ideas

for 3\(_{ten}\) = 0011\(_{two}\):
10000\(_{two}\) - 0011\(_{two}\) = (1111\(_{two}\) - 0011\(_{two}\)) + 1\(_{two}\) = 1100\(_{two}\) + 1\(_{two}\) = 1101\(_{two}\)
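
The "invert and add one" shortcut behind that subtraction can be checked in a few lines:

```python
def twos_complement(x, bits=4):
    """Negate x in a bits-wide two's-complement representation."""
    return (~x + 1) & ((1 << bits) - 1)   # invert, add one, keep the low bits

print(format(twos_complement(0b0011), "04b"))   # 1101, i.e. -3 in 4 bits
```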

more e.g.

Assume for simplicity a 4-bit width, so -8 to +7 can be represented.
There's an overflow here:
  • overflow occurs when the magnitude of the result is too big to fit into the result representation
  • carry in = carry from the less significant bits
  • carry out = carry to the more significant bits

take care of the MSB (most significant bit):
the way to detect overflow is to check whether the carry in equals the carry out at the MSB; they differ exactly when overflow occurs
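
The carry-in/carry-out rule at the MSB can be sketched for a 4-bit adder:

```python
def add4(a, b):
    """Add two 4-bit patterns; detect signed overflow by comparing the
    carry into the MSB with the carry out of the MSB."""
    carry_in_msb = ((a & 0b0111) + (b & 0b0111)) >> 3   # carry from bit 2 into bit 3
    carry_out_msb = (a + b) >> 4                        # carry out of bit 3
    overflow = carry_in_msb != carry_out_msb
    return (a + b) & 0xF, overflow

print(add4(0b0111, 0b0001))   # 7 + 1 wraps to 0b1000 (-8): overflow
print(add4(0b0011, 0b0010))   # 3 + 2 = 5: no overflow
```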

summary