lec2a

download lec2a

of 39

Transcript of lec2a

  • 7/29/2019 lec2a

    1/39

  • 7/29/2019 lec2a

    2/39

    MIPS-32 ISA

    Instruction Categories

    z ComputationalR0 - R31

    Registers

    z Jump and Branch

    z Floating Point

    - coprocessor

    z Memory Management

    z

    PC

    HI

    LO

    3 Instruction Formats: all 32 bits wide

    op

    op

    rs rt rd sa funct

    rs rt immediate

    R format

    I format

    CSE431 Chapter 2.2 Irwin, PSU, 2008

    op jump target J format

  • 7/29/2019 lec2a

    3/39

    MIPS (RISC) Design Principles

    Simplicity favors regularity

    z fixed size instructions

    z

    z opcode always the first 6 bits

    Smaller is faster

    z limited instruction set

    z limited number of registers in register filez m e num er o a ress ng mo es

    Make the common case fast

    -z allow instructions to contain immediate operands

    CSE431 Chapter 2.3 Irwin, PSU, 2008

    z

    three instruction formats

  • 7/29/2019 lec2a

    4/39

    MIPS Arithmetic Instructions

    MIPS assembly language arithmetic statement

    add $t 0, $s1, $s2

    sub $t 0, $s1, $s2

    Each arithmetic instruction erforms one o eration

    Each specifies exactly three operands that are allcontained in the datapaths register file ($t 0, $s1, $s2)

    destination source1 op source2

    0 17 18 8 0 0x22

    CSE431 Chapter 2.4 Irwin, PSU, 2008

  • 7/29/2019 lec2a

    5/39

    MIPS Arithmetic Instructions

    MIPS assembly language arithmetic statement

    add $t 0, $s1, $s2

    sub $t 0, $s1, $s2

    Each arithmetic instruction erforms one o eration

    Each specifies exactly three operands that are allcontained in the datapaths register file ($t 0, $s1, $s2)

    destination source1 op source2

    0 17 18 8 0 0x22

    CSE431 Chapter 2.5 Irwin, PSU, 2008

  • 7/29/2019 lec2a

    6/39

    MIPS Instruction Fields

    MIPS fields are given names to make them easier torefer to

    op rs rt rd shamt funct

    op 6-bits opcode that specifies the operation

    rs 5-bits register file address of the first source operand-

    rd 5-bits register file address of the results destination

    shamt 5-bits shift amount (for shift instructions)

    funct 6-bits function code augmenting the opcode

    CSE431 Chapter 2.6 Irwin, PSU, 2008

  • 7/29/2019 lec2a

    7/39

    MIPS Register File

    Re ister File

    src1 addr

    32 bits

    src1data

    325

    o s r y- wo - reg s ers

    z Two read ports and

    z One write ortsrc a r

    dst addrsrc2

    32

    locations

    32

    5

    32 Registers are

    z Fasterthan main memory

    - But register files with more locationsare slower (e.g., a 64 word file could write controlbe as much as 5 slower than a word ile

    - Read/write port increase impacts speed quadratically

    z Easier for a compiler to use

    - e.g., (A*B) (C*D) (E*F) can do multiplies in any order vs.stack

    z Can hold variables so that

    CSE431 Chapter 2.7 Irwin, PSU, 2008

    - code density improves (since register are named with fewer bits

    than a memory location)

  • 7/29/2019 lec2a

    8/39

    Aside: MIPS Register Convention

    Name Register Number

    Usage Preserveon call?

    . .

    $at 1 reserved for assembler n.a.$v0 - $v1 2-3 returned values no

    $a0 - $a3 4-7 arguments yes

    $t0 - $t7 8-15 temporaries no$s0 - $s7 16-23 saved values yes

    $t8 - $t9 24-25 temporaries no

    $sp 29 stack pointer yes

    $f 30 frame ointer es

    CSE431 Chapter 2.8 Irwin, PSU, 2008

    $ra 31 return addr (hardware) yes

  • 7/29/2019 lec2a

    9/39

  • 7/29/2019 lec2a

    10/39

    Machine Language - Load Instruction

    oa ore ns ruc on orma orma :

    l w $t 0, 24( $s3)

    35 19 8 2410

    Memory

    0xf f f f f f f f2410 + $s3 =

    0x12004094

    . . . 0001 1000+ . . . 1001 0100

    =

    0x120040ac$t 0

    0x000000080x0000000c

    . . .0x120040ac

    CSE431 Chapter 2.10 Irwin, PSU, 2008data word address (hex)

    0x00000000x

  • 7/29/2019 lec2a

    11/39

    Byte Addresses

    Since 8-bit bytes are so useful, most architecturesaddress individual bytes in memory

    z Alignment restriction - the memory address of a word must beon natural word boundaries (a multiple of 4 in MIPS-32)

    Big Endian: leftmost byte is word address

    IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA

    Little Endian: rightmost byte is word addressn e x , ax, p a n ows

    3 2 1 0lit tle endian byte 0

    msb lsb

    0 1 2 3

    CSE431 Chapter 2.11 Irwin, PSU, 2008

  • 7/29/2019 lec2a

    12/39

    Aside: Loading and Storing Bytes

    MIPS provides special instructions to move bytes

    l b $t 0, 1( $s3) #l oad byt e f r om memor y

    sb $t 0, 6( $s3) #st or e byt e t o memor y

    0x28 19 8 16 bit offset

    a s ge oa e an s ore

    z load byte places the byte from memory in the rightmost 8 bits ofthe destination register

    - what happens to the other bits in the register?

    z store byte takes the byte from the rightmost 8 bits of a register

    CSE431 Chapter 2.12 Irwin, PSU, 2008

    - what happens to the other bits in the memory word?

  • 7/29/2019 lec2a

    13/39

    MIPS Immediate Instructions

    Possible approaches?

    z create hard-wired registers (like $zero) for constants like 1z have special instructions that contain constants !

    addi $sp, $sp, 4 #$sp = $sp + 4

    sl t i t 0 s2 15 # t 0 = 1 i f s2

  • 7/29/2019 lec2a

    14/39

    Aside: How About Larger Constants?

    e a so e o e a e o oa a cons an n o aregister, for this we must use two instructions

    " "

    l ui $t 0, 1010101010101010

    Then must get the lower order bits right, use

    16 0 8 10101010101010102

    or i $t 0, $t 0, 1010101010101010

    0000000000000000 1010101010101010

    CSE431 Chapter 2.14 Irwin, PSU, 2008

    1010101010101010 1010101010101010

  • 7/29/2019 lec2a

    15/39

    Review: Unsigned Binary Representation

    Hex Binary Decimal

    0x00000000 00000 0

    0x00000001 00001 1 231 230 229 . . . 23 22 21 20 bit wei ht

    0x00000002 00010 20x00000003 00011 3

    1 1 1 . . . 1 1 1 1 bit

    31 30 29 . . . 3 2 1 0 bit position

    0x00000005 00101 5

    0x00000006 00110 61 0 0 0 . . . 0 0 0 0 - 1

    0x00000008 01000 8

    0x00000009 01001 9232 - 1

    0xFFFFFFFC 11100

    0xFFFFFFFD 11101 232 - 3

    232 - 4

    CSE431 Chapter 2.15 Irwin, PSU, 2008

    x

    0xFFFFFFFF 11111 232 - 1

    -

  • 7/29/2019 lec2a

    16/39

    Review: Signed Binary Representation

    1000 -8

    1001 -7-(23 - 1) =

    -23 =

    -

    1011 -5

    1100 -4complement all the bits

    1101 -3

    1110 -2

    1111 -1

    1011

    and add a 1

    0101

    0000 0

    0001 11010

    an a a

    0110

    0011 3

    0100 4

    complement all the bits

    CSE431 Chapter 2.16 Irwin, PSU, 2008

    0110 6

    0111 723 - 1 =

  • 7/29/2019 lec2a

    17/39

    MIPS Shift Operations

    -32-bit words

    sl l $t 2, $s0, 8 #$t 2 = $s0 > 8 bi t s

    Instruction Format (R format)

    0 16 10 8 0x00

    uc s s are ca e og ca ecause ey wzeros

    z Notice that a 5-bit shamt field is enou h to shift a 32-bit value

    CSE431 Chapter 2.17 Irwin, PSU, 2008

    25 1 or31 bit positions

  • 7/29/2019 lec2a

    18/39

    MIPS Logical Operations

    -MIPS ISA

    and $t 0, $t 1, $t 2 #$t 0 = $t 1 & $t 2

    or $t 0, $t 1, $t 2 #$t 0 = $t 1 | $t 2nor $t 0, $t 1, $t 2 #$t 0 = not ( $t 1 | $t 2)

    Instruction Format (R format)

    0 9 10 8 0 0x24

    andi $t 0, $t 1, 0xFF00 #$t 0 = $t 1 & f f 00

    or , , x = Instruction Format (I format)

    CSE431 Chapter 2.18 Irwin, PSU, 2008

    0x0D 9 8 0xFF00

  • 7/29/2019 lec2a

    19/39

    MIPS Control Flow Instructions

    bne $s0, $s1, Lbl #go t o Lbl i f $s0$s1be s0 s1 Lbl # o t o Lbl i f s0= s1

    z Ex: i f ( i ==j ) h = i + j ;

    bne $s0, $s1, Lbl 1

    add $s3, $s0, $s1

    Lbl 1: . . .

    Instruction Format (I format):

    0x05 16 17 16 bit offset

    How is the branch destination address s ecified?

    CSE431 Chapter 2.19 Irwin, PSU, 2008

  • 7/29/2019 lec2a

    20/39

    Specifying Branch Destinations

    -z which register? Instruction Address Register (the PC)

    - its use is automatically implied by instruction

    - PC gets updated (PC+4) during the fetch cycle so that it holds the

    address of the next instruction

    z limits the branch distance to -215 to +215-1 (word) instructions frome ns ruc on a er e ranc ns ruc on, u mos ranc es are

    local anyway

    from the low order 16 bits of the branch instruction

    offset

    16

    sign-extend

    PCAdd

    32 32

    3232branch dst

    address

    Add

    CSE431 Chapter 2.20 Irwin, PSU, 2008

    32 ?4 32

  • 7/29/2019 lec2a

    21/39

    In Support of Branch Instructions

    We have beq, bne, but what about other kinds ofbranches (e.g., branch-if-less-than)? For this, we need yetanother instruction, sl t

    Set on less than instruction:sl t $t 0, $s0, $s1 # i f $s0 < $s1 t hen

    # $t 0 = 1 el se# $t 0 = 0

    Instruction format R format :

    Alternate versions ofsl t

    0 16 17 8 0x24

    sl t i $t 0, $s0, 25 # i f $s0 < 25 t hen $t 0=1 . . .

    sl t u $t 0, $s0, $s1 # i f $s0 < $s1 t hen $t 0=1 . . .

    CSE431 Chapter 2.21 Irwin, PSU, 2008

    sl t i u $t 0, $s0, 25 # i f $s0 < 25 t hen $t 0=1 . . .

    2

  • 7/29/2019 lec2a

    22/39

    Aside: More Branch Instructions

    , , ,register$zer o to create other conditions

    z less than bl t $s1, $s2, Label

    sl t $at , $s1, $s2 #$at set t o 1 i f bne $at , $zer o, Label #$s1 < $s2

    z less than or equal to bl e $s1, $s2, Label

    z greater than bgt $s1, $s2, Labelz great than or equal to bge $s1, $s2, Label

    pseudo instructions - recognized (and expanded) by theassembler

    CSE431 Chapter 2.22 Irwin, PSU, 2008

    z Its why the assembler needs a reserved register ($at )

  • 7/29/2019 lec2a

    23/39

    Bounds Check Shortcut

    Treating signed numbers as if they were unsigned givesa low cost way of checking if 0 x < y (index out ofbounds for arrays)

    sl t u $t 0, $s1, $t 2 # $t 0 = 0 i f# $s1 > $t 2 ( max)

    beq $t 0, $zer o, I OOB # go t o I OOB i f

    # $t 0 = 0

    The ke is that ne ative inte ers in twos com lement

    look like large numbers in unsigned notation. Thus, anunsigned comparison of x < y also checks if x is negativeas well as if x is less than .

    CSE431 Chapter 2.23 Irwin, PSU, 2008

  • 7/29/2019 lec2a

    24/39

  • 7/29/2019 lec2a

    25/39

  • 7/29/2019 lec2a

    26/39

    Instructions for Accessing Procedures

    j al Pr ocedur eAddr ess #j ump and l i nk

    Saves PC+4 in register $ra to have a link to the nextinstruction for the procedure return

    Machine format (J format):

    0x03 26 bit address

    Then can do procedure return with a

    r r a r et ur n Instruction format (R format):

    CSE431 Chapter 2.26 Irwin, PSU, 2008

    0 31 0x08

  • 7/29/2019 lec2a

    27/39

    Six Steps in Execution of a Procedure

    . where the procedure (callee) can access them

    z $a0 - $a3: fourargument registers

    2. Callertransfers control to the callee3. Callee ac uires the stora e resources needed

    4. Callee performs the desired task

    . callercan access it

    z $v0 - $v1: two value registers for result values

    6. Callee returns control to the callerz $r a: one return address register to return to the point of origin

    CSE431 Chapter 2.27 Irwin, PSU, 2008

  • 7/29/2019 lec2a

    28/39

    Aside: Spilling Registers

    allocated to argument and return values?

    z callee uses a stack a last-in-first-out queue

    hi gh addr One of the general registers, $sp$29 , is used to address the stack

    $sp

    (which grows from high addressto low address)

    t op of st ackz a a a on o e s ac pus

    $sp = $sp 4data on stack at new $sp

    z remove data from the stack pop

    data from stack at $sp

    CSE431 Chapter 2.28 Irwin, PSU, 2008

    l ow addr $sp = $sp + 4

  • 7/29/2019 lec2a

    29/39

    Aside: Allocating Space on the Stack

    containing a proceduressaved registers and localhi gh addr

    var a es s s proce ure

    frame (aka activation record)z The frame pointer ($f p) points

    Saved ar gumentr egs ( i f any)

    $f p

    o e rs wor o e rame o aprocedure providing a stablebase register for the procedure

    -

    Saved l ocal r egs( i f any)

    Local ar r ays &

    call and $sp is restored using$f p on a return

    $sp

    st r uct ur es ( i f any)

    l ow addr

    CSE431 Chapter 2.29 Irwin, PSU, 2008

  • 7/29/2019 lec2a

    30/39

    Aside: Allocating Space on the Heap

    Static data segment forconstants and other staticvariables (e.g., arrays)

    Memory0x 7f f f f f f c$sp

    Dynamic data segment(aka heap) for structures. .,

    linked lists)

    z Allocate space on the heap

    Dynamic data

    (heap)w ma oc an reewith f r ee( ) in C

    Text

    Static data 0x 1000 0000x

    0x 0000 0000

    Reserved0x 0040 0000PC

    CSE431 Chapter 2.30 Irwin, PSU, 2008

  • 7/29/2019 lec2a

    31/39

    MIPS Instruction Classes Distribution

    Frequency of MIPS instruction classes for SPEC2006

    InstructionClass

    FrequencyInteger Ft. Pt.

    Arithmetic 16% 48%

    Data transfer 35% 36%

    Logical 12% 4%

    Cond. Branch 34% 8%

    Jump 2% 0%

    CSE431 Chapter 2.31 Irwin, PSU, 2008

  • 7/29/2019 lec2a

    32/39

    Atomic Exchange Support

    Need hardware support for synchronization mechanismsto avoid data races where the results of the program canchange depending on how events happen to occur

    z Two memory accesses from different threads to the same

    location, and at least one is a write

    in a register for a value in memory atomically, i.e., as one

    operation (instruction)z Implementing an atomic exchange would require both a memory

    read and a memory write in a single, uninterruptable instruction.An alternative is to have a pair of specially configuredinstructions

    l l $t 1, 0( $s1) #l oad l i nked

    CSE431 Chapter 2.32 Irwin, PSU, 2008

    sc $t 0, 0( $s1) #st or e condi t i onal

  • 7/29/2019 lec2a

    33/39

    Automic Exchange with ll and sc

    t e contents o t e memory ocat on spec e y t el l are changed before the sc to the same addressoccurs, the sc fails (returns a zero)

    t r y: add $t 0, $zer o, $s4 #$t 0=$s4 ( exchange val ue)l l $t 1, 0( $s1) #l oad memor y val ue t o $t 1

    sc $t 0, 0( $s1) #t r y t o st or e exchange#val ue t o memor y, i f f ai l

    #$t 0 wi l l be 0eq , zer o, r y r y aga n on a ur e

    add $s4, $zer o, $t 1 #l oad val ue i n $s4

    If the value in memory between the l l and the scinstructions changes, then sc returns a 0 in $t 0 causing

    CSE431 Chapter 2.33 Irwin, PSU, 2008

    e co e sequence o ry aga n.

  • 7/29/2019 lec2a

    34/39

    The C Code Translation Hierarchy

    C program

    compiler

    assembly code

    assembler

    object code library routines

    linker

    executable

    loader

    machine code

    CSE431 Chapter 2.34 Irwin, PSU, 2008memory

  • 7/29/2019 lec2a

    35/39

    Compiler Benefits

    Comparing performance for bubble (exchange) sortz To sort 100,000 words with the array initialized to random values

    on a Pentium 4 with a 3.06 clock rate, a 533 MHz system bus,w o , us ng nux vers on . .

    gcc opt Relative Clock Instr count CPI

    None 1.00 158,615 114,938 1.38

    O1 (medium) 2.37 66,990 37,470 1.79O2 (full) 2.38 66,521 39,993 1.66

    O3 (proc mig) 2.41 65,747 44,993 1.46

    The unoptimized code has the best CPI, the O1 versionhas the lowest instruction count, but the O3 version is thefastest. Why?

    CSE431 Chapter 2.35 Irwin, PSU, 2008

  • 7/29/2019 lec2a

    36/39

    The Java Code Translation Hierarchy

    Java program

    compiler

    compiler

    Machine

    Compiled Java methods (machine code)

    CSE431 Chapter 2.36 Irwin, PSU, 2008

  • 7/29/2019 lec2a

    37/39

    Sorting in C versus Java

    Comparing performance for two sort algorithms in C andJava

    z The JVM/JIT is Sun/Hotspot version 1.3.1/1.3.1

    Method Opt Bubble Quick Speedupuick vsbubbleperformance

    C Compiler None 1.00 1.00 2468

    . .

    C Compiler O2 2.38 1.50 1555

    C Compiler O3 2.41 1.91 1955

    Java Interpreted 0.12 0.05 1050Java JIT compiler 2.13 0.29 338

    CSE431 Chapter 2.37 Irwin, PSU, 2008

    Observations?

  • 7/29/2019 lec2a

    38/39

    Addressing Modes Illustrated1. Register addressing

    op rs rt rd funct Register

    word operand2. Base (displacement) addressing

    op rs rt offset

    base re ister

    Memory

    word orbyte operand

    3. Immediate addressing

    op rs rt operand

    -.

    op rs rt offset Memory

    branch destination instruction

    Program Counter (PC)

    5. Pseudo-direct addressing

    op jump address Memory

    CSE431 Chapter 2.38 Irwin, PSU, 2008

    Program Counter (PC)

    jump destination instruction||

  • 7/29/2019 lec2a

    39/39