lec2a

7/29/2019 lec2a

1/39

7/29/2019 lec2a

2/39

MIPS-32 ISA

Instruction Categories

z ComputationalR0 - R31

Registers

z Jump and Branch

z Floating Point

- coprocessor

z Memory Management

z

PC

HI

LO

3 Instruction Formats: all 32 bits wide

op

op

rs rt rd sa funct

rs rt immediate

R format

I format

CSE431 Chapter 2.2 Irwin, PSU, 2008

op jump target J format

7/29/2019 lec2a

3/39

MIPS (RISC) Design Principles

Simplicity favors regularity

z fixed size instructions

z

z opcode always the first 6 bits

Smaller is faster

z limited instruction set

z limited number of registers in register filez m e num er o a ress ng mo es

Make the common case fast

-z allow instructions to contain immediate operands


z

three instruction formats

7/29/2019 lec2a

4/39

MIPS Arithmetic Instructions

MIPS assembly language arithmetic statement

add $t 0, $s1, $s2

sub $t 0, $s1, $s2

Each arithmetic instruction erforms one o eration

Each specifies exactly three operands that are allcontained in the datapaths register file ($t 0, $s1, $s2)

destination source1 op source2

0 17 18 8 0 0x22


7/29/2019 lec2a

5/39

MIPS Arithmetic Instructions

MIPS assembly language arithmetic statement

add $t 0, $s1, $s2

sub $t 0, $s1, $s2

Each arithmetic instruction erforms one o eration

Each specifies exactly three operands that are allcontained in the datapaths register file ($t 0, $s1, $s2)

destination source1 op source2

0 17 18 8 0 0x22


7/29/2019 lec2a

6/39

MIPS Instruction Fields

MIPS fields are given names to make them easier torefer to

op rs rt rd shamt funct

op 6-bits opcode that specifies the operation

rs 5-bits register file address of the first source operand-

rd 5-bits register file address of the results destination

shamt 5-bits shift amount (for shift instructions)

funct 6-bits function code augmenting the opcode


7/29/2019 lec2a

7/39

MIPS Register File

Re ister File

src1 addr

32 bits

src1data

325

o s r y- wo - reg s ers

z Two read ports and

z One write ortsrc a r

dst addrsrc2

32

locations

32

5

32 Registers are

z Fasterthan main memory

- But register files with more locationsare slower (e.g., a 64 word file could write controlbe as much as 5 slower than a word ile

- Read/write port increase impacts speed quadratically

z Easier for a compiler to use

- e.g., (A*B) (C*D) (E*F) can do multiplies in any order vs.stack

z Can hold variables so that


- code density improves (since register are named with fewer bits

than a memory location)

7/29/2019 lec2a

8/39

Aside: MIPS Register Convention

Name Register Number

Usage Preserveon call?

. .

$at 1 reserved for assembler n.a.$v0 - $v1 2-3 returned values no

$a0 - $a3 4-7 arguments yes

$t0 - $t7 8-15 temporaries no$s0 - $s7 16-23 saved values yes

$t8 - $t9 24-25 temporaries no

$sp 29 stack pointer yes

$f 30 frame ointer es


$ra 31 return addr (hardware) yes

7/29/2019 lec2a

9/39

7/29/2019 lec2a

10/39

Machine Language - Load Instruction

oa ore ns ruc on orma orma :

l w $t 0, 24( $s3)

35 19 8 2410

Memory

0xf f f f f f f f2410 + $s3 =

0x12004094

. . . 0001 1000+ . . . 1001 0100

=

0x120040ac$t 0

0x000000080x0000000c

. . .0x120040ac

CSE431 Chapter 2.10 Irwin, PSU, 2008data word address (hex)

0x00000000x

7/29/2019 lec2a

11/39

Byte Addresses

Since 8-bit bytes are so useful, most architecturesaddress individual bytes in memory

z Alignment restriction - the memory address of a word must beon natural word boundaries (a multiple of 4 in MIPS-32)

Big Endian: leftmost byte is word address

IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA

Little Endian: rightmost byte is word addressn e x , ax, p a n ows

3 2 1 0lit tle endian byte 0

msb lsb

0 1 2 3


7/29/2019 lec2a

12/39

Aside: Loading and Storing Bytes

MIPS provides special instructions to move bytes

l b $t 0, 1( $s3) #l oad byt e f r om memor y

sb $t 0, 6( $s3) #st or e byt e t o memor y

0x28 19 8 16 bit offset

a s ge oa e an s ore

z load byte places the byte from memory in the rightmost 8 bits ofthe destination register

- what happens to the other bits in the register?

z store byte takes the byte from the rightmost 8 bits of a register


- what happens to the other bits in the memory word?

7/29/2019 lec2a

13/39

MIPS Immediate Instructions

Possible approaches?

z create hard-wired registers (like $zero) for constants like 1z have special instructions that contain constants !

addi $sp, $sp, 4 #$sp = $sp + 4

sl t i t 0 s2 15 # t 0 = 1 i f s2

7/29/2019 lec2a

14/39

Aside: How About Larger Constants?

e a so e o e a e o oa a cons an n o aregister, for this we must use two instructions

" "

l ui $t 0, 1010101010101010

Then must get the lower order bits right, use

16 0 8 10101010101010102

or i $t 0, $t 0, 1010101010101010

0000000000000000 1010101010101010


1010101010101010 1010101010101010

7/29/2019 lec2a

15/39

Review: Unsigned Binary Representation

Hex Binary Decimal

0x00000000 00000 0

0x00000001 00001 1 231 230 229 . . . 23 22 21 20 bit wei ht

0x00000002 00010 20x00000003 00011 3

1 1 1 . . . 1 1 1 1 bit

31 30 29 . . . 3 2 1 0 bit position

0x00000005 00101 5

0x00000006 00110 61 0 0 0 . . . 0 0 0 0 - 1

0x00000008 01000 8

0x00000009 01001 9232 - 1

0xFFFFFFFC 11100

0xFFFFFFFD 11101 232 - 3

232 - 4


x

0xFFFFFFFF 11111 232 - 1

-

7/29/2019 lec2a

16/39

Review: Signed Binary Representation

1000 -8

1001 -7-(23 - 1) =

-23 =

-

1011 -5

1100 -4complement all the bits

1101 -3

1110 -2

1111 -1

1011

and add a 1

0101

0000 0

0001 11010

an a a

0110

0011 3

0100 4

complement all the bits


0110 6

0111 723 - 1 =

7/29/2019 lec2a

17/39

MIPS Shift Operations

-32-bit words

sl l $t 2, $s0, 8 #$t 2 = $s0 > 8 bi t s

Instruction Format (R format)

0 16 10 8 0x00

uc s s are ca e og ca ecause ey wzeros

z Notice that a 5-bit shamt field is enou h to shift a 32-bit value


25 1 or31 bit positions

7/29/2019 lec2a

18/39

MIPS Logical Operations

-MIPS ISA

and $t 0, $t 1, $t 2 #$t 0 = $t 1 & $t 2

or $t 0, $t 1, $t 2 #$t 0 = $t 1 | $t 2nor $t 0, $t 1, $t 2 #$t 0 = not ( $t 1 | $t 2)

Instruction Format (R format)

0 9 10 8 0 0x24

andi $t 0, $t 1, 0xFF00 #$t 0 = $t 1 & f f 00

or , , x = Instruction Format (I format)


0x0D 9 8 0xFF00

7/29/2019 lec2a

19/39

MIPS Control Flow Instructions

bne $s0, $s1, Lbl #go t o Lbl i f $s0$s1be s0 s1 Lbl # o t o Lbl i f s0= s1

z Ex: i f ( i ==j ) h = i + j ;

bne $s0, $s1, Lbl 1

add $s3, $s0, $s1

Lbl 1: . . .

Instruction Format (I format):

0x05 16 17 16 bit offset

How is the branch destination address s ecified?


7/29/2019 lec2a

20/39

Specifying Branch Destinations

-z which register? Instruction Address Register (the PC)

- its use is automatically implied by instruction

- PC gets updated (PC+4) during the fetch cycle so that it holds the

address of the next instruction

z limits the branch distance to -215 to +215-1 (word) instructions frome ns ruc on a er e ranc ns ruc on, u mos ranc es are

local anyway

from the low order 16 bits of the branch instruction

offset

16

sign-extend

PCAdd

32 32

3232branch dst

address

Add


32 ?4 32

7/29/2019 lec2a

21/39

In Support of Branch Instructions

We have beq, bne, but what about other kinds ofbranches (e.g., branch-if-less-than)? For this, we need yetanother instruction, sl t

Set on less than instruction:sl t $t 0, $s0, $s1 # i f $s0 < $s1 t hen

# $t 0 = 1 el se# $t 0 = 0

Instruction format R format :

Alternate versions ofsl t

0 16 17 8 0x24

sl t i $t 0, $s0, 25 # i f $s0 < 25 t hen $t 0=1 . . .

sl t u $t 0, $s0, $s1 # i f $s0 < $s1 t hen $t 0=1 . . .


sl t i u $t 0, $s0, 25 # i f $s0 < 25 t hen $t 0=1 . . .

2

7/29/2019 lec2a

22/39

Aside: More Branch Instructions

, , ,register$zer o to create other conditions

z less than bl t $s1, $s2, Label

sl t $at , $s1, $s2 #$at set t o 1 i f bne $at , $zer o, Label #$s1 < $s2

z less than or equal to bl e $s1, $s2, Label

z greater than bgt $s1, $s2, Labelz great than or equal to bge $s1, $s2, Label

pseudo instructions - recognized (and expanded) by theassembler


z Its why the assembler needs a reserved register ($at )

7/29/2019 lec2a

23/39

Bounds Check Shortcut

Treating signed numbers as if they were unsigned givesa low cost way of checking if 0 x < y (index out ofbounds for arrays)

sl t u $t 0, $s1, $t 2 # $t 0 = 0 i f# $s1 > $t 2 ( max)

beq $t 0, $zer o, I OOB # go t o I OOB i f

# $t 0 = 0

The ke is that ne ative inte ers in twos com lement

look like large numbers in unsigned notation. Thus, anunsigned comparison of x < y also checks if x is negativeas well as if x is less than .


7/29/2019 lec2a

24/39

7/29/2019 lec2a

25/39

7/29/2019 lec2a

26/39

Instructions for Accessing Procedures

j al Pr ocedur eAddr ess #j ump and l i nk

Saves PC+4 in register $ra to have a link to the nextinstruction for the procedure return

Machine format (J format):

0x03 26 bit address

Then can do procedure return with a

r r a r et ur n Instruction format (R format):


0 31 0x08

7/29/2019 lec2a

27/39

Six Steps in Execution of a Procedure

. where the procedure (callee) can access them

z $a0 - $a3: fourargument registers

2. Callertransfers control to the callee3. Callee ac uires the stora e resources needed

4. Callee performs the desired task

. callercan access it

z $v0 - $v1: two value registers for result values

6. Callee returns control to the callerz $r a: one return address register to return to the point of origin


7/29/2019 lec2a

28/39

Aside: Spilling Registers

allocated to argument and return values?

z callee uses a stack a last-in-first-out queue

hi gh addr One of the general registers, $sp$29 , is used to address the stack

$sp

(which grows from high addressto low address)

t op of st ackz a a a on o e s ac pus

$sp = $sp 4data on stack at new $sp

z remove data from the stack pop

data from stack at $sp


l ow addr $sp = $sp + 4

7/29/2019 lec2a

29/39

Aside: Allocating Space on the Stack

containing a proceduressaved registers and localhi gh addr

var a es s s proce ure

frame (aka activation record)z The frame pointer ($f p) points

Saved ar gumentr egs ( i f any)

$f p

o e rs wor o e rame o aprocedure providing a stablebase register for the procedure

-

Saved l ocal r egs( i f any)

Local ar r ays &

call and $sp is restored using$f p on a return

$sp

st r uct ur es ( i f any)

l ow addr


7/29/2019 lec2a

30/39

Aside: Allocating Space on the Heap

Static data segment forconstants and other staticvariables (e.g., arrays)

Memory0x 7f f f f f f c$sp

Dynamic data segment(aka heap) for structures. .,

linked lists)

z Allocate space on the heap

Dynamic data

(heap)w ma oc an reewith f r ee( ) in C

Text

Static data 0x 1000 0000x

0x 0000 0000

Reserved0x 0040 0000PC


7/29/2019 lec2a

31/39

MIPS Instruction Classes Distribution

Frequency of MIPS instruction classes for SPEC2006

InstructionClass

FrequencyInteger Ft. Pt.

Arithmetic 16% 48%

Data transfer 35% 36%

Logical 12% 4%

Cond. Branch 34% 8%

Jump 2% 0%


7/29/2019 lec2a

32/39

Atomic Exchange Support

Need hardware support for synchronization mechanismsto avoid data races where the results of the program canchange depending on how events happen to occur

z Two memory accesses from different threads to the same

location, and at least one is a write

in a register for a value in memory atomically, i.e., as one

operation (instruction)z Implementing an atomic exchange would require both a memory

read and a memory write in a single, uninterruptable instruction.An alternative is to have a pair of specially configuredinstructions

l l $t 1, 0( $s1) #l oad l i nked


sc $t 0, 0( $s1) #st or e condi t i onal

7/29/2019 lec2a

33/39

Automic Exchange with ll and sc

t e contents o t e memory ocat on spec e y t el l are changed before the sc to the same addressoccurs, the sc fails (returns a zero)

t r y: add $t 0, $zer o, $s4 #$t 0=$s4 ( exchange val ue)l l $t 1, 0( $s1) #l oad memor y val ue t o $t 1

sc $t 0, 0( $s1) #t r y t o st or e exchange#val ue t o memor y, i f f ai l

#$t 0 wi l l be 0eq , zer o, r y r y aga n on a ur e

add $s4, $zer o, $t 1 #l oad val ue i n $s4

If the value in memory between the l l and the scinstructions changes, then sc returns a 0 in $t 0 causing


e co e sequence o ry aga n.

7/29/2019 lec2a

34/39

The C Code Translation Hierarchy

C program

compiler

assembly code

assembler

object code library routines

linker

executable

loader

machine code

CSE431 Chapter 2.34 Irwin, PSU, 2008memory

7/29/2019 lec2a

35/39

Compiler Benefits

Comparing performance for bubble (exchange) sortz To sort 100,000 words with the array initialized to random values

on a Pentium 4 with a 3.06 clock rate, a 533 MHz system bus,w o , us ng nux vers on . .

gcc opt Relative Clock Instr count CPI

None 1.00 158,615 114,938 1.38

O1 (medium) 2.37 66,990 37,470 1.79O2 (full) 2.38 66,521 39,993 1.66

O3 (proc mig) 2.41 65,747 44,993 1.46

The unoptimized code has the best CPI, the O1 versionhas the lowest instruction count, but the O3 version is thefastest. Why?


7/29/2019 lec2a

36/39

The Java Code Translation Hierarchy

Java program

compiler

compiler

Machine

Compiled Java methods (machine code)


7/29/2019 lec2a

37/39

Sorting in C versus Java

Comparing performance for two sort algorithms in C andJava

z The JVM/JIT is Sun/Hotspot version 1.3.1/1.3.1

Method Opt Bubble Quick Speedupuick vsbubbleperformance

C Compiler None 1.00 1.00 2468

. .

C Compiler O2 2.38 1.50 1555

C Compiler O3 2.41 1.91 1955

Java Interpreted 0.12 0.05 1050Java JIT compiler 2.13 0.29 338


Observations?

7/29/2019 lec2a

38/39

Addressing Modes Illustrated1. Register addressing

op rs rt rd funct Register

word operand2. Base (displacement) addressing

op rs rt offset

base re ister

Memory

word orbyte operand

3. Immediate addressing

op rs rt operand

-.

op rs rt offset Memory

branch destination instruction

Program Counter (PC)

5. Pseudo-direct addressing

op jump address Memory


Program Counter (PC)

jump destination instruction||

7/29/2019 lec2a

39/39

lec2a

Documents

Transcript of lec2a