LEC2(CPUProcessors) (1)

52
Organization of a Simple Computer 

Transcript of LEC2(CPUProcessors) (1)

Page 1: LEC2(CPUProcessors) (1)

8/7/2019 LEC2(CPUProcessors) (1)

http://slidepdf.com/reader/full/lec2cpuprocessors-1 1/52

Organization of a Simple Computer


Computer Systems Organization

The CPU (Central Processing Unit) is the "brain" of the computer.

It fetches instructions from main memory, examines them, and then executes them one after another.

The components are connected by a bus, which is a collection of parallel wires for transmitting address, data, and control signals.

Buses can be external to the CPU, connecting memory and I/O devices, but also internal to the CPU.


Processors

The CPU is composed of several distinct parts:

A) CU (Control Unit): performs the fetch-execute cycle

The control unit fetches instructions from main memory and determines their type.

Functions:

Moves data to and from CPU registers and other hardware components (no change in data)

Accesses program instructions and issues commands to the ALU

Subparts:

Memory management unit: supervises fetching of instructions and data

I/O interface: sometimes combined with the memory management unit as the Bus Interface Unit


B) ALU (Arithmetic Logic Unit)

Performs calculations and comparisons (data changed)

Performs operations such as addition and Boolean AND needed to carry out the instructions.


C) Registers

A small, high-speed memory made up of registers, each of which has a certain size and function.

The most important register is the Program Counter (PC), which points to the next instruction to be fetched.

The Instruction Register (IR) holds the instruction currently being executed.


System Block Diagram


Concept of registers

Small, permanent storage locations within the CPU used for a particular purpose

Manipulated directly by the Control Unit

Wired for a specific function

Size in bits or bytes (not MB like memory)

Can hold data, an address, an instruction, or special binary codes (which keep track of computer status or of the conditions of calculations used in conditional branch instructions)


Concept of registers

Use of Registers

Scratchpad for the currently executing program

Holds data needed quickly or frequently

Stores information about the status of the CPU and the currently executing program

Address of next program instruction

Signals from external devices


Types of Registers

General Purpose Registers (Accumulators)

User-visible registers: can be accessed by instructions in user programs

Hold intermediate results or data values, e.g., loop counters

Hold data used in arithmetic operations

Used to transfer data between different memory locations and between I/O and memory

Typically several dozen in current CPUs


Special Purpose Registers

Program Counter Register (PC)

Also called the instruction pointer; holds the address of the next instruction to be fetched

Instruction Register (IR)

Stores the instruction fetched from memory; holds the instruction currently being executed

Memory Address Register (MAR): holds the address of a memory location

Memory Data Register (MDR): also called the memory buffer register; holds the data value that is being stored to or retrieved from the memory location currently addressed by the MAR

Status Registers

Status of the CPU and the currently executing program

Flags (one-bit Boolean variables) to track conditions like

arithmetic carry and overflow, power failure, internal


Register Operations

Stores values from other locations (registers and memory)

Addition and subtraction

Shift or rotate data

Test contents for conditions such as zero or positive
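The operations listed above can be sketched on a simulated register. This is only an illustration: the 8-bit width, the function names, and the two's-complement reading of "positive" are assumptions, not details taken from the slides.

```python
WIDTH = 8                     # assumed register width in bits
MASK = (1 << WIDTH) - 1       # 0xFF keeps every result inside the register

def add(a, b):
    """Add two register values, discarding any carry out of the top bit."""
    return (a + b) & MASK

def sub(a, b):
    """Subtract b from a, wrapping around within the register width."""
    return (a - b) & MASK

def shift_left(value, count=1):
    """Shift left; bits pushed past the top bit are lost."""
    return (value << count) & MASK

def rotate_left(value, count=1):
    """Rotate left; bits pushed past the top bit re-enter at the bottom."""
    count %= WIDTH
    return ((value << count) | (value >> (WIDTH - count))) & MASK

def is_zero(value):
    return value == 0

def is_positive(value):
    """Two's-complement test: nonzero and top (sign) bit clear."""
    return value != 0 and (value >> (WIDTH - 1)) == 0
```

For example, shifting 10010001 left by one gives 00100010, while rotating it left by one gives 00100011, because the high-order 1 wraps around to the low-order end.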


CPU Organization

An important part of the organization of a computer is called the data path.

It consists of the registers, the ALU, and several buses connecting the pieces.

The ALU performs simple operations on its inputs, yielding a result in the output register. Later the register can be stored into memory, if desired.

Most instructions can be divided into two categories:

Register-memory instructions allow memory words to be fetched into registers, where they can be used as inputs in subsequent instructions, for example.


CPU Organization

Register-register instructions fetch two operands from the registers, bring them into the ALU input registers, perform an operation, and store the result back in a register.

The process of running two operands through the ALU and storing the result is called the data path cycle.

The faster the data path cycle, the faster the machine.


A von Neumann Machine


Instruction Execution

The CPU executes each instruction as a series of small steps:

1. Fetch the next instruction from memory into the IR.

2. Change the PC to point to the following instruction.

3. Determine the type of instruction fetched.

4. If the instruction uses a word in memory, determine where it is.

5. Fetch the word into a CPU register.

6. Execute the instruction.

7. Go to step 1 to execute the next instruction.
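The seven steps can be sketched as a small loop. This is a toy model, not a real machine: the four opcodes (LOAD, ADD, STORE, HALT), the single accumulator register, and the use of Python tuples as instruction words are all assumptions made for illustration.

```python
def run(memory):
    """Run a toy program; `memory` holds both instructions and data."""
    pc = 0     # program counter: address of the next instruction
    acc = 0    # a single general-purpose register (accumulator)
    while True:
        ir = memory[pc]             # 1. fetch the instruction into the IR
        pc += 1                     # 2. point the PC at the following instruction
        opcode, address = ir        # 3. determine the instruction type
        if opcode == "HALT":
            return memory
        if opcode == "LOAD":        # 4-5. locate the memory word and fetch it
            acc = memory[address]
        elif opcode == "ADD":       # 6. execute
            acc += memory[address]
        elif opcode == "STORE":
            memory[address] = acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        # 7. loop back to step 1

program = [
    ("LOAD", 4),     # acc = memory[4]
    ("ADD", 5),      # acc = acc + memory[5]
    ("STORE", 6),    # memory[6] = acc
    ("HALT", 0),
    2, 3, 0,         # data words at addresses 4, 5, and 6
]
result = run(list(program))
print(result[6])     # prints 5
```

Because this loop is itself a program that fetches, examines, and executes the instructions of another program, it is also a very small interpreter.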


Interrupts

There's a mechanism by which other system modules may interrupt the normal processing of the CPU.

The processor and the OS are responsible for recognizing an interrupt, suspending the user program, servicing the interrupt, and resuming the user program.

Interrupts are processed in an interrupt cycle within the overall instruction cycle.


Interpreters

A program that fetches, examines, and executes the instructions of another program is called an interpreter.

Interpretation of instructions (as opposed to direct hardware implementation) has several benefits:

Incorrectly implemented instructions can be fixed in the field.

New instructions can be added at minimal cost.

Structured design permits efficient development, testing, and documenting of complex instructions.


Instruction Execution

By the late 70s, the use of simple processors running interpreters was widespread.

The interpreters were held in fast read-only memories called control stores.

In 1980, a group at Berkeley began designing VLSI CPU chips that did not use interpretation. They used the term RISC for this concept.

RISC stands for Reduced Instruction Set Computer, contrasted with CISC (Complex Instruction Set Computer).


Design Principles for Modern Computers

The RISC Design Principles

Certain of the RISC design principles have now been generally accepted as good practice:

All instructions are executed directly by hardware.

Maximize the rate at which instructions are issued.

  Use parallelism to execute multiple slow instructions in a short time period.

Instructions should be easy to decode.

Only loads and stores should reference memory.

  Since memory access time is unpredictable, it makes parallelism difficult.

Provide plenty of registers.

  Since accessing memory is slow.


Design Principles for Modern Computers

Prefetching

The ability to fetch instructions in advance from memory, so they will be there when needed.

- These instructions are stored in a set of registers called the prefetch buffer.

- Prefetching divides instruction execution into two parts: prefetching and the actual execution.


Design Principles for Modern Computers

Parallelism

- Doing two or more things at the same time


Instruction-Level Parallelism

Parallelism comes in two varieties:

Instruction-level parallelism exploits parallelism within individual instructions to get more instructions/second.

Processor-level parallelism allows multiple CPUs to work together on a problem.

Fetching instructions from memory is a bottleneck.

Instructions can be fetched in advance and stored in a prefetch buffer.


Pipelining

In pipelining, we break an instruction up into many parts, each one handled by a dedicated hardware unit, with all units running in parallel.

Each unit is called a stage. After the pipeline is filled, one instruction completes in every time interval equal to the length of the longest stage. This time interval is the clock cycle of the CPU. The time to fill the pipeline is called the latency.


Pipelining


Pipelining

Pipelining allows a trade-off between latency (how long it takes to execute an instruction) and processor bandwidth (how many MIPS the CPU has).

Example: Suppose that the cycle time of this machine is 2 nsec. Then it takes 10 nsec for an instruction to progress all the way through the five-stage pipeline, and at every clock cycle (2 nsec) one new instruction is completed.


Problem example

A) If 1 instruction takes 10 nsec, how many instructions can be executed in 1 sec (CPU bandwidth)?

(1,000,000,000 nsec / 1 sec) / (10 nsec/instruction) = 100,000,000 inst/sec

B) With pipelining, 1 instruction is completed every 2 nsec. What will be the improvement in the CPU's bandwidth?

(1,000,000,000 nsec / 1 sec) / (2 nsec/instruction) = 500,000,000 inst/sec, a fivefold improvement
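The same arithmetic can be checked with a few lines of Python. The variable and function names below are illustrative choices; the numbers (five stages, a 2 nsec cycle) are the ones used in the example above.

```python
NSEC_PER_SEC = 1_000_000_000

def instructions_per_second(nsec_per_instruction):
    """Bandwidth when one instruction completes every given number of nsec."""
    return NSEC_PER_SEC // nsec_per_instruction

stages = 5
cycle_time_nsec = 2
latency_nsec = stages * cycle_time_nsec                # 10 nsec through the pipeline

unpipelined = instructions_per_second(latency_nsec)    # one instruction per 10 nsec
pipelined = instructions_per_second(cycle_time_nsec)   # one instruction per 2 nsec
speedup = pipelined / unpipelined

print(unpipelined, pipelined, speedup)   # prints 100000000 500000000 5.0
```

Note that the latency of a single instruction is still 10 nsec in both cases; only the rate at which instructions complete changes.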


Superscalar Architectures

We can also imagine having multiple pipelines.

One possibility is to have multiple equivalent pipelines with a common instruction fetch unit. The Pentium adopted this approach with two pipelines. Complex rules must be used to determine that the two instructions don't conflict. Pentium-specific compilers produced compatible pairs of instructions.

Another approach is to have a single pipeline with multiple functional units. This approach is called superscalar architecture and is used on high-end CPUs (including the Pentium II).


Superscalar Architecture


Superscalar Architecture


Processor-Level Parallelism

Instruction-level parallelism speeds up execution by a factor of five or ten. To get speed-ups of 50, 100, or more, we need to use multiple CPUs.

Array processors consist of a large number of identical processors that perform the same sequence of instructions on different sets of data.

The first array processor was the ILLIAC IV (1972), with an 8x8 array of processors.


Processor-Level Parallelism

A vector processor is similar to an array processor, but while the array processor has as many adders as data elements, in the vector processor the addition operations are performed in a single, highly pipelined adder.

Vector processors use vector registers, which are sets of conventional registers that can be loaded from memory in a single instruction.

Two vectors of elements are added together in a pipelined adder.


Array Processors


Multiprocessors

The processing elements in an array processor are not independent, since they have a common control unit.

A multiprocessor is a system with multiple CPUs sharing a common memory.

Multiprocessors can have a single global memory or a global memory with local memory for each CPU.

Systems with no common memory are called multicomputers. They communicate via a fast network, which may be connected in various topologies. Multicomputers are easier to build, but more difficult to program.


Multiprocessors


Primary Memory

The memory is that part of the computer where programs and data are stored.

The basic unit of memory is the binary digit, called a bit. A bit may contain a 0 or a 1.

Binary arithmetic is used by computers since it is easy to distinguish between two values of a continuous physical quantity such as voltage or current.

Memories consist of a number of cells. Each cell has an address (number) used to refer to it.


Primary Memory

Computers express memory addresses as binary numbers. If an address has m bits, the maximum number of cells addressable is 2^m.

A cell is the smallest addressable unit. Nowadays, nearly all manufacturers use an 8-bit cell called a byte.

Bytes are grouped into words. A computer with a 32-bit word has 4 bytes/word and 32-bit registers and instructions.


Memory Organization


Memory Organization


Byte Ordering

The bytes in a word can be numbered from left to right or from right to left. The first is called big endian ordering, while the second is called little endian ordering.

An integer occupies the same bit positions within a word in the two schemes (only the byte numbering differs), but character strings are laid out differently.

Care must be taken when transferring data among machines with different byte ordering.
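A short Python sketch shows why the transfer needs care. The value 6 and the 4-byte word size are illustrative assumptions; `int.to_bytes` and `int.from_bytes` are standard Python methods.

```python
value = 6   # illustrative integer stored in a 4-byte (32-bit) word

# The same integer as a sequence of bytes in increasing address order.
big = value.to_bytes(4, byteorder="big")         # b'\x00\x00\x00\x06'
little = value.to_bytes(4, byteorder="little")   # b'\x06\x00\x00\x00'

# A receiver that assumes the wrong byte order reads a different number.
misread = int.from_bytes(big, byteorder="little")

# Reading with the sender's byte order recovers the original value.
recovered = int.from_bytes(big, byteorder="big")

print(misread, recovered)   # prints 100663296 6
```

The misread value is 6 shifted left by 24 bits: the byte that the sender treated as least significant was interpreted by the receiver as most significant.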


Memory Organization


Memory Organization


Error-Correcting Codes

Occasional errors may occur in computer memories due to voltage spikes or other causes.

Errors can be handled by adding extra check bits to words of memory. Suppose a word of memory has m data bits and r check bits. Let the total length be n = m + r. This n-bit unit is often referred to as a codeword.

The number of bits in which two codewords differ is called the Hamming distance.
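The Hamming distance is easy to compute: XOR the two codewords and count the 1 bits in the result. The function name and the two 8-bit codewords below are illustrative choices.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two codewords differ."""
    return bin(a ^ b).count("1")

print(hamming_distance(0b10001001, 0b10110001))   # prints 3
```

Here the XOR is 00111000, which contains three 1 bits, so three single-bit errors would be needed to turn one codeword into the other.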


Error-Correcting Codes

To detect d single-bit errors requires a distance d + 1 code. To correct d single-bit errors requires a distance 2d + 1 code.

Consider adding a single parity bit to the data. The bit is chosen so that the number of 1 bits in the codeword is even (or odd). Now a single error results in an invalid codeword. It takes two errors to go from one valid codeword to another.
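A minimal sketch of even parity, assuming codewords are represented as Python lists of 0s and 1s (the function names and the sample data are illustrative):

```python
def add_even_parity(data_bits):
    """Append a parity bit so the codeword has an even number of 1s."""
    parity = sum(data_bits) % 2
    return data_bits + [parity]

def is_valid(codeword):
    """A codeword is valid when its number of 1 bits is even."""
    return sum(codeword) % 2 == 0

codeword = add_even_parity([1, 0, 1, 1])   # [1, 0, 1, 1, 1]

one_error = codeword.copy()
one_error[0] ^= 1                          # flip one bit

two_errors = one_error.copy()
two_errors[3] ^= 1                         # flip a second bit

print(is_valid(codeword), is_valid(one_error), is_valid(two_errors))
# prints True False True
```

The single error is detected, but the double error produces another valid codeword and goes unnoticed, which is what a distance 2 code implies.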


Error-Correcting Codes

Imagine we want to design a code with m data bits and r check bits that will allow all single-bit errors to be corrected. Each of the 2^m legal memory words has n illegal codewords at a distance 1 from it.

Form these by inverting each of the n bits in the n-bit codeword.

Each of the 2^m legal memory words requires n + 1 bit patterns dedicated to it.

(n + 1) 2^m <= 2^n; since n = m + r, (m + r + 1) <= 2^r
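The inequality (m + r + 1) <= 2^r gives a lower limit on the number of check bits. A small sketch, with an illustrative function name, finds the smallest r for a given m by trying r = 0, 1, 2, ... in turn:

```python
def min_check_bits(m):
    """Smallest r satisfying m + r + 1 <= 2**r."""
    r = 0
    while m + r + 1 > 2 ** r:
        r += 1
    return r

for m in (4, 8, 16, 32, 64):
    print(m, min_check_bits(m))
# prints:
# 4 3
# 8 4
# 16 5
# 32 6
# 64 7
```

The relative overhead shrinks as words get longer: 3 check bits for 4 data bits, but only 7 for 64.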


Error-Correcting Codes


Error-Correcting Codes

The following figure illustrates an error-correcting code for 4-bit words. The three circles form 7 regions. Encode the 4-bit word 1100 in four of those regions, then add a parity bit to each of the three empty regions so that the sum of the bits in each circle is an even number.

Now suppose that the bit in the AC region goes bad, changing from a 0 to a 1. Circles A and C have the wrong parity. The only single-bit change that corrects them is to restore AC back to 0, thus correcting the error.
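The three-circle scheme can be sketched in Python. The assignment of the data bits 1100 to the regions AB, ABC, AC, and BC in that order is an assumption (the figure is not reproduced in this transcript); it is chosen so that AC holds a 0, as the text requires. All names are illustrative.

```python
# Each circle is the set of regions that lie inside it.
CIRCLES = {
    "A": ("A", "AB", "AC", "ABC"),
    "B": ("B", "AB", "BC", "ABC"),
    "C": ("C", "AC", "BC", "ABC"),
}
DATA_REGIONS = ("AB", "ABC", "AC", "BC")   # assumed placement order

def encode(data_bits):
    """Place 4 data bits, then set each circle's parity region."""
    regions = dict(zip(DATA_REGIONS, data_bits))
    for circle, members in CIRCLES.items():
        others = sum(regions[m] for m in members if m != circle)
        regions[circle] = others % 2       # make the circle's sum even
    return regions

def bad_circles(regions):
    """Circles whose bits sum to an odd number."""
    return [c for c, members in CIRCLES.items()
            if sum(regions[m] for m in members) % 2 != 0]

def correct(regions):
    """Flip the one region that lies in exactly the failing circles."""
    bad = bad_circles(regions)
    fixed = dict(regions)
    if bad:
        fixed["".join(sorted(bad))] ^= 1
    return fixed

original = encode([1, 1, 0, 0])
damaged = dict(original)
damaged["AC"] ^= 1                         # the AC bit goes bad: 0 becomes 1

print(bad_circles(damaged))                # prints ['A', 'C']
print(correct(damaged) == original)        # prints True
```

The name of the faulty region is just the combination of the failing circles, which is why every single-bit error produces a distinct failure pattern.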


Error-Correcting Codes


Hamming's Algorithm

Hamming's algorithm can be used to construct single error-correcting codes for any size memory word. In a Hamming code, r parity bits are added to an m-bit word, forming a new word of length m + r bits.

The bits are numbered starting at 1, not 0, with bit 1 the leftmost (high-order) bit. All bits whose bit number is a power of 2 are parity bits; the rest are used for data.

In a 16-bit word, 5 parity bits are added. Bits 1, 2, 4, 8, and 16 are parity bits. The word has 21 total bits.


Hamming's Algorithm

Each parity bit checks specific bit positions; the parity bit is set so that the total number of 1s in the checked positions is even. The positions checked are:

Bit 1 checks bits 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21.

Bit 2 checks bits 2, 3, 6, 7, 10, 11, 14, 15, 18, 19.

Bit 4 checks bits 4, 5, 6, 7, 12, 13, 14, 15, 20, 21.

Bit 8 checks bits 8, 9, 10, 11, 12, 13, 14, 15.

Bit 16 checks bits 16, 17, 18, 19, 20, 21.

In general, each bit b is checked by those bits b1, b2, ..., bj such that b1 + b2 + ... + bj = b.
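The lists above follow from the binary representation of each position: parity bit p checks every position whose binary expansion includes p. A sketch with illustrative function names, assuming the 21-bit word used above:

```python
def checked_positions(parity_bit, total_bits=21):
    """Positions covered by a parity bit (which must be a power of 2)."""
    return [pos for pos in range(1, total_bits + 1) if pos & parity_bit]

def checkers_of(position):
    """Parity bits that check a position: the powers of 2 summing to it."""
    return [1 << i for i in range(position.bit_length())
            if (position >> i) & 1]

print(checked_positions(8))    # prints [8, 9, 10, 11, 12, 13, 14, 15]
print(checked_positions(16))   # prints [16, 17, 18, 19, 20, 21]
print(checkers_of(5))          # prints [1, 4]
print(checkers_of(21))         # prints [1, 4, 16]
```

For example, 5 = 1 + 4, so bit 5 is checked by parity bits 1 and 4.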


Error-Correcting Codes


Hamming's Algorithm

Consider what would happen if bit 5 in the word on the previous slide were inverted by a surge on the power line. Bit 5 would then be a 0. The 5 parity bits would be checked with the following results:

Parity bit 1 incorrect (positions checked contain five 1s)

Parity bit 2 correct (positions checked contain six 1s)

Parity bit 4 incorrect (positions checked contain five 1s)

Parity bit 8 correct (positions checked contain two 1s)

Parity bit 16 correct (positions checked contain four 1s)


Hamming's Algorithm

The incorrect bit must be one of the bits checked by parity bit 1 and by parity bit 4. These are bits 5, 7, 13, 15, or 21. However, bit 2 is correct, eliminating 7 and 15. Similarly, bit 8 is correct, eliminating 13. Finally, bit 16 is correct, eliminating 21. The only bit left is 5, which is the one in error.

If all parity bits are correct, there were no errors (or more than one). Otherwise, add up the positions of all the incorrect parity bits. The sum gives the position of the incorrect bit.
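The whole procedure can be sketched in Python under the conventions stated earlier (bits numbered from 1, parity bits at the powers of 2, even parity). The function names are illustrative, and the 16-bit data word is an assumption: the slide's figure is not reproduced here, so a word was chosen whose parity counts match the ones listed above.

```python
def is_power_of_two(n):
    return (n & (n - 1)) == 0

def encode(data_bits):
    """Return a codeword as a list indexed 1..n (index 0 is unused)."""
    m = len(data_bits)
    r = 0
    while m + r + 1 > 2 ** r:          # smallest r with m + r + 1 <= 2^r
        r += 1
    n = m + r
    word = [0] * (n + 1)
    data = iter(data_bits)
    for pos in range(1, n + 1):        # data goes in non-power-of-2 positions
        if not is_power_of_two(pos):
            word[pos] = next(data)
    for i in range(r):                 # set each parity bit for even parity
        p = 1 << i
        word[p] = sum(word[pos] for pos in range(1, n + 1) if pos & p) % 2
    return word

def syndrome(word):
    """Sum of the positions of the parity bits whose check fails."""
    n = len(word) - 1
    total, p = 0, 1
    while p <= n:
        if sum(word[pos] for pos in range(1, n + 1) if pos & p) % 2:
            total += p
        p <<= 1
    return total

def correct(word):
    """Repair a single-bit error, if the syndrome names a valid position."""
    fixed = list(word)
    bad = syndrome(fixed)
    if 0 < bad < len(fixed):
        fixed[bad] ^= 1
    return fixed

data = [int(b) for b in "1111000010101110"]    # assumed example word
codeword = encode(data)
damaged = list(codeword)
damaged[5] ^= 1                                # invert bit 5 (1 becomes 0)

print("".join(map(str, codeword[1:])))         # prints 001011100000101101110
print(syndrome(damaged))                       # prints 5
print(correct(damaged) == codeword)            # prints True
```

With bit 5 inverted, the checks for parity bits 1 and 4 fail and the others pass, so the syndrome is 1 + 4 = 5, the position of the bad bit.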