lec10-memory-design.ppt
-
8/9/2019 lec10-memory-design.ppt
1/66
Lecture 10: Memory Hierarchy Design
http://list.zju.edu.cn/kaibu/comparch
-
Assignment 3 due May 13
-
Chapter 2, Appendix B
-
Memory Hierarchy
-
Virtual Memory
Larger memory for more processes
-
Cache Performance
Average Memory Access Time = Hit Time + Miss Rate × Miss Penalty
-
Six Basic Cache Optimizations
• 1. larger block size
reduce miss rate (spatial locality);
reduce static power (lower tag count);
increase miss penalty, capacity/conflict misses
• 2. bigger caches
reduce miss rate (capacity misses)
increase hit time
increase cost and (static & dynamic) power
• 3. higher associativity
reduce miss rate (conflict misses);
increase hit time
increase power
-
Six Basic Cache Optimizations
• 4. multilevel caches
reduce miss penalty
reduce power
average memory access time =
Hit time_L1 + Miss rate_L1 × (Hit time_L2 + Miss rate_L2 × Miss penalty_L2)
• 5. giving priority to read misses over writes
reduce miss penalty
introduce write buffer
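The two-level formula can be checked with a few lines of Python. This is a minimal sketch: the function name and the numbers in the example are illustrative assumptions, not values from the lecture.

```python
def amat_two_level(hit_l1, miss_rate_l1, hit_l2, miss_rate_l2, miss_penalty_l2):
    """Hit time_L1 + Miss rate_L1 * (Hit time_L2 + Miss rate_L2 * Miss penalty_L2)."""
    # The L1 miss penalty is simply the average access time of L2.
    l1_miss_penalty = hit_l2 + miss_rate_l2 * miss_penalty_l2
    return hit_l1 + miss_rate_l1 * l1_miss_penalty


# Illustrative numbers only (in cycles): 1-cycle L1 hit, 5% L1 miss rate,
# 10-cycle L2 hit, 20% local L2 miss rate, 100-cycle penalty to main memory.
example_amat = amat_two_level(1, 0.05, 10, 0.20, 100)
print(f"AMAT = {example_amat:.2f} cycles")
```

Note that Miss rate_L2 here is the local miss rate of L2, that is, the fraction of L1 misses that also miss in L2.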
-
Six Basic Cache Optimizations
• 6. avoiding address translation during indexing of the cache
reduce hit time
use page offset to index cache
virtually indexed, physically tagged
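Indexing with the page offset only works if the set index and block offset bits fit inside the page offset, which means cache size divided by associativity cannot exceed the page size. A minimal sketch of that check follows; the function names and the 32 KB / 4 KB figures are illustrative assumptions.

```python
def vipt_ok(cache_bytes, ways, page_bytes):
    """True if index + block-offset bits all lie inside the page offset."""
    return cache_bytes // ways <= page_bytes


def min_ways(cache_bytes, page_bytes):
    """Smallest associativity that lets the page offset alone index the cache."""
    return max(1, -(-cache_bytes // page_bytes))  # ceiling division


# Illustrative: a 32 KB cache with 4 KB pages.
print(min_ways(32 * 1024, 4096))
print(vipt_ok(32 * 1024, 4, 4096), vipt_ok(32 * 1024, 8, 4096))
```

Under these assumed sizes, a larger cache needs either more ways or larger pages to stay virtually indexed and physically tagged.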
-
Outline
• Ten Advanced Cache Optimizations
• Memory Technology and Optimizations
• Virtual Memory and Virtual Machines
• ARM Cortex-A8 & Intel Core i7
-
Ten Advanced Cache Opts
• Goal: average memory access time
• Metrics to reduce/optimize
hit time
miss rate
miss penalty
cache bandwidth
power consumption
-
Ten Advanced Cache Opts
• Reduce hit time
small and simple first-level caches
way prediction
decrease power;
• Increase cache bandwidth
pipelined/multibanked/nonblocking cache
• Reduce miss penalty
critical word first
merging write buffers
• Reduce miss rate
compiler optimizations; decrease power;
• Reduce miss penalty or miss rate via parallelism
hardware/compiler prefetching; increase power;
-
Opt 1: Small and Simple First-Level Caches
• Reduce hit time and power
-
Opt 1: Small and Simple First-Level Caches
• Example
a 32 KB cache
two-way set associative: 0.038 miss rate
four-way set associative: 0.037 miss rate
four-way cache access time is 1.4 times two-way cache access time
miss penalty to L2 is 15 times the access time for the faster L1 cache (i.e., two-way)
assume always L2 hit
Q: which has faster memory access time?
-
Opt 1: Small and Simple First-Level Caches
• Answer
Average memory access time_2-way
= Hit time + Miss rate × Miss penalty
= 1 + 0.038 × 15
= 1.57
Average memory access time_4-way
= 1.4 + 0.037 × (1
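The comparison can be evaluated numerically. The sketch below takes the two-way access time as the unit of time. Because the four-way expression on the slide is cut off, keeping the L2 penalty fixed at 15 of those units for both caches is an assumption, not necessarily the lecture's own calculation.

```python
# Time unit: one two-way L1 access.
hit_2way = 1.0
hit_4way = 1.4 * hit_2way          # four-way access is 1.4x slower
miss_penalty = 15 * hit_2way       # assumed identical for both caches, always an L2 hit

amat_2way = hit_2way + 0.038 * miss_penalty
amat_4way = hit_4way + 0.037 * miss_penalty

print(f"two-way:  {amat_2way:.3f}")
print(f"four-way: {amat_4way:.3f}")
print("two-way is faster" if amat_2way < amat_4way else "four-way is faster")
```

Under this assumption the slightly lower miss rate of the four-way cache does not make up for its longer hit time.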
-
Opt 2: Way Prediction
• Reduce conflict misses and hit time
• Way prediction
block predictor bits are added to each block to predict the way/block within the set of the next cache access
the multiplexor is set early to select the desired block
only a single tag comparison is performed in parallel with cache reading
a miss results in checking the other blocks for matches in the next clock cycle
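The behavior described above can be modeled for a single cache set. This is a simplified sketch with assumptions: one predicted way per set rather than predictor bits per block, a 1-cycle hit on a correct prediction, a second cycle to check the other ways, and a trivial replacement choice. The class name is hypothetical.

```python
class WayPredictedSet:
    """One set of a set-associative cache with a single predicted way."""

    def __init__(self, ways):
        self.tags = [None] * ways
        self.predicted = 0

    def access(self, tag):
        """Return (hit, tag_check_cycles)."""
        # Cycle 1: compare only the tag in the predicted way.
        if self.tags[self.predicted] == tag:
            return True, 1
        # Cycle 2: check the remaining ways for a match.
        for way, stored in enumerate(self.tags):
            if stored == tag:
                self.predicted = way          # retrain the predictor
                return True, 2
        # Miss: fill a victim way (simple stand-in for a real replacement policy).
        victim = (self.predicted + 1) % len(self.tags)
        self.tags[victim] = tag
        self.predicted = victim
        return False, 2


s = WayPredictedSet(ways=2)
for t in ["A", "A", "B", "A", "A"]:
    print(t, s.access(t))
```

Repeated accesses to the same block hit in one cycle; alternating between blocks in the same set costs the extra cycle each time the prediction is wrong.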
-
Opt 3: Pipelined Cache Access
• Increase cache bandwidth
• Higher latency
• Greater penalty on mispredicted branches and more clock cycles between issuing the load and using the data
-
Opt 4: Nonblocking Caches
• Increase cache bandwidth
• Nonblocking/lockup-free cache
allows data cache to continue to supply cache hits during a miss
-
Opt 5: Multibanked Caches
• Increase cache bandwidth
• Divide cache into independent banks that support simultaneous accesses
• Sequential interleaving
spread the addresses of blocks sequentially across the banks
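Sequential interleaving amounts to taking the block address modulo the number of banks. A minimal sketch, assuming four banks and 64-byte blocks (both values are illustrative):

```python
def bank_of(block_address, num_banks=4):
    """Bank that holds a given block under sequential interleaving."""
    return block_address % num_banks


def bank_of_byte(byte_address, block_bytes=64, num_banks=4):
    """Same mapping, starting from a byte address."""
    return bank_of(byte_address // block_bytes, num_banks)


# Consecutive blocks land in consecutive banks, wrapping around.
print([bank_of(b) for b in range(8)])
```

Because neighboring blocks sit in different banks, a stream of sequential accesses can keep all banks busy at once.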
-
Opt 6: Critical Word First & Early Restart
• Reduce miss penalty
• Motivation: the processor normally needs just one word of the block at a time
• Critical word first
request the missed word first from the memory and send it to the processor as soon as it arrives
• Early restart
fetch the words in normal order, as soon as the requested word arrives send it to the processor
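The difference between the two policies shows up in how long the processor waits for the word it asked for. The sketch below uses a simple assumed timing model (a fixed initial latency, then one word every few cycles); the function name and all numbers are illustrative.

```python
def wait_cycles(policy, word_index, words_per_block=8, initial=10, per_word=2):
    """Cycles until the requested word reaches the processor."""
    if policy == "full_block":             # no optimization: wait for the whole block
        words_needed = words_per_block
    elif policy == "early_restart":        # words arrive in order 0, 1, 2, ...
        words_needed = word_index + 1
    elif policy == "critical_word_first":  # requested word is fetched first
        words_needed = 1
    else:
        raise ValueError(f"unknown policy: {policy}")
    return initial + per_word * words_needed


for policy in ("full_block", "early_restart", "critical_word_first"):
    print(policy, wait_cycles(policy, word_index=5))
```

Early restart helps most when the requested word is near the start of the block, while critical word first gives the same wait regardless of position.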
-
Opt 7: Merging Write Buffer
• Reduce miss penalty
• Write merging merges four entries into a single buffer entry
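A rough way to see the effect is to count how many buffer entries a sequence of writes occupies with and without merging. This sketch assumes each entry covers four 8-byte words and that the addresses are distinct; the function name and the example addresses are illustrative.

```python
def buffer_entries(addresses, merge, words_per_entry=4, word_bytes=8):
    """Number of write-buffer entries used by the given write addresses."""
    if not merge:
        return len(addresses)               # one entry per write
    entry_bytes = words_per_entry * word_bytes
    # With merging, writes that fall in the same aligned region share an entry.
    return len({addr // entry_bytes for addr in addresses})


writes = [100, 108, 116, 124]               # four sequential 8-byte writes
print(buffer_entries(writes, merge=False))
print(buffer_entries(writes, merge=True))
```

Fewer occupied entries means the buffer fills up less often, so the processor stalls less on writes.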
-
Opt 8: Compiler Optimizations
-
Opt 9: Hardware Prefetching
• Reduce miss penalty/rate
• Prefetch items before the processor requests them, into the cache or external buffer
• Instruction prefetch
fetch two blocks on a miss: requested one into cache + next consecutive one into instruction stream buffer
• Similar Data prefetch approaches
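The instruction prefetch scheme can be sketched as a tiny simulation. The assumptions are loud ones: a one-entry stream buffer, an unbounded cache, and prefetches that always arrive before they are needed. The function name is hypothetical.

```python
def count_demand_misses(block_trace):
    """Count fetches the processor actually has to wait for."""
    cache = set()
    stream_buffer = None                    # holds one prefetched block number
    misses = 0
    for block in block_trace:
        if block in cache:
            continue
        if block == stream_buffer:
            cache.add(block)                # served from the stream buffer
        else:
            misses += 1                     # true miss: fetch the requested block
            cache.add(block)
        stream_buffer = block + 1           # prefetch the next consecutive block
    return misses


print(count_demand_misses(range(8)))            # straight-line code
print(count_demand_misses([0, 1, 2, 10, 11, 0]))  # with one taken branch
```

Sequential instruction streams benefit the most; each taken branch to a new region costs one demand miss before the buffer catches up again.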
-
Opt 10: Compiler Prefetching
• Reduce miss penalty/rate
• Compiler to insert prefetch instructions to request data before the processor needs it
• Register prefetch
load the value into a register
• Cache prefetch
load data into the cache
-
Opt 10: Compiler Prefetching
• Example: 251 misses
16-byte blocks
8-byte elements for a and b
write-back strategy
a[0][0] miss, copy both a[0][0], a[0][1] as one block contains 16/8 = 2
so for a: 3 × (100/2) = 150 misses
b[0][0] – b[100][0]: 101 misses
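The miss count can be reproduced by simulation. The loop itself is not shown in the transcript, so this sketch assumes the usual textbook form, a[i][j] = b[j][0] * b[j+1][0] with a being 3 × 100 and b being 101 × 3. It further assumes block-aligned row-major arrays, write allocation, and a cache with no capacity or conflict misses.

```python
BLOCK = 16      # bytes per cache block
ELEM = 8        # bytes per array element


def block_of(base, row, col, cols):
    """Block number of element [row][col] in a row-major array at byte `base`."""
    return (base + (row * cols + col) * ELEM) // BLOCK


def count_misses():
    a_base, a_cols = 0, 100                  # a is 3 x 100 (assumed layout)
    b_base, b_cols = 3 * 100 * ELEM, 3       # b is 101 x 3, placed right after a
    cached = set()                           # idealized cache: nothing is evicted
    misses = 0
    for i in range(3):
        for j in range(100):
            touched = [
                block_of(b_base, j, 0, b_cols),       # read b[j][0]
                block_of(b_base, j + 1, 0, b_cols),   # read b[j+1][0]
                block_of(a_base, i, j, a_cols),       # write a[i][j]
            ]
            for blk in touched:
                if blk not in cached:
                    misses += 1
                    cached.add(blk)
    return misses


print(count_misses())
```

Each block of a holds two elements, so every other write misses; each b[j][0] sits in its own block, so b contributes one miss per row on the first pass only.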
-
Opt 10: Compiler Prefetching
• Example: 19 misses
7 misses: b[0][0] – b[6][0]
4 misses: 1/2 of a[0][0] – a[0][6]
4 misses: a[1][0] – a[1][6]
4 misses: a[2][0] – a[2][6]
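The total of 19 follows from simple arithmetic once the prefetch distance is known. The sketch assumes prefetches are issued 7 iterations ahead (inferred from the seven uncovered elements per row) and that every prefetched block arrives in time.

```python
import math

PREFETCH_DISTANCE = 7            # assumed: elements 0..6 of each row are not covered
ELEMS_PER_BLOCK = 16 // 8        # 16-byte blocks, 8-byte elements

b_misses = PREFETCH_DISTANCE                                      # b[0][0]..b[6][0], one block each
a_misses_per_row = math.ceil(PREFETCH_DISTANCE / ELEMS_PER_BLOCK) # a[i][0]..a[i][6], two per block
total_misses = b_misses + 3 * a_misses_per_row

print(b_misses, a_misses_per_row, total_misses)
```

The "1/2" in the list reflects that two elements of a share a block, so seven uncovered elements cost four misses per row.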
-
Outline
• Ten Advanced Cache Optimizations
• Memory Technology and Optimizations
• Virtual Memory and Virtual Machines
• ARM Cortex-A8 & Intel Core i7
-
Main Memory
• Main memory: I/O interface between caches and servers
• Dst of input & src of output
[Figure: memory hierarchy. Processor (control, datapath, registers, on-chip cache), second-level cache (SRAM), main memory (DRAM), secondary storage (disk). Speed: fastest to slowest; size: smallest to biggest; cost: highest to lowest. Managed by: compiler, hardware, operating system.]
-
Main Memory
Performance measures
• Latency
important for caches
harder to reduce
• Bandwidth
important for multiprocessors, I/O, and caches with large block sizes
easier to improve with new organizations
-
Main Memory
Performance measures
• Latency
access time: the time between when a read is requested and when the desired word arrives
cycle time: the minimum time between unrelated requests to memory, or the minimum time between the start of an access and the start of the next access
-
Main Memory
• SRAM for cache
• DRAM for main memory
-
SRAM
• Static Random Access Memory
• Six transistors per bit to prevent the information from being disturbed when read
• Don't need to refresh, so access time is very close to cycle time
-
DRAM
• Dynamic Random Access Memory
• Single transistor per bit
• Reading destroys the information
• Refresh periodically
• cycle time > access time
• DRAMs are commonly sold on small boards called DIMMs (dual inline memory modules), typically containing 4 to 16 DRAMs
-
DRAM Organization
• RAS: row access strobe
• CAS: column access strobe
-
DRAM Improvement
• Timing signals
allow repeated accesses to the row buffer w/o another row access time
• Leverage spatial locality
each array will buffer 1024 to 4096 bits for each access
-
DRAM Improvement
• Clock signal
added to the DRAM interface, so that repeated transfers will not involve overhead to synchronize with memory controller
• SDRAM: synchronous DRAM
-
DRAM Improvement
• Wider DRAM
to overcome the problem of getting a wide stream of bits from memory without having to make the memory system too large as memory system density increased
widening the cache and memory widens memory bandwidth
e.g., 4-bit transfer mode up to 16-bit buses
-
DRAM Improvement
• DDR: double data rate
to increase bandwidth, transfer data on both the rising edge and falling edge of the DRAM clock signal, thereby doubling the peak data rate
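The doubling is easy to quantify: peak bandwidth is clock rate times transfers per clock times bus width. A minimal sketch, where the 133 MHz clock and 8-byte bus are illustrative assumptions rather than figures from the lecture:

```python
def peak_bandwidth_mb_s(clock_mhz, bus_bytes, double_data_rate=True):
    """Peak transfer rate in MB/s (MHz x bytes per transfer x transfers per clock)."""
    transfers_per_clock = 2 if double_data_rate else 1
    return clock_mhz * transfers_per_clock * bus_bytes


# Illustrative: a 133 MHz clock driving an 8-byte-wide module.
print(peak_bandwidth_mb_s(133, 8, double_data_rate=False))
print(peak_bandwidth_mb_s(133, 8, double_data_rate=True))
```

This is a peak figure only; sustained bandwidth also depends on row activations, refresh, and bank conflicts.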
-
DRAM Improvement
• Multiple Banks
break a single SDRAM into 2 to 8 blocks
they can operate independently
• Provide some of the advantages of interleaving
• Help with power management
-
DRAM Improvement
• Reducing power consumption in SDRAMs
dynamic power: used in a read or write
static/standby power
• Depend on the operating voltage
• Power down mode: entered by telling the DRAM to ignore the clock
disables the SDRAM except for internal automatic refresh
-
Flash Memory
• A type of EEPROM (electronically erasable programmable read-only memory)
• Read-only but can be erased
• Hold contents w/o any power
-
Flash Memory
Differences from DRAM
• Must be erased (in blocks) before it is overwritten
• Static and less power consumption
• Has a limited number of write cycles for any block
• Cheaper than SDRAM but more expensive than disk
• Slower than SDRAM but faster than disk
-
Memory Dependability
• Soft errors
changes to a cell's contents, not a change in the circuitry
• Hard errors
permanent changes in the operation of one or more memory cells
-
Memory Dependability
Error detection and fix
• Parity only
only one bit of overhead to detect a single error in a sequence of bits
e.g., one parity bit per 8 data bits
• ECC only
detect two errors and correct a single error with 8-bit overhead per 64 data bits
• Chipkill
handle multiple errors and complete failure of a single memory chip
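Both overhead figures can be illustrated in a few lines. The sketch computes an even-parity bit for 8 data bits and counts the check bits of a single-error-correcting, double-error-detecting code. The slide says only "ECC", so the choice of a Hamming code with one extra parity bit is an assumption, as are the function names.

```python
def parity_bit(byte):
    """Even-parity bit for 8 data bits: 1 if the number of 1 bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def detects_single_flip(byte, bit):
    """Flipping any one data bit changes the parity, so the error is detected."""
    return parity_bit(byte ^ (1 << bit)) != parity_bit(byte)


def secded_check_bits(data_bits):
    """Hamming check bits to correct 1 error, plus 1 overall parity bit to detect 2."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


print(parity_bit(0b10110010))
print(all(detects_single_flip(0b10110010, b) for b in range(8)))
print(secded_check_bits(64))
```

Parity cannot tell which bit flipped, and two flips cancel out, which is why it only detects single errors and corrects none.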
-
Memory Dependability
Rates of unrecoverable errors in 3 yrs
• Parity only
about 90,000, or one unrecoverable (undetected) failure every 17 mins
• ECC only
about 3,
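The per-failure interval is just the observation period divided by the failure count. A small check, assuming the parity-only figure is about 90,000 failures over 3 years and using 365-day years:

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def minutes_between_failures(failures, years=3):
    """Average time between failures, in minutes, over the given period."""
    return years * MINUTES_PER_YEAR / failures


print(f"{minutes_between_failures(90_000):.2f} minutes")
```

The same function applies to the other schemes once their failure counts are known.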
-
Outline
• Ten Advanced Cache Optimizations
• Memory Technology and Optimizations
• Virtual Memory and Virtual Machines
• ARM Cortex-A8 & Intel Core i7
-
• VMM: Virtual Machine Monitor
three essential characteristics:
1. VMM provides an environment for programs which is essentially identical with the original machine
2. programs run in this environment show at worst only minor decreases in speed
3. VMM is in complete control of system resources
• Mainly for security and privacy
sharing and protection among multiple processes
-
Virtual Memory
• The architecture must limit what a process can access when running a user process yet allow an OS process to access more
• Four tasks for the architecture
-
Virtual Machines
• Virtual Machine
a protection mode with a much smaller code base than the full OS
• VMM: virtual machine monitor
hypervisor
software that supports VMs
• Host
underlying hardware platform
-
Virtual Machines
• Requirements
1. Guest software should behave on a VM exactly as if it were running on the native hardware
2. Guest software should not be able to change allocation of real system resources directly
-
Outline
• Ten Advanced Cache Optimizations
• Memory Technology and Optimizations
• Virtual Memory and Virtual Machines
• ARM Cortex-A8 & Intel Core i7
-
ARM Cortex-A8
-
Intel Core i7
-
?