lec10-memory-design.ppt

Transcript of lec10-memory-design.ppt

  • 8/9/2019 lec10-memory-design.ppt

    1/66

     

    Lecture 10: Memory Hierarchy Design

    Kai Bu

    http://list.zju.edu.cn/kaibu/comparch

  • 8/9/2019 lec10-memory-design.ppt

    2/66

     

    Assignment 3 due May 13

  • 8/9/2019 lec10-memory-design.ppt

    3/66

     

    Chapter 2, Appendix B

  • 8/9/2019 lec10-memory-design.ppt

    4/66

     

    Memory Hierarchy

  • 8/9/2019 lec10-memory-design.ppt

    5/66

     

    Virtual Memory

    Larger memory for more processes

  • 8/9/2019 lec10-memory-design.ppt

    6/66

     

    Cache Performance

    Average Memory Access Time = Hit Time + Miss Rate × Miss Penalty
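The formula above can be sketched as a one-line function. The function name and the sample values are hypothetical and chosen only for illustration; all times must be in the same unit (for example, clock cycles).

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=100))
```

Lowering any one of the three terms lowers the result, which is why the optimizations that follow are grouped by the term they target.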

  • 8/9/2019 lec10-memory-design.ppt

    7/66

     

    Six Basic Cache Optimizations

    • 1. larger block size
      reduce miss rate (spatial locality);
      reduce static power (lower tag #);
      increase miss penalty, capacity/conflict misses
    • 2. bigger caches
      reduce miss rate (capacity misses)
      increase hit time
      increase cost and (static & dynamic) power
    • 3. higher associativity
      reduce miss rate (conflict misses);
      increase hit time
      increase power

  • 8/9/2019 lec10-memory-design.ppt

    8/66

     

    Six Basic Cache Optimizations

    • 4. multilevel caches
      reduce miss penalty
      reduce power
      Average memory access time =
      Hit time_L1 + Miss rate_L1 × (Hit time_L2 + Miss rate_L2 × Miss penalty_L2)
    • 5. giving priority to read misses over writes
      reduce miss penalty
      introduce write buffer
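The two-level form of the average memory access time can be sketched as follows: the L1 miss penalty is itself the average access time of L2. The function name and the sample numbers are hypothetical.

```python
def amat_two_level(hit_l1, miss_rate_l1, hit_l2, miss_rate_l2, miss_penalty_l2):
    """AMAT for an L1 cache backed by an L2 cache (all times in the same unit)."""
    # What an L1 miss costs on average: an L2 access, plus the L2 miss
    # penalty for the fraction of accesses that also miss in L2.
    miss_penalty_l1 = hit_l2 + miss_rate_l2 * miss_penalty_l2
    return hit_l1 + miss_rate_l1 * miss_penalty_l1


# Hypothetical numbers: 1-cycle L1 hit, 4% L1 miss rate,
# 10-cycle L2 hit, 50% local L2 miss rate, 200-cycle memory access.
print(amat_two_level(1, 0.04, 10, 0.5, 200))
```

Note that Miss rate_L2 here is the local miss rate of L2, i.e. L2 misses divided by L2 accesses.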

  • 8/9/2019 lec10-memory-design.ppt

    9/66

     

    Six Basic Cache Optimizations

    • 6. avoiding address translation during indexing of the cache
      reduce hit time
      use page offset to index cache
      virtually indexed, physically tagged
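A minimal sketch of why the page offset can index the cache: address translation leaves the page-offset bits unchanged, so if the index and block-offset bits fit inside the page offset, the virtual and physical addresses select the same set. The function names, the addresses, and the cache geometry below are hypothetical.

```python
def can_index_with_page_offset(cache_bytes, associativity, page_bytes):
    """True if index + block-offset bits fit inside the page offset.

    Equivalent to: the size of one way is no larger than one page.
    """
    return cache_bytes // associativity <= page_bytes


def cache_index(address, block_bytes, num_sets):
    """Set index selected by an address."""
    return (address // block_bytes) % num_sets


# Hypothetical geometry: 4 KB pages, 8 KB two-way cache, 64-byte blocks.
page_bytes, cache_bytes, ways, block_bytes = 4096, 8192, 2, 64
num_sets = cache_bytes // (ways * block_bytes)  # 64 sets

# Hypothetical virtual/physical pair sharing the same 12-bit page offset 0x678.
virtual_address = 0x12345678
physical_address = 0x0009A678

print(can_index_with_page_offset(cache_bytes, ways, page_bytes))
print(cache_index(virtual_address, block_bytes, num_sets),
      cache_index(physical_address, block_bytes, num_sets))
```

When the check fails (for example a 32 KB two-way cache with 4 KB pages), some index bits come from the translated part of the address and this shortcut no longer applies directly.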

  • 8/9/2019 lec10-memory-design.ppt

    10/66

     

    Outline

    • Ten Advanced Cache Optimizations
    • Memory Technology and Optimizations
    • Virtual Memory and Virtual Machines
    • ARM Cortex-A8 & Intel Core i7

  • 8/9/2019 lec10-memory-design.ppt

    12/66

     

    Ten Advanced Cache Opts

    • Goal: average memory access time
    • Metrics to reduce/optimize
      hit time
      miss rate
      miss penalty
      cache bandwidth
      power consumption

  • 8/9/2019 lec10-memory-design.ppt

    13/66

     

    Ten Advanced Cache Opts

    • Reduce hit time
      small and simple first-level caches, way prediction; decrease power
    • Increase cache bandwidth
      pipelined/multibanked/nonblocking cache
    • Reduce miss penalty
      critical word first
      merging write buffers
    • Reduce miss rate
      compiler optimizations; decrease power
    • Reduce miss penalty or miss rate via parallelism
      hardware/compiler prefetching; increase power

  • 8/9/2019 lec10-memory-design.ppt

    14/66

     

    Opt 1: Small and Simple First-Level Caches

    • Reduce hit time and power

  • 8/9/2019 lec10-memory-design.ppt

    16/66

     

    Opt 1: Small and Simple First-Level Caches

    • Example
      a 32 KB cache
      two-way set associative: 0.038 miss rate
      four-way set associative: 0.037 miss rate
      four-way cache access time is 1.4 times two-way cache access time
      miss penalty to L2 is 15 times the access time for the faster L1 cache (i.e., two-way)
      assume always L2 hit
      Q: which has faster memory access time?

  • 8/9/2019 lec10-memory-design.ppt

    17/66

     

    Opt 1: Small and Simple First-Level Caches

    • Answer
      Average memory access time (2-way)
      = Hit time + Miss rate × Miss penalty
      = 1 + 0.038 × 15
      = 1.57
      Average memory access time (4-way)
      = 1.4 + 0.037 × (1…
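Plugging the example's numbers into the average memory access time formula can be sketched as below. The miss rates (0.038 and 0.037), the hit-time ratio (1.4), and the miss penalty (15) are the values given in the example; keeping the L2 miss penalty at 15 two-way access times for both caches is an assumption made here for simplicity, and the function name is hypothetical.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in units of the two-way cache access time."""
    return hit_time + miss_rate * miss_penalty


MISS_PENALTY = 15  # assumption: same absolute L2 penalty for both caches

two_way = amat(hit_time=1.0, miss_rate=0.038, miss_penalty=MISS_PENALTY)
four_way = amat(hit_time=1.4, miss_rate=0.037, miss_penalty=MISS_PENALTY)

print(f"two-way:  {two_way:.3f}")
print(f"four-way: {four_way:.3f}")
print("faster:", "two-way" if two_way < four_way else "four-way")
```

Under these assumptions the small gain in miss rate from higher associativity does not make up for the 40% longer hit time.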

  • 8/9/2019 lec10-memory-design.ppt

    18/66

     

    Opt 2: Way Prediction

    • Reduce conflict misses and hit time
    • Way prediction
      block predictor bits are added to each block to predict the way/block within the set of the next cache access
      the multiplexor is set early to select the desired block
      only a single tag comparison is performed in parallel with cache reading
      a miss results in checking the other blocks for matches in the next clock cycle

  • 8/9/2019 lec10-memory-design.ppt

    19/66

     

    Opt 3: Pipelined Cache Access

    • Increase cache bandwidth
    • Higher latency
    • Greater penalty on mispredicted branches and more clock cycles between issuing the load and using the data

  • 8/9/2019 lec10-memory-design.ppt

    20/66

     

    Opt 4: Nonblocking Caches

    • Increase cache bandwidth
    • Nonblocking/lockup-free cache
      allows data cache to continue to supply cache hits during a miss

  • 8/9/2019 lec10-memory-design.ppt

    21/66

     

    Opt 5: Multibanked Caches

    • Increase cache bandwidth
    • Divide cache into independent banks that support simultaneous accesses
    • Sequential interleaving
      spread the addresses of blocks sequentially across the banks
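Sequential interleaving can be sketched as a modulo mapping from block address to bank. The function name and the choice of four banks are hypothetical.

```python
def bank_of(block_address, num_banks):
    """Bank that holds a block under sequential interleaving."""
    return block_address % num_banks


NUM_BANKS = 4  # hypothetical bank count

for block_address in range(8):
    print(f"block {block_address} -> bank {bank_of(block_address, NUM_BANKS)}")
```

Consecutive blocks land in different banks, so a run of sequential accesses can be served by the banks in parallel.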

  • 8/9/2019 lec10-memory-design.ppt

    22/66

     

    Opt 6: Critical Word First & Early Restart

    • Reduce miss penalty
    • Motivation: the processor normally needs just one word of the block at a time
    • Critical word first
      request the missed word first from the memory and send it to the processor as soon as it arrives
    • Early restart
      fetch the words in normal order, as soon as the requested word arrives send it to the processor
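The difference between the two schemes can be sketched by counting how many word transfers happen before the processor can resume. The function names are hypothetical, and the wrap-around fill order used for critical word first is an assumption (a common choice, not stated on the slide).

```python
def critical_word_first_order(requested_word, words_per_block):
    """Fetch order that starts at the missed word and wraps around the block."""
    return [(requested_word + i) % words_per_block for i in range(words_per_block)]


def transfers_before_restart(requested_word, critical_word_first):
    """Word transfers needed before the requested word reaches the processor."""
    if critical_word_first:
        return 1  # the missed word is fetched first
    return requested_word + 1  # early restart: words arrive in normal order


# Hypothetical miss on word 5 of an 8-word block.
print(critical_word_first_order(5, 8))
print(transfers_before_restart(5, critical_word_first=True))
print(transfers_before_restart(5, critical_word_first=False))
```

Early restart helps most when the requested word is near the start of the block; critical word first helps regardless of its position.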

  • 8/9/2019 lec10-memory-design.ppt

    23/66

     

    Opt 7: Merging Write Buffer

    • Reduce miss penalty
    • Write merging merges four entries into a single buffer entry
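Write merging can be sketched as grouping word writes by the aligned buffer entry they fall into. The four-word entry matches the bullet above; the 8-byte word size, the addresses, and the function name are hypothetical.

```python
WORDS_PER_ENTRY = 4  # one buffer entry holds four sequential words
WORD_BYTES = 8       # hypothetical word size
ENTRY_BYTES = WORDS_PER_ENTRY * WORD_BYTES


def merge_writes(addresses):
    """Map each aligned entry base address to the set of word slots written."""
    entries = {}
    for address in addresses:
        base = address - address % ENTRY_BYTES
        slot = (address % ENTRY_BYTES) // WORD_BYTES
        entries.setdefault(base, set()).add(slot)
    return entries


# Four sequential word writes: four entries without merging, one with merging.
writes = [96, 104, 112, 120]
merged = merge_writes(writes)
print(len(writes), "entries without merging")
print(len(merged), "entry with merging:", merged)
```

Fewer occupied entries means the buffer fills less often, so the processor stalls less on writes.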

  • 8/9/2019 lec10-memory-design.ppt

    24/66

     

    Opt 8: Compiler Optimizations

  • 8/9/2019 lec10-memory-design.ppt

    27/66

     

    Opt 9: Hardware Prefetching

    • Reduce miss penalty/rate
    • Prefetch items before the processor requests them, into the cache or external buffer
    • Instruction prefetch
      fetch two blocks on a miss: requested one into cache + next consecutive one into instruction stream buffer
    • Similar data prefetch approaches
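A minimal sketch of next-block prefetching with a one-entry stream buffer, assuming an unbounded cache so that only first-touch misses are counted. Refilling the buffer with the following block on a buffer hit is an assumption, and the function name is hypothetical.

```python
def misses_with_stream_buffer(block_trace):
    """Count misses that go to memory when each miss also prefetches block + 1."""
    cache = set()
    stream_buffer = None  # holds at most one prefetched block number
    misses = 0
    for block in block_trace:
        if block in cache:
            continue
        if block == stream_buffer:
            # Found in the stream buffer: move it into the cache and
            # prefetch the next consecutive block (assumed behavior).
            cache.add(block)
            stream_buffer = block + 1
            continue
        misses += 1
        cache.add(block)           # requested block goes into the cache
        stream_buffer = block + 1  # next consecutive block goes into the buffer
    return misses


print(misses_with_stream_buffer(list(range(8))))  # sequential fetches
print(misses_with_stream_buffer([0, 2, 4, 6]))    # strided fetches
```

Sequential instruction streams benefit the most; a stride that skips blocks defeats a simple next-block prefetcher.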

  • 8/9/2019 lec10-memory-design.ppt

    29/66

     

    Opt 10: Compiler Prefetching

    • Reduce miss penalty/rate
    • Compiler to insert prefetch instructions to request data before the processor needs it
    • Register prefetch
      load the value into a register
    • Cache prefetch
      load data into the cache

  • 8/9/2019 lec10-memory-design.ppt

    30/66

     

    Opt 10: Compiler Prefetching

    • Example: 251 misses
      16-byte blocks
      8-byte elements for a and b
      write-back strategy
      a[0][0] miss, copy both a[0][0] and a[0][1], as one block contains 16/8 = 2 elements
      so for a: 3 × (100/2) = 150 misses
      b[0][0] – b[100][0]: 101 misses
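The miss counts in this example can be reproduced with a small simulation. The loop nest and array shapes are not in the transcript; the sketch assumes a[3][100] and b[101][3] with a[i][j] = b[j][0] * b[j+1][0], which is consistent with the counts above. It also assumes 16-byte-aligned arrays and a cache large enough that only first-touch misses occur. All names are hypothetical.

```python
BLOCK_BYTES = 16
ELEM_BYTES = 8


def count_misses():
    """Count first-touch misses per array for the assumed loop nest."""
    a_base = 0
    b_base = a_base + 3 * 100 * ELEM_BYTES  # b placed right after a
    cached_blocks = set()
    misses = {"a": 0, "b": 0}

    def access(name, address):
        block = address // BLOCK_BYTES
        if block not in cached_blocks:
            cached_blocks.add(block)
            misses[name] += 1

    for i in range(3):
        for j in range(100):
            access("b", b_base + (j * 3) * ELEM_BYTES)        # b[j][0]
            access("b", b_base + ((j + 1) * 3) * ELEM_BYTES)  # b[j+1][0]
            access("a", a_base + (i * 100 + j) * ELEM_BYTES)  # a[i][j]
    return misses


result = count_misses()
print(result, "total:", sum(result.values()))
```

Array a is walked along rows, so every second element misses; array b is walked down a column with a 24-byte stride, so each of the 101 elements touched sits in its own block.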

  • 8/9/2019 lec10-memory-design.ppt

    31/66

     

    Opt 10: Compiler Prefetching

    • Example: 19 misses
      7 misses: b[0][0] – b[6][0]
      4 misses: 1/2 of a[0][0] – a[0][6]
      4 misses: a[1][0] – a[1][6]
      4 misses: a[2][0] – a[2][6]

  • 8/9/2019 lec10-memory-design.ppt

    32/66

     

    Outline

    • Ten Advanced Cache Optimizations
    • Memory Technology and Optimizations
    • Virtual Memory and Virtual Machines
    • ARM Cortex-A8 & Intel Core i7

  • 8/9/2019 lec10-memory-design.ppt

    33/66

     

    Main Memory

    • Main memory: I/O interface between caches and servers
    • Dst of input & src of output

    [Figure: memory hierarchy. Processor (control, datapath, registers), on-chip cache, second-level cache (SRAM), main memory (DRAM), secondary storage (disk). Speed: fastest to slowest. Size: smallest to biggest. Cost: highest to lowest. Managed by compiler, hardware, and operating system.]

  • 8/9/2019 lec10-memory-design.ppt

    34/66

     

    Main Memory

    Performance measures
    • Latency
      important for caches
      harder to reduce
    • Bandwidth
      important for multiprocessors, I/O, and caches with large block sizes
      easier to improve with new organizations

  • 8/9/2019 lec10-memory-design.ppt

    35/66

     

    Main Memory

    Performance measures
    • Latency
      access time: the time between when a read is requested and when the desired word arrives
      cycle time: the minimum time between unrelated requests to memory, or the minimum time between the start of one access and the start of the next access

  • 8/9/2019 lec10-memory-design.ppt

    36/66

     

    Main Memory

    • SRAM for cache
    • DRAM for main memory

  • 8/9/2019 lec10-memory-design.ppt

    37/66

     

    SRAM

    • Static Random Access Memory
    • Six transistors per bit to prevent the information from being disturbed when read
    • Don't need to refresh, so access time is very close to cycle time

  • 8/9/2019 lec10-memory-design.ppt

    38/66

     

    DRAM

    • Dynamic Random Access Memory
    • Single transistor per bit
    • Reading destroys the information
    • Refresh periodically
    • cycle time > access time

  • 8/9/2019 lec10-memory-design.ppt

    39/66

     

    DRAM

    • Dynamic Random Access Memory
    • Single transistor per bit
    • Reading destroys the information
    • Refresh periodically
    • cycle time > access time
    • DRAMs are commonly sold on small boards called DIMMs (dual inline memory modules), typically containing 4 to 16 DRAMs

  • 8/9/2019 lec10-memory-design.ppt

    40/66

     

    DRAM Organization

    • RAS: row access strobe
    • CAS: column access strobe

  • 8/9/2019 lec10-memory-design.ppt

    42/66

     

    DRAM Improvement

    • Timing signals
      allow repeated accesses to the row buffer w/o another row access time
    • Leverage spatial locality
      each array will buffer 1024 to 4096 bits for each access

  • 8/9/2019 lec10-memory-design.ppt

    43/66

     

    DRAM Improvement

    • Clock signal
      added to the DRAM interface, so that repeated transfers will not involve overhead to synchronize with memory controller
    • SDRAM: synchronous DRAM

  • 8/9/2019 lec10-memory-design.ppt

    44/66

     

    DRAM Improvement

    • Wider DRAM
      to overcome the problem of getting a wide stream of bits from memory without having to make the memory system too large as memory system density increased
      widening the cache and memory widens memory bandwidth
      e.g., 4-bit transfer mode up to 16-bit buses

  • 8/9/2019 lec10-memory-design.ppt

    45/66

     

    DRAM Improvement

    • DDR: double data rate
      to increase bandwidth, transfer data on both the rising edge and falling edge of the DRAM clock signal, thereby doubling the peak data rate
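The doubling of the peak data rate can be sketched with simple arithmetic: peak bandwidth is the clock rate times transfers per clock times the bus width. The function name, the 133 MHz clock, and the 8-byte bus are hypothetical values chosen for illustration.

```python
def peak_bandwidth_mb_per_s(clock_mhz, bus_bytes, double_data_rate=True):
    """Peak transfer rate in MB/s for a given clock and bus width."""
    # DDR moves data on both the rising and the falling clock edge.
    transfers_per_clock = 2 if double_data_rate else 1
    return clock_mhz * transfers_per_clock * bus_bytes


# Hypothetical module: 133 MHz clock, 8-byte (64-bit) data bus.
print(peak_bandwidth_mb_per_s(133, 8, double_data_rate=False))
print(peak_bandwidth_mb_per_s(133, 8, double_data_rate=True))
```

This is a peak figure only; sustained bandwidth also depends on row activations, refresh, and bank conflicts.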

  • 8/9/2019 lec10-memory-design.ppt

    46/66

     

    DRAM Improvement

    • Multiple Banks
      break a single SDRAM into 2 to 8 blocks
      they can operate independently
    • Provide some of the advantages of interleaving
    • Help with power management

  • 8/9/2019 lec10-memory-design.ppt

    47/66

     

    DRAM Improvement

    • Reducing power consumption in SDRAMs
      dynamic power: used in a read or write
      static/standby power
    • Depend on the operating voltage
    • Power down mode: entered by telling the DRAM to ignore the clock
      disables the SDRAM except for internal automatic refresh

  • 8/9/2019 lec10-memory-design.ppt

    48/66

     

    Flash Memory

    • A type of EEPROM (electrically erasable programmable read-only memory)
    • Read-only but can be erased
    • Hold contents w/o any power

  • 8/9/2019 lec10-memory-design.ppt

    49/66

     

    Flash Memory

    Differences from DRAM
    • Must be erased (in blocks) before it is overwritten
    • Static and less power consumption
    • Has a limited number of write cycles for any block
    • Cheaper than SDRAM but more expensive than disk
    • Slower than SDRAM but faster than disk

  • 8/9/2019 lec10-memory-design.ppt

    50/66

     

    Memory Dependability

    • Soft errors
      changes to a cell's contents, not a change in the circuitry
    • Hard errors
      permanent changes in the operation of one or more memory cells

  • 8/9/2019 lec10-memory-design.ppt

    51/66

     

    Memory Dependability

    Error detection and fix
    • Parity only
      only one bit of overhead to detect a single error in a sequence of bits
      e.g., one parity bit per 8 data bits
    • ECC only
      detect two errors and correct a single error with 8-bit overhead per 64 data bits
    • Chipkill
      handle multiple errors and complete failure of a single memory chip
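The parity scheme can be sketched as follows: one extra bit per 8 data bits detects any single-bit error but misses a double-bit error. Even parity, the sample byte, and the function names are assumptions made for illustration.

```python
def parity_bit(data_bits):
    """Even-parity bit: 1 if the data contains an odd number of ones."""
    return sum(data_bits) % 2


def passes_parity_check(data_bits, stored_parity):
    """True if the data is consistent with the stored parity bit."""
    return parity_bit(data_bits) == stored_parity


data = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical byte
stored = parity_bit(data)

one_flip = data.copy()
one_flip[3] ^= 1                 # single-bit error

two_flips = one_flip.copy()
two_flips[6] ^= 1                # second error in the same byte

print(passes_parity_check(data, stored))       # clean data
print(passes_parity_check(one_flip, stored))   # single error is detected
print(passes_parity_check(two_flips, stored))  # double error goes unnoticed

# Storage overhead of the two schemes listed above.
print(1 / 8, 8 / 64)
```

Both schemes cost the same 12.5% extra storage; ECC gets more out of it by spreading 8 check bits over a wider 64-bit word.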

  • 8/9/2019 lec10-memory-design.ppt

    52/66

     

    Memory Dependability

    Rates of unrecoverable errors in 3 yrs
    • Parity only
      about 90,000, or one unrecoverable (undetected) failure every 17 mins
    • ECC only
      about 3,…

  • 8/9/2019 lec10-memory-design.ppt

    53/66

     

    Outline

    • Ten Advanced Cache Optimizations
    • Memory Technology and Optimizations
    • Virtual Memory and Virtual Machines
    • ARM Cortex-A8 & Intel Core i7

  • 8/9/2019 lec10-memory-design.ppt

    54/66

     

    • VMM: Virtual Machine Monitor
      three essential characteristics:
      1. VMM provides an environment for programs which is essentially identical with the original machine
      2. programs run in this environment show at worst only minor decreases in speed
      3. VMM is in complete control of system resources
    • Mainly for security and privacy
      sharing and protection among multiple processes

  • 8/9/2019 lec10-memory-design.ppt

    55/66

     

    Virtual Memory

    • The architecture must limit what a process can access when running a user process yet allow an OS process to access more
    • Four tasks for the architecture

  • 8/9/2019 lec10-memory-design.ppt

    56/66

  • 8/9/2019 lec10-memory-design.ppt

    57/66

     

    Virtual Machines

    • Virtual Machine
      a protection mode with a much smaller code base than the full OS
    • VMM: virtual machine monitor
      hypervisor
      software that supports VMs
    • Host
      underlying hardware platform

  • 8/9/2019 lec10-memory-design.ppt

    58/66

     

    Virtual Machines

    • Requirements
      1. Guest software should behave on a VM exactly as if it were running on the native hardware
      2. Guest software should not be able to change allocation of real system resources directly

  • 8/9/2019 lec10-memory-design.ppt

    59/66

     

    Outline

    • Ten Advanced Cache Optimizations
    • Memory Technology and Optimizations
    • Virtual Memory and Virtual Machines
    • ARM Cortex-A8 & Intel Core i7

  • 8/9/2019 lec10-memory-design.ppt

    60/66

     

    ARM Cortex-A8

  • 8/9/2019 lec10-memory-design.ppt

    63/66

     

    Intel Core i7

  • 8/9/2019 lec10-memory-design.ppt

    66/66

    ?