ELEC-H-473fuuu.be/polytech/ELECH473/ELECH473_Th01.pdf · 2014. 7. 2. · ELEC-H-473 ... 1,

55
ELEC-H-473 Microprocessor architectures Lecture 01 Dragomir Milojevic [email protected]

Transcript of ELEC-H-473fuuu.be/polytech/ELECH473/ELECH473_Th01.pdf · 2014. 7. 2. · ELEC-H-473 ... 1,

  • ELEC-H-473Microprocessor architectures

    Lecture 01Dragomir Milojevic

    [email protected]

    mailto:[email protected]:[email protected]

  • Université libre de Bruxelles/Faculté des Sciences Appliquées/BEAMS/MILOJEVIC Dragomir

    General information1. AgendaLectures 2 ECTS = 12 sessions, 2h/sessionMonday → from 10.00 to 12.00 (C3.122); CONFLICT 2B solved ! Friday → from 08.00 to 10.00 (H.2213)TPs 3 ECTS Monday → from 14.00 to 18.00 (Solbosch, building U UA5.217)Friday → cancelled (moved to Monday)2. ELEC-H-473 Internet resources

    http://beams.ulb.ac.be/beams/login: etudiants, password: SquareG! (it is case sensitive)Attention: if you do not login you will not even see the notes.

    2

  • Université libre de Bruxelles/Faculté des Sciences Appliquées/BEAMS/MILOJEVIC Dragomir

    General information3. Conflict dates • 3, 7 and 21 March I am travelling (we will organise this)4. Practical work• Presence is mandatory !• Mini-projects to be implemented; each project to be presented (you

    have to show the working demo); Q&A are part of the evaluation • Practical work account for 45% of the final mark5. Examen• Is oral• Most of the questions are theoretical but some of the questions could

    be closely related to the practical work• You are expected not only to show the lecture content (copy slides),

    but be able to reason on the matter

    3

  • Université libre de Bruxelles/Faculté des Sciences Appliquées/BEAMS/MILOJEVIC Dragomir

    1. Tale on computing machines

    2. IC manufacturing technology perspective

    3. Computing systems performance

    4. Example of poor usage : data centers

    5. What could happen in the future ?

    4

    Today

  • The lecture starts with a tale on computing machines ...

    … how they are made and

    … and how to push their limits ...

    5

    Do u know from where does this

    comes from?

    So once upon a time ...

  • Université libre de Bruxelles/Faculté des Sciences Appliquées/BEAMS/MILOJEVIC Dragomir

    … there was a mathematician

    David Hilbert, 1900

    Journal of Signal Processing Systems manuscript No.(will be inserted by the editor)

    Roberto Airoldi · Tapani Ahonen · Fabio Garzia · Dragomir Milojevic ·Jari Nurmi

    Implementation of W-CDMA Cell Search on a HighlyParallel and Scalable MPSoC

    the date of receipt and acceptance should be inserted later

    Abstract The performance of the W-CDMA cell searchalgorithm can be significantly improved using homoge-neous general purpose Multi-Processor System-on-Chip(MPSoC) architectures. The application also scales well,as the number of processing nodes increases, allowingpractical accelerations to become close to the theoreti-cal maximum. In this work we describe a template MP-SoC architecture based on multiprocessor computationalclusters, called Ninesilica. Each Ninesilica consist of nineprocessing nodes based on COFFEE RISC architecture.MPSoC inter- and intra-cluster communication are en-abled using hierarchical Network-on-Chip with dedicatedpoint to point and broadcast communication services forbetter performance. Proposed template has been used toinstantiate complete systems with one and four Ninesil-ica clusters, resulting in MPSoCs with respectively 9 and36 computational nodes. The MPSoCs have been physi-cally prototyped on a FPGA device, and the W-CDMAcell search algorithm has been mapped on both MPSoCplatforms. The four Ninesilica MPSoC can execute W-CDMA in 20.5ms (at 115MHz, slow mode implementa-tion) with the total speed-up of 24.3X and 3.3X whencompared to a single processing core system and to asingle Ninesilica cluster respectively.

    Keywords Multi-Processor System-on-Chip · Network-on-Chip · W-CDMA · Software Defined Radio

    E = mc2 (1)

    Authors would like to thank Stephen Burgess for his invalu-able help.

    Roberto Airoldi, Tapani Ahonen, Fabio Garzia, Jari NurmiDepartment of Computer Systems, Tampere University ofTechnology, Korkeakoulunkatu 1, PO box 553, FI 33101,Tampere, FinlandE-mail: [email protected]

    Dragomir MilojevicUniversité Libre de Bruxelles - ULB, Faculty of AppliedSciences, Bio, Electro and Mechanical Systems, ULB CP-165/56, Av. F. Roosevelt 50, B-1050 Bruxelles, Belgium.E-mail: [email protected]

    1 Introduction

    The evolution of mobile devices over the past years andthe vide variety of wireless connectivity protocols re-sulted in the development of the Software Defined Radio(SDR). This technology will replace traditional wirelessreceivers, built to support only one radio protocol sincethe RF components are most of the time designed ac-cording to only one transmission frequency. Thus, thebaseband processor supports only the algorithms usedfor a specific standard. In an SDR device, the same hard-ware is able to receive signals from di�erent frequencybands and decode them using di�erent protocols. There-fore, the same receiver can synchronise with di�erentwireless networks (cellular networks, Wi-Fi, WiMAX,etc...). Unfortunately, practical design of such devices isquite complex. The RF front-end, for example, shouldbe able to receive the signals using very broad frequencyspectrum. For baseband processing, di�erent standardsrequire implementation of di�erent algorithms, each withits own timing constraints. While flexibility and low powerare unavoidable requirements for such systems, the per-formance is crucial, since SDR systems need to complywith extremely tight timing constraints.

    The simplest approach for a multi-standard basebandprocessor is to use a DSP and define the receiving chainin software. However, a traditional DSP is not powerfulenough for the current applications and its power con-sumption can be too high for use in mobile devices [12].Therefore, ad-hoc hardware blocks are needed for theexecution of specific kernels.

    Some baseband solutions present the coupling of ageneral-purpose core with dedicated co-processors ableto handle SDR kernels (for example Expresso platformin [1]). In particular, the application-specific acceleratorssupported the Wideband Code Division Multiple Access(W-CDMA [2]) and Orthogonal Frequency Division Mul-tiplexing (OFDM [3]).

    A possible alternative is to use a general-purpose co-processor. BUTTER, a coarse-grain reconfigurable array,

    Jour

    nal o

    f Signa

    l Pro

    cessingSy

    stem

    s man

    uscript N

    o.

    (will

    beins

    erted

    byth

    e edit

    or)

    Robe

    rtoAi

    roldi· T

    apan

    i Aho

    nen

    · Fab

    ioGarzia

    · Drago

    mir

    Milo

    jevic

    ·

    Jari

    Nur

    mi

    Implem

    entation

    ofW

    -CDM

    ACellSe

    arch

    onaHighly

    Paralle

    l and

    Scalab

    leM

    PSo

    C

    theda

    teof

    receiptan

    dac

    cept

    ance

    shou

    ldbe

    inserted

    later

    Abstract

    The p

    erform

    ance

    ofth

    e W-C

    DMA

    cell s

    earch

    algori

    thm

    can

    besig

    nifica

    ntly

    impr

    oved

    using

    homo

    ge-

    neou

    s gen

    eral p

    urpo

    seMult

    i-Proces

    sor S

    ystem

    -on-C

    hip

    (MPS

    oC) a

    rchite

    ctures

    . The

    appli

    catio

    n also

    scales

    well,

    asth

    enu

    mber

    ofpr

    ocess

    ingno

    des i

    ncrea

    ses, a

    llowi

    ng

    prac

    tical

    accel

    eratio

    nsto

    beco

    meclo

    seto

    the t

    heore

    ti-

    cal m

    axim

    um. I

    n this

    work

    wede

    scribe

    a tem

    plate

    MP-

    SoC

    archit

    ectur

    e based

    onmu

    ltipr

    ocess

    orco

    mput

    ation

    al

    cluste

    rs,ca

    lled N

    inesil

    ica. E

    ach N

    inesil

    icaco

    nsist

    ofnin

    e

    proc

    essing

    node

    s based

    onCO

    FFEE

    RISC

    archit

    ectur

    e.

    MPS

    oCint

    er-an

    dint

    ra-clu

    ster c

    ommu

    nicati

    onare

    en-

    abled

    using

    hierar

    chica

    l Netw

    ork-on

    -Chip

    with

    dedic

    ated

    point

    topo

    intan

    d broa

    dcast c

    ommu

    nicati

    onser

    vices

    for

    bette

    r perf

    orman

    ce.Pr

    oposed

    templa

    teha

    s been

    used

    to

    instan

    tiate

    comp

    lete s

    ystem

    s with

    one a

    ndfou

    r Nine

    sil-

    icaclu

    sters,

    result

    ingin

    MPS

    oCs w

    ithres

    pecti

    vely

    9 and

    36co

    mput

    ation

    alno

    des.

    The M

    PSoC

    s hav

    e been

    physi-

    cally

    proto

    typed

    ona F

    PGA

    devic

    e,an

    d the

    W-C

    DMA

    cell s

    earch

    algori

    thm

    has b

    eenma

    pped

    onbo

    thMPS

    oC

    platfo

    rms.

    The f

    our N

    inesil

    icaMPS

    oCca

    nex

    ecute

    W-

    CDMA

    in20

    .5ms(at

    115M

    Hz,

    slow

    mode

    imple

    menta-

    tion)

    with

    the t

    otal s

    peed

    -upof

    24.3X

    and

    3.3X

    when

    comp

    ared

    toa

    single

    proc

    essing

    core

    syste

    man

    dto

    a

    single

    Nine

    silica

    cluste

    r resp

    ective

    ly.

    Key

    word

    sMult

    i-Proces

    sor S

    ystem

    -on-C

    hip· N

    etwork

    -

    on-C

    hip· W

    -CDM

    A· S

    oftwa

    reDe

    fined

    Radio

    E=

    mc

    2

    (1)

    Aut

    hors

    wouldlik

    eto

    than

    kSt

    ephe

    nBu

    rgessforhisinva

    lu-

    able

    help.

    Rob

    erto

    Airo

    ldi,Ta

    pani

    Aho

    nen,

    FabioGarzia,

    Jari

    Nur

    mi

    Dep

    artm

    entof

    Com

    puterSy

    stem

    s,Ta

    mpe

    reUnive

    rsity

    of

    Tech

    nology

    ,Korke

    akou

    lunk

    atu

    1,PO

    box

    553,

    FI33

    101,

    Tampe

    re, F

    inland

    E-mail:fir

    stna

    me.lastna

    me@

    tut.fi

    Drago

    mir

    Milo

    jevic

    Unive

    rsité

    Libr

    ede

    Brux

    elles

    -ULB

    ,Fa

    culty

    ofApp

    lied

    Scienc

    es,Bi

    o,El

    ectro

    and

    Mecha

    nica

    lSy

    stem

    s,ULB

    CP-

    165/

    56, A

    v.F.

    Roo

    seve

    lt50

    , B-105

    0Br

    uxelles,

    Belgium.

    E-mail:Drago

    mir.

    Milo

    jevic@

    ulb.ac

    .be

    1Introd

    uctio

    n

    The e

    volut

    ionof

    mobil

    e dev

    ices o

    ver t

    hepa

    stye

    arsan

    d

    the

    vide

    varie

    tyof

    wirel

    essco

    nnect

    ivity

    proto

    cols

    re-

    sulte

    d in t

    hede

    velop

    ment

    ofth

    e Soft

    ware

    Defin

    edRa

    dio

    (SDR

    ).Th

    istec

    hnolo

    gywi

    llrep

    lace t

    raditi

    onal

    wirel

    ess

    receiv

    ers, b

    uilt t

    o sup

    port

    only

    one r

    adio

    proto

    col s

    ince

    the R

    Fco

    mpon

    ents

    aremo

    stof

    the t

    ime d

    esign

    edac

    -

    cord

    ingto

    only

    onetra

    nsmi

    ssion

    frequ

    ency.

    Thus

    , the

    baseb

    and

    proc

    essor

    supp

    orts o

    nlyth

    ealg

    orith

    msus

    ed

    fora s

    pecifi

    c stan

    dard

    . In a

    n SDR

    devic

    e,th

    e sam

    e hard

    -

    ware

    isab

    leto

    receiv

    e sign

    alsfro

    mdi�

    erent

    frequ

    ency

    band

    s and

    deco

    deth

    emus

    ingdi�

    erent

    proto

    cols.

    There

    -

    fore,

    the

    same

    receiv

    erca

    nsync

    hron

    isewi

    thdi�

    erent

    wirel

    essne

    twork

    s(ce

    llular

    netw

    orks,

    Wi-F

    i,W

    iMAX

    ,

    etc...)

    . Unfo

    rtuna

    tely,

    prac

    tical

    desig

    n of s

    uch d

    evice

    s is

    quite

    comp

    lex. T

    heRF

    front-en

    d,for

    exam

    ple, s

    hould

    beab

    leto

    receiv

    e the

    signa

    lsus

    ingve

    rybr

    oad f

    reque

    ncy

    spect

    rum.

    For b

    aseb

    and

    proc

    essing

    , di�e

    rent s

    tanda

    rds

    requir

    e imp

    lemen

    tation

    ofdi�

    erent

    algori

    thms

    , each w

    ith

    itsow

    n tim

    ingco

    nstra

    ints.W

    hile fl

    exibi

    lity a

    ndlow

    powe

    r

    areun

    avoid

    able

    requir

    emen

    tsfor

    such

    syste

    ms, t

    hepe

    r-

    forma

    nce i

    s cru

    cial,

    since

    SDR

    syste

    msne

    edto

    comp

    ly

    with

    extre

    mely

    tight

    timing

    cons

    traint

    s.

    The s

    imple

    stap

    proa

    chfor

    a mult

    i-stan

    dard

    baseb

    and

    proc

    essor

    isto

    use a

    DSP

    and d

    efine

    the r

    eceivi

    ngch

    ain

    insoftw

    are. H

    owev

    er,a t

    raditi

    onal

    DSP

    isno

    t pow

    erful

    enou

    ghfor

    the c

    urren

    t app

    licati

    ons a

    ndits

    powe

    r con

    -

    sump

    tion c

    anbe

    too h

    ighfor

    use i

    n mob

    ilede

    vices

    [12].

    There

    fore,

    ad-ho

    cha

    rdwa

    reblo

    cks a

    rene

    eded

    forth

    e

    execu

    tion o

    f spe

    cific k

    ernels

    .

    Some

    baseb

    and

    solut

    ions p

    resen

    t the

    coup

    ling

    ofa

    gene

    ral-pu

    rpose c

    orewi

    thde

    dicate

    dco

    -proc

    essors

    able

    toha

    ndle

    SDR

    kern

    els(fo

    r exa

    mple

    Expr

    essopla

    tform

    in[1]

    ).In

    parti

    cular

    , the

    appli

    catio

    n-spe

    cific a

    cceler

    ators

    supp

    orted

    the W

    ideband

    Code

    Divis

    ionMult

    iple A

    ccess

    (W-C

    DMA

    [2]) a

    ndOr

    thogona

    l Freq

    uenc

    y Divi

    sion M

    ul-

    tiplex

    ing(O

    FDM

    [3]).

    Apo

    ssible

    altern

    ative

    isto

    use a

    gene

    ral-pu

    rpose c

    o-

    proc

    essor.

    BUTT

    ER, a

    coars

    e-grai

    n reco

    nfigu

    rable

    array,

    Journ

    alofSig

    nalPro

    cess

    ing

    Syst

    em

    sm

    anusc

    riptN

    o.

    (willbe

    inse

    rted

    by

    the

    editor

    )

    Robert

    oAirold

    TapaniAhonen

    ·Fabio

    Garz

    ia·

    Dra

    gom

    irM

    ilojevic

    ·

    Jari

    Nurm

    i

    Implem

    entatio

    nofW

    -CDM

    ACell

    Search

    on

    aH

    ighly

    Paralleland

    Scalable

    MPSoC

    the

    date

    ofre

    ceip

    tand

    accepta

    nce

    should

    be

    inse

    rted

    late

    r

    Abst

    ract

    Theper

    form

    ance

    ofth

    eW

    -CDM

    Ace

    llse

    arch

    algo

    rith

    mca

    nbe

    sign

    ifica

    ntly

    impro

    ved

    using

    hom

    oge-

    neo

    us

    gener

    alpurp

    ose

    Multi-Pro

    cess

    orSyst

    em-o

    n-C

    hip

    (MPSoC

    )ar

    chitec

    ture

    s.Theap

    plica

    tion

    also

    scal

    eswell,

    asth

    enum

    ber

    ofpro

    cess

    ing

    nodes

    incr

    ease

    s,al

    lowin

    g

    pra

    ctical

    acce

    lera

    tion

    sto

    bec

    ome

    clos

    eto

    the

    theo

    reti-

    calm

    axim

    um

    .In

    this

    wor

    kwe

    des

    crib

    ea

    tem

    pla

    teM

    P-

    SoC

    arch

    itec

    ture

    bas

    edon

    multip

    roce

    ssor

    com

    puta

    tion

    al

    clust

    ers,

    called

    Nin

    esilica.

    Eac

    hNin

    esilica

    consist

    ofnin

    e

    pro

    cess

    ing

    nodes

    bas

    edon

    CO

    FFEE

    RIS

    Car

    chitec

    ture

    .

    MPSoC

    inte

    r-an

    din

    tra-

    clust

    erco

    mm

    unicat

    ion

    are

    en-

    abled

    usinghiera

    rchical

    Net

    wor

    k-o

    n-C

    hip

    with

    ded

    icat

    ed

    poi

    ntto

    poi

    ntan

    dbro

    adca

    stco

    mm

    unicat

    ion

    serv

    ices

    for

    bet

    terper

    form

    ance

    .Pro

    pos

    edte

    mpla

    tehas

    bee

    nuse

    dto

    inst

    antiat

    eco

    mplete

    syst

    emswith

    one

    and

    fourNin

    esil-

    icaclust

    ers,

    resu

    ltin

    gin

    MPSoC

    swith

    resp

    ective

    ly9an

    d

    36co

    mputa

    tion

    alnodes

    .The

    MPSoC

    shav

    ebee

    nphysi-

    cally

    pro

    toty

    ped

    ona

    FPG

    Adev

    ice,

    and

    the

    W-C

    DM

    A

    cell

    sear

    chal

    gorith

    mhas

    bee

    nm

    apped

    onbot

    hM

    PSoC

    pla

    tfor

    ms.

    The

    four

    Nin

    esilica

    MPSoC

    can

    exec

    ute

    W-

    CDM

    Ain

    20.5

    ms

    (at11

    5MH

    z,slow

    modeim

    plem

    enta

    -

    tion

    )with

    the

    tota

    lsp

    eed-u

    pof

    24.3

    Xan

    d3.

    3Xwhen

    com

    par

    edto

    asingl

    epro

    cess

    ing

    core

    syst

    eman

    dto

    a

    singl

    eNin

    esilica

    clust

    erre

    spec

    tive

    ly.

    Keyword

    sM

    ulti-Pro

    cess

    orSyst

    em-o

    n-C

    hip

    · Net

    wor

    k-

    on-C

    hip

    ·W

    -CDM

    A·Sof

    twar

    eDefi

    ned

    Rad

    io

    E=

    mc2

    (1)

    Auth

    ors

    would

    like

    toth

    ank

    Ste

    phen

    Burg

    ess

    forhis

    invalu

    -

    able

    help

    .

    Robert

    oAirold

    i,TapaniAhonen,Fabio

    Garz

    ia,Jari

    Nurm

    i

    Depart

    ment

    of

    Com

    pute

    rSyst

    em

    s,Tam

    pere

    Univ

    ers

    ity

    of

    Technolo

    gy,

    Kork

    eakoulu

    nkatu

    1,

    PO

    box

    553,

    FI

    33101,

    Tam

    pere

    ,Fin

    land

    E-m

    ail:firs

    tnam

    e.last

    nam

    e@

    tut.fi

    Dra

    gom

    irM

    ilojevic

    Univ

    ers

    ité

    Lib

    rede

    Bru

    xelles

    -ULB,

    Faculty

    of

    Applied

    Sciences,

    Bio

    ,Electr

    oand

    Mechanical

    Syst

    em

    s,ULB

    CP-

    165/56,Av.F.Roose

    velt

    50,B-1

    050

    Bru

    xelles,

    Belg

    ium

    .

    E-m

    ail:Dra

    gom

    ir.M

    ilojevic@

    ulb

    .ac.b

    e

    1In

    troduction

    The

    evol

    ution

    ofm

    obile

    dev

    ices

    over

    the

    pas

    tye

    arsan

    d

    the

    vid

    eva

    riet

    yof

    wireles

    sco

    nnec

    tivity

    pro

    toco

    lsre

    -

    sulted

    inth

    edev

    elop

    men

    tof

    theSof

    twar

    eDefi

    ned

    Rad

    io

    (SDR).

    This

    tech

    nol

    ogy

    willre

    pla

    cetr

    aditio

    nal

    wireles

    s

    rece

    iver

    s,builtto

    suppor

    ton

    lyon

    era

    dio

    pro

    toco

    lsince

    the

    RF

    com

    pon

    ents

    are

    mos

    tof

    the

    tim

    edes

    igned

    ac-

    cord

    ing

    toon

    lyon

    etr

    ansm

    ission

    freq

    uen

    cy.Thus,

    the

    bas

    eban

    dpro

    cess

    orsu

    ppor

    tson

    lyth

    eal

    gorith

    ms

    use

    d

    forasp

    ecifi

    cst

    andar

    d.In

    anSDR

    dev

    ice,

    thesa

    mehar

    d-

    war

    eis

    able

    tore

    ceiv

    esign

    als

    from

    di�

    eren

    tfreq

    uen

    cy

    ban

    dsan

    ddec

    odeth

    emusing

    di�

    eren

    tpro

    toco

    ls.Ther

    e-

    fore

    ,th

    esa

    me

    rece

    iver

    can

    synch

    ronise

    with

    di�

    eren

    t

    wireles

    snet

    wor

    ks

    (cellu

    lar

    net

    wor

    ks,

    Wi-Fi,

    WiM

    AX,

    etc...).

    Unfo

    rtunat

    ely,

    pra

    ctical

    des

    ign

    ofsu

    chdev

    ices

    is

    quite

    com

    plex.The

    RF

    fron

    t-en

    d,fo

    rex

    ample,sh

    ould

    beab

    leto

    rece

    iveth

    esign

    alsusing

    very

    bro

    adfreq

    uen

    cy

    spec

    trum

    .For

    bas

    eban

    dpro

    cess

    ing,

    di�

    eren

    tst

    andar

    ds

    requireim

    plem

    enta

    tion

    ofdi�

    eren

    tal

    gorith

    ms,

    each

    with

    itsow

    ntim

    ingco

    nst

    rain

    ts. W

    hileflex

    ibilityan

    dlow

    pow

    er

    are

    unav

    oidab

    lere

    quirem

    ents

    forsu

    chsy

    stem

    s,th

    eper

    -

    form

    ance

    iscr

    ucial

    ,since

    SDR

    syst

    ems

    nee

    dto

    com

    ply

    with

    extr

    emely

    tigh

    ttim

    ing

    const

    rain

    ts.

    Thesim

    plest

    appro

    ach

    foram

    ulti-st

    andar

    dbas

    eban

    d

    pro

    cess

    oris

    touse

    aDSP

    and

    defi

    neth

    ere

    ceiv

    ing

    chai

    n

    inso

    ftwar

    e.How

    ever

    ,a

    trad

    itio

    nal

    DSP

    isnot

    pow

    erfu

    l

    enou

    ghfo

    rth

    ecu

    rren

    tap

    plica

    tion

    san

    dits

    pow

    erco

    n-

    sum

    ption

    can

    be

    too

    hig

    hfo

    ruse

    inm

    obile

    dev

    ices

    [12]

    .

    Ther

    efor

    e,ad

    -hoc

    har

    dwar

    eblo

    cks

    are

    nee

    ded

    for

    the

    exec

    ution

    ofsp

    ecifi

    cke

    rnels.

    Som

    ebas

    eban

    dso

    lution

    spre

    sent

    the

    coupling

    ofa

    gener

    al-p

    urp

    ose

    core

    with

    ded

    icat

    edco

    -pro

    cess

    ors

    able

    tohan

    dle

    SDR

    kern

    els

    (for

    exam

    ple

    Expre

    sso

    pla

    tfor

    m

    in[1

    ]).In

    par

    ticu

    lar,

    theap

    plica

    tion

    -spec

    ificac

    celera

    tors

    suppor

    ted

    theW

    ideb

    and

    Cod

    eDivision

    Multiple

    Acces

    s

    (W-C

    DM

    A[2

    ])an

    dOrthog

    onalFrequ

    ency

    Division

    Mul-

    tiplexing

    (OFDM

    [3]).

    Apos

    sible

    alte

    rnat

    iveis

    touse

    age

    ner

    al-p

    urp

    oseco

    -

    pro

    cess

    or.BUTTER,aco

    arse

    -gra

    inre

    configu

    rable

    arra

    y,

    Journal of Signal Processing Systems manuscript No.

    (will be inserted by the editor)

    Roberto Airoldi · Tapani Ahonen · Fabio Garzia · Dragomir Milojevic ·

    Jari NurmiIm

    plementation

    of W-CDM

    ACell Search

    ona

    Highly

    Parallel andScalable M

    PSoC

    the date of receipt and acceptance should be inserted later

    Abstract The performance of the W-CDMA cell search

    algorithmcan be significantly improved using homoge-

    neous general purpose Multi-Processor System-on-Chip

    (MPSoC) architectures. The application also scales well,

    as the number of processing nodes increases, allowing

    practical accelerations to become close to the theoreti-

    cal maximum. In this work we describe a template MP-

    SoC architecture based on multiprocessor computational

    clusters, called Ninesilica. Each Ninesilica consist of nine

    processing nodes based on COFFEE RISC architecture.

    MPSoCinter- and intra-cluster communication are en-

    abled using hierarchical Network-on-Chip with dedicated

    point to point and broadcast communication services for

    better performance. Proposed template has been used to

    instantiate complete systems with one and four Ninesil-

    ica clusters, resulting in MPSoCs with respectively 9 and

    36 computational nodes. The MPSoCs have been physi-

    cally prototyped on a FPGA device, and the W-CDMA

    cell search algorithmhas been mapped on both MPSoC

    platforms. The four Ninesilica MPSoCcan execute W-

    CDMA in 20.5ms (at 115MHz, slow mode implementa-

    tion) with the total speed-up of 24.3Xand 3.3X

    when

    compared to a single processing core systemand to a

    single Ninesilica cluster respectively.

    Keywords Multi-Processor System-on-Chip · Network-

    on-Chip · W-CDMA · Software Defined Radio

    E =mc 2

    (1)

    Authors wouldlike to thank Stephen

    Burgess for his invalu-

    able help.Roberto Airoldi, Tapani Ahonen, Fabio Garzia, Jari Nurmi

    Departmentof Computer

    Systems, TampereUniversity

    of

    Technology, Korkeakoulunkatu1, PO

    box553, FI

    33101,

    Tampere, Finland

    E-mail: [email protected]

    Dragomir Milojevic

    UniversitéLibre

    deBruxelles

    -ULB, Faculty

    of Applied

    Sciences, Bio, Electroand

    Mechanical Systems, ULB

    CP-

    165/56, Av. F. Roosevelt 50, B-1050 Bruxelles, Belgium.

    E-mail: [email protected]

    1 Introduction

    The evolution of mobile devices over the past years and

    the vide variety of wireless connectivity protocols re-

    sulted in the development of the Software Defined Radio

    (SDR). This technology will replace traditional wireless

    receivers, built to support only one radio protocol since

    the RF components are most of the time designed ac-

    cording to only one transmission frequency. Thus, the

    baseband processor supports only the algorithms used

    for a specific standard. In an SDR device, the same hard-

    ware is able to receive signals fromdi�erent frequency

    bands and decode themusing di�erent protocols. There-

    fore, the same receiver can synchronise with di�erent

    wireless networks (cellular networks, Wi-Fi, WiMAX,

    etc...). Unfortunately, practical design of such devices is

    quite complex. The RF front-end, for example, should

    be able to receive the signals using very broad frequency

    spectrum. For baseband processing, di�erent standards

    require implementation of di�erent algorithms, each with

    its own timing constraints. While flexibility and low power

    are unavoidable requirements for such systems, the per-

    formance is crucial, since SDRsystems need to comply

    with extremely tight timing constraints.

    The simplest approach for a multi-standard baseband

    processor is to use a DSP and define the receiving chain

    in software. However, a traditional DSP is not powerful

    enough for the current applications and its power con-

    sumption can be too high for use in mobile devices [12].

    Therefore, ad-hoc hardware blocks are needed for the

    execution of specific kernels.

    Some baseband solutions present the coupling of a

    general-purpose core with dedicated co-processors able

    to handle SDRkernels (for example Expresso platform

    in [1]). In particular, the application-specific accelerators

    supported the Wideband Code Division Multiple Access

    (W-CDMA [2]) and Orthogonal Frequency Division Mul-

    tiplexing (OFDM[3]).

    A possible alternative is to use a general-purpose co-

    processor. BUTTER, a coarse-grain reconfigurable array,

    Journ

    alof

    Signa

    l Proc

    essing

    Syste

    msma

    nuscr

    iptNo

    .

    (will b

    e inser

    tedby

    theedi

    tor)

    Robe

    rtoAi

    roldi

    · Tap

    ani A

    hone

    n· F

    abio

    Garzi

    a ·Dr

    agomi

    r Milo

    jevic

    ·

    Jari N

    urmi

    Imple

    ment

    ation

    ofW

    -CDM

    ACe

    llSe

    arch

    ona

    High

    ly

    Para

    llel a

    ndSc

    alable

    MPS

    oC

    theda

    teof

    receip

    t and

    accep

    tance

    shou

    ldbe

    insert

    edlat

    er

    Abstr

    actTh

    e perfo

    rmanc

    e of th

    e W-CD

    MAcell

    search

    algorit

    hmcan

    besign

    ificant

    lyimp

    roved

    using

    homoge

    -

    neous

    genera

    l purp

    oseMu

    lti-Pro

    cessor

    System

    -on-Ch

    ip

    (MPS

    oC) a

    rchitec

    tures.

    The a

    pplica

    tionalso

    scales

    well,

    asthe

    number

    ofpro

    cessin

    g node

    s incre

    ases, a

    llowing

    practic

    alacc

    eleratio

    nsto

    becom

    e close

    tothe

    theore

    ti-

    calma

    ximum

    . Inthi

    s work

    wedes

    cribe a

    templa

    te MP-

    SoCarc

    hitect

    urebas

    edon

    multip

    rocess

    or com

    putatio

    nal

    cluste

    rs,cal

    ledNin

    esilica

    . Each

    Ninesil

    icacon

    sistof n

    ine

    proces

    sing n

    odes b

    ased o

    n COF

    FEE R

    ISCarc

    hitect

    ure.

    MPSoC

    inter-

    andint

    ra-clu

    ster c

    ommu

    nicatio

    n are

    en-

    abled

    using

    hierar

    chical

    Netwo

    rk-on-

    Chip w

    ithded

    icated

    point

    to poin

    t and b

    roadca

    st com

    munic

    ation s

    ervices

    for

    better

    perfor

    mance

    . Prop

    osed t

    empla

    te has b

    eenuse

    d to

    instan

    tiate c

    omple

    tesys

    tems w

    ithone

    andfou

    r Nine

    sil-

    icaclu

    sters,

    result

    ingin M

    PSoC

    s with

    respec

    tively 9

    and

    36com

    putatio

    nalnod

    es.Th

    e MPS

    oCs h

    avebee

    n phys

    i-

    cally p

    rototy

    pedon

    a FPG

    A devi

    ce,and

    theW-

    CDMA

    cellsea

    rchalg

    orithm

    hasbee

    n mapp

    edon

    both M

    PSoC

    platfo

    rms. T

    hefou

    r Nine

    silica

    MPSoC

    canexe

    cute W

    -

    CDMA

    in 20.5

    ms(at

    115M

    Hz, sl

    owmo

    deimp

    lement

    a-

    tion) w

    iththe

    total s

    peed-u

    p of 2

    4.3X

    and3.3

    Xwh

    en

    compar

    edto

    a sing

    lepro

    cessin

    g core

    system

    andto

    a

    single

    Ninesil

    icaclu

    ster re

    spectiv

    ely.

    Keyw

    ords M

    ulti-P

    rocess

    or Syst

    em-on

    -Chip ·

    Netwo

    rk-

    on-Ch

    ip ·W-

    CDMA

    · Softw

    areDe

    fined R

    adio

    E= m

    c2

    (1)

    Autho

    rswo

    uldlik

    e to t

    hank

    Steph

    enBu

    rgess

    forhis

    invalu

    -

    able

    help.

    Robe

    rtoAi

    roldi,

    Tapa

    niAh

    onen

    , Fab

    ioGa

    rzia,

    Jari

    Nurm

    i

    Depa

    rtmen

    t of C

    ompu

    terSy

    stems

    , Tam

    pere

    Unive

    rsity

    of

    Tech

    nolog

    y,Ko

    rkeak

    oulun

    katu

    1,PO

    box

    553,

    FI33

    101,

    Tamp

    ere, F

    inlan

    d

    E-ma

    il:firs

    tname

    .lastn

    ame@

    tut.fi

    Drag

    omir

    Miloj

    evic

    Unive

    rsité

    Libre

    deBr

    uxell

    es- U

    LB, F

    aculty

    ofAp

    plied

    Scien

    ces, B

    io,El

    ectro

    and

    Mech

    anica

    l Syst

    ems,

    ULB

    CP-

    165/

    56, A

    v.F.

    Roose

    velt 5

    0,B-

    1050

    Brux

    elles,

    Belgi

    um.

    E-ma

    il:Dr

    agom

    ir.Milo

    jevic@

    ulb.ac

    .be

    1 Intr

    oduct

    ion

    The e

    voluti

    onof m

    obile d

    evices

    over th

    e past

    years a

    nd

    thevid

    e varie

    tyof

    wireles

    s conn

    ectivit

    y prot

    ocols

    re-

    sulted

    in the

    develo

    pment

    of the

    Softwa

    re Defin

    edRa

    dio

    (SDR).

    This t

    echnol

    ogywil

    l repla

    cetra

    dition

    al wirel

    ess

    receiv

    ers, b

    uiltto

    suppor

    t only

    onerad

    io prot

    ocol si

    nce

    theRF

    compon

    ents a

    remo

    stof

    thetim

    e desig

    nedac-

    cordin

    g to o

    nlyone

    transm

    ission

    frequen

    cy.Th

    us,the

    baseba

    ndpro

    cessor

    suppor

    tsonl

    y the

    algorit

    hms u

    sed

    fora sp

    ecific st

    andard

    . Inan

    SDR d

    evice,

    thesam

    e hard

    -

    ware

    is able

    torec

    eive s

    ignals

    from

    di�ere

    ntfreq

    uency

    bands

    anddec

    odethe

    m usin

    g di�e

    rent p

    rotoco

    ls.Th

    ere-

    fore,

    thesam

    e rece

    iver c

    ansyn

    chroni

    sewit

    h di�e

    rent

    wireles

    s netw

    orks (

    cellula

    r netw

    orks,

    Wi-Fi

    , WiM

    AX,

    etc...)

    . Unfo

    rtunat

    ely,pra

    ctical d

    esign o

    f such

    devices

    is

    quite

    comple

    x.Th

    e RF f

    ront-en

    d,for

    examp

    le,sho

    uld

    beabl

    e torec

    eive th

    e signa

    ls usin

    g very

    broad

    frequen

    cy

    spectr

    um. F

    orbas

    eband

    proces

    sing, d

    i�eren

    t stan

    dards

    requir

    e imple

    menta

    tionof d

    i�eren

    t algor

    ithms

    , each

    with

    itsow

    n timin

    g const

    raints.

    While

    flexibil

    ityand

    lowpow

    er

    areuna

    voidab

    le requ

    irement

    s for su

    chsys

    tems, t

    heper

    -

    forma

    nceis c

    rucial,

    since

    SDR s

    ystem

    s need

    tocom

    ply

    with e

    xtrem

    elytigh

    t timin

    g cons

    traint

    s.

    The si

    mplest

    approa

    chfor

    a mult

    i-stand

    ardbas

    eband

    proces

    soris t

    o use a

    DSP a

    nddefi

    nethe

    receiv

    ingcha

    in

    insof

    tware.

    Howev

    er,a t

    raditio

    nalDS

    P is n

    otpow

    erful

    enough

    forthe

    curren

    t appl

    ication

    s and

    itspow

    ercon

    -

    sumpti

    oncan

    betoo

    high f

    oruse

    in mobi

    le devi

    ces[12

    ].

    There

    fore,

    ad-hoc

    hardw

    areblo

    cksare

    needed

    forthe

    execut

    ionof s

    pecific

    kernel

    s.

    Some b

    aseban

    d solu

    tions p

    resent

    thecou

    pling o

    f a

    genera

    l-purpo

    secor

    e with

    dedica

    tedco-

    proces

    sors a

    ble

    tohan

    dleSD

    R kern

    els(fo

    r exam

    pleEx

    presso

    platfo

    rm

    in [1]).

    Inpar

    ticular,

    theapp

    lication

    -specifi

    c acce

    lerator

    s

    suppor

    tedthe

    Wideb

    andCo

    deDiv

    ision M

    ultiple

    Access

    (W-CD

    MA[2])

    andOr

    thogon

    al Freq

    uency

    Divisio

    n Mul-

    tiplexi

    ng(OF

    DM[3])

    .

    A poss

    iblealte

    rnative

    is to u

    se agen

    eral-pu

    rpose c

    o-

    proces

    sor. B

    UTTE

    R,a c

    oarse-g

    rainrec

    onfigur

    able a

    rray,

    Journal

    of Signa

    l Proces

    singSyst

    emsman

    uscript

    No.

    (willbe in

    serted by

    the editor

    )

    Roberto

    Airoldi

    · Tapan

    i Ahonen

    · Fabio

    Garzia

    · Dragom

    ir Miloje

    vic·

    JariNur

    mi

    Implem

    entatio

    n of W

    -CDMA

    Cell Se

    archon

    a High

    ly

    Paralle

    l and S

    calable

    MPSoC

    thedate

    of recei

    pt and a

    cceptanc

    e should

    be inser

    tedlate

    r

    Abstrac

    t The per

    formance

    of the W-

    CDMA ce

    ll search

    algorithm

    canbe s

    ignificant

    ly improv

    ed using

    homoge-

    neous gen

    eralpurp

    ose Multi-

    Processor

    System-o

    n-Chip

    (MPSoC)

    architect

    ures.The

    applicati

    on also s

    caleswell,

    as the nu

    mber of

    processin

    g nodes

    increases,

    allowing

    practical

    accelerat

    ionsto b

    ecome clo

    se tothe t

    heoreti-

    cal maxim

    um.In th

    is work w

    e describe

    a template

    MP-

    SoCarch

    itecture b

    asedon m

    ultiproce

    ssorcomp

    utationa

    l

    clusters, c

    alledNine

    silica. Eac

    h Ninesil

    ica consis

    t of nine

    processin

    g nodes b

    asedon C

    OFFEE R

    ISCarch

    itecture.

    MPSoC i

    nter-and

    intra-clus

    ter comm

    unication

    are en-

    abledusing

    hierarchic

    al Networ

    k-on-Chi

    p with ded

    icated

    point to

    point and

    broadcas

    t commun

    ication se

    rvices for

    better per

    formance

    . Propose

    d template

    has been u

    sed to

    instantiat

    e complete

    systems w

    ith one a

    nd four N

    inesil-

    ica cluste

    rs, resulti

    ng inMPS

    oCswith

    respectiv

    ely 9and

    36 compu

    tational n

    odes. Th

    e MPSoCs

    havebeen

    physi-

    callyprot

    otyped o

    n a FPGA

    device, an

    d the W

    -CDMA

    cell search

    algorithm

    has been m

    apped on

    bothMPS

    oC

    platforms

    . The fou

    r Ninesil

    ica MPSoC

    canexec

    ute W-

    CDMA in

    20.5ms (at

    115MHz,

    slowmod

    e implem

    enta-

    tion)with

    the total

    speed-up

    of 24.3X

    and3.3X

    when

    compared

    to asingl

    e proces

    singcore

    system an

    d toa

    single Nin

    esilica clu

    sterresp

    ectively.

    Keywor

    ds Multi-

    Processor

    System-o

    n-Chip · N

    etwork-

    on-Chip ·

    W-CDMA

    · Softwar

    e Defined

    Radio

    E =mc

    2

    (1)

    Authors

    would l

    iketo t

    hank St

    ephen B

    urgess fo

    r his in

    valu-

    ablehelp

    .

    Roberto

    Airoldi,

    Tapani

    Ahonen

    , Fabio

    Garzia,

    JariNur

    mi

    Departm

    entof C

    omputer

    System

    s, Tamp

    ereUni

    versity

    of

    Techno

    logy, K

    orkeako

    ulunkatu

    1, PO b

    ox553

    , FI 33

    101,

    Tampere

    , Finlan

    d

    E-mail:

    firstnam

    e.lastna

    me@tut.

    fi

    Dragom

    ir Miloje

    vic

    Univers

    itéLib

    re de B

    ruxelles

    - ULB,

    Faculty

    of Appli

    ed

    Sciences

    , Bio,

    Electro

    andMec

    hanical

    System

    s, ULB

    CP-

    165/56

    , Av. F.

    Rooseve

    lt 50, B

    -1050 B

    ruxelles

    , Belgiu

    m.

    E-mail:

    Dragom

    ir.Miloje

    vic@ulb.

    ac.be

    1 Introd

    uction

    Theevolu

    tionof m

    obiledevic

    es over th

    e past ye

    ars and

    thevide

    variety o

    f wireles

    s connec

    tivity pr

    otocols r

    e-

    sulted in

    the develo

    pment of

    the Softw

    are Define

    d Radio

    (SDR). Th

    is techno

    logywill

    replace tr

    aditional

    wireless

    receivers,

    builtto su

    pport onl

    y one rad

    io protoc

    ol since

    theRF

    componen

    ts are mo

    st ofthe

    timedesig

    nedac-

    cording t

    o only on

    e transmi

    ssionfrequ

    ency. Th

    us, the

    baseband

    processor

    supports

    onlythe

    algorithm

    s used

    for aspec

    ific stand

    ard.In an

    SDRdevic

    e, the sam

    e hard-

    wareis ab

    le torecei

    ve signal

    s from d

    i�erent fr

    equency

    bands and

    decode th

    em using

    di�erent

    protocols

    . There-

    fore,the

    samerecei

    vercan

    synchron

    ise with d

    i�erent

    wireless

    networks

    (cellular

    networks

    , Wi-Fi,

    WiMAX,

    etc...). U

    nfortunat

    ely, practi

    cal design

    of such d

    evices is

    quitecomp

    lex.The

    RFfront

    -end, for

    example,

    should

    be able t

    o receive

    the signa

    ls using v

    ery broad

    frequency

    spectrum

    . Forbase

    bandproc

    essing, di

    �erent st

    andards

    require im

    plementa

    tionof di

    �erent al

    gorithms,

    eachwith

    its own tim

    ing const

    raints. W

    hile flexibi

    lity and lo

    w power

    are unavo

    idable re

    quirement

    s forsuch

    systems,

    the per-

    formance

    is crucial

    , since S

    DRsyste

    ms need

    to comply

    withextre

    melytight

    timing con

    straints.

    Thesimp

    lest appro

    ach for a

    multi-sta

    ndard ba

    seband

    processor

    is touse a

    DSPand

    define the

    receiving

    chain

    in softwa

    re. Howev

    er, atrad

    itional D

    SP is not

    powerful

    enough fo

    r thecurre

    nt applic

    ations an

    d itspowe

    r con-

    sumption

    canbe to

    o high for

    use in mo

    biledevic

    es [12].

    Therefore

    , ad-hoc

    hardware

    blocks are

    needed fo

    r the

    execution

    of specific

    kernels.

    Some ba

    seband s

    olutions

    present th

    e couplin

    g ofa

    general-p

    urpose co

    re with d

    edicated

    co-proce

    ssorsable

    to handl

    e SDR ke

    rnels(for

    example E

    xpresso p

    latform

    in [1]). In

    particula

    r, the app

    lication-s

    pecific acc

    elerators

    supported

    the Wideba

    nd Code

    Division

    Multiple

    Access

    (W-CDMA

    [2]) and O

    rthogonal

    Frequenc

    y Divisio

    n Mul-

    tiplexing

    (OFDM

    [3]).

    A possibl

    e alterna

    tiveis to

    use agene

    ral-purpo

    se co-

    processor

    . BUTTE

    R, acoar

    se-grain r

    econfigur

    ablearray

    ,that prepared a BIG question for XXth century:

    “Could maths be automatized?”

    6

  • Université libre de Bruxelles/Faculté des Sciences Appliquées/BEAMS/MILOJEVIC Dragomir

    BIG QUESTION got an answer: BIG NO!Kurt Gödel, 1931

    Journal of Signal Processing Systems manuscript No.(will be inserted by the editor)

    Roberto Airoldi · Tapani Ahonen · Fabio Garzia · Dragomir Milojevic ·Jari Nurmi

    Implementation of W-CDMA Cell Search on a HighlyParallel and Scalable MPSoC

    the date of receipt and acceptance should be inserted later

    Abstract The performance of the W-CDMA cell searchalgorithm can be significantly improved using homoge-neous general purpose Multi-Processor System-on-Chip(MPSoC) architectures. The application also scales well,as the number of processing nodes increases, allowingpractical accelerations to become close to the theoreti-cal maximum. In this work we describe a template MP-SoC architecture based on multiprocessor computationalclusters, called Ninesilica. Each Ninesilica consist of nineprocessing nodes based on COFFEE RISC architecture.MPSoC inter- and intra-cluster communication are en-abled using hierarchical Network-on-Chip with dedicatedpoint to point and broadcast communication services forbetter performance. Proposed template has been used toinstantiate complete systems with one and four Ninesil-ica clusters, resulting in MPSoCs with respectively 9 and36 computational nodes. The MPSoCs have been physi-cally prototyped on a FPGA device, and the W-CDMAcell search algorithm has been mapped on both MPSoCplatforms. The four Ninesilica MPSoC can execute W-CDMA in 20.5ms (at 115MHz, slow mode implementa-tion) with the total speed-up of 24.3X and 3.3X whencompared to a single processing core system and to asingle Ninesilica cluster respectively.

    Keywords Multi-Processor System-on-Chip · Network-on-Chip · W-CDMA · Software Defined Radio

    E = mc2 (1)

    Authors would like to thank Stephen Burgess for his invalu-able help.

    Roberto Airoldi, Tapani Ahonen, Fabio Garzia, Jari NurmiDepartment of Computer Systems, Tampere University ofTechnology, Korkeakoulunkatu 1, PO box 553, FI 33101,Tampere, FinlandE-mail: [email protected]

    Dragomir MilojevicUniversité Libre de Bruxelles - ULB, Faculty of AppliedSciences, Bio, Electro and Mechanical Systems, ULB CP-165/56, Av. F. Roosevelt 50, B-1050 Bruxelles, Belgium.E-mail: [email protected]

    1 Introduction

    The evolution of mobile devices over the past years andthe vide variety of wireless connectivity protocols re-sulted in the development of the Software Defined Radio(SDR). This technology will replace traditional wirelessreceivers, built to support only one radio protocol sincethe RF components are most of the time designed ac-cording to only one transmission frequency. Thus, thebaseband processor supports only the algorithms usedfor a specific standard. In an SDR device, the same hard-ware is able to receive signals from di�erent frequencybands and decode them using di�erent protocols. There-fore, the same receiver can synchronise with di�erentwireless networks (cellular networks, Wi-Fi, WiMAX,etc...). Unfortunately, practical design of such devices isquite complex. The RF front-end, for example, shouldbe able to receive the signals using very broad frequencyspectrum. For baseband processing, di�erent standardsrequire implementation of di�erent algorithms, each withits own timing constraints. While flexibility and low powerare unavoidable requirements for such systems, the per-formance is crucial, since SDR systems need to complywith extremely tight timing constraints.

    The simplest approach for a multi-standard basebandprocessor is to use a DSP and define the receiving chainin software. However, a traditional DSP is not powerfulenough for the current applications and its power con-sumption can be too high for use in mobile devices [12].Therefore, ad-hoc hardware blocks are needed for theexecution of specific kernels.

    Some baseband solutions present the coupling of ageneral-purpose core with dedicated co-processors ableto handle SDR kernels (for example Expresso platformin [1]). In particular, the application-specific acceleratorssupported the Wideband Code Division Multiple Access(W-CDMA [2]) and Orthogonal Frequency Division Mul-tiplexing (OFDM [3]).

    A possible alternative is to use a general-purpose co-processor. BUTTER, a coarse-grain reconfigurable array,

    Jour

    nal o

    f Signa

    l Pro

    cessingSy

    stem

    s man

    uscript N

    o.

    (will

    beins

    erted

    byth

    e edit

    or)

    Robe

    rtoAi

    roldi· T

    apan

    i Aho

    nen

    · Fab

    ioGarzia

    · Drago

    mir

    Milo

    jevic

    ·

    Jari

    Nur

    mi

    Implem

    entation

    ofW

    -CDM

    ACellSe

    arch

    onaHighly

    Paralle

    l and

    Scalab

    leM

    PSo

    C

    theda

    teof

    receiptan

    dac

    cept

    ance

    shou

    ldbe

    inserted

    later

    Abstract

    The p

    erform

    ance

    ofth

    e W-C

    DMA

    cell s

    earch

    algori

    thm

    can

    besig

    nifica

    ntly

    impr

    oved

    using

    homo

    ge-

    neou

    s gen

    eral p

    urpo

    seMult

    i-Proces

    sor S

    ystem

    -on-C

    hip

    (MPS

    oC) a

    rchite

    ctures

    . The

    appli

    catio

    n also

    scales

    well,

    asth

    enu

    mber

    ofpr

    ocess

    ingno

    des i

    ncrea

    ses, a

    llowi

    ng

    prac

    tical

    accel

    eratio

    nsto

    beco

    meclo

    seto

    the t

    heore

    ti-

    cal m

    axim

    um. I

    n this

    work

    wede

    scribe

    a tem

    plate

    MP-

    SoC

    archit

    ectur

    e based

    onmu

    ltipr

    ocess

    orco

    mput

    ation

    al

    cluste

    rs,ca

    lled N

    inesil

    ica. E

    ach N

    inesil

    icaco

    nsist

    ofnin

    e

    proc

    essing

    node

    s based

    onCO

    FFEE

    RISC

    archit

    ectur

    e.

    MPS

    oCint

    er-an

    dint

    ra-clu

    ster c

    ommu

    nicati

    onare

    en-

    abled

    using

    hierar

    chica

    l Netw

    ork-on

    -Chip

    with

    dedic

    ated

    point

    topo

    intan

    d broa

    dcast c

    ommu

    nicati

    onser

    vices

    for

    bette

    r perf

    orman

    ce.Pr

    oposed

    templa

    teha

    s been

    used

    to

    instan

    tiate

    comp

    lete s

    ystem

    s with

    one a

    ndfou

    r Nine

    sil-

    icaclu

    sters,

    result

    ingin

    MPS

    oCs w

    ithres

    pecti

    vely

    9 and

    36co

    mput

    ation

    alno

    des.

    The M

    PSoC

    s hav

    e been

    physi-

    cally

    proto

    typed

    ona F

    PGA

    devic

    e,an

    d the

    W-C

    DMA

    cell s

    earch

    algori

    thm

    has b

    eenma

    pped

    onbo

    thMPS

    oC

    platfo

    rms.

    The f

    our N

    inesil

    icaMPS

    oCca

    nex

    ecute

    W-

    CDMA

    in20

    .5ms(at

    115M

    Hz,

    slow

    mode

    imple

    menta-

    tion)

    with

    the t

    otal s

    peed

    -upof

    24.3X

    and

    3.3X

    when

    comp

    ared

    toa

    single

    proc

    essing

    core

    syste

    man

    dto

    a

    single

    Nine

    silica

    cluste

    r resp

    ective

    ly.

    Key

    word

    sMult

    i-Proces

    sor S

    ystem

    -on-C

    hip· N

    etwork

    -

    on-C

    hip· W

    -CDM

    A· S

    oftwa

    reDe

    fined

    Radio

    E=

    mc

    2

    (1)

    Aut

    hors

    wouldlik

    eto

    than

    kSt

    ephe

    nBu

    rgessforhisinva

    lu-

    able

    help.

    Rob

    erto

    Airo

    ldi,Ta

    pani

    Aho

    nen,

    FabioGarzia,

    Jari

    Nur

    mi

    Dep

    artm

    entof

    Com

    puterSy

    stem

    s,Ta

    mpe

    reUnive

    rsity

    of

    Tech

    nology

    ,Korke

    akou

    lunk

    atu

    1,PO

    box

    553,

    FI33

    101,

    Tampe

    re, F

    inland

    E-mail:fir

    stna

    me.lastna

    me@

    tut.fi

    Drago

    mir

    Milo

    jevic

    Unive

    rsité

    Libr

    ede

    Brux

    elles

    -ULB

    ,Fa

    culty

    ofApp

    lied

    Scienc

    es,Bi

    o,El

    ectro

    and

    Mecha

    nica

    lSy

    stem

    s,ULB

    CP-

    165/

    56, A

    v.F.

    Roo

    seve

    lt50

    , B-105

    0Br

    uxelles,

    Belgium.

    E-mail:Drago

    mir.

    Milo

    jevic@

    ulb.ac

    .be

    1Introd

    uctio

    n

    The e

    volut

    ionof

    mobil

    e dev

    ices o

    ver t

    hepa

    stye

    arsan

    d

    the

    vide

    varie

    tyof

    wirel

    essco

    nnect

    ivity

    proto

    cols

    re-

    sulte

    d in t

    hede

    velop

    ment

    ofth

    e Soft

    ware

    Defin

    edRa

    dio

    (SDR

    ).Th

    istec

    hnolo

    gywi

    llrep

    lace t

    raditi

    onal

    wirel

    ess

    receiv

    ers, b

    uilt t

    o sup

    port

    only

    one r

    adio

    proto

    col s

    ince

    the R

    Fco

    mpon

    ents

    aremo

    stof

    the t

    ime d

    esign

    edac

    -

    cord

    ingto

    only

    onetra

    nsmi

    ssion

    frequ

    ency.

    Thus

    , the

    baseb

    and

    proc

    essor

    supp

    orts o

    nlyth

    ealg

    orith

    msus

    ed

    fora s

    pecifi

    c stan

    dard

    . In a

    n SDR

    devic

    e,th

    e sam

    e hard

    -

    ware

    isab

    leto

    receiv

    e sign

    alsfro

    mdi�

    erent

    frequ

    ency

    band

    s and

    deco

    deth

    emus

    ingdi�

    erent

    proto

    cols.

    There

    -

    fore,

    the

    same

    receiv

    erca

    nsync

    hron

    isewi

    thdi�

    erent

    wirel

    essne

    twork

    s(ce

    llular

    netw

    orks,

    Wi-F

    i,W

    iMAX

    ,

    etc...)

    . Unfo

    rtuna

    tely,

    prac

    tical

    desig

    n of s

    uch d

    evice

    s is

    quite

    comp

    lex. T

    heRF

    front-en

    d,for

    exam

    ple, s

    hould

    beab

    leto

    receiv

    e the

    signa

    lsus

    ingve

    rybr

    oad f

    reque

    ncy

    spect

    rum.

    For b

    aseb

    and

    proc

    essing

    , di�e

    rent s

    tanda

    rds

    requir

    e imp

    lemen

    tation

    ofdi�

    erent

    algori

    thms

    , each w

    ith

    itsow

    n tim

    ingco

    nstra

    ints.W

    hile fl

    exibi

    lity a

    ndlow

    powe

    r

    areun

    avoid

    able

    requir

    emen

    tsfor

    such

    syste

    ms, t

    hepe

    r-

    forma

    nce i

    s cru

    cial,

    since

    SDR

    syste

    msne

    edto

    comp

    ly

    with

    extre

    mely

    tight

    timing

    cons

    traint

    s.

    The s

    imple

    stap

    proa

    chfor

    a mult

    i-stan

    dard

    baseb

    and

    proc

    essor

    isto

    use a

    DSP

    and d

    efine

    the r

    eceivi

    ngch

    ain

    insoftw

    are. H

    owev

    er,a t

    raditi

    onal

    DSP

    isno

    t pow

    erful

    enou

    ghfor

    the c

    urren

    t app

    licati

    ons a

    ndits

    powe

    r con

    -

    sump

    tion c

    anbe

    too h

    ighfor

    use i

    n mob

    ilede

    vices

    [12].

    There

    fore,

    ad-ho

    cha

    rdwa

    reblo

    cks a

    rene

    eded

    forth

    e

    execu

    tion o

    f spe

    cific k

    ernels

    .

    Some

    baseb

    and

    solut

    ions p

    resen

    t the

    coup

    ling

    ofa

    gene

    ral-pu

    rpose c

    orewi

    thde

    dicate

    dco

    -proc

    essors

    able

    toha

    ndle

    SDR

    kern

    els(fo

    r exa

    mple

    Expr

    essopla

    tform

    in[1]

    ).In

    parti

    cular

    , the

    appli

    catio

    n-spe

    cific a

    cceler

    ators

    supp

    orted

    the W

    ideband

    Code

    Divis

    ionMult

    iple A

    ccess

    (W-C

    DMA

    [2]) a

    ndOr

    thogona

    l Freq

    uenc

    y Divi

    sion M

    ul-

    tiplex

    ing(O

    FDM

    [3]).

    Apo

    ssible

    altern

    ative

    isto

    use a

    gene

    ral-pu

    rpose c

    o-

    proc

    essor.

    BUTT

    ER, a

    coars

    e-grai

    n reco

    nfigu

    rable

    array,

    Journ

    alofSig

    nalPro

    cess

    ing

    Syst

    em

    sm

    anusc

    riptN

    o.

    (willbe

    inse

    rted

    by

    the

    editor

    )

    Robert

    oAirold

    TapaniAhonen

    ·Fabio

    Garz

    ia·

    Dra

    gom

    irM

    ilojevic

    ·

    Jari

    Nurm

    i

    Implem

    entatio

    nofW

    -CDM

    ACell

    Search

    on

    aH

    ighly

    Paralleland

    Scalable

    MPSoC

    the

    date

    ofre

    ceip

    tand

    accepta

    nce

    should

    be

    inse

    rted

    late

    r

    Abst

    ract

    Theper

    form

    ance

    ofth

    eW

    -CDM

    Ace

    llse

    arch

    algo

    rith

    mca

    nbe

    sign

    ifica

    ntly

    impro

    ved

    using

    hom

    oge-

    neo

    us

    gener

    alpurp

    ose

    Multi-Pro

    cess

    orSyst

    em-o

    n-C

    hip

    (MPSoC

    )ar

    chitec

    ture

    s.Theap

    plica

    tion

    also

    scal

    eswell,

    asth

    enum

    ber

    ofpro

    cess

    ing

    nodes

    incr

    ease

    s,al

    lowin

    g

    pra

    ctical

    acce

    lera

    tion

    sto

    bec

    ome

    clos

    eto

    the

    theo

    reti-

    calm

    axim

    um

    .In

    this

    wor

    kwe

    des

    crib

    ea

    tem

    pla

    teM

    P-

    SoC

    arch

    itec

    ture

    bas

    edon

    multip

    roce

    ssor

    com

    puta

    tion

    al

    clust

    ers,

    called

    Nin

    esilica.

    Eac

    hNin

    esilica

    consist

    ofnin

    e

    pro

    cess

    ing

    nodes

    bas

    edon

    CO

    FFEE

    RIS

    Car

    chitec

    ture

    .

    MPSoC

    inte

    r-an

    din

    tra-

    clust

    erco

    mm

    unicat

    ion

    are

    en-

    abled

    usinghiera

    rchical

    Net

    wor

    k-o

    n-C

    hip

    with

    ded

    icat

    ed

    poi

    ntto

    poi

    ntan

    dbro

    adca

    stco

    mm

    unicat

    ion

    serv

    ices

    for

    bet

    terper

    form

    ance

    .Pro

    pos

    edte

    mpla

    tehas

    bee

    nuse

    dto

    inst

    antiat

    eco

    mplete

    syst

    emswith

    one

    and

    fourNin

    esil-

    icaclust

    ers,

    resu

    ltin

    gin

    MPSoC

    swith

    resp

    ective

    ly9an

    d

    36co

    mputa

    tion

    alnodes

    .The

    MPSoC

    shav

    ebee

    nphysi-

    cally

    pro

    toty

    ped

    ona

    FPG

    Adev

    ice,

    and

    the

    W-C

    DM

    A

    cell

    sear

    chal

    gorith

    mhas

    bee

    nm

    apped

    onbot

    hM

    PSoC

    pla

    tfor

    ms.

    The

    four

    Nin

    esilica

    MPSoC

    can

    exec

    ute

    W-

    CDM

    Ain

    20.5

    ms

    (at11

    5MH

    z,slow

    modeim

    plem

    enta

    -

    tion

    )with

    the

    tota

    lsp

    eed-u

    pof

    24.3

    Xan

    d3.

    3Xwhen

    com

    par

    edto

    asingl

    epro

    cess

    ing

    core

    syst

    eman

    dto

    a

    singl

    eNin

    esilica

    clust

    erre

    spec

    tive

    ly.

    Keyword

    sM

    ulti-Pro

    cess

    orSyst

    em-o

    n-C

    hip

    · Net

    wor

    k-

    on-C

    hip

    ·W

    -CDM

    A·Sof

    twar

    eDefi

    ned

    Rad

    io

    E=

    mc2

    (1)

    Auth

    ors

    would

    like

    toth

    ank

    Ste

    phen

    Burg

    ess

    forhis

    invalu

    -

    able

    help

    .

    Robert

    oAirold

    i,TapaniAhonen,Fabio

    Garz

    ia,Jari

    Nurm

    i

    Depart

    ment

    of

    Com

    pute

    rSyst

    em

    s,Tam

    pere

    Univ

    ers

    ity

    of

    Technolo

    gy,

    Kork

    eakoulu

    nkatu

    1,

    PO

    box

    553,

    FI

    33101,

    Tam

    pere

    ,Fin

    land

    E-m

    ail:firs

    tnam

    e.last

    nam

    e@

    tut.fi

    Dra

    gom

    irM

    ilojevic

    Univ

    ers

    ité

    Lib

    rede

    Bru

    xelles

    -ULB,

    Faculty

    of

    Applied

    Sciences,

    Bio

    ,Electr

    oand

    Mechanical

    Syst

    em

    s,ULB

    CP-

    165/56,Av.F.Roose

    velt

    50,B-1

    050

    Bru

    xelles,

    Belg

    ium

    .

    E-m

    ail:Dra

    gom

    ir.M

    ilojevic@

    ulb

    .ac.b

    e

    1In

    troduction

    The

    evol

    ution

    ofm

    obile

    dev

    ices

    over

    the

    pas

    tye

    arsan

    d

    the

    vid

    eva

    riet

    yof

    wireles

    sco

    nnec

    tivity

    pro

    toco

    lsre

    -

    sulted

    inth

    edev

    elop

    men

    tof

    theSof

    twar

    eDefi

    ned

    Rad

    io

    (SDR).

    This

    tech

    nol

    ogy

    willre

    pla

    cetr

    aditio

    nal

    wireles

    s

    rece

    iver

    s,builtto

    suppor

    ton

    lyon

    era

    dio

    pro

    toco

    lsince

    the

    RF

    com

    pon

    ents

    are

    mos

    tof

    the

    tim

    edes

    igned

    ac-

    cord

    ing

    toon

    lyon

    etr

    ansm

    ission

    freq

    uen

    cy.Thus,

    the

    bas

    eban

    dpro

    cess

    orsu

    ppor

    tson

    lyth

    eal

    gorith

    ms

    use

    d

    forasp

    ecifi

    cst

    andar

    d.In

    anSDR

    dev

    ice,

    thesa

    mehar

    d-

    war

    eis

    able

    tore

    ceiv

    esign

    als

    from

    di�

    eren

    tfreq

    uen

    cy

    ban

    dsan

    ddec

    odeth

    emusing

    di�

    eren

    tpro

    toco

    ls.Ther

    e-

    fore

    ,th

    esa

    me

    rece

    iver

    can

    synch

    ronise

    with

    di�

    eren

    t

    wireles

    snet

    wor

    ks

    (cellu

    lar

    net

    wor

    ks,

    Wi-Fi,

    WiM

    AX,

    etc...).

    Unfo

    rtunat

    ely,

    pra

    ctical

    des

    ign

    ofsu

    chdev

    ices

    is

    quite

    com

    plex.The

    RF

    fron

    t-en

    d,fo

    rex

    ample,sh

    ould

    beab

    leto

    rece

    iveth

    esign

    alsusing

    very

    bro

    adfreq

    uen

    cy

    spec

    trum

    .For

    bas

    eban

    dpro

    cess

    ing,

    di�

    eren

    tst

    andar

    ds

    requireim

    plem

    enta

    tion

    ofdi�

    eren

    tal

    gorith

    ms,

    each

    with

    itsow

    ntim

    ingco

    nst

    rain

    ts. W

    hileflex

    ibilityan

    dlow

    pow

    er

    are

    unav

    oidab

    lere

    quirem

    ents

    forsu

    chsy

    stem

    s,th

    eper

    -

    form

    ance

    iscr

    ucial

    ,since

    SDR

    syst

    ems

    nee

    dto

    com

    ply

    with

    extr

    emely

    tigh

    ttim

    ing

    const

    rain

    ts.

    Thesim

    plest

    appro

    ach

    foram

    ulti-st

    andar

    dbas

    eban

    d

    pro

    cess

    oris

    touse

    aDSP

    and

    defi

    neth

    ere

    ceiv

    ing

    chai

    n

    inso

    ftwar

    e.How

    ever

    ,a

    trad

    itio

    nal

    DSP

    isnot

    pow

    erfu

    l

    enou

    ghfo

    rth

    ecu

    rren

    tap

    plica

    tion

    san

    dits

    pow

    erco

    n-

    sum

    ption

    can

    be

    too

    hig

    hfo

    ruse

    inm

    obile

    dev

    ices

    [12]

    .

    Ther

    efor

    e,ad

    -hoc

    har

    dwar

    eblo

    cks

    are

    nee

    ded

    for

    the

    exec

    ution

    ofsp

    ecifi

    cke

    rnels.

    Som

    ebas

    eban

    dso

    lution

    spre

    sent

    the

    coupling

    ofa

    gener

    al-p

    urp

    ose

    core

    with

    ded

    icat

    edco

    -pro

    cess

    ors

    able

    tohan

    dle

    SDR

    kern

    els

    (for

    exam

    ple

    Expre

    sso

    pla

    tfor

    m

    in[1

    ]).In

    par

    ticu

    lar,

    theap

    plica

    tion

    -spec

    ificac

    celera

    tors

    suppor

    ted

    theW

    ideb

    and

    Cod

    eDivision

    Multiple

    Acces

    s

    (W-C

    DM

    A[2

    ])an

    dOrthog

    onalFrequ

    ency

    Division

    Mul-

    tiplexing

    (OFDM

    [3]).

    Apos

    sible

    alte

    rnat

    iveis

    touse

    age

    ner

    al-p

    urp

    oseco

    -

    pro

    cess

    or.BUTTER,aco

    arse

    -gra

    inre

    configu

    rable

    arra

    y,

    Journal of Signal Processing Systems manuscript No.

    (will be inserted by the editor)

    Roberto Airoldi · Tapani Ahonen · Fabio Garzia · Dragomir Milojevic ·

    Jari NurmiIm

    plementation

    of W-CDM

    ACell Search

    ona

    Highly

    Parallel andScalable M

    PSoC

    the date of receipt and acceptance should be inserted later

    Abstract The performance of the W-CDMA cell search

    algorithmcan be significantly improved using homoge-

    neous general purpose Multi-Processor System-on-Chip

    (MPSoC) architectures. The application also scales well,

    as the number of processing nodes increases, allowing

    practical accelerations to become close to the theoreti-

    cal maximum. In this work we describe a template MP-

    SoC architecture based on multiprocessor computational

    clusters, called Ninesilica. Each Ninesilica consist of nine

    processing nodes based on COFFEE RISC architecture.

    MPSoCinter- and intra-cluster communication are en-

    abled using hierarchical Network-on-Chip with dedicated

    point to point and broadcast communication services for

    better performance. Proposed template has been used to

    instantiate complete systems with one and four Ninesil-

    ica clusters, resulting in MPSoCs with respectively 9 and

    36 computational nodes. The MPSoCs have been physi-

    cally prototyped on a FPGA device, and the W-CDMA

    cell search algorithmhas been mapped on both MPSoC

    platforms. The four Ninesilica MPSoCcan execute W-

    CDMA in 20.5ms (at 115MHz, slow mode implementa-

    tion) with the total speed-up of 24.3Xand 3.3X

    when

    compared to a single processing core systemand to a

    single Ninesilica cluster respectively.

    Keywords Multi-Processor System-on-Chip · Network-

    on-Chip · W-CDMA · Software Defined Radio

    E =mc 2

    (1)

    Authors wouldlike to thank Stephen

    Burgess for his invalu-

    able help.Roberto Airoldi, Tapani Ahonen, Fabio Garzia, Jari Nurmi

    Departmentof Computer

    Systems, TampereUniversity

    of

    Technology, Korkeakoulunkatu1, PO

    box553, FI

    33101,

    Tampere, Finland

    E-mail: [email protected]

    Dragomir Milojevic

    UniversitéLibre

    deBruxelles

    -ULB, Faculty

    of Applied

    Sciences, Bio, Electroand

    Mechanical Systems, ULB

    CP-

    165/56, Av. F. Roosevelt 50, B-1050 Bruxelles, Belgium.

    E-mail: [email protected]

    1 Introduction

    The evolution of mobile devices over the past years and

    the vide variety of wireless connectivity protocols re-

    sulted in the development of the Software Defined Radio

    (SDR). This technology will replace traditional wireless

    receivers, built to support only one radio protocol since

    the RF components are most of the time designed ac-

    cording to only one transmission frequency. Thus, the

    baseband processor supports only the algorithms used

    for a specific standard. In an SDR device, the same hard-

    ware is able to receive signals fromdi�erent frequency

    bands and decode themusing di�erent protocols. There-

    fore, the same receiver can synchronise with di�erent

    wireless networks (cellular networks, Wi-Fi, WiMAX,

    etc...). Unfortunately, practical design of such devices is

    quite complex. The RF front-end, for example, should

    be able to receive the signals using very broad frequency

    spectrum. For baseband processing, di�erent standards

    require implementation of di�erent algorithms, each with

    its own timing constraints. While flexibility and low power

    are unavoidable requirements for such systems, the per-

    formance is crucial, since SDRsystems need to comply

    with extremely tight timing constraints.

    The simplest approach for a multi-standard baseband

    processor is to use a DSP and define the receiving chain

    in software. However, a traditional DSP is not powerful

    enough for the current applications and its power con-

    sumption can be too high for use in mobile devices [12].

    Therefore, ad-hoc hardware blocks are needed for the

    execution of specific kernels.

    Some baseband solutions present the coupling of a

    general-purpose core with dedicated co-processors able

    to handle SDRkernels (for example Expresso platform

    in [1]). In particular, the application-specific accelerators

    supported the Wideband Code Division Multiple Access

    (W-CDMA [2]) and Orthogonal Frequency Division Mul-

    tiplexing (OFDM[3]).

    A possible alternative is to use a general-purpose co-

    processor. BUTTER, a coarse-grain reconfigurable array,

    Journ

    alof

    Signa

    l Proc

    essing

    Syste

    msma

    nuscr

    iptNo

    .

    (will b

    e inser

    tedby

    theedi

    tor)

    Robe

    rtoAi

    roldi

    · Tap

    ani A

    hone

    n· F

    abio

    Garzi

    a ·Dr

    agomi

    r Milo

    jevic

    ·

    Jari N

    urmi

    Imple

    ment

    ation

    ofW

    -CDM

    ACe

    llSe

    arch

    ona

    High

    ly

    Para

    llel a

    ndSc

    alable

    MPS

    oC

    theda

    teof

    receip

    t and

    accep

    tance

    shou

    ldbe

    insert

    edlat

    er

    Abstr

    actTh

    e perfo

    rmanc

    e of th

    e W-CD

    MAcell

    search

    algorit

    hmcan

    besign

    ificant

    lyimp

    roved

    using

    homoge

    -

    neous

    genera

    l purp

    oseMu

    lti-Pro

    cessor

    System

    -on-Ch

    ip

    (MPS

    oC) a

    rchitec

    tures.

    The a

    pplica

    tionalso

    scales

    well,

    asthe

    number

    ofpro

    cessin

    g node

    s incre

    ases, a

    llowing

    practic

    alacc

    eleratio

    nsto

    becom

    e close

    tothe

    theore

    ti-

    calma

    ximum

    . Inthi

    s work

    wedes

    cribe a

    templa

    te MP-

    SoCarc

    hitect

    urebas

    edon

    multip

    rocess

    or com

    putatio

    nal

    cluste

    rs,cal

    ledNin

    esilica

    . Each

    Ninesil

    icacon

    sistof n

    ine

    proces

    sing n

    odes b

    ased o

    n COF

    FEE R

    ISCarc

    hitect

    ure.

    MPSoC

    inter-

    andint

    ra-clu

    ster c

    ommu

    nicatio

    n are

    en-

    abled

    using

    hierar

    chical

    Netwo

    rk-on-

    Chip w

    ithded

    icated

    point

    to poin

    t and b

    roadca

    st com

    munic

    ation s

    ervices

    for

    better

    perfor

    mance

    . Prop

    osed t

    empla

    te has b

    eenuse

    d to

    instan

    tiate c

    omple

    tesys

    tems w

    ithone

    andfou

    r Nine

    sil-

    icaclu

    sters,

    result

    ingin M

    PSoC

    s with

    respec

    tively 9

    and

    36com

    putatio

    nalnod

    es.Th

    e MPS

    oCs h

    avebee

    n phys

    i-

    cally p

    rototy

    pedon

    a FPG

    A devi

    ce,and

    theW-

    CDMA

    cellsea

    rchalg

    orithm

    hasbee

    n mapp

    edon

    both M

    PSoC

    platfo

    rms. T

    hefou

    r Nine

    silica

    MPSoC

    canexe

    cute W

    -

    CDMA

    in 20.5

    ms(at

    115M

    Hz, sl

    owmo

    deimp

    lement

    a-

    tion) w

    iththe

    total s

    peed-u

    p of 2

    4.3X

    and3.3

    Xwh

    en

    compar

    edto

    a sing

    lepro

    cessin

    g core

    system

    andto

    a

    single

    Ninesil

    icaclu

    ster re

    spectiv

    ely.

    Keyw

    ords M

    ulti-P

    rocess

    or Syst

    em-on

    -Chip ·

    Netwo

    rk-

    on-Ch

    ip ·W-

    CDMA

    · Softw

    areDe

    fined R

    adio

    E= m

    c2

    (1)

    Autho

    rswo

    uldlik

    e to t

    hank

    Steph

    enBu

    rgess

    forhis

    invalu

    -

    able

    help.

    Robe

    rtoAi

    roldi,

    Tapa

    niAh

    onen

    , Fab

    ioGa

    rzia,

    Jari

    Nurm

    i

    Depa

    rtmen

    t of C

    ompu

    terSy

    stems

    , Tam

    pere

    Unive

    rsity

    of

    Tech

    nolog

    y,Ko

    rkeak

    oulun

    katu

    1,PO

    box

    553,

    FI33

    101,

    Tamp

    ere, F

    inlan

    d

    E-ma

    il:firs

    tname

    .lastn

    ame@

    tut.fi

    Drag

    omir

    Miloj

    evic

    Unive

    rsité

    Libre

    deBr

    uxell

    es- U

    LB, F

    aculty

    ofAp

    plied

    Scien

    ces, B

    io,El

    ectro

    and

    Mech

    anica

    l Syst

    ems,

    ULB

    CP-

    165/

    56, A

    v.F.

    Roose

    velt 5

    0,B-

    1050

    Brux

    elles,

    Belgi

    um.

    E-ma

    il:Dr

    agom

    ir.Milo

    jevic@

    ulb.ac

    .be

    1 Intr

    oduct

    ion

    The e

    voluti

    onof m

    obile d

    evices

    over th

    e past

    years a

    nd

    thevid

    e varie

    tyof

    wireles

    s conn

    ectivit

    y prot

    ocols

    re-

    sulted

    in the

    develo

    pment

    of the

    Softwa

    re Defin

    edRa

    dio

    (SDR).

    This t

    echnol

    ogywil

    l repla

    cetra

    dition

    al wirel

    ess

    receiv

    ers, b

    uiltto

    suppor

    t only

    onerad

    io prot

    ocol si

    nce

    theRF

    compon

    ents a

    remo

    stof

    thetim

    e desig

    nedac-

    cordin

    g to o

    nlyone

    transm

    ission

    frequen

    cy.Th

    us,the

    baseba

    ndpro

    cessor

    suppor

    tsonl

    y the

    algorit

    hms u

    sed

    fora sp

    ecific st

    andard

    . Inan

    SDR d

    evice,

    thesam

    e hard

    -

    ware

    is able

    torec

    eive s

    ignals

    from

    di�ere

    ntfreq

    uency

    bands

    anddec

    odethe

    m usin

    g di�e

    rent p

    rotoco

    ls.Th

    ere-

    fore,

    thesam

    e rece

    iver c

    ansyn

    chroni

    sewit

    h di�e

    rent

    wireles

    s netw

    orks (

    cellula

    r netw

    orks,

    Wi-Fi

    , WiM

    AX,

    etc...)

    . Unfo

    rtunat

    ely,pra

    ctical d

    esign o

    f such

    devices

    is

    quite

    comple

    x.Th

    e RF f

    ront-en

    d,for

    examp

    le,sho

    uld

    beabl

    e torec

    eive th

    e signa

    ls usin

    g very

    broad

    frequen

    cy

    spectr

    um. F

    orbas

    eband

    proces

    sing, d

    i�eren

    t stan

    dards

    requir

    e imple

    menta

    tionof d

    i�eren

    t algor

    ithms

    , each

    with

    itsow

    n timin

    g const

    raints.

    While

    flexibil

    ityand

    lowpow

    er

    areuna

    voidab

    le requ

    irement

    s for su

    chsys

    tems, t

    heper

    -

    forma

    nceis c

    rucial,

    since

    SDR s

    ystem

    s need

    tocom

    ply

    with e

    xtrem

    elytigh

    t timin

    g cons

    traint

    s.

    The si

    mplest

    approa

    chfor

    a mult

    i-stand

    ardbas

    eband

    proces

    soris t

    o use a

    DSP a

    nddefi

    nethe

    receiv

    ingcha

    in

    insof

    tware.

    Howev

    er,a t

    raditio

    nalDS

    P is n

    otpow

    erful

    enough

    forthe

    curren

    t appl

    ication

    s and

    itspow

    ercon

    -

    sumpti

    oncan

    betoo

    high f

    oruse

    in mobi

    le devi

    ces[12

    ].

    There

    fore,

    ad-hoc

    hardw

    areblo

    cksare

    needed

    forthe

    execut

    ionof s

    pecific

    kernel

    s.

    Some b

    aseban

    d solu

    tions p

    resent

    thecou

    pling o

    f a

    genera

    l-purpo

    secor

    e with

    dedica

    tedco-

    proces

    sors a

    ble

    tohan

    dleSD

    R kern

    els(fo

    r exam

    pleEx

    presso

    platfo

    rm

    in [1]).

    Inpar

    ticular,

    theapp

    lication

    -specifi

    c acce

    lerator

    s

    suppor

    tedthe

    Wideb

    andCo

    deDiv

    ision M

    ultiple

    Access

    (W-CD

    MA[2])

    andOr

    thogon

    al Freq

    uency

    Divisio

    n Mul-

    tiplexi

    ng(OF

    DM[3])

    .

    A poss

    iblealte

    rnative

    is to u

    se agen

    eral-pu

    rpose c

    o-

    proces

    sor. B

    UTTE

    R,a c

    oarse-g

    rainrec

    onfigur

    able a

    rray,

    Journal

    of Signa

    l Proces

    singSyst

    emsman

    uscript

    No.

    (willbe in

    serted by

    the editor

    )

    Roberto

    Airoldi

    · Tapan

    i Ahonen

    · Fabio

    Garzia

    · Dragom

    ir Miloje

    vic·

    JariNur

    mi

    Implem

    entatio

    n of W

    -CDMA

    Cell Se

    archon

    a High

    ly

    Paralle

    l and S

    calable

    MPSoC

    thedate

    of recei

    pt and a

    cceptanc

    e should

    be inser

    tedlate

    r

    Abstrac

    t The per

    formance

    of the W-

    CDMA ce

    ll search

    algorithm

    canbe s

    ignificant

    ly improv

    ed using

    homoge-

    neous gen

    eralpurp

    ose Multi-

    Processor

    System-o

    n-Chip

    (MPSoC)

    architect

    ures.The

    applicati

    on also s

    caleswell,

    as the nu

    mber of

    processin

    g nodes

    increases,

    allowing

    practical

    accelerat

    ionsto b

    ecome clo

    se tothe t

    heoreti-

    cal maxim

    um.In th

    is work w

    e describe

    a template

    MP-

    SoCarch

    itecture b

    asedon m

    ultiproce

    ssorcomp

    utationa

    l

    clusters, c

    alledNine

    silica. Eac

    h Ninesil

    ica consis

    t of nine

    processin

    g nodes b

    asedon C

    OFFEE R

    ISCarch

    itecture.

    MPSoC i

    nter-and

    intra-clus

    ter comm

    unication

    are en-

    abledusing

    hierarchic

    al Networ

    k-on-Chi

    p with ded

    icated

    point to

    point and

    broadcas

    t commun

    ication se

    rvices for

    better per

    formance

    . Propose

    d template

    has been u

    sed to

    instantiat

    e complete

    systems w

    ith one a

    nd four N

    inesil-

    ica cluste

    rs, resulti

    ng inMPS

    oCswith

    respectiv

    ely 9and

    36 compu

    tational n

    odes. Th

    e MPSoCs

    havebeen

    physi-

    callyprot

    otyped o

    n a FPGA

    device, an

    d the W

    -CDMA

    cell search

    algorithm

    has been m

    apped on

    bothMPS

    oC

    platforms

    . The fou

    r Ninesil

    ica MPSoC

    canexec

    ute W-

    CDMA in

    20.5ms (at

    115MHz,

    slowmod

    e implem

    enta-

    tion)with

    the total

    speed-up

    of 24.3X

    and3.3X

    when

    compared

    to asingl

    e proces

    singcore

    system an

    d toa

    single Nin

    esilica clu

    sterresp

    ectively.

    Keywor

    ds Multi-

    Processor

    System-o

    n-Chip · N

    etwork-

    on-Chip ·

    W-CDMA

    · Softwar

    e Defined

    Radio

    E =mc

    2

    (1)

    Authors

    would l

    iketo t

    hank St

    ephen B

    urgess fo

    r his in

    valu-

    ablehelp

    .

    Roberto

    Airoldi,

    Tapani

    Ahonen

    , Fabio

    Garzia,

    JariNur

    mi

    Departm

    entof C

    omputer

    System

    s, Tamp

    ereUni

    versity

    of

    Techno

    logy, K

    orkeako

    ulunkatu

    1, PO b

    ox553

    , FI 33

    101,

    Tampere

    , Finlan

    d

    E-mail:

    firstnam

    e.lastna

    me@tut.

    fi

    Dragom

    ir Miloje

    vic

    Univers

    itéLib

    re de B

    ruxelles

    - ULB,

    Faculty

    of Appli

    ed

    Sciences

    , Bio,

    Electro

    andMec

    hanical

    System

    s, ULB

    CP-

    165/56

    , Av. F.

    Rooseve

    lt 50, B

    -1050 B

    ruxelles

    , Belgiu

    m.

    E-mail:

    Dragom

    ir.Miloje

    vic@ulb.

    ac.be

    1 Introd

    uction

    Theevolu

    tionof m

    obiledevic

    es over th

    e past ye

    ars and

    thevide

    variety o

    f wireles

    s connec

    tivity pr

    otocols r

    e-

    sulted in

    the develo

    pment of

    the Softw

    are Define

    d Radio

    (SDR). Th

    is techno

    logywill

    replace tr

    aditional

    wireless

    receivers,

    builtto su

    pport onl

    y one rad

    io protoc

    ol since

    theRF

    componen

    ts are mo

    st ofthe

    timedesig

    nedac-

    cording t

    o only on

    e transmi

    ssionfrequ

    ency. Th

    us, the

    baseband

    processor

    supports

    onlythe

    algorithm

    s used

    for aspec

    ific stand

    ard.In an

    SDRdevic

    e, the sam

    e hard-

    wareis ab

    le torecei

    ve signal

    s from d

    i�erent fr

    equency

    bands and

    decode th

    em using

    di�erent

    protocols

    . There-

    fore,the

    samerecei

    vercan

    synchron

    ise with d

    i�erent

    wireless

    networks

    (cellular

    networks

    , Wi-Fi,

    WiMAX,

    etc...). U

    nfortunat

    ely, practi

    cal design

    of such d

    evices is

    quitecomp

    lex.The

    RFfront

    -end, for

    example,

    should

    be able t

    o receive

    the signa

    ls using v

    ery broad

    frequency

    spectrum

    . Forbase

    bandproc