ELEC-H-473fuuu.be/polytech/ELECH473/ELECH473_Th01.pdf · 2014. 7. 2. · ELEC-H-473 ... 1,
Transcript of ELEC-H-473fuuu.be/polytech/ELECH473/ELECH473_Th01.pdf · 2014. 7. 2. · ELEC-H-473 ... 1,
-
ELEC-H-473Microprocessor architectures
Lecture 01Dragomir Milojevic
-
Université libre de Bruxelles/Faculté des Sciences Appliquées/BEAMS/MILOJEVIC Dragomir
General information1. AgendaLectures 2 ECTS = 12 sessions, 2h/sessionMonday → from 10.00 to 12.00 (C3.122); CONFLICT 2B solved ! Friday → from 08.00 to 10.00 (H.2213)TPs 3 ECTS Monday → from 14.00 to 18.00 (Solbosch, building U UA5.217)Friday → cancelled (moved to Monday)2. ELEC-H-473 Internet resources
http://beams.ulb.ac.be/beams/login: etudiants, password: SquareG! (it is case sensitive)Attention: if you do not login you will not even see the notes.
2
-
Université libre de Bruxelles/Faculté des Sciences Appliquées/BEAMS/MILOJEVIC Dragomir
General information3. Conflict dates • 3, 7 and 21 March I am travelling (we will organise this)4. Practical work• Presence is mandatory !• Mini-projects to be implemented; each project to be presented (you
have to show the working demo); Q&A are part of the evaluation • Practical work account for 45% of the final mark5. Examen• Is oral• Most of the questions are theoretical but some of the questions could
be closely related to the practical work• You are expected not only to show the lecture content (copy slides),
but be able to reason on the matter
3
-
Université libre de Bruxelles/Faculté des Sciences Appliquées/BEAMS/MILOJEVIC Dragomir
1. Tale on computing machines
2. IC manufacturing technology perspective
3. Computing systems performance
4. Example of poor usage : data centers
5. What could happen in the future ?
4
Today
-
The lecture starts with a tale on computing machines ...
… how they are made and
… and how to push their limits ...
5
Do u know from where does this
comes from?
So once upon a time ...
-
Université libre de Bruxelles/Faculté des Sciences Appliquées/BEAMS/MILOJEVIC Dragomir
… there was a mathematician
David Hilbert, 1900
Journal of Signal Processing Systems manuscript No.(will be inserted by the editor)
Roberto Airoldi · Tapani Ahonen · Fabio Garzia · Dragomir Milojevic ·Jari Nurmi
Implementation of W-CDMA Cell Search on a HighlyParallel and Scalable MPSoC
the date of receipt and acceptance should be inserted later
Abstract The performance of the W-CDMA cell searchalgorithm can be significantly improved using homoge-neous general purpose Multi-Processor System-on-Chip(MPSoC) architectures. The application also scales well,as the number of processing nodes increases, allowingpractical accelerations to become close to the theoreti-cal maximum. In this work we describe a template MP-SoC architecture based on multiprocessor computationalclusters, called Ninesilica. Each Ninesilica consist of nineprocessing nodes based on COFFEE RISC architecture.MPSoC inter- and intra-cluster communication are en-abled using hierarchical Network-on-Chip with dedicatedpoint to point and broadcast communication services forbetter performance. Proposed template has been used toinstantiate complete systems with one and four Ninesil-ica clusters, resulting in MPSoCs with respectively 9 and36 computational nodes. The MPSoCs have been physi-cally prototyped on a FPGA device, and the W-CDMAcell search algorithm has been mapped on both MPSoCplatforms. The four Ninesilica MPSoC can execute W-CDMA in 20.5ms (at 115MHz, slow mode implementa-tion) with the total speed-up of 24.3X and 3.3X whencompared to a single processing core system and to asingle Ninesilica cluster respectively.
Keywords Multi-Processor System-on-Chip · Network-on-Chip · W-CDMA · Software Defined Radio
E = mc2 (1)
Authors would like to thank Stephen Burgess for his invalu-able help.
Roberto Airoldi, Tapani Ahonen, Fabio Garzia, Jari NurmiDepartment of Computer Systems, Tampere University ofTechnology, Korkeakoulunkatu 1, PO box 553, FI 33101,Tampere, FinlandE-mail: [email protected]
Dragomir MilojevicUniversité Libre de Bruxelles - ULB, Faculty of AppliedSciences, Bio, Electro and Mechanical Systems, ULB CP-165/56, Av. F. Roosevelt 50, B-1050 Bruxelles, Belgium.E-mail: [email protected]
1 Introduction
The evolution of mobile devices over the past years andthe vide variety of wireless connectivity protocols re-sulted in the development of the Software Defined Radio(SDR). This technology will replace traditional wirelessreceivers, built to support only one radio protocol sincethe RF components are most of the time designed ac-cording to only one transmission frequency. Thus, thebaseband processor supports only the algorithms usedfor a specific standard. In an SDR device, the same hard-ware is able to receive signals from di�erent frequencybands and decode them using di�erent protocols. There-fore, the same receiver can synchronise with di�erentwireless networks (cellular networks, Wi-Fi, WiMAX,etc...). Unfortunately, practical design of such devices isquite complex. The RF front-end, for example, shouldbe able to receive the signals using very broad frequencyspectrum. For baseband processing, di�erent standardsrequire implementation of di�erent algorithms, each withits own timing constraints. While flexibility and low powerare unavoidable requirements for such systems, the per-formance is crucial, since SDR systems need to complywith extremely tight timing constraints.
The simplest approach for a multi-standard basebandprocessor is to use a DSP and define the receiving chainin software. However, a traditional DSP is not powerfulenough for the current applications and its power con-sumption can be too high for use in mobile devices [12].Therefore, ad-hoc hardware blocks are needed for theexecution of specific kernels.
Some baseband solutions present the coupling of ageneral-purpose core with dedicated co-processors ableto handle SDR kernels (for example Expresso platformin [1]). In particular, the application-specific acceleratorssupported the Wideband Code Division Multiple Access(W-CDMA [2]) and Orthogonal Frequency Division Mul-tiplexing (OFDM [3]).
A possible alternative is to use a general-purpose co-processor. BUTTER, a coarse-grain reconfigurable array,
Jour
nal o
f Signa
l Pro
cessingSy
stem
s man
uscript N
o.
(will
beins
erted
byth
e edit
or)
Robe
rtoAi
roldi· T
apan
i Aho
nen
· Fab
ioGarzia
· Drago
mir
Milo
jevic
·
Jari
Nur
mi
Implem
entation
ofW
-CDM
ACellSe
arch
onaHighly
Paralle
l and
Scalab
leM
PSo
C
theda
teof
receiptan
dac
cept
ance
shou
ldbe
inserted
later
Abstract
The p
erform
ance
ofth
e W-C
DMA
cell s
earch
algori
thm
can
besig
nifica
ntly
impr
oved
using
homo
ge-
neou
s gen
eral p
urpo
seMult
i-Proces
sor S
ystem
-on-C
hip
(MPS
oC) a
rchite
ctures
. The
appli
catio
n also
scales
well,
asth
enu
mber
ofpr
ocess
ingno
des i
ncrea
ses, a
llowi
ng
prac
tical
accel
eratio
nsto
beco
meclo
seto
the t
heore
ti-
cal m
axim
um. I
n this
work
wede
scribe
a tem
plate
MP-
SoC
archit
ectur
e based
onmu
ltipr
ocess
orco
mput
ation
al
cluste
rs,ca
lled N
inesil
ica. E
ach N
inesil
icaco
nsist
ofnin
e
proc
essing
node
s based
onCO
FFEE
RISC
archit
ectur
e.
MPS
oCint
er-an
dint
ra-clu
ster c
ommu
nicati
onare
en-
abled
using
hierar
chica
l Netw
ork-on
-Chip
with
dedic
ated
point
topo
intan
d broa
dcast c
ommu
nicati
onser
vices
for
bette
r perf
orman
ce.Pr
oposed
templa
teha
s been
used
to
instan
tiate
comp
lete s
ystem
s with
one a
ndfou
r Nine
sil-
icaclu
sters,
result
ingin
MPS
oCs w
ithres
pecti
vely
9 and
36co
mput
ation
alno
des.
The M
PSoC
s hav
e been
physi-
cally
proto
typed
ona F
PGA
devic
e,an
d the
W-C
DMA
cell s
earch
algori
thm
has b
eenma
pped
onbo
thMPS
oC
platfo
rms.
The f
our N
inesil
icaMPS
oCca
nex
ecute
W-
CDMA
in20
.5ms(at
115M
Hz,
slow
mode
imple
menta-
tion)
with
the t
otal s
peed
-upof
24.3X
and
3.3X
when
comp
ared
toa
single
proc
essing
core
syste
man
dto
a
single
Nine
silica
cluste
r resp
ective
ly.
Key
word
sMult
i-Proces
sor S
ystem
-on-C
hip· N
etwork
-
on-C
hip· W
-CDM
A· S
oftwa
reDe
fined
Radio
E=
mc
2
(1)
Aut
hors
wouldlik
eto
than
kSt
ephe
nBu
rgessforhisinva
lu-
able
help.
Rob
erto
Airo
ldi,Ta
pani
Aho
nen,
FabioGarzia,
Jari
Nur
mi
Dep
artm
entof
Com
puterSy
stem
s,Ta
mpe
reUnive
rsity
of
Tech
nology
,Korke
akou
lunk
atu
1,PO
box
553,
FI33
101,
Tampe
re, F
inland
E-mail:fir
stna
me.lastna
me@
tut.fi
Drago
mir
Milo
jevic
Unive
rsité
Libr
ede
Brux
elles
-ULB
,Fa
culty
ofApp
lied
Scienc
es,Bi
o,El
ectro
and
Mecha
nica
lSy
stem
s,ULB
CP-
165/
56, A
v.F.
Roo
seve
lt50
, B-105
0Br
uxelles,
Belgium.
E-mail:Drago
mir.
Milo
jevic@
ulb.ac
.be
1Introd
uctio
n
The e
volut
ionof
mobil
e dev
ices o
ver t
hepa
stye
arsan
d
the
vide
varie
tyof
wirel
essco
nnect
ivity
proto
cols
re-
sulte
d in t
hede
velop
ment
ofth
e Soft
ware
Defin
edRa
dio
(SDR
).Th
istec
hnolo
gywi
llrep
lace t
raditi
onal
wirel
ess
receiv
ers, b
uilt t
o sup
port
only
one r
adio
proto
col s
ince
the R
Fco
mpon
ents
aremo
stof
the t
ime d
esign
edac
-
cord
ingto
only
onetra
nsmi
ssion
frequ
ency.
Thus
, the
baseb
and
proc
essor
supp
orts o
nlyth
ealg
orith
msus
ed
fora s
pecifi
c stan
dard
. In a
n SDR
devic
e,th
e sam
e hard
-
ware
isab
leto
receiv
e sign
alsfro
mdi�
erent
frequ
ency
band
s and
deco
deth
emus
ingdi�
erent
proto
cols.
There
-
fore,
the
same
receiv
erca
nsync
hron
isewi
thdi�
erent
wirel
essne
twork
s(ce
llular
netw
orks,
Wi-F
i,W
iMAX
,
etc...)
. Unfo
rtuna
tely,
prac
tical
desig
n of s
uch d
evice
s is
quite
comp
lex. T
heRF
front-en
d,for
exam
ple, s
hould
beab
leto
receiv
e the
signa
lsus
ingve
rybr
oad f
reque
ncy
spect
rum.
For b
aseb
and
proc
essing
, di�e
rent s
tanda
rds
requir
e imp
lemen
tation
ofdi�
erent
algori
thms
, each w
ith
itsow
n tim
ingco
nstra
ints.W
hile fl
exibi
lity a
ndlow
powe
r
areun
avoid
able
requir
emen
tsfor
such
syste
ms, t
hepe
r-
forma
nce i
s cru
cial,
since
SDR
syste
msne
edto
comp
ly
with
extre
mely
tight
timing
cons
traint
s.
The s
imple
stap
proa
chfor
a mult
i-stan
dard
baseb
and
proc
essor
isto
use a
DSP
and d
efine
the r
eceivi
ngch
ain
insoftw
are. H
owev
er,a t
raditi
onal
DSP
isno
t pow
erful
enou
ghfor
the c
urren
t app
licati
ons a
ndits
powe
r con
-
sump
tion c
anbe
too h
ighfor
use i
n mob
ilede
vices
[12].
There
fore,
ad-ho
cha
rdwa
reblo
cks a
rene
eded
forth
e
execu
tion o
f spe
cific k
ernels
.
Some
baseb
and
solut
ions p
resen
t the
coup
ling
ofa
gene
ral-pu
rpose c
orewi
thde
dicate
dco
-proc
essors
able
toha
ndle
SDR
kern
els(fo
r exa
mple
Expr
essopla
tform
in[1]
).In
parti
cular
, the
appli
catio
n-spe
cific a
cceler
ators
supp
orted
the W
ideband
Code
Divis
ionMult
iple A
ccess
(W-C
DMA
[2]) a
ndOr
thogona
l Freq
uenc
y Divi
sion M
ul-
tiplex
ing(O
FDM
[3]).
Apo
ssible
altern
ative
isto
use a
gene
ral-pu
rpose c
o-
proc
essor.
BUTT
ER, a
coars
e-grai
n reco
nfigu
rable
array,
Journ
alofSig
nalPro
cess
ing
Syst
em
sm
anusc
riptN
o.
(willbe
inse
rted
by
the
editor
)
Robert
oAirold
i·
TapaniAhonen
·Fabio
Garz
ia·
Dra
gom
irM
ilojevic
·
Jari
Nurm
i
Implem
entatio
nofW
-CDM
ACell
Search
on
aH
ighly
Paralleland
Scalable
MPSoC
the
date
ofre
ceip
tand
accepta
nce
should
be
inse
rted
late
r
Abst
ract
Theper
form
ance
ofth
eW
-CDM
Ace
llse
arch
algo
rith
mca
nbe
sign
ifica
ntly
impro
ved
using
hom
oge-
neo
us
gener
alpurp
ose
Multi-Pro
cess
orSyst
em-o
n-C
hip
(MPSoC
)ar
chitec
ture
s.Theap
plica
tion
also
scal
eswell,
asth
enum
ber
ofpro
cess
ing
nodes
incr
ease
s,al
lowin
g
pra
ctical
acce
lera
tion
sto
bec
ome
clos
eto
the
theo
reti-
calm
axim
um
.In
this
wor
kwe
des
crib
ea
tem
pla
teM
P-
SoC
arch
itec
ture
bas
edon
multip
roce
ssor
com
puta
tion
al
clust
ers,
called
Nin
esilica.
Eac
hNin
esilica
consist
ofnin
e
pro
cess
ing
nodes
bas
edon
CO
FFEE
RIS
Car
chitec
ture
.
MPSoC
inte
r-an
din
tra-
clust
erco
mm
unicat
ion
are
en-
abled
usinghiera
rchical
Net
wor
k-o
n-C
hip
with
ded
icat
ed
poi
ntto
poi
ntan
dbro
adca
stco
mm
unicat
ion
serv
ices
for
bet
terper
form
ance
.Pro
pos
edte
mpla
tehas
bee
nuse
dto
inst
antiat
eco
mplete
syst
emswith
one
and
fourNin
esil-
icaclust
ers,
resu
ltin
gin
MPSoC
swith
resp
ective
ly9an
d
36co
mputa
tion
alnodes
.The
MPSoC
shav
ebee
nphysi-
cally
pro
toty
ped
ona
FPG
Adev
ice,
and
the
W-C
DM
A
cell
sear
chal
gorith
mhas
bee
nm
apped
onbot
hM
PSoC
pla
tfor
ms.
The
four
Nin
esilica
MPSoC
can
exec
ute
W-
CDM
Ain
20.5
ms
(at11
5MH
z,slow
modeim
plem
enta
-
tion
)with
the
tota
lsp
eed-u
pof
24.3
Xan
d3.
3Xwhen
com
par
edto
asingl
epro
cess
ing
core
syst
eman
dto
a
singl
eNin
esilica
clust
erre
spec
tive
ly.
Keyword
sM
ulti-Pro
cess
orSyst
em-o
n-C
hip
· Net
wor
k-
on-C
hip
·W
-CDM
A·Sof
twar
eDefi
ned
Rad
io
E=
mc2
(1)
Auth
ors
would
like
toth
ank
Ste
phen
Burg
ess
forhis
invalu
-
able
help
.
Robert
oAirold
i,TapaniAhonen,Fabio
Garz
ia,Jari
Nurm
i
Depart
ment
of
Com
pute
rSyst
em
s,Tam
pere
Univ
ers
ity
of
Technolo
gy,
Kork
eakoulu
nkatu
1,
PO
box
553,
FI
33101,
Tam
pere
,Fin
land
E-m
ail:firs
tnam
e.last
nam
e@
tut.fi
Dra
gom
irM
ilojevic
Univ
ers
ité
Lib
rede
Bru
xelles
-ULB,
Faculty
of
Applied
Sciences,
Bio
,Electr
oand
Mechanical
Syst
em
s,ULB
CP-
165/56,Av.F.Roose
velt
50,B-1
050
Bru
xelles,
Belg
ium
.
E-m
ail:Dra
gom
ir.M
ilojevic@
ulb
.ac.b
e
1In
troduction
The
evol
ution
ofm
obile
dev
ices
over
the
pas
tye
arsan
d
the
vid
eva
riet
yof
wireles
sco
nnec
tivity
pro
toco
lsre
-
sulted
inth
edev
elop
men
tof
theSof
twar
eDefi
ned
Rad
io
(SDR).
This
tech
nol
ogy
willre
pla
cetr
aditio
nal
wireles
s
rece
iver
s,builtto
suppor
ton
lyon
era
dio
pro
toco
lsince
the
RF
com
pon
ents
are
mos
tof
the
tim
edes
igned
ac-
cord
ing
toon
lyon
etr
ansm
ission
freq
uen
cy.Thus,
the
bas
eban
dpro
cess
orsu
ppor
tson
lyth
eal
gorith
ms
use
d
forasp
ecifi
cst
andar
d.In
anSDR
dev
ice,
thesa
mehar
d-
war
eis
able
tore
ceiv
esign
als
from
di�
eren
tfreq
uen
cy
ban
dsan
ddec
odeth
emusing
di�
eren
tpro
toco
ls.Ther
e-
fore
,th
esa
me
rece
iver
can
synch
ronise
with
di�
eren
t
wireles
snet
wor
ks
(cellu
lar
net
wor
ks,
Wi-Fi,
WiM
AX,
etc...).
Unfo
rtunat
ely,
pra
ctical
des
ign
ofsu
chdev
ices
is
quite
com
plex.The
RF
fron
t-en
d,fo
rex
ample,sh
ould
beab
leto
rece
iveth
esign
alsusing
very
bro
adfreq
uen
cy
spec
trum
.For
bas
eban
dpro
cess
ing,
di�
eren
tst
andar
ds
requireim
plem
enta
tion
ofdi�
eren
tal
gorith
ms,
each
with
itsow
ntim
ingco
nst
rain
ts. W
hileflex
ibilityan
dlow
pow
er
are
unav
oidab
lere
quirem
ents
forsu
chsy
stem
s,th
eper
-
form
ance
iscr
ucial
,since
SDR
syst
ems
nee
dto
com
ply
with
extr
emely
tigh
ttim
ing
const
rain
ts.
Thesim
plest
appro
ach
foram
ulti-st
andar
dbas
eban
d
pro
cess
oris
touse
aDSP
and
defi
neth
ere
ceiv
ing
chai
n
inso
ftwar
e.How
ever
,a
trad
itio
nal
DSP
isnot
pow
erfu
l
enou
ghfo
rth
ecu
rren
tap
plica
tion
san
dits
pow
erco
n-
sum
ption
can
be
too
hig
hfo
ruse
inm
obile
dev
ices
[12]
.
Ther
efor
e,ad
-hoc
har
dwar
eblo
cks
are
nee
ded
for
the
exec
ution
ofsp
ecifi
cke
rnels.
Som
ebas
eban
dso
lution
spre
sent
the
coupling
ofa
gener
al-p
urp
ose
core
with
ded
icat
edco
-pro
cess
ors
able
tohan
dle
SDR
kern
els
(for
exam
ple
Expre
sso
pla
tfor
m
in[1
]).In
par
ticu
lar,
theap
plica
tion
-spec
ificac
celera
tors
suppor
ted
theW
ideb
and
Cod
eDivision
Multiple
Acces
s
(W-C
DM
A[2
])an
dOrthog
onalFrequ
ency
Division
Mul-
tiplexing
(OFDM
[3]).
Apos
sible
alte
rnat
iveis
touse
age
ner
al-p
urp
oseco
-
pro
cess
or.BUTTER,aco
arse
-gra
inre
configu
rable
arra
y,
Journal of Signal Processing Systems manuscript No.
(will be inserted by the editor)
Roberto Airoldi · Tapani Ahonen · Fabio Garzia · Dragomir Milojevic ·
Jari NurmiIm
plementation
of W-CDM
ACell Search
ona
Highly
Parallel andScalable M
PSoC
the date of receipt and acceptance should be inserted later
Abstract The performance of the W-CDMA cell search
algorithmcan be significantly improved using homoge-
neous general purpose Multi-Processor System-on-Chip
(MPSoC) architectures. The application also scales well,
as the number of processing nodes increases, allowing
practical accelerations to become close to the theoreti-
cal maximum. In this work we describe a template MP-
SoC architecture based on multiprocessor computational
clusters, called Ninesilica. Each Ninesilica consist of nine
processing nodes based on COFFEE RISC architecture.
MPSoCinter- and intra-cluster communication are en-
abled using hierarchical Network-on-Chip with dedicated
point to point and broadcast communication services for
better performance. Proposed template has been used to
instantiate complete systems with one and four Ninesil-
ica clusters, resulting in MPSoCs with respectively 9 and
36 computational nodes. The MPSoCs have been physi-
cally prototyped on a FPGA device, and the W-CDMA
cell search algorithmhas been mapped on both MPSoC
platforms. The four Ninesilica MPSoCcan execute W-
CDMA in 20.5ms (at 115MHz, slow mode implementa-
tion) with the total speed-up of 24.3Xand 3.3X
when
compared to a single processing core systemand to a
single Ninesilica cluster respectively.
Keywords Multi-Processor System-on-Chip · Network-
on-Chip · W-CDMA · Software Defined Radio
E =mc 2
(1)
Authors wouldlike to thank Stephen
Burgess for his invalu-
able help.Roberto Airoldi, Tapani Ahonen, Fabio Garzia, Jari Nurmi
Departmentof Computer
Systems, TampereUniversity
of
Technology, Korkeakoulunkatu1, PO
box553, FI
33101,
Tampere, Finland
E-mail: [email protected]
Dragomir Milojevic
UniversitéLibre
deBruxelles
-ULB, Faculty
of Applied
Sciences, Bio, Electroand
Mechanical Systems, ULB
CP-
165/56, Av. F. Roosevelt 50, B-1050 Bruxelles, Belgium.
E-mail: [email protected]
1 Introduction
The evolution of mobile devices over the past years and
the vide variety of wireless connectivity protocols re-
sulted in the development of the Software Defined Radio
(SDR). This technology will replace traditional wireless
receivers, built to support only one radio protocol since
the RF components are most of the time designed ac-
cording to only one transmission frequency. Thus, the
baseband processor supports only the algorithms used
for a specific standard. In an SDR device, the same hard-
ware is able to receive signals fromdi�erent frequency
bands and decode themusing di�erent protocols. There-
fore, the same receiver can synchronise with di�erent
wireless networks (cellular networks, Wi-Fi, WiMAX,
etc...). Unfortunately, practical design of such devices is
quite complex. The RF front-end, for example, should
be able to receive the signals using very broad frequency
spectrum. For baseband processing, di�erent standards
require implementation of di�erent algorithms, each with
its own timing constraints. While flexibility and low power
are unavoidable requirements for such systems, the per-
formance is crucial, since SDRsystems need to comply
with extremely tight timing constraints.
The simplest approach for a multi-standard baseband
processor is to use a DSP and define the receiving chain
in software. However, a traditional DSP is not powerful
enough for the current applications and its power con-
sumption can be too high for use in mobile devices [12].
Therefore, ad-hoc hardware blocks are needed for the
execution of specific kernels.
Some baseband solutions present the coupling of a
general-purpose core with dedicated co-processors able
to handle SDRkernels (for example Expresso platform
in [1]). In particular, the application-specific accelerators
supported the Wideband Code Division Multiple Access
(W-CDMA [2]) and Orthogonal Frequency Division Mul-
tiplexing (OFDM[3]).
A possible alternative is to use a general-purpose co-
processor. BUTTER, a coarse-grain reconfigurable array,
Journ
alof
Signa
l Proc
essing
Syste
msma
nuscr
iptNo
.
(will b
e inser
tedby
theedi
tor)
Robe
rtoAi
roldi
· Tap
ani A
hone
n· F
abio
Garzi
a ·Dr
agomi
r Milo
jevic
·
Jari N
urmi
Imple
ment
ation
ofW
-CDM
ACe
llSe
arch
ona
High
ly
Para
llel a
ndSc
alable
MPS
oC
theda
teof
receip
t and
accep
tance
shou
ldbe
insert
edlat
er
Abstr
actTh
e perfo
rmanc
e of th
e W-CD
MAcell
search
algorit
hmcan
besign
ificant
lyimp
roved
using
homoge
-
neous
genera
l purp
oseMu
lti-Pro
cessor
System
-on-Ch
ip
(MPS
oC) a
rchitec
tures.
The a
pplica
tionalso
scales
well,
asthe
number
ofpro
cessin
g node
s incre
ases, a
llowing
practic
alacc
eleratio
nsto
becom
e close
tothe
theore
ti-
calma
ximum
. Inthi
s work
wedes
cribe a
templa
te MP-
SoCarc
hitect
urebas
edon
multip
rocess
or com
putatio
nal
cluste
rs,cal
ledNin
esilica
. Each
Ninesil
icacon
sistof n
ine
proces
sing n
odes b
ased o
n COF
FEE R
ISCarc
hitect
ure.
MPSoC
inter-
andint
ra-clu
ster c
ommu
nicatio
n are
en-
abled
using
hierar
chical
Netwo
rk-on-
Chip w
ithded
icated
point
to poin
t and b
roadca
st com
munic
ation s
ervices
for
better
perfor
mance
. Prop
osed t
empla
te has b
eenuse
d to
instan
tiate c
omple
tesys
tems w
ithone
andfou
r Nine
sil-
icaclu
sters,
result
ingin M
PSoC
s with
respec
tively 9
and
36com
putatio
nalnod
es.Th
e MPS
oCs h
avebee
n phys
i-
cally p
rototy
pedon
a FPG
A devi
ce,and
theW-
CDMA
cellsea
rchalg
orithm
hasbee
n mapp
edon
both M
PSoC
platfo
rms. T
hefou
r Nine
silica
MPSoC
canexe
cute W
-
CDMA
in 20.5
ms(at
115M
Hz, sl
owmo
deimp
lement
a-
tion) w
iththe
total s
peed-u
p of 2
4.3X
and3.3
Xwh
en
compar
edto
a sing
lepro
cessin
g core
system
andto
a
single
Ninesil
icaclu
ster re
spectiv
ely.
Keyw
ords M
ulti-P
rocess
or Syst
em-on
-Chip ·
Netwo
rk-
on-Ch
ip ·W-
CDMA
· Softw
areDe
fined R
adio
E= m
c2
(1)
Autho
rswo
uldlik
e to t
hank
Steph
enBu
rgess
forhis
invalu
-
able
help.
Robe
rtoAi
roldi,
Tapa
niAh
onen
, Fab
ioGa
rzia,
Jari
Nurm
i
Depa
rtmen
t of C
ompu
terSy
stems
, Tam
pere
Unive
rsity
of
Tech
nolog
y,Ko
rkeak
oulun
katu
1,PO
box
553,
FI33
101,
Tamp
ere, F
inlan
d
E-ma
il:firs
tname
.lastn
ame@
tut.fi
Drag
omir
Miloj
evic
Unive
rsité
Libre
deBr
uxell
es- U
LB, F
aculty
ofAp
plied
Scien
ces, B
io,El
ectro
and
Mech
anica
l Syst
ems,
ULB
CP-
165/
56, A
v.F.
Roose
velt 5
0,B-
1050
Brux
elles,
Belgi
um.
E-ma
il:Dr
agom
ir.Milo
jevic@
ulb.ac
.be
1 Intr
oduct
ion
The e
voluti
onof m
obile d
evices
over th
e past
years a
nd
thevid
e varie
tyof
wireles
s conn
ectivit
y prot
ocols
re-
sulted
in the
develo
pment
of the
Softwa
re Defin
edRa
dio
(SDR).
This t
echnol
ogywil
l repla
cetra
dition
al wirel
ess
receiv
ers, b
uiltto
suppor
t only
onerad
io prot
ocol si
nce
theRF
compon
ents a
remo
stof
thetim
e desig
nedac-
cordin
g to o
nlyone
transm
ission
frequen
cy.Th
us,the
baseba
ndpro
cessor
suppor
tsonl
y the
algorit
hms u
sed
fora sp
ecific st
andard
. Inan
SDR d
evice,
thesam
e hard
-
ware
is able
torec
eive s
ignals
from
di�ere
ntfreq
uency
bands
anddec
odethe
m usin
g di�e
rent p
rotoco
ls.Th
ere-
fore,
thesam
e rece
iver c
ansyn
chroni
sewit
h di�e
rent
wireles
s netw
orks (
cellula
r netw
orks,
Wi-Fi
, WiM
AX,
etc...)
. Unfo
rtunat
ely,pra
ctical d
esign o
f such
devices
is
quite
comple
x.Th
e RF f
ront-en
d,for
examp
le,sho
uld
beabl
e torec
eive th
e signa
ls usin
g very
broad
frequen
cy
spectr
um. F
orbas
eband
proces
sing, d
i�eren
t stan
dards
requir
e imple
menta
tionof d
i�eren
t algor
ithms
, each
with
itsow
n timin
g const
raints.
While
flexibil
ityand
lowpow
er
areuna
voidab
le requ
irement
s for su
chsys
tems, t
heper
-
forma
nceis c
rucial,
since
SDR s
ystem
s need
tocom
ply
with e
xtrem
elytigh
t timin
g cons
traint
s.
The si
mplest
approa
chfor
a mult
i-stand
ardbas
eband
proces
soris t
o use a
DSP a
nddefi
nethe
receiv
ingcha
in
insof
tware.
Howev
er,a t
raditio
nalDS
P is n
otpow
erful
enough
forthe
curren
t appl
ication
s and
itspow
ercon
-
sumpti
oncan
betoo
high f
oruse
in mobi
le devi
ces[12
].
There
fore,
ad-hoc
hardw
areblo
cksare
needed
forthe
execut
ionof s
pecific
kernel
s.
Some b
aseban
d solu
tions p
resent
thecou
pling o
f a
genera
l-purpo
secor
e with
dedica
tedco-
proces
sors a
ble
tohan
dleSD
R kern
els(fo
r exam
pleEx
presso
platfo
rm
in [1]).
Inpar
ticular,
theapp
lication
-specifi
c acce
lerator
s
suppor
tedthe
Wideb
andCo
deDiv
ision M
ultiple
Access
(W-CD
MA[2])
andOr
thogon
al Freq
uency
Divisio
n Mul-
tiplexi
ng(OF
DM[3])
.
A poss
iblealte
rnative
is to u
se agen
eral-pu
rpose c
o-
proces
sor. B
UTTE
R,a c
oarse-g
rainrec
onfigur
able a
rray,
Journal
of Signa
l Proces
singSyst
emsman
uscript
No.
(willbe in
serted by
the editor
)
Roberto
Airoldi
· Tapan
i Ahonen
· Fabio
Garzia
· Dragom
ir Miloje
vic·
JariNur
mi
Implem
entatio
n of W
-CDMA
Cell Se
archon
a High
ly
Paralle
l and S
calable
MPSoC
thedate
of recei
pt and a
cceptanc
e should
be inser
tedlate
r
Abstrac
t The per
formance
of the W-
CDMA ce
ll search
algorithm
canbe s
ignificant
ly improv
ed using
homoge-
neous gen
eralpurp
ose Multi-
Processor
System-o
n-Chip
(MPSoC)
architect
ures.The
applicati
on also s
caleswell,
as the nu
mber of
processin
g nodes
increases,
allowing
practical
accelerat
ionsto b
ecome clo
se tothe t
heoreti-
cal maxim
um.In th
is work w
e describe
a template
MP-
SoCarch
itecture b
asedon m
ultiproce
ssorcomp
utationa
l
clusters, c
alledNine
silica. Eac
h Ninesil
ica consis
t of nine
processin
g nodes b
asedon C
OFFEE R
ISCarch
itecture.
MPSoC i
nter-and
intra-clus
ter comm
unication
are en-
abledusing
hierarchic
al Networ
k-on-Chi
p with ded
icated
point to
point and
broadcas
t commun
ication se
rvices for
better per
formance
. Propose
d template
has been u
sed to
instantiat
e complete
systems w
ith one a
nd four N
inesil-
ica cluste
rs, resulti
ng inMPS
oCswith
respectiv
ely 9and
36 compu
tational n
odes. Th
e MPSoCs
havebeen
physi-
callyprot
otyped o
n a FPGA
device, an
d the W
-CDMA
cell search
algorithm
has been m
apped on
bothMPS
oC
platforms
. The fou
r Ninesil
ica MPSoC
canexec
ute W-
CDMA in
20.5ms (at
115MHz,
slowmod
e implem
enta-
tion)with
the total
speed-up
of 24.3X
and3.3X
when
compared
to asingl
e proces
singcore
system an
d toa
single Nin
esilica clu
sterresp
ectively.
Keywor
ds Multi-
Processor
System-o
n-Chip · N
etwork-
on-Chip ·
W-CDMA
· Softwar
e Defined
Radio
E =mc
2
(1)
Authors
would l
iketo t
hank St
ephen B
urgess fo
r his in
valu-
ablehelp
.
Roberto
Airoldi,
Tapani
Ahonen
, Fabio
Garzia,
JariNur
mi
Departm
entof C
omputer
System
s, Tamp
ereUni
versity
of
Techno
logy, K
orkeako
ulunkatu
1, PO b
ox553
, FI 33
101,
Tampere
, Finlan
d
E-mail:
firstnam
e.lastna
me@tut.
fi
Dragom
ir Miloje
vic
Univers
itéLib
re de B
ruxelles
- ULB,
Faculty
of Appli
ed
Sciences
, Bio,
Electro
andMec
hanical
System
s, ULB
CP-
165/56
, Av. F.
Rooseve
lt 50, B
-1050 B
ruxelles
, Belgiu
m.
E-mail:
Dragom
ir.Miloje
vic@ulb.
ac.be
1 Introd
uction
Theevolu
tionof m
obiledevic
es over th
e past ye
ars and
thevide
variety o
f wireles
s connec
tivity pr
otocols r
e-
sulted in
the develo
pment of
the Softw
are Define
d Radio
(SDR). Th
is techno
logywill
replace tr
aditional
wireless
receivers,
builtto su
pport onl
y one rad
io protoc
ol since
theRF
componen
ts are mo
st ofthe
timedesig
nedac-
cording t
o only on
e transmi
ssionfrequ
ency. Th
us, the
baseband
processor
supports
onlythe
algorithm
s used
for aspec
ific stand
ard.In an
SDRdevic
e, the sam
e hard-
wareis ab
le torecei
ve signal
s from d
i�erent fr
equency
bands and
decode th
em using
di�erent
protocols
. There-
fore,the
samerecei
vercan
synchron
ise with d
i�erent
wireless
networks
(cellular
networks
, Wi-Fi,
WiMAX,
etc...). U
nfortunat
ely, practi
cal design
of such d
evices is
quitecomp
lex.The
RFfront
-end, for
example,
should
be able t
o receive
the signa
ls using v
ery broad
frequency
spectrum
. Forbase
bandproc
essing, di
�erent st
andards
require im
plementa
tionof di
�erent al
gorithms,
eachwith
its own tim
ing const
raints. W
hile flexibi
lity and lo
w power
are unavo
idable re
quirement
s forsuch
systems,
the per-
formance
is crucial
, since S
DRsyste
ms need
to comply
withextre
melytight
timing con
straints.
Thesimp
lest appro
ach for a
multi-sta
ndard ba
seband
processor
is touse a
DSPand
define the
receiving
chain
in softwa
re. Howev
er, atrad
itional D
SP is not
powerful
enough fo
r thecurre
nt applic
ations an
d itspowe
r con-
sumption
canbe to
o high for
use in mo
biledevic
es [12].
Therefore
, ad-hoc
hardware
blocks are
needed fo
r the
execution
of specific
kernels.
Some ba
seband s
olutions
present th
e couplin
g ofa
general-p
urpose co
re with d
edicated
co-proce
ssorsable
to handl
e SDR ke
rnels(for
example E
xpresso p
latform
in [1]). In
particula
r, the app
lication-s
pecific acc
elerators
supported
the Wideba
nd Code
Division
Multiple
Access
(W-CDMA
[2]) and O
rthogonal
Frequenc
y Divisio
n Mul-
tiplexing
(OFDM
[3]).
A possibl
e alterna
tiveis to
use agene
ral-purpo
se co-
processor
. BUTTE
R, acoar
se-grain r
econfigur
ablearray
,that prepared a BIG question for XXth century:
“Could maths be automatized?”
6
-
Université libre de Bruxelles/Faculté des Sciences Appliquées/BEAMS/MILOJEVIC Dragomir
BIG QUESTION got an answer: BIG NO!Kurt Gödel, 1931
Journal of Signal Processing Systems manuscript No.(will be inserted by the editor)
Roberto Airoldi · Tapani Ahonen · Fabio Garzia · Dragomir Milojevic ·Jari Nurmi
Implementation of W-CDMA Cell Search on a HighlyParallel and Scalable MPSoC
the date of receipt and acceptance should be inserted later
Abstract The performance of the W-CDMA cell searchalgorithm can be significantly improved using homoge-neous general purpose Multi-Processor System-on-Chip(MPSoC) architectures. The application also scales well,as the number of processing nodes increases, allowingpractical accelerations to become close to the theoreti-cal maximum. In this work we describe a template MP-SoC architecture based on multiprocessor computationalclusters, called Ninesilica. Each Ninesilica consist of nineprocessing nodes based on COFFEE RISC architecture.MPSoC inter- and intra-cluster communication are en-abled using hierarchical Network-on-Chip with dedicatedpoint to point and broadcast communication services forbetter performance. Proposed template has been used toinstantiate complete systems with one and four Ninesil-ica clusters, resulting in MPSoCs with respectively 9 and36 computational nodes. The MPSoCs have been physi-cally prototyped on a FPGA device, and the W-CDMAcell search algorithm has been mapped on both MPSoCplatforms. The four Ninesilica MPSoC can execute W-CDMA in 20.5ms (at 115MHz, slow mode implementa-tion) with the total speed-up of 24.3X and 3.3X whencompared to a single processing core system and to asingle Ninesilica cluster respectively.
Keywords Multi-Processor System-on-Chip · Network-on-Chip · W-CDMA · Software Defined Radio
E = mc2 (1)
Authors would like to thank Stephen Burgess for his invalu-able help.
Roberto Airoldi, Tapani Ahonen, Fabio Garzia, Jari NurmiDepartment of Computer Systems, Tampere University ofTechnology, Korkeakoulunkatu 1, PO box 553, FI 33101,Tampere, FinlandE-mail: [email protected]
Dragomir MilojevicUniversité Libre de Bruxelles - ULB, Faculty of AppliedSciences, Bio, Electro and Mechanical Systems, ULB CP-165/56, Av. F. Roosevelt 50, B-1050 Bruxelles, Belgium.E-mail: [email protected]
1 Introduction
The evolution of mobile devices over the past years andthe vide variety of wireless connectivity protocols re-sulted in the development of the Software Defined Radio(SDR). This technology will replace traditional wirelessreceivers, built to support only one radio protocol sincethe RF components are most of the time designed ac-cording to only one transmission frequency. Thus, thebaseband processor supports only the algorithms usedfor a specific standard. In an SDR device, the same hard-ware is able to receive signals from di�erent frequencybands and decode them using di�erent protocols. There-fore, the same receiver can synchronise with di�erentwireless networks (cellular networks, Wi-Fi, WiMAX,etc...). Unfortunately, practical design of such devices isquite complex. The RF front-end, for example, shouldbe able to receive the signals using very broad frequencyspectrum. For baseband processing, di�erent standardsrequire implementation of di�erent algorithms, each withits own timing constraints. While flexibility and low powerare unavoidable requirements for such systems, the per-formance is crucial, since SDR systems need to complywith extremely tight timing constraints.
The simplest approach for a multi-standard basebandprocessor is to use a DSP and define the receiving chainin software. However, a traditional DSP is not powerfulenough for the current applications and its power con-sumption can be too high for use in mobile devices [12].Therefore, ad-hoc hardware blocks are needed for theexecution of specific kernels.
Some baseband solutions present the coupling of ageneral-purpose core with dedicated co-processors ableto handle SDR kernels (for example Expresso platformin [1]). In particular, the application-specific acceleratorssupported the Wideband Code Division Multiple Access(W-CDMA [2]) and Orthogonal Frequency Division Mul-tiplexing (OFDM [3]).
A possible alternative is to use a general-purpose co-processor. BUTTER, a coarse-grain reconfigurable array,
Jour
nal o
f Signa
l Pro
cessingSy
stem
s man
uscript N
o.
(will
beins
erted
byth
e edit
or)
Robe
rtoAi
roldi· T
apan
i Aho
nen
· Fab
ioGarzia
· Drago
mir
Milo
jevic
·
Jari
Nur
mi
Implem
entation
ofW
-CDM
ACellSe
arch
onaHighly
Paralle
l and
Scalab
leM
PSo
C
theda
teof
receiptan
dac
cept
ance
shou
ldbe
inserted
later
Abstract
The p
erform
ance
ofth
e W-C
DMA
cell s
earch
algori
thm
can
besig
nifica
ntly
impr
oved
using
homo
ge-
neou
s gen
eral p
urpo
seMult
i-Proces
sor S
ystem
-on-C
hip
(MPS
oC) a
rchite
ctures
. The
appli
catio
n also
scales
well,
asth
enu
mber
ofpr
ocess
ingno
des i
ncrea
ses, a
llowi
ng
prac
tical
accel
eratio
nsto
beco
meclo
seto
the t
heore
ti-
cal m
axim
um. I
n this
work
wede
scribe
a tem
plate
MP-
SoC
archit
ectur
e based
onmu
ltipr
ocess
orco
mput
ation
al
cluste
rs,ca
lled N
inesil
ica. E
ach N
inesil
icaco
nsist
ofnin
e
proc
essing
node
s based
onCO
FFEE
RISC
archit
ectur
e.
MPS
oCint
er-an
dint
ra-clu
ster c
ommu
nicati
onare
en-
abled
using
hierar
chica
l Netw
ork-on
-Chip
with
dedic
ated
point
topo
intan
d broa
dcast c
ommu
nicati
onser
vices
for
bette
r perf
orman
ce.Pr
oposed
templa
teha
s been
used
to
instan
tiate
comp
lete s
ystem
s with
one a
ndfou
r Nine
sil-
icaclu
sters,
result
ingin
MPS
oCs w
ithres
pecti
vely
9 and
36co
mput
ation
alno
des.
The M
PSoC
s hav
e been
physi-
cally
proto
typed
ona F
PGA
devic
e,an
d the
W-C
DMA
cell s
earch
algori
thm
has b
eenma
pped
onbo
thMPS
oC
platfo
rms.
The f
our N
inesil
icaMPS
oCca
nex
ecute
W-
CDMA
in20
.5ms(at
115M
Hz,
slow
mode
imple
menta-
tion)
with
the t
otal s
peed
-upof
24.3X
and
3.3X
when
comp
ared
toa
single
proc
essing
core
syste
man
dto
a
single
Nine
silica
cluste
r resp
ective
ly.
Key
word
sMult
i-Proces
sor S
ystem
-on-C
hip· N
etwork
-
on-C
hip· W
-CDM
A· S
oftwa
reDe
fined
Radio
E=
mc
2
(1)
Aut
hors
wouldlik
eto
than
kSt
ephe
nBu
rgessforhisinva
lu-
able
help.
Rob
erto
Airo
ldi,Ta
pani
Aho
nen,
FabioGarzia,
Jari
Nur
mi
Dep
artm
entof
Com
puterSy
stem
s,Ta
mpe
reUnive
rsity
of
Tech
nology
,Korke
akou
lunk
atu
1,PO
box
553,
FI33
101,
Tampe
re, F
inland
E-mail:fir
stna
me.lastna
me@
tut.fi
Drago
mir
Milo
jevic
Unive
rsité
Libr
ede
Brux
elles
-ULB
,Fa
culty
ofApp
lied
Scienc
es,Bi
o,El
ectro
and
Mecha
nica
lSy
stem
s,ULB
CP-
165/
56, A
v.F.
Roo
seve
lt50
, B-105
0Br
uxelles,
Belgium.
E-mail:Drago
mir.
Milo
jevic@
ulb.ac
.be
1Introd
uctio
n
The e
volut
ionof
mobil
e dev
ices o
ver t
hepa
stye
arsan
d
the
vide
varie
tyof
wirel
essco
nnect
ivity
proto
cols
re-
sulte
d in t
hede
velop
ment
ofth
e Soft
ware
Defin
edRa
dio
(SDR
).Th
istec
hnolo
gywi
llrep
lace t
raditi
onal
wirel
ess
receiv
ers, b
uilt t
o sup
port
only
one r
adio
proto
col s
ince
the R
Fco
mpon
ents
aremo
stof
the t
ime d
esign
edac
-
cord
ingto
only
onetra
nsmi
ssion
frequ
ency.
Thus
, the
baseb
and
proc
essor
supp
orts o
nlyth
ealg
orith
msus
ed
fora s
pecifi
c stan
dard
. In a
n SDR
devic
e,th
e sam
e hard
-
ware
isab
leto
receiv
e sign
alsfro
mdi�
erent
frequ
ency
band
s and
deco
deth
emus
ingdi�
erent
proto
cols.
There
-
fore,
the
same
receiv
erca
nsync
hron
isewi
thdi�
erent
wirel
essne
twork
s(ce
llular
netw
orks,
Wi-F
i,W
iMAX
,
etc...)
. Unfo
rtuna
tely,
prac
tical
desig
n of s
uch d
evice
s is
quite
comp
lex. T
heRF
front-en
d,for
exam
ple, s
hould
beab
leto
receiv
e the
signa
lsus
ingve
rybr
oad f
reque
ncy
spect
rum.
For b
aseb
and
proc
essing
, di�e
rent s
tanda
rds
requir
e imp
lemen
tation
ofdi�
erent
algori
thms
, each w
ith
itsow
n tim
ingco
nstra
ints.W
hile fl
exibi
lity a
ndlow
powe
r
areun
avoid
able
requir
emen
tsfor
such
syste
ms, t
hepe
r-
forma
nce i
s cru
cial,
since
SDR
syste
msne
edto
comp
ly
with
extre
mely
tight
timing
cons
traint
s.
The s
imple
stap
proa
chfor
a mult
i-stan
dard
baseb
and
proc
essor
isto
use a
DSP
and d
efine
the r
eceivi
ngch
ain
insoftw
are. H
owev
er,a t
raditi
onal
DSP
isno
t pow
erful
enou
ghfor
the c
urren
t app
licati
ons a
ndits
powe
r con
-
sump
tion c
anbe
too h
ighfor
use i
n mob
ilede
vices
[12].
There
fore,
ad-ho
cha
rdwa
reblo
cks a
rene
eded
forth
e
execu
tion o
f spe
cific k
ernels
.
Some
baseb
and
solut
ions p
resen
t the
coup
ling
ofa
gene
ral-pu
rpose c
orewi
thde
dicate
dco
-proc
essors
able
toha
ndle
SDR
kern
els(fo
r exa
mple
Expr
essopla
tform
in[1]
).In
parti
cular
, the
appli
catio
n-spe
cific a
cceler
ators
supp
orted
the W
ideband
Code
Divis
ionMult
iple A
ccess
(W-C
DMA
[2]) a
ndOr
thogona
l Freq
uenc
y Divi
sion M
ul-
tiplex
ing(O
FDM
[3]).
Apo
ssible
altern
ative
isto
use a
gene
ral-pu
rpose c
o-
proc
essor.
BUTT
ER, a
coars
e-grai
n reco
nfigu
rable
array,
Journ
alofSig
nalPro
cess
ing
Syst
em
sm
anusc
riptN
o.
(willbe
inse
rted
by
the
editor
)
Robert
oAirold
i·
TapaniAhonen
·Fabio
Garz
ia·
Dra
gom
irM
ilojevic
·
Jari
Nurm
i
Implem
entatio
nofW
-CDM
ACell
Search
on
aH
ighly
Paralleland
Scalable
MPSoC
the
date
ofre
ceip
tand
accepta
nce
should
be
inse
rted
late
r
Abst
ract
Theper
form
ance
ofth
eW
-CDM
Ace
llse
arch
algo
rith
mca
nbe
sign
ifica
ntly
impro
ved
using
hom
oge-
neo
us
gener
alpurp
ose
Multi-Pro
cess
orSyst
em-o
n-C
hip
(MPSoC
)ar
chitec
ture
s.Theap
plica
tion
also
scal
eswell,
asth
enum
ber
ofpro
cess
ing
nodes
incr
ease
s,al
lowin
g
pra
ctical
acce
lera
tion
sto
bec
ome
clos
eto
the
theo
reti-
calm
axim
um
.In
this
wor
kwe
des
crib
ea
tem
pla
teM
P-
SoC
arch
itec
ture
bas
edon
multip
roce
ssor
com
puta
tion
al
clust
ers,
called
Nin
esilica.
Eac
hNin
esilica
consist
ofnin
e
pro
cess
ing
nodes
bas
edon
CO
FFEE
RIS
Car
chitec
ture
.
MPSoC
inte
r-an
din
tra-
clust
erco
mm
unicat
ion
are
en-
abled
usinghiera
rchical
Net
wor
k-o
n-C
hip
with
ded
icat
ed
poi
ntto
poi
ntan
dbro
adca
stco
mm
unicat
ion
serv
ices
for
bet
terper
form
ance
.Pro
pos
edte
mpla
tehas
bee
nuse
dto
inst
antiat
eco
mplete
syst
emswith
one
and
fourNin
esil-
icaclust
ers,
resu
ltin
gin
MPSoC
swith
resp
ective
ly9an
d
36co
mputa
tion
alnodes
.The
MPSoC
shav
ebee
nphysi-
cally
pro
toty
ped
ona
FPG
Adev
ice,
and
the
W-C
DM
A
cell
sear
chal
gorith
mhas
bee
nm
apped
onbot
hM
PSoC
pla
tfor
ms.
The
four
Nin
esilica
MPSoC
can
exec
ute
W-
CDM
Ain
20.5
ms
(at11
5MH
z,slow
modeim
plem
enta
-
tion
)with
the
tota
lsp
eed-u
pof
24.3
Xan
d3.
3Xwhen
com
par
edto
asingl
epro
cess
ing
core
syst
eman
dto
a
singl
eNin
esilica
clust
erre
spec
tive
ly.
Keyword
sM
ulti-Pro
cess
orSyst
em-o
n-C
hip
· Net
wor
k-
on-C
hip
·W
-CDM
A·Sof
twar
eDefi
ned
Rad
io
E=
mc2
(1)
Auth
ors
would
like
toth
ank
Ste
phen
Burg
ess
forhis
invalu
-
able
help
.
Robert
oAirold
i,TapaniAhonen,Fabio
Garz
ia,Jari
Nurm
i
Depart
ment
of
Com
pute
rSyst
em
s,Tam
pere
Univ
ers
ity
of
Technolo
gy,
Kork
eakoulu
nkatu
1,
PO
box
553,
FI
33101,
Tam
pere
,Fin
land
E-m
ail:firs
tnam
e.last
nam
e@
tut.fi
Dra
gom
irM
ilojevic
Univ
ers
ité
Lib
rede
Bru
xelles
-ULB,
Faculty
of
Applied
Sciences,
Bio
,Electr
oand
Mechanical
Syst
em
s,ULB
CP-
165/56,Av.F.Roose
velt
50,B-1
050
Bru
xelles,
Belg
ium
.
E-m
ail:Dra
gom
ir.M
ilojevic@
ulb
.ac.b
e
1In
troduction
The
evol
ution
ofm
obile
dev
ices
over
the
pas
tye
arsan
d
the
vid
eva
riet
yof
wireles
sco
nnec
tivity
pro
toco
lsre
-
sulted
inth
edev
elop
men
tof
theSof
twar
eDefi
ned
Rad
io
(SDR).
This
tech
nol
ogy
willre
pla
cetr
aditio
nal
wireles
s
rece
iver
s,builtto
suppor
ton
lyon
era
dio
pro
toco
lsince
the
RF
com
pon
ents
are
mos
tof
the
tim
edes
igned
ac-
cord
ing
toon
lyon
etr
ansm
ission
freq
uen
cy.Thus,
the
bas
eban
dpro
cess
orsu
ppor
tson
lyth
eal
gorith
ms
use
d
forasp
ecifi
cst
andar
d.In
anSDR
dev
ice,
thesa
mehar
d-
war
eis
able
tore
ceiv
esign
als
from
di�
eren
tfreq
uen
cy
ban
dsan
ddec
odeth
emusing
di�
eren
tpro
toco
ls.Ther
e-
fore
,th
esa
me
rece
iver
can
synch
ronise
with
di�
eren
t
wireles
snet
wor
ks
(cellu
lar
net
wor
ks,
Wi-Fi,
WiM
AX,
etc...).
Unfo
rtunat
ely,
pra
ctical
des
ign
ofsu
chdev
ices
is
quite
com
plex.The
RF
fron
t-en
d,fo
rex
ample,sh
ould
beab
leto
rece
iveth
esign
alsusing
very
bro
adfreq
uen
cy
spec
trum
.For
bas
eban
dpro
cess
ing,
di�
eren
tst
andar
ds
requireim
plem
enta
tion
ofdi�
eren
tal
gorith
ms,
each
with
itsow
ntim
ingco
nst
rain
ts. W
hileflex
ibilityan
dlow
pow
er
are
unav
oidab
lere
quirem
ents
forsu
chsy
stem
s,th
eper
-
form
ance
iscr
ucial
,since
SDR
syst
ems
nee
dto
com
ply
with
extr
emely
tigh
ttim
ing
const
rain
ts.
Thesim
plest
appro
ach
foram
ulti-st
andar
dbas
eban
d
pro
cess
oris
touse
aDSP
and
defi
neth
ere
ceiv
ing
chai
n
inso
ftwar
e.How
ever
,a
trad
itio
nal
DSP
isnot
pow
erfu
l
enou
ghfo
rth
ecu
rren
tap
plica
tion
san
dits
pow
erco
n-
sum
ption
can
be
too
hig
hfo
ruse
inm
obile
dev
ices
[12]
.
Ther
efor
e,ad
-hoc
har
dwar
eblo
cks
are
nee
ded
for
the
exec
ution
ofsp
ecifi
cke
rnels.
Som
ebas
eban
dso
lution
spre
sent
the
coupling
ofa
gener
al-p
urp
ose
core
with
ded
icat
edco
-pro
cess
ors
able
tohan
dle
SDR
kern
els
(for
exam
ple
Expre
sso
pla
tfor
m
in[1
]).In
par
ticu
lar,
theap
plica
tion
-spec
ificac
celera
tors
suppor
ted
theW
ideb
and
Cod
eDivision
Multiple
Acces
s
(W-C
DM
A[2
])an
dOrthog
onalFrequ
ency
Division
Mul-
tiplexing
(OFDM
[3]).
Apos
sible
alte
rnat
iveis
touse
age
ner
al-p
urp
oseco
-
pro
cess
or.BUTTER,aco
arse
-gra
inre
configu
rable
arra
y,
Journal of Signal Processing Systems manuscript No.
(will be inserted by the editor)
Roberto Airoldi · Tapani Ahonen · Fabio Garzia · Dragomir Milojevic ·
Jari NurmiIm
plementation
of W-CDM
ACell Search
ona
Highly
Parallel andScalable M
PSoC
the date of receipt and acceptance should be inserted later
Abstract The performance of the W-CDMA cell search
algorithmcan be significantly improved using homoge-
neous general purpose Multi-Processor System-on-Chip
(MPSoC) architectures. The application also scales well,
as the number of processing nodes increases, allowing
practical accelerations to become close to the theoreti-
cal maximum. In this work we describe a template MP-
SoC architecture based on multiprocessor computational
clusters, called Ninesilica. Each Ninesilica consist of nine
processing nodes based on COFFEE RISC architecture.
MPSoCinter- and intra-cluster communication are en-
abled using hierarchical Network-on-Chip with dedicated
point to point and broadcast communication services for
better performance. Proposed template has been used to
instantiate complete systems with one and four Ninesil-
ica clusters, resulting in MPSoCs with respectively 9 and
36 computational nodes. The MPSoCs have been physi-
cally prototyped on a FPGA device, and the W-CDMA
cell search algorithmhas been mapped on both MPSoC
platforms. The four Ninesilica MPSoCcan execute W-
CDMA in 20.5ms (at 115MHz, slow mode implementa-
tion) with the total speed-up of 24.3Xand 3.3X
when
compared to a single processing core systemand to a
single Ninesilica cluster respectively.
Keywords Multi-Processor System-on-Chip · Network-
on-Chip · W-CDMA · Software Defined Radio
E =mc 2
(1)
Authors wouldlike to thank Stephen
Burgess for his invalu-
able help.Roberto Airoldi, Tapani Ahonen, Fabio Garzia, Jari Nurmi
Departmentof Computer
Systems, TampereUniversity
of
Technology, Korkeakoulunkatu1, PO
box553, FI
33101,
Tampere, Finland
E-mail: [email protected]
Dragomir Milojevic
UniversitéLibre
deBruxelles
-ULB, Faculty
of Applied
Sciences, Bio, Electroand
Mechanical Systems, ULB
CP-
165/56, Av. F. Roosevelt 50, B-1050 Bruxelles, Belgium.
E-mail: [email protected]
1 Introduction
The evolution of mobile devices over the past years and
the vide variety of wireless connectivity protocols re-
sulted in the development of the Software Defined Radio
(SDR). This technology will replace traditional wireless
receivers, built to support only one radio protocol since
the RF components are most of the time designed ac-
cording to only one transmission frequency. Thus, the
baseband processor supports only the algorithms used
for a specific standard. In an SDR device, the same hard-
ware is able to receive signals fromdi�erent frequency
bands and decode themusing di�erent protocols. There-
fore, the same receiver can synchronise with di�erent
wireless networks (cellular networks, Wi-Fi, WiMAX,
etc...). Unfortunately, practical design of such devices is
quite complex. The RF front-end, for example, should
be able to receive the signals using very broad frequency
spectrum. For baseband processing, di�erent standards
require implementation of di�erent algorithms, each with
its own timing constraints. While flexibility and low power
are unavoidable requirements for such systems, the per-
formance is crucial, since SDRsystems need to comply
with extremely tight timing constraints.
The simplest approach for a multi-standard baseband
processor is to use a DSP and define the receiving chain
in software. However, a traditional DSP is not powerful
enough for the current applications and its power con-
sumption can be too high for use in mobile devices [12].
Therefore, ad-hoc hardware blocks are needed for the
execution of specific kernels.
Some baseband solutions present the coupling of a
general-purpose core with dedicated co-processors able
to handle SDRkernels (for example Expresso platform
in [1]). In particular, the application-specific accelerators
supported the Wideband Code Division Multiple Access
(W-CDMA [2]) and Orthogonal Frequency Division Mul-
tiplexing (OFDM[3]).
A possible alternative is to use a general-purpose co-
processor. BUTTER, a coarse-grain reconfigurable array,
Journ
alof
Signa
l Proc
essing
Syste
msma
nuscr
iptNo
.
(will b
e inser
tedby
theedi
tor)
Robe
rtoAi
roldi
· Tap
ani A
hone
n· F
abio
Garzi
a ·Dr
agomi
r Milo
jevic
·
Jari N
urmi
Imple
ment
ation
ofW
-CDM
ACe
llSe
arch
ona
High
ly
Para
llel a
ndSc
alable
MPS
oC
theda
teof
receip
t and
accep
tance
shou
ldbe
insert
edlat
er
Abstr
actTh
e perfo
rmanc
e of th
e W-CD
MAcell
search
algorit
hmcan
besign
ificant
lyimp
roved
using
homoge
-
neous
genera
l purp
oseMu
lti-Pro
cessor
System
-on-Ch
ip
(MPS
oC) a
rchitec
tures.
The a
pplica
tionalso
scales
well,
asthe
number
ofpro
cessin
g node
s incre
ases, a
llowing
practic
alacc
eleratio
nsto
becom
e close
tothe
theore
ti-
calma
ximum
. Inthi
s work
wedes
cribe a
templa
te MP-
SoCarc
hitect
urebas
edon
multip
rocess
or com
putatio
nal
cluste
rs,cal
ledNin
esilica
. Each
Ninesil
icacon
sistof n
ine
proces
sing n
odes b
ased o
n COF
FEE R
ISCarc
hitect
ure.
MPSoC
inter-
andint
ra-clu
ster c
ommu
nicatio
n are
en-
abled
using
hierar
chical
Netwo
rk-on-
Chip w
ithded
icated
point
to poin
t and b
roadca
st com
munic
ation s
ervices
for
better
perfor
mance
. Prop
osed t
empla
te has b
eenuse
d to
instan
tiate c
omple
tesys
tems w
ithone
andfou
r Nine
sil-
icaclu
sters,
result
ingin M
PSoC
s with
respec
tively 9
and
36com
putatio
nalnod
es.Th
e MPS
oCs h
avebee
n phys
i-
cally p
rototy
pedon
a FPG
A devi
ce,and
theW-
CDMA
cellsea
rchalg
orithm
hasbee
n mapp
edon
both M
PSoC
platfo
rms. T
hefou
r Nine
silica
MPSoC
canexe
cute W
-
CDMA
in 20.5
ms(at
115M
Hz, sl
owmo
deimp
lement
a-
tion) w
iththe
total s
peed-u
p of 2
4.3X
and3.3
Xwh
en
compar
edto
a sing
lepro
cessin
g core
system
andto
a
single
Ninesil
icaclu
ster re
spectiv
ely.
Keyw
ords M
ulti-P
rocess
or Syst
em-on
-Chip ·
Netwo
rk-
on-Ch
ip ·W-
CDMA
· Softw
areDe
fined R
adio
E= m
c2
(1)
Autho
rswo
uldlik
e to t
hank
Steph
enBu
rgess
forhis
invalu
-
able
help.
Robe
rtoAi
roldi,
Tapa
niAh
onen
, Fab
ioGa
rzia,
Jari
Nurm
i
Depa
rtmen
t of C
ompu
terSy
stems
, Tam
pere
Unive
rsity
of
Tech
nolog
y,Ko
rkeak
oulun
katu
1,PO
box
553,
FI33
101,
Tamp
ere, F
inlan
d
E-ma
il:firs
tname
.lastn
ame@
tut.fi
Drag
omir
Miloj
evic
Unive
rsité
Libre
deBr
uxell
es- U
LB, F
aculty
ofAp
plied
Scien
ces, B
io,El
ectro
and
Mech
anica
l Syst
ems,
ULB
CP-
165/
56, A
v.F.
Roose
velt 5
0,B-
1050
Brux
elles,
Belgi
um.
E-ma
il:Dr
agom
ir.Milo
jevic@
ulb.ac
.be
1 Intr
oduct
ion
The e
voluti
onof m
obile d
evices
over th
e past
years a
nd
thevid
e varie
tyof
wireles
s conn
ectivit
y prot
ocols
re-
sulted
in the
develo
pment
of the
Softwa
re Defin
edRa
dio
(SDR).
This t
echnol
ogywil
l repla
cetra
dition
al wirel
ess
receiv
ers, b
uiltto
suppor
t only
onerad
io prot
ocol si
nce
theRF
compon
ents a
remo
stof
thetim
e desig
nedac-
cordin
g to o
nlyone
transm
ission
frequen
cy.Th
us,the
baseba
ndpro
cessor
suppor
tsonl
y the
algorit
hms u
sed
fora sp
ecific st
andard
. Inan
SDR d
evice,
thesam
e hard
-
ware
is able
torec
eive s
ignals
from
di�ere
ntfreq
uency
bands
anddec
odethe
m usin
g di�e
rent p
rotoco
ls.Th
ere-
fore,
thesam
e rece
iver c
ansyn
chroni
sewit
h di�e
rent
wireles
s netw
orks (
cellula
r netw
orks,
Wi-Fi
, WiM
AX,
etc...)
. Unfo
rtunat
ely,pra
ctical d
esign o
f such
devices
is
quite
comple
x.Th
e RF f
ront-en
d,for
examp
le,sho
uld
beabl
e torec
eive th
e signa
ls usin
g very
broad
frequen
cy
spectr
um. F
orbas
eband
proces
sing, d
i�eren
t stan
dards
requir
e imple
menta
tionof d
i�eren
t algor
ithms
, each
with
itsow
n timin
g const
raints.
While
flexibil
ityand
lowpow
er
areuna
voidab
le requ
irement
s for su
chsys
tems, t
heper
-
forma
nceis c
rucial,
since
SDR s
ystem
s need
tocom
ply
with e
xtrem
elytigh
t timin
g cons
traint
s.
The si
mplest
approa
chfor
a mult
i-stand
ardbas
eband
proces
soris t
o use a
DSP a
nddefi
nethe
receiv
ingcha
in
insof
tware.
Howev
er,a t
raditio
nalDS
P is n
otpow
erful
enough
forthe
curren
t appl
ication
s and
itspow
ercon
-
sumpti
oncan
betoo
high f
oruse
in mobi
le devi
ces[12
].
There
fore,
ad-hoc
hardw
areblo
cksare
needed
forthe
execut
ionof s
pecific
kernel
s.
Some b
aseban
d solu
tions p
resent
thecou
pling o
f a
genera
l-purpo
secor
e with
dedica
tedco-
proces
sors a
ble
tohan
dleSD
R kern
els(fo
r exam
pleEx
presso
platfo
rm
in [1]).
Inpar
ticular,
theapp
lication
-specifi
c acce
lerator
s
suppor
tedthe
Wideb
andCo
deDiv
ision M
ultiple
Access
(W-CD
MA[2])
andOr
thogon
al Freq
uency
Divisio
n Mul-
tiplexi
ng(OF
DM[3])
.
A poss
iblealte
rnative
is to u
se agen
eral-pu
rpose c
o-
proces
sor. B
UTTE
R,a c
oarse-g
rainrec
onfigur
able a
rray,
Journal
of Signa
l Proces
singSyst
emsman
uscript
No.
(willbe in
serted by
the editor
)
Roberto
Airoldi
· Tapan
i Ahonen
· Fabio
Garzia
· Dragom
ir Miloje
vic·
JariNur
mi
Implem
entatio
n of W
-CDMA
Cell Se
archon
a High
ly
Paralle
l and S
calable
MPSoC
thedate
of recei
pt and a
cceptanc
e should
be inser
tedlate
r
Abstrac
t The per
formance
of the W-
CDMA ce
ll search
algorithm
canbe s
ignificant
ly improv
ed using
homoge-
neous gen
eralpurp
ose Multi-
Processor
System-o
n-Chip
(MPSoC)
architect
ures.The
applicati
on also s
caleswell,
as the nu
mber of
processin
g nodes
increases,
allowing
practical
accelerat
ionsto b
ecome clo
se tothe t
heoreti-
cal maxim
um.In th
is work w
e describe
a template
MP-
SoCarch
itecture b
asedon m
ultiproce
ssorcomp
utationa
l
clusters, c
alledNine
silica. Eac
h Ninesil
ica consis
t of nine
processin
g nodes b
asedon C
OFFEE R
ISCarch
itecture.
MPSoC i
nter-and
intra-clus
ter comm
unication
are en-
abledusing
hierarchic
al Networ
k-on-Chi
p with ded
icated
point to
point and
broadcas
t commun
ication se
rvices for
better per
formance
. Propose
d template
has been u
sed to
instantiat
e complete
systems w
ith one a
nd four N
inesil-
ica cluste
rs, resulti
ng inMPS
oCswith
respectiv
ely 9and
36 compu
tational n
odes. Th
e MPSoCs
havebeen
physi-
callyprot
otyped o
n a FPGA
device, an
d the W
-CDMA
cell search
algorithm
has been m
apped on
bothMPS
oC
platforms
. The fou
r Ninesil
ica MPSoC
canexec
ute W-
CDMA in
20.5ms (at
115MHz,
slowmod
e implem
enta-
tion)with
the total
speed-up
of 24.3X
and3.3X
when
compared
to asingl
e proces
singcore
system an
d toa
single Nin
esilica clu
sterresp
ectively.
Keywor
ds Multi-
Processor
System-o
n-Chip · N
etwork-
on-Chip ·
W-CDMA
· Softwar
e Defined
Radio
E =mc
2
(1)
Authors
would l
iketo t
hank St
ephen B
urgess fo
r his in
valu-
ablehelp
.
Roberto
Airoldi,
Tapani
Ahonen
, Fabio
Garzia,
JariNur
mi
Departm
entof C
omputer
System
s, Tamp
ereUni
versity
of
Techno
logy, K
orkeako
ulunkatu
1, PO b
ox553
, FI 33
101,
Tampere
, Finlan
d
E-mail:
firstnam
e.lastna
me@tut.
fi
Dragom
ir Miloje
vic
Univers
itéLib
re de B
ruxelles
- ULB,
Faculty
of Appli
ed
Sciences
, Bio,
Electro
andMec
hanical
System
s, ULB
CP-
165/56
, Av. F.
Rooseve
lt 50, B
-1050 B
ruxelles
, Belgiu
m.
E-mail:
Dragom
ir.Miloje
vic@ulb.
ac.be
1 Introd
uction
Theevolu
tionof m
obiledevic
es over th
e past ye
ars and
thevide
variety o
f wireles
s connec
tivity pr
otocols r
e-
sulted in
the develo
pment of
the Softw
are Define
d Radio
(SDR). Th
is techno
logywill
replace tr
aditional
wireless
receivers,
builtto su
pport onl
y one rad
io protoc
ol since
theRF
componen
ts are mo
st ofthe
timedesig
nedac-
cording t
o only on
e transmi
ssionfrequ
ency. Th
us, the
baseband
processor
supports
onlythe
algorithm
s used
for aspec
ific stand
ard.In an
SDRdevic
e, the sam
e hard-
wareis ab
le torecei
ve signal
s from d
i�erent fr
equency
bands and
decode th
em using
di�erent
protocols
. There-
fore,the
samerecei
vercan
synchron
ise with d
i�erent
wireless
networks
(cellular
networks
, Wi-Fi,
WiMAX,
etc...). U
nfortunat
ely, practi
cal design
of such d
evices is
quitecomp
lex.The
RFfront
-end, for
example,
should
be able t
o receive
the signa
ls using v
ery broad
frequency
spectrum
. Forbase
bandproc