
1628 IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 58, NO. 4, AUGUST 2011

A Fully Bidirectional Optical Network With Latency Monitoring Capability for the Distribution of Timing-Trigger and Control Signals in High-Energy Physics Experiments

Ioannis Papakonstantinou, Member, IEEE, Csaba Soos, Spyridon Papadopoulos, Student Member, IEEE, Stéphane Détraz, Christophe Sigaud, Pavel Stejskal, Sarah Storey, Jan Troska, Member, IEEE, François Vasey, Member, IEEE, and Izzat Darwazeh, Senior Member, IEEE

Abstract—The present paper discusses recent advances on a Passive Optical Network inspired Timing-Trigger and Control scheme for the future upgrade of the TTC system installed in the LHC experiments, and more specifically of the link currently known as the TTCex to TTCrx link. The timing PON is implemented with commercially available FPGAs and 1-Gigabit Ethernet PON transceivers and provides a fixed latency gigabit downlink that can carry level-1 trigger accept decisions and commands, as well as an upstream link for feedback from the front-end electronics.

Index Terms—Field-programmable gate arrays and gigabit links, fixed and low latency serial links, passive optical networks in high energy physics experiments, timing trigger and control applications.

I. INTRODUCTION

POINT-TO-MULTIPOINT optical links are extensively deployed in the Large Hadron Collider (LHC) for Timing-Trigger and Control (TTC) applications [1]–[5]. These links are unidirectional and carry the level-1 trigger accept (L1A) decision as well as individually addressed or broadcast commands for the various detector sub-partitions [1]–[3]. In the reverse direction, a separate electrical “throttle/busy” link is responsible for communicating the status of the front-end buffers to the trigger control system (TCS) [4]. Prior to reaching the TCS, the “throttle/busy” signals from all front-end buffers within the same partition are logically combined to generate a single signal per detector partition, introducing yet another layer of complexity. The TTC links will be upgraded in the future, as some of their key components, such as the TTC laser transmitter, have already started becoming obsolete and, in addition, it is desirable

Manuscript received June 18, 2010; revised November 19, 2010 and February 21, 2011; accepted March 31, 2011. Date of publication June 16, 2011; date of current version August 17, 2011. This work was supported in part by ACEOLE, a Marie Curie mobility action at CERN, funded by the European Commission under the 7th Framework Programme.

I. Papakonstantinou, C. Soos, S. Papadopoulos, S. Détraz, C. Sigaud, P. Stejskal, S. Storey, J. Troska, and F. Vasey are with CERN—European Organization for Nuclear Research, PH Department, CERN CH 1211, Geneva 23, Switzerland (e-mail: [email protected]).

I. Darwazeh is with the Department of Electronic and Electrical Engineering, University College London, London, WC1E 7JE, U.K.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNS.2011.2154364

to simplify the network and to enhance the system’s functionalities. A significant simplification in the network architecture, and in the number of transmission cables and components that need to be installed and maintained, would occur if the TTC link could be made bidirectional and the TTC and “throttle/busy” networks were consolidated over the same transmission medium. The objective of this work is to develop a general purpose bidirectional, serial optical link for TTC and feedback signal distribution, by exploiting the rapidly emerging passive optical network (PON) transceiver technology in conjunction with field programmable gate arrays (FPGAs). The current paper presents the main features of the developed PON-inspired TTC prototype, and reports on the limitations of such a scheme and on future work to overcome them.

A number of papers have reported on FPGA based synchronous optical links, some of which utilize PON transceivers. The authors in [5] have proposed a TTC distribution optical tree with FPGAs at the end nodes which guarantees fixed latency. However, the reference clock is made available to both ends of the link, a feature that is not available in the TTC systems we are targeting, and, in addition, the implemented optical link is unidirectional. The same authors proposed another FPGA based synchronous implementation [6] for the replacement of the obsolete GLink chip-set in the ATLAS experiment at CERN, but yet again this link is unidirectional. The White Rabbit consortium [7] has developed fixed latency, bidirectional optical networks by exploiting the synchronous Ethernet protocol and Gigabit-PON (GPON) transceivers. This scheme implements gigabit serial links, recovers the clock at the destination with low jitter, and constantly monitors the latency. However, signals are routed to the various destinations with Ethernet switches, which are expensive and might introduce an intolerable amount of latency in the delivery of the L1A signal if used in a TTC context. Finally, the authors in [8] also propose a bidirectional optical link with latency monitoring and correcting capability. However, their specific system design allows them to lock both source and destination clocks to a common reference which, as mentioned previously, is not the case in our system.

The paper is organized as follows. Section II briefly summarizes the TTC system already installed in the LHC.

0018-9499/$26.00 © 2011 IEEE


TABLE I NOMENCLATURE

Section III gives an overview of the proposed PON-TTC concept, approaching it from a system level. Details of the FPGA specific transceiver developments are presented in Section IV. Section V describes the tests carried out to qualify the proposed PON-TTC scheme. Section VI provides a list of potential future developments and improvements and, finally, Section VII concludes.

II. LHC TIMING TRIGGER AND CONTROL SYSTEM

The existing TTC architecture in the LHC is shown in Fig. 1(a) [1]–[3]. It consists of a TTCex module which communicates with a number of TTCrx receiver application specific integrated circuits (ASICs) [9], [10] via a passive optical tree. TTCex receives information from two channels activated by the trigger control system: channel A contains the level 1 trigger accept (L1A) decision and channel B carries general purpose commands for the synchronization and calibration of the subdetector partitions. TTCex multiplexes the two channels in time, encodes them and uses the data to drive a bank of up to 10 Fabry-Perot lasers. Data are transmitted through an optical fiber which is about 100 m long and are distributed to a maximum of 32 TTCrxs via an optical fan-out. The downstream data rate is 40 Mb/s and the bi-phase mark format is used to encode the data.

TTCrx acts as an interface between the TTC system and the detector partitions. Its function is to recover the LHC clock and distribute it to the front-end detector electronics. The clock is de-skewed to compensate for variable particle times of flight and cleaned before distribution. The clock can be tuned in fine steps of 104 ps and in coarse steps of 25 ns. The jitter of the recovered clock in the TTCrx is ps RMS and a QPLL (TI CDCL6010) filters it to ps RMS. TTCrx also demultiplexes channels A and B and delivers the synchronization commands, the L1A trigger-accept decisions and their associated bunch and event identification numbers to the front-end electronics.

A separate electrical “busy/throttle” link delivers feedback on the status of the front-end readout buffers and the data acquisition system to the trigger control system. In fact, the “busy” signals from all sub-partitions are merged in the fast merging modules (FMM) in the CMS case [11] or in the ROD Busy modules in the ATLAS case [12], so that only one signal per detector partition finally reaches the TCS. If a front-end buffer is about to overflow, a “warning” signal is issued and the TCS inhibits the L1A trigger-accept until the occupancy in the buffers falls below a predefined threshold and a “ready” signal is issued.

III. PON-TTC SYSTEM

A. Commercial Passive Optical Networks

Passive optical networks are bidirectional, point-to-multipoint optical networks which are employed in the last mile of commercial telecommunication networks to deliver high bandwidth to the end users [13]–[15]. A typical PON is shown in Fig. 2(a) and consists of a master node, or optical line terminal (OLT), which communicates with up to 64 slave nodes, or optical network units (ONUs), with the aid of a feeder fiber that can reach 20 km in length, a passive optical splitter and a number of shorter distribution fibers. Two directions of propagation are distinguished. In the downstream, the OLT broadcasts packets to all ONUs, which then decide whether to process them based on an address field embedded in the packets. In the upstream direction the transmission medium is shared among the ONUs, and a channel arbitration mechanism is adopted to avoid collisions from multiple ONU transmissions. The arbitration method of choice in the first generation commercial deployments has been time division multiple access (TDMA) [16], [17], but prototype symmetric bandwidth PONs based on wavelength division multiplexing (WDM) [18], sub-carrier multiplexing (SCM) [19], optical code division multiplexing (OCDMA) [20] and others have been extensively reported in the literature. PONs are also known as Fiber-To-The-Home/Premises (FTTH/P) networks and are now formally part of the Ethernet protocol (EPON) by IEEE [21]. A second protocol by ITU, called Gigabit-PON (GPON) [22], which is designed along the principles of asynchronous transfer mode (ATM) and satisfies more stringent timing requirements, also exists.

The work in this paper concentrates on TDMA-PONs, an important feature of which is that data are transmitted in the two opposite directions using two different wavelengths: 1490 nm in the downstream and 1310 nm in the upstream. The two wavelengths are combined/separated by dielectric optical filters in the transceivers, Fig. 2(a), and they co-propagate in a single optical fiber with insignificant interference. A reduction by a factor of 2 in the number of fibers and connectors is gained by basing the


Fig. 1. (a) TTC system currently installed in LHC and, (b) block diagram of PON-TTC prototype.

modules on this design rather than on the popular small form-factor pluggable (SFP) technology, Fig. 2(b).

B. PON-TTC Prototype

The developed PON-TTC system has the following characteristics: i) the downstream link is a low and fixed latency gigabit link which is capable of carrying the L1A and commands; ii) the 40 MHz LHC clock is recovered from the serial data in the receiver and a deskewing process based on the FPGA digital clock management (DCM) module makes it possible to fine tune its phase; iii) the recovered clock can be used to drive a high-speed serializer for further distribution of data to the detector front-end components; and iv) communication is bidirectional, with the upstream link being used to deliver feedback information and monitor the feeder fiber latency. The main properties of the PON-TTC system are summarized in Table II.

A simplified block diagram of the prototype PON-TTC system developed in PH-ESE at CERN is shown in Fig. 1(b). An equivalent TTCex instance has been realized by combining a Virtex-5 FPGA and a commercially available 1G-EPON OLT transceiver (OBL4333F by OESolutions), while two equivalent TTCrx instances have been realized by using a Virtex-5 FPGA and a Spartan-6 FPGA respectively and 1G-EPON ONU transceivers (OBN3433F by OESolutions). A photo of the actual PON-TTC prototype is shown in Fig. 3.

Trigger and Command Distribution Network: A RocketIO GTX transmitter tile of the Virtex-5 FPGA in the OLT implements the TTCex logic, which receives the reference 40 MHz LHC clock and generates synchronous frames containing the L1A trigger accept and the commands for the various detector partitions. The frames are serialized in the GTX Tx at a rate of 1.6 Gb/s and are routed out from the FPGA fabric onto a pair of


Fig. 2. (a) Commercial TDMA PON, (b) SFP+ and EPON OLT optical transceivers.

TABLE II PON-TTC CHARACTERISTICS

differential SubMiniature version A (SMA) connectors, Fig. 3. This electrical serial data stream is subsequently used to drive the laser in the 1G-EPON OLT transceiver, whose optical output is then delivered to the two TTCrxs via a 1 km single mode fiber (SMF). For practical reasons, only two equivalent TTCrxs have been implemented in the ONUs of our prototype, but we have emulated the losses of a system that supports 64 ONUs by using a 1:8 splitter and a variable optical attenuator. Note that implementing only two ONUs does not introduce any limitations, as all properties of interest in a bidirectional optical tree can be investigated with two end nodes.
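The loss emulation mentioned above can be illustrated with ideal splitting losses. The following is a minimal sketch, assuming an ideal 1:N splitter whose loss is 10·log10(N) dB and ignoring excess, connector and fiber losses; the function names are illustrative and are not taken from the prototype.

```python
import math


def ideal_split_loss_db(n_ports: int) -> float:
    """Ideal power-splitting loss of a 1:N splitter in dB (no excess loss)."""
    if n_ports < 1:
        raise ValueError("n_ports must be >= 1")
    return 10.0 * math.log10(n_ports)


def extra_attenuation_db(target_ports: int, installed_ports: int) -> float:
    """Attenuation to add after the installed splitter to emulate a larger one."""
    return ideal_split_loss_db(target_ports) - ideal_split_loss_db(installed_ports)


if __name__ == "__main__":
    print(f"1:8 ideal loss:  {ideal_split_loss_db(8):.2f} dB")
    print(f"1:64 ideal loss: {ideal_split_loss_db(64):.2f} dB")
    print(f"attenuator setting (ideal case): {extra_attenuation_db(64, 8):.2f} dB")
```

Under these idealized assumptions the variable attenuator would make up the difference between the 1:8 and 1:64 splitting losses; a real setting would also account for the excess loss of the actual components.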

Optical data received by the 1G-EPON ONU transceivers are converted into electrical data and then routed to the two FPGAs that emulate the TTCrx logic, whose main purpose is to recover a high quality clock from the input datastream and to separate commands from L1A triggers, as mentioned in Section II. The system has to operate under strict timing mandates. The latency through the TTC link must be kept as small as possible to prevent the front-end buffers from overflowing and, in addition, it should be both deterministic and fixed to maintain the stability of the phase of the recovered clock under power cycling or resets of the various electronics in the system pipeline. On the other hand, ONUs should be able to respond to the OLT by sharing the same transmission medium in the upstream direction. To avoid

Fig. 3. Photo of PON-TTC prototype.

collisions, a time multiplexing mechanism based on a simple round robin algorithm has been implemented, which has the advantage of keeping the logic in the ONU simple. According to the algorithm, the OLT passes a token to each ONU sequentially and the ONU that holds the token transmits its data over a predefined window in time. These data arrive at the OLT side in bursts, as the distribution fiber lengths cannot be guaranteed to be equal. The bursts have random phases between them and the receiver in the OLT should quickly adjust its decision point in time to be able to reconstruct the incoming data stream with a very low error rate.

C. Protocol of Communication

We begin with the latency critical downstream direction, where superframes are continuously transmitted, Fig. 4(a). Each superframe starts with a special comma character, which is used for frame alignment at the receiver. A 1-byte long field follows, which carries the L1A trigger decision. A single bit was sufficient for this purpose in earlier implementations [1], as the trigger decision field simply signaled whether an interesting event had occurred or not. However, it is expected that it will become necessary to distinguish between different types of events in future experiments, and so the trigger field has been extended to satisfy this requirement. An auxiliary field is then transmitted, whose functionality is left open to be defined in the future. Two further fields then follow, which carry the commands for the TTCrxs in the ONUs. The commands can be of broadcast type, in which case they need to be executed by all ONU instances, or they can address individual ONUs. The distinction between broadcast and individually addressed commands is possible due to the synchronous nature of the superframes. These fields constitute a subframe, the duration of which is precisely 25 ns at 1.6 Gb/s, so that triggers arrive at the ONUs at a 40


Fig. 4. (a) Frame structure in the downstream direction and (b) upstream channel occupation process. If packets coming from different ONUs have similar power levels then a short IFG is sufficient, as can be seen between the transmissions of two of the ONUs.

MHz rate, a prerequisite in today’s TTC system. There are 65 subframes within each superframe, with a total duration of 1.625 μs. Subframes are numbered from 1 to 65 and a local counter implemented in each ONU is incremented with the arrival of each new subframe. Each subframe contains commands for exactly one ONU unless the operation is set to the broadcast mode. This distinction is made with the first bit in the command field. Referring to the first subframe in Fig. 4(a) as an example, if this first bit is 1 then the command is read by the first ONU only; otherwise it is read by all ONUs. Similarly, the commands in the second subframe are intended for the second ONU only, unless we operate in the broadcast mode, and so on until the 65th and last subframe is transmitted. The 65th subframe is 3 bytes long to restore the symmetry in the superframe and ends with a special character which is used to implement the round robin algorithm in the upstream direction. The local counters in the ONUs are reset at the end of each superframe and they then wait for a new superframe. It should be noted that the ONUs recover the commands without errors but take no further action in the present prototype, as a TTC system is merely a messenger. It is left to other systems, such as the front-end detector sub-partitions, to react to these commands.
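The timing figures and the addressing rule of the downstream superframe can be checked with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the constant and function names are hypothetical, the addressing flag is modeled as a plain integer bit, and integer units (Mb/s and ns) are used to avoid rounding issues.

```python
LINE_RATE_MBPS = 1600          # 1.6 Gb/s serial line rate, from the text
SUBFRAME_NS = 25               # one subframe per 40 MHz LHC clock period
SUBFRAMES_PER_SUPERFRAME = 65  # from the text

# Mb/s multiplied by ns gives thousandths of a bit, hence the division by 1000.
SUBFRAME_LINE_BITS = LINE_RATE_MBPS * SUBFRAME_NS // 1000
SUPERFRAME_NS = SUBFRAMES_PER_SUPERFRAME * SUBFRAME_NS


def onu_reads_command(onu_number: int, subframe_number: int, first_bit: int) -> bool:
    """Return True if the ONU should read the command of this subframe.

    Assumed rule, following the text: first_bit == 1 means the command is
    addressed to the ONU whose number equals the subframe number, otherwise
    the command is a broadcast read by every ONU.
    """
    if first_bit == 1:
        return onu_number == subframe_number
    return True


if __name__ == "__main__":
    print(f"line bits per subframe: {SUBFRAME_LINE_BITS}")
    print(f"superframe duration: {SUPERFRAME_NS / 1000} us")
```

Each subframe therefore occupies 40 bits on the line, and 65 subframes add up to the 1.625 μs superframe duration quoted in the text.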

A unique 8-bit address has been pre-assigned to each ONU. The ONUs continuously receive the OLT superframes and decode the character which contains the address of the next ONU to occupy the upstream channel. Once an ONU finds a match with the character, it switches its laser on, transmits its data over a predefined window in time and, finally, switches its laser off, Fig. 4(b). An interframe gap (IFG) has been introduced between consecutive bursts to allow the burst mode receiver enough time to settle and adjust its decision threshold proportionally to the average power level of the next transmission. Unlike conventional optical links, where lasers are biased above threshold and stay on even when data are not transmitted, the lasers of the ONUs not occupying the transmission medium switch off. This prevents the accumulation of noise from a large number of sources, which can cause an unacceptable signal-to-noise ratio (SNR) at the OLT receiver. The IFG can last up to a few hundred ns in commercial 1G- and 10G-EPON networks, where large dynamic ranges, i.e., the ratio between the average power level of one burst and the average power level of its preceding burst, of the order of 20 dB should be tolerated, thus resulting in a considerable amount of upstream bandwidth being wasted. However, since our system will be installed in a well controlled environment, losses are anticipated to be well balanced amongst the various branches of the optical tree and so the IFG can be minimized by allowing the switch-on and switch-off processes to overlap as shown in Fig. 4(b), significantly saving bandwidth. The minimum IFG was measured to be 50 ns [25], before the quality of the upstream link started degrading for bursts with dynamic range dB, which is a sufficiently large margin for our balanced system. Each upstream frame begins with a training sequence of alternating ones and zeros to assist the receiver in adjusting the position of its vertical sampling point, before the actual payload of the ONU is transmitted.
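The token-passing behaviour described above can be sketched as follows. This is a simplified model, not the prototype firmware: the class and function names are hypothetical, the grant is modeled as a bare 8-bit address, and laser switching and the interframe gap are not represented.

```python
from itertools import cycle, islice
from typing import Iterable, Iterator


def token_sequence(onu_addresses: Iterable[int]) -> Iterator[int]:
    """OLT side: the address granted in each successive superframe (round robin)."""
    return cycle(list(onu_addresses))


class Onu:
    """ONU side: transmit only when the granted address matches our own."""

    def __init__(self, address: int) -> None:
        if not 0 <= address <= 0xFF:
            raise ValueError("ONU addresses are 8 bits wide")
        self.address = address

    def may_transmit(self, granted_address: int) -> bool:
        return granted_address == self.address


if __name__ == "__main__":
    onus = [Onu(0x01), Onu(0x02), Onu(0x03)]
    for grant in islice(token_sequence(o.address for o in onus), 6):
        talkers = [hex(o.address) for o in onus if o.may_transmit(grant)]
        print(f"grant {grant:#04x} -> transmitting: {talkers}")
```

Because every address is unique, exactly one ONU holds the token at any time, which is what keeps the upstream bursts from colliding.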

An important aspect of the PON-TTC is the latency introduced in the upstream direction. Currently installed “busy/throttle” systems have processing times that range between 2–4 μs, depending on the buffer overflow probability in each detector partition [2], [4]. On the other hand, each ONU occupies the upstream channel for 1.625 μs in our prototype and, in the worst case scenario, an ONU might have to buffer its data for a maximum of 102 μs before it is allowed to transmit them. However, such long latencies need not be the case, as the much more frequent auxiliary field can be used for the regulation of the upstream channel in a manner similar to the special character that ends each superframe, significantly compressing the waiting time. The minimum duration of the upstream frames has been calculated in Table III for the currently used and for other commercially available PON transceiver technologies, where a 4 byte payload has been assigned, as a bare minimum, for the representation of “busy” signals. According to Table III, the upstream frame duration can be reduced to 215 ns, in which case a new ONU address can be carried on


Fig. 5. Upstream blocking time as a function of total number of ONUs forvarious PON transceiver technologies.

TABLE III MIN UPSTREAM FRAME DURATION BREAKDOWN ANALYSIS

every ninth auxiliary field. Furthermore, Fig. 5 shows the maximum amount of time that an ONU needs to wait before it occupies the upstream channel as a function of the total number of ONUs in the network and for the calculated minimum frame durations. The requirement for 2–4 μs of upstream latency could be satisfied by the current prototype if the number of ONUs was reduced to 8–18. Moreover, if next generation 10GPON components were to be used, then 40–80 ONUs could theoretically be served by a TDMA PON-TTC whilst still complying with the upstream latency requirements. Lastly, there is an additional simplification that a TDMA PON-TTC can introduce to the network architecture: the multiplexing of all “busy” signals over the same fiber presents a natural way to merge these signals, a task currently undertaken by separate modules. If, for example, all ONUs at the branches of the splitter belong to the same detector partition, then the first upstream frame carrying a “warning” could instantly stimulate a command to throttle triggers. Such a rule is equivalent to the “OR” logical operation being performed by the FMM or the ROD Busy modules. More operations could be implemented for standalone testing and other purposes as well.
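The worst-case waiting time and the “OR” merging rule can be sketched as below. The model is an assumption made for illustration: an ONU that has just missed its slot waits for all other ONUs to transmit once, and no per-frame overhead beyond the frame duration is included. The ONU counts quoted in the text come from Table III and Fig. 5 and are not recomputed here.

```python
from typing import Iterable


def worst_case_wait_ns(n_onus: int, frame_ns: int) -> int:
    """Longest wait before an ONU regains the channel in a round robin.

    Assumed model: all the other (n_onus - 1) ONUs transmit one frame each.
    """
    if n_onus < 1:
        raise ValueError("at least one ONU is required")
    return (n_onus - 1) * frame_ns


def partition_busy(warnings: Iterable[bool]) -> bool:
    """Logical OR of the per-ONU warnings of one detector partition."""
    return any(warnings)


if __name__ == "__main__":
    # 64 ONUs, each occupying the channel for one 1.625 us superframe.
    print(f"{worst_case_wait_ns(64, 1625) / 1000:.1f} us")
    print(partition_busy([False, False, True]))
```

With 64 ONUs and 1.625 μs frames this simple model gives about 102 μs, in line with the worst-case buffering time quoted in the text.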

D. Latency Monitoring Scheme

The bidirectionality of the PON-TTC can be exploited to monitor variations in the downstream latency due to environmental changes such as, for example, temperature drifts or aging of components. The thermal time delay factor of single mode fibers is ps/km/°C [24], mainly attributed to fiber strain, which could introduce a large amount of uncertainty in the phase of the recovered clock, especially in systems using

Fig. 6. (a) Feeder fiber latency monitoring scheme and (b) full-ranging latencymonitoring scheme.

long distribution fibers, parts of which might even be installed outdoors.

The latency through the 1 km feeder fiber has been monitored in the PON-TTC by implementing the scheme shown in Fig. 6(a). There are two main modifications compared to the original system presented in Fig. 1(b). The first is that one output of the optical splitter has been looped back to another splitter port, so that downstream data are now circulated in the upstream direction. The second modification is that an ONU has been instantiated at the top of the optical tree next to the OLT, and upstream data are now delivered to both of them via a 1:2 splitter. Recall that data transmitted in the upstream and downstream directions use different wavelengths and therefore do not interfere. The ONU at the top recovers the clock from the circulated downstream data and rejects transmissions coming from the ONUs at the branches. It then shares the locally recovered clock with the OLT, where a fine phase comparator has been implemented. The phase of the recovered clock is compared with that of the reference clock, and so any deviations from the initial reference value can be detected. Finally, although not yet implemented, the OLT can instruct the ONUs to dynamically reprogram their look-up tables to compensate for these phase deviations.
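The comparison against an initial reference value can be sketched in software as follows. This is a hedged illustration only: the class name, the tolerance parameter and the choice of the 25 ns (40 MHz) period for wrapping are assumptions, and the actual comparator is implemented in the OLT FPGA.

```python
CLOCK_PERIOD_PS = 25_000  # assumed: phases are compared on the 40 MHz clock


def wrap_phase_ps(phase_ps: float, period_ps: float = CLOCK_PERIOD_PS) -> float:
    """Map a phase difference into the interval [-period/2, period/2)."""
    half = period_ps / 2
    return (phase_ps + half) % period_ps - half


class LatencyMonitor:
    """Track the drift of the recovered clock phase from its initial value."""

    def __init__(self, initial_phase_ps: float, tolerance_ps: float) -> None:
        self.initial_phase_ps = initial_phase_ps
        self.tolerance_ps = tolerance_ps  # hypothetical alarm threshold

    def deviation_ps(self, measured_phase_ps: float) -> float:
        return wrap_phase_ps(measured_phase_ps - self.initial_phase_ps)

    def out_of_tolerance(self, measured_phase_ps: float) -> bool:
        return abs(self.deviation_ps(measured_phase_ps)) > self.tolerance_ps


if __name__ == "__main__":
    monitor = LatencyMonitor(initial_phase_ps=1000, tolerance_ps=50)
    for phase in (1010, 1030, 1100):
        print(phase, monitor.deviation_ps(phase), monitor.out_of_tolerance(phase))
```

A deviation flagged this way would be the input to the compensation step that the text describes as not yet implemented.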

IV. HARDWARE IMPLEMENTATION

A. OLT Transmitter

A Virtex-5 FPGA is used for the OLT transmitter design, a simplified block diagram of which is shown in Fig. 7(a). The reference 40 MHz LHC clock is fed to a CDCE62005 phase locked loop (PLL) by Texas Instruments, which generates an exact copy of the LHC clock and also multiplies its frequency by 2 to produce an 80 MHz clock. This frequency multiplication step is necessary as the Virtex-5 GTX transceivers require a minimum clock frequency of 60 MHz to operate. The 40 MHz clock is then routed to a frame generator which generates one 32-bit frame every 25 ns to be transmitted in the downstream. Before these bits are allowed to be routed to the Tx-PCS part of the GTX transceiver, the first domain in the GTX transmitter, a gear-box logic has been implemented to perform the transition


Fig. 7. (a) Simplified block diagram of OLT transmitter, (b) gear-box operation and (c) timing of frames before and after gear-box.

of the operational clock frequency from the LHC 40 MHz to the internal FPGA 80 MHz. The gear-box receives the 32-bit frame synchronously with the 40 MHz clock at its input and outputs a 16-bit frame synchronously with the 80 MHz clock. The combined operation of the frame generator and the gear-box, which play the role of an equivalent TTCex in our system, is clarified in Fig. 7(b) and (c). The frame generator passes 4 bytes of information to the gear-box on the rising edges of the 40 MHz clock. The operation of the gear-box is equivalent to a multiplexer that outputs the first 2 of the received 4 bytes on the odd edges of the 80 MHz clock and the last 2 on the even edges. Every time a comma alignment character is generated, an “IscharK” line is toggled, indicating the existence of the comma character to the gear-box. The first edge of the TXUSRCLK (Tx-PCS internal parallel clock) that occurs immediately after the “IscharK” line is activated is by convention the first odd edge. In this way, data always exit the gear-box in the correct order, even after a transmitter reset which might leave the internal counters in arbitrary initial states. The relative timing characteristics and event occurrences are shown in Fig. 7(c).
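The multiplexing performed by the gear-box can be modeled in a few lines. This is a behavioural sketch rather than the FPGA logic: the function name is hypothetical, clock edges are represented only by the order of the output words, and the "first" two bytes are assumed to be the most significant ones.

```python
from typing import Iterable, Iterator


def gear_box(frames32: Iterable[int]) -> Iterator[int]:
    """Split each 32-bit frame (40 MHz side) into two 16-bit words (80 MHz side).

    Assumption: the first two bytes are the upper half of the 32-bit frame.
    """
    for frame in frames32:
        if not 0 <= frame < 1 << 32:
            raise ValueError("frames must fit in 32 bits")
        yield (frame >> 16) & 0xFFFF  # first 2 bytes, on the odd 80 MHz edge
        yield frame & 0xFFFF          # last 2 bytes, on the even 80 MHz edge


if __name__ == "__main__":
    for word in gear_box([0xAABBCCDD, 0x11223344]):
        print(f"{word:#06x}")
```

In the hardware the odd/even ordering is anchored to the “IscharK” line, which is what keeps the two halves from being swapped after a reset; the model above gets the same guarantee trivially from iteration order.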

Next, the 16-bit frames enter an 8b/10b encoder stage which produces parallel 20-bit wide words. A second clock domain transition then occurs between the Tx-PCS domain and the Tx-PMA domain, whose main task is to serialize the data. The Tx-PCS and Tx-PMA domains are clocked by TXUSRCLK and by XCLK (Tx-PMA internal parallel clock) respectively, which are not phase aligned, although they need to be for the correct operation of the serializer. The usual method to resolve phase differences between clocks is to employ an elastic buffer on the data path. However, since this buffer may introduce an amount of non-deterministic latency, it has been bypassed and the dedicated phase alignment circuit of the GTX transceiver has been used instead, which performs the desired alignment by utilizing the PMA PLL. Finally, data enter the PISO (parallel-in-serial-out) block, which serializes them with the aid of an 800 MHz double data rate clock, also coming from the PMA PLL, to generate the 1.6 Gb/s downlink serial datastream.
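The rates quoted for the transmitter chain are mutually consistent, which the following arithmetic sketch makes explicit. The variable names are illustrative and integer units (MHz, Mb/s, ps) are used throughout.

```python
PARALLEL_CLOCK_MHZ = 80       # TXUSRCLK / XCLK frequency, from the text
WORD_BITS_BEFORE_8B10B = 16   # gear-box output width


def encoded_bits(payload_bits: int) -> int:
    """8b/10b encoding expands every 8 payload bits into 10 line bits."""
    if payload_bits % 8:
        raise ValueError("8b/10b operates on whole bytes")
    return payload_bits * 10 // 8


WORD_BITS_AFTER_8B10B = encoded_bits(WORD_BITS_BEFORE_8B10B)
LINE_RATE_MBPS = PARALLEL_CLOCK_MHZ * WORD_BITS_AFTER_8B10B
DDR_CLOCK_MHZ = LINE_RATE_MBPS // 2          # two bits per clock cycle
BIT_PERIOD_PS = 1_000_000 // LINE_RATE_MBPS  # duration of one line bit

if __name__ == "__main__":
    print(WORD_BITS_AFTER_8B10B, LINE_RATE_MBPS, DDR_CLOCK_MHZ, BIT_PERIOD_PS)
```

The 20-bit words at 80 MHz give the 1.6 Gb/s line rate, which a double data rate serializer reaches with an 800 MHz clock and a 625 ps bit period.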

B. ONU Receiver

A block diagram of the ONU receiver is shown in Fig. 8(a). Although two different FPGA families, Virtex-5 and Spartan-6, were used to implement the two ONUs in order to emulate a real system implementation where more than one technology


Fig. 8. (a) Simplified block diagram of ONU receiver, (b) comma detect logic implementation and (c) timing of incoming frames relative to the two versions of the recovered 40 MHz clock.

can coexist, both receiver designs are similar. Data are first converted to electrical form in the 1G-EPON ONU receiver and the 1.6 Gb/s serial datastream is presented to the input of the FPGA receiver clock and data recovery (CDR) circuit. The CDR takes a serial clock from the locally shared PMA PLL, which approximately matches the frequency of the serial data, and adjusts its phase and frequency until it matches the transitions in the incoming data. In this way, a serial 800 MHz double data rate clock is recovered from the data. The CDR then retimes the data and passes them on to the deserializer, or serial-in-parallel-out (SIPO), circuit. Next, the PMA PLL divides the serial clock to generate the GTX/GTP receiver internal 80 MHz XCLK parallel clock, which is also fed to the SIPO. Serial data enter the SIPO block bit by bit on both edges of the 800 MHz double data rate serial clock (every 625 ps) and leave the SIPO in groups of 20 parallel bits every 12.5 ns. The operation of the divider is the most vulnerable part of the receiver with regard to achieving deterministic latency. This is because the 80 MHz XCLK parallel clock can find itself in any of 20 different states after a reset, as it can lock on any arbitrary edge of the serial 800 MHz clock. The phase of the XCLK clock can therefore vary in 20 discrete steps of 625 ps each, as $n \times 625$ ps, $n = 0, 1, \ldots, 19$. In order to detect the exact state of the parallel XCLK clock, a barrel shifter and comma character detection logic have been introduced after the SIPO, Fig. 8(b). Frames transmitted in the downstream always begin with a comma alignment character and therefore, in the event that the parallel clock is initiated on the first edge of the serial clock, the first bit of the comma character should come out from the first parallel line of the SIPO. However, if the parallel clock lags behind the serial clock, then the first bit of the comma character is transferred to a parallel line of the SIPO other than the first. By shifting the bits in the barrel shifter and counting how many steps it takes before the first bit of the comma character reaches the most significant bit (MSB), the state of XCLK is precisely determined. To demonstrate the operation of the barrel shifter, the phase of an invariant 80 MHz reference clock has been plotted in Fig. 9 against the phase of the clocks recovered in the two ONUs for all possible barrel shifter values. The relative phase difference between reference and recovered clocks follows a linear trend, with the slope of the two lines being 625 ps per barrel shifter step for both ONU implementations, proving the validity of our method. The information from the barrel shifter is then provided to a control logic which instructs the digital clock management (DCM) block to shift the phase of the 80 MHz clock back into alignment with the reference clock, with the shift values taken from a pre-constructed look-up table. The DCM must be dynamically reconfigured, as a different value of the barrel shifter might occur every time the receiver is reset. The final step is to halve the 80 MHz clock to derive a copy of the original LHC 40 MHz clock. This step also contains an element of non-deterministic latency, as there are two different versions of the 40 MHz clock that can be recovered, differing by 180°, as shown in Fig. 8. A final “comma detect” logic has been implemented in order to resolve this ambiguity. The “comma detect” logic samples the data from the barrel shifter with both versions of the 40 MHz clock and looks for the comma character. When a comma character is found a counter is incremented, Fig. 8(b). The operation of the whole concept is based on the fact that only one of the two copies of the 40 MHz clock will see the comma character, Fig. 8(c), causing its corresponding counter to increment; this is the version that is finally selected with the aid of a multiplexer. A copy of the 40


Fig. 9. Relative phase difference between ONU recovered and reference clocksas a function of barrel shifter value.

Fig. 10. Block diagram of OLT burst mode receiver based on oversamplingcircuit of the Virtex-5 GTX receiver.

MHz clock whose phase always remains unchanged, even after power cycles or FPGA resets on either the transmitter or the receiver side, is obtained with this method. After the phase is corrected, data are fed to an 8b/10b decoder (not shown in Fig. 8) and are then available for further processing.
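The barrel shifter idea, finding by how many bit positions the comma is displaced and converting that count into a clock phase, can be modeled on bit strings. This sketch makes several assumptions that are not stated in the text: the comma is taken to be the 8b/10b K28.5 code group with negative running disparity, the search runs over two consecutive 20-bit words, and the filler pattern and helper names are purely illustrative.

```python
from typing import List, Optional

COMMA_10B = "0011111010"  # assumed comma: K28.5, negative running disparity
WORD_BITS = 20            # SIPO output width
BIT_PERIOD_PS = 625       # one line bit at 1.6 Gb/s


def slice_words(bitstream: str, offset: int) -> List[str]:
    """Model a SIPO whose parallel clock locked `offset` bits into the stream."""
    usable = len(bitstream) - offset
    return [bitstream[offset + i: offset + i + WORD_BITS]
            for i in range(0, usable - usable % WORD_BITS, WORD_BITS)]


def barrel_shift_value(prev_word: str, cur_word: str,
                       comma: str = COMMA_10B) -> Optional[int]:
    """Number of one-bit shifts needed to bring the comma to the MSB."""
    window = prev_word + cur_word
    for shift in range(WORD_BITS):
        if window.startswith(comma, shift):
            return shift
    return None


def xclk_phase_ps(shift: int) -> int:
    """Phase offset of the parallel clock implied by the shift count."""
    return shift * BIT_PERIOD_PS


if __name__ == "__main__":
    filler = "10" * 10                        # comma-free filler word
    stream = filler + COMMA_10B + "01" * 5 + filler
    for lag in (0, 3, 19):
        words = slice_words(stream, WORD_BITS - lag)
        shift = barrel_shift_value(words[0], words[1])
        print(f"lag={lag} bits -> shift={shift}, phase={xclk_phase_ps(shift)} ps")
```

In the receiver described above, the shift count would then index the pre-constructed look-up table that drives the DCM phase correction.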

C. OLT Burst Mode Receiver

Data from each ONU are transmitted in the upstream in bursts. The OLT receiver has to be able to retrieve the information from all ONU bursts, a feature which excludes using the same receiver architecture in the OLT as the one implemented in the ONUs. This is because data coming from different ONUs might have an arbitrary phase difference between them, causing the CDR to lose lock. Each time the CDR needs to relock to a new burst, a reset process which lasts for 5 μs is initiated, leading to an excessive waste of bandwidth. We are therefore forced to pursue a different architecture, which is based on the oversampling capability of the GTX receiver. GTX receivers have 5x digital oversampling circuits built in, Fig. 10. If these are used, the CDR circuit does not try to lock on the incoming datastream; instead it is bypassed and data recovery takes place by making use of the blind oversampling technique [25]. In this case, the incoming data are blindly oversampled at 5x their nominal data rate, generating 5 distinct samples per bit. A decision logic subsequently collects all samples from

Fig. 11. Histograms of latency stability in Spartan-6 based ONU after 100 system resets.

a predefined window of incoming bits and implements a majority voting algorithm to identify those samples which occur closest to the center of each bit. Since the decision points are continuously updated at the end of each window of bits, data from different ONUs can be correctly received even though two sequential bursts might have random phase differences between them. A clock need not be recovered from the data; nonetheless, it is important to provide a sufficient number of transitions in the upstream data to facilitate the operation of the oversampling circuit. This condition is satisfied by incorporating the training sequence in the upstream frame shown in Fig. 4, which should be longer than 100 bits to ensure error-free operation [25].
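The idea behind blind oversampling can be sketched in a few lines of software. The code below is one simple variant under stated assumptions, not the decision logic of the GTX receiver or of the prototype: it votes on where the bit edges fall within the 5-sample grid over a window, then samples half a bit period after the most common edge position. The function names and the test pattern are hypothetical.

```python
from collections import Counter

OVERSAMPLING = 5  # samples per bit, as in the 5x oversampling mode


def edge_phase(samples):
    """Vote on the sample position (0..4) where bit transitions occur."""
    votes = Counter(
        i % OVERSAMPLING
        for i in range(1, len(samples))
        if samples[i] != samples[i - 1]
    )
    # Without transitions no phase can be estimated; fall back to 0.
    return votes.most_common(1)[0][0] if votes else 0


def recover_bits(samples):
    """Pick the sample closest to the center of each bit in the window."""
    center = (edge_phase(samples) + OVERSAMPLING // 2) % OVERSAMPLING
    return [samples[i] for i in range(center, len(samples), OVERSAMPLING)]


# A burst arriving with an arbitrary phase of 3 samples relative to the grid.
bits = [1, 0, 1, 1, 0, 0, 1, 0]
stream = [0] * 3 + [b for bit in bits for b in [bit] * OVERSAMPLING]
recovered = recover_bits(stream)
```

Because the vote is recomputed for every window, a new burst with a different phase simply moves the chosen sampling position, which is why no CDR relock is needed. The sketch also makes visible why transitions are required: with none in the window, `edge_phase` has nothing to vote on.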

V. PON-TTC SYSTEM CHARACTERIZATION

The experiments that were performed to characterize the quality of the recovered clocks are now presented.

A. Latency and Phase Stability of Recovered Clocks

The latency in the downstream direction has been measured and the results are presented in Table IV. The transmitter side contributes 77.11 ns, which corresponds to the cumulative time through the GTX transmitter and the 1G-EPON OLT transceiver and evaluation board. The latency through the EPON ONU transceiver board and FPGA receiver at the other end of the link is 139.65 ns, whereas the optical fiber adds another 5 ns/m. The total latency of the PON-TTC system, excluding the optical fiber, is therefore 216.77 ns, which is almost twice that of the current TTC system, whose total latency is 90–110 ns [8]. The excess PON-TTC latency implies that data need to be stored for longer in the detector before they can be read out, which is currently not an option in the LHC due to the limited front-end buffer depth. However, this prototype has not yet been optimized for lowest latency. In Section VI we present a strategy for reducing the PON-TTC latency to a level comparable to that of the current system. The phase stability of the recovered clocks in the ONUs has also been characterized after 100 system resets. A system reset is a sequence consisting of a power cycle of the TI CDCE62005 PLL, a reset of the OLT transmitter and a reset of the ONU receiver. The phase of the recovered clock is compared to the phase of the reference


Fig. 12. (a) Set-up used to characterize the quality of the ONU recovered clock based on BER measurements. (b) BER curves of serial data generated by ONU recovered clock and reference clock.

TABLE IV
CURRENT TTC AND PON-TTC LATENCY CHARACTERISTICS

40 MHz clock after each system reset and a histogram has been built, as shown in Fig. 11 for the Spartan-6 based ONU. The peak-to-peak variation of the recovered clock's phase is ps. The uncertainty in the average phase of the recovered clocks should, in theory, equal the DCM step used to perform the phase alignment task, which in our case is ps. The residual uncertainty is due to the implementation of the look-up table in the ONUs, and it should be possible to eliminate it.
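The downstream latency budget quoted in this subsection can be written out as a short calculation. This is a sketch using only the figures given in the text; the function names and the 100 m fiber length in the usage lines are illustrative assumptions. Summing the two rounded contributions gives 216.76 ns, which agrees with the quoted 216.77 ns to within rounding.

```python
# Latency contributions quoted in the text (ns).
TX_LATENCY_NS = 77.11        # GTX transmitter + OLT transceiver and board
RX_LATENCY_NS = 139.65       # ONU transceiver board + FPGA receiver
FIBER_NS_PER_M = 5.0         # optical fiber propagation delay
BUNCH_CLOCK_PERIOD_NS = 25.0  # one period of the 40 MHz LHC clock


def downstream_latency_ns(fiber_m: float = 0.0) -> float:
    """Total downstream latency for a given fiber length in meters."""
    return TX_LATENCY_NS + RX_LATENCY_NS + FIBER_NS_PER_M * fiber_m


def latency_in_clock_periods(latency_ns: float) -> float:
    """Express a latency in units of the 40 MHz clock period."""
    return latency_ns / BUNCH_CLOCK_PERIOD_NS


electronics_only = downstream_latency_ns()
with_fiber = downstream_latency_ns(fiber_m=100.0)  # hypothetical 100 m link
periods = latency_in_clock_periods(electronics_only)
```

Expressing the result in 40 MHz clock periods gives a feel for how much additional front-end buffering the excess latency would require.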

B. Quality of Recovered Clocks

Jitter characterization measurements have also been performed on the recovered clocks in the two ONUs and the results are summarized in Table V. The recovered clocks in the two FPGAs are of similar quality before they are phase shifted and frequency divided by the DCM, as both have ps of RMS jitter. However, there is a rather large discrepancy between the jitter values for the two FPGAs after the DCM. The 40 MHz recovered clock from the Virtex-5 ONU has a cycle-to-cycle jitter with an RMS value of 36.6 ps, while the RMS jitter of the clock from the Spartan-6 ONU is 53.12 ps. Both of these values are too high for the purpose of driving high-speed links; for example, National Semiconductor's DS25C400 Quad 2.5 Gbps SERDES requires an input clock with an RMS cycle-to-cycle jitter of ps [27] to operate. For this reason, an external PLL had to be incorporated in the link to clean the jitter. By using the CDCL6010 PLL by Texas Instruments [28], RMS jitter values of ps were achieved in both cases. In order to further qualify the use of the recovered clocks to drive a multigigabit serial link, we performed loop-back bit error rate (BER) measurements, the set-up of which is shown in

Fig. 12(a). The reference 40 MHz clock was used to generate a serial 3.2 Gb/s pseudo-random bit sequence (PRBS) with

bits, looped back to the receiver part of the SFP+ via an optical attenuator, and the BER was measured as a function of the average received (optical) power. Once this experiment was finished, the reference 40 MHz clock source was replaced with the recovered 40 MHz clock from the Spartan-6 ONU, which had shown the worse performance of the two ONUs, and the BER measurements were repeated. The BER tests were performed using a hardware design containing an integrated bit error ratio test (IBERT) core running in a Virtex-6 FPGA [26]. The experiment was configured and its results were monitored with the aid of the ChipScope Pro analyzer's IBERT console, running on a PC and interfacing to the Virtex-6 evaluation board through a USB cable. The objective of the two experiments was to reveal any penalty in the BER curves caused by the expected degradation of the recovered clocks, compared with the BER curves obtained with the reference clock. The results presented in Fig. 12(b) show indistinguishable performance of the examined serial link whether the reference 40 MHz clock or the clock recovered in the ONU is used, with no power penalty introduced by the process of transmitting the clock downstream and recovering it from the data, thus qualifying our clock recovery scheme for use in gigabit serial links. It is noted that all experiments to qualify the recovered clocks were performed with the upstream channels active, in order to include the effect of optical and electronic cross-talk.

C. Latency Monitoring

The feeder fiber was placed in a CTS T40/50 climate chamber and the phase of the clock recovered in the local ONU was compared with the phase of the reference 40 MHz clock in the OLT for temperatures in the range °C with a 5 °C step. The phase difference between the two clocks as a function of temperature is presented in Fig. 13, both in phase comparator DCM values, which is the output of our system, and in ps, measured separately on an oscilloscope. Measurements are referenced to room temperature, which assumes that the two clocks are perfectly aligned at 20 °C. The phase between the two clocks


TABLE V
PON-TTC JITTER CHARACTERISTICS

Fig. 13. Phase difference, in ps and in relative DCM values, between reference 40 MHz clock and ONU recovered clock as a function of the temperature in the feeder fiber. Room temperature (20 °C) was taken as reference.

increases linearly as temperature increases, with the slope of the curve being 49.21 ps/°C. Since the round trip length of the fiber is 2 km (the feeder fiber is 1 km and is traversed twice), the temperature delay gradient of the fiber is 24.5 ps/°C/km, which almost perfectly matches earlier measurements. The DCM was incremented by 6 steps on average every time the temperature in the chamber was changed by 5 °C, while the phase between the two clocks increased by 280 ps. Therefore, the sensitivity of the implemented phase monitoring scheme is 47 ps or °C for the given feeder fiber length.
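The two derived quantities in this paragraph follow from simple divisions, sketched below with the figures quoted in the text. The function names are illustrative assumptions. The results come out at roughly 24.6 ps/°C/km and 47 ps per DCM step, in line with the quoted values to within rounding.

```python
# Figures quoted in the text.
SLOPE_PS_PER_DEG_C = 49.21   # measured phase slope over the round trip
ROUND_TRIP_KM = 2.0          # 1 km feeder fiber traversed twice
PHASE_CHANGE_PS = 280.0      # phase change per 5 degree C temperature step
DCM_STEPS = 6                # average DCM increments per 5 degree C step


def delay_gradient_ps_per_deg_c_km() -> float:
    """Temperature delay gradient normalized to 1 km of fiber."""
    return SLOPE_PS_PER_DEG_C / ROUND_TRIP_KM


def phase_resolution_ps() -> float:
    """Phase change represented by one DCM step of the monitor."""
    return PHASE_CHANGE_PS / DCM_STEPS


gradient = delay_gradient_ps_per_deg_c_km()
resolution = phase_resolution_ps()
```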

VI. FUTURE DEVELOPMENTS

Some potential directions for the PON-TTC project will now be presented.

Downstream Latency: As mentioned in Section III, the latency in the downstream direction is 216.8 ns. This is almost twice that of the current TTC system, whose development is based on ASIC electronics which can be optimized by the designer for minimum latency. In contrast, FPGA based architectures involve a number of standard blocks which still introduce excess latency even when they are not used, as happens for example when the elastic buffers are bypassed. As all individual block latencies in the FPGAs are measured in clock cycles, the most straightforward way to reduce the latency is to double the internal clock frequency from 80 MHz to 160 MHz, which also implies increasing the serial data rate from 1.6 Gb/s to 3.2 Gb/s. This is feasible after the recent emergence of the 10G-EPON standard [23], which has made optical transceivers running at these speeds commercially available.

Full Ranging: The PON-TTC system has a built-in feeder fiber latency monitoring capability; nevertheless, the ultimate goal is to implement a full ranging scheme, which is described in Fig. 6(b). In the full ranging scenario, the clock recovered in the ONU receiver from the downstream data is looped back to the ONU transmitter to generate the upstream data. The OLT then recovers the clock from the upstream data and compares it with the reference LHC 40 MHz clock. The advantages of the full ranging scheme are clear: both feeder and distribution fiber latencies can be monitored and corrected for on an ONU-per-ONU basis. In addition, initial look-up tables do not need to be stored in the ONUs, since they can be constructed and updated on-line as the full ranging algorithm runs. Two major development steps need to be undertaken in order for the full ranging scheme to be realized. First, an external clock needs to be fed to the OLT transmitter to kick off its operation, and so a dynamic transition from this external clock to the clock recovered from the downstream data needs to be implemented. Secondly, the latency in the upstream direction must also be constant for the ranging to be meaningful, which is not a requirement that can be guaranteed with the oversampling circuit. The optimum solution in this case is probably to split the operation of the system into two phases, one dedicated to the ranging mechanism and a second to data transmission.

Non-blocking Upstream Architecture: ONUs have to wait to receive the "token" before they occupy the upstream channel, a process that introduces excess latency. In order to reduce the upstream blocking time, TDMA schemes with faster transceivers can be used, as proposed in Section III, and/or alternative multiplexing techniques. As only low-rate management data flow in the upstream direction, a spread spectrum technique such as the electronic code division multiple access (CDMA) algorithm [27] can be used in a PON-TTC application. According to the CDMA algorithm, data from each user are encoded using a unique pseudo-random code which runs at a faster rate than the original data. The codes are orthogonal to each other, and data from different sources can be separated at the destination by means of cross-correlation digital signal processing. Since ONUs transmit in bursts, an asynchronous CDMA [28] version is required in the PON-TTC context. If users were to be multiplexed, then each pseudo-random code should contain bits according to the Walsh-Hadamard [29] or Gold sequences [30] typically employed in asynchronous CDMA systems. CDMA based passive optical networks have already been demonstrated for commercial fiber-to-the-home applications. A bidirectional CDMA PON that supports 32 users at 62.5 Mbit/s over 100 km has been reported in [31], while a faster CDMA PON system that can potentially support 64 users at 280 Mbit/s over 10 km has been demonstrated in [32]. The previous two systems were implemented with ASICs, but an FPGA based CDMA PON with the potential to support 64 users at 10 Mbit/s over 10 km has been reported in [33]. It is noted that CDMA cannot increase the bandwidth available to each user; rather, it aims to eliminate the blocking time on the ONU side, at the expense of a more complex receiver architecture in the OLT.
The most important component of a CDMA receiver is the decoder, which in most implementations consists of a phase matched finite impulse response (FIR) filter [27]. Such digital filters can be implemented with high accuracy in FPGAs, while at the same time the decoding process can in principle be confined to a few clock cycles by taking advantage of the FPGA parallel distributed arithmetic digital signal processing capability. It is exactly this property that makes CDMA a feasible proposition for PON-TTC systems, as the blocking time removed on one side of the network is not traded for an equal amount of processing time on the other. However, a number of issues need to be investigated before such a system can be deployed in a TTC environment. For example, the number of parallel decoders that can be implemented in a single FPGA and the maximum data rate that each user can sustain are still unclear. Such issues will be the subject of future investigations.

Fig. 14. Clock frequency agnostic TTC implemented with tri-band PON transceivers. L1A decisions are delivered at 1550 nm and commands at 1310 nm.
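The principle of separating users by cross-correlation can be illustrated with a minimal sketch. It makes strong simplifying assumptions that do not hold in the burst-mode PON-TTC uplink: all users are chip-synchronous, the channel is an ideal noiseless sum, and Walsh-Hadamard codes built with the Sylvester construction are used. The function names and the example data are hypothetical.

```python
def walsh_codes(n: int):
    """Return n mutually orthogonal +/-1 codes of length n (n a power of 2)."""
    codes = [[1]]
    while len(codes) < n:
        # Sylvester construction: H_2k = [[H, H], [H, -H]]
        codes = [row + row for row in codes] + \
                [row + [-chip for chip in row] for row in codes]
    return codes


def spread(bits, code):
    """Encode each data bit as +code (bit 1) or -code (bit 0)."""
    return [(1 if bit else -1) * chip for bit in bits for chip in code]


def despread(signal, code):
    """Correlate the summed signal with one code and slice to bits."""
    n = len(code)
    decoded = []
    for start in range(0, len(signal), n):
        window = signal[start:start + n]
        correlation = sum(s * c for s, c in zip(window, code))
        decoded.append(1 if correlation > 0 else 0)
    return decoded


codes = walsh_codes(4)
data = {1: [1, 0, 1], 2: [0, 0, 1], 3: [1, 1, 0]}   # bits per user
# All users transmit at once; the shared fiber adds their chip streams.
channel = [sum(chips) for chips in
           zip(*(spread(bits, codes[user]) for user, bits in data.items()))]
decoded = {user: despread(channel, codes[user]) for user in data}
```

The correlation in `despread` is the software analogue of the matched FIR filter mentioned above. In an asynchronous system the codes would no longer be perfectly orthogonal, so the clean separation seen here should be read as a best case.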

Clock Frequency Agnostic TTC: The demonstrated PON-TTC system has been designed to operate with a 40 MHz trigger frequency. If this system is to be used by other experiments running at a different rate, then the communication protocol should change to adapt to the new clock frequency. Another option would be to use the so-called tri-band PON transceivers, Fig. 14. Tri-band PONs utilize a 1550 nm band in the downstream in addition to the already existing 1490 nm band. Channels A and B can then be multiplexed in wavelength rather than in time, and the system can operate independently of the trigger rate. Another advantage of this method is that a conventional protocol, such as Ethernet, can be used to deliver the commands.

VII. CONCLUSION

A bidirectional TTC system has been demonstrated with passive optical network transceivers and FPGAs. The developed prototype system satisfies the strict timing mandate of a TTC network, namely fixed and deterministic downlink latency and distribution of a reference clock with low jitter, and also features a latency monitoring capability. In addition, it has been shown that the main limiting factor for implementing a shared "busy/throttling" uplink in a bidirectional TTC is the blocking time that each end-node has to suffer before occupying the upstream channel. In order for this limitation to be lifted, next generation PON transceivers or more advanced multiplexing techniques would be required, both of which will be the subject of future investigations. Finally, although the system has been designed in accordance with the LHC TTC requirements, its implementation is general enough to be considered in other high-energy physics experiments.

REFERENCES

[1] B. G. Taylor, “Optical timing, trigger and control distribution for LHC detectors,” IEEE Trans. Nucl. Sci., vol. 41, no. 4, pp. 1294–1299, Aug. 1994.

[2] J. Varela, “Timing and synchronization in the LHC experiments,” presented at the 6th Workshop Electron. for LHC Experim., Krakow, Poland, 2009.

[3] J. Troska, E. Corrin, Y. Kojevnikov, T. Rohlev, and J. Varela, “Implementation of the timing, trigger and control system of the CMS experiment,” IEEE Trans. Nucl. Sci., vol. 53, no. 3, pp. 834–837, Jun. 2006.

[4] A. Racz, “Trigger throttling system for CMS DAQ,” presented at the 6th Workshop Electron. for LHC Experim., Krakow, Poland, 2009.

[5] A. Aloisio, F. Cevenini, R. Giordano, and V. Izzo, “High-speed, fixed-latency serial links with FPGAs for synchronous transfers,” IEEE Trans. Nucl. Sci., vol. 56, no. 5, pp. 2864–2873, Oct. 2009.

[6] A. Aloisio, F. Cevenini, R. Giordano, and V. Izzo, “Emulating the glink chip set with FPGA serial transceivers in the ATLAS level-1 muon trigger,” IEEE Trans. Nucl. Sci., vol. 57, no. 2, pp. 467–471, Apr. 2010.

[7] J. Serrano, P. Alvarez, M. Cattin, E. Garcia Cotta, J. Lewis, O. Moreira, T. Wlostowski, G. Gaderer, P. Loschmidt, and J. Dedic, “The white rabbit project,” presented at the Int. Conf. Accelerator Large Experim. Phys. Control Syst., Kobe, Japan, 2009.

[8] A. Rohlev, A. Borga, J. Serrano, M. Cattin, and M. Stettler, “Sub-nanosecond machine timing and frequency distribution via serial links,” presented at the TWEPP, Naxos, Greece, 2008.

[9] J. Christiansen, A. Marchioro, P. Moreira, and A. Sancho, “Receiver ASIC for timing, trigger and control distribution in LHC experiments,” IEEE Trans. Nucl. Sci., vol. 43, no. 3, pp. 1773–1777, Jun. 1996.

[10] J. Christiansen, A. Marchioro, P. Moreira, and T. Toifl, “TTCrx reference manual,” 2005. [Online]. Available: http://ttc.web.cern.ch/TTC/TTCrx_manual3.11.pdf

[11] A. Racz, “The final prototype of the fast merging module (FMM) for readout status processing in CMS DAQ,” presented at the 10th Workshop Electron. for LHC Future Experiments, Boston, MA, 2004.

[12] T. Pauly et al., “ATLAS level-1 trigger timing and monitoring,” presented at the 11th Workshop Electron. for LHC Future Experiments, Heidelberg, Germany, 2005.

[13] F. Effenberger, D. Clearly, O. Haran, G. Kramer, R. Ding Li, M. Oron, and T. Pfeiffer, “An introduction to PON technologies,” IEEE Commun. Mag., vol. 45, pp. S17–S25, Mar. 2007.

[14] M. Gagnaire, “An overview of broad-band access technologies,” Proc. IEEE, vol. 85, no. 12, pp. 1958–1972, Dec. 1997.

[15] T. Koonen, “Fiber to the Home/Fiber to the Premises: What, Where, and When?,” Proc. IEEE, vol. 94, no. 3, pp. 911–934, May 2006.

[16] G. Kramer, B. Mukherjee, and G. Pesavento, “IPACT a dynamic protocol for an ethernet pon (EPON),” IEEE Commun. Mag., vol. 40, pp. 74–80, Feb. 2002.

[17] L. Yuanqiu and N. Ansari, “Bandwidth allocation for multiservice access on EPONs,” IEEE Commun. Mag., vol. 43, no. 2, pp. S16–S21, Feb. 2005.

[18] K. Grobe and J.-P. Elbers, “PON in adolescence: From TDMA to WDM-PON,” IEEE Commun. Mag., vol. 46, no. 1, pp. 26–34, Jan. 2008.

[19] J. M. Fabrega, E. T. Lopez, J. A. Lazaro, M. Zuhdi, and J. Prat, “Demonstration of a full duplex PON featuring 2.5 Gbps sub carrier multiplexing downstream and 1.25 Gbps upstream with colourless ONU and simple optics,” in Proc. 34th Euro. Conf. Opt. Commun., 2008, pp. 1–2.

[20] Y. Chao-Chin Yang, “Compact optical CDMA passive optical network with differentiated services,” IEEE Trans. Commun., vol. 57, no. 8, pp. 2402–2409, Aug. 2009.

[21] Ethernet Passive Optical Network (EPON), IEEE Standard 802.3ah, 2004. [Online]. Available: http://standards.ieee.org/getieee802/802.3.html

[22] Gigabit-capable Passive Optical Networks (GPON), ITU Standard G.984, 2003. [Online]. Available: http://www.itu.int/rec/T-REC-G/e

[23] 10G Ethernet Passive Optical Network (10G EPON), IEEE Standard 802.3av, 2009. [Online]. Available: http://standards.ieee.org/getieee802/802.3.html

[24] N. Shibata, Y. Katsuyama, Y. Mitsunaga, M. Tateda, and S. Seikai, “Thermal characteristics of optical pulse transit time delay and fiber strain in a single-mode optical fiber cable,” Appl. Opt., pp. 979–984, Apr. 1983.


[25] J. Kim and D.-K. Jeong, “Multi-gigabit-rate clock and data recovery based on blind oversampling,” IEEE Commun. Mag., vol. 41, pp. 68–74, Dec. 2003.

[26] Xilinx, Inc., San Jose, CA, “Chipscope integrated bit error ratio test (IBERT) for Virtex-6 GTX, datasheet,” 2010. [Online]. Available: http://www.xilinx.com/support/documentation/ipembedprocess_de-bugtrace_chipscope-ibert-v6gtx.htm

[27] R. Pickholtz, D. Schilling, and L. Milstein, “Theory of spread-spectrumcommunications—A tutorial,” IEEE Trans. Commun., vol. 30, no. 5,pp. 855–884, May 1982.

[28] W. C. Kwong, P. A. Perrier, and P. R. Prucnal, “Performance comparison of asynchronous and synchronous code-division multiple-access techniques for fiber-optic local area networks,” IEEE Trans. Commun., vol. 39, no. 11, pp. 1625–1634, Nov. 1991.

[29] H.-H. Chen and Y.-C. Yeh, “Capacity of space-time block-coded CDMA systems: Comparison of unitary and complementary codes,” IEE Proc. Commun., vol. 152, no. 2, pp. 203–214, Apr. 2005.

[30] S. Tamura, S. Nakano, and K. Okazaki, “Optical code-multiplex transmission by gold sequences,” J. Lightw. Technol., vol. 3, no. 1, pp. 121–127, Feb. 1985.

[31] G. C. Gupta, M. Kashima, H. Iwamura, H. Tamai, T. Ushikubo, and T. Kamijoh, “A simple one-system solution COF-PON for metro/access networks,” J. Lightw. Technol., vol. 25, no. 1, pp. 193–200, Jan. 2007.

[32] J. B. Rosas-Fernandez, J. D. Ingham, R. V. Penty, and I. H. White, “18 Gchips/s electronic CDMA for low-cost optical access networks,” J. Lightw. Technol., vol. 27, no. 3, pp. 306–313, Feb. 1, 2009.

[33] B.-G. Ahn and P. Y. Park, “A symmetric-structure CDMA-PON system and its implementation,” IEEE Photon. Technol. Lett., vol. 14, no. 9, pp. 1381–1383, Sep. 2002.
