HC19.21.510.Multiterabit Switch Fabrics Enabled by Proximity … · 2013-07-28 · Multiterabit...


Multiterabit Switch Fabrics Enabled by Proximity Communication

Hans Eberle, Alex Chow, Bill Coates, Jack Cunningham, Robert Drost, Jo Ebergen, Scott Fairbanks, Jon Gainsley, Nils Gura, Ron Ho, David Hopkins, Ashok Krishnamoorthy, Jon Lexau, Wladek Olesinski, Tarik Ono, Justin Schauer

Sun Microsystems Laboratories

2Hot Chips 19 © Sun Microsystems, Inc.

Future Interconnect Needs

• The interconnect becomes an increasingly critical system component
  > Fatter compute nodes
  > Increasing disparity between local and remote communication
• Data center trends
  > Server consolidation
  > Network consolidation
  > Virtualization
  > Clustering
  > Horizontal scale beyond the chassis


Proximity Communication (PxC)

Chips overlap face-to-face

Capacitively couple over micron distances

Utilize on-chip electronic alignment

[Figure: Chip 1, Chip 2, and Chip 3 overlapping face-to-face, with Transmit and Receive arrays (Tx micropads, Rx pads) and X vernier and Y vernier alignment structures. Dimension labels: 300 µm, 0-20 µm, 5-20 mm.]


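Removing the Chip IO Bottleneck

[Figure, "Huge Bandwidth Gain": I/O lanes per mm² (log scale, 10 to 1000) versus year (2003 to 2009), with curves for Proximity I/O and Area Ball Bonding.]

[Figure, "Comparison of Scale": Area Ball Bonding and Proximity Communication, with dimension labels 120 µm and 15 µm.]

Proximity Communication: 10 Tbps per mm²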


Proximity Communication Advantages

• Increases bandwidth/area
• Avoids off-chip wires
• Obviates ESD protection
• Shrinks transceiver circuits
• Lowers power consumption
• Makes multi-chip modules reworkable
• Enables smaller chips


Opportunity

• Proximity Communication allows for building switch fabrics that scale to thousands of ports and multiple Tbps throughput using a flat single-stage network rather than a hierarchical multi-stage network

[Figure: Chip 1, Chip 2, and Chip 3 overlapping, labeled "Switch bisection bandwidth".]


Blocking Multi-stage Switch

[Figure: three-stage network with 288 input ports and 288 output ports (port labels 1, 12, 13, 24, ..., 277, 288).]

Stage 1: S1,1 ... S1,24, each 12x12
Stage 2: S2,1 ... S2,12, each 24x24
Stage 3: S3,1 ... S3,24, each 12x12

• 36 switches
• 3 stages
• 576 internal links

(Folded network combines S1 and S3)
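The switch and link counts on this slide follow from simple arithmetic on the three-stage topology. The sketch below is a minimal illustration, assuming one link between every first-stage switch and every middle-stage switch (and likewise toward the third stage); the function and parameter names are hypothetical and not taken from the talk.

```python
def three_stage_counts(ports, n, m, folded=False):
    """Count switches and internal links in a three-stage network.

    ports:  external ports per side (288 on the slide)
    n:      external ports per first-stage (and third-stage) switch
    m:      number of middle-stage switches
    folded: if True, each S1/S3 pair is counted as one physical switch
    """
    if ports % n != 0:
        raise ValueError("ports must be a multiple of n")
    edge = ports // n                        # switches in stage 1 (and in stage 3)
    edge_switches = edge if folded else 2 * edge
    switches = edge_switches + m
    # Assumption: one link from every edge switch to every middle switch,
    # on the input side and again on the output side.
    links = 2 * edge * m
    return switches, links


print(three_stage_counts(288, n=12, m=12, folded=True))  # (36, 576)
```

With 288 ports, 12x12 edge switches, and 12 middle switches, the folded count reproduces the 36 switches and 576 internal links listed above.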


Non-blocking Multi-stage Switch

[Figure: three-stage network with 288 input ports and 288 output ports (port labels 1, 12, 13, 24, ..., 277, 288).]

Stage 1: S1,1 ... S1,24, each 12x24
Stage 2: S2,1 ... S2,24, each 24x24
Stage 3: S3,1 ... S3,24, each 24x12

• 72 switches
• 3 stages
• 1,152 internal links

Expansion (n inputs, m outputs per first-stage switch): m ≥ 2n − 1
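The expansion rule on this slide is the standard strict-sense non-blocking condition for a three-stage Clos network, m ≥ 2n − 1. A short sketch, with hypothetical function names, checks the slide's parameters against it and recomputes the switch and link counts, assuming one link between every edge switch and every middle switch:

```python
def is_strictly_nonblocking(n, m):
    """Clos condition: m middle switches suffice when m >= 2n - 1."""
    return m >= 2 * n - 1


def min_middle_switches(n):
    """Smallest m that satisfies the Clos condition for n inputs per edge switch."""
    return 2 * n - 1


ports, n, m = 288, 12, 24          # values shown on the slide
edge = ports // n                  # 24 switches in stage 1 and in stage 3
switches = 2 * edge + m            # unfolded count
links = 2 * edge * m               # stage 1 to 2 plus stage 2 to 3

print(is_strictly_nonblocking(n, m))    # True
print(is_strictly_nonblocking(12, 12))  # False
print(min_middle_switches(n))           # 23
print(switches, links)                  # 72 1152
```

The minimum for n = 12 is 23 middle switches; the slide's design uses 24.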


Proximity Communication Switch

[Figure: single-stage network with 288 input ports and 288 output ports (port labels 1, 24, 25, 48, ..., 265, 288).]

Stage 1: S1,1 ... S1,12, each 24x24

• 12 switches
• 1 stage
• PxC links
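The port labels in the figure (1, 24, 25, 48, ..., 265, 288) suggest that each of the 12 switch elements serves a contiguous block of 24 external ports. The sketch below assumes that contiguous assignment, which the slides do not state explicitly, and uses a hypothetical helper name.

```python
PORTS_PER_CHIP = 24   # each switch element is 24x24 on the slide
NUM_CHIPS = 12        # S1,1 ... S1,12


def locate_port(port):
    """Map a 1-based external port number to (chip, local port), both 1-based.

    Assumption: ports are assigned to chips in contiguous blocks.
    """
    if not 1 <= port <= PORTS_PER_CHIP * NUM_CHIPS:
        raise ValueError("port out of range")
    chip, local = divmod(port - 1, PORTS_PER_CHIP)
    return chip + 1, local + 1


for p in (1, 24, 25, 48, 265, 288):
    print(p, locate_port(p))
```

Under this assumption, port 25 is the first port of the second chip and port 265 is the first port of the twelfth chip, which matches the block boundaries shown in the figure.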


Vector Multi-Chip Module

Switch Element (Island Chip)

PxC Link (Bridge Chip)

Off-Module IO (Wire Bonds)


Port-Sliced Crossbar Switch

Input Ports

Output ports


Single-Stage PxC Switch Advantages

• Low deterministic latency
• Simple global scheduling
  > No internal blocking
  > No out-of-sequence delivery
  > Service guarantees possible
• Lower cost
  > Fewer switch elements
  > Less internal wiring
• Less power
• Higher reliability


Switch Prototype Characteristics

• System characteristics
  > 4 x 10GE ports
  > Layer 2 switching
  > Based on ATCA standard
  > Off-the-shelf line cards
  > Proprietary switch blade
• Switch fabric
  > "Vector switch" with 4 Island chips + 2 Bridge chips (3 PxC links)
  > Off-chip connections through wire bonds


Switch Prototype

Line Card

Switch Motherboard

Switch Daughtercard


Switch Prototype Organization

[Figure: block diagram of Line Card 1 to Line Card 4 and the Switch. Block labels: Packet Generator, Packet Checker, Encoder, Decoder, Monitor, Flow Ctrl, Switch Fabric (4 x 4 Crossbar).]


Bridge and Island Chips

[Figure: Island 1 to Island 4 connected through Bridge 1 and Bridge 2. Block labels: SER, DES, PxC Tx, PxC Rx. Link labels: 16 x 1 Gbps DDR, 3 x 16 x 1 Gbps DDR, 250 MHz.]
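The link labels in this figure imply some simple bandwidth arithmetic. The sketch below is illustrative only: it assumes that "16 x 1 Gbps DDR" means 16 lanes at 1 Gbps each, that DDR transfers two bits per clock cycle per lane, and that the 250 MHz label is the parallel-side clock of the SER/DES blocks. None of these readings is confirmed by the slides, and the function name is hypothetical.

```python
def link_budget(lanes=16, lane_gbps=1.0, links=3, core_mhz=250.0):
    """Derive aggregate rates from the figure's link labels (assumed meanings)."""
    link_gbps = lanes * lane_gbps
    return {
        "link_gbps": link_gbps,                        # one 16-lane link
        "bundle_gbps": links * link_gbps,              # the "3 x 16 x 1 Gbps" label
        "ddr_clock_mhz": lane_gbps * 1000 / 2,         # two bits per clock (DDR)
        "bits_per_core_cycle": lane_gbps * 1000 / core_mhz,  # per lane, assumed
    }


for name, value in link_budget().items():
    print(f"{name}: {value}")
```

Under these assumptions, one 16-lane link carries 16 Gbps, which is more than a single 10GE port requires.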


Bridge and Island Chips

[Figure: the Island and Bridge chips. Dimension labels: 11.8 mm, 10.3 mm, 22.9 mm, 7.0 mm. Region labels: Switch Logic, PxC Tx Array, PxC Rx Array, Off-Chip IO, Bridge Power, Alignment Measurement, Alignment Marker.]

Process Technology: 6-Layer Aluminum TSMC 0.18 µm CMOS


Vector Switch Prototype


Scaling Up

[Figure: arrays of Switch Fabric Elements joined by PxC links, each with Ingress/Egress blocks and Electrical or Optical Phys, shown at three scales.]

• 256 ports, 2.5 Tbps
• 1,024 ports, 10 Tbps
• 4,096 ports, 40 Tbps
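The three configurations scale linearly, so dividing throughput by port count gives a constant per-port rate. A quick check, assuming 1 Tbps = 1000 Gbps (the helper name is hypothetical):

```python
def per_port_gbps(ports, tbps):
    """Per-port rate in Gbps, assuming 1 Tbps = 1000 Gbps."""
    return tbps * 1000 / ports


configs = {256: 2.5, 1024: 10.0, 4096: 40.0}   # ports -> Tbps, from the slide

for ports, tbps in configs.items():
    print(f"{ports:>5} ports: {per_port_gbps(ports, tbps):.2f} Gbps per port")
```

Each configuration works out to about 9.77 Gbps per port, or exactly 10 Gbps if the slide rounds 2.56, 10.24, and 40.96 Tbps down. That is consistent with the 10GE ports used in the prototype.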


Scalable Switch Architecture

• "Output Buffered Switch with Input Groups"
  > Reduces memory requirements from O(n²) to O(n × # Island Chips)
  > To be presented at Globecom 2007
• "Parallel Wrapped Wave Front Arbiter"
  > Increases throughput of n x n Wrapped Wave Front Arbiter by a factor of n
  > Presented at HPSR 2007
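For readers unfamiliar with the baseline, the following is a sequential software model of a plain wrapped wave front arbiter. It is not the parallel variant referred to above, whose details are not given in these slides. The function name, the request-matrix layout, and the `start` parameter are illustrative assumptions.

```python
def wwfa(requests, start=0):
    """Sequential model of a wrapped wave front arbiter.

    requests[i][j] is truthy when input i has traffic for output j.
    start selects the wrapped diagonal with top priority; rotating it
    between arbitration cycles spreads priority across inputs.
    Returns a dict mapping each matched input to its granted output.
    """
    n = len(requests)
    row_free = [True] * n
    col_free = [True] * n
    grants = {}
    for step in range(n):
        d = (start + step) % n
        # Cells (i, (d - i) % n) form one wrapped diagonal: every row and
        # every column appears exactly once, so hardware can evaluate the
        # whole diagonal in a single step.
        for i in range(n):
            j = (d - i) % n
            if requests[i][j] and row_free[i] and col_free[j]:
                grants[i] = j
                row_free[i] = False
                col_free[j] = False
    return grants


requests = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
print(wwfa(requests, start=0))  # {0: 0, 2: 1}
print(wwfa(requests, start=1))  # {0: 1, 1: 0, 2: 2}
```

The two calls show how the choice of starting diagonal changes the match: with `start=0` input 1 loses output 0 to input 0, while `start=1` yields a full match.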


Output Buffered Switch with Input Groups

[Figure: switch organization with multiple Arbiter blocks.]
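To give a feel for the memory reduction claimed for this architecture, from O(n²) to O(n × number of Island chips), the sketch below compares the two growth terms while ignoring constant factors. Pairing n = 288 ports with 12 Island chips is an illustrative assumption borrowed from the single-stage example earlier in the talk, and the function name is hypothetical.

```python
def buffer_terms(n, island_chips):
    """Return the n^2 term and the n * island_chips term (constants ignored)."""
    return n * n, n * island_chips


full, grouped = buffer_terms(288, 12)
print(full, grouped, full // grouped)  # 82944 3456 24
```

For these example values the grouped organization needs a factor of n / (number of Island chips) = 24 fewer buffers, up to constants.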


Applications

• Data center backbone
• Blade system interconnect
• ATCA chassis aggregation
• Cluster interconnect
• System interconnect


Summary

• Proximity Communication allows for building a flat single-stage switch fabric that scales to thousands of ports and multiple Tbps throughput
  > Low latency
  > High efficiency
  > Service guarantees
  > Low power
  > High physical density

Hans Eberle
