Running the largest HDFS cluster - University of …cloud.berkeley.edu/data/hdfs-scalability.pdf4!...

Running the largest HDFS cluster

Hairong Kuang, Tom Nykiel

Agenda

•  HDFS at Facebook
•  Improving HDFS scalability
•  HDFS federation

What is HDFS

•  HDFS:
  •  Storage layer for Hadoop
  •  Open-source Apache project
  •  Scale: petabytes of data on thousands of nodes
•  Characteristics:
  •  Uses clusters of commodity computers
  •  Uses replication across servers to deal with unreliable storage/servers
  •  Metadata-data separation - simple design
  •  Slightly restricted file semantics
    •  Focus is mostly sequential access
    •  Single writers
    •  No file locking features
•  Supports moving computation close to data
  •  Single 'storage + compute' cluster vs. separate clusters

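HDFS Architecture

[Diagram: Clients send metadata ops to the Namenode, which holds metadata (Name, #replicas, …), e.g. /users/foo/data, 3, …. Clients read and write blocks on Datanodes in Rack 1 and Rack 2; the Namenode issues block ops to Datanodes, and blocks are replicated between Datanodes.]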
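
The metadata/data split shown on this slide can be sketched as a toy in-memory model. This is an illustration only, not HDFS code: the names `ToyNamenode`, `ToyDatanode`, and `read_file` are hypothetical, and a real client talks to the Namenode and Datanodes over the network rather than through Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatanode:
    """Stores block contents keyed by block id (hypothetical stand-in for a Datanode)."""
    name: str
    blocks: dict = field(default_factory=dict)


@dataclass
class ToyNamenode:
    """Holds only metadata: path -> ordered block ids, block id -> replica locations."""
    files: dict = field(default_factory=dict)
    locations: dict = field(default_factory=dict)

    def get_block_locations(self, path):
        # Metadata op: returns (block_id, [datanodes]) pairs; no file data passes through here.
        return [(b, self.locations[b]) for b in self.files[path]]


def read_file(namenode, path):
    """Client read: one metadata op to the Namenode, then block reads from Datanodes."""
    data = b""
    for block_id, replicas in namenode.get_block_locations(path):
        data += replicas[0].blocks[block_id]  # read from the first listed replica
    return data


dn1, dn2, dn3 = ToyDatanode("dn1"), ToyDatanode("dn2"), ToyDatanode("dn3")
nn = ToyNamenode()
# Store two blocks of /users/foo/data with three replicas each, matching the slide's example metadata.
for block_id, payload in [("blk_1", b"hello "), ("blk_2", b"world")]:
    for dn in (dn1, dn2, dn3):
        dn.blocks[block_id] = payload
    nn.locations[block_id] = [dn1, dn2, dn3]
nn.files["/users/foo/data"] = ["blk_1", "blk_2"]
```

Calling `read_file(nn, "/users/foo/data")` touches the Namenode once for locations and then reads only from Datanodes, which is why the Namenode can stay small relative to the data it describes.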

Facebook Use of HDFS

[Diagram labels: HDFS; HBase; titan; ODS; Hive; Scribe; …]

•  Quiz: What is the total number of HDFS clusters used at Facebook?

•  The biggest one: warehouse cluster storing Hive tables

The largest HDFS cluster

•  Thousands of nodes
•  Close to 100 PB of configured capacity
•  100+ million files
•  Thousands of concurrent clients access the cluster
•  At peak hour, thousands of audit requests per second
•  It is growing each day

Growth of the cluster

[Chart: number of files (millions) and used capacity (PB), 2009 to 2011]

Agenda

•  Improving HDFS scalability
  •  Scale of the system
  •  System monitoring
  •  Communication
  •  Synchronization
  •  Data structures and algorithms
  •  Network awareness
  •  Handling persistency
  •  Memory management
  •  Tiny bugs – huge losses

Scale of the system

•  FSDirectory
  •  Information about all files/directories in the namespace
•  BlocksMap
  •  Information about all the blocks in the filesystem
•  Other associated structures - examples:
  •  Queues for storing replication status
  •  Table of datanodes

Memory utilization:

•  In-memory state
•  FSImage + FSEditsLog
  •  Ensuring persistency
•  System logs
  •  Debugging and monitoring
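
For a rough sense of why the in-memory state matters at this scale, the arithmetic below uses a commonly quoted rule of thumb of about 150 bytes of Namenode heap per namespace object (file, directory, or block). That figure, the directory count, and the blocks-per-file ratio are assumptions for illustration and do not come from this talk; only the "100+ million files" count is taken from the slides.

```python
BYTES_PER_OBJECT = 150      # assumption: rule-of-thumb heap cost per file, directory, or block
FILES = 100_000_000         # from the slides: "100+ million files"
BLOCKS_PER_FILE = 1.5       # hypothetical average, not from the talk
DIRECTORIES = 10_000_000    # hypothetical, not from the talk


def estimate_heap_bytes(files, blocks_per_file, directories,
                        bytes_per_object=BYTES_PER_OBJECT):
    """Estimate Namenode heap as (files + blocks + directories) * bytes per object."""
    objects = files + int(files * blocks_per_file) + directories
    return objects * bytes_per_object


heap = estimate_heap_bytes(FILES, BLOCKS_PER_FILE, DIRECTORIES)
print(f"Estimated heap: {heap / 2**30:.1f} GiB")
```

Under these assumed inputs the estimate lands in the tens of gigabytes before counting replication queues, the datanode table, or temporary objects, so the exact inputs matter less than the order of magnitude.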

System monitoring - logging

[Diagram: the HDFS architecture figure, labeled Read, Metadata ops, Client, Namenode, Datanodes, Replication, Block ops, Blocks]

Communication

[Diagram labels: Namenode, Datanodes, Datanodes, Blocks]

Synchronization

[Image: road sign reading "ONE LANE ROAD AHEAD"]

Data structures and algorithms

Network awareness

[Diagram: Namenode on the Master Rack and Datanodes on Rack1 and Rack2, connected by a network switch]
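
One place rack information is used is replica placement. The sketch below is loosely modeled on the default policy described in the Apache HDFS documentation (first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack). The function name and the `{rack: [nodes]}` layout are hypothetical, and the slide itself does not say which aspect of network awareness was discussed.

```python
def choose_targets(writer, topology):
    """Pick up to three replica targets from a {rack: [nodes]} topology.

    Deterministic for readability; a real policy randomizes choices and
    checks node load and free space.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    local_rack = rack_of[writer]
    # Second replica: first node on any non-empty rack other than the writer's.
    remote_rack = next(r for r in topology if r != local_rack and topology[r])
    second = topology[remote_rack][0]
    # Third replica: a different node on the same remote rack, if one exists.
    third = next((n for n in topology[remote_rack] if n != second), None)
    return [t for t in (writer, second, third) if t is not None]


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
targets = choose_targets("dn1", topology)
```

With this layout a single rack failure still leaves at least one replica, while two of the three copies share a rack to limit cross-rack write traffic.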

Handling persistency

[Diagram: Clients, Namenode, and SecondaryNamenode; FSImage at t0 plus FSEdits yields FSImage at t1]
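
The labels on this slide suggest the usual checkpoint cycle: the Namenode records each namespace change in an edit log (FSEdits), and the SecondaryNamenode periodically merges the image from time t0 with those edits to produce a newer image at t1. A toy sketch of that merge follows; the tuple record format is hypothetical and is not the real FSEdits encoding.

```python
def apply_edits(image, edits):
    """Replay an edit log onto a namespace image and return the new image.

    `image` is a set of paths; `edits` is a list of (op, path[, new_path]) tuples.
    The input image is left unmodified.
    """
    new_image = set(image)
    for op, *args in edits:
        if op == "create":
            new_image.add(args[0])
        elif op == "delete":
            new_image.discard(args[0])
        elif op == "rename":
            src, dst = args
            new_image.discard(src)
            new_image.add(dst)
        else:
            raise ValueError(f"unknown op: {op}")
    return new_image


fsimage_t0 = {"/users/foo/data"}
fsedits = [
    ("create", "/users/foo/tmp"),
    ("rename", "/users/foo/data", "/users/foo/data.old"),
    ("delete", "/users/foo/tmp"),
]
fsimage_t1 = apply_edits(fsimage_t0, fsedits)
```

Once a merged image exists, the edits it already covers no longer need to be replayed at startup, which is what keeps restart time from growing with the full history of the namespace.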

Memory management

•  Namenode running with enormous heap space
•  Problem: A full GC takes at least 10 minutes
  •  The NameNode is non-responsive!
•  Improvements:
  •  Configuration changes
  •  Avoid unnecessary creation of temporary data

Tiny bugs - huge losses

•  A bug in the MR application layer caused a scan of the whole /tmp subtree for each job submitted:
  •  Huge number of VALID requests to the NameNode
•  Another bug in the application layer increased the number of metadata read requests 12-fold.

Agenda

•  HDFS federation

Static Partitioned Clusters

[Diagram label: HDFS2]

Cluster overlay

[Diagram: Namenodes NN1 and NN2; Datanodes DN1, DN1, DN2, DN2, DN3, DN3]

Federation

[Diagram: Namenodes NN1 and NN2; Datanodes DN1, DN2, DN3]

Federation

[Diagram: Namenodes NN1, NN2, and NN3; Datanodes DN1, DN2, DN3]
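
In the federation diagrams, several Namenodes (NN1, NN2, and then NN3) sit above one set of Datanodes (DN1 to DN3). A toy sketch of that idea: each Namenode owns part of the namespace, and each Datanode keeps a separate block collection per Namenode. The mount table, the path prefixes, and all class and function names are hypothetical illustrations, not the actual federation API.

```python
class SharedDatanode:
    """One Datanode storing blocks for several Namenodes, kept apart per Namenode."""

    def __init__(self, name):
        self.name = name
        self.pools = {}  # namenode id -> {block id -> bytes}

    def store(self, nn_id, block_id, payload):
        self.pools.setdefault(nn_id, {})[block_id] = payload


# Hypothetical client-side mount table: path prefix -> owning Namenode.
MOUNT_TABLE = {"/warehouse": "NN1", "/users": "NN2"}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Route a path to the Namenode whose prefix matches, preferring the longest prefix."""
    matches = [p for p in mount_table if path == p or path.startswith(p + "/")]
    if not matches:
        raise KeyError(f"no namenode serves {path}")
    return mount_table[max(matches, key=len)]


datanodes = [SharedDatanode(n) for n in ("DN1", "DN2", "DN3")]
for dn in datanodes:
    # The same block id can exist under two Namenodes without colliding.
    dn.store(resolve_namenode("/warehouse/t1"), "blk_1", b"hive rows")
    dn.store(resolve_namenode("/users/foo/data"), "blk_1", b"user data")
```

In this sketch, adding a third Namenode amounts to a new mount-table entry and a new per-Namenode pool on each Datanode; no existing blocks move.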

Conclusion

•  We have tons of data stored in HDFS in many clusters, including one of the largest clusters in the world.
•  We need to deal with problems never faced before
•  Our job is to keep it running efficiently, not lose data, and make it highly available!

Future

•  Improve NameNode availability
  •  Manual / Automatic failover
•  Improve I/O efficiency
•  Cross data-center support