Introduction

Distributed Systems and Applications, Dr DIALLO Mohamed, UFRMI 2016, [email protected]

Transcript of Introduction

Page 1: Introduction

Distributed Systems and Applications

Dr DIALLO Mohamed, UFRMI 2016

[email protected]

Page 2: Introduction

Course objectives
• Understand the challenges of distributed systems
• Become familiar with implementing distributed systems
• Discover distributed algorithms
• Study examples of distributed systems
• Explore research in distributed systems

Education is the kindling of a flame, not the filling of a vessel.

(Socrates)

Page 3: Introduction

Course unit overview
• Eight 4-hour sessions
• Lectures: 10 h, tutorials: 10 h, lab sessions: 8 h
• Presentations: 4 h

• Assessment
• Project (in pairs): presentation of a research paper or of a distributed system (demo)
• Written exam

• Introduction
• Communication
• Sockets and RMI

• Distributed algorithms
• Synchronization
• Election
• Mutual exclusion

• Fault tolerance and P2P
• Web services

Page 4: Introduction

Definition: A distributed system is a collection of independent computers that appears to its users as a single coherent system. (A. Tanenbaum)

A distributed system comprises:
• Independent sites with a common goal
• A communication system

A distributed system is one that stops you from getting any work done when a machine you’ve never heard of crashes. (L. Lamport)

Credit: C. Rabat – Introduction aux systèmes répartis

Page 5: Introduction

Characteristics of distributed systems
• Each node executes a program concurrently
• Knowledge is local
  • Nodes have fast access only to their local state, and any information about the global state is potentially out of date
• Nodes can fail and recover from failure independently
• Messages can be delayed or lost
  • Independently of node failure; it is not easy to distinguish a network failure from a node failure
• Clocks are not synchronized across nodes
  • Local timestamps do not correspond to the global real-time order, which cannot be easily observed

Distributed Systems for fun and profit - book.mixu.net/distsys/ebook.html
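Why a network failure is hard to distinguish from a node failure can be shown with a small simulation. This is a sketch under simplifying assumptions: the helper `call_with_timeout` and the four node behaviours are hypothetical, and an in-process queue stands in for the network. From the caller's side, a crash, a lost message and a late reply all look the same: nothing arrived before the timeout.

```python
import queue
import threading

def call_with_timeout(node, timeout_s: float):
    """Send a request to `node` and wait for its reply; return None if none arrives in time."""
    reply_box = queue.Queue()
    node(reply_box)  # "send" the request; the node may reply into reply_box
    try:
        return reply_box.get(timeout=timeout_s)
    except queue.Empty:
        return None  # crashed, message lost, or just slow? The caller cannot tell.

def healthy_node(reply_box):
    reply_box.put("pong")  # replies immediately

def crashed_node(reply_box):
    pass  # the node is down and never replies

def lossy_network(reply_box):
    pass  # the node is up, but the request (or the reply) is dropped in transit

def slow_node(reply_box):
    # The node is up but answers after 0.5 s, once the caller has already given up.
    timer = threading.Timer(0.5, reply_box.put, args=("pong",))
    timer.daemon = True
    timer.start()

for node in (healthy_node, crashed_node, lossy_network, slow_node):
    print(node.__name__, call_with_timeout(node, timeout_s=0.1))
```

This is why timeout-based failure detectors can only suspect a failure: a shorter timeout detects crashes sooner but wrongly suspects slow nodes more often.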

Page 6: Introduction

Fallacies of distributed computing
• The network is reliable.
  • Redundancy / reliable messaging
• Latency is zero.
  • Make as few calls as possible / move as much data as possible in each call
• Bandwidth is infinite.
  • Limit the size of the information sent over the wire
• The network is secure.
  • Assess risks
  • Be aware of security and its implications
• Topology doesn't change.
  • Do not depend on specific routes/addresses
  • Location transparency (ESB, multicast) / directory services
• There is one administrator.
  • Different agendas / rules can constrain your app
  • Help them manage your app.
• Transport cost is zero.
  • Overhead (marshalling…)
  • Costs of running the network
• The network is homogeneous.
  • Do not rely on proprietary protocols; use standards such as XML…

Arnon Rotem-Gal-Oz – Fallacies of Distributed Computing Explained
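The advice for "latency is zero" (few calls, more data per call) follows from a simple cost model in which every call pays one round-trip time and the payload pays for bandwidth. The model and the figures (50 ms RTT, 10 MB/s, 1,000 records of 1 KB) are illustrative assumptions that ignore serialization, congestion and server processing time.

```python
def transfer_time_s(calls: int, total_bytes: int, rtt_s: float, bandwidth_bytes_s: float) -> float:
    """Simplified cost: one round trip per call plus the time to push all bytes through the link."""
    return calls * rtt_s + total_bytes / bandwidth_bytes_s

RTT_S = 0.050            # 50 ms round trip (assumed)
BANDWIDTH = 10_000_000   # 10 MB/s (assumed)
RECORDS, RECORD_SIZE = 1_000, 1_000

chatty = transfer_time_s(RECORDS, RECORDS * RECORD_SIZE, RTT_S, BANDWIDTH)  # one call per record
batched = transfer_time_s(1, RECORDS * RECORD_SIZE, RTT_S, BANDWIDTH)       # one call for everything
print(f"chatty: {chatty:.2f} s, batched: {batched:.2f} s")
```

With these numbers the same payload takes about 50 s record by record and about 0.15 s in one call: latency, not bandwidth, dominates. The second term is the one the "bandwidth is infinite" advice targets.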

Page 7: Introduction

Sample distributed system: the Google cluster architecture (2003)

• Scale
  • Raw documents (tens of terabytes of data)
  • Inverted index (on the order of terabytes)

• Approach
  • Partitioning and replication (load balancing)

Combining more than 15,000 commodity-class PCs with fault-tolerant software creates a solution that is more cost-effective than a comparable system built out of a smaller number of high-end servers

Page 8: Introduction

Real facts

Lots of data out there
• NYSE generates 1 TB/day
• Google processes 700 PB/month
• Facebook hosts 10 billion photos taking 1 PB of storage

Google search workloads
• Google now processes over 40,000 search queries every second on average.
• A single Google query uses 1,000 computers in 0.2 seconds to retrieve an answer.

Sources: snia.org; http://www.internetlivestats.com/google-search-statistics/

Page 9: Introduction

Goals of distributed systems
• Resource access
• Transparency
• Scalability
• Fault tolerance
• Reliability
• Openness (interoperability)
• Security

Credit: C. Rabat – Introduction aux systèmes répartis

Page 10: Introduction

Transparency (type: description)
• Access: hide differences in data representation and in how a resource is accessed
• Location: hide where a resource is located
• Migration: hide that a resource may be moved to another location
• Relocation: hide that a resource may be moved to another location while in use
• Replication: hide that a resource is replicated
• Concurrency: hide that a resource may be shared by several competitive users
• Failure: hide the failure and recovery of a resource

Credit A. Tanenbaum

Page 11: Introduction

Scalability

• Size scalability
  • Adding more nodes should make the system linearly faster
  • Growing the dataset should not increase latency
• Geographic scalability
• Administrative scalability
  • Adding more nodes should not increase the administrative costs of the system

A scalable system is one that continues to meet the needs of its users as scale increases

Distributed Systems for fun and profit - book.mixu.net/distsys/ebook.html

Page 12: Introduction

Scalability: performance
• Short response time / low latency for a given piece of work
• High throughput (rate of processing work)
• Low utilization of computing resource(s)

Distributed Systems for fun and profit - book.mixu.net/distsys/ebook.html

Page 13: Introduction

Scalability: availability (and fault tolerance)
Distributed systems can take a bunch of unreliable components and build a reliable system on top of them (design for fault tolerance).

Because the probability of a failure occurring increases with the number of components, the system should be able to compensate so as not to become less reliable as the number of components increases.

Fault tolerance: the ability of a system to behave in a well-defined manner once faults occur

Distributed Systems for fun and profit - book.mixu.net/distsys/ebook.html
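The claim that failure probability grows with the number of components can be quantified under one strong assumption: components fail independently, each with the same probability p over a given period. The functions below sketch that textbook model; p = 1% is an arbitrary example, not a measured rate.

```python
def p_at_least_one_failure(p: float, n: int) -> float:
    """Probability that at least one of n independent components fails: 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n

def p_all_replicas_fail(p: float, k: int) -> float:
    """Probability that all k independent replicas fail at the same time: p^k."""
    return p ** k

p = 0.01
for n in (1, 10, 100, 1000):
    print(f"n={n:5d}: P(some component fails) = {p_at_least_one_failure(p, n):.4f}")
for k in (1, 2, 3):
    print(f"k={k}: P(all replicas fail) = {p_all_replicas_fail(p, k):.0e}")
```

With 1,000 components, some failure is almost certain; replication is the compensating mechanism, since three independent replicas all fail together with probability of only about 1e-6. Correlated failures (shared power, shared switch) break the independence assumption and make real numbers worse.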

Page 14: Introduction

Scale out vs. scale up?

Distributed Systems for fun and profit - book.mixu.net/distsys/ebook.html

High-end (128-core) vs. low-end (4-core) servers

Page 15: Introduction

Service Level Agreement
• If I write data, how quickly can I access it elsewhere?
• After the data is written, what guarantees do I have of durability?
• If I ask the system to run a computation, how quickly will it return results?
• When components fail, or are taken out of operation, what impact will this have on the system?

Distributed Systems for fun and profit - book.mixu.net/distsys/ebook.html

Page 16: Introduction

Consequences of distribution
• An increase in the number of independent nodes increases the probability of failure in a system
  • This reduces availability and increases administrative costs
• An increase in the number of independent nodes may increase the need for communication between nodes
  • This reduces performance as scale increases
• An increase in geographic distance increases the minimum latency for communication between distant nodes
  • This reduces performance for certain operations

Distributed Systems for fun and profit - book.mixu.net/distsys/ebook.html
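The last point has a physical floor: a signal in optical fiber travels at roughly 200,000 km/s (about two thirds of the speed of light in vacuum). The sketch below computes that lower bound on round-trip time; it ignores routing detours, queuing and processing, so real RTTs are higher. The distances are example values.

```python
SPEED_IN_FIBER_KM_S = 200_000  # approximate signal speed in optical fiber

def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber for a given one-way distance."""
    return 2 * distance_km / SPEED_IN_FIBER_KM_S * 1000

for km in (1, 100, 4_000, 15_000):
    rtt = min_rtt_ms(km)
    print(f"{km:>6} km: RTT >= {rtt:.2f} ms, so at most {1000 / rtt:,.0f} sequential round trips/s")
```

At 4,000 km, any operation that needs a round trip costs at least 40 ms, however fast the machines are; protocols that need several sequential round trips between distant sites pay this bound several times.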

Page 17: Introduction

Theory of distributed systems
• Efficient solutions to specific problems
• Guidance about what is possible
• Minimum cost of a correct implementation
• What is impossible

• Timestamping distributed events (Lamport)
• Leader election
• Consistent snapshotting
• Consensus is impossible to solve in fewer than 2 rounds of messages in general
• CAP theorem
• FLP impossibility
• Two Generals problem

Distributed Systems for fun and profit - book.mixu.net/distsys/ebook.html
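Timestamping distributed events (Lamport) relies on logical clocks rather than synchronized physical clocks. A minimal sketch of the classic Lamport clock rules follows: increment on each local event or send, and on receipt set the clock to max(local, received) + 1. Class and method names are illustrative.

```python
class LamportClock:
    """Logical clock: if event a happens-before event b, then C(a) < C(b)."""

    def __init__(self) -> None:
        self.time = 0

    def local_event(self) -> int:
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event; the returned value is piggybacked on the message.
        return self.local_event()

    def receive(self, msg_time: int) -> int:
        # Jump past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
p2.local_event()             # p2: 1
p2.local_event()             # p2: 2
t_send = p1.send()           # p1: 1, the message carries timestamp 1
t_recv = p2.receive(t_send)  # p2: max(2, 1) + 1 = 3
assert t_send < t_recv
```

The converse does not hold: C(a) < C(b) does not imply that a happened before b. These timestamps give an order consistent with causality, not the real-time order.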

Page 18: Introduction

FLP impossibility result

• Validity: the value agreed upon must have been proposed by some process (safety)

• Agreement: all deciding processes agree on the same value (safety)

• Termination: at least one non-faulty process eventually decides (liveness)

Consensus is the problem of having a set of processes agree on a value proposed by one of those processes.

Page 19: Introduction

FLP impossibility result
In an asynchronous setting where even a single process might crash, there is no deterministic distributed algorithm that always solves the consensus problem.

Fischer, M. J., Lynch, N. A., & Paterson, M. S. (1985). Impossibility of distributed consensus with one faulty process. Journal of the ACM (JACM), 32(2), 374-382.

Page 20: Introduction

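CAP theorem (Brewer's theorem)

Partition tolerance: the system continues to operate despite arbitrary partitioning due to network failures

Consistency: every read receives the most recent write or an error

Availability: every request receives a (non-error) response, without a guarantee that it contains the most recent version of the information

A distributed data store cannot guarantee all three properties at once: when a network partition occurs, it must give up either consistency or availability.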

http://book.mixu.net/distsys/abstractions.html
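The CP/AP choice during a partition can be illustrated with a toy store of two replicas, A and B. This is a hypothetical sketch, not how any particular database works: while the replicas cannot reach each other, the CP mode returns an error (consistency kept, availability lost), whereas the AP mode keeps answering from the local replica (availability kept, reads may be stale).

```python
class TwoReplicaStore:
    """Toy key-value store with two replicas; `mode` selects CP or AP behaviour under partition."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.data = {"A": {}, "B": {}}  # each replica's local copy
        self.partitioned = False        # True while A and B cannot exchange messages

    def write(self, replica: str, key: str, value) -> None:
        if not self.partitioned:
            for copy in self.data.values():  # synchronous replication to both replicas
                copy[key] = value
        elif self.mode == "CP":
            raise RuntimeError("unavailable: cannot reach the other replica")
        else:  # AP: accept the write locally; the replicas now diverge
            self.data[replica][key] = value

    def read(self, replica: str, key: str):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: cannot confirm this is the latest value")
        return self.data[replica].get(key)

ap = TwoReplicaStore("AP")
ap.write("A", "x", 1)
ap.partitioned = True
ap.write("A", "x", 2)
print(ap.read("A", "x"), ap.read("B", "x"))  # B answers, but with the stale value
```

Without a partition both modes behave identically; the choice is forced only while the partition lasts. An AP system also needs some way to reconcile divergent replicas once the partition heals.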

Page 21: Introduction

Beware! The C in ACID
• If the system has certain invariants that must always hold, and they held before the transaction, they will hold afterward too.
  (Example: the law of conservation of money)
• In distributed systems: when transactions run concurrently, the result is the same as if they had run serially.

The C in CAP
• Relates to how data updates spread across all replicas in a cluster.
• How operations on a single item are ordered and made visible to all nodes of the database.
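The "conservation of money" invariant behind the C in ACID can be checked in a single-process sketch: concurrent transfers that each run under a lock behave as if executed one after another, so the total never changes. The `Bank` class is hypothetical; a distributed database must provide the same guarantee through distributed concurrency control rather than one in-memory lock.

```python
import threading

class Bank:
    def __init__(self, balances: dict) -> None:
        self.balances = dict(balances)
        self._lock = threading.Lock()

    def transfer(self, src: str, dst: str, amount: int) -> bool:
        """Transfer as one indivisible step: no other transfer can interleave with it."""
        with self._lock:  # transactions on the shared balances run one at a time
            if self.balances[src] < amount:
                return False  # reject: would break the "no negative balance" rule
            self.balances[src] -= amount
            self.balances[dst] += amount
            return True

    def total(self) -> int:
        with self._lock:
            return sum(self.balances.values())

bank = Bank({"alice": 100, "bob": 100})
workers = [threading.Thread(target=bank.transfer, args=("alice", "bob", 7)) for _ in range(50)]
workers += [threading.Thread(target=bank.transfer, args=("bob", "alice", 5)) for _ in range(50)]
for w in workers:
    w.start()
for w in workers:
    w.join()
assert bank.total() == 200  # the invariant held before, so it holds after
```

Which individual transfers succeed depends on scheduling, but every outcome matches some serial order of the same transfers, and in every such order the total is preserved.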

Page 22: Introduction

Technologies for distributed systems
• Middleware (CORBA, ESB)
• RPC, RMI, web services
• Amazon Dynamo / Apache Cassandra
• Apache Hadoop

Page 23: Introduction

Amazon Dynamo: highly available NoSQL
• A highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience.
• To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios.

Giuseppe DeCandia, et al, “Dynamo: Amazon's Highly Available Key-Value Store”, in the Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.

Page 24: Introduction

Hadoop: distributed framework for big data

• Apache top-level project: an open-source implementation of frameworks for reliable, scalable, distributed computing and data storage.
• It is a flexible and highly available architecture for large-scale computation and data processing on a network of commodity hardware.

• Hadoop splits files into large blocks and distributes them across the nodes of the cluster.
• To process the data, Hadoop transfers the code to each node, and each node processes the data it holds locally.
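The "move the code to the data" principle can be sketched in plain Python as a word count in the MapReduce style. Everything here is a single-process simulation under stated assumptions: each block of lines stands for an HDFS block held by one node, `map_block` is the code shipped to that node, and `reduce_counts` merges the partial results. These functions are illustrative and are not Hadoop's Java API.

```python
from collections import Counter

def split_into_blocks(lines: list, lines_per_block: int) -> list:
    """Cut the input into blocks; in HDFS each block would be stored (and replicated) on some node."""
    return [lines[i:i + lines_per_block] for i in range(0, len(lines), lines_per_block)]

def map_block(block: list) -> Counter:
    """Code shipped to a node: count words using only the block that node holds locally."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts

def reduce_counts(partials: list) -> Counter:
    """Merge the small per-node results; the raw data itself never has to move."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

lines = ["the cat sat", "the dog ran", "a cat ran", "the end"]
blocks = split_into_blocks(lines, lines_per_block=2)
partials = [map_block(b) for b in blocks]  # on a cluster these calls run in parallel, one per node
print(reduce_counts(partials).most_common(3))
```

Shipping a few kilobytes of code to terabytes of data is far cheaper than the reverse, which is the design choice the slide describes.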

Page 25: Introduction

Apache Hadoop
• Hadoop usage scenarios
  • Search through data looking for particular patterns
  • Sort large amounts of data (terabytes)

Page 26: Introduction

Middleware

Page 27: Introduction

Enterprise Service Bus
• Message-oriented middleware
• Asynchronous message exchange
• Web services (SOA)
• Transformations
• Intelligent routing
• Decoupling of sender and receiver
• Business activity monitoring (BAM)
• Business process modeling (BPM)
• Products: Mule ESB, Talend ESB

Wikipedia.fr

Page 28: Introduction

Service Oriented Architecture

Page 29: Introduction

Functional models: two-tier, three-tier, n-tier

Page 30: Introduction

Two-tier architecture

Page 31: Introduction

Three-tier architecture

Page 32: Introduction

N-tier architecture

Page 33: Introduction

Exchange models
• Client/server
• Message-based communication
• Mobile code
• Shared memory

Page 34: Introduction

Client/server model (1/2)

Page 35: Introduction

Client/server model (2/2)

Page 36: Introduction

Message-based communication
• No response expected
• Unsolicited messages
• Example: message-oriented middleware (MOM)
  • Point-to-point
  • Publish-subscribe (Apache ActiveMQ, IBM WebSphere MQ, OpenJMS)
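The two MOM styles can be contrasted with two tiny in-memory classes. They are hypothetical sketches, not the API of ActiveMQ, WebSphere MQ or OpenJMS: a point-to-point queue delivers each message to exactly one consumer, while a publish-subscribe topic delivers a copy to every subscriber. In both cases the sender does not wait for a response.

```python
from collections import deque

class PointToPointQueue:
    """Point-to-point: each message is consumed by exactly one receiver."""

    def __init__(self) -> None:
        self._messages = deque()

    def send(self, msg) -> None:
        self._messages.append(msg)  # fire and forget: no reply expected

    def receive(self):
        """Remove and return the oldest message, or None if the queue is empty."""
        return self._messages.popleft() if self._messages else None

class Topic:
    """Publish-subscribe: every current subscriber receives its own copy."""

    def __init__(self) -> None:
        self._subscribers = []

    def subscribe(self, callback) -> None:
        self._subscribers.append(callback)

    def publish(self, msg) -> None:
        for deliver in self._subscribers:
            deliver(msg)  # the publisher does not know who the subscribers are

orders = PointToPointQueue()
orders.send("order-1")
worker_1, worker_2 = orders.receive(), orders.receive()  # only one worker gets the message

news = Topic()
inbox_a, inbox_b = [], []
news.subscribe(inbox_a.append)
news.subscribe(inbox_b.append)
news.publish("price update")  # both subscribers get a copy
```

A real broker adds persistence, acknowledgements and network transport, but the delivery semantics are the ones shown: one consumer per message versus one copy per subscriber.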

Page 37: Introduction

Mobile code

Page 38: Introduction

Shared virtual memory
• Different applications share a common memory area.
• Parallel applications: threads
• Distributed applications: middleware

Page 39: Introduction

Configurations
• Centralized
• Fully decentralized
• Hybrid

Page 40: Introduction

Centralized

Note: a system can be centralized yet distributed.

Page 41: Introduction

Fully decentralized
1. No machine has complete information about the system state.
2. Machines make decisions based only on local information.
3. Failure of one machine does not ruin the algorithm/system.
4. There is no implicit assumption that a global clock exists (no strong coordination).

(Credit A. Tanenbaum)

• Symmetry
• Administrative autonomy
• Federation

Page 42: Introduction

Hierarchical, e.g. DNS
A decentralized system, but organized as a hierarchy:
• Root servers
• TLD servers
• Authoritative servers
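How a name is found in this hierarchy can be sketched as iterative resolution: start at a root server and follow referrals down to the authoritative server. The delegation table, server names and address below are hypothetical (192.0.2.0/24 is a range reserved for documentation), and a real resolver exchanges DNS messages over the network and caches referrals instead of reading a Python dict.

```python
# Hypothetical delegation data: each server either refers the query further down or answers it.
SERVERS = {
    "root":         {"referrals": {"org.": "org-tld"}, "records": {}},
    "org-tld":      {"referrals": {"example.org.": "example-auth"}, "records": {}},
    "example-auth": {"referrals": {}, "records": {"www.example.org.": "192.0.2.10"}},
}

def in_zone(name: str, zone: str) -> bool:
    """True if `name` equals `zone` or lies under it (respecting label boundaries)."""
    return name == zone or name.endswith("." + zone)

def resolve(name: str) -> tuple:
    """Iterative resolution from the root; returns (address or None, servers contacted)."""
    server, path = "root", []
    while True:
        path.append(server)
        data = SERVERS[server]
        if name in data["records"]:
            return data["records"][name], path  # authoritative answer
        referral = next((ns for zone, ns in data["referrals"].items() if in_zone(name, zone)), None)
        if referral is None:
            return None, path  # no record and no referral: the name cannot be resolved
        server = referral

print(resolve("www.example.org."))
```

No single server knows every name, yet every name can be found; caching the referrals is what keeps the load on the few root servers manageable.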

Page 43: Introduction

Hybrid, e.g. Kazaa
A decentralized system, but with ordinary peers vs. super-peers
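Kazaa's actual protocol (FastTrack) was proprietary, so the following is only a generic sketch of the super-peer idea under our own assumptions: ordinary peers register their file lists with one super-peer, and searches travel only between super-peers, with a hop limit (`ttl`). Class and method names are hypothetical.

```python
class SuperPeer:
    """Indexes the files of its attached ordinary peers and answers searches on their behalf."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.index = {}       # filename -> set of ordinary peers holding it
        self.neighbours = []  # other super-peers this one can forward queries to

    def register(self, peer: str, files: list) -> None:
        for f in files:
            self.index.setdefault(f, set()).add(peer)

    def search(self, filename: str, ttl: int = 2, visited=None) -> set:
        """Check the local index, then forward the query to neighbouring super-peers."""
        visited = set() if visited is None else visited
        if self.name in visited:
            return set()  # already asked: avoid loops
        visited.add(self.name)
        hits = set(self.index.get(filename, set()))
        if ttl > 0:
            for sp in self.neighbours:
                hits |= sp.search(filename, ttl - 1, visited)
        return hits

sp1, sp2 = SuperPeer("sp1"), SuperPeer("sp2")
sp1.neighbours.append(sp2)
sp2.neighbours.append(sp1)
sp1.register("alice", ["song.mp3"])
sp2.register("bob", ["song.mp3", "film.avi"])
print(sp1.search("song.mp3"))  # peers holding the file; the download itself is peer-to-peer
```

Only the better-connected super-peers carry search traffic, which is the hybrid trade-off: still no central server, but no longer fully symmetric.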

Page 44: Introduction

Cloud and virtualization
• Cloud computing
• Virtualization

Page 45: Introduction

Cloud environment

Community: the members of the community generally share similar security, privacy, performance and compliance requirements.

Credit Bamba Gueye - UCAD

Page 46: Introduction

Service models

SaaS: the application platform that delivers complete applications on demand. These range from CRM to human resources management, accounting, collaboration tools, messaging, and other business applications.

PaaS: the platform for running, deploying, and developing applications on the cloud computing platform.

IaaS: lets organizations outsource servers, networking, and storage to remote data centers. Companies start or stop virtual servers hosted on the cloud computing platform.

Credit Bamba Gueye - UCAD

Page 47: Introduction

Example application (AWS)

Credit C. Rabat - CNAM

Page 48: Introduction

Common virtualization uses today

Page 49: Introduction

Common virtualization uses
• Run legacy software on non-legacy hardware
• Run multiple operating systems on the same hardware
• Create a manageable upgrade path
• Reduce costs by consolidating services onto the fewest physical machines

http://www.vmware.com/img/serverconsolidation.jpg
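Consolidating services onto the fewest physical machines is a bin-packing problem. A common heuristic is first-fit decreasing (FFD): place the largest VMs first, each on the first host that still has room. The sketch below packs a single dimension (CPU load); the VM names and loads are hypothetical, and real placement tools also weigh memory, I/O and headroom for peaks.

```python
def consolidate(vm_loads: dict, host_capacity: int) -> list:
    """First-fit decreasing: place the biggest VMs first, each on the first host with room."""
    hosts = []  # each host is a dict {vm_name: load}
    for vm, load in sorted(vm_loads.items(), key=lambda item: item[1], reverse=True):
        if load > host_capacity:
            raise ValueError(f"{vm} does not fit on any single host")
        for host in hosts:
            if sum(host.values()) + load <= host_capacity:
                host[vm] = load
                break
        else:
            hosts.append({vm: load})  # no running host has room: use one more machine
    return hosts

# Hypothetical average CPU load of eight under-used servers, in % of one physical host.
loads = {"web": 20, "db": 50, "mail": 10, "dns": 5, "build": 40, "wiki": 10, "backup": 30, "erp": 25}
for i, host in enumerate(consolidate(loads, host_capacity=100), start=1):
    print(f"host {i}: {host} ({sum(host.values())}% used)")
```

Here eight lightly loaded servers fit on two hosts. FFD is fast and usually good, but it is a heuristic and is not guaranteed to find the minimum number of hosts.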

Page 50: Introduction

Non-virtualized data centers
• Too many servers for too little work
• High costs and infrastructure needs: maintenance, networking, floor space, cooling, power, disaster recovery

Page 51: Introduction

Virtualisation Features

VM isolation
• Secure multiplexing
  • Processor hardware isolates VMs
• Strong guarantees
  • Software bugs, crashes, and viruses within one VM cannot affect other VMs
• Performance isolation
  • Partition system resources (controls for reservation, limit, shares)

VM encapsulation
• Entire VM is a file
• Snapshots and clones
• Easy content distribution
  • Pre-configured apps, demos
  • Virtual appliances

VM compatibility
• Hardware-independent
• Create once, run anywhere
  • Migrate VMs between hosts
• Legacy VMs
  • Run an ancient OS on a new platform
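The reservation, limit and shares controls can be illustrated with a simplified allocation model: each VM first receives its reservation, spare capacity is then divided in proportion to shares, and no VM exceeds its limit (capacity a capped VM cannot use is redistributed). This is an illustrative model under our own assumptions, not VMware's actual scheduler; the host size and VM settings are hypothetical.

```python
def allocate_cpu(capacity: float, vms: dict) -> dict:
    """Simplified reservation/limit/shares allocation.

    `vms` maps a VM name to (reservation, limit, shares).
    """
    alloc = {name: reservation for name, (reservation, _limit, _shares) in vms.items()}
    spare = capacity - sum(alloc.values())
    if spare < 0:
        raise ValueError("reservations exceed capacity")
    active = {name for name, (reservation, limit, _shares) in vms.items() if reservation < limit}
    while spare > 1e-9 and active:
        total_shares = sum(vms[name][2] for name in active)
        if total_shares == 0:
            break  # nobody is entitled to spare capacity
        handed_out = 0.0
        for name in sorted(active):
            _reservation, limit, shares = vms[name]
            extra = min(spare * shares / total_shares, limit - alloc[name])
            alloc[name] += extra
            handed_out += extra
            if alloc[name] >= limit - 1e-9:
                active.discard(name)  # capped: excluded from later rounds
        spare -= handed_out
    return alloc

# Hypothetical host with 1000 MHz of CPU and three VMs: (reservation, limit, shares).
vms = {"A": (200, 1000, 2), "B": (100, 250, 1), "C": (0, 1000, 1)}
print({name: round(mhz, 1) for name, mhz in allocate_cpu(1000, vms).items()})
```

VM B hits its limit of 250 MHz, and the capacity it cannot use goes to A and C in a 2:1 ratio, so a busy neighbour cannot take more than its limit and a small VM still keeps its reservation.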

Page 52: Introduction

PlanetLab
Different organizations contribute machines, which they subsequently share for various experiments.

Problem: we need to ensure that different distributed applications do not get in each other’s way => virtualization

Page 53: Introduction

PlanetLab

Vserver: an independent and protected environment with its own libraries, server versions, and so on.
Distributed apps are assigned a collection of vservers distributed across multiple machines (a slice).

Page 54: Introduction

PlanetLab map

https://www.planet-lab.org/

Page 55: Introduction

References and links
• Cyril Rabat – Introduction aux systèmes répartis (CNAM)
• Distributed systems reading list: https://dancres.github.io/Pages/