Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan...
![Page 1: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/1.jpg)
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark (Developing Applications in the "Big Data" Era with Scala and Spark)
Mario Cartia
MILAN 25-26 NOVEMBER 2016
![Page 2: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/2.jpg)
$ whoami
Mario Cartia
Chief System Engineer
![Page 3: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/3.jpg)
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark
![Page 4: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/4.jpg)
![Page 5: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/5.jpg)
![Page 6: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/6.jpg)
Big Data
Risk
![Page 7: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/7.jpg)
Big Data
Opportunity
![Page 8: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/8.jpg)
Jonas Bonér
![Page 9: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/9.jpg)
![Page 10: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/10.jpg)
The Reactive Manifesto (2013)
![Page 11: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/11.jpg)
The Reactive Manifesto
Responsive
o The system responds in a timely manner if at all possible
Resilient
o The system stays responsive in the face of failure
![Page 12: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/12.jpg)
The Reactive Manifesto
Event-Driven
o Reactive Systems rely on asynchronous message-passing to establish a boundary between components that ensures loose coupling, isolation and location transparency
Elastic
o The system stays responsive under varying workload, reacting to changes in the input rate by increasing or decreasing the allocated resources
![Page 13: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/13.jpg)
![Page 14: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/14.jpg)
![Page 15: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/15.jpg)
Bytecode
Interoperability
![Page 16: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/16.jpg)
![Page 17: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/17.jpg)
History
The design of Scala started in 2001 at EPFL, Switzerland, by Martin Odersky
First internal use in 2003 to teach the “Functional and Logic Programming Course”
Public announcement of Scala 1.0 in 2004
![Page 18: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/18.jpg)
History
Scala 2.0 was released in March 2006
In May 2011 Odersky and Bonér launched Typesafe Inc. to provide commercial support and education for Scala (renamed Lightbend in February 2016)
The latest version of Scala is 2.12.0, released on 3 November 2016
![Page 19: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/19.jpg)
![Page 20: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/20.jpg)
Features
Object Oriented
o You can construct elegant class hierarchies for maximum code reuse and extensibility
Functional
o You can implement object behavior using higher-order functions
![Page 21: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/21.jpg)
Features
Object Oriented
o In contrast to Java, all values in Scala are objects (including primitive types and functions)
o Multiple inheritance using traits (mixin-based composition)
o Statically typed
o …
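The object-oriented points above can be illustrated with a minimal sketch. All names here (`Greeter`, `Shouter`, `Person`, `ObjectOrientedDemo`) are hypothetical and do not come from the talk; the example only shows that a class can mix in several traits and that numbers and functions are objects with methods.

```scala
// Hypothetical names throughout; none of them come from the talk.
trait Greeter {
  def name: String
  def greet: String = s"Hello, $name"
}

trait Shouter {
  def shout(text: String): String = text.toUpperCase + "!"
}

// A class can mix in several traits at once (mixin-based composition).
class Person(val name: String) extends Greeter with Shouter {
  def loudGreeting: String = shout(greet)
}

object ObjectOrientedDemo {
  // Operators are method calls on objects: 1 + 2 is shorthand for (1).+(2).
  val sum: Int = (1).+(2)

  // A function literal is an object with an apply method.
  val double: Int => Int = x => x * 2
  val applied: Int = double.apply(21)

  def main(args: Array[String]): Unit = {
    println(new Person("Mario").loudGreeting)
    println(sum)
    println(applied)
  }
}
```

Types are still checked statically: calling `shout` with an `Int` argument, for example, would be rejected at compile time.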
![Page 22: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/22.jpg)
Features
Functional
o Every function is a value
o Lambda expressions
o Immutable objects
o Higher-order functions
o Case classes with support for pattern matching to model algebraic types
o …
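A short sketch of the functional features listed above, under the assumption of a toy `Shape` type invented for this example (it is not from the talk). It shows an algebraic type built from case classes, pattern matching over it, a higher-order function, a function stored as a value, and an immutable update via `copy`.

```scala
// A small algebraic data type modelled with a sealed trait and case classes.
// Shape, Circle and Rectangle are hypothetical names used for illustration.
sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Rectangle(width: Double, height: Double) extends Shape

object FunctionalDemo {
  // Pattern matching deconstructs each case class.
  def area(shape: Shape): Double = shape match {
    case Circle(r)       => math.Pi * r * r
    case Rectangle(w, h) => w * h
  }

  // A higher-order function: it takes another function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Functions are values; this lambda is stored in an immutable val.
  val increment: Int => Int = _ + 1

  // Case classes are immutable; copy returns a new, modified instance.
  val original: Rectangle = Rectangle(2.0, 3.0)
  val wider: Rectangle = original.copy(width = 4.0)
}
```

Because `Shape` is sealed, the compiler can warn when a `match` forgets one of the cases.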
![Page 23: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/23.jpg)
Features
Other
o Type inference
o Infix notation
o Parallel and concurrent programming
o Actor model (Akka)
o …
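Type inference and infix notation are easiest to see side by side. The sketch below uses a hypothetical `Vec` class (an assumption made for this example only) and the standard `to` method on integers.

```scala
object SyntaxDemo {
  // Type inference: no annotations are written; the compiler infers Int and List[String].
  val answer = 42
  val names = List("Scala", "Spark")

  // A hypothetical class used only to show infix notation.
  case class Vec(x: Int, y: Int) {
    def +(other: Vec): Vec = Vec(x + other.x, y + other.y)
  }

  // The same method call, written with a dot and in infix form.
  val viaDot: Vec = Vec(1, 2).+(Vec(3, 4))
  val viaInfix: Vec = Vec(1, 2) + Vec(3, 4)

  // Standard-library example: "1 to 3" is the infix form of 1.to(3).
  val range: List[Int] = (1 to 3).toList
}
```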
![Page 24: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/24.jpg)
Akka
Is a free and open-source toolkit and runtime simplifying the construction of concurrent and distributed applications on the JVM
Supports multiple programming models for concurrency, but it emphasizes actor-based concurrency, with inspiration drawn from Erlang
![Page 25: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/25.jpg)
Akka
Language bindings exist for both Java and Scala
Akka is written in Scala and, as of Scala 2.10, the actors in the Scala standard library are deprecated in favor of Akka
Concurrency is message-based and asynchronous
![Page 26: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/26.jpg)
O’REILLY 2016 European Software Development Salary Survey
![Page 27: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/27.jpg)
Top Adopters
![Page 28: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/28.jpg)
Useful Tools
scala
scalac
scaladoc
scalap
similar to their Java counterparts
![Page 29: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/29.jpg)
Useful Tools
scala
o With no arguments specified, a Scala shell (REPL) starts and reads commands interactively
$ scala
Welcome to Scala version 2.12.0
Type in expressions to have them evaluated.
Type :help for more information.
scala> val i = 2
i: Int = 2
scala>
![Page 30: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/30.jpg)
Useful Tools
Scala Build Tool (sbt) is an open source build tool for Scala projects, similar to Maven or Ant, with the following characteristics:
o build descriptions written in Scala using a DSL
o dependency management using Ivy (supports Maven-format repositories)
o support for mixed Java/Scala projects
o …
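As an illustration of a build description written in the sbt DSL, here is a minimal `build.sbt` sketch. The project name and the versions are assumptions chosen for this example, not values from the talk.

```scala
// build.sbt (a minimal sketch; project name and versions are illustrative assumptions)
name := "hello-spark"

version := "0.1.0"

// Spark 2.0.x artifacts are published for Scala 2.11
scalaVersion := "2.11.8"

// %% appends the Scala binary version to the artifact name (spark-core_2.11);
// the dependency is fetched from a Maven-format repository
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.2"
```

With a file like this in the project root, `sbt compile` and `sbt run` build and run the sources under `src/main/scala` (and any Java sources under `src/main/java`).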
![Page 31: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/31.jpg)
Hello, World!
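The code shown on this slide is not part of the transcript. A minimal sketch of what a Scala "Hello, World!" program can look like follows; the object name `HelloWorld` and the `greeting` helper are assumptions made for this example.

```scala
object HelloWorld {
  // The greeting is kept in a separate method so it can be checked on its own.
  def greeting: String = "Hello, World!"

  // Entry point, analogous to Java's public static void main.
  def main(args: Array[String]): Unit = println(greeting)
}
```

Saved as `HelloWorld.scala`, it can be compiled with `scalac HelloWorld.scala` and run with `scala HelloWorld`.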
![Page 32: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/32.jpg)
Hello, World! (REPL)
![Page 33: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/33.jpg)
Learning Resources
![Page 34: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/34.jpg)
Learning Resources
![Page 35: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/35.jpg)
Learning Resources
![Page 36: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/36.jpg)
Learning Resources
![Page 37: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/37.jpg)
![Page 38: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/38.jpg)
History
Originally developed starting in 2009 at the University of California, Berkeley's AMPLab
In 2013 its creators founded Databricks, a company that provides services and support for Spark
First stable release (1.0) in May 2014
![Page 39: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/39.jpg)
![Page 40: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/40.jpg)
Features
Provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance
Provides programmers with an API centered on a data structure called the resilient distributed dataset (RDD)
Runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk
![Page 41: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/41.jpg)
Modules
Spark SQL
o Lets you query structured data inside Spark programs, using either SQL or an easy-to-use DataFrame API
o Spark SQL reuses the Hive frontend and metastore, giving you full compatibility with existing Hive data, queries, and UDFs
![Page 42: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/42.jpg)
Modules
Spark MLlib
o Contains many algorithms and utilities, including:
• Classification
• Regression
• Clustering
• Recommendation
• Distributed linear algebra
• Statistics
• …
![Page 43: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/43.jpg)
Modules
Spark Streaming
o Brings Apache Spark's language-integrated API to stream processing, letting you write streaming jobs the same way you write batch jobs
o Recovers both lost work and operator state (e.g. sliding windows) out of the box, without any extra code on your part
![Page 44: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/44.jpg)
Modules
Spark Streaming
o Lets you reuse the same code for batch processing, join streams against historical data, or run ad-hoc queries on stream state
o Can read data from HDFS, Flume, Kafka, Twitter and ZeroMQ, or from custom data sources
![Page 45: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/45.jpg)
Modules
Spark GraphX
o Collection of APIs for graphs and graph-parallel computation
o Provides a variety of graph algorithms like:
• PageRank
• Connected components
• Label propagation
• SVD++
• Strongly connected components
• Triangle count
![Page 46: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/46.jpg)
How it works?
Spark features an advanced Directed Acyclic Graph (DAG) engine supporting acyclic data flow
Each Spark job creates a DAG of task stages to be performed on the cluster
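In Spark, transformations are lazy: they only describe the computation, and a job (with its DAG of stages) is created when an action asks for a result. The sketch below is only an analogy built on a plain Scala `Iterator`, with no Spark involved; `LazyPipelineDemo` and the counter are assumptions made for this example.

```scala
object LazyPipelineDemo {
  // Returns (elements evaluated before the "action",
  //          elements evaluated after it, result of the "action").
  def run(): (Int, Int, List[Int]) = {
    var evaluated = 0

    // "Transformations": map and filter on an Iterator only describe the work.
    val planned = Iterator(1, 2, 3, 4)
      .map { x => evaluated += 1; x * 2 }
      .filter(_ > 2)

    val before = evaluated          // nothing has been computed yet

    // "Action": asking for a concrete result forces the whole pipeline to run.
    val result = planned.toList

    (before, evaluated, result)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

The analogy stops at laziness: a real Spark job additionally splits the plan into stages and distributes the tasks across the cluster.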
![Page 47: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/47.jpg)
How it works?
![Page 48: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/48.jpg)
How it works?
![Page 49: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/49.jpg)
How it works?
val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")
Slide labels: Driver Program, RDD, SparkContext, Transformations, Action
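To see what each step of the word count computes without a cluster, the same pipeline can be sketched on an in-memory Scala collection. This is not Spark code: `LocalWordCount` is a hypothetical name, and `groupBy` followed by a sum stands in for `reduceByKey`, which plain collections do not have.

```scala
object LocalWordCount {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // one element per word
      .map(word => (word, 1))             // pair each word with a count of 1
      .groupBy(_._1)                      // collect the pairs for each word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // add up the 1s

  def main(args: Array[String]): Unit =
    println(wordCount(Seq("a b a", "b c")))
}
```

In the Spark version, `sc` is the SparkContext created by the driver program, `textFile` and `counts` are RDDs, `flatMap`, `map` and `reduceByKey` are transformations, and `saveAsTextFile` is the action that triggers execution.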
![Page 50: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/50.jpg)
How it works?
Spark UI
![Page 51: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/51.jpg)
Running modes
Standalone
o Spark provides a simple standalone deploy mode mainly for testing purposes
YARN
o Send jobs to a Hadoop cluster
Mesos
o Send jobs to the Apache Mesos distributed kernel
![Page 52: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/52.jpg)
Learning Resources
![Page 53: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/53.jpg)
Learning Resources
![Page 54: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/54.jpg)
Learning Resources
![Page 55: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/55.jpg)
Codemotion Training courses
Hands-on learning paths, also available online
> WEB APP SECURITY
> WEB DEVELOPMENT
> IOT
> UX & UI
> BIG DATA
> MOBILE DEVELOPMENT
> LEGAL SOFTWARE DISCIPLINE
> FRONTEND DEVELOPMENT
![Page 56: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/56.jpg)
Bootcamp “Sviluppo Applicazioni Big Data con Scala e Spark” (Big Data Application Development with Scala and Spark)
Where: Milan
When: 2 December 2016
Info: Codemotion desk
Next event!
Email: [email protected]
![Page 57: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/57.jpg)
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark
Question Time!
![Page 58: Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Cartia - Codemotion Milan 2016](https://reader036.fdocuments.fr/reader036/viewer/2022062523/587080f61a28ab57368b6609/html5/thumbnails/58.jpg)
Thanks!
Follow me!
https://twitter.com/mariocartia
https://it.linkedin.com/in/mariocartia
Email: [email protected]
All pictures belong to their respective authors