Infrastructure - HFU Furtwangen (halfback.in.hs-furtwangen.de/home/wp-content/...)

Infrastructure
Matthias Lermer
Hochschule Furtwangen
[email protected]
HALFbACk Project meeting – November 16th




Infrastructure – Overview

The infrastructure provides the means to:

• Store sensor data
  • Interface and storage in a scalable database: Apache Cassandra
  • Interface for streaming data: Apache Kafka
• Store machine fingerprint data
  • OPC-UA interface
  • Fingerprints will be linked to the corresponding sensor data in Cassandra
• Preprocess data in the Halfback Cloud (OpenStack environment)
  • Interfaces and VMs for real-time processing: Apache Storm
  • Interfaces and VMs for batch processing: Apache Spark
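
As a rough illustration of how sensor readings and machine fingerprints could be linked, the following Python sketch models both records and joins them on a shared machine identifier. All field names (`machine_id`, `sensor`, `timestamp`, `value`, `model`) are assumptions made for this example; the slides do not specify the actual schema or message format. The JSON encoding merely stands in for the kind of payload a producer might publish to a streaming topic.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SensorReading:
    machine_id: str   # hypothetical key shared with the fingerprint
    sensor: str
    timestamp: float  # seconds since the epoch
    value: float


@dataclass(frozen=True)
class MachineFingerprint:
    machine_id: str   # hypothetical key shared with the readings
    model: str


def encode(reading: SensorReading) -> bytes:
    """Serialize a reading to UTF-8 JSON, e.g. as a message payload."""
    return json.dumps(asdict(reading), sort_keys=True).encode("utf-8")


def decode(payload: bytes) -> SensorReading:
    """Inverse of encode()."""
    return SensorReading(**json.loads(payload.decode("utf-8")))


def link(readings, fingerprints):
    """Pair each reading with the fingerprint of the same machine (or None)."""
    by_machine = {fp.machine_id: fp for fp in fingerprints}
    return [(r, by_machine.get(r.machine_id)) for r in readings]
```

In a real deployment the join would more likely happen in the database or in a Spark job; the in-memory dictionary only shows the idea of a shared key.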


Infrastructure – Overview

The infrastructure provides the means to:

• Analyze the data with machine learning in the Halfback Cloud
  • Interface and VMs: Apache Spark
  • Further solutions, e.g., TensorFlow, are possible if needed
  • You can also use your own solutions, as access to the data in the database will be provided
• Provide access only to authenticated and authorized persons
  • Virtual Private Network (VPN)
  • Customized access can be provided to interested companies
• Ensure high availability and fault tolerance with Ceph


Infrastructure – Overview

Hardware:

• OpenStack environment: Halfback Cloud
• 10 computing nodes with Intel Xeon quad-core processors
• About 4 TB of storage usable right now; everything is replicated in case of a fault
• Still in the process of upgrading the storage and distributing it more evenly (currently a bottleneck because of Ceph)
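
Because everything is replicated, usable capacity is only a fraction of the raw disk space. The small sketch below shows that arithmetic. The replica count of 3 and the 12 TB of raw storage are assumptions chosen for illustration; the slides state only the usable figure of about 4 TB.

```python
def usable_capacity_tb(raw_tb: float, replicas: int) -> float:
    """Usable space when every object is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_tb / replicas


def raw_capacity_needed_tb(usable_tb: float, replicas: int) -> float:
    """Raw disk space required to offer `usable_tb` of replicated storage."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return usable_tb * replicas


# Hypothetical numbers: 12 TB raw with 3 replicas leaves 4 TB usable.
example_usable = usable_capacity_tb(12.0, 3)
```

The same relation explains why adding storage helps less than it might seem: every additional usable terabyte costs `replicas` terabytes of raw disk.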


Infrastructure – Overview

• Storage in HDFS is also possible
  • Depends on the use case (data)
  • Has to be evaluated (e.g., machine profiles are probably better stored as XML)
• Kafka and Storm (or Spark Streaming) in case near-real-time or continuous data processing is needed


Infrastructure – Overview

Procedure for access:

• You will get a VPN certificate and connect to the Halfback Cloud
• Now you can:
  • Access the database with sensor data / machine fingerprints directly and copy the data to your own PC for analysis
  • Log in to a VM with pre-installed components, e.g., Spark, and run your scripts (preprocessing / machine learning) distributed directly in our cloud
  • Install new or needed components on your VM (Ubuntu 16.04 OS)


Infrastructure – Technologies

OpenStack (Open Source)

• OpenStack provides the means to create self-hosted clouds
• Control storage, computation and networking
• Create virtual machines with predefined installed components
• Robust, fault-tolerant environment
  • 2 controller nodes, 10 computing nodes
  • Storage replication with Ceph


Infrastructure – Technologies

Ceph (Open Source): “The Future of Storage”

• Storage clusters with infinite scalability
• Object storage (providing file and block layers); think of valet parking
• Fault tolerance: everything is replicated and distributed across nodes
  • Reliable Autonomic Distributed Object Store (RADOS)
  • Controlled Replication Under Scalable Hashing (CRUSH)
• Supports snapshots, cloning and load balancing
• Hardware independent
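
The idea behind hash-based placement can be sketched in a few lines of Python. Note that this is not the CRUSH algorithm itself: the example uses rendezvous (highest-random-weight) hashing as a much simpler stand-in that shares one key property, namely that any client can compute where an object's replicas live without asking a central lookup table. The node names and the replica count of 3 are assumptions for illustration.

```python
import hashlib
from typing import List, Sequence


def place(object_name: str, nodes: Sequence[str], replicas: int = 3) -> List[str]:
    """Pick `replicas` distinct nodes for an object, deterministically.

    Each node gets a pseudo-random score derived from a hash of the
    (node, object) pair; the highest-scoring nodes hold the replicas.
    """
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")

    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{object_name}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(nodes, key=score, reverse=True)[:replicas]


# Hypothetical cluster of ten storage nodes.
cluster = [f"node-{i}" for i in range(10)]
placement = place("sensor-block-0001", cluster)
```

Removing a node that does not hold a replica leaves the placement of that object unchanged, which is the kind of stability that keeps rebalancing traffic low when the cluster changes.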


Infrastructure – Technologies

Spark (Open Source): “Lightning-fast cluster computing”

• Fast processing in big data environments (e.g., Cassandra)
• Managing and processing streams or events (later: broker)
• Distributed machine learning (e.g., grid search, hyperparameter tuning)
  • Java, Scala and Python, with the integrated MLlib library
  • Use existing TensorFlow code (small code change needed)
  • Also support for Caffe, scikit-learn, Keras, etc.
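
Grid search distributes well because every hyperparameter combination can be evaluated independently. The sketch below illustrates the pattern with a local thread pool standing in for the cluster; it does not use Spark, and the objective function, parameter names and grid values are all hypothetical. On the Halfback Cloud the same list of candidates would presumably be spread across Spark workers instead.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def validation_loss(params):
    """Hypothetical objective, smallest at learning_rate=0.1 and depth=4."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["depth"] - 4) ** 2


def grid_search(grid, objective, workers=4):
    """Evaluate every combination in `grid` in parallel and return the best."""
    names = sorted(grid)
    candidates = [
        dict(zip(names, values))
        for values in product(*(grid[name] for name in names))
    ]
    # Each candidate is scored independently, so the work can be farmed out.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        losses = list(pool.map(objective, candidates))
    best_loss, best_params = min(
        zip(losses, candidates), key=lambda pair: pair[0]
    )
    return best_params, best_loss


grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}
best_params, best_loss = grid_search(grid, validation_loss)
```

With three values per parameter the grid has nine candidates; the cost grows multiplicatively with each added parameter, which is why distributing the evaluations pays off.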


Infrastructure – Technologies

• Only adaptable (open source) components are used
• The technology stack can quickly be modified and adapted to needs
• Automation and flexibility play a big role
  • Example 1: Machine → OPC-UA → Kafka → Spark Streaming enables continuous machine learning modeling
  • Example 2: Machine → OPC-UA → Cassandra → Spark enables batch machine learning modeling
• Components like Kafka (publish/subscribe) will provide the basis for automating broker mechanisms
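
The difference between the two example pipelines can be sketched with a toy publish/subscribe broker. Everything here is an in-memory stand-in written for illustration: the `Broker` class is not Kafka, the topic name `sensor-data` is hypothetical, and a running mean plays the role of the "model". One subscriber updates the model message by message (Example 1), while another only stores the values so that the same statistic can be computed later in one batch (Example 2).

```python
from collections import defaultdict


class Broker:
    """In-memory publish/subscribe stand-in for a broker such as Kafka."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of the topic.
        for callback in self._subscribers[topic]:
            callback(message)


class RunningMean:
    """Toy model that is updated one message at a time (streaming case)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        self.mean += (value - self.mean) / self.count


def batch_mean(values):
    """The same statistic computed in one pass over stored data (batch case)."""
    return sum(values) / len(values)


broker = Broker()
model = RunningMean()
stored = []
broker.subscribe("sensor-data", model.update)   # Example 1: continuous update
broker.subscribe("sensor-data", stored.append)  # Example 2: store, analyze later
for reading in [20.0, 22.0, 24.0]:
    broker.publish("sensor-data", reading)
```

Both paths arrive at the same mean; the streaming path has it available after every message, whereas the batch path needs the stored data and an explicit run.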