
© 2018 The MathWorks, Inc.

Embed your Artificial Intelligence (AI) on CPU, GPU and FPGA

Pierre Nowodzienski – Application Engineer
pierre.nowodzienski@mathworks.fr

From Data to Business value

Generate raw data (end devices) → extract information (data analysis) → get valuable knowledge → make decisions, with Artificial Intelligence spanning the chain.

Artificial Intelligence opportunities in the "Internet of Everything" world

Sending all raw data to the CLOUD faces several constraints: amount of data, transport cost, high latency, availability, and energy cost.

Do the right thing at the right place

           End device             Local control center        Global control center (CLOUD)
Mission    Real-time analytics    Operational Intelligence    Business intelligence
SWaP-C     High                   Medium                      Low
Latency    Very Low               Low - Medium                High

Today's webinar focus: how can we design and deploy Neural Networks on embedded targets?

Embedded targets & mitigations

Embedded targets trade efficiency (performance/watt) against development productivity; code generation is the mitigation in each case:

• C/C++ programming language, sequential processing (CPU)
• CUDA/OpenCL programming language, partly parallel processing (GPU)
• VHDL/Verilog programming language, partly parallel processing (FPGA)

MathWorks workflows: Neural Network to embedded targets

Artificial Neural Network Design & Training: a dataset is used to train the network, yielding either a trained Convolutional or DAG Network, or a trained Shallow Neural Network.

Application design: the trained network is combined with application logic and deployed through GPU Coder, Embedded Coder (ANSI/ISO-compliant C/C++) or HDL Coder (FPGA/ASIC).

First part: Deploying a Deep Neural Network
Second part: Deploying a Shallow Neural Network

Deep Learning is a Subset of Machine Learning

Algorithm Design to Embedded Deployment Workflow

Starting from a MATLAB algorithm (the functional reference) plus the application logic:

1. Functional test – the MATLAB algorithm as functional reference
2. Deployment unit-test – desktop GPU, C++; build type: .mex; call CUDA from MATLAB directly
3. Deployment integration-test – desktop GPU, C++; build type: .lib; call CUDA from a (C++) hand-coded main()
4. Real-time test – embedded GPU; build type: cross-compiled .lib; call CUDA from a (C++) hand-coded main()

Demo: AlexNet Deployment with 'mex' Code Generation
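A minimal sketch of what this kind of 'mex' demo typically looks like with GPU Coder, assuming Neural Network Toolbox, GPU Coder and the AlexNet support package are installed; the entry-point name alexnet_predict is illustrative:

    % alexnet_predict.m -- entry-point function for code generation (name is illustrative)
    function out = alexnet_predict(in) %#codegen
    persistent net;
    if isempty(net)
        net = coder.loadDeepLearningNetwork('alexnet');   % load the pretrained network once
    end
    out = predict(net, in);                               % run inference on a 227x227x3 image
    end

    % At the MATLAB prompt: generate a CUDA MEX and call it in place of the MATLAB code
    cfg = coder.gpuConfig('mex');                         % 'mex' build type: test from MATLAB on the host GPU
    cfg.TargetLang = 'C++';
    cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
    codegen -config cfg alexnet_predict -args {ones(227,227,3,'single')} -report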

Algorithm Design to Embedded Deployment on Tegra GPU

The starting point is again the MATLAB algorithm (functional reference) plus application logic:

1. Functional test – test in MATLAB on the host
2. Deployment unit-test – test the generated code in MATLAB on the host + Tesla GPU; build type: .mex; call CUDA from MATLAB directly
3. Deployment integration-test – test the generated code within a C/C++ app on the host + Tesla GPU; build type: .lib; call CUDA from a (C++) hand-coded main()
4. Real-time test – test the generated code within a C/C++ app on the Tegra target; build type: cross-compiled .lib (cross-compiled on the host with the Linaro toolchain); call CUDA from a (C++) hand-coded main()

AlexNet Deployment to Tegra: Cross-Compiled with 'lib'

Two small changes (sketched below):
1. Change the build type to 'lib'
2. Select the cross-compile toolchain
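In code, the two changes amount to a different coder.gpuConfig build type and a toolchain selection. A hedged sketch, reusing the illustrative alexnet_predict entry point; the toolchain name is an example and depends on what is installed on the host:

    cfg = coder.gpuConfig('lib');                                % 1. change the build type from 'mex' to 'lib'
    cfg.TargetLang = 'C++';
    cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
    cfg.Toolchain = 'NVIDIA CUDA for Jetson Tegra X2 (Linux)';   % 2. cross-compile toolchain (name is illustrative;
                                                                 %    pick one of the toolchains listed in your setup)
    codegen -config cfg alexnet_predict -args {ones(227,227,3,'single')} -report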

Deploying to CPUs

The same flow (shown above for NVIDIA GPUs through GPU Coder and the TensorRT & cuDNN libraries) also deploys the trained network and application logic to CPU targets such as a desktop CPU or a Raspberry Pi board. A hedged configuration sketch follows.
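For CPU targets, the deep learning configuration points at a CPU library instead of cuDNN. A sketch under the same assumptions as before (illustrative alexnet_predict entry point; the ARM architecture and library version values are placeholders to adapt to the board):

    % Desktop Intel CPU: generate C++ that calls the Intel MKL-DNN library
    cfg = coder.config('lib');
    cfg.TargetLang = 'C++';
    cfg.DeepLearningConfig = coder.DeepLearningConfig('mkldnn');
    codegen -config cfg alexnet_predict -args {ones(227,227,3,'single')} -report

    % ARM target such as a Raspberry Pi: generate C++ that calls the ARM Compute Library
    cfg = coder.config('lib');
    cfg.TargetLang = 'C++';
    dlcfg = coder.DeepLearningConfig('arm-compute');
    dlcfg.ArmArchitecture = 'armv7';        % placeholder: match the target architecture
    dlcfg.ArmComputeVersion = '19.05';      % placeholder: match the library version installed on the board
    cfg.DeepLearningConfig = dlcfg;
    codegen -config cfg alexnet_predict -args {ones(227,227,3,'single')} -report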

GPU Coder for Deployment

Accelerated application areas:
– Deep Neural Networks: deep learning, machine learning
– Image Processing and Computer Vision: image filtering, feature detection/extraction
– Signal Processing and Communications: FFT, filtering, cross correlation

Reported performance: 5x faster than TensorFlow, 2x faster than MXNet; 60x faster than CPUs for stereo disparity; 20x faster than CPUs for FFTs.

Deployment libraries: NVIDIA TensorRT & cuDNN, ARM Compute Library, Intel MKL-DNN.

MathWorks workflows: Neural Network to embedded targets

Second part: Deploying a Shallow Neural Network (same workflow diagram as above).

Demo: Shallow network deployment on a Zynq platform

A neural network as a (sensorless) gas emission estimator: the speed command and fuel rate drive the engine, which produces torque and gas emissions; a shallow neural network fed with the same two inputs estimates the torque and gas emission.

Demo workflow

Create the network structure → train the network → test the network (iterate) → export to Simulink → fine-tune & optimize for the target → generate code.

A MATLAB sketch of the first steps is shown below.
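A sketch of the create/train/test/export steps, assuming the engine_dataset that ships with Neural Network Toolbox stands in for the demo data (fuel rate and speed in, torque and emission out); the split ratios and hidden-layer size are illustrative:

    [x, t] = engine_dataset;              % 2 inputs (fuel rate, speed) -> 2 targets (torque, emission)
    net = fitnet(10);                     % create the network structure: shallow fitting net, 10 hidden neurons
    net.divideParam.trainRatio = 0.7;     % training / validation / test split
    net.divideParam.valRatio   = 0.15;
    net.divideParam.testRatio  = 0.15;
    [net, tr] = train(net, x, t, 'useParallel', 'yes');       % train (Parallel Computing Toolbox, optional)
    y = net(x);                                               % test the network
    perf = perform(net, t(:, tr.testInd), y(:, tr.testInd))   % error on the held-out test set
    gensim(net)                           % export the trained network as a Simulink block for the next steps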

Demo summary

The same steps mapped to the products used:
– Create, train, test and export the network to Simulink: Neural Network Toolbox (with Parallel Computing Toolbox to speed up training)
– Fine-tune & optimize for the target: Fixed-Point Designer
– Generate code: HDL Coder and Embedded Coder

HDL Optimization options

Area Optimizations
▪ HDL Coder with Simulink
  – Streaming
  – Sharing
  – Line buffers as RAMs
  – RAM Fusion
  – Architecture Flattening
  – Efficient resource mapping
▪ HDL Coder with MATLAB
  – RAM Mapping
  – Loop Streaming
  – Resource Sharing
  – CSD/FCSD

Speed Optimizations
▪ HDL Coder with Simulink
  – Input/Output pipelining
  – Distributed Pipelining
  – Hierarchical Distributed Pipelining
  – Constrained Pipelining
  – Clock-Rate Pipelining
  – Back-Annotation
  – Adaptive Pipelining
▪ HDL Coder with MATLAB
  – Input/Output pipelining
  – Distributed pipelining
  – Loop Unrolling

Workflow and Verification
▪ HDL Workflow Advisor
▪ Automatic Delay Balancing
▪ Validation model generation
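Most of these options are HDL parameters set on a block or subsystem. A small sketch of how a few of them might be applied to a Simulink subsystem; the model and subsystem names are placeholders:

    % Area: share multipliers inside the subsystem and stream wide vector paths
    hdlset_param('my_model/NN_Subsystem', 'SharingFactor', 4);
    hdlset_param('my_model/NN_Subsystem', 'StreamingFactor', 8);

    % Speed: add I/O pipeline registers and let HDL Coder redistribute them
    hdlset_param('my_model/NN_Subsystem', 'InputPipeline', 2);
    hdlset_param('my_model/NN_Subsystem', 'OutputPipeline', 2);
    hdlset_param('my_model/NN_Subsystem', 'DistributedPipelining', 'on');

    % Generate HDL plus the validation model used to check the optimized architecture
    makehdl('my_model/NN_Subsystem', 'GenerateValidationModel', 'on');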

Key takeaways

▪ Comprehensive & integrated development environment, from dataset to target
▪ Fast design space exploration and trade-off analysis
▪ Target-independent functional reference for a target-optimized implementation model
▪ Deploy a "Smart application", not just the Neural Network

MathWorks workflows: Neural Network to embedded targets (recap of the workflow shown above).

Next steps

▪ Website technical resources
  – Lookup Table Optimization
  – Data Type Optimization (documentation)
  – Efficient Implementation on FPGAs (documentation)
  – Deep Learning Inference for Object Detection on Raspberry Pi
  – Pedestrian Detection on an NVIDIA GPU with TensorRT
▪ Contact us
  – pierre.nowodzienski@mathworks.fr
  – +33-1-41-14-88-45

Special thanks to Vaidehi Venkatesan (Fixed-Point Designer development team) for her great work creating this demo material!