Transcript of "Embarquez votre Intelligence Artificielle (IA) sur CPU, GPU et FPGA"
© 2018 The MathWorks, Inc.

Slide 1: Embarquez votre Intelligence Artificielle (IA) sur CPU, GPU et FPGA
(Deploy your Artificial Intelligence (AI) on CPU, GPU and FPGA)
Pierre Nowodzienski – Application Engineer
Slide 2: From Data to Business Value

▪ End devices: generate raw data
▪ Data analysis: extract information
▪ Artificial Intelligence: get valuable knowledge and make decisions
Slide 3: Artificial Intelligence opportunities in the "Internet of Everything" world

Sending everything to the CLOUD raises several concerns: the sheer amount of data, transport cost, energy cost, high latency, and availability.
Slide 4: Do the right thing at the right place

|          | End devices         | Local control center     | Global control center (CLOUD) |
| Mission  | Real-time analytics | Operational Intelligence | Business intelligence         |
| SWaP-C   | High                | Medium                   | Low                           |
| Latency  | Very Low            | Low - Medium             | High                          |

Today's webinar focus: how can we design and deploy Neural Networks on embedded targets?
Slide 5: Embedded targets & mitigations

(Chart: efficiency in performance/watt vs. development productivity, each axis running from Low to High; code generation mitigates the productivity cost of each target.)

▪ CPU: C/C++ programming language, sequential processing
▪ GPU: CUDA/OpenCL programming language, partly parallel processing
▪ FPGA: VHDL/Verilog programming language, partly parallel processing
Slide 6: MathWorks workflows: Neural Network to embedded targets

Artificial Neural Network Design & Training: starting from a dataset, design the application and train the network, yielding either a trained convolutional/DAG network or a trained shallow neural network.

Deployment paths (each packaging the network together with the application logic):
▪ GPU Coder → GPU
▪ Embedded Coder → ANSI/ISO-compliant C/C++ for CPUs
▪ HDL Coder → FPGA/ASIC

First part: deploying a Deep Neural Network. Second part: deploying a Shallow Neural Network.
Slide 7: Deep Learning is a Subset of Machine Learning

(Venn diagram: Deep Learning shown as a subset within Machine Learning.)
Slide 8: Algorithm Design to Embedded Deployment Workflow

Starting from a MATLAB algorithm (the functional reference) plus the application logic:
1. Functional test: run the MATLAB reference.
2. Deployment unit-test: desktop GPU, C++; build type .mex, calling CUDA from MATLAB directly.
3. Deployment integration-test: desktop GPU, C++; build type .lib, calling CUDA from a hand-coded (C++) main().
4. Real-time test: embedded GPU; cross-compiled .lib, calling CUDA from a hand-coded (C++) main().
Slide 9: Demo: AlexNet Deployment with 'mex' Code Generation
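The 'mex' code-generation step demoed here can be sketched roughly as follows. The entry-point name alexnet_predict is an assumption, and 227x227x3 single-precision input is AlexNet's standard input size; the demo's actual script may differ:

```matlab
% alexnet_predict.m -- hypothetical entry-point function for code generation
function out = alexnet_predict(in) %#codegen
persistent mynet;
if isempty(mynet)
    % Load the pretrained network in a codegen-compatible way
    mynet = coder.loadDeepLearningNetwork('alexnet');
end
out = predict(mynet, in);
end
```

```matlab
% At the MATLAB prompt: generate a MEX function that calls CUDA directly
cfg = coder.gpuConfig('mex');                               % 'mex' build type
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn'); % target cuDNN
codegen -config cfg alexnet_predict -args {ones(227,227,3,'single')}
```

The generated MEX function can then be called from MATLAB exactly like the original function, which is what makes the deployment unit-test a drop-in comparison against the functional reference.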
Slide 10: Algorithm Design to Embedded Deployment on Tegra GPU

1. Functional test: test in MATLAB on the host.
2. Deployment unit-test: test generated code in MATLAB on host + Tesla GPU; build type .mex, calling CUDA from MATLAB directly.
3. Deployment integration-test: test generated code within a C/C++ app on host + Tesla GPU; build type .lib, calling CUDA from a hand-coded (C++) main().
4. Real-time test: test generated code within a C/C++ app on the Tegra target; .lib cross-compiled on the host with the Linaro toolchain, calling CUDA from a hand-coded (C++) main().

As before, the starting point is the MATLAB algorithm (functional reference) plus the application logic.
Slide 11: AlexNet Deployment to Tegra: Cross-Compiled with 'lib'

Two small changes relative to the 'mex' build:
1. Change the build type to 'lib'
2. Select the cross-compile toolchain
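Under the same assumptions as the 'mex' sketch (entry point alexnet_predict), those two changes might look like this. The exact toolchain name depends on the toolchains registered on your host, so the string below is a placeholder, not a real value:

```matlab
cfg = coder.gpuConfig('lib');                               % 1. build type 'lib'
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
% 2. Select the cross-compile (Linaro) toolchain registered on the host;
%    the exact name varies per installation (placeholder below):
cfg.Toolchain = '<your Linaro AArch64 toolchain>';
codegen -config cfg alexnet_predict -args {ones(227,227,3,'single')}
```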
Slide 12: Deploying to CPUs

Targets: desktop CPU and Raspberry Pi board. GPU Coder generates the code (on GPU targets it builds on the NVIDIA TensorRT & cuDNN libraries), packaging the network together with the application logic.
Slide 13: GPU Coder for Deployment

GPU Coder covers:
▪ Deep Neural Networks: deep learning, machine learning
▪ Image Processing and Computer Vision: image filtering, feature detection/extraction
▪ Signal Processing and Communications: FFT, filtering, cross correlation

Reported performance: 5x faster than TensorFlow and 2x faster than MXNet for deep learning inference; 60x faster than CPUs for stereo disparity; 20x faster than CPUs for FFTs. For CPU targets, the generated code can build on the ARM Compute Library and the Intel MKL-DNN library.
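Retargeting the same (assumed) alexnet_predict entry point to a CPU is mostly a configuration change; a minimal sketch, using the Intel MKL-DNN deep-learning library, with 'arm-compute' playing the same role for ARM targets:

```matlab
cfg = coder.config('lib');                                    % plain C/C++ static library
cfg.DeepLearningConfig = coder.DeepLearningConfig('mkldnn');  % Intel MKL-DNN on x86
% For an ARM target such as the Raspberry Pi, use instead:
% cfg.DeepLearningConfig = coder.DeepLearningConfig('arm-compute');
codegen -config cfg alexnet_predict -args {ones(227,227,3,'single')}
```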
Slide 14: MathWorks workflows: Neural Network to embedded targets (recap; same diagram as Slide 6)

Second part: deploying a Shallow Neural Network.
Slide 15: Demo: Shallow network deployment on a Zynq platform

A shallow neural network serves as a sensorless gas-emission estimator: from the engine's speed command and fuel rate, the network produces estimated torque and estimated gas emission, alongside the engine torque and gas emission actually measured on the engine.
Slide 16: Demo workflow

Create the network structure → train the network → test the network (iterate) → export to Simulink → fine-tune & optimize for the target → generate code.
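The first steps of this workflow can be sketched in a few lines. The choice of fitnet with 10 hidden neurons, and the variable names x (inputs: speed command, fuel rate) and t (targets: torque, gas emission), are illustrative assumptions:

```matlab
% x: 2-by-N input matrix (speed command; fuel rate), one column per sample
% t: 2-by-N target matrix (engine torque; gas emission)
net = fitnet(10);            % create a feed-forward fitting network, 10 hidden neurons
net = train(net, x, t);      % train the network
y   = net(x);                % test: simulate the network on the inputs
err = perform(net, t, y);    % mean squared error against the targets
gensim(net);                 % export the trained network to a Simulink model
```

From the generated Simulink model, the remaining steps (fixed-point fine-tuning and code generation) proceed with the products listed in the demo summary.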
Slide 17: Demo summary

The same workflow, annotated with the products involved:
▪ Create the network structure, train, test, iterate: Neural Network Toolbox, Parallel Computing Toolbox
▪ Export to Simulink, fine-tune & optimize for the target: Fixed-Point Designer
▪ Generate code: HDL Coder, Embedded Coder
Slide 18: HDL Optimization options

Area Optimizations
▪ HDL Coder with Simulink
  – Streaming
  – Sharing
  – Line buffers as RAMs
  – RAM fusion
  – Architecture flattening
  – Efficient resource mapping
▪ HDL Coder with MATLAB
  – RAM mapping
  – Loop streaming
  – Resource sharing
  – CSD/FCSD

Speed Optimizations
▪ HDL Coder with Simulink
  – Input/output pipelining
  – Distributed pipelining
  – Hierarchical distributed pipelining
  – Constrained pipelining
  – Clock-rate pipelining
  – Back-annotation
  – Adaptive pipelining
▪ HDL Coder with MATLAB
  – Input/output pipelining
  – Distributed pipelining
  – Loop unrolling

Workflow and Verification
▪ HDL Workflow Advisor
▪ Automatic delay balancing
▪ Validation model generation
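Several of the Simulink-side options above are ordinary HDL block parameters set with hdlset_param; a short sketch, where the model and subsystem names are hypothetical:

```matlab
% Hypothetical model/subsystem names for illustration
hdlset_param('gas_estimator/NN', 'SharingFactor', 4);        % resource sharing (area)
hdlset_param('gas_estimator/NN', 'StreamingFactor', 8);      % streaming (area)
hdlset_param('gas_estimator/NN', 'OutputPipeline', 2);       % output pipelining (speed)
hdlset_param('gas_estimator', 'ClockRatePipelining', 'on');  % clock-rate pipelining
makehdl('gas_estimator/NN');                                 % generate HDL code
```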
Slide 19: Key takeaways

▪ Comprehensive & integrated development environment, from dataset to target
▪ Fast design-space exploration and trade-offs
▪ Target-independent functional reference for target-optimized implementation models
▪ Deploy a "smart application", not only the neural network
Slide 20: MathWorks workflows: Neural Network to embedded targets (recap; same diagram as Slide 6)
Slide 21: Next steps

▪ Web site technical resources
  – Lookup Table Optimization
  – Data Type Optimization (documentation)
  – Efficient Implementation on FPGAs (documentation)
  – Deep Learning Inference for Object Detection on Raspberry Pi
  – Pedestrian Detection on an NVIDIA GPU with TensorRT
▪ Contact us
  – +33-1-41-14-88-45

Special thanks to Vaidehi Venkatesan (Fixed-Point Designer development team) for her great job creating this demo material!