Transcript of: Sergey Gololobov – Intelligent Computing · Random number generators...
Sergey Gololobov
Intelligent Computing
Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Optimization Notice
• What is MKL?
• Why MKL?
• What's new in MKL 2018?
• Accelerates math computations in machine learning, scientific, financial, and engineering applications
• Contains functions for linear algebra (BLAS, LAPACK, PARDISO), FFT, vector math, statistics, and more
• Uses industry-standard interfaces, making it easy to switch between libraries
• Optimized, parallelized, and vectorized routines for maximum performance
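The standard-interfaces point above can be illustrated with a small sketch. NumPy delegates matrix products to whatever BLAS it was built against (often, but not always, MKL), so the same code runs unchanged whichever BLAS library backs it. The GEMM semantics shown here are standard BLAS; whether MKL is actually in use in any given environment is an assumption.

```python
import numpy as np

# dgemm semantics: C := alpha * A @ B + beta * C
# NumPy dispatches the @ product to its linked BLAS (often MKL).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((3, 2))

alpha, beta = 2.0, 0.5
C_new = alpha * (A @ B) + beta * C   # the GEMM operation, BLAS-backed

# Reference computation with explicit loops for comparison
C_ref = beta * C.copy()
for i in range(3):
    for j in range(2):
        for k in range(4):
            C_ref[i, j] += alpha * A[i, k] * B[k, j]

assert np.allclose(C_new, C_ref)
```

Because the interface is standardized, swapping the underlying BLAS (OpenBLAS, MKL, reference BLAS) changes performance but not results.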
Intel® Math Kernel Library
Energy · Financial Analytics · Engineering Design · Digital Content Creation · Science & Research · Signal Processing
Components of Intel® MKL 2018
Linear algebra
• BLAS
• LAPACK
• ScaLAPACK
• Sparse BLAS
• Sparse solvers
  • Iterative
  • PARDISO*
  • Cluster Sparse Solver

Fast Fourier transforms
• Multidimensional
• FFTW interfaces
• Cluster FFT

Vector functions
• Trigonometric
• Hyperbolic
• Exponentials
• Logarithms
• Powers
• Roots
• Random number generators

Statistics
• Kurtosis
• Coefficients of variation
• Order statistics
• Min/max
• Variance, covariance

And also
• Splines
• Interpolation
• Trust Region
• Fast Poisson Solver
• Convolutions
• Normalizations
• And more ☺
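The random-number-generator and statistics components above can be sketched with NumPy as a stand-in. MKL's VSL provides Gaussian streams (e.g. vdRngGaussian) and summary statistics such as variance and min/max; the NumPy calls here only illustrate the same operations and are not the MKL API itself.

```python
import numpy as np

# Gaussian RNG stream, analogous in spirit to MKL VSL's vdRngGaussian.
rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)

# Summary statistics of the kind MKL's statistics domain computes.
mean = x.mean()
var = x.var(ddof=1)        # sample variance
mn, mx = x.min(), x.max()  # min/max

# Sanity checks: a large standard-normal sample has mean ~0, variance ~1.
assert abs(mean) < 0.05
assert abs(var - 1.0) < 0.05
```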
Optimized math primitives
▪ Performance is everything ☺!
The most popular math-primitives library in the world
▪ Used everywhere ☺ (for example, in Matlab and CAD packages)
Tuned for current and future Intel processors, BUT runs on all x86-compatible processors
How does Intel® Math Kernel Library (Intel® MKL) work?
Most users of math libraries use MKL
(EDC North America Development Survey 2016, Volume I)
Knows everything about multicore processors and systems
• Supports all platforms
• Vectorized, parallel, and cluster-ready
Intel engineers – experts
Automatic dispatcher:
• Intel®-compatible processors
• Past generations of Intel® processors
• Current Intel® processors
• Future generations of Intel® processors
Intel® MKL: see the performance on Xeons!
The latest version of Intel® MKL delivers the highest performance on all Intel processors
Intel® MKL: see the performance on Xeon Phi ☺!
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance.
The latest version of Intel® MKL delivers the highest performance on all Intel processors
New and extended math functions in Intel® Math Kernel Library 2018 Beta
BLAS
• New integer matrix multiplication functions
• Improved SGEMM performance
• Optimized convolutions and inner products

BLAS batch interface
• Efficiency and performance
• New batch interface for triangular solvers
• Faster GEMM_BATCH
• Well suited for small and medium matrices

LAPACK
• Factorization and solver routines with improved performance
• Aasen's algorithm

Vector functions
• New vector functions
• A rich set of optimized functions
• 24 new algorithms
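The batch interface above amortizes call overhead by multiplying many small matrix pairs in one call (MKL's cblas_?gemm_batch). The semantics can be sketched with NumPy's stacked matmul; whether NumPy actually dispatches to a batched kernel is an assumption, so this illustrates the operation, not MKL's implementation.

```python
import numpy as np

# Many small same-shape problems: the sweet spot GEMM_BATCH targets.
rng = np.random.default_rng(1)
batch, m, k, n = 64, 8, 8, 8
A = rng.standard_normal((batch, m, k))
B = rng.standard_normal((batch, k, n))

# One "batched" call over all 64 pairs at once.
C = A @ B

# Equivalent to looping over the individual GEMMs.
C_loop = np.stack([A[i] @ B[i] for i in range(batch)])
assert np.allclose(C, C_loop)
```

Grouping same-shape problems into one call is what lets the library schedule them efficiently instead of paying per-call setup 64 times.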
The Intel® Math Kernel Library is available free of charge for educating university and school students, as well as for a range of scientific research
TRY IT AND ENJOY THE PERFORMANCE!
https://software.intel.com/ru-ru/intel-mkl
What's new in Intel® Math Kernel Library 2017
• Intel® Math Kernel Library (Intel® MKL) 2017 is optimized for the 2nd generation of Intel® Xeon Phi™ processors
• Added optimizations for the latest generation of Intel® Xeon® & Core™ processors
• Extended Intel® TBB support to all BLAS level-1 functions
• Added extensions from LAPACK 3.6
• Improved performance of LU factorization and matrix inversion for very small sizes (<16)
• Optimized primitives for deep neural networks (CNN and DNN)
• New interface for working with packed matrices
• Improved ScaLAPACK performance for symmetric eigenvalue problems
• New functions for B-splines and monotonic splines
• Experimental functionality: SparseQR & smallest/largest eigenvalues
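The small-size LU improvement above concerns exactly the kind of solve shown here. NumPy's solve() calls LAPACK's ?gesv, i.e. an LU factorization followed by triangular solves; an 8x8 system sits in the "very small" regime the release targets. This is a generic LAPACK illustration, not MKL-specific code.

```python
import numpy as np

# A well-conditioned 8x8 system (diagonal dominance keeps LU stable).
rng = np.random.default_rng(7)
n = 8
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)

# LAPACK ?gesv under the hood: LU factorization + triangular solves.
x = np.linalg.solve(A, b)

assert np.allclose(A @ x, b)
```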
Additional Material
▪ Product page – overview, features, FAQs, support…
▪ Training materials – movies, tech briefs, documentation…
▪ Evaluation guides – step by step walk through
▪ Reviews
Additional Development Products:
▪ Intel® Software Development Products
Enhanced Application Performance with Intel® AVX-512 Support
Enhanced performance due to Intel® AVX-512 instructions taking advantage of FMA units, memcpy, new prefetch instructions, new transcendental instructions, MCDRAM, and an increased number of cores.
Enhanced Application Performance with AVX-512 Support
Key functionality / library domain, and the KNL features used to deliver enhanced performance (instructions, other):

Intel® Math Kernel Library
• *GEMMs / BLAS / MP Linpack: two FMA units + 2 instruction decoders are key; AVX-512 FMA (vfmadd231ps or vfmadd231pd); prefetcht0 instruction; MCDRAM
• LU / Cholesky / QR / LAPACK / SMP Linpack: same as in BLAS (as the main LAPACK kernel is *GEMM) + greater core count
• 2D and 3D FFTs: two FMA units + 2 instruction decoders; MCDRAM, tile-to-tile mesh; AVX-512 FMA; prefetcht1 instruction
• DNN: two FMA units + 2 instruction decoders; MCDRAM, tile-to-tile mesh; AVX-512 FMA; prefetcht0 and prefetcht1 instructions; masking support; large core count
• Sparse: two FMA units + 2 instruction decoders; MCDRAM; AVX-512 FMA; prefetcht1 instruction; depends on sequential BLAS level-3 Knights Landing improvements
• Vector Statistics: similar to BLAS/LAPACK, greater number of cores
• Vector Math: AVX-512 FMA; two FMA units + 2 instruction decoders; large number of cores for MT performance; new transcendental support instructions: VGETEXP, VGETMANT, VRNDSCALE, VSCALEF, VFIXUPIMM, VRCP28, VRSQRT28, VEXP2

Intel® Integrated Performance Primitives
• All, from signal processing (1D) up to image (2D) and volume (3D) processing: the main advantage inherited from LRB/KNC is support of mask registers and therefore support of predicates for all new instructions; full 512-bit register palign support (no lane restrictions as for the "old" AVX palign): _mm512_alignr_epi32, _mm512_alignr_epi64; on-the-fly integer conversions: vpmovq{w|b|d}, vpmovq{w|b}; and integer any-direction comparison: vpcmp{d|q} and vpcmpu{d|q}

Intel® Data Analytics Acceleration Library
• Similar to BLAS/LAPACK, greater number of cores

Intel® MPI Library
• Used the compiler's AVX-512 version of memcpy (but with a fix; failed CQ on ICC)
• Built IMPI with -fvisibility=hidden (makes all symbols hidden by default and only the needed ones external); addressed KNL micro-architecture features, such as the short BTB, by reducing access to PLT/GOT
• Reduced/simplified the critical path where possible; addressed KNL front-end specifics
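The Vector Math row above is about evaluating transcendentals array-at-a-time with SIMD (the VEXP2/VSCALEF family on KNL). The style can be sketched with NumPy ufuncs; this only illustrates the array-oriented programming model, not MKL's VML API or the actual instructions emitted.

```python
import math
import numpy as np

# Array-at-a-time transcendental evaluation, the model MKL's vector math uses.
x = np.linspace(-1.0, 1.0, 10_000)

y_vec = np.exp(x)                              # vectorized ufunc
y_scalar = np.array([math.exp(v) for v in x])  # scalar loop for comparison

# Same mathematical result; the vectorized form is what SIMD units accelerate.
assert np.allclose(y_vec, y_scalar)
```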
Easy Access to Intel® Parallel Studio XE Runtimes – for Amazon Web Services* users only
Intel Parallel Studio XE Runtime
▪ Required to be able to run applications built with the Intel® Performance Libraries or Intel® compilers.
▪ Includes latest optimizations for Intel® architecture for faster application performance
▪ Linux* only
Easy access for Amazon Web Services users at no cost
▪ Latest runtimes through Linux native repos
– YUM repo – available now! (http://bit.ly/ParallelStudioXE-Runtimes)
Educating with Webinar Series about 2017 Tools
▪ Expert talks about the new features
▪ Series of live webinars, September 13 – November 8, 2016
▪ Attend live or watch after the fact.
https://software.intel.com/events/hpc-webinars
Educating with a High-Performance Programming Book
Knights-Landing-specific details, programming advice, and real-world examples.
Intel® Xeon Phi™ Processor High Performance Programming
▪ Techniques to increase program performance on any system and better prepare you for Intel Xeon Phi processors.
Available as of June 2016
http://lotsofcores.com
"I believe you will find this book is an invaluable reference to help develop your own Unfair Advantage."
James A. Manager
Sandia National Laboratories
More Education with software.intel.com/moderncode
▪ Online community with a growing collection of tools, trainings, and support
– Features Black Belts in parallelism from Intel and the industry
▪ Intel® HPC Developer Conferences, where developers share proven techniques and best practices
– hpcdevcon.intel.com
▪ Hands-on training for developers and partners with remote access to Intel® Xeon® processor and Xeon Phi™ coprocessor-based clusters.
– software.intel.com/icmp
▪ Developer Access Program provides early access to Intel® Xeon Phi™ processor codenamed Knights Landing plus one-year license for Intel® Parallel Studio XE Cluster Edition.
– http://dap.xeonphi.com/
Choices to Fit Needs: Intel® Tools
▪ All products with support – worldwide, for purchase.
– Intel® Premier Support: private direct support from Intel
– support for past versions
– software.intel.com/products
▪ Most products without Premier support – via special programs for those who qualify
– students, educators, classroom use, open-source developers, and academic researchers
– software.intel.com/qualify-for-free-software
▪ Intel® Performance Libraries without Premier support – community licensing for Intel performance libraries
– no royalties, no restrictions based on company or project size
– software.intel.com/nest
Community support only – Intel Performance Libraries: Community Licensing (no qualification required)
Community support only – all tools: students, educators, classroom use, open-source developers, academic researchers (qualification required)
Automatic Performance Scaling from the Core, to Multicore, to Many Core and Beyond with Intel® MKL
Extracting performance from the computing resources:
▪ Core: vectorization, prefetching, cache utilization (sequential Intel® MKL)
▪ Multi-/many-core (processor/socket) level parallelization (MKL + OpenMP)
▪ Multi-socket (node) level parallelization
▪ Cluster scaling (MKL + Intel® MPI)
▪ Many core: Intel® Xeon Phi™ coprocessor
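The multicore level of this scaling is controlled at run time. MKL_NUM_THREADS is a real MKL control (as is the mkl_set_num_threads function), and OMP_NUM_THREADS is the generic OpenMP equivalent; the sketch below assumes the NumPy in use is MKL- or OpenMP-backed, which varies by installation. Setting the variables before the library loads is the common pattern.

```python
import os

# Cap MKL's OpenMP team before NumPy/MKL is loaded (illustrative values).
os.environ["MKL_NUM_THREADS"] = "4"   # MKL-specific control
os.environ["OMP_NUM_THREADS"] = "4"   # generic OpenMP fallback

import numpy as np  # import after the environment is configured

A = np.ones((256, 256))
B = np.ones((256, 256))
C = A @ B  # this GEMM uses at most 4 threads if MKL/OpenMP backs NumPy

assert C[0, 0] == 256.0  # each entry is the sum of 256 ones
```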
Big Data and Machine Learning Challenge
Problem:
▪ Big data needs high-performance computing.
▪ Many big data applications leave performance on the table –> not optimized for the underlying hardware.
Solution:
▪ A performance library provides building blocks that are easily integrated into big data analytics workflows.
(The four Vs of big data: Volume, Velocity, Variety, Value)
Challenges Faced by Developers
▪ Performance optimization is a never-ending task.
▪ Completing key processing tasks within designated time constraints is a critical issue.
▪ Hand-optimizing code for one platform can make its performance worse on another platform.
▪ With manual optimization, code becomes more complex and harder to maintain.
▪ Code should run as fast as possible without extra developer effort.
Optimizing Performance on Parallel Hardware: It's an Iterative Process…
(Flow chart: Cluster scalable? → Tune MPI – ignore this step if you are not targeting clusters. Memory bandwidth sensitive? → Optimize bandwidth. Effective threading? → Thread. Then Vectorize, and iterate.)
Performance Analysis Tools for Diagnosis: Intel® Parallel Studio XE
(The same iterative flow chart, annotated with the diagnosis tools used at each step: Intel® Trace Analyzer & Collector (ITAC), Intel® MPI Snapshot, and Intel® MPI Tuner for tuning MPI; Intel® VTune™ Amplifier for bandwidth and threading analysis; Intel® Advisor for vectorization.)
Tools for High-Performance Implementation: Intel® Parallel Studio XE
(The same iterative flow chart, annotated with implementation tools: Intel® Compiler; Intel® MPI Library and Intel® MPI Benchmarks; Intel® Math Kernel Library; Intel® IPP, the media & data library; Intel® Data Analytics Library; Intel® Cilk™ Plus; Intel® OpenMP*; Intel® TBB, the threading library.)
Legal Disclaimer and Optimization Notice
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
Copyright © 2016, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804