Thursday, October 10, 2024
Computer Vision on NPU - all you need to know
Anton Maltsev
2,584 views May 13, 2024
00:00:00 - Intro.
00:00:35 - Difference between NPU, CPU, GPU
00:02:24 - Why NPU? Main advantages.
00:04:17 - NPU / LPU / TPU / VPU / DLA / BPU / DPU / IPU
00:05:40 - Main Vendors (Intel, Nvidia, Hailo, Axelera, Qualcomm, RockChip, etc)
00:07:36 - Frameworks: TF Lite, ONNX Runtime, MNN, Tengine, SnapDragon, RKNN, HailoRT, etc.
00:10:08 - Export
00:13:49 - RockChip example
00:16:02 - Hailo example
00:18:05 - ESP32 example
00:18:55 - NMS problem?
00:20:52 - Memory size and structure
00:22:45 - Preprocessing
00:23:55 - Transformers
00:25:20 - Layers support
00:27:31 - Quantization
00:30:27 - Speed comparison
00:33:52 - Memory speed
00:36:42 - NPU and CPU
00:38:42 - C++ and Python
00:40:10 - Subscribe!
My LinkedIn - / maltsevanton
My Telegram channel - https://t.me/CVML_team
e-mail: anton@rembrain.ai
Twitter - / serious_wk
Useful videos:
RockChip and how to run detection on it - • Running YOLO (Yolov8, Yolov5, Yolov6,...
Milk-V and SOPHGO SG2002 - • Milk-V DUO. Is it good for computer v...
Hailo - • Hailo-8: let's compare it with others...
Jetson - • Jetson Nano in 2022. Comparing with c...
Maix-III - • Review and comparison on Seeed Studio...
Raspberry Pi 5 - • Raspberry Pi 5 for Image Recognition:...
Google Coral - • Computer Vision on Google Coral in 20...
Khadas Vim 3 (amlogic 311) - • Khadas Vim 3 in 2022. How good it is ...
14 Comments
@wolpumba4099
4 months ago
Summary: Running Computer Vision Models on NPUs
What is an NPU? (0:37)
- NPUs are specialized silicon chips optimized for running neural network computations, especially matrix multiplications.
- Unlike CPUs and GPUs, they can't run general-purpose programs, focusing purely on neural network inference.
- Many different names exist for these chips, including LPU, TPU, VPU, etc., but they share the core idea of accelerating neural network calculations.
Why Use NPUs? (2:29)
- Main advantages: Reduced power consumption, lower device cost, potential for significant speedups compared to CPU/GPU for specific tasks.
- Main disadvantages: Increased development complexity, limited choice of neural network architectures, more intricate deployment and testing processes.
Challenges of working with NPUs:
- Diverse Ecosystem: (7:42) A vast landscape of vendors, frameworks, and boards makes finding a perfect solution difficult. Each vendor typically offers its own custom framework.
- Model Export and Compatibility: (10:09)
- Requires careful preparation, including specific patches and quantization, to adapt your model to the target NPU architecture.
- Non-maximum suppression (NMS) (18:59) often needs to be handled outside the NPU, requiring separate code or fallback mechanisms.
- Memory Limitations: (20:54)
- Limited memory size on NPUs restricts model size and complexity.
- Memory access speed and structure significantly impact performance.
- Preprocessing: (22:46) May need to be performed separately on the CPU, GPU, or dedicated accelerator depending on the NPU and its capabilities.
- Transformer Support: (23:58) Limited or non-existent on many NPUs, often requiring model adjustments or alternative convolutional architectures.
- Layer Support: (25:23)
- Advertised layer support can be misleading due to merged layers or limited functionalities.
- Always verify compatibility and performance for your specific model layers.
- Quantization: (27:33)
- Essential for many NPUs to reduce model size and accelerate inference.
- Can be complex and lead to accuracy degradation, requiring careful fine-tuning and evaluation.
- Benchmarks: (30:30)
- Often don't reflect real-world performance.
- Always test on your target hardware and specific model for accurate results.
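Since NMS often has to run outside the NPU (as noted above), it is typically implemented on the CPU over the raw box/score tensors the accelerator returns. A minimal greedy NMS sketch in NumPy (not code from the video; box format and threshold are illustrative assumptions):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression.

    boxes  -- (N, 4) array of [x1, y1, x2, y2] corners
    scores -- (N,) confidence scores
    Returns indices of the boxes to keep, highest score first.
    """
    order = scores.argsort()[::-1]  # process highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap box i too much; keep the rest for the next round
        order = rest[iou <= iou_thresh]
    return keep
```

Vendor SDKs often ship an equivalent post-processing step, but a plain CPU version like this is a common fallback when the exported graph has the detection head cut off.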
Additional considerations:
- CPUs play a vital role in data transfer, image decoding, preprocessing, and fallback mechanisms, impacting overall performance (36:43).
- C++ is the dominant language for inference on most NPUs, while Python prevails in model training and export (38:45).
- Training on NPUs is possible but involves a separate class of processors and different considerations (39:51).
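The quantization point above (smaller models, faster inference, but possible accuracy loss) comes down to mapping float weights onto a small integer grid. A minimal symmetric per-tensor int8 sketch, purely illustrative and not tied to any vendor toolchain:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float tensor."""
    return q.astype(np.float32) * scale

# Each weight now costs 1 byte instead of 4; the price is a rounding error
# bounded by half the quantization step (scale / 2).
w = np.random.randn(64, 64).astype(np.float32)
q, scale = quantize_int8(w)
max_err = np.abs(w - dequantize(q, scale)).max()
```

Real toolchains (RKNN, Hailo Dataflow Compiler, etc.) add per-channel scales and calibration data on top of this idea, which is why the video stresses re-evaluating accuracy after export.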
I used Gemini 1.5 Pro.
@shakhizatnurgaliyev9355
4 months ago
good one!
@andreyl2705
4 months ago
awesome)
@diegosantos9757
4 months ago
Dear, thanks for the content.
Which SBC would you recommend for someone just starting with computer vision?
@ДенисСлепцов-ь6п
4 months ago
Hello, I have been following your work for a long time. Please keep it up! It is very interesting. Could you tell me whether you have ever deployed a neural network on an FPGA? If so, could you please share your experience?
@עינהרע
4 months ago
Are you going to test the new Hailo GenAI M.2 board?
@____________________________.x
4 months ago
Your jump cuts make this confusing