Thursday, October 10, 2024

Computer Vision on NPU - all you need to know

Anton Maltsev, May 13, 2024

Chapters:
00:00:00 - Intro
00:00:35 - Difference between NPU, CPU, and GPU
00:02:24 - Why NPU? Main advantages
00:04:17 - NPU / LPU / TPU / VPU / DLA / BPU / DPU / IPU
00:05:40 - Main vendors (Intel, Nvidia, Hailo, Axelera, Qualcomm, RockChip, etc.)
00:07:36 - Frameworks: TF Lite, ONNX Runtime, MNN, Tengine, SnapDragon, RKNN, HailoRT, etc.
00:10:08 - Export
00:13:49 - RockChip example
00:16:02 - Hailo example
00:18:05 - ESP32 example
00:18:55 - NMS problem
00:20:52 - Memory size and structure
00:22:45 - Preprocessing
00:23:55 - Transformers
00:25:20 - Layer support
00:27:31 - Quantization
00:30:27 - Speed comparison
00:33:52 - Memory speed
00:36:42 - NPU and CPU
00:38:42 - C++ and Python

Contacts: Telegram channel - https://t.me/CVML_team; e-mail: anton@rembrain.ai

Useful videos:
- RockChip and how to run detection on it - "Running YOLO (Yolov8, Yolov5, Yolov6,..."
- Milk-V and SOPHGO SG2002 - "Milk-V DUO. Is it good for computer v..."
- Hailo - "Hailo-8: let's compare it with others..."
- Jetson - "Jetson Nano in 2022. Comparing with c..."
- Maix-III - "Review and comparison on Seeed Studio..."
- Raspberry Pi 5 - "Raspberry Pi 5 for Image Recognition:..."
- Google Coral - "Computer Vision on Google Coral in 20..."
- Khadas Vim 3 (amlogic 311) - "Khadas Vim 3 in 2022. How good it is ..."

Summary (posted in the comments by @wolpumba4099): Running Computer Vision Models on NPUs

What is an NPU?
(0:37)
- NPUs are specialized silicon chips optimized for neural network computations, especially matrix multiplications.
- Unlike CPUs and GPUs, they cannot run general-purpose programs; they focus purely on neural network inference.
- Many different names exist for these chips (LPU, TPU, VPU, etc.), but they all share the core idea of accelerating neural network calculations.

Why use NPUs? (2:29)
- Main advantages: reduced power consumption, lower device cost, and significant potential speedups over CPU/GPU for specific tasks.
- Main disadvantages: increased development complexity, a limited choice of network architectures, and more intricate deployment and testing.

Challenges of working with NPUs:
- Diverse ecosystem (7:42): a vast landscape of vendors, frameworks, and boards makes finding a perfect solution difficult; each vendor typically offers its own custom framework.
- Model export and compatibility (10:09): adapting a model to the target NPU architecture requires careful preparation, including model patches and quantization. Non-maximum suppression (NMS) (18:59) often must be handled outside the NPU, requiring separate code or fallback mechanisms.
- Memory limitations (20:54): limited memory on NPUs restricts model size and complexity, and memory access speed and structure significantly impact performance.
- Preprocessing (22:46): may need to be performed separately on the CPU, GPU, or a dedicated accelerator, depending on the NPU and its capabilities.
- Transformer support (23:58): limited or non-existent on many NPUs, often requiring model adjustments or alternative convolutional architectures.
- Layer support (25:23): advertised layer support can be misleading due to merged layers or limited functionality; always verify compatibility and performance for your specific model layers.
- Quantization (27:33): essential on many NPUs to reduce model size and accelerate inference.
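Vendor toolchains (RKNN, HailoRT, TF Lite, etc.) implement quantization internally; the following self-contained NumPy sketch of symmetric per-tensor int8 quantization (an illustration of the general idea, not any vendor's actual code) shows where the accuracy loss comes from:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: x is approximated by scale * q."""
    scale = float(np.abs(x).max()) / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the int8 codes."""
    return q.astype(np.float32) * scale

x = np.array([0.02, -1.5, 0.7, 3.2], dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
# The per-element rounding error is bounded by scale/2, so a single outlier
# that inflates the dynamic range degrades every other value's precision.
```

Real toolchains refine this with calibration datasets and per-channel scales, which is exactly why the accuracy evaluation step after quantization matters.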
- Quantization can be complex and lead to accuracy degradation, requiring careful fine-tuning and evaluation.
- Benchmarks (30:30): often don't reflect real-world performance; always test your specific model on your target hardware.

Additional considerations:
- CPUs play a vital role in data transfer, image decoding, preprocessing, and fallback mechanisms, and can dominate overall performance (36:43).
- C++ is the dominant language for inference on most NPUs, while Python prevails in model training and export (38:45).
- Training on NPUs is possible but involves a separate class of processors and different considerations (39:51).

(Summary generated with Gemini 1.5 Pro.)
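Since the summary notes that NMS frequently has to run on the host CPU rather than on the NPU, a minimal NumPy sketch of greedy IoU-based NMS (a generic illustration, not any vendor's postprocessing code) shows what that CPU fallback involves:

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5):
    """Greedy non-maximum suppression. boxes: (N, 4) as [x1, y1, x2, y2]."""
    order = scores.argsort()[::-1]  # indices sorted by score, highest first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # drop boxes overlapping box i too much
    return keep

# Two heavily overlapping boxes and one separate box:
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7])
result = nms(boxes, scores, 0.5)  # the lower-scored overlapping box is suppressed
```

This per-detection loop is data-dependent and branchy, which is one reason NPUs built for fixed dataflow graphs struggle with it and deployments route it to the CPU.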
