GPU Acceleration for Vision Code

I have a Python 3.12.3 project that ingests video streams and image batches for face recognition (mtcnn 1.0.0 + insightface 0.7.3), OCR (paddleocr 2.10.0 on paddlepaddle 3.0.0 / paddlepaddle-gpu 2.6.2), and post-processing with scikit-learn 1.6.0. Although one GPU-ready wheel is present, all processing still executes on the CPU. The goal is full NVIDIA CUDA utilisation across the entire workflow, from frame decoding to final inference.

I need you to:

• Profile the current code, pinpoint CPU-bound sections, and migrate or rewrite them for GPU execution (CUDA, cuDNN, cuBLAS, or other relevant CUDA-based APIs).
• Update or swap libraries where necessary. Feel free to recommend faster CUDA-compatible alternatives (e.g., CuPy, TensorRT, NVIDIA Video Codec SDK), provided they do not reduce accuracy.
• Modify the code so that GUI-less batch processing and real-time video runs stay identical in behaviour and output.
• Provide a concise "from-scratch" setup script or README covering driver versions, conda/pip commands, and any environment variables.
• Deliver a short benchmark report showing the speed-up you achieved.

I'm open to adding extra libraries or frameworks if they make a clear impact, so please include your suggestions in your bid. If you have proven experience accelerating computer-vision workloads on CUDA, I'd love to see it in action here.
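For the benchmark report, a minimal timing harness along the following lines could be a starting point. It is only a sketch using the Python standard library: `benchmark`, `speedup`, and `cpu_stage` are hypothetical names for illustration and are not part of the project, and the placeholder workload stands in for a real pipeline stage such as detection or OCR.

```python
import statistics
import time


def benchmark(fn, *args, warmup=2, repeats=5):
    """Return the median wall-clock time (seconds) of fn(*args).

    Assumption: fn blocks until its work is finished. CUDA kernel
    launches are asynchronous, so a GPU stage should synchronise
    (or copy its result back to the host) before returning.
    """
    # Warm-up runs absorb one-off costs such as model loading.
    for _ in range(warmup):
        fn(*args)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(cpu_seconds, gpu_seconds):
    """Ratio of CPU time to GPU time; values above 1 mean the GPU run is faster."""
    return cpu_seconds / gpu_seconds


def cpu_stage(n):
    """Hypothetical placeholder for a real pipeline stage."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    cpu_time = benchmark(cpu_stage, 100_000)
    print(f"median CPU time: {cpu_time:.6f} s")
```

Timing the same stage before and after migration with identical inputs, and reporting the ratio from `speedup`, would keep the CPU and GPU figures comparable.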

Python
