AVX-512

  • OpenCV 4.13: Enhanced AVX-512 and CUDA 13 Support


    OpenCV 4.13 expands its use of AVX-512, a family of x86 SIMD instruction-set extensions that can significantly speed up image processing on CPUs that support them. The release also adds support for CUDA 13, improving integration with NVIDIA's latest GPU software, which matters for GPU-accelerated computer vision. It also brings other new features, bug fixes, and optimizations. Together, these changes let developers use current hardware and software optimizations for faster computer vision code.

    Read Full Article: OpenCV 4.13: Enhanced AVX-512 and CUDA 13 Support
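Since the AVX-512 gains only apply on compatible hardware, it can help to check what a machine advertises. The following is a minimal sketch, assuming a Linux x86 system where `/proc/cpuinfo` lists `avx512f` (AVX-512 Foundation) among the CPU flags. The helper names `cpu_flags` and `has_avx512f` are hypothetical and are not part of OpenCV.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the feature flags from the text of /proc/cpuinfo (Linux, x86)."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx512f(cpuinfo_text: str) -> bool:
    """Return True if any core lists the AVX-512 Foundation flag."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # assumption: Linux; the file is absent elsewhere
    text = path.read_text() if path.exists() else ""
    print("AVX-512F advertised:", has_avx512f(text))
```

The check only reports what the kernel exposes. Whether a given OpenCV build actually uses those instructions depends on how it was compiled.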

  • CNN in x86 Assembly: Cat vs Dog Classifier


    This project implements a Convolutional Neural Network (CNN) from scratch in x86-64 assembly to classify images of cats and dogs, using a dataset of 25,000 RGB images. The aim was to understand CNNs deeply by working at the level of memory layout, data movement, and SIMD arithmetic, without relying on any machine learning frameworks or libraries. The Conv2D, MaxPool, and Dense layers, the activations, forward and backward propagation, and the data loader are all written in pure assembly, and the author reports that the result runs approximately 10 times faster than a NumPy version. Despite the difficulty of debugging at this scale, the implementation runs inside a lightweight Debian Slim Docker container. This matters because it shows how much performance low-level optimization can gain in neural network code.

    Read Full Article: CNN in x86 Assembly: Cat vs Dog Classifier
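To make the Conv2D and MaxPool layers named above concrete, here is a minimal single-channel sketch of their forward passes in plain Python. It is not the project's code. The function names, the stride-1 unpadded convolution, and the 2x2 pooling window are all assumptions chosen for illustration. The scalar inner loops are the part a hand-written assembly version would typically replace with SIMD arithmetic.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    As in most CNN code, this is cross-correlation (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Element-wise ReLU activation: negative values become zero."""
    return [[max(0, v) for v in row] for row in fmap]


def maxpool2x2(fmap):
    """Non-overlapping 2x2 max pooling; an odd trailing row or column is dropped."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


if __name__ == "__main__":
    image = [[1, 2, 0],
             [0, 1, 3],
             [4, 0, 1]]
    kernel = [[1, 0],
              [0, 1]]
    features = relu(conv2d_valid(image, kernel))
    print(features)              # [[2, 5], [0, 2]]
    print(maxpool2x2(features))  # [[5]]
```

A full classifier would add multiple channels, Dense layers, and backward passes for training, which this sketch leaves out.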