Nvidia

  • TensorFlow 2.17 Updates


    What's new in TensorFlow 2.17

    TensorFlow 2.17 introduces significant updates, including a CUDA update that improves performance on Ada-generation GPUs such as the NVIDIA RTX 40 series, L4, and L40, while dropping support for older Maxwell GPUs to keep Python wheel sizes manageable. The release also prepares for the upcoming TensorFlow 2.18, which will support NumPy 2.0, a change that may affect some edge cases in API usage. Additionally, TensorFlow 2.17 is the last release to include TensorRT support. These changes reflect ongoing efforts to optimize TensorFlow for modern hardware and software environments, ensuring better performance and compatibility.
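
    The NumPy 2.0 note is worth a concrete illustration. Below is a minimal sketch of the kind of edge case that changes under NumPy 2.0's new promotion rules (NEP 50); it is illustrative, not taken from the TensorFlow changelog:

    ```python
    import numpy as np

    # Under NumPy 2.0 (NEP 50), Python scalars no longer force dtype
    # promotion: float32 + 3.0 stays float32, where NumPy 1.x returned
    # float64. Code that round-trips tensors through NumPy can see
    # silent dtype/precision changes.
    x = np.float32(3.0)
    print(type(x + 3.0))  # NumPy 1.x: np.float64; NumPy 2.0: np.float32

    # Pinning dtypes explicitly keeps behavior identical on both versions.
    y = np.asarray([1.0, 2.0], dtype=np.float32)
    print((y + 3.0).dtype)  # float32 either way when the dtype is pinned
    ```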

    Read Full Article: TensorFlow 2.17 Updates

  • NVIDIA’s NitroGen: AI Model for Gaming Agents


    NVIDIA AI Researchers Release NitroGen: An Open Vision-Action Foundation Model for Generalist Gaming Agents

    NVIDIA's AI research team has introduced NitroGen, a vision-action foundation model designed for generalist gaming agents. NitroGen learns to play commercial games directly from visual observations and gamepad actions, drawing on a dataset of 40,000 hours of gameplay from over 1,000 games. An action extraction pipeline recovers the controller actions behind raw gameplay video, letting the model achieve significant task completion rates across gaming genres without reinforcement learning. NitroGen's unified controller action space allows policies to transfer seamlessly across games, with improved performance when fine-tuned on new titles. This matters because it shows that AI can autonomously learn complex tasks from large-scale, diverse data, paving the way for more versatile and adaptive agents in gaming and beyond.
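
    To make the "unified controller action space" concrete, here is a hypothetical sketch in Python; the field names and layout are assumptions for illustration, not NitroGen's actual schema:

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class GamepadAction:
        """One timestep of a game-agnostic gamepad action (hypothetical)."""
        left_stick: tuple = (0.0, 0.0)    # (x, y), each in [-1, 1]
        right_stick: tuple = (0.0, 0.0)
        triggers: tuple = (0.0, 0.0)      # (LT, RT), each in [0, 1]
        buttons: dict = field(default_factory=dict)  # e.g. {"A": True}

    # Because every game is expressed in this one action space, the
    # policy's output head never changes shape, which is what makes
    # cross-game transfer and fine-tuning on new titles straightforward.
    jump = GamepadAction(buttons={"A": True})
    print(jump)
    ```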

    Read Full Article: NVIDIA’s NitroGen: AI Model for Gaming Agents

  • Toggle Thinking on Nvidia Nemotron Nano 3


    Fix for Nvidia Nemotron Nano 3's forced thinking – now it can be toggled on and off!

    Nvidia Nemotron Nano 3 has had an issue where the 'detailed thinking off' instruction fails due to a bug in the automatic Jinja chat template in LM Studio, which forces the model to think. A workaround with a bugfix lets users toggle thinking off by adding /nothink to the system prompt; the fix is shared via a Pastebin link for easy access. This matters because it gives users control over Nemotron Nano 3's reasoning behavior, improving user experience and efficiency.
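
    For anyone driving the model through LM Studio's OpenAI-compatible local server, the workaround amounts to placing /nothink in the system prompt. A minimal sketch (the port is LM Studio's default; the model name depends on what is loaded locally):

    ```python
    import requests

    resp = requests.post(
        "http://localhost:1234/v1/chat/completions",
        json={
            "model": "nemotron-nano-3",  # name as it appears in LM Studio
            "messages": [
                # /nothink in the system prompt disables forced thinking
                {"role": "system", "content": "/nothink"},
                {"role": "user", "content": "Explain CUDA streams briefly."},
            ],
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])
    ```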

    Read Full Article: Toggle Thinking on Nvidia Nemotron Nano 3

  • Nvidia’s $20B Groq Deal: A Shift in AI Engineering


    [D] The Nvidia/Groq $20B deal isn't about "Monopoly." It's about the physics of Agentic AI.

    Nvidia's $20 billion acquisition of Groq highlights a significant shift in AI infrastructure, framed here as an engineering story rather than just an antitrust one. Groq's SRAM architecture excels at "talking" tasks like voice and fast chat thanks to near-instant token generation, but its limited on-chip capacity makes large models a poor fit. Nvidia's H100s, by contrast, hold large models comfortably in HBM but suffer slow PCIe transfers during cold starts. The acquisition underscores the need for a hybrid inference approach that combines Groq's speed with Nvidia's capacity to manage AI workloads efficiently, marking a new era in AI development. This matters because it addresses the core challenge of optimizing AI systems for both speed and capacity, paving the way for more efficient and responsive AI applications.
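
    The hybrid approach the post argues for can be sketched as a simple router: latency-critical "talking" traffic goes to the SRAM tier, large-model work to the HBM tier. The endpoint names and size threshold below are invented for illustration:

    ```python
    FAST_SRAM_TIER = "sram-endpoint"   # instant tokens, limited capacity
    BIG_HBM_TIER = "hbm-gpu-cluster"   # large models, slow cold starts

    def route(model_params_b: float, realtime: bool) -> str:
        # Small models with tight latency budgets fit on-chip SRAM;
        # anything larger must live in HBM on conventional GPUs.
        if realtime and model_params_b <= 70:
            return FAST_SRAM_TIER
        return BIG_HBM_TIER

    print(route(8, realtime=True))     # voice/chat -> sram-endpoint
    print(route(405, realtime=False))  # big batch  -> hbm-gpu-cluster
    ```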

    Read Full Article: Nvidia’s $20B Groq Deal: A Shift in AI Engineering

  • NVIDIA Drops Pascal Support, Impacting Arch Linux


    NVIDIA Drops Pascal Support On Linux, Causing Chaos On Arch Linux

    NVIDIA's decision to drop support for Pascal GPUs on Linux has caused disruptions, particularly for Arch Linux users who rely on these older graphics cards. The change has led to compatibility issues and forced users to seek alternative solutions or upgrade their hardware to maintain system stability and performance. The move highlights the challenge of supporting older hardware in rapidly evolving software ecosystems, and understanding these shifts helps users and developers adapt and keep their systems running smoothly.
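
    A quick, generic way to check whether a machine is affected (Pascal is compute capability 6.x) uses NVML via the nvidia-ml-py package; this check is a general-purpose sketch, not from the article:

    ```python
    import pynvml

    pynvml.nvmlInit()
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
        flag = "  <- Pascal, affected by the support drop" if major == 6 else ""
        print(f"{name}: sm_{major}{minor}{flag}")
    pynvml.nvmlShutdown()
    ```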

    Read Full Article: NVIDIA Drops Pascal Support, Impacting Arch Linux

  • NVIDIA Blackwell Boosts AI Training Speed and Efficiency


    NVIDIA Blackwell Enables 3x Faster Training and Nearly 2x Training Performance Per Dollar than Previous-Gen Architecture

    NVIDIA's Blackwell architecture is transforming AI model training, delivering up to 3.2 times faster training and nearly double the training performance per dollar of the previous-generation architecture. The gains come from innovations across GPUs, CPUs, networking, and software, including the introduction of NVFP4 precision. The GB200 NVL72 and GB300 NVL72 rack-scale systems show significant improvements in MLPerf benchmarks, letting AI models be trained and deployed more quickly and cost-effectively. These advances help AI developers bring sophisticated models to market faster, accelerating revenue generation. This matters because it enables larger, more complex models to be trained at lower cost, driving innovation and economic opportunity across the AI industry.
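
    To give a feel for what NVFP4 means in practice, here is a toy NumPy sketch of block-scaled 4-bit quantization (FP4 E2M1 magnitudes with a per-block scale); the real format's block size and scale encoding are hardware-defined, so treat this purely as an illustration:

    ```python
    import numpy as np

    FP4_GRID = np.array([0, 0.5, 1, 1.5, 2, 3, 4, 6])  # E2M1 magnitudes
    BLOCK = 16  # assumed block size for this sketch

    def quantize_block(x: np.ndarray) -> np.ndarray:
        # Scale the block so its largest value maps onto the FP4 grid,
        # then snap each element to the nearest representable magnitude.
        scale = float(np.abs(x).max()) / FP4_GRID[-1] or 1.0
        idx = np.argmin(np.abs(np.abs(x)[:, None] / scale - FP4_GRID), axis=1)
        return np.sign(x) * FP4_GRID[idx] * scale

    x = np.random.randn(BLOCK).astype(np.float32)
    print("max abs error:", np.abs(x - quantize_block(x)).max())
    ```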

    Read Full Article: NVIDIA Blackwell Boosts AI Training Speed and Efficiency

  • Autoscaling RAG Components on Kubernetes


    Retrieval-augmented generation (RAG) systems enhance the accuracy of AI agents by using a knowledge base to provide context to large language models (LLMs). The NVIDIA RAG Blueprint facilitates RAG deployment in enterprise settings, offering modular components for ingestion, vectorization, retrieval, and generation, along with options for metadata filtering and multimodal embedding. RAG workloads can be unpredictable, requiring autoscaling to manage resource allocation efficiently during peak and off-peak times. By leveraging Kubernetes Horizontal Pod Autoscaling (HPA), organizations can autoscale NVIDIA NIM microservices like Nemotron LLM, Rerank, and Embed based on custom metrics, ensuring performance meets service level agreements (SLAs) even during demand surges. Understanding and implementing autoscaling in RAG systems is crucial for maintaining efficient resource use and optimal service performance.
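
    As a concrete sketch of the HPA piece, an autoscaling/v2 object can be created with the official kubernetes Python client. The deployment name, namespace, and custom metric name below are assumptions for illustration; the blueprint's actual metric names may differ:

    ```python
    from kubernetes import client, config

    config.load_kube_config()

    hpa = client.V2HorizontalPodAutoscaler(
        metadata=client.V1ObjectMeta(name="nemotron-llm-hpa"),
        spec=client.V2HorizontalPodAutoscalerSpec(
            scale_target_ref=client.V2CrossVersionObjectReference(
                api_version="apps/v1", kind="Deployment", name="nemotron-llm"),
            min_replicas=1,
            max_replicas=8,
            metrics=[client.V2MetricSpec(
                type="Pods",
                pods=client.V2PodsMetricSource(
                    # hypothetical per-pod queue-depth metric (e.g. via Prometheus)
                    metric=client.V2MetricIdentifier(name="num_requests_running"),
                    target=client.V2MetricTarget(type="AverageValue", average_value="4"),
                ),
            )],
        ),
    )
    client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
        namespace="rag", body=hpa)
    ```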

    Read Full Article: Autoscaling RAG Components on Kubernetes

  • Reducing CUDA Binary Size for cuML on PyPI


    Reducing CUDA Binary Size to Distribute cuML on PyPI

    Starting with the 25.10 release, cuML can be installed via pip from PyPI, eliminating the need for complex installation steps or Conda environments. The NVIDIA team reduced the size of the CUDA C++ library binaries by approximately 30%, which made this distribution method possible. The reduction came from optimization techniques that address bloat in the CUDA C++ codebase, making the libraries more accessible and efficient. Beyond faster downloads and lower storage requirements, the work cuts distribution costs and encourages the development of more manageable CUDA C++ libraries. This matters because it simplifies installation and promotes broader adoption of cuML and similar libraries.
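
    In practice the change reduces installation to a single pip command, and cuML keeps its scikit-learn-style API. The wheel name below assumes a CUDA 12 environment; check the cuML install docs for the exact name on your setup:

    ```python
    # pip install cuml-cu12   # assumed wheel name for CUDA 12
    import numpy as np
    from cuml.cluster import KMeans  # scikit-learn-compatible, GPU-backed

    X = np.random.rand(10_000, 16).astype(np.float32)
    km = KMeans(n_clusters=8, random_state=0).fit(X)  # fits on the GPU
    print(km.cluster_centers_.shape)  # (8, 16)
    ```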

    Read Full Article: Reducing CUDA Binary Size for cuML on PyPI

  • NVIDIA MGX: Future-Ready Data Center Performance


    Delivering Flexible Performance for Future-Ready Data Centers with NVIDIA MGX

    The rapid growth of AI is challenging traditional data center architectures, prompting the need for more flexible, efficient solutions. NVIDIA's MGX modular reference architecture addresses these demands by offering a 6U chassis configuration that supports multiple computing generations and workload profiles, reducing the need for frequent redesigns. This design incorporates the liquid-cooled NVIDIA RTX PRO 6000 Blackwell Server Edition GPU, which provides enhanced performance and thermal efficiency for AI workloads. Additionally, the MGX 6U platform integrates NVIDIA BlueField DPUs for advanced security and infrastructure acceleration, ensuring that AI data centers can scale securely and efficiently. This matters because it enables enterprises to build future-ready AI factories that can adapt to evolving technologies while maintaining optimal performance and security.

    Read Full Article: NVIDIA MGX: Future-Ready Data Center Performance

  • Join the 3rd Women in ML Symposium!


    Join us at the third Women in ML Symposium!

    The third annual Women in Machine Learning Symposium is set for December 7, 2023, offering a virtual platform for enthusiasts and professionals in machine learning (ML) and artificial intelligence (AI). This inclusive event provides deep dives into generative AI, privacy-preserving AI, and the ML frameworks powering today's models, catering to all levels of expertise. Attendees will hear keynote speeches and insights from industry leaders at Google, Nvidia, and Adobe, covering everything from foundational AI concepts to open-source tools and techniques. The symposium promises a comprehensive exploration of ML's latest advancements and practical applications across industries. This matters because the symposium fosters diversity and inclusion in the rapidly evolving fields of AI and ML, providing valuable learning and networking opportunities for women and underrepresented groups in tech.

    Read Full Article: Join the 3rd Women in ML Symposium!