Diffusion Models

  • Geometric Deep Learning in Molecular Design


    [D] I summarized my 4-year PhD on Geometric Deep Learning for Molecular Design into 3 research questions

    The PhD thesis explores the application of geometric deep learning to molecular design through three research questions. It examines the expressivity of 3D representations using the Geometric Weisfeiler-Leman test; whether a single generative model, the All-atom Diffusion Transformer, can handle both periodic and non-periodic systems; and whether generative AI can design functional RNA, demonstrated by the development and wet-lab validation of gRNAde. The work moves from theoretical questions about graph isomorphism to practical applications in molecular biology, and emphasizes collaboration between AI and the biological sciences. Understanding these advances is important for applying AI to scientific discovery and real-world problems.
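    The Geometric Weisfeiler-Leman test extends classic Weisfeiler-Leman color refinement to graphs embedded in 3D space. As a reference point only, here is a minimal sketch of the standard, non-geometric 1-WL refinement that it generalizes; it ignores coordinates entirely, and the graph encoding and function names (`wl_histograms`, `wl_distinguishes`) are our own illustrative assumptions, not code from the thesis.

```python
from collections import Counter

Graph = dict[int, list[int]]  # node id -> list of neighbour ids (undirected)


def wl_histograms(g1: Graph, g2: Graph, rounds: int = 3) -> tuple[Counter, Counter]:
    """Refine node colors of two graphs jointly and return their color histograms."""
    c1 = {v: 0 for v in g1}  # every node starts with the same color
    c2 = {v: 0 for v in g2}
    for _ in range(rounds):
        # Signature = own color plus the sorted multiset of neighbour colors.
        s1 = {v: (c1[v], tuple(sorted(c1[u] for u in g1[v]))) for v in g1}
        s2 = {v: (c2[v], tuple(sorted(c2[u] for u in g2[v]))) for v in g2}
        # A shared palette keeps color ids comparable across the two graphs.
        sigs = sorted(set(s1.values()) | set(s2.values()))
        palette = {sig: i for i, sig in enumerate(sigs)}
        c1 = {v: palette[s1[v]] for v in g1}
        c2 = {v: palette[s2[v]] for v in g2}
    return Counter(c1.values()), Counter(c2.values())


def wl_distinguishes(g1: Graph, g2: Graph, rounds: int = 3) -> bool:
    """True if 1-WL proves the graphs non-isomorphic; False means 'inconclusive'."""
    h1, h2 = wl_histograms(g1, g2, rounds)
    return h1 != h2
```

    A 6-cycle and two disjoint triangles are a classic case 1-WL cannot separate, because every node has degree 2 and the colors never split. Failure modes of this kind are what expressivity analyses such as the GWL test set out to characterize.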

    Read Full Article: Geometric Deep Learning in Molecular Design

  • Open-Source AI Tools Boost NVIDIA RTX PC Performance


    Open-Source AI Tool Upgrades Speed Up LLM and Diffusion Models on NVIDIA RTX PCs

    AI development on PCs is advancing rapidly, driven by improvements in small language models (SLMs) and diffusion models and supported by frameworks such as ComfyUI, llama.cpp, and Ollama. These frameworks have grown quickly in popularity, and NVIDIA has announced updates to further accelerate AI workflows on RTX PCs. Key optimizations include support for the NVFP4 and FP8 formats, which improve performance and memory efficiency, and new features for SLMs that speed up token generation and model inference. NVIDIA's collaboration with the open-source community has also produced the LTX-2 audio-video model and tools for agentic AI development, such as Nemotron 3 Nano and Docling, aimed at improving accuracy and efficiency in AI applications. This matters because it lets developers build more capable and efficient AI applications on consumer-grade hardware, widening access to cutting-edge AI technology.
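    NVFP4, as NVIDIA describes it, stores each value as a 4-bit E2M1 float and gives each block of 16 values a shared FP8 (E4M3) scale factor. The sketch below illustrates only the block-scaling idea under simplifying assumptions: the per-block scale is kept as an ordinary Python float instead of being encoded in FP8, the second-level per-tensor scale is omitted, and rounding is plain nearest-value. It is not NVIDIA's implementation, and `quantize_block` and `fake_quantize` are hypothetical names.

```python
# Representable magnitudes of a 4-bit E2M1 float (the sign bit is stored separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(block: list[float]) -> tuple[float, list[float]]:
    """Return (scale, grid values) for one block; value is approximately grid value * scale."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest grid value (6.0).
    scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
    q = []
    for x in block:
        mag = abs(x) / scale
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))  # round to nearest grid point
        q.append(nearest if x >= 0 else -nearest)
    return scale, q


def fake_quantize(values: list[float], block_size: int = 16) -> list[float]:
    """Quantize then dequantize block by block, so the rounding error can be inspected."""
    out: list[float] = []
    for start in range(0, len(values), block_size):
        scale, q = quantize_block(values[start:start + block_size])
        out.extend(v * scale for v in q)
    return out
```

    For example, `fake_quantize([0.0, 3.0, -6.0, 1.4])` gets a block scale of 1.0 and returns `[0.0, 3.0, -6.0, 1.5]`, since 1.4 is not on the grid. With 4 bits per value plus one 8-bit scale per 16 values, storage works out to (16 × 4 + 8) / 16 = 4.5 bits per value, compared with 16 for FP16; the cost is the rounding error visible in the output.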

    Read Full Article: Open-Source AI Tools Boost NVIDIA RTX PC Performance

  • Clean PyTorch Implementations of 50+ ML Papers


    [D] Clean, self-contained PyTorch re-implementations of 50+ ML papers (GANs, diffusion, meta-learning, 3D)

    A repository offers clean, self-contained PyTorch implementations of over 50 machine learning papers, covering areas such as GANs, VAEs, diffusion models, meta-learning, and 3D reconstruction. The implementations aim to stay faithful to the original methods while minimizing unnecessary code, making them easy to run and inspect. The goal is to reproduce key results where feasible, providing a useful resource for understanding and experimenting with advanced machine learning methods. This matters because accessible, concise code makes learning and experimentation in machine learning easier.

    Read Full Article: Clean PyTorch Implementations of 50+ ML Papers

  • Bridging Synthetic Media and Forensic Detection


    [D] Bridging the Gap between Synthetic Media Generation and Forensic Detection: A Perspective from Industry

    Futurism AI highlights the growing gap between synthetic media generation and forensic detection, emphasizing the challenges faced in real-world applications. Current academic detectors often struggle with out-of-distribution data, and three critical issues are identified: architecture-specific artifacts, multimodal drift, and provenance shift. High-fidelity diffusion models leave fewer detectable artifacts, which complicates frequency-domain detection, while aligning audio and visual elements in digital humans remains difficult. The industry is shifting toward proactive provenance methods such as watermarking rather than relying on post-hoc detection, which raises the question of whether a universal detector is feasible or whether hardware-level proof of origin is needed instead. This matters because reliable detection of synthetic media is crucial for maintaining media integrity and trust.
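    Frequency-domain detectors typically look at how an image's spectral energy is distributed, since generator upsampling has historically left excess or periodic high-frequency energy. The toy feature below computes the share of non-DC spectral energy above a radial frequency cutoff using a naive 2D DFT. It is a hypothetical illustration of the kind of signal such detectors use, not a working detector; the cutoff of 0.25 cycles per pixel and the function names are arbitrary assumptions.

```python
import cmath
import math


def dft2(img: list[list[float]]) -> list[list[complex]]:
    """Naive 2D discrete Fourier transform; O(H^2 W^2), so only for tiny images."""
    h, w = len(img), len(img[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = 0j
            for y in range(h):
                for x in range(w):
                    s += img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
            out[u][v] = s
    return out


def high_freq_ratio(img: list[list[float]], cutoff: float = 0.25) -> float:
    """Fraction of non-DC energy at radial frequency > cutoff (cycles per pixel)."""
    h, w = len(img), len(img[0])
    spec = dft2(img)
    total = high = 0.0
    for u in range(h):
        for v in range(w):
            if u == 0 and v == 0:
                continue  # skip DC: it only encodes mean brightness
            # Map FFT bin indices to signed frequencies in [-0.5, 0.5].
            fu = (u if u <= h // 2 else u - h) / h
            fv = (v if v <= w // 2 else v - w) / w
            energy = abs(spec[u][v]) ** 2
            total += energy
            if math.hypot(fu, fv) > cutoff:
                high += energy
    return high / total if total > 0 else 0.0
```

    A one-pixel checkerboard puts essentially all of its non-DC energy at the highest frequency, whereas a slow cosine ramp puts almost none there. In practice one would use an FFT routine such as `numpy.fft.fft2` rather than this quadruple loop. As the summary notes, high-fidelity diffusion outputs increasingly look "natural" by measures like this one.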

    Read Full Article: Bridging Synthetic Media and Forensic Detection

  • Free Interactive Course on Diffusion Models


    I built a free interactive course to learn how diffusion models work

    An interactive course has been developed to make diffusion models easier to understand, filling the gap between overly simplistic explanations and those that require advanced knowledge. The course includes seven modules and 90 challenges that engage users actively in learning, and it requires no background in machine learning. It is free and open source, and the author welcomes feedback to improve its clarity and difficulty balance. This matters because it opens up complex machine learning concepts to a wider audience.
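    Explanations of how diffusion models work usually begin with the forward (noising) process, which has a closed form. We do not know which formulation the course uses; the sketch below assumes the DDPM setup from Ho et al. (2020), with a linear beta schedule from 1e-4 to 0.02 over T = 1000 steps, where x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps and eps is standard Gaussian noise. The function names are our own.

```python
import math
import random


def linear_betas(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Per-step noise variances, increasing linearly (DDPM defaults; needs T >= 2)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas: list[float]) -> list[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0: list[float], t: int, abar: list[float], rng: random.Random) -> list[float]:
    """Draw x_t ~ q(x_t | x_0) in one step instead of applying t noising steps."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]
```

    Because alpha_bar_t shrinks toward zero (about 4e-5 at the last step with this schedule), x_T is almost pure noise. Training then teaches a network to predict eps from x_t and t, and sampling runs the process in reverse.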

    Read Full Article: Free Interactive Course on Diffusion Models

  • S2ID: Scale Invariant Image Diffuser


    [P] S2ID: Scale Invariant Image Diffuser - trained on standard MNIST, generates 1024x1024 digits and at arbitrary aspect ratios with almost no artifacts at 6.1M parameters (Drastic code change and architectural improvement)

    The Scale Invariant Image Diffuser (S2ID) takes a new approach to image generation aimed at a limitation of common diffusion architectures such as UNet and DiT, which produce artifacts when image resolution is scaled. S2ID treats image data as a continuous function rather than as a grid of discrete pixels, which lets it generate clean, high-resolution images without the usual artifacts. A coordinate jitter technique during training helps the model generalize its understanding of images, so it can adapt to different resolutions and aspect ratios. Trained on standard MNIST, the 6.1-million-parameter model generates 1024x1024 digits and images at arbitrary aspect ratios with almost no artifacts, suggesting potential for broader use in image processing and computer vision. This matters because it is a step toward image generation models that adapt to different sizes and shapes without losing quality.
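    The post describes treating an image as a continuous function of coordinates and jittering those coordinates during training. We do not know S2ID's exact scheme, so the sketch below only illustrates the general idea under our own assumptions: pixel centres are mapped to normalised coordinates in [-1, 1] for any resolution or aspect ratio, and each coordinate is perturbed by uniform noise of up to half a pixel, so the model never sees exactly the same sampling grid twice. `pixel_coords`, `jittered_coords`, and the `strength` parameter are hypothetical.

```python
import random


def pixel_coords(height: int, width: int) -> list[tuple[float, float]]:
    """Normalised (y, x) pixel centres in [-1, 1], row-major, for any image size."""
    return [
        (-1.0 + (2 * i + 1) / height, -1.0 + (2 * j + 1) / width)
        for i in range(height)
        for j in range(width)
    ]


def jittered_coords(height: int, width: int, rng: random.Random,
                    strength: float = 1.0) -> list[tuple[float, float]]:
    """Shift each pixel centre by up to `strength` half-pixels (hypothetical scheme)."""
    half_y, half_x = 1.0 / height, 1.0 / width  # half a pixel in normalised units
    return [
        (y + rng.uniform(-1.0, 1.0) * strength * half_y,
         x + rng.uniform(-1.0, 1.0) * strength * half_x)
        for y, x in pixel_coords(height, width)
    ]
```

    Because the coordinates do not depend on pixel count, a coordinate-conditioned model could be trained on 28x28 MNIST grids and later queried on a 1024x1024 or non-square grid; the jitter is one way to keep it from overfitting to the exact training grid.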

    Read Full Article: S2ID: Scale Invariant Image Diffuser

  • Interactive ML Paper Explainers


    Envision - Interactive explainers for ML papers (Attention, Backprop, Diffusion and more)

    Interactive explainers have been developed to help users understand foundational machine learning papers through simulations rather than equations alone. The explainers cover topics such as Attention, Word2Vec, Backpropagation, and Diffusion Models, with two to four interactive simulations for each. The aim is to demystify complex concepts by letting users engage with the material directly, for example by building query vectors or exploring embedding spaces. The platform is built with Astro and Svelte, runs its simulations client-side, and invites feedback on future topics such as the Lottery Ticket Hypothesis and GANs. By focusing on the "why" behind each concept, this approach makes advanced ML topics more accessible. Understanding these core concepts matters because they form the backbone of many modern AI technologies.
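    The Attention explainer has readers build query vectors. For reference, scaled dot-product attention from "Attention Is All You Need" (Vaswani et al., 2017) computes softmax(Q K^T / sqrt(d_k)) V. Below is a dependency-free sketch of that formula for a single head with no masking. It is our own illustration, not code from Envision.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q: list[list[float]], K: list[list[float]],
              V: list[list[float]]) -> list[list[float]]:
    """Single-head scaled dot-product attention; each row is one token's vector."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out
```

    If all keys are identical, every value gets equal weight and the output is simply the mean of the value rows. A query that aligns strongly with one key pulls the output almost entirely toward that key's value.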

    Read Full Article: Interactive ML Paper Explainers