biotechnology

  • Geometric Deep Learning in Molecular Design


    [D] I summarized my 4-year PhD on Geometric Deep Learning for Molecular Design into 3 research questions

    The PhD thesis applies Geometric Deep Learning to molecular design and is organized around three research questions. First, how expressive are 3D representations, studied through the Geometric Weisfeiler-Leman Test? Second, can a single generative model cover both periodic and non-periodic systems, explored with the All-atom Diffusion Transformer? Third, can generative AI design functional RNA, demonstrated by the development and wet-lab validation of gRNAde? The work moves from theoretical graph isomorphism problems to practical applications in molecular biology and emphasizes collaboration between AI and the biological sciences. This matters because these advances show how AI can be applied to scientific innovation and real-world problems.


  • AI and the Creation of Viruses: Biosecurity Risks


    AI can now create viruses from scratch, one step away from the perfect biological weapon

    Recent advances in artificial intelligence have made it possible to design viruses from scratch, raising concerns about the development of biological weapons. The technology allows viruses to be designed with specific characteristics, which could serve beneficial purposes, such as vaccine development, as well as malicious ones, such as creating harmful pathogens. The accessibility and power of AI in this field call for stringent ethical guidelines and regulation to prevent misuse. This matters because it highlights the dual-use nature of AI in biotechnology and the importance of responsible innovation in safeguarding public health and safety.


  • Generative AI and Precision Gene Control


    Generative AI creates synthetic regulatory DNA sequences for precision gene control - Nature Genetics

    Generative AI is being used to create synthetic regulatory DNA sequences that allow more precise control of gene expression. The approach holds promise for gene therapy and personalized medicine because it allows more targeted and efficient genetic modifications. The ability to design and deploy precise regulatory sequences could change how genetic diseases are treated, potentially leading to more effective and less invasive therapies. This matters because the capability could lead to breakthroughs in medical treatment and biotechnology.


  • Engineering Resilient Crops for Climate Change


    Engineering more resilient crops for a warming climate

    As global warming brings more frequent droughts and heatwaves, the internal processes of staple crops are being disrupted, particularly photosynthesis, which is essential for plant growth. Berkley Walker and his team at Michigan State University are exploring ways to engineer crops that withstand higher temperatures by focusing on the enzyme glycerate kinase (GLYK), which plays a key role in photosynthesis. Using AlphaFold to predict the 3D structure of GLYK, they found that high temperatures destabilize certain flexible loops in the enzyme. By replacing these unstable loops with more rigid ones from heat-tolerant algae, they created hybrid enzymes that remain stable at temperatures up to 65°C, potentially leading to more resilient crops. This matters because improving crop resilience is essential for maintaining food security under climate change.


  • InstaDeep’s NTv3: Multi-Species Genomics Model


    InstaDeep Introduces Nucleotide Transformer v3 (NTv3): A New Multi-Species Genomics Foundation Model, Designed for 1 Mb Context Lengths at Single-Nucleotide Resolution

    InstaDeep has introduced Nucleotide Transformer v3 (NTv3), a multi-species genomics foundation model designed to improve genomic prediction and design by connecting local motifs with megabase-scale regulatory context. NTv3 operates at single-nucleotide resolution over 1 Mb contexts and integrates representation learning, functional track prediction, genome annotation, and controllable sequence generation in a single framework. It builds on previous versions by extending sequence-only pretraining to longer contexts and by adding explicit functional supervision and a generative mode, so that it can handle a wide range of genomic tasks across multiple species.

    NTv3 uses a U-Net style architecture to process very long genomic windows: a convolutional downsampling tower, a transformer stack for long-range dependencies, and a deconvolution tower that restores base-level resolution. It tokenizes input sequences at the character level, with a vocabulary of 11 tokens. The model is pretrained on 9 trillion base pairs from the OpenGenome2 resource and then post-trained with a joint objective that combines self-supervision with supervised learning on functional tracks and annotation labels from 24 animal and plant species. With this training, NTv3 is reported to achieve state-of-the-art accuracy in functional track prediction and genome annotation, outperforming existing genomic foundation models.

    Beyond prediction, NTv3 can be fine-tuned as a controllable generative model using masked diffusion language modeling, which enables the design of enhancer sequences with specified activity levels and promoter selectivity. These designs have been validated experimentally, showing improved promoter specificity and the intended ordering of activity levels.

    By unifying these genomic tasks and supporting long-range, cross-species genome-to-function inference, NTv3 gives researchers and practitioners a single tool for both prediction and design. This matters because it improves our ability to understand and manipulate genomic data, potentially leading to breakthroughs in fields such as medicine and biotechnology.
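
    As a rough illustration of the character-level tokenization and the U-Net length bookkeeping described above, the sketch below uses a hypothetical 11-token vocabulary (five nucleotide symbols plus six special tokens) and assumes seven downsampling stages that each halve the sequence length. Neither the token list nor the stage count comes from the summary; both are placeholders, and `tokenize` and `unet_lengths` are illustrative names rather than NTv3 APIs.

```python
# Hypothetical 11-token character-level vocabulary: 6 special tokens plus
# 5 nucleotide symbols. The real NTv3 token list may differ.
SPECIAL_TOKENS = ["<pad>", "<unk>", "<mask>", "<cls>", "<bos>", "<eos>"]
NUCLEOTIDES = ["A", "C", "G", "T", "N"]
VOCAB = {token: index for index, token in enumerate(SPECIAL_TOKENS + NUCLEOTIDES)}


def tokenize(sequence):
    """Map each character of a DNA string to a token id (one token per base)."""
    unknown_id = VOCAB["<unk>"]
    return [VOCAB.get(base, unknown_id) for base in sequence.upper()]


def unet_lengths(input_length, num_stages=7, factor=2):
    """Track sequence length through a U-Net style down/up tower.

    num_stages and factor are assumptions made for this sketch only.
    Returns (down, up): lengths after each downsampling stage (starting
    with the input length) and after each upsampling stage.
    """
    down = [input_length]
    for _ in range(num_stages):
        if down[-1] % factor != 0:
            raise ValueError("length must be divisible by the downsampling factor")
        down.append(down[-1] // factor)
    # The upsampling tower mirrors the downsampling tower in reverse,
    # ending at the original single-nucleotide resolution.
    up = down[-2::-1]
    return down, up


if __name__ == "__main__":
    print(tokenize("ACGTN"))
    down, up = unet_lengths(2 ** 20)  # treating 1 Mb as 2**20 bases (assumption)
    print(down[-1], up[-1])
```

    Under these assumptions, a 1,048,576-base window would reach the transformer stack as 8,192 positions, and the upsampling tower would restore the original length so that outputs stay aligned with individual bases.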


  • PLAID: Multimodal Protein Generation Model


    Repurposing Protein Folding Models for Generation with Latent Diffusion

    PLAID is a multimodal generative model that generates protein sequences and 3D structures simultaneously by working in the latent space of protein folding models. Unlike previous models, PLAID can generate both discrete sequences and continuous all-atom structural coordinates, which makes it more practical for real-world applications such as drug design. It accepts compositional function and organism prompts and is trained on sequence databases, which are significantly larger than structural databases.

    PLAID trains a diffusion model over the latent space of a protein folding model, specifically ESMFold, a successor to AlphaFold2 that replaces the alignment retrieval step with a protein language model. This makes it possible to train the generative model on sequence data alone, which is more readily available and less costly than structural data. At inference time, PLAID decodes both sequence and structure from sampled embeddings, reusing the structural information contained in the pretrained folding model for protein design tasks. The method is akin to vision-language-action models in robotics, which build on vision-language models trained on large-scale data to inform perception and reasoning.

    To deal with the large and complex latent spaces of transformer-based models, PLAID introduces CHEAP (Compressed Hourglass Embedding Adaptations of Proteins), which compresses the joint embedding of protein sequence and structure. The compression is needed to keep diffusion over the latent space manageable, a problem the authors liken to high-resolution image synthesis. Besides enabling all-atom protein structure generation, the approach could be adapted to other multimodal generation tasks.

    As the field advances, models like PLAID could extend to more complex systems, such as those involving nucleic acids and molecular ligands, broadening the scope of protein design and related applications. This matters because PLAID offers a more practical and comprehensive approach to protein generation that could benefit drug design and other applications by producing useful proteins with specified functions and organism compatibility.
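
    To make the idea of diffusion over a latent embedding more concrete, the sketch below shows the standard DDPM forward (noising) step applied to a plain list of floats standing in for a compressed protein embedding. The summary does not state PLAID's noise schedule or parameterization, so the linear beta schedule, the step count, and the function names here are assumptions chosen for illustration, not the authors' implementation.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM default, assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(latent, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

    latent is a stand-in for a compressed embedding; returns (x_t, eps).
    A denoising network would be trained to predict eps from x_t and t.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    noisy = [signal_scale * x + noise_scale * e for x, e in zip(latent, noise)]
    return noisy, noise


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    toy_latent = [0.5, -1.2, 0.3, 0.9]  # hypothetical embedding values
    noisy, _ = add_noise(toy_latent, 500, alpha_bars, rng)
    print(noisy)
```

    Generation runs this process in reverse: starting from pure noise, a trained denoiser removes noise step by step, and the resulting embedding is then decoded into sequence and structure.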
