genomics

  • AlphaFold’s Impact on Science and Medicine


    AlphaFold: Five years of impact

    AlphaFold has significantly accelerated research timelines, particularly in plant physiology, where it has improved understanding of how plants perceive their environment; this work may lead to more resilient crops. Its impact shows in more than 35,000 citations and its incorporation into over 200,000 research papers, and its users have seen a 40% increase in novel protein structure submissions. AlphaFold also helped lead to the creation of Isomorphic Labs, a drug discovery company building a unified drug design engine that predicts the structures and interactions of life's molecules, with the aim of solving disease. The AlphaFold Server, available to non-commercial researchers worldwide, has helped predict more than 8 million molecular structures and interactions, changing how scientific discovery is done. This matters because it represents a leap forward in biological research and drug development, potentially leading to groundbreaking medical and environmental solutions.


  • InstaDeep’s NTv3: Multi-Species Genomics Model


    InstaDeep Introduces Nucleotide Transformer v3 (NTv3): A New Multi-Species Genomics Foundation Model, Designed for 1 Mb Context Lengths at Single-Nucleotide Resolution

    InstaDeep has introduced Nucleotide Transformer v3 (NTv3), a multi-species genomics foundation model that aims to improve genomic prediction and design by linking local sequence motifs to megabase-scale regulatory context. NTv3 works at single-nucleotide resolution across 1 Mb contexts and combines representation learning, functional track prediction, genome annotation, and controllable sequence generation in a single framework. It builds on earlier versions by extending sequence-only pretraining to longer contexts and by adding explicit functional supervision and a generative mode, so one model can handle a wide range of genomic tasks across multiple species.

    NTv3 uses a U-Net-style architecture to process very long genomic windows: a convolutional downsampling tower shortens the sequence, a transformer stack models long-range dependencies, and a deconvolution tower restores base-level resolution. Input sequences are tokenized at the character level with a vocabulary of 11 tokens.

    The model is pretrained on 9 trillion base pairs from the OpenGenome2 resource, then post-trained with a joint objective that combines self-supervision with supervised learning on functional tracks and annotation labels from 24 animal and plant species. With this training, NTv3 reaches state-of-the-art accuracy in functional track prediction and genome annotation, outperforming existing genomic foundation models.

    Beyond prediction, NTv3 can be fine-tuned into a controllable generative model using masked diffusion language modeling, which enables the design of enhancer sequences with specified activity levels and promoter selectivity. These designs have been validated experimentally, showing improved promoter specificity and the intended ordering of activity levels.
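    Character-level tokenization means each nucleotide maps to exactly one token, which is what keeps the model at single-nucleotide resolution. The sketch below illustrates the idea under a loud assumption: the 11-token vocabulary shown (five nucleotide symbols A, C, G, T, N plus six special tokens) is hypothetical and is not NTv3's published token list.

```python
# Hypothetical 11-token vocabulary: the special tokens and their order are
# assumptions for illustration, not NTv3's actual token list.
SPECIAL_TOKENS = ["<pad>", "<mask>", "<cls>", "<eos>", "<bos>", "<unk>"]
NUCLEOTIDES = ["A", "C", "G", "T", "N"]
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + NUCLEOTIDES)}
ID_TO_TOKEN = {i: tok for tok, i in VOCAB.items()}


def tokenize(sequence: str) -> list[int]:
    """Map each base to one token id, so sequence length == token count."""
    unk = VOCAB["<unk>"]
    return [VOCAB.get(base, unk) for base in sequence.upper()]


def detokenize(ids: list[int]) -> str:
    """Invert tokenize() for nucleotide tokens, dropping special tokens."""
    return "".join(ID_TO_TOKEN[i] for i in ids if ID_TO_TOKEN[i] in NUCLEOTIDES)


if __name__ == "__main__":
    ids = tokenize("acgtN")
    print(len(VOCAB))       # 11
    print(ids)              # [6, 7, 8, 9, 10]
    print(detokenize(ids))  # ACGTN
```

    Because there is no k-mer merging, a 1 Mb window becomes roughly a million tokens, which is why the architecture downsamples before the transformer stack.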
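    The U-Net layout matters because full self-attention cost grows with the square of sequence length, so running the transformer on a downsampled sequence is far cheaper than running it on every base. The sketch below tracks only sequence lengths through the towers. The stage count, the stride-2 reduction per stage, and treating "1 Mb" as 2^20 bases are all hypothetical choices, since the summary does not give NTv3's actual configuration.

```python
# Length bookkeeping for a U-Net-style genomics model. Stride, stage count and
# window size are hypothetical; NTv3's real configuration may differ.

def unet_lengths(seq_len: int, num_stages: int, stride: int = 2) -> dict:
    """Return sequence lengths through the down tower, bottleneck and up tower."""
    down = [seq_len]
    for _ in range(num_stages):
        if down[-1] % stride != 0:
            raise ValueError("sequence length must divide evenly at every stage")
        down.append(down[-1] // stride)
    bottleneck = down[-1]      # length seen by the transformer stack
    up = list(reversed(down))  # deconvolution tower mirrors the down tower
    return {"down": down, "bottleneck": bottleneck, "up": up}


def attention_pairs(length: int) -> int:
    """Number of query-key pairs in full self-attention (quadratic in length)."""
    return length * length


if __name__ == "__main__":
    window = 2 ** 20  # 1,048,576 bp, an assumed power-of-two "1 Mb" window
    plan = unet_lengths(window, num_stages=7)
    print(plan["bottleneck"])        # 8192
    print(plan["up"][-1] == window)  # True: base-level resolution restored
    reduction = attention_pairs(window) // attention_pairs(plan["bottleneck"])
    print(reduction)                 # 16384: factor fewer attention pairs
```

    A real U-Net also passes skip connections from each downsampling stage to the matching upsampling stage so fine-grained detail survives the bottleneck; this sketch omits them and shows only the shape arithmetic.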
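    In its generic form, masked diffusion language modeling corrupts a sequence by replacing a random fraction of tokens with a mask token and trains a model to recover them; generation starts from a fully masked sequence and unmasks positions over several steps. The sketch below shows that forward corruption and the sampling loop only. The predictor is a hypothetical deterministic stub keyed on a made-up activity label, not NTv3, and this conditioning scheme is an assumption for illustration.

```python
import random

MASK = "<mask>"


def corrupt(seq: list[str], t: float, rng: random.Random) -> list[str]:
    """Forward process: mask each position independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in seq]


def toy_predictor(tokens: list[str], condition: str) -> list[str]:
    """Hypothetical stand-in for a trained model: proposes a base for every
    position. A real model would output distributions conditioned on the
    unmasked context and on the design target (e.g. an activity level)."""
    # Made-up rule: "high" favours GC repeats, anything else favours AT repeats.
    motif = "GC" if condition == "high" else "AT"
    return [motif[i % 2] for i in range(len(tokens))]


def sample(length: int, condition: str, steps: int, seed: int = 0) -> str:
    """Reverse process: start fully masked, unmask a share of positions per step."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        # Reveal an even share of the remaining masked positions (ceiling division).
        n_reveal = -(-len(masked) // (steps - step))
        proposal = toy_predictor(tokens, condition)
        for i in rng.sample(masked, n_reveal):
            tokens[i] = proposal[i]
    return "".join(tokens)


if __name__ == "__main__":
    print(corrupt(list("ACGTACGT"), t=0.5, rng=random.Random(1)))
    print(sample(12, condition="high", steps=4))  # GCGCGCGCGCGC
```

    With a trained model, each step would sample bases from the predicted distributions rather than copy a fixed rule, and the conditioning signal is what would steer designs toward a target activity level or promoter selectivity.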
    NTv3's ability to unify diverse genomic tasks and to support long-range, cross-species genome-to-function inference makes it a significant advance in genomics and a powerful tool for researchers and practitioners. This matters because it improves our understanding and manipulation of genomic data, potentially leading to breakthroughs in fields such as medicine and biotechnology.
