Healthcare
-
InstaDeep’s NTv3: Multi-Species Genomics Model
InstaDeep has introduced Nucleotide Transformer v3 (NTv3), a multi-species genomics foundation model that connects local sequence motifs with megabase-scale regulatory context for genomic prediction and design. NTv3 operates at single-nucleotide resolution over 1 Mb contexts and combines representation learning, functional track prediction, genome annotation, and controllable sequence generation in a single framework. It builds on earlier versions by extending sequence-only pretraining to longer contexts and adding explicit functional supervision and a generative mode, so one model can handle a wide range of genomic tasks across multiple species.
NTv3 uses a U-Net-style architecture built for very long genomic windows: a convolutional downsampling tower shortens the sequence, a transformer stack models long-range dependencies, and a deconvolution tower restores base-level resolution. Input sequences are tokenized at the character level with a vocabulary of 11 tokens.
The model is pretrained on 9 trillion base pairs from the OpenGenome2 resource, then post-trained with a joint objective that combines self-supervision with supervised learning on functional tracks and annotation labels from 24 animal and plant species. With this training, NTv3 is reported to reach state-of-the-art accuracy in functional track prediction and genome annotation, outperforming existing genomic foundation models.
Beyond prediction, NTv3 can be fine-tuned as a controllable generative model using masked diffusion language modeling, which enables the design of enhancer sequences with specified activity levels and promoter selectivity. These designs have been validated experimentally, showing improved promoter specificity and the intended ordering of activity levels.
By unifying these tasks and supporting long-range, cross-species genome-to-function inference, NTv3 gives researchers and practitioners a single tool for prediction, annotation, and sequence design. This matters because better prediction and design of genomic sequences could lead to advances in fields such as medicine and biotechnology.
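To make the character-level tokenization and the U-Net-style length changes described above more concrete, here is a minimal sketch in plain Python. It is not InstaDeep's implementation. The summary states only that the vocabulary has 11 tokens, so the specific entries in `VOCAB` are an assumption, as are the function names, the stride-2 stages, and the `num_stages` parameter.

```python
# Hypothetical 11-token vocabulary: the summary gives only the size (11),
# not the members, so this particular token set is an assumption.
VOCAB = ["<pad>", "<mask>", "<unk>", "<cls>", "<eos>", "<bos>",
         "A", "C", "G", "T", "N"]
TOKEN_TO_ID = {token: index for index, token in enumerate(VOCAB)}


def tokenize(sequence: str) -> list[int]:
    """Map each nucleotide character to one token id (one token per base)."""
    unknown_id = TOKEN_TO_ID["<unk>"]
    return [TOKEN_TO_ID.get(base, unknown_id) for base in sequence.upper()]


def unet_lengths(context_len: int, num_stages: int) -> tuple[list[int], list[int]]:
    """Trace sequence length through assumed stride-2 down/up towers.

    Returns (down, up): lengths after each downsampling stage, then after
    each upsampling stage. The transformer stack would run on down[-1].
    """
    if num_stages < 0 or context_len % (2 ** num_stages) != 0:
        raise ValueError("context_len must be divisible by 2 ** num_stages")
    down = [context_len]
    for _ in range(num_stages):
        down.append(down[-1] // 2)   # each conv stage halves the length
    up = [down[-1]]
    for _ in range(num_stages):
        up.append(up[-1] * 2)        # each deconv stage doubles it back
    return down, up


if __name__ == "__main__":
    print(tokenize("ACGTN"))
    down, up = unet_lengths(2 ** 20, 7)  # 2**20 bases as a stand-in for 1 Mb
    print(down[-1], up[-1])
```

With a 2**20-base window and seven hypothetical halving stages, the transformer stack would see 8,192 positions instead of roughly a million, which is the practical motivation for placing attention at the bottleneck of a U-Net-style design.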
-
PLAID: Multimodal Protein Generation Model
PLAID is a multimodal generative model that tackles the problem of generating protein sequences and 3D structures together by sampling from the latent space of a protein folding model. Unlike previous models, PLAID generates both discrete sequences and continuous all-atom structural coordinates, which makes it more practical for real-world applications such as drug design. It accepts compositional function and organism prompts, and it is trained on sequence databases, which are significantly larger than structural databases and so give it a broader training distribution.
PLAID trains a diffusion model over the latent space of a protein folding model, specifically ESMFold, a folding model that followed AlphaFold2. Because this setup needs only sequence data for training, which is more readily available and less costly to obtain than structural data, the generative model can learn from a much larger dataset. PLAID then decodes both sequence and structure from sampled embeddings, reusing the structural information held in the pretrained folding model for protein design tasks. This is akin to vision-language-action models in robotics, which build on vision-language models trained on large-scale data to inform perception and reasoning.
Because the latent spaces of transformer-based models are large and complex, PLAID uses CHEAP (Compressed Hourglass Embedding Adaptations of Proteins), which compresses the joint embedding of protein sequence and structure. This compression keeps the mapping manageable, much as compressed latent spaces do in high-resolution image synthesis. The approach enables all-atom protein structure generation and could be adapted to other multimodal generation tasks.
As the field advances, models like PLAID could extend to more complex systems, such as those involving nucleic acids and molecular ligands, broadening the scope of protein design and related applications. Why this matters: PLAID offers a more practical approach to protein generation that could benefit drug design and other applications by generating useful proteins with specified functions and organism compatibility.
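As a rough illustration of what "diffusion over a latent space" means, the sketch below applies the standard DDPM-style forward noising step to a small latent vector. It is not PLAID's code. The summary does not describe the noise schedule, the latent dimensionality, or the training loop, so the linear beta schedule, the function names, and the toy vector are all assumptions.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_latent(latent, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * x + noise_scale * rng.gauss(0.0, 1.0) for x in latent]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for a compressed sequence-and-structure embedding.
    compressed_latent = [0.3, -1.2, 0.8, 0.0]
    alpha_bars = linear_alpha_bars(1000)
    # Early steps keep most of the signal; late steps are close to pure noise.
    print(noise_latent(compressed_latent, alpha_bars[10], rng))
    print(noise_latent(compressed_latent, alpha_bars[-1], rng))
```

A denoising network trained to reverse this process can then turn random noise into new latents, and in PLAID's setup those sampled embeddings are what get decoded into sequence and structure. Working on a compressed embedding means the vectors being noised and denoised are smaller.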
-
AI Transforming Healthcare in Africa
Generative AI is being applied to real-world health challenges in Africa, where there is significant interest in problems such as cervical cancer screening and maternal health support. In response, a collaboration with pan-African data science and machine learning communities organized an Africa-wide Data Science for Health Ideathon. The event set out to apply Google's open Health AI models to these health concerns and to show how AI can produce solutions tailored to local needs.
From over 30 submissions, six finalist teams were chosen for their innovative ideas and potential impact on African health systems. The teams received guidance from global experts and access to technical resources provided by Google Research and Google DeepMind. The initiative reflects growing interest in using AI to develop local solutions for health, agriculture, and climate challenges across Africa.
The ideathon is part of Google's broader commitment to AI for Africa, which spans health, education, food security, infrastructure, and languages. By supporting projects like the Data Science for Health Ideathon, Google aims to give local communities the tools and knowledge to tackle their own challenges. This matters because it shows how AI can drive meaningful change and improve quality of life across the continent while encouraging local innovation and problem-solving.
