PLAID is a multimodal generative model that tackles the problem of generating protein sequences and 3D structures at the same time by leveraging the latent space of protein folding models. Unlike many previous protein generative models, PLAID generates both the discrete sequence and the continuous all-atom structural coordinates, which makes it more practical for real-world applications such as drug design. The model accepts compositional function and organism prompts. It is trained on sequence databases, which are far larger than structural databases, so it learns from a much broader set of known proteins.
PLAID trains a diffusion model over the latent space of a protein folding model, specifically ESMFold, which predicts 3D structure from a single sequence using representations from the ESM-2 protein language model. Because the folding model already maps sequence to structure, the generative model can be trained using only sequence data, which is far more abundant and cheaper to obtain than structural data. By learning from this larger dataset, PLAID can decode both sequence and structure from sampled embeddings, effectively reusing the structural knowledge contained in the pretrained folding model for protein design. The approach is analogous to vision-language-action models in robotics, which build on vision-language models trained on large-scale data to inform perception and reasoning.
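The key point of the paragraph above is that the training signal comes entirely from sequences: each sequence is passed through a frozen folding model to obtain a latent, and the diffusion model learns to denoise that latent. The sketch below illustrates the idea in NumPy under loudly stated assumptions. `embed_sequence` is a hypothetical stand-in (a fixed random projection) for ESMFold's encoder; the latent width, the linear noise schedule, and the noise-prediction loss are borrowed from standard DDPM rather than taken from PLAID; and no real network is trained.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
EMBED_DIM = 64  # hypothetical latent width, chosen only for illustration
rng = np.random.default_rng(0)

# Hypothetical frozen "encoder": a fixed random projection of one-hot residues.
# It stands in for the pretrained folding model and is never updated.
FROZEN_PROJ = rng.standard_normal((len(AMINO_ACIDS), EMBED_DIM))


def embed_sequence(seq: str) -> np.ndarray:
    """Map a sequence to a (length, EMBED_DIM) latent using the frozen encoder."""
    idx = [AMINO_ACIDS.index(aa) for aa in seq]
    one_hot = np.eye(len(AMINO_ACIDS))[idx]
    return one_hot @ FROZEN_PROJ


# DDPM-style linear noise schedule and its cumulative products (an assumption).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)


def q_sample(x0: np.ndarray, t: int, noise: np.ndarray) -> np.ndarray:
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise


def diffusion_loss(denoiser, seq: str, rng: np.random.Generator) -> float:
    """Noise-prediction loss for one sequence; no structure labels are used."""
    x0 = embed_sequence(seq)
    t = int(rng.integers(0, T))
    noise = rng.standard_normal(x0.shape)
    xt = q_sample(x0, t, noise)
    return float(np.mean((denoiser(xt, t) - noise) ** 2))


def zero_denoiser(xt: np.ndarray, t: int) -> np.ndarray:
    """Placeholder network that always predicts zero noise."""
    return np.zeros_like(xt)


loss = diffusion_loss(zero_denoiser, "MKTAYIAKQR", rng)
print(f"loss with an untrained denoiser: {loss:.3f}")
```

A real system would replace `zero_denoiser` with a trained network and average this loss over a large sequence database; the structure only reappears at sampling time, when the denoised latent is decoded.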
The latent spaces of transformer-based folding models are large and complex, which makes them hard to model directly. To address this, PLAID introduces CHEAP (Compressed Hourglass Embedding Adaptations of Proteins), which compresses the joint embedding of protein sequence and structure. The compression plays a role similar to the autoencoder in latent diffusion models for high-resolution image synthesis: the diffusion model operates in a smaller space that is easier to learn. The approach enables generation of all-atom protein structures and could also be adapted to other multimodal generation tasks. As the field advances, models like PLAID could be extended to more complex systems, such as proteins in combination with nucleic acids and molecular ligands, broadening the scope of protein design and related applications.
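The "hourglass" idea can be illustrated with a linear toy: shorten the embedding along the sequence-length axis, then project the channels down to a smaller width, and reverse both steps to reconstruct. This is a minimal sketch, not CHEAP itself. CHEAP is a learned model, whereas here mean pooling and a truncated SVD (PCA without centering) are swapped in as linear stand-ins, and all dimensions and function names are hypothetical.

```python
import numpy as np


def shorten(x: np.ndarray, factor: int) -> np.ndarray:
    """Mean-pool along the length axis: (L, D) -> (L // factor, D)."""
    length, dim = x.shape
    if length % factor:
        raise ValueError("length must be divisible by the shortening factor")
    return x.reshape(length // factor, factor, dim).mean(axis=1)


def lengthen(z: np.ndarray, factor: int) -> np.ndarray:
    """Undo shortening by repeating each position: (L', D) -> (L' * factor, D)."""
    return np.repeat(z, factor, axis=0)


def fit_channel_projection(x: np.ndarray, k: int) -> np.ndarray:
    """Top-k right singular vectors of x: a linear stand-in for a learned encoder."""
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:k].T  # (D, k) with orthonormal columns


def compress(x: np.ndarray, proj: np.ndarray, factor: int) -> np.ndarray:
    """Hourglass 'down' path: shorten the length, then shrink the channels."""
    return shorten(x, factor) @ proj


def decompress(z: np.ndarray, proj: np.ndarray, factor: int) -> np.ndarray:
    """Hourglass 'up' path: restore the channels, then the length."""
    return lengthen(z @ proj.T, factor)


# Toy latent with redundancy on both axes: rank 8 across channels and every
# row duplicated along the length, so this particular compression is lossless.
rng = np.random.default_rng(0)
base = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 256))
x = np.repeat(base, 2, axis=0)  # (128, 256)

proj = fit_channel_projection(x, k=8)
z = compress(x, proj, factor=2)  # (64, 8)
x_hat = decompress(z, proj, factor=2)
print(z.shape, x.size / z.size, np.abs(x - x_hat).max())
```

The toy input is built to be redundant so the round trip is exact; on real embeddings the reconstruction is lossy, and the premise of compressing the latent is that most of what matters for decoding survives in the smaller space where diffusion then runs.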
Why this matters: PLAID is a significant step forward in protein generation. Because it produces both sequence and all-atom structure and accepts function and organism prompts, it could make it considerably easier to generate useful proteins for drug design and other applications.
The development of PLAID, a multimodal generative model, marks a significant step forward in protein design. By leveraging the latent space of protein folding models, PLAID can simultaneously generate both the 1D sequence and the 3D structure of proteins. This approach builds naturally on AlphaFold2, whose developers were recognized with the 2024 Nobel Prize in Chemistry. The ability to generate proteins with specific functions and organism compatibility is crucial for drug design and biotechnology, because it lets researchers create proteins that are both functional and tailored to a particular application.
PLAID addresses several limitations of previous protein generative models by focusing on the multimodal co-generation problem: generating both the discrete sequence and the continuous all-atom structural coordinates, which together are needed to create functional proteins. Additionally, PLAID can be trained on sequence databases that are significantly larger than structure databases, which makes training data cheaper and more plentiful. This matters because learning from a broader and more diverse set of proteins can help the model generate more varied and effective designs.
The implications are particularly significant for drug discovery and development. By enabling the generation of “useful” proteins, PLAID gives researchers a tool for designing proteins with specific functions and properties. This is akin to controlling image generation through compositional text prompts, and it offers finer control over the protein design process. The ability to specify combined constraints, such as organism specificity and solubility, could streamline the development of biologics and other protein-based therapeutics and ultimately shorten the time it takes to bring new drugs to market.
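One common way to steer a diffusion model with prompts, as in the text-to-image analogy above, is classifier-free guidance: the denoiser is queried with and without the prompt, and the prediction is pushed further toward the conditioned one. The sketch below assumes this technique and a joint (function, organism) prompt in which either slot can be left empty; it is not taken from the article. `toy_denoiser` is a hypothetical network, and the prompt strings are illustrative placeholders.

```python
import numpy as np

NULL = None  # placeholder meaning "condition dropped"


def guided_noise(denoiser, xt, t, function=NULL, organism=NULL, w=2.0):
    """Classifier-free guidance over a joint (function, organism) prompt.

    Extrapolates from the unconditional prediction toward the conditional one;
    w = 0 ignores the prompt and w = 1 uses the plain conditional prediction.
    """
    eps_uncond = denoiser(xt, t, NULL, NULL)
    eps_cond = denoiser(xt, t, function, organism)
    return eps_uncond + w * (eps_cond - eps_uncond)


def toy_denoiser(xt, t, function, organism):
    """Hypothetical network whose output shifts with each supplied condition."""
    shift = 0.0
    if function is not NULL:
        shift += 1.0
    if organism is not NULL:
        shift += 0.5
    return 0.1 * xt + shift


xt = np.ones((4, 8))
# Both conditions supplied; the prompt values are illustrative placeholders.
eps = guided_noise(toy_denoiser, xt, t=10, function="kinase", organism="human", w=2.0)
# Organism only: the function slot is left unconditioned.
eps_org = guided_noise(toy_denoiser, xt, t=10, organism="human", w=2.0)
```

Leaving a slot as `NULL` is what makes the prompt compositional in this sketch: the same model can be steered by function alone, organism alone, or both, and `w` trades prompt adherence against sample diversity.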
Looking ahead, the methods developed in PLAID could be adapted to multimodal generation in other domains wherever a predictor exists from a more abundant modality to a less abundant one. As sequence-to-structure predictors continue to improve, generating complex systems, such as proteins in combination with nucleic acids or molecular ligands, becomes increasingly feasible. This opens new possibilities for research and collaboration, and invites scientists to test and extend these methods in practical, real-world applications. The progress in protein generation exemplified by PLAID highlights how AI could help address pressing challenges in medicine and biotechnology.