Prot2Token is a unified framework for protein modeling that converts various protein-related predictions into a standardized next-token prediction format.
It employs an autoregressive decoder conditioned on embeddings from pre-trained protein encoders and guided by learnable task tokens to perform diverse predictions.
Because every task shares the same output format, the architecture supports multi-task learning, so a single model can handle multiple protein prediction tasks with improved efficiency.
Extensive experimental validation shows that Prot2Token delivers strong predictive performance and significant speedups, matching or exceeding specialized approaches, which could help accelerate biological discovery and therapeutic development.
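
To make the next-token formulation concrete, the following is a minimal sketch, not the authors' implementation. It shows how labels from different tasks might be serialized into token sequences that begin with a task token, and how a decoder could then generate such a sequence one token at a time, conditioned on an encoder embedding. The task names, the `<task:...>` token format, the digit-wise encoding of numeric values, and the helpers `encode_target`, `greedy_decode`, and `toy_next_token` are all assumptions introduced for illustration. The "model" here is a stub that replays a fixed answer.

```python
from typing import Callable, List, Sequence

BOS, EOS = "<bos>", "<eos>"


def encode_target(task: str, label) -> List[str]:
    """Serialize a label as tokens, prefixed by a (hypothetical) task token."""
    task_token = f"<task:{task}>"
    if isinstance(label, (list, tuple)):
        # e.g. a list of residue positions for a site-prediction task
        body = [str(x) for x in label]
    elif isinstance(label, float):
        # e.g. a regression value, split into characters (an assumed scheme)
        body = list(f"{label:.2f}")
    else:
        # e.g. a single class label
        body = [str(label)]
    return [BOS, task_token, *body, EOS]


def greedy_decode(
    next_token: Callable[[Sequence[float], List[str]], str],
    encoder_embedding: Sequence[float],
    task: str,
    max_len: int = 32,
) -> List[str]:
    """Generate tokens one at a time until EOS or max_len is reached."""
    # The task token starts the sequence and tells the decoder what to predict.
    tokens = [BOS, f"<task:{task}>"]
    while len(tokens) < max_len:
        tok = next_token(encoder_embedding, tokens)
        tokens.append(tok)
        if tok == EOS:
            break
    return tokens


def toy_next_token(embedding: Sequence[float], prefix: List[str]) -> str:
    """Stand-in for a trained decoder: replays a fixed answer."""
    script = ["12", "47", EOS]
    return script[len(prefix) - 2]  # prefix starts with BOS and the task token


if __name__ == "__main__":
    print(encode_target("localization", "nucleus"))
    print(encode_target("stability", 0.75))
    print(greedy_decode(toy_next_token, [0.1, 0.2], "binding_site"))
```

In this sketch, classification, regression, and site prediction differ only in the task token and the token vocabulary they draw from, which is what would let a single decoder be trained on all of them together.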