Proteins are central to biological systems, and understanding their functions is essential.
GLProtein is a novel framework for protein pre-training that integrates global structural similarity and local amino acid details for improved prediction accuracy and functional insights.
GLProtein combines protein-masked modeling, triplet structure similarity scoring, protein 3D distance encoding, and substructure-based amino acid molecule encoding.
Experimental results show that GLProtein surpasses previous methods on several bioinformatics tasks, including protein-protein interaction prediction and contact prediction.
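To make two of the listed ingredients more concrete, the following is a minimal, generic sketch of a pairwise 3D distance matrix and a triplet margin loss. It is not GLProtein's actual formulation, which the text above does not specify. The function names, the margin value, and the toy coordinates and embeddings are all hypothetical and chosen only for illustration.

```python
import math

def pairwise_distances(coords):
    """Return the symmetric matrix of Euclidean distances between 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: zero once the anchor is closer to the positive than to
    the negative by at least `margin` (assumed value, not from the source)."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy residue coordinates in angstroms (hypothetical values).
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
dist_matrix = pairwise_distances(coords)

# Toy 2D protein embeddings (hypothetical): the "positive" stands for a
# structurally similar protein, the "negative" for a dissimilar one.
anchor, positive, negative = (0.0, 0.0), (0.0, 1.0), (3.0, 4.0)
loss_easy = triplet_margin_loss(anchor, positive, negative)  # already well separated
loss_hard = triplet_margin_loss(anchor, negative, positive)  # roles swapped, so penalized
```

In this sketch the distance matrix stands in for a local geometric input, while the triplet loss stands in for a global objective that pulls structurally similar proteins together in embedding space.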