ProtBFN is a 650-million-parameter foundation model for protein sequence design, introduced in a recent paper in Nature Communications.
It uses Bayesian Flow Networks to generate diverse protein sequences without explicit structural data, and supports both unconditional and conditional generation.
The model outperforms leading autoregressive and diffusion models, produces sequences matching natural length and amino acid distributions, and includes a fine-tuned variant for antibody heavy chains.
Because it supports zero-shot design, ProtBFN can be applied to therapeutic and industrial enzyme design, and it is available as open source for benchmarking and community contributions.
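
The claim that generated sequences match natural length and amino acid distributions can be checked with simple summary statistics. The sketch below is one possible way to do that and is not taken from the paper: the helper names (`aa_frequencies`, `total_variation`, `mean_length`) and the toy sequences are hypothetical, and total variation distance is used here only as a convenient illustrative metric.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(sequences):
    """Return the relative frequency of each standard amino acid."""
    counts = Counter()
    for seq in sequences:
        # Ignore non-standard symbols such as X or gap characters.
        counts.update(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def total_variation(p, q):
    """Total variation distance between two frequency tables (0 = identical)."""
    return 0.5 * sum(abs(p[aa] - q[aa]) for aa in AMINO_ACIDS)


def mean_length(sequences):
    """Average sequence length."""
    return sum(len(s) for s in sequences) / len(sequences)


# Hypothetical toy data; real use would load a natural reference set
# and a set of model-generated sequences.
natural = ["MKTAYIAKQR", "MADEEKLPPG"]
generated = ["MKTAYLAKQR", "MSDEEKLPAG"]

tv = total_variation(aa_frequencies(natural), aa_frequencies(generated))
print(f"total variation distance: {tv:.3f}")
print(f"mean length (natural):   {mean_length(natural):.1f}")
print(f"mean length (generated): {mean_length(generated):.1f}")
```

A distance near zero and similar mean lengths would be consistent with the generated set resembling the natural one, although a real evaluation would need far larger samples and a comparison of the full length distribution, not only the mean.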