ModernBERT is a new encoder-only language model that advances the state of the art in natural language processing.
It is trained on 2 trillion tokens with a native sequence length of 8,192 and outperforms existing encoder models in speed, memory efficiency, and downstream task performance.
ModernBERT incorporates modern architectural and training techniques such as GeGLU activations, rotary positional embeddings (RoPE), and alternating local-global attention.
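As a rough illustration of the GeGLU idea (a sketch of the general technique, not ModernBERT's exact implementation), a gated GELU feed-forward block can be written in PyTorch as follows:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLU(nn.Module):
    """Gated GELU feed-forward block: one half of the projection gates the other."""

    def __init__(self, dim: int, hidden_dim: int) -> None:
        super().__init__()
        self.proj_in = nn.Linear(dim, 2 * hidden_dim)
        self.proj_out = nn.Linear(hidden_dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the up-projection into a value stream and a gate stream.
        value, gate = self.proj_in(x).chunk(2, dim=-1)
        return self.proj_out(value * F.gelu(gate))

# Example: a batch of token embeddings of width 768 passing through the block.
block = GeGLU(dim=768, hidden_dim=1152)
out = block(torch.randn(1, 768))
print(out.shape)  # torch.Size([1, 768])
```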
The code snippet below demonstrates how to use ModernBERT for masked language modelling, which serves as a starting point for applying it to other NLP tasks.
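A minimal sketch using the Hugging Face `transformers` fill-mask pipeline; it assumes the `answerdotai/ModernBERT-base` checkpoint is available on the Hub and a `transformers` version recent enough to include ModernBERT support.

```python
from transformers import pipeline

# Load ModernBERT into a fill-mask pipeline (downloads the checkpoint on first use).
fill_mask = pipeline("fill-mask", model="answerdotai/ModernBERT-base")

# Use the tokenizer's own mask token so the prompt matches the checkpoint's vocabulary.
mask = fill_mask.tokenizer.mask_token
for prediction in fill_mask(f"The capital of France is {mask}."):
    print(prediction["token_str"], round(prediction["score"], 4))
```

The same backbone can then be fine-tuned for downstream tasks such as classification or retrieval, for example by loading it with `AutoModelForSequenceClassification` instead of the pipeline above.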