ModernAraBERT is a bilingual Arabic-English transformer model created by the Data Science Team at Giza Systems to enhance cross-lingual understanding.
The motivation behind developing ModernAraBERT was the need for a model that excels at Arabic-specific tasks while remaining competitive with established Arabic NLP models such as AraBERT.
The model addresses the issues of vocabulary fragmentation and code-switching by incorporating a mix of formal Arabic, dialects, and English texts in its training data.
By using FarasaPy for Arabic segmentation and applying token-level data augmentation (sketched below), ModernAraBERT demonstrated strong performance in both monolingual and code-switched contexts.
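The exact preprocessing recipe is not reproduced here; the following is a minimal sketch of what a FarasaPy-based segmentation step combined with a simple token-level augmentation might look like. It assumes the `farasapy` package, and the `token_dropout` function is a hypothetical illustration rather than ModernAraBERT's documented augmentation method.

```python
# Sketch: Farasa segmentation plus a toy token-level augmentation step.
# The dropout-style augmentation below is an assumption for illustration only.
import random

from farasa.segmenter import FarasaSegmenter  # pip install farasapy

segmenter = FarasaSegmenter(interactive=True)  # interactive mode reuses one Farasa process


def segment_arabic(text: str) -> str:
    """Segment Arabic clitics/affixes; Farasa marks boundaries with '+'."""
    return segmenter.segment(text)


def token_dropout(text: str, p: float = 0.1, seed=None) -> str:
    """Hypothetical augmentation: randomly drop whitespace tokens with probability p."""
    rng = random.Random(seed)
    tokens = text.split()
    kept = [tok for tok in tokens if rng.random() > p]
    return " ".join(kept) if kept else text


if __name__ == "__main__":
    sample = "ذهب الطالب إلى المكتبة and borrowed a book"
    segmented = segment_arabic(sample)            # Arabic words are split into '+'-joined segments
    augmented = token_dropout(segmented, p=0.15, seed=42)
    print(segmented)
    print(augmented)
```

In a pipeline like this, segmentation reduces vocabulary fragmentation by exposing clitic boundaries to the tokenizer, while the augmentation step varies the surface form of training sentences, including code-switched ones.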