menu
techminis

A naukri.com initiative

google-web-stories
Home

>

Open Source News

>

Reclaiming...
source image

Medium

2w

read

427

img
dot

Image Credit: Medium

Reclaiming the Algorithm: How to Train AI in Your Own Language

  • Language loss to AI is not inevitable but convenient for system builders; however, reclaiming the algorithm is possible by providing it with necessary data.
  • Most AI models are trained on internet text, thus if your language is underrepresented, the model won't understand.
  • Creating data intentionally in your language online by writing, publishing, and uploading content is crucial to teaching AI.
  • Community-led language digitization projects provide structured language data for training language models.
  • Open-source language models like Mistral, LLaMA, Falcon, and BLOOM can be fine-tuned locally to create AI that reflects specific language and values.
  • Language preservation and empowerment must come from the people themselves, as big tech companies may not prioritize low-resource languages.
  • Governments, schools, and cultural institutions should fund AI language modeling to preserve national identity in the digital age.
  • AI should be seen as something to shape rather than just use, with every interaction serving as a teaching moment.
  • Community data governance is crucial to prevent exploitation of language data and ensure collective rights to datasets.
  • The goal is to train machines on our terms and prevent language erasure through passive inaction.
  • The future should involve AI that understands diverse voices and speaks in various languages, preserving cultural identity.

Read Full Article

like

25 Likes

For uninterrupted reading, download the app