Ilnar Salimzianov's Personal Site


Empower Your Language: Let's Build its Digital Future with Mozilla

First published: July 24, 2025. Last update: August 1, 2025.

On July 21, 2025, I began a new role as a Regional Language Researcher, working as an independent contractor for the Mozilla Foundation. Over the next six months, I'll be focused on a project that I believe is vital for the future of our languages. I want to tell you about this initiative and explain what's in it for you.

Note: I am an independent contractor, not a Mozilla employee. All views expressed here are my own.

What is the Mozilla Data Collective (MDC)?

Many of us know Common Voice, Mozilla's groundbreaking project to crowdsource speech data. Its success is a testament to what a global community can achieve. To date, the project has collected over 33,816 hours of recorded speech across an incredible 137 languages.

The Mozilla Data Collective (MDC) is the next step in that vision. Think of it as Common Voice, but for all types of language data — not just speech. The core philosophy is Create, Curate, Control. It's a platform that allows individuals and communities to contribute data on their own terms, putting power back into the hands of data creators.

The two key differences from Common Voice are:

Why Should You Contribute? What's in it for You?

Your motivation will depend on who you are. Here’s how the MDC can benefit you directly:

For Researchers, Academics, and Linguists

For Content Creators, Journalists, and Publishers

For Language Activists and Communities

What Kind of Data Are We Looking For?

We are interested in datasets large enough for modern NLP tasks. Ideal contributions include:

If your data uses multiple orthographies or is in a raw format (like ELAN files), don't worry. As long as it's well-documented, it is likely suitable for the MDC.

My Role and Geographic Focus

As an independent contractor, my focus is on sourcing datasets for languages of Greater Central Asia and the Caucasus. This includes languages such as:

Even if your language isn't listed, please reach out. Mozilla's goal is to support all languages, and I can connect you with the right colleague.

Let's Collaborate!

I've spent my career working on computational tools for our languages, often with public funding. I see this work with Mozilla as a way to give back and help create a more equitable digital world.

Big tech will not save our languages—we, native speakers, will. Initiatives like the MDC empower us to build the future we want. Your contribution can make a huge difference.

If you own or know of a dataset that could be a good fit, please contact me. I am here to answer your questions and handle the technical heavy lifting.

You can reach me directly at mdc.ilnar@gmail.com, or by filling out the expression-of-interest form below.

📬 Interested in contributing a dataset? Fill out the Expression of Interest Form, and I’ll be in touch!

For More Information

You can read the official announcement about the Mozilla Data Collective on the Common Voice Discourse forum.

