Ilnar Salimzianov's Personal Site


Resume

Ilnar Salimzianov, MLOps & AI Backend Engineer

Websites

Personal page, Contact

Portfolio

GitLab, GitHub, SourceForge

Please also see the Projects page.

Programming skills

Languages: Python, Racket, Clojure, GNU Bash
Experience with:
Data & Backend Infra: Elasticsearch, FastAPI, Pandas, Flask, Scrapy, Docker, Terraform
MLOps & NLP: Azure ML, ONNX Runtime, Hugging Face Transformers, scikit-learn, spaCy, NLTK
Low-Resource NLP: Apertium, HFST, VISL CG-3
Testing & Packaging: pytest, GNU Make, PyInstaller

Education

University of Stuttgart (2014-2017): M.Sc. degree in Computational Linguistics
Kazan State University (2006-2011): Specialist's degree in German Philology, focus Linguistics

Natural Languages

Tatar (native), Russian, German (TestDaF 5,5,5,5), English (TOEFL iBT 112), Kazakh, Turkish

Experience

07/2025-Present Regional Language Researcher (Independent Contractor)

Mozilla Data Collective (Remote)

Engineered extraction and distributed filtering infrastructure to process, clean, and archive terabytes of unstructured text from global media corpora for machine learning applications.
Established rigorous dataset evaluation pipelines, utilizing precise word and token counts as the primary indicators of linguistic viability for downstream LLM and NLP model training.
Developed standardized data packaging protocols and wrote comprehensive datasheets to ensure massive linguistic datasets are immediately ingestible by external MLOps architectures.
Architected data processing workflows that adhere strictly to open-access standards, ensuring high-quality, ML-friendly data is available for low-resource language R&D.

01/2024-12/2025 AI Backend Engineer/Computational Linguist (Independent Contractor)

US LegalTech Startup (Remote)

Architected and deployed an end-to-end data ingestion pipeline, scraping and processing over 20 million public trademark records from the USPTO API.
Designed and optimized a highly complex Elasticsearch schema capable of executing multi-layered similarity queries, including phonetic matching, orthographic homoglyphs, and cross-lingual semantic translations.
Built a RESTful API utilizing FastAPI to serve real-time trademark clearance search results to downstream frontend applications.
Implemented automated data clearance pipelines to isolate and verify so-called distinctive components within registered trademarks.

08/2021-10/2023 Computational linguist / NLP developer

Orpheus Technology Ltd (prowritingaid.com)

Helped launch ProWritingAid's premium-tier tone detection feature by training and fine-tuning models using scikit-learn and Hugging Face Transformers.
Deployed models on Microsoft Azure ML and monitored performance over time to ensure zero downtime.
Converted PyTorch models into ONNX format, optimizing and quantizing them for specific Azure ML hardware using the Hugging Face Optimum library.
Load-tested Azure ML endpoints using Locust. Compared to vanilla API wrappers, quantized ONNX models handled orders of magnitude more requests per second with negligible accuracy loss, saving thousands of dollars monthly in infrastructure costs.
Used Large Language Models (ChatGPT, Claude) for data generation and labeling, and fine-tuned models via the OpenAI API.
Prototyped GUI apps in Racket (gui-easy) and Clojure (cljfx) to explore combining ProWritingAid with LLMs.

01/2018-11/2021 Remote research assistant/computational linguist

Nazarbayev University, Nur-Sultan, Kazakhstan

Trained a new speech-to-text system for Kazakh using the Coqui STT framework (details: https://arxiv.org/abs/2107.10637).
Wrote a web interface to an existing ESPnet-based Kazakh speech-to-text system and packaged the app into a Docker image.
Gathered data and led the launch of https://commonvoice.mozilla.org/kk and https://commonvoice.mozilla.org/tt.
Extended the Kazakh morphological transducer apertium-kaz with new stems and affixes, and added a Constraint Grammar with dependency parsing capabilities in the Universal Dependencies framework.

07/2017-03/2018 Remote software contractor

Central Eurasian Studies Department, Indiana University, Bloomington (United States)

Developed a closed-domain, rule-based Tatar-to-English machine translator to translate Tatar population records (1828-1918).

04/2015-03/2017 Research assistant

Institute for Natural Language Processing, University of Stuttgart

Developed a dependency-based sentence simplifier/compressor for German.

05/2014-08/2014 Student developer

Apertium project, Google Summer of Code 2014 programme

Developed a prototype Tatar-Russian machine translator.

05/2012-08/2012 Student developer

Apertium project, Google Summer of Code 2012 programme

Developed a rule-based Kazakh-Tatar machine translator.

Publications & preprints

Sevilay Bayatlı, Ilnar Salimzianov, Jonathan North Washington (2023). Bayat, a variety of Iraq Turkic spoken in villages around Kerkük. Journal of Endangered Turkic Languages, Vol. 14, Issue 23 (upcoming)
Jonathan N. Washington, Francis M. Tyers, Ilnar Salimzianov (2022). Non-finite verb forms in Turkic exhibit syncretism, not multifunctionality. Folia Linguistica, Vol. 56, Issue 3, p. 693.
Ilnar Salimzianov (2021). A baseline model for computationally inexpensive speech recognition for Kazakh using the Coqui STT framework. arXiv:2107.10637
Jonathan Washington, Ilnar Salimzianov, Francis Tyers, Memduh Gökırmak, Sardana Ivanova, Oğuzhan Kuyrukçu (2019). Free/Open-Source technologies for Turkic languages developed in the Apertium project. International Conference on Computer Processing In Turkic Languages Turklang 2019.
Sevilay Bayatlı, Sefer Kurnaz, Ilnar Salimzianov, Jonathan Washington, Francis Tyers (2018). Rule-based machine translation from Kazakh to Turkish. EAMT 2018.
Jonathan Washington, Ilnar Salimzianov, Francis Tyers (2014). Finite-state morphological transducers for three Kypchak languages. Proceedings of the Language Resources and Evaluation Conference, LREC 2014.
Ilnar Salimzyanov, Jonathan Washington, Francis Tyers (2013). A free/open-source Kazakh-Tatar machine translation system. Proceedings of MT Summit XIV.
Francis Tyers, Jonathan Washington, Ilnar Salimzyanov, Rustam Batalov (2012). A prototype machine translation system for Tatar and Bashkir based on free/open-source components. Proceedings of the Turkic Languages Workshop at the Language Resources and Evaluation Conference, LREC 2012.

Independent Coursework

MITx 6.00.1x: Introduction to Computer Science and Programming Using Python = 93% (on edx.org)
MITx 6.00.2x: Introduction to Computational Thinking and Data Science = 88% (on edx.org)
UTAustinX UT.5.02x: Linear Algebra - Foundations to Frontiers = 62% (on edx.org)
SPD1x: Systematic Program Design - Part 1
Andrew Ng’s “Machine Learning” course

Service