Ilnar Salimzianov's Personal Site
Resume
Ilnar Salimzianov, MLOps & AI Backend Engineer
Websites
Personal page, Professional services, Educational hobby project
Contact
Email
Portfolio
GitLab, GitHub, SourceForge
Please also see the Projects page.
Programming skills
- Languages: Python, Racket, Clojure, GNU Bash
- Experience with:
  - Data & Backend Infra: Elasticsearch, FastAPI, Pandas, Flask, Scrapy, Docker, Terraform
  - MLOps & NLP: Azure ML, ONNX Runtime, HF Transformers, Scikit-learn, spaCy, NLTK
  - Low-Resource NLP: Apertium, HFST, VISL CG-3
  - Testing & Packaging: Pytest, GNU Make, PyInstaller
Education
- University of Stuttgart (2014-2017)
- M.Sc. degree in Computational Linguistics
- Kazan State University (2006-2011)
- Specialist's degree in German Philology, with a focus on Linguistics
Natural Languages
Tatar (native), Russian, German (TestDaF 5,5,5,5), English (TOEFL iBT
112), Kazakh, Turkish
Experience
- 07/2025-Present Regional Language Researcher (Independent Contractor)
- Mozilla Data Collective (remote)
- Engineered extraction and distributed filtering infrastructure to
process, clean, and archive terabytes of unstructured text from global
media corpora for machine learning applications.
- Established rigorous dataset evaluation pipelines, utilizing
precise word and token counts as the primary indicators of linguistic
viability for downstream LLM and NLP model training.
- Developed standardized data packaging protocols and wrote
comprehensive datasheets so that large linguistic datasets can be
ingested directly by external MLOps pipelines.
- Architected data processing workflows that adhere strictly to
open-access standards, ensuring high-quality, ML-friendly data is
available for low-resource language R&D.
- 01/2024-12/2025 AI Backend Engineer/Computational Linguist (Independent Contractor)
- US LegalTech Startup (remote)
- Architected and deployed an end-to-end data ingestion pipeline,
scraping and processing over 20 million public trademark records from
the USPTO API.
- Designed and optimized an Elasticsearch schema supporting
multi-layered similarity queries, including phonetic matching,
orthographic homoglyph matching, and cross-lingual, translation-based
semantic matching.
- Built a RESTful API utilizing FastAPI to serve real-time trademark
clearance search results to downstream frontend applications.
- Implemented automated data clearance pipelines to isolate and
verify the distinctive components of registered trademarks.
- 08/2021-10/2023 Computational linguist / NLP developer
- Orpheus Technology Ltd (prowritingaid.com)
- Helped launch ProWritingAid's premium-tier tone detection feature
by training and fine-tuning models using Scikit-learn and Hugging Face
Transformers.
- Deployed models to Microsoft Azure ML and monitored their
performance over time to maintain zero downtime.
- Converted PyTorch models to ONNX format, optimizing and
quantizing them for specific Azure ML hardware using the Hugging Face
Optimum library.
- Load-tested Azure ML endpoints using Locust. Compared with the
original models served behind plain API wrappers, the quantized ONNX
models handled orders of magnitude more requests per second with
negligible accuracy loss, saving thousands of dollars per month in
infrastructure costs.
- Used large language models (ChatGPT, Claude) for data generation
and labeling, and fine-tuned models through the OpenAI API.
- Prototyped GUI apps in Racket (gui-easy) and Clojure (cljfx) to
explore combining ProWritingAid with LLMs.
- 01/2018-11/2021 Remote research assistant/computational linguist
- Nazarbayev University, Nur-Sultan, Kazakhstan
- Trained a new speech-to-text system for Kazakh using the Coqui STT
framework (details: https://arxiv.org/abs/2107.10637).
- Wrote a web interface for an existing ESPnet-based Kazakh
speech-to-text system and packaged the app into a Docker image.
- Gathered data and led the launch of https://commonvoice.mozilla.org/kk and https://commonvoice.mozilla.org/tt.
- Extended the Kazakh morphological transducer apertium-kaz with new
stems, affixes, and a Constraint Grammar including dependency parsing
capabilities in the Universal Dependencies framework.
- 07/2017-03/2018 Remote software contractor
- Central Eurasian Studies Department, Indiana University, Bloomington
(United States)
- Developed a closed-domain, rule-based Tatar-to-English machine
translator for Tatar population records (1828-1918).
- 04/2015-03/2017 Research assistant
- Institute for Natural Language Processing, University of
Stuttgart
- Developed a dependency-based sentence simplifier/compressor for
German.
- 05/2014-08/2014 Student developer
- Apertium project as part of the Google Summer of Code 2014
programme
- Developed a prototype Tatar-Russian machine translator.
- 05/2012-08/2012 Student developer
- Apertium project, Google Summer of Code 2012 programme
- Developed a rule-based Kazakh-Tatar machine translator.
Publications & preprints
- Sevilay Bayatlı, Ilnar Salimzianov, Jonathan North Washington (2023).
Bayat, a variety of Iraq Turkic spoken in villages around Kerkük. Journal
of Endangered Turkic Languages, Vol. 14, Issue 23 (forthcoming).
- Jonathan N. Washington, Francis M. Tyers, Ilnar Salimzianov (2022).
Non-finite verb forms in Turkic exhibit syncretism, not
multifunctionality. Folia Linguistica, Vol. 56, Issue 3, p. 693.
- Ilnar Salimzianov (2021). A baseline model for computationally
inexpensive speech recognition for Kazakh using the Coqui STT framework.
arXiv:2107.10637.
- Jonathan Washington, Ilnar Salimzianov, Francis Tyers, Memduh
Gökırmak, Sardana Ivanova, Oğuzhan Kuyrukçu (2019). Free/Open-Source
technologies for Turkic languages developed in the Apertium project.
International Conference on Computer Processing in Turkic Languages
(TurkLang 2019).
- Sevilay Bayatlı, Sefer Kurnaz, Ilnar Salimzianov, Jonathan Washington,
Francis Tyers (2018). Rule-based machine translation from Kazakh to
Turkish. EAMT 2018.
- Jonathan Washington, Ilnar Salimzianov, Francis Tyers (2014).
Finite-state morphological transducers for three Kypchak languages.
Proceedings of the Language Resources and Evaluation Conference, LREC
2014.
- Ilnar Salimzyanov, Jonathan Washington, Francis Tyers (2013). A
free/open-source Kazakh-Tatar machine translation system. Proceedings of
MT Summit XIV.
- Francis Tyers, Jonathan Washington, Ilnar Salimzyanov and Rustam
Batalov (2012). A prototype machine translation system for Tatar and
Bashkir based on free/open-source components. Proceedings of the Turkic
Languages Workshop at the Language Resources and Evaluation Conference,
LREC 2012.
Independent Coursework
- MITx 6.00.1x: Introduction to Computer Science and Programming Using
Python = 93% (on edx.org)
- MITx 6.00.2x: Introduction to Computational Thinking and Data Science =
88% (on edx.org)
- UTAustinX UT.5.02x: Linear Algebra - Foundations to Frontiers = 62% (on
edx.org)
- SPD1x: Systematic Program Design - Part 1
- Andrew Ng’s “Machine Learning” course
Service