Pocket Polyglot Mzansi: Machine Translation for Local Languages on the Edge

Speaker: Stefan Strydom
Track: Applications of LLMs and AI
Type: Short Talk (25 minutes)

Abstract

Introduction


While commercial machine translation (MT) systems and open-source projects like NLLB-200 have expanded to include African languages, translation quality remains significantly lower than for high-resource languages. Furthermore, their reliance on cloud-based deployment limits access in low-connectivity settings.


We show that much smaller versions of models like NLLB-200 can be fine-tuned for South African languages with minimal accuracy loss. This project builds on my work for two Zindi challenges: the winning solution in the Melio MLOps MT Challenge (competition page, source code), and a second-place finish in the Lelapa AI Buzuzu-Mavi Challenge (competition page, source code). This research has been accepted and will be presented at Deep Learning IndabaX South Africa in Stellenbosch in July 2025.


Intended audience: Python developers with an interest in machine learning and AI; machine learning researchers and engineers interested in developing small-scale AI; and developers interested in deploying AI on edge devices.


Methodology


NLLB-200 is a state-of-the-art open-source MT model for low-resource languages. However, the smallest version still has 600M parameters and requires 1.15GB of memory to load the weights in 16-bit (841MB in 8-bit). We shrink this model to 50M parameters by reducing the vocabulary and pruning both the number of hidden layers and their dimensions. The resulting model requires just 95MB of memory to load the weights in 16-bit (53MB in 8-bit).
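As a sanity check, the weight-memory figures above follow directly from parameter count and precision. A minimal sketch (the quoted 8-bit sizes are slightly larger than this naive estimate because quantization libraries typically keep some tensors, such as embeddings, in higher precision):

```python
def weight_memory_mib(n_params: float, bits: int) -> float:
    """Approximate memory (in MiB) needed to hold model weights
    stored at the given precision."""
    return n_params * bits / 8 / 1024**2

# 600M-parameter NLLB-200 in 16-bit: ~1.12 GiB by this estimate,
# close to the quoted 1.15GB (the exact figure depends on the
# checkpoint's true parameter count).
print(f"{weight_memory_mib(600e6, 16) / 1024:.2f} GiB")

# 50M-parameter pruned model in 16-bit: ~95 MiB, matching the text.
print(f"{weight_memory_mib(50e6, 16):.0f} MiB")
```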


We fine-tune this model on the WMT22-African dataset for the three most widely spoken South African home languages per Census 2022 (isiZulu, isiXhosa and Afrikaans) as well as English. Training ran for 240k steps on a single A6000 GPU on the JarvisLabs platform, taking under 21 hours and costing less than ZAR300.
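For readers budgeting a similar run, the quoted figures imply the following rough throughput and price, taking the 21-hour and ZAR300 numbers as upper bounds (a back-of-the-envelope sketch, not measured values):

```python
steps = 240_000   # training steps from the run above
hours = 21        # upper bound on wall-clock time
cost_zar = 300    # upper bound on total cost

steps_per_sec = steps / (hours * 3600)  # effective training throughput
zar_per_hour = cost_zar / hours         # implied GPU rental rate

print(f"~{steps_per_sec:.1f} steps/s, ~ZAR {zar_per_hour:.0f}/hour")
```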


Results


On the Flores-200 devtest set, our model achieves a chrF++ score of 51.8 on in-domain language pairs compared to 54.2 for the 600M-parameter model. Our model outperforms the 600M model on isiZulu-to-isiXhosa and English-to-Afrikaans translation. In general, the small model matches or exceeds the accuracy of NLLB-200 for language pairs where English is not the target language. Overall, the small model retains more than 95% of the accuracy of NLLB-200-600M while requiring 92% less memory for inference.
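The headline retention and memory-saving figures follow directly from the reported scores and weight sizes. A quick check using the numbers above:

```python
chrf_small, chrf_base = 51.8, 54.2  # chrF++ on Flores-200 devtest
mem_small, mem_base = 95, 1150      # 16-bit weight memory in MB

retention = chrf_small / chrf_base     # fraction of accuracy retained
mem_saving = 1 - mem_small / mem_base  # fraction of memory saved

# ~95.6% accuracy retained, ~91.7% less memory
print(f"{retention:.1%} accuracy retained, {mem_saving:.1%} memory saved")
```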


Next Steps


We are developing a mobile app to demonstrate offline translation and exploring distillation techniques to further improve accuracy. Our results show that the translation accuracy of large models is attainable with significantly smaller ones; however, MT for low-resource languages remains limited by the poor quality of available parallel corpora. We plan to explore self-supervised learning on monolingual data as one path forward.


Talk outline



  • Introduction and problem statement (2 minutes)

  • Literature review (3 minutes)

  • Modeling techniques and experiments (5 minutes)

  • Main results (5 minutes)

  • Secondary findings and next steps (5 minutes)

  • Q&A (5 minutes)