Syllabification and parameter optimisation in Zulu to English machine translation
Keywords: machine translation, word segmentation, alignment, Zulu, English
Abstract
We present a series of experiments on machine translation from Zulu to English using a well-known statistical software system. Zulu is a challenging case because of its morphological complexity and the relative scarcity of resources. Against a selection of baseline models, we show that a relatively naive approach of dividing Zulu words into syllables leads to a surprising improvement. We further improve on this model through manual configuration changes. Our best model significantly outperforms the baseline models (BLEU measure, at p < 0.001), even when they are optimised to a similar degree, falling short only of the well-known Morfessor morphological analyser, which makes use of relatively sophisticated algorithms. These experiments suggest that even a simple optimisation procedure can improve the quality of this approach to a significant degree. This is particularly promising because it improves on an approach that is mostly language independent, at least within the same language family. Our work also underlines that sub-lexical alignment is crucial for improving Zulu translation quality.
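To make the idea of syllable-level segmentation concrete, the following is a minimal sketch and not the procedure used in the experiments. It assumes that Zulu syllables are predominantly open, so that a word can be split after every vowel. The function names `syllabify` and `segment_sentence` are hypothetical, and the sketch ignores details such as syllabic nasals.

```python
import re

# Assumption: a syllable is any run of non-vowels followed by one vowel.
# A trailing run of non-vowels (no vowel after it) is kept as its own piece.
_SYLLABLE = re.compile(r"[^aeiou]*[aeiou]|[^aeiou]+$")


def syllabify(word):
    """Split one word into naive open syllables (output is lowercased)."""
    return _SYLLABLE.findall(word.lower())


def segment_sentence(sentence):
    """Syllabify every whitespace-separated token of a sentence."""
    return [syllabify(token) for token in sentence.split()]


if __name__ == "__main__":
    print(syllabify("ngiyabonga"))
    print(segment_sentence("umuntu ngumuntu ngabantu"))
```

Segmenting the source side in this way gives a word aligner units smaller than the full Zulu word, which is the kind of sub-lexical alignment the abstract refers to.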
Research Papers (general)
Copyright of all work published here subsists in the authors. While SACJ retains right of first publication, subsequent re-publication is expressly permitted provided the original SACJ publication is acknowledged and cited, according to the terms detailed below. If plagiarism is detected during review, a paper may be summarily rejected and will not be accepted unless even minor infringements are corrected. Should plagiarism be detected after a paper is published, the Editor reserves the right to withdraw a paper from publication. We expect authors to be honest in representing work as their own, and to respect the time and effort our reviewers put in without an undue burden of policing plagiarism, and hence take violations seriously. SACJ applies the Creative Commons Attribution NonCommercial 4.0 License (CC BY-NC 4.0) to all papers published in this journal. Authors who publish with SACJ agree to the following:
- Authors retain copyright and grant SACJ right of first publication. The work is additionally licensed under a Creative Commons Attribution Non-Commercial License that requires others who share the work to acknowledge the work’s authorship and initial publication in SACJ. Should anyone else wish to make commercial use of the work, SACJ cedes the right to the author to negotiate terms and does not expect to be paid any royalties.
- Authors may enter into additional arrangements for non-exclusive distribution of the SACJ-published version of the work (e.g., post it to a repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are required to refrain from posting their work online prior to completion of reviews so as not to compromise double-blind reviewing or confuse plagiarism checks.