Multilingual Document Data Preprocessing and Machine Translation Employing Neural Machine Translation Models

Sunita B, T. John Peter

Abstract

The volume of data on the Internet has grown significantly and now includes content in local and regional languages. India, with its 22 scheduled languages, including Hindi (an Indo-Aryan language and an official language of the Union) and Kannada (a Dravidian language), presents both a challenge and an opportunity for multilingual document processing. In this paper, we train a system on Kannada and Hindi, applying a distinct stemming algorithm tailored to each language for efficient data preprocessing. To handle this linguistic diversity, we use a Transformer model for translation and evaluate its performance on Kannada and Hindi. Our study covers the implementation and evaluation of several stemming algorithms and offers insights into effective multilingual NLP strategies.
