
Dobhase: English-Nepali Translator

Grant Amount: US$ 29,990

Keywords: ENGLISH-NEPALI, INTERNET, LOCALIZATION, NEPAL

Geographic coverage: Nepal

Objective

The objective of this project is to develop a web-based engine that can provide a “gist translation” of general English source text into corresponding Nepali target text.

Research context

The number of Internet users in Nepal is growing rapidly, but only a small portion of the population can read and understand English, which limits access to global Internet content. The project addresses this problem by developing an English to Nepali Machine Translation (MT) system: a web-based engine that translates general English text into Nepali on demand. The main translation engine uses a transfer-based approach, and server-side scripting and Hypertext Markup Language (HTML) integrate the engine with a web server.
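The page does not say which server-side scripting language Dobhase used. As a rough illustration only, the sketch below wraps a hypothetical `translate()` stub in a Python WSGI application: it reads the English text from a `text` query parameter (an assumed name), passes it to the engine, and returns an HTML page containing the input form and the result. It is not the project's actual code.

```python
import html
from urllib.parse import parse_qs


def translate(text):
    """Hypothetical stand-in for the MT engine; a real engine would return Nepali."""
    return f"[translation of: {text}]"


PAGE = """<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>English-Nepali MT</title></head>
<body>
<form method="get">
<textarea name="text">{source}</textarea>
<input type="submit" value="Translate">
</form>
<p>{target}</p>
</body></html>"""


def app(environ, start_response):
    """WSGI application: read ?text=..., translate it, render an HTML page."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    source = query.get("text", [""])[0]
    target = translate(source) if source else ""
    # Escape both strings so user-supplied text cannot inject markup into the page.
    body = PAGE.format(source=html.escape(source), target=html.escape(target))
    data = body.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(data))),
    ])
    return [data]
```

To try it locally, pass `app` to `wsgiref.simple_server.make_server("localhost", 8000, app)` and call `serve_forever()` on the result. Declaring `charset=utf-8` matters for a system like this because Nepali output is written in Devanagari script.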

Target beneficiaries

The free online distribution of the software package benefits Nepali-speaking Internet users. The MT system also benefits a wide range of organizations that need web content, manuals and documents translated from English to Nepali. Students and faculty at Kathmandu University gain experience in MT and local-language computing, and the project work also furthers research related to language computing.

Outputs

  • Bilingual English-Nepali Dictionary;
  • English morphological analyser;
  • English parser;
  • Nepali morphological analyser;
  • Nepali generator;
  • Transfer rules;
  • Web interface; and
  • An integrated MT engine.

Research results and outcomes

The project was hampered by political events and technical problems beyond its control. Just after work began, a state of emergency was declared in Nepal, which affected the transfer of funds to the project team. The project also faced setbacks from a drought that limited the hydroelectric power supply. Finally, work was impeded by curfews in effect during the recent political unrest. Despite these setbacks, the project team reports that these problems are now past and that work has continued.

The project can be viewed as a two-step process. The first step consists of analysis, design of the system architecture and implementation of the MT system, Dobhase. The second step consists of making Dobhase available over the Internet, which requires designing user interfaces, choosing appropriate web locations for the system and popularizing the system among the communities that can benefit from it.

The project followed an evolutionary approach to system development. Prototypes were placed online so that end-users from different backgrounds could offer feedback and suggestions on translation quality. The project goals, design and implementation were also presented at conferences, seminars and workshops. Based on the feedback from both of these sources, the system has been continually refined and upgraded.

The Dobhase MT system uses a transfer-based approach: it first analyses the English source sentence and produces a representation called a parse tree. Transfer rules are then applied to this tree to transform it into the syntactic structure of the target language. Finally, morphology generation rules are applied to each terminal node (lexical item) of the parse tree. The system has a pipeline architecture in which each module takes as its input the output of the preceding module. The system always produces output, even when the input English sentence is grammatically incorrect; in the worst case, it falls back to word-to-word translation.
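The pipeline described above can be sketched in a few dozen lines of Python. This is a toy under loudly stated assumptions: the four-word lexicon, the romanized Nepali glosses, the single-pattern grammar, the one transfer rule and the one inflection rule are all hypothetical and far simpler than Dobhase's actual modules. The transfer rule moves the English verb after its object to match Nepali's verb-final word order, and any sentence the toy parser cannot handle falls back to word-to-word translation.

```python
# Toy transfer-based MT pipeline: analyse -> parse -> transfer -> generate.
# The lexicon, grammar, transfer rule and inflection rule below are
# hypothetical illustrations, not Dobhase's actual data or code.

# Hypothetical bilingual lexicon: English word -> (POS tag, romanized Nepali stem)
LEXICON = {
    "ram": ("NNP", "Ram"),
    "sita": ("NNP", "Sita"),
    "eats": ("VBZ", "khaa"),
    "rice": ("NN", "bhaat"),
}
NOUN_TAGS = {"NN", "NNP"}


def analyse(sentence):
    """Tokenize and POS-tag; words missing from the lexicon are tagged 'UNK'."""
    words = sentence.lower().rstrip(".").split()
    return [(LEXICON[w][0] if w in LEXICON else "UNK", w) for w in words]


def parse(tagged):
    """Build a parse tree for the single toy pattern S -> NP VP, VP -> V NP.

    Phrases are (label, [children]); terminals are (tag, word).
    """
    tags = [tag for tag, _ in tagged]
    if len(tags) == 3 and tags[0] in NOUN_TAGS and tags[1] == "VBZ" and tags[2] in NOUN_TAGS:
        subj, verb, obj = tagged
        return ("S", [("NP", [subj]), ("VP", [verb, ("NP", [obj])])])
    raise ValueError("sentence not covered by the toy grammar")


def transfer(tree):
    """Apply transfer rules bottom-up to turn English structure into Nepali structure."""
    label, children = tree
    if not isinstance(children, list):  # terminal node: nothing to reorder
        return tree
    children = [transfer(child) for child in children]
    # Toy rule: VP(V NP) -> VP(NP V), because Nepali is verb-final (SOV).
    if label == "VP" and len(children) == 2 and children[0][0] == "VBZ":
        children = [children[1], children[0]]
    return (label, children)


def terminals(tree):
    """Yield the (tag, word) leaves of a tree from left to right."""
    _, children = tree
    if not isinstance(children, list):
        yield tree
    else:
        for child in children:
            yield from terminals(child)


def inflect(tag, stem):
    """Toy morphology rule: present 3rd-person singular verbs take '-nchha'
    (only correct for some verb stems)."""
    return stem + "nchha" if tag == "VBZ" else stem


def generate(tree):
    """Look up each terminal's Nepali stem and apply morphology generation."""
    return " ".join(inflect(tag, LEXICON[word][1]) for tag, word in terminals(tree))


def word_to_word(tagged):
    """Worst-case fallback: gloss each word, passing unknown words through unchanged."""
    return " ".join(LEXICON[w][1] if w in LEXICON else w for _, w in tagged)


def translate(sentence):
    tagged = analyse(sentence)
    try:
        return generate(transfer(parse(tagged)))
    except ValueError:
        return word_to_word(tagged)
```

With this toy data, `translate("Ram eats rice.")` returns `"Ram bhaat khaanchha"`, while `translate("Ram eats mangoes")` cannot be parsed (the last word is unknown) and degrades to the word-to-word output `"Ram khaa mangoes"`, mirroring the guaranteed-output behaviour described above.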

The rule-based approach was selected over corpus-based methods because a sufficiently large parallel text corpus for the English-Nepali language pair was not available. The system is sufficient to provide a gist translation; however, its output may require post-editing before it is acceptable for publications and formal writing. Judging from other initiatives in Nepal, a sufficiently large parallel corpus may become available within a couple of years' time. Such a corpus could be used to modify the current system and incorporate hybrid (statistical/example-based) modules, improving translation accuracy.

The system is now online and available to everyone with Internet access. The project team notes that fully realizing the project's goals requires further advertising of the product and training of users, which is beyond the scope of the present initiative. Nevertheless, the Dobhase MT project has designed and implemented a rule-based MT system that can be further developed and enhanced. The project has also produced a number of by-products, such as the bilingual dictionary and grammar rules, that enrich the linguistic resources available for the Nepali language in general. The project results and outputs have been widely disseminated through refereed conferences, journal publications and presentations.

Duration

Start Date: January 2005
End Date: June 2006
Total Duration: 18 Months

Contact information

Sanat Kumar Bista, Assistant Professor 
Information and Language Processing Research Lab
Kathmandu University, Dhulikhel, Kavre, Nepal
PO Box 6250, Kathmandu, Nepal
Telephone: +977 11 661 399
Fax: +977 11 661 443
Email: sanat@ku.edu.np

Website: http://www.ku.edu.np

Reference website: http://www.ku.edu.np/~dobhase


Last modified 2006-09-18 04:58 PM
 
 
