Development of Punjabi-English (PunEng) Parallel Corpus for Machine Translation System

Kamal Deep; Ajit Kumar; Vishal Goyal

doi:10.14419/ijet.v7i2.10762

Authors and Affiliations

Kamal Deep, Punjabi University
Ajit Kumar, Punjabi University
Vishal Goyal, Punjabi University

About this article

DOI: https://doi.org/10.14419/ijet.v7i2.10762
Received: 28-03-2018
Revised: 12-06-2019
Accepted: 06-04-2018
Published: 10-05-2018

Keywords: English, Machine Translation, Parallel Corpus, Punjabi, PunEng Corpus

Abstract

This paper describes the creation process and statistics of the Punjabi-English (PunEng) parallel corpus. A parallel corpus is the main requirement for developing both statistical and neural machine translation systems. Until now, no PunEng parallel corpus has been available. In this paper, we show the difficulties and the intensive labor involved in developing a parallel corpus. The methods used for collecting the data and the results are discussed, and the errors that occur during data collection, along with how to handle them, are described.

How to Cite

Deep, K., Kumar, A., & Goyal, V. (2018). Development of Punjabi-English (PunEng) Parallel Corpus for Machine Translation System. International Journal of Engineering and Technology, 7(2), 690-693. https://doi.org/10.14419/ijet.v7i2.10762