A Method to Integrate Word Sense Disambiguation and Translation Memory for English to Hindi Machine Translation System

  • Sunita Rawat, Shri Ramdeobaba College of Engineering and Management

Abstract

Word sense disambiguation is the task of determining the precise meaning of a word in a specific context. One of the major problems in natural language processing is lexical-semantic ambiguity, where a word has more than one meaning. Disambiguating the senses of polysemous words is among the most important tasks in machine translation. This research work designs and implements an English-to-Hindi machine translation system, with a design methodology aimed at improving the speed and accuracy of the translation process. The algorithms and modules designed in this work have been deployed on Hadoop infrastructure, and test cases were designed to check the feasibility and reliability of the process. The work describes how adding a translation memory component to the framework reduces data transmission. Execution speed is increased by replacing modules in the machine translation process with lightweight modules, which reduces infrastructure requirements and execution time.
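As a rough illustration of the two ideas combined in the abstract, the sketch below places an exact-match translation memory in front of a toy pipeline that resolves one ambiguous word by context overlap. It is not the system described in the paper: the sense inventory, the names `disambiguate`, `TranslationMemory`, `Translator`, and `toy_pipeline`, and the overlap-based sense choice are all assumptions made for illustration only.

```python
# Hypothetical toy sense inventory: each sense of an ambiguous English word
# carries a Hindi translation and a few context clue words.
SENSES = {
    "bank": [
        {"hindi": "बैंक", "clues": {"money", "account", "loan", "deposit"}},
        {"hindi": "किनारा", "clues": {"river", "water", "boat", "shore"}},
    ],
}


def disambiguate(word, context_words):
    """Return the Hindi sense whose clue words overlap most with the context.

    Returns None when the word is not in the toy inventory.
    """
    senses = SENSES.get(word)
    if not senses:
        return None
    best = max(senses, key=lambda s: len(s["clues"] & context_words))
    return best["hindi"]


class TranslationMemory:
    """Exact-match store of previously translated sentences."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(sentence):
        # Normalize case and whitespace so trivial variants share one entry.
        return " ".join(sentence.lower().split())

    def get(self, sentence):
        return self._store.get(self._key(sentence))

    def put(self, sentence, translation):
        self._store[self._key(sentence)] = translation


class Translator:
    """Consults the translation memory before running the costly pipeline."""

    def __init__(self, pipeline):
        self.pipeline = pipeline
        self.memory = TranslationMemory()
        self.pipeline_calls = 0  # counts how often the pipeline really runs

    def translate(self, sentence):
        cached = self.memory.get(sentence)
        if cached is not None:
            return cached
        self.pipeline_calls += 1
        result = self.pipeline(sentence)
        self.memory.put(sentence, result)
        return result


def toy_pipeline(sentence):
    """Stand-in for a real MT pipeline: it only replaces ambiguous words."""
    words = sentence.lower().split()
    context = set(words)
    return " ".join(disambiguate(w, context) or w for w in words)


translator = Translator(toy_pipeline)
first = translator.translate("the boat reached the river bank")
second = translator.translate("The boat  reached the river bank")
```

In this sketch the second request differs from the first only in case and spacing, so it is answered from the memory and the pipeline runs once. A production translation memory would typically also handle near matches and persistent storage, which are left out here.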

Keywords

machine translation, word sense disambiguation, statistical machine translation, translation memory

Published
Apr 11, 2022
How to Cite
RAWAT, Sunita. A Method to Integrate Word Sense Disambiguation and Translation Memory for English to Hindi Machine Translation System. Computer Assisted Methods in Engineering and Science, [S.l.], v. 29, n. 1–2, p. 125–144, apr. 2022. ISSN 2299-3649. Available at: <https://cames.ippt.pan.pl/index.php/cames/article/view/395>. Date accessed: 28 may 2022. doi: http://dx.doi.org/10.24423/cames.395.