A Method to Integrate Word Sense Disambiguation and Translation Memory for English to Hindi Machine Translation System

Sunita Rawat

doi:10.24423/cames.395

Authors

Sunita Rawat Department of Computer Science and Engineering, Shri Ramdeobaba College of Engineering and Management, Nagpur, India

Abstract

Word sense disambiguation (WSD) is the task of determining the precise meaning of a word in a specific context. One of the major problems in natural language processing is lexical-semantic ambiguity, where a word has more than one meaning, and disambiguating the senses of polysemous words is a central task in machine translation. This research work designs and implements an English to Hindi machine translation system whose methodology aims to improve both the speed and the accuracy of the translation process. The algorithm and modules designed in this work have been deployed on the Hadoop infrastructure, and test cases have been designed to check the feasibility and reliability of the process. The work also describes methodologies for reducing data transmission by adding a translation memory component to the framework. Execution speed is increased by replacing modules in the machine translation process with lightweight modules, which reduces both infrastructure requirements and execution time.
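The abstract combines a translation memory, which lets previously seen input bypass the full pipeline, with WSD, which selects among the Hindi translations of a polysemous English word. The following Python code is a minimal sketch of how these two pieces might interact, not the author's implementation: the sense inventory, its signature words, the simplified Lesk-style overlap heuristic, and the names `SENSES`, `disambiguate`, `TranslationMemory` and `translate` are all hypothetical, and the Hadoop deployment is not modelled.

```python
import re
from typing import Dict, List, Optional

# HYPOTHETICAL sense inventory: English word -> [(Hindi translation, signature words)].
# A real system would derive this from a sense-annotated corpus or lexical resource.
SENSES: Dict[str, List[tuple]] = {
    "bank": [
        ("बैंक", {"money", "deposit", "loan", "account", "cash"}),
        ("किनारा", {"river", "water", "shore", "fishing", "stream"}),
    ],
}


def tokenize(sentence: str) -> List[str]:
    """Lower-case the sentence and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def disambiguate(word: str, context: List[str]) -> str:
    """Simplified Lesk-style choice: the sense whose signature shares the most
    words with the context wins; ties go to the first listed sense."""
    ctx = set(context)
    translation, _signature = max(SENSES[word], key=lambda sense: len(sense[1] & ctx))
    return translation


class TranslationMemory:
    """Exact-match translation memory keyed on the normalized source sentence."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, str]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(sentence: str) -> str:
        return " ".join(tokenize(sentence))

    def lookup(self, sentence: str) -> Optional[Dict[str, str]]:
        """Return the stored result, counting hits and misses."""
        result = self._store.get(self._key(sentence))
        if result is None:
            self.misses += 1
        else:
            self.hits += 1
        return result

    def add(self, sentence: str, translation: Dict[str, str]) -> None:
        self._store[self._key(sentence)] = translation


def translate(sentence: str, tm: TranslationMemory) -> Dict[str, str]:
    """Return Hindi choices for ambiguous words, consulting the TM first so
    repeated sentences skip the WSD step."""
    cached = tm.lookup(sentence)
    if cached is not None:
        return cached
    tokens = tokenize(sentence)
    choices = {tok: disambiguate(tok, tokens) for tok in tokens if tok in SENSES}
    tm.add(sentence, choices)
    return choices


if __name__ == "__main__":
    tm = TranslationMemory()
    print(translate("He opened an account at the bank.", tm))   # {'bank': 'बैंक'}
    print(translate("They sat on the bank of the river.", tm))  # {'bank': 'किनारा'}
    print(translate("He opened an account at the bank!", tm))   # served from the TM
    print(tm.hits, tm.misses)                                   # 1 2
```

An exact-match memory keyed on the normalized sentence is the simplest design; practical translation memories usually add fuzzy matching so that near-duplicate sentences can also be reused.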

Keywords:

machine translation, word sense disambiguation, statistical machine translation, translation memory
