WORD CLUSTERING OF BANGLA SENTENCES USING HIGHER ORDER N-GRAM LANGUAGE MODEL

Asmaul Hosna; Ayesha Khatun; Md. Jahidul Islam; Md. Mahin; Babe Sultana; Sumaiya Kabir

Published: Apr 28, 2022

Keywords:

N-gram model, corpus, cluster, word, natural language processing (NLP), Bangla language processing (BLP), higher order N-gram

Abstract

In natural language processing, word clustering is extremely useful in many applications, such as POS tagging, spell checking, grammar checking, and word sense disambiguation. N-gram rules yield several types of probabilities over the ways words combine to form different sentences. The N-gram model has been applied successfully to word clustering for English and other languages, and it brings a new dimension to Bangla language processing. In this paper, we propose a framework for word clustering using a higher-order N-gram language model, and we apply it to Bangla, one of the most widely spoken languages in the world. The proposed framework is based on similarity of meaning and of context. Word clustering partitions a set of words into subsets of semantically similar words. For implementation, we also introduce a system that generates word clusters, and we experiment with different threshold values to verify its output. After experimenting with a large body of Bangla sentences of various word lengths, the proposed framework shows an accuracy of approximately 80% for higher-order N-grams, which we consider satisfactory.
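The idea of clustering words by shared N-gram context, with a threshold on how much context must be shared, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the rule that two words are similar when they fill the same slot in at least `threshold` identical N-grams, and the romanized toy sentences are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def word_contexts(sentences, n):
    """Map each word to the set of n-gram 'slots' it fills.

    A slot is the n-gram with the word's own position replaced by None
    (assumed definition of context, not taken from the paper).
    """
    contexts = defaultdict(set)
    for sentence in sentences:
        for gram in ngrams(sentence.split(), n):
            for pos, word in enumerate(gram):
                contexts[word].add(gram[:pos] + (None,) + gram[pos + 1:])
    return contexts


def cluster_words(sentences, n=3, threshold=1):
    """Group words that share at least `threshold` n-gram slots."""
    contexts = word_contexts(sentences, n)
    parent = {w: w for w in contexts}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(sorted(contexts), 2):
        if len(contexts[a] & contexts[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for w in contexts:
        groups[find(w)].add(w)
    # Keep only clusters with more than one member.
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical romanized toy corpus, for illustration only.
sentences = ["ami bhat khai", "ami ruti khai", "tumi bhat khao"]
clusters = cluster_words(sentences, n=3, threshold=1)
print(clusters)
```

In this toy corpus, "bhat" and "ruti" both fill the middle slot of "ami _ khai", so they fall into one cluster at threshold 1. Raising the threshold to 2 leaves no clusters, which shows how the threshold trades coverage for stricter similarity.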

Issue:

Vol. 4 No. 1 (2017): Volume 04, Issue 01, December-2017

Area:

Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

