A Novel Approach for Detection of Ambiguity for Marathi Sentence and Development of Stemmer
Word sense disambiguation (WSD) is the task of finding the correct sense of a word in a given context. WSD is a difficult and challenging task in natural language processing (NLP). Many NLP applications, such as information retrieval, machine translation, and question answering systems, depend on the accuracy achieved by the WSD system.
A large amount of work has been done on WSD for foreign languages, but for Indian languages it remains an open challenge and a major area of concern. One such language is Marathi, a morphologically rich language spoken by the native people of Maharashtra in India. Over the last two decades, many NLP researchers have shown interest in machine-learning-based approaches.
They have used both supervised and unsupervised WSD approaches. Supervised approaches use a sense-annotated corpus, while unsupervised approaches rest on the idea that the sense of a word depends on the senses of its neighboring words.
Preprocessing steps play an important role in the accuracy of WSD. These steps are broadly categorized into tokenization, parsing, and stemming. The base word must be obtained from a word by adding or removing suffixes and prefixes, which affects the recall of WSD; obtaining the base word therefore requires a stemming process. This paper presents a rule-based stemmer for the Marathi language. Before the stemming process, the development of parsers that read the contents of the database files is discussed. The next step presents a novel approach to finding the ambiguous words in a sentence, using the Marathi WordNet developed at IIT Bombay.
Keywords: Natural language processing, Word sense disambiguation, Stemmer
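The two steps summarized in the abstract, rule-based stemming followed by detection of ambiguous words, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the suffix list, the minimum stem length, and the small sense inventory (which stands in for a Marathi WordNet lookup) are all hypothetical values chosen for illustration, and the names `stem` and `find_ambiguous` are assumptions. A word is treated here as ambiguous when its stem has more than one listed sense.

```python
# Toy suffix rules (assumed for illustration; not the paper's rule set).
SUFFIXES = ["साठी", "मध्ये", "ाला", "ाने", "ात"]

# Shortest stem, in code points, that a rule is allowed to leave behind.
MIN_STEM_LEN = 2

# Toy sense inventory standing in for a WordNet lookup (hypothetical entries,
# with English glosses used only to count senses).
SENSES = {
    "कर": ["tax", "hand"],
    "घर": ["house"],
}


def stem(word):
    """Strip the longest matching suffix, if the remaining stem is long enough."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def find_ambiguous(tokens):
    """Return the tokens whose stem has more than one sense in the inventory."""
    return [t for t in tokens if len(SENSES.get(stem(t), [])) > 1]


# Whitespace tokenization is a simplification of the tokenization step.
tokens = "घराला कर".split()
print([stem(t) for t in tokens])
print(find_ambiguous(tokens))
```

Trying the longest suffix first keeps a short rule from firing when a longer one also matches, and the minimum stem length guards against stripping a short word down to nothing. A real system would replace the dictionary with queries against the actual WordNet database files.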