Part of Speech Tagger for Natural Language Queries in Bengali
This paper proposes a Part-of-Speech (POS) tagger for the Bengali language that uses predefined syntactic rules stored in a default database. The proposed system accepts arbitrary Bengali text (typed in a Bengali font) and produces POS-tagged output in Bengali, which can be applied directly to Natural Language Processing (NLP) applications that use Bengali Query–Response Interface Systems. Because the POS tagger is based on syntactic rules, it requires no training data set; there is therefore no need to store a large amount of training data, and the system responds quickly. Whenever an input string in Bengali is fed to the POS tagger, rule patterns are generated using a sliding window. Each rule pattern is compared with the syntactic rule base, and whenever there is a match, the POS tag of each corresponding token in the input string is extracted. The designed POS tagger is generic and domain independent, and it accepts Bengali strings in structured format as input.
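The sliding-window matching described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the token keys, the rule entries, the English tag names, the window size, and the longest-pattern-first policy are all hypothetical assumptions chosen only to make the idea concrete. Each token is mapped to a pattern key, windows of keys are looked up in a small rule base, and a match assigns one tag per token in the window.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy data: these keys, rules, and tag names are illustrative
# assumptions, not the rule base described in the paper.
TOKEN_KEYS: Dict[str, str] = {
    "আমি": "PRON_1SG",   # "I"
    "ভাত": "WORD",       # "rice"
    "খাই": "VERB_1SG",   # "eat" (first person)
}

# Each rule maps a pattern of keys to one POS tag per position.
RULE_BASE: Dict[Tuple[str, ...], Tuple[str, ...]] = {
    ("PRON_1SG", "WORD", "VERB_1SG"): ("PRONOUN", "NOUN", "VERB"),
    ("PRON_1SG", "VERB_1SG"): ("PRONOUN", "VERB"),
}


def tag_tokens(
    tokens: List[str], max_window: int = 3
) -> List[Tuple[str, Optional[str]]]:
    """Tag tokens by sliding windows of pattern keys over the rule base."""
    keys = [TOKEN_KEYS.get(token, "UNKNOWN") for token in tokens]
    tags: List[Optional[str]] = [None] * len(tokens)

    # Assumed policy: try longer windows first, so longer rules take priority.
    for size in range(max_window, 0, -1):
        for start in range(len(keys) - size + 1):
            pattern = tuple(keys[start:start + size])
            matched = RULE_BASE.get(pattern)
            if matched is None:
                continue
            # Copy the rule's tags onto tokens that are still untagged.
            for offset, pos_tag in enumerate(matched):
                if tags[start + offset] is None:
                    tags[start + offset] = pos_tag

    return list(zip(tokens, tags))


if __name__ == "__main__":
    print(tag_tokens("আমি ভাত খাই".split()))
```

Because tagging in this sketch is a dictionary lookup per window, no training step is involved; tokens that no rule covers are left with a `None` tag rather than a guessed one.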