Denoising SMS Text Using Ternary Tree for FAQ Retrieval

Manoj Kumar, Vipra Bhatt

Abstract

With the growth of mobile technology, people use mobile phones for purposes ranging from making calls to accessing the internet and gaming. Internet service is not available on every mobile phone, but the Short Messaging Service (SMS) is, so many people use SMS for information retrieval. SMS can provide quick responses in fields such as healthcare, railways, insurance, banking, and travel. However, SMS text is noisy because of intentional and unintentional errors, and each noisy token must be replaced with the correct word. This paper proposes an efficient method for denoising SMS tokens: it finds the matching prefix and suffix lengths using a ternary search tree and computes a similarity score between the noisy token and correct English words using the Longest Common Subsequence (LCS). The ternary search tree is more space-efficient than the trie data structure used in the previous approach, and it completes the task in less time than a hash table. The overall score combines the prefix matching length, the suffix matching length, and the similarity score. We experimented with noisy SMS words in common use and compared our model with several previous models; our model outperforms all of these earlier approaches.
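
The scoring idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state how the three quantities are combined or how the suffix length is read from the tree, so the plain sum in `best_match`, the normalization of the LCS length, the direct suffix comparison, and all function names here are assumptions made for illustration only.

```python
class Node:
    """Ternary search tree node: one character and three child links."""
    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None  # smaller / next char / larger
        self.is_word = False


def insert(node, word, i=0):
    """Insert a non-empty word; returns the (possibly new) subtree root."""
    ch = word[i]
    if node is None:
        node = Node(ch)
    if ch < node.ch:
        node.lo = insert(node.lo, word, i)
    elif ch > node.ch:
        node.hi = insert(node.hi, word, i)
    elif i + 1 < len(word):
        node.eq = insert(node.eq, word, i + 1)
    else:
        node.is_word = True
    return node


def matched_prefix_len(root, token):
    """Length of the longest prefix of token that is a path in the tree."""
    node, i = root, 0
    while node is not None and i < len(token):
        if token[i] < node.ch:
            node = node.lo
        elif token[i] > node.ch:
            node = node.hi
        else:
            i += 1
            node = node.eq
    return i


def _collect(node, path, out):
    """Append every word stored below node, given the characters so far."""
    if node is None:
        return
    _collect(node.lo, path, out)
    if node.is_word:
        out.append(path + node.ch)
    _collect(node.eq, path + node.ch, out)
    _collect(node.hi, path, out)


def words_with_prefix(root, prefix):
    """All stored words that start with prefix (all words if prefix is empty)."""
    out = []
    if not prefix:
        _collect(root, "", out)
        return out
    node, i = root, 0
    while node is not None:
        if prefix[i] < node.ch:
            node = node.lo
        elif prefix[i] > node.ch:
            node = node.hi
        else:
            i += 1
            if i == len(prefix):
                break
            node = node.eq
    if node is None:
        return out
    if node.is_word:
        out.append(prefix)
    _collect(node.eq, prefix, out)
    return out


def lcs_len(a, b):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def common_suffix_len(a, b):
    """Number of trailing characters shared by a and b (assumed simplification)."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def best_match(root, token):
    """Hypothetical combined score: prefix length + suffix length + LCS ratio."""
    p = matched_prefix_len(root, token)
    best, best_score = None, -1.0
    for word in words_with_prefix(root, token[:p]):
        sim = lcs_len(token, word) / max(len(token), len(word))
        score = p + common_suffix_len(token, word) + sim
        if score > best_score:
            best, best_score = word, score
    return best
```

With a toy dictionary containing "tomorrow", "today", and "together", inserted one at a time with `root = insert(root, word)` starting from `root = None`, `best_match(root, "tmrw")` selects "tomorrow" and `best_match(root, "2day")` selects "today". The sketch only considers candidates that share the longest matched prefix with the token, which can exclude the intended word when another dictionary entry shares a longer prefix; a fuller treatment would score a wider candidate set.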