Toward Representing Automatic Knowledge Discovery from Social Media Contents Based on Document Classification
Abstract
Document representation is one of the critical steps in natural language processing and text mining: it converts unstructured documents into structured numeric vectors so that machine learning and data mining algorithms can be applied to them. The bag-of-words (BOW) model is a widely adopted text representation in document classification. Under BOW, each document is represented as a fixed-length vector in which every dimension corresponds to a word and holds a numerical value, defined as either the TF-IDF weight or the word frequency. In this paper, we analyze the combination of Bag-of-Concept (BOC) and BOW representations, applying an attention mechanism to exploit both word-level and concept-level information, with the aim of achieving optimal document classification performance.
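To make the fixed-length BOW representation concrete, the following minimal sketch builds TF-IDF vectors for a toy corpus. It is an illustration under stated assumptions, not the implementation used in this paper: the function names `build_vocabulary` and `tfidf_vectors`, the whitespace tokenization, the example documents, and the unsmoothed weighting idf = log(N / df) are all choices made here for clarity.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Return the sorted list of distinct tokens; its length fixes the vector size."""
    return sorted({tok for doc in docs for tok in doc})


def tfidf_vectors(docs):
    """Map tokenized documents to fixed-length TF-IDF vectors (hypothetical helper).

    Assumes tf = count / document length and idf = log(N / df), one common
    variant among several; the paper does not state which variant it uses.
    """
    vocab = build_vocabulary(docs)
    n_docs = len(docs)
    # Document frequency: number of documents in which each token occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        # One dimension per vocabulary word, in a fixed order.
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


# Toy corpus (made up for illustration), tokenized on whitespace.
docs = [
    "great phone great battery".split(),
    "battery died fast".split(),
    "great camera".split(),
]
vocab, vectors = tfidf_vectors(docs)
```

Every document maps to a vector with one entry per vocabulary word, regardless of its own length, and words absent from a document receive a weight of zero. Replacing the TF-IDF weight with the raw count gives the word-frequency variant mentioned above.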