Analysis of Domain-Independent Unsupervised Text Segmentation Using LDA Topic Modeling over Social Media Contents
Topic models use word probabilities to analyze and separate the content of a document collection. Recent work that applied conventional topic modeling to Twitter posts merged all posts from the same user into a single document, because an individual post is too short to infer topic relations reliably. This aggregation reduces sparsity, but it cannot capture the differences among the topics that a single user posts about. In this paper, we propose a new topic model for analyzing short texts that carry user information, e.g., news, Twitter, and Facebook content. The model tackles the above issue by applying user content clustering and text segmentation based on the topic assignments of words, so that each cluster within a document shares a general topic relationship. Because combining texts yields a large amount of data per user, several distinct topic relations can be derived from the same user: the texts collected from one user are grouped into multiple topics rather than a single one. By combining Latent Dirichlet Allocation (LDA) with word2vec, the number of clusters is determined automatically. The system is based on collapsed Gibbs sampling, which estimates the extracted clusters and the topics simultaneously. The advantages and effectiveness of the system are demonstrated on short text documents. In addition, the model can track user interests dynamically by incorporating time distributions, and sets of contents can be clustered more accurately by using side information, e.g., time information.
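To make the role of collapsed Gibbs sampling concrete, the following is a minimal sketch of the sampler for plain LDA only. It is not the proposed model: it omits the user content clustering, the word2vec component, and the time information, and it is an illustration under stated assumptions rather than the implementation used in this work. The function name `lda_collapsed_gibbs`, the hyperparameter values `alpha` and `beta`, and the toy corpus are all assumptions chosen for illustration. The per-word topic assignments it returns are the kind of quantity on which a topic-based segmentation would operate.

```python
import random


def lda_collapsed_gibbs(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns per-word topic assignments and the two count tables.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                              # n(k)
    assignments = []

    # Random initialisation of the topic of every word occurrence.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalised conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return assignments, doc_topic, topic_word


# Toy corpus (hypothetical): word ids 0-1 and 2-3 form two rough themes.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
assignments, doc_topic, topic_word = lda_collapsed_gibbs(
    docs, num_topics=2, vocab_size=4)
```

In this sketch the number of topics is fixed in advance, whereas the proposed system is described as determining the number of clusters automatically.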