Effective Data Clustering Using K-Means Along with Lion Optimization Algorithm

Dr. A. Tamilarasi, Ms. A. Abarna, Mrs. K. Chitra, Mr. K. Nagendhiran, Ms. R. Aarthi

Abstract

Text mining is one of the most active and evergreen research areas. The extensive use of text and its wide applicability are solid reasons for the interest in text mining. Although text mining has many research areas, clustering is one of the most crucial. In this project, an efficient algorithm is proposed that solves the new problem rigorously and is not considerably more costly than K-Means. We establish the relationship between our technique and K-Means to provide a theoretical motivation for it. Experimental results show that our algorithm consistently reaches a better cut and, at the same time, outperforms K-Means clustering methods on clustering metrics.
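The proposed method is measured against K-Means. For reference, the following is a minimal sketch of the standard K-Means baseline (Lloyd's algorithm) using the sum of squared errors (SSE) as its cost function. The function names, the random initialisation from input points, and the stop-when-assignments-are-stable rule are our assumptions for illustration, not the authors' implementation.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0):
    """Lloyd's K-Means (illustrative sketch). Returns (labels, centroids, sse)."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct input points chosen at random.
    centroids: List[Point] = rng.sample(points, k)
    labels: Optional[List[int]] = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    # Cost function: sum of squared distances to the assigned centroids.
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)` separates the two groups of three points. Because Lloyd's algorithm only reaches a local minimum of the SSE, practical baselines usually run several initialisations and keep the lowest-cost result.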

In this project work, three types of clustering are compared, and their cost functions and loss functions are derived and calculated. The error rate of a clustering method, and how to calculate it as a percentage, is always one of the important factors in evaluating clustering methods, so this project introduces a technique for calculating the error rate of clustering methods. Clustering algorithms can be divided into several classes, including partitioning algorithms, hierarchical algorithms, and density-based algorithms.
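The abstract does not spell out how the clustering error rate is computed. One common convention, shown here purely as an assumption and not as the project's technique, maps each cluster to the majority true class among its members and counts every point that disagrees with that class as an error. The function name `clustering_error_rate` is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def clustering_error_rate(
    cluster_ids: Sequence[Hashable], true_labels: Sequence[Hashable]
) -> float:
    """Fraction of points whose true label differs from their cluster's majority label.

    Assumed convention (majority-vote mapping); several clusters may map
    to the same class.
    """
    if not cluster_ids or len(cluster_ids) != len(true_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    # Group the true labels of the points by the cluster they were assigned to.
    members: Dict[Hashable, List[Hashable]] = {}
    for c, y in zip(cluster_ids, true_labels):
        members.setdefault(c, []).append(y)
    # Points that carry their cluster's most frequent label count as correct.
    correct = sum(Counter(ys).most_common(1)[0][1] for ys in members.values())
    return 1.0 - correct / len(true_labels)
```

With clusters `[0, 0, 0, 1, 1, 1]` and true labels `["a", "a", "b", "b", "b", "b"]`, only the third point disagrees with its cluster's majority, giving an error rate of 1/6 (about 16.7 %). A one-to-one mapping (for example via the Hungarian algorithm) is a stricter alternative when the number of clusters equals the number of classes.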

In addition, clustering algorithms are compared by scalability, the ability to work with different types of attributes, the shape of the clusters they form, the minimal domain knowledge needed to determine the input parameters, the ability to handle noise and outliers, a consistent error rate when clustering new data (so that the input file has no impact on the result), and the ability to handle high dimensionality. K-Means is one of the simplest approaches to clustering, which is an unsupervised problem.

The proposed method based on the Lion Optimization Algorithm (LOA), CALOA, was executed on two datasets, dataset1 (a medical dataset) and dataset2 (an e-commerce dataset), described in the table; both are available in the online UCI repository. The parameters evaluated over the datasets are accuracy, TP rate, FP rate, precision, and recall.
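The evaluation parameters listed above follow the standard confusion-matrix definitions: accuracy = (TP + TN) / (TP + TN + FP + FN), TP rate = recall = TP / (TP + FN), FP rate = FP / (FP + TN), and precision = TP / (TP + FP). The sketch below computes them for one positive class; the function name and the choice to return 0.0 for an empty denominator are our assumptions. For clustering output, cluster ids would first have to be mapped to class labels (for instance by majority vote), and multi-class data would be handled one class at a time.

```python
from typing import Dict, Hashable, Sequence


def binary_metrics(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable
) -> Dict[str, float]:
    """Accuracy, TP rate, FP rate, precision and recall for one positive class."""
    tp = fp = tn = fn = 0
    # Tally the four confusion-matrix cells.
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Assumed convention: report 0.0 when the denominator is empty.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "tp_rate": ratio(tp, tp + fn),
        "fp_rate": ratio(fp, fp + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),  # identical to the TP rate
    }
```

For `y_true = [1, 1, 1, 0, 0, 0, 0, 1]` and `y_pred = [1, 1, 0, 1, 0, 1, 0, 1]` with `positive=1`, the counts are TP = 3, FN = 1, FP = 2, TN = 2, so accuracy is 0.625, TP rate and recall are 0.75, FP rate is 0.5, and precision is 0.6.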