Enhanced Centroid Initialization in Clustering for Finding Outliers in Data Streams

R.Sangeetha et al.

Abstract

Disparate items are objects that do not belong to any cluster; in data mining they are referred to as 'outliers' because they deviate from the rest of the data. Although many techniques exist for detecting such items, detection is especially challenging in data streams, where the data are enormous and dynamic and outliers must be found with a limited number of scans. Data stream mining is an emerging area whose data are characterized by continuous flow, multiple access, and a rapidly evolving, changing nature. The deployed method must therefore be able to handle arbitrarily shaped, multi-dimensional data. E-commerce data is one application of data streams, involving a huge number of varied customers and products in online ordering. Outliers among e-commerce products are the least-selling products. Detecting those items aids online product recommendation and gives a better understanding of local and global wholesale and retail markets. Centroid clustering groups information on the basis of the distance between the centroid and the other objects. Almost every such algorithm fixes its initial centroids by random initialization. This paper puts forth an enhanced initialization procedure for centroids in order to produce better clusters and to find outliers using K-Median clustering. The research work is implemented in Weka 3.8.3. Finally, the results are assessed with clustering parameters.
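As a rough illustration of the ideas named in the abstract, the following Python sketch pairs a deterministic centroid initialization with K-Median clustering and a distance-based outlier test. It is not the procedure proposed in this paper, whose details are not given here. The seeding rule (sort points by coordinate sum, split into k chunks, take each chunk's component-wise median), the function names, the `factor` threshold, and the sample data are all assumptions made for illustration only.

```python
from statistics import median


def manhattan(a, b):
    """L1 distance, the metric minimized by a component-wise median."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sorted_chunk_init(points, k):
    """Hypothetical deterministic seeding (a stand-in, not the paper's method):
    order points by coordinate sum, cut the ordering into k chunks, and use
    the component-wise median of each chunk as an initial centroid."""
    ordered = sorted(points, key=sum)
    n = len(ordered)
    chunks = [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]
    return [tuple(median(col) for col in zip(*chunk)) for chunk in chunks]


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: manhattan(p, centroids[j]))
        for p in points
    ]


def k_medians(points, k, max_iter=100):
    """Alternate assignment and median-update steps until centroids settle."""
    centroids = sorted_chunk_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(
                tuple(median(col) for col in zip(*members)) if members else centroids[j]
            )
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def find_outliers(points, centroids, labels, factor=3.0):
    """Flag points whose distance to their own centroid exceeds
    factor times the median of all such distances (assumed rule)."""
    dists = [manhattan(p, centroids[lab]) for p, lab in zip(points, labels)]
    cutoff = factor * median(dists)
    return [p for p, d in zip(points, dists) if d > cutoff]


# Made-up data: two tight groups plus one isolated point.
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9), (20, 1)]
centroids, labels = k_medians(points, k=2)
outliers = find_outliers(points, centroids, labels)
print(centroids, labels, outliers)
```

On this toy data the isolated point (20, 1) is assigned to the second cluster but lies far beyond the cutoff, so it is the only point flagged. Because the seeding is deterministic, repeated runs give the same clusters, which is the practical motivation for replacing random initialization.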