Hybridization of Grid and Density Based Clustering for Formation of Macro and Micro Level Clusters
Clustering large datasets often suffers from scalability issues, because clustering algorithms that compute pairwise distances among all instances of the dataset have quadratic time complexity. In this paper, the authors propose a two-step approach that reduces the number of distance computations required for clustering. The first step uses a highly scalable grid-based approach to partition the large dataset into multiple smaller dense regions, called macro level clusters. Each macro level cluster contains the set of candidate co-members of the micro level clusters, so pairwise distances between instances belonging to different dense regions need not be computed during micro level clustering. The second step applies more accurate density-based clustering algorithms within the individual dense regions, even algorithms that are not scalable enough to handle large datasets on their own. A new framework, Hybridization of Grid and Density Based Clustering (HGDBC), is developed to implement this divide-and-conquer strategy for clustering large datasets. The effectiveness of the proposed methodology is evaluated on benchmark datasets, and the results of a comparative analysis are presented.
Keywords: Grid-based approach, Density-based clustering, Macro level cluster, Micro level cluster.
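The two-step divide-and-conquer strategy described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the HGDBC implementation. It assumes a uniform grid whose cells have side `cell_size`. It treats a cell as dense when it holds at least `min_cell_points` instances, and it merges neighbouring dense cells into one macro level cluster. Plain DBSCAN (parameters `eps`, `min_pts`) stands in as the density-based algorithm run inside each region, and instances in sparse cells are labelled as noise. All function and parameter names here are hypothetical.

```python
from collections import defaultdict, deque
from itertools import product
from math import dist, floor


def _neighbours(cell):
    """Yield the 3^d - 1 grid cells adjacent to `cell` (including diagonals)."""
    for offset in product((-1, 0, 1), repeat=len(cell)):
        if any(offset):
            yield tuple(c + o for c, o in zip(cell, offset))


def grid_macro_clusters(points, cell_size, min_cell_points):
    """Step 1 (assumed scheme): bin instances into grid cells, keep dense cells,
    and merge connected dense cells into macro level clusters (lists of indices)."""
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(floor(c / cell_size) for c in p)].append(i)
    dense = {k for k, idx in cells.items() if len(idx) >= min_cell_points}

    regions, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:  # breadth-first search over adjacent dense cells
            cell = queue.popleft()
            members.extend(cells[cell])
            for nb in _neighbours(cell):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        regions.append(sorted(members))
    return regions


def dbscan(points, indices, eps, min_pts):
    """Step 2: plain DBSCAN restricted to one macro level cluster. Distances are
    computed only among `indices`, never against instances of other regions."""
    labels = {i: None for i in indices}
    cluster = 0
    for i in indices:
        if labels[i] is not None:
            continue
        nbrs = [j for j in indices if dist(points[i], points[j]) <= eps]
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_nbrs = [k for k in indices if dist(points[j], points[k]) <= eps]
            if len(j_nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(j_nbrs)
        cluster += 1
    return labels


def hgdbc(points, cell_size, min_cell_points, eps, min_pts):
    """Combine both steps; returns one global micro level label per instance (-1 = noise)."""
    labels = [-1] * len(points)  # instances outside every dense region stay noise
    next_id = 0
    for region in grid_macro_clusters(points, cell_size, min_cell_points):
        local = dbscan(points, region, eps, min_pts)
        for i, lab in local.items():
            if lab >= 0:
                labels[i] = next_id + lab  # shift local ids to keep them globally unique
        next_id += max(local.values(), default=-1) + 1
    return labels


if __name__ == "__main__":
    data = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (5.1, 5), (5, 5.1), (20, 20)]
    print(hgdbc(data, cell_size=1.0, min_cell_points=3, eps=0.2, min_pts=3))
```

Because `dbscan` only compares instances that fall in the same region, the quadratic pairwise cost is paid per macro level cluster rather than over the whole dataset, which is the saving the abstract attributes to the grid-based first step. Any other density-based algorithm could be substituted in the second step without changing the partitioning logic.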