Classification of High Dimensional Class Imbalance Data Streams Using Improved Genetic Algorithm Sampling
Abstract
Classification is one of the fastest-growing and most widely applied areas of data mining. Knowledge discovery using evolutionary techniques is a recent trend for effective classification. The complexity of real-world data sources poses serious difficulties for efficient classification. A data stream is a continuous flow of data, and mining such data is subject to specific constraints and limitations. Many algorithms already exist for the classification of high-dimensional data. However, these algorithms have shortcomings, such as low classification accuracy. Particle Swarm Optimization (PSO) is one of the newer optimization methods proposed in the recent literature to overcome these limitations in classifying high-dimensional data streams. PSO gives good classification results on small, low-dimensional data sets, but it is not efficient on high-dimensional data sets. Moreover, in some extreme cases PSO may become trapped in a local optimum because its global exploration is ineffective. These limitations of PSO can be addressed by an appropriate feature selection technique. To this end, an improved algorithm, Evolutionary Sampling using Particle Swarm Optimization (ESPSO), is proposed for the classification of high-dimensional data streams. It integrates the local search phase of PSO with two global search phases in high-dimensional feature selection so that PSO can reach the global optimum. As a result, the convergence speed of PSO improves, and exploration and exploitation are balanced. The superiority of the proposed approach can be observed on high-dimensional data streams.
Keywords: Particle Swarm Optimization (PSO), Data Streams, Convergence Speed, Local Optima, Evolutionary Sampling using Particle Swarm Optimization (ESPSO).
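For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of the standard global-best PSO update, not the proposed ESPSO algorithm. The function names (`pso_minimize`, `sphere`), the parameter values (`w`, `c1`, `c2`), the search bounds, and the test objective are all illustrative assumptions and do not come from this paper.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0, seed=0):
    """Standard global-best PSO (illustrative; parameter values are assumptions)."""
    rng = random.Random(seed)
    # Random initial positions inside the box [lo, hi]^dim, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    # Global best is the best personal best seen so far.
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with a single optimum at the origin (assumed for illustration)."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, best_val = pso_minimize(sphere, dim=3)
    print(best, best_val)
```

Because every particle is pulled toward a single global best, the swarm can collapse onto a local optimum on multimodal or high-dimensional objectives, which is the weakness the abstract attributes to plain PSO.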