An Empirical Study on Methods, Metrics and Evaluation on Feature Extraction in Big Data Analytics

J. Jebamalai Robinson et al.

Abstract

Big data usually refers to data that is large, accumulates at a high rate, and is complex enough that processing it with traditional techniques is very difficult or even impossible. It is often characterized as data of many varieties, accumulated in large volumes at exceptional velocity. In data mining (DM), too much information can itself be a cause of inefficiency. Irrelevant attributes often add noise and can reduce the accuracy of the data model, and attributes may also be redundant, measuring the same feature. Such anomalies in the data tend to skew the logic of DM algorithms and can degrade the model's accuracy. Data with many such attributes also creates processing difficulties when data mining algorithms are applied. The attributes in a data model define the dimensionality of the processing space used by a particular algorithm: the greater the dimensionality, the higher the computational cost of designing and running the algorithm. To minimize this noise and high dimensionality, some form of dimensionality reduction is required for data mining to be effective. Feature selection and feature extraction are the common approaches to this problem. Feature selection ranks the existing attributes by their predictive significance and keeps the most relevant ones, whereas feature extraction actually transforms the attributes, combining them into a reduced set of new features; feature extraction is thus itself a process of attribute reduction. Numerous studies have proposed methods and techniques for effective feature extraction. This paper is the outcome of a study of the methods, metrics and evaluation techniques for feature extraction in big data analytics.
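To make the distinction between the two approaches concrete, the following Python sketch contrasts them on a tiny hypothetical table. It is an illustration only, not a method from this study. It assumes that absolute Pearson correlation with the target is an acceptable stand-in for "predictive significance" (feature selection), and that the first principal component (PCA, computed here by power iteration on the covariance matrix) is an acceptable stand-in for feature extraction. The function names `select_features` and `first_principal_component` are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(rows, target, k):
    """Feature selection (hypothetical helper): rank the EXISTING attributes by
    |correlation with the target| and keep the indices of the top k."""
    n_cols = len(rows[0])
    columns = [[r[j] for r in rows] for j in range(n_cols)]
    scores = [abs(pearson(col, target)) for col in columns]
    ranked = sorted(range(n_cols), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


def first_principal_component(rows, iterations=200):
    """Feature extraction (hypothetical helper): derive ONE new feature, the
    projection onto the first principal component, via power iteration.
    Assumes the all-ones start vector is not orthogonal to the top eigenvector."""
    n, d = len(rows), len(rows[0])
    means = [statistics.fmean(r[j] for r in rows) for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Each row's score on PC1 is the new, combined attribute.
    projected = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projected


if __name__ == "__main__":
    # Hypothetical toy data: column 1 = 2 * column 0 (redundant),
    # column 2 is nearly unrelated to the target (irrelevant).
    rows = [[1, 2, 2], [2, 4, 5], [3, 6, 1], [4, 8, 4], [5, 10, 3]]
    target = [1.1, 1.9, 3.2, 3.9, 5.1]
    kept, scores = select_features(rows, target, k=2)
    print("selected columns:", kept, "scores:", [round(s, 3) for s in scores])
    loadings, pc1 = first_principal_component(rows)
    print("PC1 loadings:", [round(x, 3) for x in loadings])
    print("extracted feature:", [round(x, 3) for x in pc1])
```

On this toy data, selection by relevance keeps both column 0 and its redundant copy, column 1, because each is individually predictive, which mirrors the redundancy problem described above. The extracted PC1 instead folds those two correlated columns into a single derived attribute, reducing three attributes to one.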