Data Preprocessing With K Nearest Neighbor, Normalization In Big Data

  • K. Sathesh Kumar*, S. Ramkumar, P. Nagaraja, A. Robert Singh

Abstract

The proposed MapReduce framework works effectively with big data. Big data concerns
large-volume, complex, growing data sets with multiple, autonomous sources. Because the data is
collected from multiple systems or sources, it may suffer from problems such as noisy, missing
and redundant data, which affect decision making in data mining with big data. In this work,
the k nearest neighbor algorithm is used to calculate the missing values. Assume each feature
of the training data set is a distinct dimension in some space, and take the value an
observation has for that feature as its coordinate in that dimension, so that the observations
form a set of points in the space. The similarity of two points is then taken to be the
distance between them in this space under some metric. This method is used to obtain a proper
data set, and it increases data quality for further processing.
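
The abstract does not give implementation details, so the following is only a minimal single-machine sketch of the idea it describes: features are normalized so that each dimension contributes comparably to the distance, and a missing value is then filled with the mean of that feature over the k nearest observations. The function names (`min_max_normalize`, `distance`, `knn_impute`), the use of `None` to mark missing values, min-max scaling, and the root-mean-square distance over shared features are all assumptions made for illustration and are not taken from the paper. The MapReduce distribution step is not shown.

```python
import math


def min_max_normalize(rows):
    """Scale each column to [0, 1], leaving missing (None) entries untouched.

    Assumes every column has at least one observed value.
    """
    scaled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        present = [r[j] for r in rows if r[j] is not None]
        lo, hi = min(present), max(present)
        span = (hi - lo) or 1.0  # constant column: avoid division by zero
        for r in scaled:
            if r[j] is not None:
                r[j] = (r[j] - lo) / span
    return scaled


def distance(a, b):
    """Root-mean-square difference over features observed in both points."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # nothing to compare on
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=2):
    """Return a copy of rows with each None replaced by a k-NN estimate."""
    scaled = min_max_normalize(rows)
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows where feature j is observed.
            candidates = [
                (distance(scaled[i], scaled[m]), rows[m][j])
                for m in range(len(rows))
                if m != i and rows[m][j] is not None
            ]
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the gap if no neighbor has this feature
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical toy data: the third observation lacks its second feature.
data = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, None],
    [10.0, 100.0],
]
print(knn_impute(data, k=2)[2])  # [3.0, 15.0]
```

In this toy data the two nearest observations to the incomplete row are the first and second rows, so the gap is filled with the mean of 10.0 and 20.0. Distances are computed on the normalized copy, while the imputed value is averaged from the original, unscaled values so that the result stays in the feature's own units.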

Published
2020-05-01
How to Cite
K. Sathesh Kumar*, S. Ramkumar, P. Nagaraja, A. Robert Singh. (2020). Data Preprocessing With K Nearest Neighbor, Normalization In Big Data. International Journal of Advanced Science and Technology, 29(7s), 2948-2958. Retrieved from http://sersc.org/journals/index.php/IJAST/article/view/17360