Improving Label Accuracy for Unsupervised Machine Learning Models

  • Dr. K.S. Wagh et al.

Abstract

Converting unlabeled data into labelled data requires knowledge of labelling functions, domain expertise, high-quality input images, training data, and various computational algorithms. Meeting these requirements creates a large demand for domain experts, which means substantial human effort and, ultimately, high cost. Producing large amounts of labelled training data is necessary to construct, train, test, and deploy an accurate machine learning model, so there is an emerging need for correctly labelled datasets. Systems have already been developed for generating labelled data from unlabeled data, but the resulting unsupervised models often tend to be overconfident, i.e. they predict or assign wrong labels. For small datasets the assigned labels can be checked manually, but for large datasets a proper generalized technique is needed to verify the predicted labels. We propose a technique to increase labelling accuracy via label smoothing: soft labels prevent the model from capturing noisy data or learning incorrect features, which reduces the confusion caused by the resemblance of instances across different classes. This in turn increases the success rate of ML algorithms by improving the accuracy of the predicted probabilistic labels.
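The label smoothing idea described above can be sketched as follows: each hard (one-hot) label is mixed with a uniform distribution so that no class receives full probability. The function name and the smoothing factor of 0.1 are illustrative choices, not taken from the paper.

```python
import numpy as np

def smooth_labels(hard_labels, num_classes, epsilon=0.1):
    """Convert hard class indices into soft label distributions.

    Each one-hot target is mixed with the uniform distribution:
        y_soft = (1 - epsilon) * one_hot + epsilon / num_classes
    so the model is discouraged from assigning full probability
    (overconfidence) to any single class.
    """
    one_hot = np.eye(num_classes)[hard_labels]
    return (1.0 - epsilon) * one_hot + epsilon / num_classes

# Example: three samples, four classes, smoothing factor 0.1.
# Each row still sums to 1; the true class gets 0.925, every
# other class gets 0.025.
soft = smooth_labels(np.array([0, 2, 3]), num_classes=4, epsilon=0.1)
```

Training against these softened targets (e.g. with a cross-entropy loss) penalizes overconfident predictions, which is the mechanism the abstract relies on to reduce wrongly assigned labels.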

Published
2020-02-16