Human Concentration Level Recognition Based on VGG16 CNN architecture

  • C. Ruvinga, D. Malathi, J. D. Dorathi Jayaseeli

Abstract

Recent developments in Artificial Intelligence have made available a plethora of deep learning algorithms that perform automatic feature extraction, such as Residual Neural Network (ResNet), InceptionV3, Xception, Visual Geometry Group 16 (VGG16) and LeNet, for solving numerous real-world problems like object detection, facial emotion recognition and image classification. These deep learning algorithms achieve high accuracy in comparison to prior machine learning algorithms, which rely extensively on hand-crafted features such as Scale Invariant Feature Transform (SIFT), Local Binary Pattern (LBP) and Histogram of Oriented Gradients (HOG), to name but a few. To address the problem of monitoring students in online classes, where human supervision is lacking, we propose a Deep Convolutional Neural Network (DCNN) model based on the VGG16 architecture for human concentration level recognition. The model is developed using the Python programming language with OpenCV, Keras and the TensorFlow framework on the Google Colab cloud platform. The proposed model comprises a sequence of sixteen convolution, pooling and dense layers, followed by a softmax classifier for categorical output. The model takes 224x224-pixel colour images as input. In the preliminary phase, images from the datasets undergo a series of pre-processing steps, including resizing to 224x224 colour images, to conform to the VGG16 input requirement. In the second phase, transfer learning is applied to the pre-trained VGG16 model imported from the Keras open-source library; this stage involves fine-tuning the last layer of the VGG16 architecture while freezing the previous layers. In the third phase, the proposed model is compiled using the Adam optimizer with an adaptive learning rate. The Facial Expression Recognition (FER) 2013 public dataset, the Yale dataset and a custom dataset are used for training, validation and testing. Lastly, the model is saved as an .h5 file.
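The transfer-learning and compilation phases described above can be sketched in Keras as follows. This is a minimal illustration, not the authors' code: the number of concentration classes, the size of the dense head and the learning rate are assumptions not specified in the abstract (the sketch builds the base with `weights=None` so it runs offline; the paper uses the pre-trained ImageNet weights, i.e. `weights="imagenet"`).

```python
# Sketch of the transfer-learning step: a pre-trained VGG16 base (imported
# from Keras) with its convolutional layers frozen, plus a new dense head
# and softmax classifier fine-tuned on the concentration data.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 3  # assumed concentration levels (e.g. low / medium / high)

def build_model(weights="imagenet"):
    base = tf.keras.applications.VGG16(
        weights=weights,            # paper: ImageNet weights via Keras
        include_top=False,          # drop the original 1000-way classifier
        input_shape=(224, 224, 3),  # VGG16's expected 224x224 colour input
    )
    base.trainable = False          # freeze the previous (convolutional) layers

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),            # illustrative head size
        layers.Dense(NUM_CLASSES, activation="softmax"), # categorical output
    ])
    # Adam optimizer, as in the abstract; the learning rate is an assumption
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# weights=None avoids downloading ImageNet weights in this sketch
model = build_model(weights=None)
# After model.fit(...) on the FER2013 / Yale / custom data:
# model.save("concentration_vgg16.h5")
```

Only the new head is trainable here; freezing the base keeps the ImageNet features intact while the softmax classifier adapts to the concentration labels.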
The proposed model recorded a validation accuracy of 93.47% and a precision score of 0.9667 derived from the confusion matrix. Human concentration recognition is implemented by extracting frames from a web camera and feeding the facial images to the saved .h5 model, providing real-time feedback to both student and tutor.
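The real-time feedback loop can be sketched as below. The helper names, the label set and the saved-model filename are illustrative assumptions; the only elements taken from the abstract are the webcam frame extraction (OpenCV), the 224x224 input size and the saved .h5 model.

```python
# Sketch of real-time concentration recognition: webcam frames are resized
# to the model's 224x224 input, scaled, and classified by the saved model.
import numpy as np

LABELS = ["low", "medium", "high"]  # assumed concentration classes

def preprocess(frame):
    """Resize a BGR webcam frame to 224x224 and scale pixels to [0, 1]."""
    import cv2  # OpenCV, as used in the paper
    resized = cv2.resize(frame, (224, 224))
    return resized.astype("float32")[np.newaxis] / 255.0

def predict_concentration(model, batch):
    """Return the label with the highest softmax probability."""
    probs = model.predict(batch)[0]
    return LABELS[int(np.argmax(probs))]

# Typical loop (not run here):
# import cv2, tensorflow as tf
# model = tf.keras.models.load_model("concentration_vgg16.h5")  # assumed name
# cap = cv2.VideoCapture(0)
# ok, frame = cap.read()
# if ok:
#     print(predict_concentration(model, preprocess(frame)))
```

In practice a face detector would crop the facial region from each frame before preprocessing, since the model was trained on facial images.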

Published
2020-04-14
How to Cite
C. Ruvinga, D. Malathi, J. D. Dorathi Jayaseeli. (2020). Human Concentration Level Recognition Based on VGG16 CNN architecture. International Journal of Advanced Science and Technology, 29(6s), 1364-1373. Retrieved from http://sersc.org/journals/index.php/IJAST/article/view/9271