A Comparative Study of Deep Neural Network and Long-Short Term Memory Based Voice Active Detection technique for the Recognition of Dysfluent Speech
Speech is an effective means of communication for exchanging information and sharing knowledge between humans, but any discontinuity or dysfluency in speech can alter its meaning or make it misleading. Speech communication has now been extended to machines, which can carry out tasks even in the absence of a human; in such settings an Automatic Speech Recognition (ASR) system enables the machine to recognize the spoken words and act on the commands given by humans. With dysfluent speech, however, the machine may misrecognize words, with adverse consequences. In the proposed work, such unstructured speech is recognized by discriminating voiced and unvoiced regions with a Voice Activity Detection (VAD) technique. VAD is performed with two neural network models: a Deep Neural Network (DNN) and a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN). Although DNN-based VAD is suitable for discriminating voiced and unvoiced regions, LSTM-based VAD is adopted to capture the contextual information of speech, and it gives more promising results than the DNN.
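The contrast between frame-wise DNN-based VAD and context-aware LSTM-based VAD can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 16 kHz sampling rate, the 25 ms / 10 ms framing, the two features (log energy and zero-crossing rate), the layer sizes, and the helper names (`frame_signal`, `frame_features`, `dnn_vad`, `lstm_vad`) are all illustrative assumptions, and the weights are random and untrained, so only the data flow is meaningful. The final check perturbs only the first frame: the DNN's later outputs do not change, while the LSTM's later outputs shift because its recurrent state carries context forward.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at an assumed 16 kHz)."""
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]  # shape (T, frame_len)

def frame_features(frames):
    """Illustrative (assumed) per-frame features: log energy and zero-crossing rate."""
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    zcr = np.mean(np.diff(np.sign(frames), axis=1) != 0, axis=1)
    return np.stack([log_energy, zcr], axis=1)  # shape (T, 2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dnn_vad(feats, W1, b1, W2, b2):
    """Frame-wise DNN: each frame is scored from its own features only."""
    hidden = np.tanh(feats @ W1 + b1)          # (T, H)
    return sigmoid(hidden @ W2 + b2).ravel()   # (T,) speech probability

def lstm_vad(feats, Wx, Wh, b, Wo, bo):
    """LSTM: hidden and cell state carry context from earlier frames."""
    H = Wh.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    probs = []
    for x_t in feats:
        z = x_t @ Wx + h @ Wh + b              # (4H,) stacked gate pre-activations
        i, f, g, o = np.split(z, 4)            # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        probs.append(sigmoid(h @ Wo + bo)[0])
    return np.array(probs)                     # (T,) speech probability

rng = np.random.default_rng(0)
D, H = 2, 8  # feature size, hidden size (arbitrary choices)
# Random, UNTRAINED weights: the probabilities below are not meaningful VAD decisions.
dnn_params = (rng.normal(size=(D, H)), np.zeros(H),
              rng.normal(size=(H, 1)), np.zeros(1))
lstm_params = (0.5 * rng.normal(size=(D, 4 * H)), 0.5 * rng.normal(size=(H, 4 * H)),
               np.zeros(4 * H), rng.normal(size=(H, 1)), np.zeros(1))

sr = 16000
t = np.arange(sr // 2) / sr
# Synthetic input: 0.5 s of low-level noise followed by 0.5 s of a 200 Hz tone.
signal = np.concatenate([0.01 * rng.normal(size=sr // 2), np.sin(2 * np.pi * 200 * t)])
feats = frame_features(frame_signal(signal))

p_dnn = dnn_vad(feats, *dnn_params)
p_lstm = lstm_vad(feats, *lstm_params)
decisions = p_lstm > 0.5  # boolean speech / non-speech mask per frame

# Context check: perturb only the first frame and measure how much later outputs move.
perturbed = feats.copy()
perturbed[0] += 5.0
dnn_shift = np.abs(dnn_vad(perturbed, *dnn_params)[1:] - p_dnn[1:]).max()
lstm_shift = np.abs(lstm_vad(perturbed, *lstm_params)[1:] - p_lstm[1:]).max()
print(f"frames={len(feats)}  DNN shift={dnn_shift:.2e}  LSTM shift={lstm_shift:.2e}")
```

In practice a DNN-based VAD is often given limited context by splicing a fixed window of neighbouring frames into its input, whereas the LSTM learns how much history to retain through its gates. Training (for example with a binary cross-entropy loss against frame-level speech labels) is omitted from this sketch.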