Long Short-Term Memory Recurrent Neural Network Based Language Model For The Synthesis Of Dysfluent Speech

  • Vinay N A et al.

Abstract

Speech synthesis plays a major role in speech-to-text and text-to-speech processes. In recent years, technologies such as speech-to-text for preparing notes, voice search by saying "OK Google", and voice-based commands for home appliances such as TVs have become widespread. In such cases, continuous speech helps the machine produce the required output through the Automatic Speech Recognition (ASR) process, but if there is any gap or discontinuity in the command, the ASR will not produce the appropriate or desired output. It is therefore necessary to filter out such irregularities in speech. Likewise, text-to-speech plays an important role in pronouncing the syllables of various accents and the text of the filtered dysfluent speech. In the proposed work, the synthesis of such unstructured speech is performed using a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) based Language Model (LM). Specifically, a Back-Propagation (BP) based LSTM is adapted to synthesize the dysfluent speech; this language model gives a lower Word Error Rate (WER) than an RNN-based LM.
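The sketch below is not the authors' implementation; it is a minimal illustration of the kind of LSTM-based language model the abstract describes, trained with back-propagation on next-word prediction. The vocabulary size, embedding and hidden dimensions, layer count, and the dummy batch are all illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    """Word-level LSTM language model: embed tokens, run an LSTM, project to vocabulary logits."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, hidden=None):
        # tokens: (batch, seq_len) integer word indices
        emb = self.embed(tokens)
        out, hidden = self.lstm(emb, hidden)
        logits = self.proj(out)  # (batch, seq_len, vocab_size)
        return logits, hidden

# Train with back-propagation (through time) against next-word targets.
model = LSTMLanguageModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

tokens = torch.randint(0, 10000, (4, 20))   # dummy batch of word ids (assumption)
inputs, targets = tokens[:, :-1], tokens[:, 1:]
logits, _ = model(inputs)
loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
loss.backward()
optimizer.step()
```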

Published
2020-04-13