Dual Threshold Log Spectral Distance Voice Activity Detector based Effective Statistical Speech Enhancement

  • K. Ayyappa Swamy, Samuda Prathima, N. Padmaja, C. Sushma

Abstract

This work focuses on single-channel, frequency-domain statistical speech enhancement. Previous statistical signal processing methods have used a single-threshold Voice Activity Detector (VAD). The traditional VAD compares the local SNR of each frame with a Noise Margin (NM) and divides the frames into two categories. Because this type of statistical speech enhancement uses either the Decision Directed (DD) or the Maximum Likelihood (ML) estimator to calculate the enhancement gain, it suffers from either musical noise or speech distortion. In this paper, statistical speech enhancement is performed by dividing the observed noisy speech into three categories. The division is based on a dual-threshold VAD, which compares the Log Spectral Distance (LSD) of each frame with two margins, namely the Noise Margin (NM) and the Speech Margin (SM). The approach balances the strengths and weaknesses of the ML and DD estimators by selecting the a-priori SNR estimate matched to the category of each frame. In this way, the proposed method achieves simple and effective noise reduction. In addition, 3-point smoothing is applied before the noise reduction, which further improves the SNR. The proposed method is evaluated for different noise types using an objective measure, Short-Time Objective Intelligibility (STOI).
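The three-way frame decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the LSD formula, the margin values, the smoothing axis, or which estimator is assigned to which category, so all of these are assumptions: the LSD is taken as the mean positive dB difference between the frame spectrum and a noise estimate, the ML and DD estimates use their standard textbook forms, and the mapping (noise frames use DD, speech frames use ML, in-between frames use their average) is hypothetical. All function and parameter names are illustrative.

```python
import math


def log_spectral_distance(frame_mag, noise_mag, eps=1e-12):
    """Mean positive log-spectral difference in dB (assumed LSD definition)."""
    diffs = [
        max(20.0 * math.log10(max(x, eps)) - 20.0 * math.log10(max(n, eps)), 0.0)
        for x, n in zip(frame_mag, noise_mag)
    ]
    return sum(diffs) / len(diffs)


def classify_frame(lsd, noise_margin, speech_margin):
    """Dual-threshold decision: below NM is noise, above SM is speech."""
    if lsd < noise_margin:
        return "noise"
    if lsd > speech_margin:
        return "speech"
    return "transition"


def smooth3(values):
    """3-point moving average; the two endpoints are left unchanged."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + values[i] + values[i + 1]) / 3.0
    return out


def ml_prior_snr(post_snr):
    """Maximum Likelihood a-priori SNR estimate from the a-posteriori SNR."""
    return max(post_snr - 1.0, 0.0)


def dd_prior_snr(post_snr, prev_gain, prev_post_snr, alpha=0.98):
    """Decision Directed a-priori SNR estimate (alpha is an assumed value)."""
    return alpha * prev_gain ** 2 * prev_post_snr + (1.0 - alpha) * ml_prior_snr(post_snr)


def select_prior_snr(category, post_snr, prev_gain, prev_post_snr):
    """Pick the a-priori SNR per category (hypothetical mapping, see lead-in)."""
    ml = ml_prior_snr(post_snr)
    dd = dd_prior_snr(post_snr, prev_gain, prev_post_snr)
    if category == "speech":
        return ml
    if category == "noise":
        return dd
    return 0.5 * (ml + dd)


if __name__ == "__main__":
    frame = smooth3([0.9, 4.0, 6.0, 3.0, 1.1])  # toy magnitude spectrum
    noise = [1.0, 1.0, 1.0, 1.0, 1.0]           # toy noise estimate
    lsd = log_spectral_distance(frame, noise)
    category = classify_frame(lsd, noise_margin=3.0, speech_margin=6.0)
    print(category, select_prior_snr(category, 4.0, 0.5, 3.0))
```

In practice the two margins would be tuned on development data; the values 3.0 and 6.0 above are placeholders chosen only to make the toy example run.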

Published
2020-03-19
How to Cite
K. Ayyappa Swamy, Samuda Prathima, N. Padmaja, C. Sushma. (2020). Dual Threshold Log Spectral Distance Voice Activity Detector based Effective Statistical Speech Enhancement. International Journal of Advanced Science and Technology, 29(3), 5640-5653. Retrieved from https://sersc.org/journals/index.php/IJAST/article/view/6190
Section
Articles