Sparse Models for Improved Multistyle Classification of Speech under Stress

Debrup Banerjee, Indeepa P., Levy Priyusha T., Akhil Reddy K.

Abstract

Speech-based emotion recognition has been an active research topic for the past decade. A typical emotion recognition system consists of three components: speech segmentation, feature extraction, and emotion identification. Various speech features have been developed for emotion recognition; they fall into three categories: excitation, vocal tract, and prosodic. In this paper, we propose a sparse coding (SC) based method that fuses these feature categories to identify emotional states. Sparse coding has been applied in many domains, where it has achieved state-of-the-art performance. We evaluate the proposed method on the Speech Under Simulated and Actual Stress (SUSAS) database and compare it with other methods in the literature. We conduct classification experiments in three contexts: text-dependent pairwise stress classification, text-independent pairwise stress classification, and text-independent multistyle stress classification. Experimental results show that the proposed method outperforms other methods in the literature in most cases, demonstrating the efficacy of the sparse coding technique.
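
To make the idea of sparse-coding-based feature fusion concrete, the following is a minimal sketch and not the method evaluated in this paper. It assumes the three feature categories are fused by simple concatenation and that the sparse code is obtained by solving an l1-regularized least-squares problem with ISTA (iterative shrinkage-thresholding). The feature dimensions, the random dictionary, the regularization weight `lam`, and the function names `fuse_features` and `sparse_code` are all hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def objective(x, D, a, lam):
    """Sparse coding objective: 0.5 * ||x - D a||_2^2 + lam * ||a||_1."""
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

def sparse_code(x, D, lam=0.1, n_iter=200):
    """Approximately minimize the objective above over a, using ISTA."""
    # Step size 1/L, where L is the Lipschitz constant of the smooth term's gradient.
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = soft_threshold(a - grad / L, lam / L)
    return a

def fuse_features(excitation, vocal_tract, prosodic):
    """Assumed fusion step: concatenate the three feature categories."""
    return np.concatenate([excitation, vocal_tract, prosodic])

rng = np.random.default_rng(0)

# Hypothetical feature vectors for one speech segment (dimensions are made up).
excitation = rng.standard_normal(4)
vocal_tract = rng.standard_normal(12)
prosodic = rng.standard_normal(3)
fused = fuse_features(excitation, vocal_tract, prosodic)

# Hypothetical overcomplete dictionary with unit-norm atoms (random here;
# in practice it would be learned from training data).
D = rng.standard_normal((fused.size, 40))
D /= np.linalg.norm(D, axis=0, keepdims=True)

lam = 0.1
code = sparse_code(fused, D, lam=lam)
```

In a sketch like this, the resulting `code` vector would serve as the fused representation passed to a classifier; how the dictionary is learned and how classification is performed are left open here.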