Detecting anxiety and depression from phone conversations using x-vectors

We developed a model that detects anxiety and depression from telephone recordings of conversations between a customer and a call-center representative, using vocal features and a deep neural network. Our binary classification model based on x-vectors outperformed models built on other acoustic features, such as i-vectors and openSMILE features, as well as models built on linguistic (text-based) features. The models were trained on self-reported scores: GAD-7 for anxiety and PHQ-8 for depression. Notably, the anxiety model's performance is close to the screening accuracy of the GAD-7 itself. A prior study that compared self-reported GAD-7 scores with a mental health professional's diagnosis of anxiety disorder reported a sensitivity of 0.74 and a specificity of 0.54; our model achieved a sensitivity of 0.70 and a specificity of 0.54. This study demonstrates the potential of voice analysis on topic-independent speech, particularly 8 kHz phone conversations, to identify anxiety and depression.
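
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how sensitivity and specificity are computed for a binary classifier. It is not the authors' evaluation code; the function name `sensitivity_specificity` and the toy labels are assumptions introduced purely for illustration and are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = positive.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")

    sensitivity = tp / (tp + fn)  # share of true cases the model flags
    specificity = tn / (tn + fp)  # share of non-cases the model clears
    return sensitivity, specificity


# Toy labels (made up): 4 positive cases, 6 negative cases.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(toy_true, toy_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a screening setting, sensitivity is the fraction of people with the condition whom the model flags, and specificity is the fraction of people without the condition whom the model correctly clears.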

Namee Kwon, Shahruk Hossain, Nate Blaylock, Henry O’Connell, Naomi Hachen, Joseph Gwin, “Detecting anxiety and depression from phone conversations using x-vectors,” in Proceedings of the Workshop on Speech, Music and Mind 2022 (SMM), Incheon, Korea, 2022.