Gong Takes First in Depression Segment of 2017 Audio/Visual Emotion Challenge

Author: Nina Welding

Yuan Gong, a Ph.D. student in the Department of Computer Science and Engineering at the University of Notre Dame, has won first place in the depression segment of the 2017 Audio/Visual Emotion Challenge (AVEC2017), which was held on Oct. 23 in Mountain View, Calif., as part of the 25th Association for Computing Machinery Multimedia Conference (ACMMM2017). Participants in the challenge were asked to model and predict depression levels using multimedia data (audio, video and text) from an interview lasting between 7 and 33 minutes. This is not an easy task, especially considering that averaging features over an entire interview loses most of the temporal information.

Everyone experiences sadness, even grief. Major depression, however, is something more. It may last for years, have recurring episodes, or be accompanied by physical symptoms. According to the World Health Organization, as many as 350 million people, 5% of the world’s population, suffer from depression. The National Institute of Mental Health reports that in 2015 alone there were an estimated 16.1 million adults in the United States (approximately 7% of all adults) who suffered from at least one major depressive episode during the year.

Gong’s adviser, Christian Poellabauer, associate professor of computer science and engineering, shared that one of the problems in detecting depression is that not everyone who suffers from major depression experiences the same symptoms. “Symptoms vary depending on the individual, the particular illness and the stage of the illness. Yet early and accurate detection can ensure appropriate intervention and treatment options, making a simple and accurate method for detection vital.”

This was the challenge Gong addressed through his proposal, “Topic Modeling Based Multi-modal Depression Detection,” which suggested a novel approach to discovering, capturing and preserving the details of the “test” interview in order to achieve a diagnosis. Gong’s method uses audio, video and semantic features of the topics covered in each interview to develop a “map” of the interview and the individual’s status in the context of the interview. He then used feature selection algorithms to correlate the information and create a feature space with which to determine a more accurate baseline for identifying depressive disorders. His research showed that this approach significantly outperforms context-unaware methods and the methods suggested by other teams in the challenge.
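
For readers curious about why topic context matters, the short Python sketch below contrasts a single whole-interview average with per-topic averages. It is an illustration only and is not Gong's implementation: the function names, the toy numbers, the topic labels and the choice to fill unseen topics with zeros are all assumptions made for this example.

```python
from collections import defaultdict


def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length feature rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def interview_average(frames):
    """Context-unaware summary: one mean vector for the whole interview."""
    return mean_vector(frames)


def topic_features(frames, topic_ids, n_topics):
    """Topic-aware summary: average the frames within each topic and
    concatenate the per-topic means in topic order.

    Topics that never come up are filled with zeros (an assumption of
    this sketch, not something stated in the article).
    """
    n_features = len(frames[0])
    by_topic = defaultdict(list)
    for frame, topic in zip(frames, topic_ids):
        by_topic[topic].append(frame)

    vector = []
    for topic in range(n_topics):
        rows = by_topic.get(topic)
        vector.extend(mean_vector(rows) if rows else [0.0] * n_features)
    return vector


# Hypothetical interview: six frames, two features each, three topics.
frames = [
    [1.0, 0.0], [3.0, 0.0],   # topic 0
    [0.0, 2.0], [0.0, 4.0],   # topic 1
    [5.0, 5.0], [7.0, 7.0],   # topic 2
]
topic_ids = [0, 0, 1, 1, 2, 2]

whole = interview_average(frames)               # 2 values for the interview
per_topic = topic_features(frames, topic_ids, 3)  # 2 values per topic
```

In this toy case the whole-interview average collapses everything into two numbers, while the topic-aware vector keeps a separate pair of values for each topic. A feature selection step, such as the one the article mentions, would then operate on that longer vector.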


Originally published by Nina Welding at conductorshare.nd.edu on November 08, 2017.