Automatic recognition of depression based on audio and video: A review

doi:10.5498/wjp.v14.i2.225

Journal: World Journal of Psychiatry; ISSN 2220-3206
Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Minireviews

World J Psychiatry. Feb 19, 2024; 14(2): 225-233
Published online Feb 19, 2024. doi: 10.5498/wjp.v14.i2.225

Meng-Meng Han, Xing-Yun Li, Xin-Yu Yi, Yun-Shao Zheng, Wei-Li Xia, Ya-Fei Liu, Qing-Xiang Wang

Meng-Meng Han, Wei-Li Xia, Ya-Fei Liu, Qing-Xiang Wang, Shandong Mental Health Center, Shandong University, Jinan 250014, Shandong Province, China

Meng-Meng Han, Xing-Yun Li, Xin-Yu Yi, Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong Province, China

Xing-Yun Li, Xin-Yu Yi, Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong Province, China

Xing-Yun Li, Xin-Yu Yi, Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan 250353, Shandong Province, China

Yun-Shao Zheng, Department of Ward Two, Shandong Mental Health Center, Shandong University, Jinan 250014, Shandong Province, China

Author contributions: Han MM, Li XY, Yi XY, Zheng YS, and Wang QX designed the research study; Xia WL and Liu YF conducted the literature retrieval; Han MM, Li XY, Yi XY, Zheng YS, and Wang QX summarized and analyzed the relevant literature; Zheng YS provided medical expertise; Han MM, Li XY, Yi XY, and Wang QX wrote and revised the manuscript; Wang QX reviewed the manuscript and approved its publication. All authors have read and approved the final manuscript.

Supported by Shandong Province Key R and D Program, No. 2021SFGC0504; Shandong Provincial Natural Science Foundation, No. ZR2021MF079; and Science and Technology Development Plan of Jinan (Clinical Medicine Science and Technology Innovation Plan), No. 202225054.

Conflict-of-interest statement: There is no conflict of interest associated with the senior author or any of the coauthors who contributed to this manuscript.

Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/

Corresponding author: Qing-Xiang Wang, PhD, Associate Professor, Shandong Mental Health Center, Shandong University, No. 49 Wenhua East Road, Jinan 250014, Shandong Province, China. wangqx@qlu.edu.cn

Received: November 25, 2023
Peer-review started: November 25, 2023
First decision: December 6, 2023
Revised: December 18, 2023
Accepted: January 24, 2024
Article in press: January 24, 2024
Published online: February 19, 2024

Abstract

Depression is a common mental health disorder. In current detection practice, specialized physicians typically assess depression through conversations and physiological examinations based on standardized scales, which serve as auxiliary measures. Non-biological markers, generally classified as verbal or non-verbal and regarded as crucial criteria for evaluating depression, have not been effectively utilized. Physicians usually require extensive training and experience to capture changes in these features. Advances in deep learning have provided the technical means to capture non-biological markers automatically. Several researchers have proposed automatic depression estimation (ADE) systems based on audio and video to help physicians capture these features and screen for depression. This article summarizes commonly used public datasets and recent research on audio- and video-based ADE from three perspectives: Datasets, deficiencies in existing research, and future development directions.

Keywords: Depression recognition, Deep learning, Automatic depression estimation system, Audio processing, Image processing, Feature fusion, Future development

Core Tip: The automatic recognition of depression based on deep learning has gradually become a research hotspot. Researchers have proposed automatic depression estimation (ADE) systems that use audio and video data to assist physicians in screening for depression. This article provides an overview of the latest research on ADE systems, focusing on audio and video datasets, current research challenges, and future directions.