Minireviews
Copyright ©The Author(s) 2024. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Psychiatry. Feb 19, 2024; 14(2): 225-233
Published online Feb 19, 2024. doi: 10.5498/wjp.v14.i2.225
Automatic recognition of depression based on audio and video: A review
Meng-Meng Han, Xing-Yun Li, Xin-Yu Yi, Yun-Shao Zheng, Wei-Li Xia, Ya-Fei Liu, Qing-Xiang Wang
Meng-Meng Han, Wei-Li Xia, Ya-Fei Liu, Qing-Xiang Wang, Shandong Mental Health Center, Shandong University, Jinan 250014, Shandong Province, China
Meng-Meng Han, Xing-Yun Li, Xin-Yu Yi, Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong Province, China
Xing-Yun Li, Xin-Yu Yi, Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong Province, China
Xing-Yun Li, Xin-Yu Yi, Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan 250353, Shandong Province, China
Yun-Shao Zheng, Department of Ward Two, Shandong Mental Health Center, Shandong University, Jinan 250014, Shandong Province, China
Author contributions: Han MM, Li XY, Yi XY, Zheng YS, and Wang QX designed the research study; Xia WL and Liu YF conducted literature retrieval; Han MM, Li XY, Yi XY, Zheng YS, and Wang QX summarized and analyzed relevant literature; Zheng YS provided medical knowledge; Han MM, Li XY, Yi XY, and Wang QX were responsible for writing and revising the manuscript; Wang QX reviewed the manuscript and approved its publication. All authors have read and approved the final manuscript.
Supported by Shandong Province Key R and D Program, No. 2021SFGC0504; Shandong Provincial Natural Science Foundation, No. ZR2021MF079; and Science and Technology Development Plan of Jinan (Clinical Medicine Science and Technology Innovation Plan), No. 202225054.
Conflict-of-interest statement: There is no conflict of interest associated with the senior author or any of the other coauthors who contributed their efforts to this manuscript.
Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Qing-Xiang Wang, PhD, Associate Professor, Shandong Mental Health Center, Shandong University, No. 49 Wenhua East Road, Jinan 250014, Shandong Province, China. wangqx@qlu.edu.cn
Received: November 25, 2023
Peer-review started: November 25, 2023
First decision: December 6, 2023
Revised: December 18, 2023
Accepted: January 24, 2024
Article in press: January 24, 2024
Published online: February 19, 2024
Abstract

Depression is a common mental health disorder. In current depression detection practice, specialized physicians often conduct conversations and physiological examinations based on standardized scales as auxiliary measures for assessing depression. Non-biological markers, which are typically classified as verbal or non-verbal and regarded as crucial evaluation criteria for depression, have not been effectively utilized. Specialized physicians usually require extensive training and experience to capture changes in these features. Advances in deep learning have provided technical support for capturing non-biological markers, and several researchers have proposed automatic depression estimation (ADE) systems based on audio and video to assist physicians in capturing these features and conducting depression screening. This article summarizes commonly used public datasets and recent research on audio- and video-based ADE from three perspectives: Datasets, deficiencies in existing research, and future development directions.

Keywords: Depression recognition, Deep learning, Automatic depression estimation system, Audio processing, Image processing, Feature fusion, Future development

Core Tip: Automatic recognition of depression based on deep learning has gradually become a research hotspot. Researchers have proposed automatic depression estimation (ADE) systems that use audio and video data to assist physicians in screening for depression. This article provides an overview of the latest research on ADE systems, focusing on audio and video datasets, current research challenges, and future directions.