Review Open Access
Copyright ©The Author(s) 2023. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Psychiatry. Jan 19, 2023; 13(1): 1-14
Published online Jan 19, 2023. doi: 10.5498/wjp.v13.i1.1
Emotion recognition support system: Where physicians and psychiatrists meet linguists and data engineers
Peyman Adibi, Simindokht Kalani, Sayed Jalal Zahabi, Homa Asadi, Mohsen Bakhtiar, Mohammad Reza Heidarpour, Hamidreza Roohafza, Hassan Shahoon, Mohammad Amouzadeh
Peyman Adibi, Hassan Shahoon, Isfahan Gastroenterology and Hepatology Research Center, Isfahan University of Medical Sciences, Isfahan 8174673461, Iran
Simindokht Kalani, Department of Psychology, University of Isfahan, Isfahan 8174673441, Iran
Sayed Jalal Zahabi, Mohammad Reza Heidarpour, Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 8415683111, Iran
Homa Asadi, Mohammad Amouzadeh, Department of Linguistics, University of Isfahan, Isfahan 8174673441, Iran
Mohsen Bakhtiar, Department of Linguistics, Ferdowsi University of Mashhad, Mashhad 9177948974, Iran
Hamidreza Roohafza, Department of Psychocardiology, Cardiac Rehabilitation Research Center, Cardiovascular Research Institute (WHO-Collaborating Center), Isfahan University of Medical Sciences, Isfahan 8187698191, Iran
Mohammad Amouzadeh, School of International Studies, Sun Yat-sen University, Zhuhai 519082, Guangdong Province, China
ORCID number: Peyman Adibi (0000-0001-6411-5235); Simindokht Kalani (0000-0002-9999-541X); Sayed Jalal Zahabi (0000-0001-5868-8192); Homa Asadi (0000-0003-1655-1336); Mohsen Bakhtiar (0000-0001-7012-6619); Mohammad Reza Heidarpour (0000-0002-2819-2556); Hamidreza Roohafza (0000-0003-3582-0431); Hassan Shahoon (0000-0003-1945-3520); Mohammad Amouzadeh (0000-0001-8964-7967).
Author contributions: Adibi P, Kalani S, Zahabi SJ, Asadi H, Bakhtiar M, Heidarpour MR, Roohafza H, Shahoon H, and Amouzadeh M all contributed to conceptualization, identifying relevant studies, and framing the results; Kalani S and Roohafza H wrote the psychology-related part of the paper; Zahabi SJ and Heidarpour MR wrote the data science-related part of the paper; Asadi H, Bakhtiar M, and Amouzadeh M wrote the phonetic-linguistic, cognitive-linguistic, and semantic-linguistic parts of the paper, respectively; Adibi P, Roohafza H, and Shahoon H supervised the study.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/licenses/by-nc/4.0/
Corresponding author: Sayed Jalal Zahabi, PhD, Assistant Professor, Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 8415683111, Iran.
Received: June 19, 2022
Peer-review started: June 19, 2022
First decision: September 4, 2022
Revised: September 18, 2022
Accepted: December 21, 2022
Article in press: December 21, 2022
Published online: January 19, 2023


An important factor in the course of daily medical diagnosis and treatment is the caregiver physician's understanding of the patient's emotional state. However, patients usually avoid voicing their emotions when describing their somatic symptoms and complaints to their non-psychiatrist doctor. Clinicians, in turn, usually lack the expertise (or time) to mine the various verbal and non-verbal emotional signals of their patients. As a result, in many cases there is an emotion recognition barrier between the clinician and the patient, making all patients seem alike except for their different somatic symptoms. In this review, we identify and combine the approaches of three major disciplines (psychology, linguistics, and data science) to detecting emotions from verbal communication, and propose an integrated solution for emotion recognition support. Such a platform may give the clinician emotional guides and indices based on verbal communication at consultation time.

Key Words: Physician-Patient relations, Emotions, Verbal behavior, Linguistics, Psychology, Data science

Core Tip: In the context of doctor-patient interactions, we focus on patient speech emotion recognition as a multifaceted problem viewed from three main perspectives: Psychology/psychiatry, linguistics, and data science. Reviewing the key elements and approaches within each of these perspectives, and surveying the current literature on them, we recognize the lack of a systematic comprehensive collaboration among the three disciplines. Thus, motivated by the necessity of such multidisciplinary collaboration, we propose an integrated platform for patient emotion recognition, as a collaborative framework towards clinical decision support.


In order to establish a therapeutic relationship between physician and patient, it is necessary to have knowledgeable practitioners in various specialties as well as effective interaction and communication between physician and patient, which starts with obtaining the patient's medical history and continues through conveying a treatment plan[1,2]. Doctor-patient communication is a complex interpersonal interaction that requires an understanding of each party's emotional state, and different types of expertise and techniques are needed to understand this relationship completely in its verbal and nonverbal forms, especially when trying to extract emotional states and determinants during a medical consultation session[3]. In this paper, our focus is on physicians' understanding of patients' emotions. When patients attend a medical consultation, they generally convey their particular experiences of the perceived symptoms to physicians. They interpret these somatic sensations in terms of many different factors, including their unique personal and contextual circumstances. Motivated by the illness experience, they generate their own ideas and concerns (emotions), leading them to seek out consultation[4-6]. Generally, patients expect and value their doctors caring for these personal aspects of their experience[7,8]. During interactions and conversations with patients, physicians should be able to interpret their emotional states, which can help build trust between patients and physicians[9,10]. This will ultimately lead to better clinical outcomes. Also, identifying and recording these states will help complete patients' medical records. Many diseases that seem to have purely physical symptoms are, in fact, largely intertwined with psychological variables, such as the functional somatic syndromes (FSS)[11].
Increasingly, physicians have realized that recognizing the psychological state of patients with FSS is very effective in providing appropriate treatment. For example, the ability to accurately understand a patient's psychological state may help interpret that patient's pain. Thus, the presence of information about patients' mental states in their medical records is essential.

Emotion detection accuracy, i.e., the ability to detect whether a patient is expressing an emotional cue, has consequences for the physician-patient relationship. The key to patient-centered care is the ability to detect, accurately identify, and respond appropriately to the patient's emotions[12-15]. Failure to detect a patient's emotional cues may give rise to an ineffective interaction between doctor and patient, which may, in turn, lead to misdiagnosis, lower recall, mistreatment, and poorer health outcomes[16,17]. Indeed, if the emotional cue is never detected, then the ability to accurately identify or respond to the emotion never comes into play. Doctors who are more aware of their patients' emotions are more successful in treating them[13], and patients have also reported greater satisfaction with such physicians[18-22]. Recognizing the emotions and feelings of patients provides the ground for greater physician empathy with patients[23,24], and the academic and medical literature highlights the positive effects of empathy on patient care[25]. In this regard, the medical profession requires doctors to be both clinically competent and empathetic toward patients. In practice, however, meeting both requirements may be difficult for physicians (especially inexperienced and unskilled ones)[26]. Moreover, patients do not always overtly express these experiences, feelings, concerns, and ideas. Rather, they often communicate them indirectly through more or less subtle nonverbal or verbal "clues", which nevertheless contain clinically relevant information and can be defined as "clinical or contextual clues"[27-29]. They do not say, "Hey doctor, I'm feeling really emotional right now; do you know whether I'm angry or sad?" Thus, emotional cues are often ambiguous and subtle[30-33].

On the other hand, patients' emotional audiences (i.e., physicians) are often inexperienced in detecting emotions. One of the most important problems physicians face in this process is the difficulty of capturing the clues that patients offer and a failure to encourage patients to expose details about these feelings[34]. Research indicates that over 70% of patients' emotional cues are missed by physicians[34]. It is unclear whether missed responses resulted from physicians detecting an emotional cue and choosing not to respond, or from failing to detect the cue in the first place. Indeed, these emotional cues present a challenge to doctors, who often overlook them; clinical information, and therefore opportunities to know the patient's world, are thus lost[34-37]. Physicians vary in their ability to recognize patients' emotions, with some being fully aware of the significance of understanding emotions and capable of identifying them; they also range from high to low emotional intelligence. Another argument often heard from physicians is that they do not have time for empathy[38].

Despite the importance of such issues, this aspect remains grossly overlooked in conventional medical training. This stems from the fact that the training of emotion skills in medical schools is variable, lacks a strong evidence base, and often does not include the training of emotion processing[39].

In the preceding paragraphs, four reasons were offered as to why physicians fail to detect and interpret patients' emotional states, and hence why a solution to this problem is needed. These reasons can be summarized as follows. First, detecting patients' emotions contributes to healing them, as well as to increasing their satisfaction. Second, emotional cues are mostly found indirectly in patients' speech; that is, emotional cues can be very subtle and ambiguous. Third, many physicians do not possess enough experience to detect patients' emotions, and even when they are skilled and experienced enough to do so, they do not have the time. Fourth, training doctors to detect patients' emotions has been thoroughly overlooked in routine medical training. Thus, a solution that helps physicians recognize patients' emotions and psychological states would overcome this problem to a large extent.

One strategy is to develop and employ a technology that can provide information about the patient's emotions, feelings, and mental states by processing their verbal and non-verbal indicators (Figure 1). In the present manuscript, we focus on verbal communication. Human speech carries a tremendous number of informative features, which enable listeners to extract a wealth of information about a speaker's identity. These features range from linguistic characteristics through extralinguistic features to paralinguistic information, such as the speaker's feelings, attitudes, or psychological states[40]. The psychological states (including emotions, feelings, and affect) embedded in people's speech are among the most important parts of the verbal communication array humans possess. As with other non-verbal cues, they are far less subject to conscious control than verbal content. This makes speech an excellent guide to a person's "true" emotional state, even when he/she is trying to hide it.

Figure 1
Figure 1 Emotion indicators in the patient-doctor interaction.

In order to design and present such technology, the first step is to know which indicators in speech can be used to identify emotions. Psychologists, psychiatrists, and linguists have done extensive research to identify people's emotions and feelings, and have identified a number of indicators. They believe that through these markers, people's emotions and feelings can be understood.


Psychologists and psychiatrists pay attention to content indicators and acoustic variables to identify people's emotions through their speech. Scholarly evidence suggests that mental health is associated with specific patterns of word use[41-43]. Psychologists and psychiatrists usually consider three types of word usage to identify emotions: (1) Positive and negative emotion words; (2) standard function word categories; and (3) content categories. That is, they distinguish between positive ("happy", "laugh") and negative ("sad", "angry") emotion words, standard function word categories (e.g., self-references and first, second, and third person pronouns), and various content categories (e.g., religion, death, and occupation). The frequent use of "you" and "I" suggests a different relationship between the speaker and the addressee than that of "we": the former suggests a more detached approach, whereas the latter expresses a feeling of solidarity. Multiple studies have indicated that frequent use of the first-person singular is associated with negative affective states[44-48], revealing a high degree of self-preoccupation[49]. People with negative emotional states (such as sadness or depression) use second and third person pronouns less often[38-40]. Such speakers also have a lower ability to express positive emotions and express more negative emotions in their speech[44-48]. Also, people with negative emotional states use more words referring to death[44].
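As a rough sketch of how such category counts might be computed from a transcript (the word lists below are toy examples, not a validated lexicon such as LIWC):

```python
import re

# Toy word lists; a real system would use a validated lexicon such as LIWC
POSITIVE = {"happy", "laugh", "joy"}
NEGATIVE = {"sad", "angry", "afraid"}
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}

def word_category_counts(text: str) -> dict:
    """Count category hits in a transcript (case-insensitive, word tokens only)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "positive": sum(t in POSITIVE for t in tokens),
        "negative": sum(t in NEGATIVE for t in tokens),
        "first_person_singular": sum(t in FIRST_PERSON_SINGULAR for t in tokens),
        "total": len(tokens),
    }

counts = word_category_counts("I am so sad, and I feel angry about my pain.")
```

On this example utterance, the negative emotion words ("sad", "angry") and the repeated first-person singular forms ("I", "my") are exactly the kinds of markers the studies cited above associate with negative affective states.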

In addition to the content of speech, psychologists and psychiatrists also look at several acoustic variables (such as pitch variety, pause time, speaking rate, and emphasis) to detect emotions. According to the research in this area, people with negative emotional states typically have a slower speaking rate[50-54], lower pitch variety[55,56], produce fewer words[57], and have longer pauses[53,54,58].
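For illustration, one of these acoustic variables, speaking rate, could be derived from a time-aligned transcript as simply as follows (the record structure and field names here are hypothetical):

```python
# Hypothetical time-aligned utterance record (field names invented for illustration)
utterance = {"text": "I have been feeling very tired lately", "start": 0.0, "end": 3.5}

words = utterance["text"].split()
duration = utterance["end"] - utterance["start"]
speaking_rate = len(words) / duration   # words per second; here 7 words / 3.5 s
```

A clinically deployed system would of course average such measurements over many utterances and compare them against speaker-specific baselines rather than absolute thresholds.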


Within linguistics, various approaches (e.g., phonetic, semantic, discourse-pragmatic, and cognitive) have been adopted to examine the relationship between language and emotion[56,59,60]. As far as phonetic and acoustic studies are concerned, emotions expressed through speech are typically accompanied by physiological changes such as muscle activity, blood circulation, heart rate, skin conductivity, and respiration. These changes affect the kinematic properties of the articulators, which in turn alter the acoustic characteristics of the produced speech signal. Studies of the effects of emotion on the acoustic characteristics of speech have revealed that parameters related to the frequency domain (e.g., average values and ranges of the fundamental frequency and formant frequencies), the intensity domain of speech (e.g., energy and amplitude), temporal characteristics of speech (e.g., duration and syllable rate), spectral features (e.g., Mel-frequency cepstral coefficients), and voice quality features (e.g., jitter, shimmer, and harmonics-to-noise ratio) are amongst the most important acoustically measurable correlates of emotion in speech. For instance, previous studies have reported that the mean and range of the fundamental frequency for utterances spoken in anger were considerably greater than for neutral utterances, while the average fundamental frequency for fear was lower than that observed for anger[61] (Figure 2 and Table 1).

Figure 2
Figure 2 Spectrograms of the Persian word (sahar) pronounced by a Persian female speaker in neutral (top) and anger (bottom) situations. Figure 2 shows spectrograms of the word (sahar), spoken by a native female speaker of Persian. The figure illustrates several important differences between the acoustic representations of the produced speech sounds. For example, the mean fundamental frequency is higher in the anger situation (225 Hz) than in the neutral one (200 Hz). Additionally, the minimum and maximum of the fundamental frequency and the mean intensity are lower in the neutral situation, and the mean formant frequencies (e.g., F1, F2, F3, and F4) also differ between the two situations. More details are provided in Table 1.
Table 1 Acoustic differences related to prosody and spectral features of the word (sahar) produced by a Persian female speaker in neutral and anger situations.

Feature | Neutral | Anger
Prosody features
Mean fundamental frequency (F0) | 200 Hz | 225 Hz
Minimum of the fundamental frequency | 194 Hz | 223 Hz
Maximum of the fundamental frequency | 213 Hz | 238 Hz
Mean intensity | 60 dB | 78 dB
Spectral features
First formant frequency (F1) | 853 Hz | 686 Hz
Second formant frequency (F2) | 2055 Hz | 1660 Hz
Third formant frequency (F3) | 3148 Hz | 2847 Hz
Fourth formant frequency (F4) | 4245 Hz | 3678 Hz
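As an illustration of how a mean F0 value such as those in Table 1 can be measured, the following is a minimal autocorrelation-based pitch estimator applied to one synthetic frame (dedicated tools such as Praat use more robust algorithms with voicing detection and octave-error correction):

```python
import numpy as np

def estimate_f0(frame: np.ndarray, sr: int, fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Estimate the fundamental frequency of one frame via autocorrelation:
    the lag of the strongest autocorrelation peak in the plausible pitch range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)      # lag search range
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

sr = 16000
t = np.arange(int(0.04 * sr)) / sr               # one 40 ms frame
frame = np.sin(2 * np.pi * 225 * t)              # synthetic tone at the "anger" F0 of Table 1
f0 = estimate_f0(frame, sr)
```

On real speech, such frame-level estimates would be computed over the voiced portions of the utterance and then averaged to obtain the mean F0 reported in the table.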

Past research has produced many important findings indicating that emotions can be distinguished by acoustic patterns; however, a multitude of challenges remain in emotional speech research. One of the major obstacles in emotion recognition is the vocal variability that exists within speakers. Voices are often more variable within the same speaker (within-speaker variability) than between different speakers, and it is thus unclear how human listeners recognize individual speakers' emotions from their speech despite the tremendous variability that individual voices reveal. Emotional expression varies considerably within a single speaker and is strongly affected by factors such as gender, speaking style, sentence structure in spoken language, culture, and environment. Thus, identifying the specific mechanisms underlying variability in the acoustic properties of emotional speech, and how differences arising from individual properties can be overcome, remain major challenges for the emotion recognition field.

With regard to investigations in the area of pragmatics (in its continental sense, which encompasses discourse analysis, sociolinguistics, cognitive linguistics, and even semantics), we observe a flourishing trend in linguistics focusing on emotion in language[59,62]. These studies have examined important issues related to referential and non-referential meanings of emotion. In semantics, the focus has been on defining emotional and sentimental words and expressions, collocations and frames of emotion[63,64], field semantics[62], as well as lexical relations including semantic extensions. More pragmatic and discourse-oriented studies, however, have looked at issues such as emotion and cultural identity[65,66]; information structure/packaging (e.g., topicalization and thematicization) and emotion[67]; emotive particles and interjections[68-70]; emotional implicatures and emotional illocutionary acts; deixis and indexicality (e.g., proximalization and distalization)[71,72]; and conversational analysis and emotion (e.g., turn-taking and interruption)[73,74].

Cognitive linguists use other methods to recognize emotion in speech. The cognitive linguistic approach to emotion concepts is based on the assumption that conventionalized language used to talk about emotions is a significant tool in discovering the structure and content of emotion concepts[75]. They consider a degree of universality for emotional experience and hold that this partial universality arises from basic image schemas that emerge from fundamental bodily experiences[76-79]. In this regard, the cultural model of emotions is a joint product of (possibly universal) actual human physiology, metonymic conceptualization of actual human physiology, metaphor, and cultural context[77]. In this approach, metaphor and metonymy are used as conceptual tools to describe the content and structure of emotion concepts.

Conceptual metaphors create correspondences between two distinct domains. One of the domains is typically more physical or concrete than the other (which is thus more abstract)[76]. For example, in the Persian expression gham dar delam âshiyâneh kardeh ‘sadness has nested in my heart’, gham ‘sadness’ is metaphorically conceptualized as a bird and del ‘heart/stomach’ is conceived of as a nest. The metaphor focuses on the perpetuation of sadness. The benefit of metaphors in the study of emotions is that they can highlight and address various aspects of emotion concepts[75,76]. Metonymy involves a single domain, or concept. Its purpose is to provide mental access to a domain through a part of the same domain (or vice versa) or to a part of a domain through another part in the same domain[80]. Metonymies can express physiological and behavioral aspects of emotions[75]. For example, in she was scarlet with rage, the physiological response associated with anger, i.e., redness in face and neck area, metonymically stands for anger. Thus, cognitive linguistics can contribute to the identification of metaphorical and metonymical conceptualizations of emotions in large corpora.

Although speech provides substantial information about the emotional states of speakers, accurate detection of emotions may nevertheless not always be feasible due to challenges that pervade communicative events involving emotions. Variations at semantic, pragmatic, and social-cultural levels present challenges that may hinder accurately identifying emotions via linguistic cues. At the semantic level, one limitation seems to be imposed by the “indeterminacy of meaning”, a universal property of meaning construction which refers to “situations in which a linguistic unit is underspecified due to its vagueness in meaning”[81]. For example, Persian expressions such as ye juriam or ye hâliam roughly meaning ‘I feel strange or unknown’ even in context may not explicitly denote the emotion(s) the speaker intends to convey, and hence underspecify the conceptualizations that are linguistically coded. The other limitation at the semantic level pertains to cross-individual variations in the linguistic categorization of emotions. Individuals differ as to how they linguistically label their emotional experiences. For example, the expression tu delam qoqâst ‘there is turmoil in my heart’ might refer to ‘extreme sadness’ for one person but might suggest an ‘extreme sense of confusion’ for another. Individuals also reveal varying degrees of competence in expressing emotions. This latter challenge concerns the use of emotion words, where social categories such as age, gender, ethnic background, education, social class, and profession could influence the ease and skill with which speakers speak of their emotions. Since emotions perform different social functions in different social groups[82], their use is expected to vary across social groups.

Language differences are yet another source of variation in the use and expression of emotions, which presents further challenges to the linguistic identification of emotions. Each language has its own specific words, syntactic structures, and modes of expression to encode emotions. Further, emotions are linked with cultural models and reflect cultural norms as well as values[83]. Thus, emotion words cannot be taken as culture-free analytical tools or as universal categories for describing emotions[84]. Patterns of communication vary across and within cultures. The link between communication and culture is provided by a set of shared interpretations which reflect beliefs, norms, values, and social practices of a relatively large group of people[85]. Cultural diversity may pose challenges to doctors and health care practitioners in the course of communicating with patients and detecting their emotions. In a health care setting, self-disclosure is seen as an important (culturally sensitive) characteristic that differentiates patients according to their degree of willingness to tell the doctor/practitioner what they feel, believe, or think[86]. Given the significance of self-disclosure and explicitness in the verbal expression of feelings in health care settings[86], it could be predicted that patients coming from social groups with more indirect, more implicit, and emotionally self-restrained styles of communication will probably pose challenges to doctors in getting them to speak about their feelings in a detailed and accurate manner. In some ethnic groups, self-disclosure and intimate revelations of personal and social problems to strangers (people outside one's family or social group) may be unacceptable or taboo due to face considerations. Thus, patients belonging to these ethnic groups may adopt avoidance strategies in their communication with the doctor and hide or understate intense feelings.
People may also refrain from talking about certain diseases or use circumlocutions due to the taboo or negative overtones associated with them. Further, self-restraint may be regarded as a moral virtue in some social groups, which could set a further obstacle in self-disclosing to the doctor or healthcare practitioner.

Overall, it is seen that these linguistically-oriented studies reveal important aspects of emotion in language use. In particular, they have shown how emotion is expressed and constructed by speakers in discourse. Such studies, however, are not based on multi-modal research to represent a comprehensive and unified description of emotion in language use. This means that, for a more rigorous and fine-grained investigation, we need an integrative and cross-disciplinary approach to examining emotions in language use.


From the data science perspective, speech emotion recognition (SER) is a machine learning (ML) problem whose goal is to classify speech utterances based on their underlying emotions. This can be viewed from two perspectives: (1) Utterances as sounds with acoustic and spectral features (non-verbal); and (2) utterances as words with specific semantic properties (verbal)[87-91]. While in the literature SER typically refers to the former perspective, the latter is also important and provides a rich source of information, which can be harvested in favor of emotion recognition via natural language processing (NLP). Recent advances in NLP technology allow for fast analysis of text. In particular, word vector representations (also known as word embeddings) are used to embed words in a high-dimensional space where words maintain semantic relationships with each other[92]. These vector representations, which are obtained through different ML algorithms, commonly capture the semantic relations between words by looking into their collocation/co-occurrence in large corpora. In this way, the representation of each word, and the machine's understanding of it, partially reflect the essential knowledge that relates to that word, thus capturing the so-called frame semantics. The problem of SER can thus also be tackled by analyzing the transcript of the speech and running various downstream tasks on the word vectors of the given speech.
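The co-occurrence idea behind word embeddings can be sketched on a toy scale as follows (a minimal illustration; production systems train dense embeddings such as word2vec or GloVe on large corpora rather than raw counts on three sentences):

```python
import numpy as np

# Tiny toy corpus; real embeddings are trained on large corpora
corpus = [
    "i feel sad and hopeless".split(),
    "i feel unhappy and hopeless".split(),
    "the weather is sunny today".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
M = np.zeros((len(vocab), len(vocab)))

# Co-occurrence counts within a +/-2 word window
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if j != i:
                M[idx[w], idx[sent[j]]] += 1

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words that share contexts ("sad"/"unhappy") receive similar vectors
sim_sad_unhappy = cosine(M[idx["sad"]], M[idx["unhappy"]])
sim_sad_sunny = cosine(M[idx["sad"]], M[idx["sunny"]])
```

Even at this scale, "sad" and "unhappy" end up with nearly identical vectors because they appear in the same contexts, while "sad" and "sunny" share none, which is precisely the distributional signal that emotion-related downstream tasks exploit.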

As for the former perspective, different classifiers have so far been suggested for SER as candidates for a practically feasible automatic emotion recognition (AER) system. These classifiers can be put broadly into two main categories: Linear classifiers and non-linear classifiers. The main classification techniques/models within these two categories are: (1) Hidden Markov model[93-96]; (2) Gaussian mixture model[97,98]; (3) K-Nearest neighbor[99]; (4) Support vector machine[100,101]; (5) Artificial neural network[94,102]; (6) Bayes classifier[94]; (7) Linear discriminant analysis[103,104]; and (8) Deep neural network[102-107].
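To make the classification step concrete, here is a minimal sketch of one of the listed techniques, a k-nearest-neighbor classifier, applied to synthetic two-dimensional feature vectors (mean F0 and mean energy; the class distributions are invented for illustration, not taken from any cited study):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: [mean F0 (Hz), mean energy (dB)] per utterance.
# Toy assumption: "anger" tends toward higher F0 and energy than "neutral".
neutral = rng.normal([200, 60], [10, 3], size=(50, 2))
anger = rng.normal([240, 75], [10, 3], size=(50, 2))
X = np.vstack([neutral, anger])
y = np.array([0] * 50 + [1] * 50)          # 0 = neutral, 1 = anger

def knn_predict(x: np.ndarray, X: np.ndarray, y: np.ndarray, k: int = 5) -> int:
    """Classify feature vector x by majority vote of its k nearest neighbors."""
    d = np.linalg.norm(X - x, axis=1)
    nearest = y[np.argsort(d)[:k]]
    return int(np.bincount(nearest).argmax())

label = knn_predict(np.array([235, 74]), X, y)
```

Real SER systems operate in far higher-dimensional feature spaces and typically favor the more expressive models in the list above, but the train-then-vote structure is the same.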

A review of the most relevant works employing the above techniques has recently been provided in[108]. We give a short description of these techniques in the Appendix. One of the main approaches in the last category, i.e., deep neural networks, is to employ transfer learning. A recent survey[109] reviews the application of generalizable transfer learning to AER in the existing literature; in particular, it provides an overview of previously proposed transfer learning methods for speech-based emotion recognition, listing 21 relevant studies.

The classifiers developed for SER may also be categorized in terms of their feature sets. Specifically, there are three main categories of speech features for SER: (1) The prosodic features[110-114]; (2) The excitation source features[110,111,115,116]; and (3) The spectral or vocal tract features[117-120].

Prosodic features, also known as continuous features, are attributes of the speech sound such as pitch (fundamental frequency) and energy. These features can be grouped into the following subcategories[104,105]: (1) Pitch-related features; (2) formant features; (3) energy-related features; (4) timing features; and (5) articulation features. Excitation source features, also referred to as voice quality features, are used to represent glottal activity, such as harshness, breathiness, and tenseness of the speech signal.
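A minimal sketch of how two such energy- and timing-related prosodic measurements might be computed from a raw waveform (synthetic audio here; real pipelines use overlapping windows and adaptive silence thresholds):

```python
import numpy as np

def short_time_energy(signal: np.ndarray, sr: int, frame_ms: float = 25.0) -> np.ndarray:
    """Mean squared amplitude per non-overlapping frame."""
    n = int(sr * frame_ms / 1000)
    n_frames = len(signal) // n
    frames = signal[: n_frames * n].reshape(n_frames, n)
    return (frames ** 2).mean(axis=1)

sr = 16000
t = np.arange(sr) / sr                              # 1 s of synthetic audio
speech = np.sin(2 * np.pi * 200 * t)
speech[sr // 2 :] = 0.0                             # silent second half = "pause"

energy = short_time_energy(speech, sr)
threshold = 0.1 * energy.max()
pause_ratio = float((energy < threshold).mean())    # fraction of low-energy frames
```

Here the pause ratio comes out at one half, matching the constructed signal; on real recordings, such timing features feed directly into the pause-time and speaking-rate indicators discussed in the psychological section.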

Finally, spectral features, also known as segmental or system features, are characteristics of the various sound components generated from the different cavities of the vocal tract system, extracted in different forms. Particular examples are ordinary linear predictor coefficients[117], one-sided autocorrelation linear predictor coefficients[113], the short-time coherence method[114], and least squares modified Yule-Walker equations[115].
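As an illustration of the linear-prediction family mentioned above, the following is a minimal Levinson-Durbin computation of ordinary linear predictor coefficients, checked against a synthetic second-order autoregressive signal (the generating coefficients are invented for the check):

```python
import numpy as np

def lpc(signal: np.ndarray, order: int) -> np.ndarray:
    """Ordinary linear predictor coefficients via the Levinson-Durbin recursion."""
    n = len(signal)
    r = np.correlate(signal, signal, mode="full")[n - 1 : n + order]  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                      # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a

# Synthetic AR(2) signal with known coefficients (chosen for the check):
# x[n] = 0.5 x[n-1] - 0.25 x[n-2] + e[n]  ->  true a = [1, -0.5, 0.25]
rng = np.random.default_rng(1)
e = rng.standard_normal(5000)
x = np.zeros(5000)
for n in range(2, 5000):
    x[n] = 0.5 * x[n - 1] - 0.25 * x[n - 2] + e[n]

a_hat = lpc(x, 2)   # approximately [1, -0.5, 0.25]
```

In speech analysis, the same recursion is applied frame-by-frame with a much higher order (typically 10-16 at 8-16 kHz) so that the predictor captures the vocal tract resonances (formants).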

Table 2 summarizes the three approaches discussed above to recognizing emotional indicators in speech.

Table 2 Different approaches to recognizing the emotional indicators in speech.
Approach | Emotional indicators
Psychological | (1) Positive and negative emotion words; (2) standard function word categories; (3) content categories; (4) patterns of pronoun usage; and (5) acoustic variables (such as pitch variety, pause time, speaking rate, and emphasis)
Linguistic | (1) Phonetic: spectral and temporal analysis; (2) semantic and discourse-pragmatic: words, semantic fields, cultural identity, emotional implicatures, illocutionary acts, deixis and indexicality; and (3) cognitive: metaphor and metonymy
Data science | (1) SER: sounds with acoustic and spectral features; and (2) NLP: words with specific semantic properties (word embeddings)

Given the breadth and complexity of emotion detection indicators in psychology and linguistics, it is difficult to establish a decision support system for a doctor's emotional perception of patients; a comprehensive, multidisciplinary approach is required. In building such a system, a software application will be very useful. When a person experiences intense emotion, in addition to a reduction in concentration, his/her mental balance is also disturbed more easily and quickly. This is even exploited as a strategy in sociology to take hold of people's minds.

Under such unstable conditions, reasoning and logical thinking (and thus more effective and active behavior), which emerge from the activity of newer and higher parts of the brain, are dominated by older parts of the brain with far longer biological precedents (several thousand vs millions of years). These older parts act impulsively or reactively.

Working in an emergency environment, and sometimes even in an office, involves special conditions: excessive stress due to medical emergencies, pressure from patient companions, the patient's own severe fear, and the impact of "transference" and "countertransference" between physician and patient or between physician and patient companion. These conditions can impair a physician's ability to reason and think logically. Thus, such an intelligent system can enhance doctors' efficiency, increase their awareness, and make it easier for them to manage these conditions.


In the previous sections, the problem of SER was viewed from its three main perspectives: Psychology/psychiatry, linguistics, and data science, and the key elements within each perspective were highlighted. One way to integrate these three sides and benefit from their potential contributions to SER is through developing an intelligent platform. In what follows, focusing on SER in the context of doctor-patient interactions, we propose a solution for such integration.

The proposed solution consists of two key components: (1) The intelligent processing engine; and (2) The data-gathering platform.

The intelligent processing engine, at the algorithmic level, is based on NLP, speech processing, and, in a wider context, behavioral signal processing methods. While the processing engine will clearly serve as the brain of the proposed intelligent platform, and is indeed where the novelty, creativity, and robustness of the implemented algorithms can make a great difference, it will not function desirably in practice without a well-thought-out, flexible data-gathering platform. Thus, notwithstanding the novel algorithms to be developed at the core of the platform, and the undeniable impact they will have on the performance of the system, we believe it is the data-gathering platform that will make the solution unique. One idea is to develop a cloud-based, multi-mode, multi-sided data-gathering platform with three main sides: (1) The patient side; (2) The physician side; and (3) The linguist/psychologist side.

Regarding the functioning of the platform, three modes can be considered: (1) The pre-visit mode; (2) The on-visit mode; and (3) The post-visit mode.

The pre-visit mode will include the patient's declaration of his/her health-related complaints/conditions and concerns, which will be automatically directed to the cloud-based processing engine and labeled via a SER algorithm. This mode is reinforced by receiving additional multi-dimensional data from the patient through various forms and questionnaires. The patient may also submit text to accompany his/her speech, which allows additional classification/clustering tasks, such as sentiment analysis or patient segmentation, to be performed on the provided text using biomedical NLP methods. The on-visit mode enables the recording of the visiting session and the clinician-patient conversations. Finally, the post-visit mode of the application provides an interface for the psychiatrist/psychologist as well as the linguist to extract and label the psychological and linguistic features within the patient's speech. Such tagging of the data by a team of specialists will, in the long term, lead to a rich repository of patient speech, which is of great value in training the ML algorithms in the processing engine. The proposed platform, which we have named INDICES, is depicted in Figure 3.
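The three modes above suggest a simple data model for such a platform. The sketch below is purely illustrative (all class and field names are our own hypothetical choices, not part of a released system); it shows how one speech record could carry the automatic SER label from the pre-visit mode alongside the specialists' post-visit annotations:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Mode(Enum):
    """The three operating modes of the proposed platform."""
    PRE_VISIT = "pre-visit"
    ON_VISIT = "on-visit"
    POST_VISIT = "post-visit"

@dataclass
class SpeechRecord:
    """One unit of patient data flowing through the platform."""
    patient_id: str
    mode: Mode
    audio_ref: str                      # pointer to audio in cloud storage
    transcript: Optional[str] = None
    ser_label: Optional[str] = None     # automatic SER output (pre-visit)
    expert_tags: dict = field(default_factory=dict)  # post-visit annotations

def annotate(record: SpeechRecord, specialist: str, tag: str) -> SpeechRecord:
    """Post-visit mode: a psychologist/linguist attaches a feature label."""
    record.expert_tags.setdefault(specialist, []).append(tag)
    return record
```

Accumulating such expert-tagged records is what would eventually form the labeled repository for training the ML algorithms in the processing engine.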

Figure 3
Figure 3 Integrated platform for patient emotion recognition and decision support. It consists of the data-gathering platform and the intelligent processing engines. Each patient's data, in the form of voice/transcripts, is captured, labeled, and stored in the dataset. The resulting dataset feeds the machine learning training/validation and test engines. The entire process of intelligent processing may iterate several times for further fine-tuning. Collaboration among the three relevant areas of expertise is crucial in different parts of the proposed solution.

Although the proposed platform is to be designed such that it scales up at the population level in order to benefit from the diversity of the gathered data, it will also serve every individual as a customized, personalized electronic health record that keeps track of the patient's psycho-emotional profile. As for implementation, the platform can practically be tailored to various devices (cell phones, tablets, PCs, and laptops) via Android/macOS and web-service applications.

Note that emotion is essentially a multifaceted concept, and no matter how sophisticated the proposed signal processing and data mining technology is, it will eventually face limitations in grasping all of its aspects. For instance, cultural aspects of expressing emotions can pose a serious challenge to the technological system: Extracting appropriate measurable features for correctly interpreting the cultural indices of emotion in speech is difficult, which nonetheless adds to the beauty of the problem. Further, as mentioned earlier, not all emotional indicators are embedded in speech; facial expressions and body gestures also play important roles in expressing one's emotions. Hence, since the technology considered in our proposed method focuses merely on speech signals, it will have blind spots, such as the visual aspects of emotion, which are not exploited. This is a main limitation that bounds the performance of the proposed emotion recognition system. However, following the same pattern by which technology has always evolved throughout history, the proposed method can serve as a baseline to which further improvements and additional capabilities can be added in the future. We must also note that in capturing the different aspects of emotion, we face a tradeoff between computational complexity and performance. In particular, depending on the required accuracy of the system, one may need to customize which aspects of emotion are examined via technology, taking into account the computational burden they would impose on the system.

We shall end this section with two remarks. First, despite all the integration and optimization involved in the design and training of the proposed intelligent platform, it would still have the intrinsic limitations of a machine as a decision-maker, some of which were mentioned above; thus, the proposed solution would ultimately serve as a decision aid/support, not a decision replacement. Second, while the proposed solution provides a global framework, it calls for a series of methodologies and solutions that must be adapted and customized to each language and cultural setting for local use.


We provide Table 3, which includes a brief description of each of the data science techniques and models mentioned earlier, along with reference sources in which further technical details of the methods can be found.

Table 3 A brief description of some data science models/methods.
Model/method | Short description
HMM | A HMM is a statistical model that can be used to describe the evolution of observable events that depend on internal factors which are not directly observable. The observed event is called a 'symbol' and the invisible factor underlying the observation is called a 'state'. A HMM consists of two stochastic processes, namely, an invisible process of hidden states and a visible process of observable symbols. The hidden states form a Markov chain, and the probability distribution of the observed symbol depends on the underlying state. Via this model, the observations are modeled in two layers: one visible and the other invisible. Thus, it is useful in classification problems where raw observations are to be put into a number of categories that are more meaningful to us (Supplementary Figure 1)[121,122]
Gaussian mixture model | A Gaussian mixture model is a probabilistic model that assumes all data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters (Supplementary Figure 2)[123]
KNN | KNN is a type of supervised learning algorithm used for classification. KNN tries to predict the correct class for the test data by calculating the distance between the test data and all training points. The algorithm then selects the K points closest to the test data and calculates the probability of the test data belonging to each class of the 'K' training points; the class with the highest probability (by majority voting) is selected (Supplementary Figure 3)[123]
SVM | The SVM is an algorithm that finds a hyperplane in an N-dimensional space (N: the number of features) that distinctly classifies the data points such that the plane has the maximum margin, i.e., the maximum distance between data points of the two classes. Maximizing this margin allows future test points to be classified more accurately. Support vectors are the data points closest to the hyperplane, and they influence its position and orientation (Supplementary Figure 4)[123]
Artificial neural network | An artificial neural network is a network of interconnected artificial neurons. An artificial neuron, inspired by the biological neuron, is modeled with inputs that are multiplied by weights and then passed to a mathematical function that determines the activation of the neuron. The neurons in a neural network are grouped into layers, of three main types: an input layer, one or more hidden layers, and an output layer. Depending on the architecture of the network, outputs of some neurons are carried, with certain weights, as inputs to other neurons. By passing an input through these layers, the neural network finally outputs a value (discrete or continuous) that can be used to perform various classification/regression tasks. In this context, the neural network first has to learn the set of weights from the patterns within the so-called training dataset, a sufficiently large set of input data labeled with the corresponding correct (expected) outputs (Supplementary Figure 5)[124]
Bayes classifier | The Bayes classifier, which is based on Bayes' theorem in probability, models the probabilistic relationships between the feature set and the class variable. Based on the modeled relationships, it estimates the class-membership probability of an unseen example in such a way that it minimizes the probability of misclassification[123]
Linear discriminant analysis | Linear discriminant analysis is a method used in statistical machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting linear combination can be used as a linear classifier, or as a means of dimensionality reduction prior to the actual classification task[124]
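As one worked example from Table 3, the KNN procedure (compute the distance from the test point to all training points, keep the K nearest, and take a majority vote) is short enough to sketch directly. The toy feature vectors and emotion labels below are invented for illustration:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """KNN as described in Table 3: distances from the test point to all
    training points, keep the k nearest, and pick the class with the
    most votes (majority voting)."""
    d = np.linalg.norm(np.asarray(X_train, dtype=float)
                       - np.asarray(x_test, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]                     # indices of k closest
    votes = Counter(np.asarray(y_train)[nearest].tolist())
    return votes.most_common(1)[0][0]
```

Choosing an odd k avoids ties in two-class problems; in practice k is tuned on a validation set.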

In the context of doctor-patient interactions, this article focused on patient SER as a multidimensional problem viewed from three main aspects: Psychology/psychiatry, linguistics, and data science. We reviewed the key elements and approaches within each of these three perspectives, and surveyed the relevant literature on them. In particular, from the psychological/psychiatric perspective, the emotion indicators in the patient-doctor interaction were highlighted and discussed. In the linguistic approach, the relationship between language and emotion was discussed from phonetic, semantic, discourse-pragmatic, and cognitive perspectives. Finally, in the data science approach, SER was discussed as a ML/signal processing problem. The lack of a systematic, comprehensive collaboration among the three discussed disciplines was pointed out. Motivated by the necessity of such multidisciplinary collaboration, we proposed a platform named INDICES: An integrated platform for patient emotion recognition and decision support. The proposed solution can serve as a collaborative framework towards clinical decision support.


Provenance and peer review: Invited article; Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Psychiatry

Country/Territory of origin: Iran

Peer-review report’s scientific quality classification

Grade A (Excellent): A

Grade B (Very good): B

Grade C (Good): 0

Grade D (Fair): 0

Grade E (Poor): 0

P-Reviewer: Panduro A, Mexico; Stoyanov D, Bulgaria

S-Editor: Liu XF

L-Editor: A

P-Editor: Liu XF

1.  Riedl D, Schüßler G. The Influence of Doctor-Patient Communication on Health Outcomes: A Systematic Review. Z Psychosom Med Psychother. 2017;63:131-150.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 42]  [Cited by in F6Publishing: 62]  [Article Influence: 7.0]  [Reference Citation Analysis (0)]
2.  Begum T. Doctor patient communication: A review. J Bangladesh Coll Phys Surg. 2014;32:84-88.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 6]  [Cited by in F6Publishing: 6]  [Article Influence: 0.8]  [Reference Citation Analysis (0)]
3.  Kee JWY, Khoo HS, Lim I, Koh MYH. Communication Skills in Patient-Doctor Interactions: Learning from Patient Complaints. Heal Prof Educ. 2018;4:97-106.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 75]  [Cited by in F6Publishing: 78]  [Article Influence: 15.0]  [Reference Citation Analysis (0)]
4.  Helman CG. Communication in primary care: The role of patient and practitioner explanatory models. Soc Sci Med. 1985;20:923-931.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 93]  [Cited by in F6Publishing: 95]  [Article Influence: 2.4]  [Reference Citation Analysis (0)]
5.  Kleinmann A  The illness narratives. USA: Basic Books, 1988.  [PubMed]  [DOI]  [Cited in This Article: ]
6.  McWhinney IR. Beyond diagnosis: An approach to the integration of behavioral science and clinical medicine. N Engl J Med. 1972;287:384-387.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 73]  [Cited by in F6Publishing: 76]  [Article Influence: 1.4]  [Reference Citation Analysis (0)]
7.  Colliver JA, Willis MS, Robbs RS, Cohen DS, Swartz MH. Assessment of Empathy in a Standardized-Patient Examination. Teach Learn Med. 1998;10:8-11.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 64]  [Cited by in F6Publishing: 64]  [Article Influence: 2.6]  [Reference Citation Analysis (0)]
8.  Mercer SW, Maxwell M, Heaney D, Watt GC. The consultation and relational empathy (CARE) measure: Development and preliminary validation and reliability of an empathy-based consultation process measure. Fam Pract. 2004;21:699-705.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 434]  [Cited by in F6Publishing: 471]  [Article Influence: 22.8]  [Reference Citation Analysis (0)]
9.  Kadadi S, Bharamanaiker S. Role of emotional intelligence in healthcare industry. Drishtikon Manag J. 2020;11:37.  [PubMed]  [DOI]  [Cited in This Article: ]
10.  Weng HC. Does the physician's emotional intelligence matter? Health Care Manage Rev. 2008;33:280-288.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 61]  [Cited by in F6Publishing: 65]  [Article Influence: 4.1]  [Reference Citation Analysis (0)]
11.  Barsky AJ, Borus JF. Functional somatic syndromes. Ann Intern Med. 1999;130:910-921.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 612]  [Cited by in F6Publishing: 634]  [Article Influence: 25.5]  [Reference Citation Analysis (0)]
12.  Beach MC, Inui T; Relationship-Centered Care Research Network. Relationship-centered care. A constructive reframing. J Gen Intern Med. 2006;21 Suppl 1:S3-S8.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 358]  [Cited by in F6Publishing: 407]  [Article Influence: 21.1]  [Reference Citation Analysis (0)]
13.  Blue AV, Chessman AW, Gilbert GE, Mainous AG 3rd. Responding to patients' emotions: Important for standardized patient satisfaction. Fam Med. 2000;32:326-330.  [PubMed]  [DOI]  [Cited in This Article: ]
14.  Finset A. "I am worried, Doctor! Patient Educ Couns. 2012;88:359-363.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 49]  [Cited by in F6Publishing: 51]  [Article Influence: 4.5]  [Reference Citation Analysis (0)]
15.  Mead N, Bower P. Patient-centredness: A conceptual framework and review of the empirical literature. Soc Sci Med. 2000;51:1087-1110.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 1698]  [Cited by in F6Publishing: 1508]  [Article Influence: 77.2]  [Reference Citation Analysis (0)]
16.  Zimmermann C, Del Piccolo L, Finset A. Cues and concerns by patients in medical consultations: A literature review. Psychol Bull. 2007;133:438-463.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 174]  [Cited by in F6Publishing: 179]  [Article Influence: 10.9]  [Reference Citation Analysis (0)]
17.  Jansen J, van Weert JC, de Groot J, van Dulmen S, Heeren TJ, Bensing JM. Emotional and informational patient cues: The impact of nurses' responses on recall. Patient Educ Couns. 2010;79:218-224.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 79]  [Cited by in F6Publishing: 81]  [Article Influence: 6.1]  [Reference Citation Analysis (0)]
18.  Weng HC, Chen HC, Chen HJ, Lu K, Hung SY. Doctors' emotional intelligence and the patient-doctor relationship. Med Educ. 2008;42:703-711.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 67]  [Cited by in F6Publishing: 74]  [Article Influence: 4.5]  [Reference Citation Analysis (0)]
19.  Hall JA, Roter DL, Blanch DC, Frankel RM. Nonverbal sensitivity in medical students: Implications for clinical interactions. J Gen Intern Med. 2009;24:1217-1222.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 56]  [Cited by in F6Publishing: 47]  [Article Influence: 4.0]  [Reference Citation Analysis (0)]
20.  DiMatteo MR, Hays RD, Prince LM. Relationship of physicians' nonverbal communication skill to patient satisfaction, appointment noncompliance, and physician workload. Health Psychol. 1986;5:581-594.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 11]  [Cited by in F6Publishing: 38]  [Reference Citation Analysis (0)]
21.  DiMatteo MR, Taranta A, Friedman HS, Prince LM. Predicting patient satisfaction from physicians' nonverbal communication skills. Med Care. 1980;18:376-387.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 210]  [Cited by in F6Publishing: 213]  [Article Influence: 4.9]  [Reference Citation Analysis (0)]
22.  Kim SS, Kaplowitz S, Johnston MV. The effects of physician empathy on patient satisfaction and compliance. Eval Health Prof. 2004;27:237-251.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 531]  [Cited by in F6Publishing: 558]  [Article Influence: 27.9]  [Reference Citation Analysis (0)]
23.  Shi M, Du T. Associations of emotional intelligence and gratitude with empathy in medical students. BMC Med Educ. 2020;20:116.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 9]  [Cited by in F6Publishing: 10]  [Article Influence: 3.0]  [Reference Citation Analysis (0)]
24.  Arora S, Ashrafian H, Davis R, Athanasiou T, Darzi A, Sevdalis N. Emotional intelligence in medicine: A systematic review through the context of the ACGME competencies. Med Educ. 2010;44:749-764.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 183]  [Cited by in F6Publishing: 207]  [Article Influence: 14.1]  [Reference Citation Analysis (0)]
25.  Hojat M, Louis DZ, Maio V, Gonnella JS. Empathy and health care quality. Am J Med Qual. 2013;28:6-7.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 62]  [Cited by in F6Publishing: 68]  [Article Influence: 6.2]  [Reference Citation Analysis (0)]
26.  Ogle J, Bushnell JA, Caputi P. Empathy is related to clinical competence in medical care. Med Educ. 2013;47:824-831.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 67]  [Cited by in F6Publishing: 72]  [Article Influence: 7.4]  [Reference Citation Analysis (0)]
27.  Marvel MK. Involvement with the psychosocial concerns of patients. Observations of practicing family physicians on a university faculty. Arch Fam Med. 1993;2:629-633.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 22]  [Cited by in F6Publishing: 22]  [Article Influence: 0.7]  [Reference Citation Analysis (0)]
28.  Byrne PS, Long BE.   Doctors Talking to Patients. London: National government publication, 1976.  [PubMed]  [DOI]  [Cited in This Article: ]
29.  Thompson BM, Teal CR, Scott SM, Manning SN, Greenfield E, Shada R, Haidet P. Following the clues: Teaching medical students to explore patients' contexts. Patient Educ Couns. 2010;80:345-350.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 12]  [Cited by in F6Publishing: 13]  [Article Influence: 0.9]  [Reference Citation Analysis (0)]
30.  Zimmermann C, Del Piccolo L, Bensing J, Bergvik S, De Haes H, Eide H, Fletcher I, Goss C, Heaven C, Humphris G, Kim YM, Langewitz W, Meeuwesen L, Nuebling M, Rimondini M, Salmon P, van Dulmen S, Wissow L, Zandbelt L, Finset A. Coding patient emotional cues and concerns in medical consultations: The Verona coding definitions of emotional sequences (VR-CoDES). Patient Educ Couns. 2011;82:141-148.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 171]  [Cited by in F6Publishing: 167]  [Article Influence: 13.2]  [Reference Citation Analysis (0)]
31.  Mjaaland TA, Finset A, Jensen BF, Gulbrandsen P. Patients' negative emotional cues and concerns in hospital consultations: A video-based observational study. Patient Educ Couns. 2011;85:356-362.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 22]  [Cited by in F6Publishing: 22]  [Article Influence: 1.8]  [Reference Citation Analysis (0)]
32.  Del Piccolo L, Goss C, Bergvik S. The fourth meeting of the Verona Network on Sequence Analysis ''Consensus finding on the appropriateness of provider responses to patient cues and concerns''. Patient Educ Couns. 2006;61:473-475.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 1]  [Cited by in F6Publishing: 1]  [Article Influence: 0.1]  [Reference Citation Analysis (0)]
33.  Piccolo LD, Goss C, Zimmermann C. The Third Meeting of the Verona Network on Sequence Analysis. Finding common grounds in defining patient cues and concerns and the appropriateness of provider responses. Patient Educ Couns. 2005;57:241-244.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 12]  [Cited by in F6Publishing: 14]  [Article Influence: 0.7]  [Reference Citation Analysis (0)]
34.  Levinson W, Gorawara-Bhat R, Lamb J. A study of patient clues and physician responses in primary care and surgical settings. JAMA. 2000;284:1021-1027.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 412]  [Cited by in F6Publishing: 433]  [Article Influence: 17.9]  [Reference Citation Analysis (0)]
35.  Branch WT, Malik TK. Using 'windows of opportunities' in brief interviews to understand patients' concerns. JAMA. 1993;269:1667-1668.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 57]  [Cited by in F6Publishing: 57]  [Article Influence: 1.9]  [Reference Citation Analysis (0)]
36.  Bylund CL, Makoul G. Examining empathy in medical encounters: an observational study using the empathic communication coding system. Health Commun. 2005;18:123-140.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 94]  [Cited by in F6Publishing: 95]  [Article Influence: 5.2]  [Reference Citation Analysis (0)]
37.  Easter DW, Beach W. Competent patient care is dependent upon attending to empathic opportunities presented during interview sessions. Curr Surg. 2004;61:313-318.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 54]  [Cited by in F6Publishing: 48]  [Article Influence: 2.8]  [Reference Citation Analysis (0)]
38.  Mjaaland TA, Finset A, Jensen BF, Gulbrandsen P. Physicians' responses to patients' expressions of negative emotions in hospital consultations: A video-based observational study. Patient Educ Couns. 2011;84:332-337.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 45]  [Cited by in F6Publishing: 46]  [Article Influence: 3.8]  [Reference Citation Analysis (0)]
39.  Satterfield JM, Hughes E. Emotion skills training for medical students: a systematic review. Med Educ. 2007;41:935-941.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 119]  [Cited by in F6Publishing: 126]  [Article Influence: 7.4]  [Reference Citation Analysis (0)]
40.  Rose P  Forensic speaker identification. New York: Taylor & Francis, 2001.  [PubMed]  [DOI]  [Cited in This Article: ]
41.  Gottschalk LA, Gleser, GC.   The measurement of psychological states through the content analysis of verbal behavior. California: University of California Press, 1979.  [PubMed]  [DOI]  [Cited in This Article: ]
42.  Rosenberg SD, Tucker GJ. Verbal behavior and schizophrenia. The semantic dimension. Arch Gen Psychiatry. 1979;36:1331-1337.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 31]  [Cited by in F6Publishing: 31]  [Article Influence: 0.7]  [Reference Citation Analysis (0)]
43.  Stiles WB. Describing talk: A taxonomy of verbal response modes. Lang Soc. 1993;22:568-570.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 1]  [Cited by in F6Publishing: 1]  [Article Influence: 0.1]  [Reference Citation Analysis (0)]
44.  Pennebaker JW, Francis, ME, & Booth, RJ Linguistic inquiry and word count: LIWC 2001.   Mahway: Lawrence Erlbaum Associates, 2001.  [PubMed]  [DOI]  [Cited in This Article: ]
45.  Weintraub W  Verbal Behavior in Everyday Life. New York: Springer, 1989.  [PubMed]  [DOI]  [Cited in This Article: ]
46.  Bucci W, Freedman N. The language of depression. Bull Menninger Clin. 1981;45:334-358.  [PubMed]  [DOI]  [Cited in This Article: ]
47.  Weintraub W  Verbal behavior: Adaptation and psychopathology. New York: Springer Publishing Company, 1981.  [PubMed]  [DOI]  [Cited in This Article: ]
48.  Rude SS, Gortner E-M, Pennebaker JW. Language use of depressed and depression-vulnerable college students. Cogn Emot.. 2004;18:1121-133.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 472]  [Cited by in F6Publishing: 243]  [Article Influence: 24.8]  [Reference Citation Analysis (0)]
49.  Balsters MJH, Krahmer EJ, Swerts MG, Vingerhoets AJJM. Verbal and nonverbal correlates for depression: A review. Curr Psychiatry Rev. 2012;8:227-234.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 17]  [Cited by in F6Publishing: 18]  [Article Influence: 1.5]  [Reference Citation Analysis (0)]
50.  Kraepelin E  Manic-depressive insanity and paranoia. Edinburgh UK: Alpha Editions, 1921.  [PubMed]  [DOI]  [Cited in This Article: ]
51.  Newman S, Mather VG. Analysis of spoken language of patients with affective disorders. Am J Psychiatry. 1938;94:913-942.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 45]  [Cited by in F6Publishing: 45]  [Article Influence: 0.5]  [Reference Citation Analysis (0)]
52.  Hinchliffe MK, Lancashire M, Roberts FJ. Depression: Defence mechanisms in speech. Br J Psychiatry. 1971;118:471-472.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 29]  [Cited by in F6Publishing: 29]  [Article Influence: 0.6]  [Reference Citation Analysis (0)]
53.  Mundt JC, Snyder PJ, Cannizzaro MS, Chappie K, Geralts DS. Voice acoustic measures of depression severity and treatment response collected via interactive voice response (IVR) technology. J Neurolinguistics. 2007;20:50-64.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 167]  [Cited by in F6Publishing: 171]  [Article Influence: 10.4]  [Reference Citation Analysis (0)]
54.  Sobin C, Alpert M. Emotion in speech: The acoustic attributes of fear, anger, sadness, and joy. J Psycholinguist Res. 1999;28:347-365.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 97]  [Cited by in F6Publishing: 103]  [Article Influence: 4.0]  [Reference Citation Analysis (0)]
55.  Nilsonne A. Acoustic analysis of speech variables during depression and after improvement. Acta Psychiatr Scand. 1987;76:235-245.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 56]  [Cited by in F6Publishing: 55]  [Article Influence: 1.6]  [Reference Citation Analysis (0)]
56.  Alpert M, Pouget ER, Silva RR. Reflections of depression in acoustic measures of the patient's speech. J Affect Disord. 2001;66:59-69.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 110]  [Cited by in F6Publishing: 113]  [Article Influence: 5.0]  [Reference Citation Analysis (0)]
57.  Weintraub W, Aronson H. The application of verbal behavior analysis to the study of psychological defense mechanisms. IV. Speech pattern associated with depressive behavior. J Nerv Ment Dis. 1967;144:22-28.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 33]  [Cited by in F6Publishing: 33]  [Article Influence: 0.6]  [Reference Citation Analysis (0)]
58.  Chapple ED, Lindemann E. Clinical Implications of Measurements of Interaction Rates in Psychiatric Interviews. Appl Anthropol. 1942;1:1-11.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 2]  [Cited by in F6Publishing: 2]  [Article Influence: 0.1]  [Reference Citation Analysis (0)]
59.  Prakash M, Language and Cognitive Structures of Emotion.   Cambridge: Palgrave Macmillan, 2016: 182.  [PubMed]  [DOI]  [Cited in This Article: ]
60.  Dresner E, Herring SC. Functions of the nonverbal in CMC: Emoticons and illocutionary force. Communication Theory. 2010;20:249-268.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 228]  [Cited by in F6Publishing: 253]  [Article Influence: 17.5]  [Reference Citation Analysis (0)]
61.  Williams CE, Stevens KN. Emotions and speech: Some acoustical correlates. J Acoust Soc Am. 1972;52:1238-1250.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 437]  [Cited by in F6Publishing: 432]  [Article Influence: 8.6]  [Reference Citation Analysis (0)]
62.  Liu Y. The emotional geographies of language teaching. Teacher Development. 2016;20:482-497.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 11]  [Cited by in F6Publishing: 7]  [Article Influence: 1.6]  [Reference Citation Analysis (0)]
63.  Ruppenhofer J  The treatment of emotion vocabulary in FrameNet: Past, present and future developments. Düsseldorf University Press, 2018.  [PubMed]  [DOI]  [Cited in This Article: ]
64.  Johnson-Laird PN, Oatley K.   Emotions, music, and literature in: Ewis M, Haviland-Jones JM, Barrett LF: Handbook of emotions. London: Guilford Press, 2008: 102-113.  [PubMed]  [DOI]  [Cited in This Article: ]
65.  Giorgi K  Emotions, Language and Identity on the Margins of Europe. London: Springer, 2014.  [PubMed]  [DOI]  [Cited in This Article: ]
66.  Wilce JM, Wilce JM.   Language and emotion. Cambridge: Cambridge University Press, 2009.  [PubMed]  [DOI]  [Cited in This Article: ]
67.  Wang L, Bastiaansen M, Yang Y, Hagoort P. ERP evidence on the interaction between information structure and emotional salience of words. Cogn Affect Behav Neurosci. 2013;13:297-310.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 27]  [Cited by in F6Publishing: 29]  [Article Influence: 2.7]  [Reference Citation Analysis (0)]
68.  Braber N. Emotional and emotive language: Modal particles and tags in unified Berlin. J Pragmat38:1487-503.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 5]  [Cited by in F6Publishing: 5]  [Article Influence: 0.3]  [Reference Citation Analysis (0)]
69.  Alba-Juez L, Larina TV. Language and emotion: Discourse-pragmatic perspectives. Russ J Linguist. 2018;22:9-37.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 10]  [Cited by in F6Publishing: 10]  [Article Influence: 2.0]  [Reference Citation Analysis (0)]
70.  Goddard C. Interjections and emotion (with special reference to "surprise" and "disgust"). Emotion Review. 2014;6:53-63.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 41]  [Cited by in F6Publishing: 106]  [Article Influence: 4.1]  [Reference Citation Analysis (0)]
71.  Glazer T. The Semiotics of Emotional Expression. Trans Charles S Peirce Soc. 2017;53:189-215.  [PubMed]  [DOI]  [Cited in This Article: ]
72.  Wilce JM. Current emotion research in linguistic anthropology. Emot Rev. 2014;6:77-85.  [PubMed]  [DOI]  [Cited in This Article: ]
73.  Peräkylä A, Sorjonen ML.   Emotion in interaction. New York: Oxford University Press, 2012.  [PubMed]  [DOI]  [Cited in This Article: ]
74.  Stevanovic M, Peräkylä A. Experience sharing, emotional reciprocity, and turn-taking. Front Psychol. 2015;6:450.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 13]  [Cited by in F6Publishing: 16]  [Article Influence: 1.6]  [Reference Citation Analysis (0)]
75.  Kövecses Z.   Emotion concepts. New York: Springer, 1990.  [PubMed]  [DOI]  [Cited in This Article: ]
76.  Kövecses Z.   Metaphors of anger, pride and love. Amsterdam: Benjamins, 1986.  [PubMed]  [DOI]  [Cited in This Article: ]
77.  Kövecses Z.   Metaphor and emotion: Language, culture, and body in human feeling. Cambridge: Cambridge University Press, 2003.  [PubMed]  [DOI]  [Cited in This Article: ]
78.  Lakoff G, Kövecses Z.   The cognitive model of anger inherent in American English. In: Holland D, Quinn N, editors. Cultural models in language and thought. Cambridge: Cambridge University Press, 1987: 195-221.  [PubMed]  [DOI]  [Cited in This Article: ]
79.  Yu N.   The contemporary theory of metaphor: A perspective from Chinese. Amsterdam: John Benjamins Publishing, 1998.  [PubMed]  [DOI]  [Cited in This Article: ]
80.  Kövecses Z, Radden G. Metonymy: Developing a cognitive linguistic view. Cogn Linguist. 1998;9:37-78.  [PubMed]  [DOI]  [Cited in This Article: ]
81.  Radden G, Köpcke KM, Berg T, Siemund P.   The construction of meaning in language. Aspects of Meaning Construction. Amsterdam: John Benjamins Publishing Co, 2007: 1-5.  [PubMed]  [DOI]  [Cited in This Article: ]
82.  Salmela M.   The functions of collective emotions in social groups. In: Institutions, emotions, and group agents. Dordrecht: Springer, 2014: 159-176.  [PubMed]  [DOI]  [Cited in This Article: ]
83.  Kövecses Z.   The concept of emotion: Further metaphors. In: Emotion concepts. New York: Springer, 1990: 160-181.  [PubMed]  [DOI]  [Cited in This Article: ]
84.  Wierzbicka A. Talking about emotions: Semantics, culture, and cognition. Cogn Emot. 1992;6:285-319.  [PubMed]  [DOI]  [Cited in This Article: ]
85.  Lustig M, Koester J.   Intercultural communication: Interpersonal communication across cultures. Boston: Pearson Education, 2010.  [PubMed]  [DOI]  [Cited in This Article: ]
86.  Robinson NM.   To tell or not to tell: Factors in self-disclosing mental illness in our everyday relationships (Doctoral dissertation). Available from:  [PubMed]  [DOI]  [Cited in This Article: ]
87.  Akçay MB, Oğuz K. Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Commun. 2020;116:56-76.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 184]  [Cited by in F6Publishing: 210]  [Article Influence: 61.3]  [Reference Citation Analysis (0)]
88.  Tan L, Yu K, Lin L, Cheng X, Srivastava G, Lin JC, Wei W. Speech Emotion Recognition Enhanced Traffic Efficiency Solution for Autonomous Vehicles in a 5G-Enabled Space-Air-Ground Integrated Intelligent Transportation System. IEEE T Intell Transp. 2022;23:2830-2842.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 38]  [Cited by in F6Publishing: 39]  [Article Influence: 38.0]  [Reference Citation Analysis (0)]
89.  Schuller BW. Speech emotion recognition: Two decades in a nutshell, benchmarks, and ongoing trends. Commun ACM. 2018;61:90-99.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 175]  [Cited by in F6Publishing: 76]  [Article Influence: 35.0]  [Reference Citation Analysis (0)]
90.  Zhang S, Zhang S, Huang T, Gao W. Speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching. IEEE Trans Multimedia. 2018;20:1576-1590.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 179]  [Cited by in F6Publishing: 180]  [Article Influence: 35.8]  [Reference Citation Analysis (0)]
91.  Chen M, He X, Yang J, Zhang H. 3-D Convolutional Recurrent Neural Networks with Attention Model for Speech Emotion Recognition. IEEE Signal Process Lett. 2018;25:1440-1444.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 183]  [Cited by in F6Publishing: 187]  [Article Influence: 36.6]  [Reference Citation Analysis (0)]
92.  Samadi MA, Akhondzadeh MS, Zahabi SJ, Manshaei MH, Maleki Z, Adibi P.   Evaluating Sparse Interpretable Word Embeddings for Biomedical Domain. 2020 Preprint. Available from:  [PubMed]  [DOI]  [Cited in This Article: ]
93.  Bitouk D, Verma R, Nenkova A. Class-Level Spectral Features for Emotion Recognition. Speech Commun. 2010;52:613-625.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 132]  [Cited by in F6Publishing: 133]  [Article Influence: 10.2]  [Reference Citation Analysis (0)]
94.  Fernandez R, Picard R. Modeling drivers' speech under stress. Speech Commun. 2003;40:145-159.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 110]  [Cited by in F6Publishing: 109]  [Article Influence: 5.5]  [Reference Citation Analysis (0)]
95.  Nwe T, Foo S, De Silva L. Speech emotion recognition using hidden Markov models. Speech Commun. 2003;41:603-623.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 520]  [Cited by in F6Publishing: 251]  [Article Influence: 26.0]  [Reference Citation Analysis (0)]
96.  Lee C, Yildirim S, Bulut M, Busso C, Kazemzadeh A, Lee S, Narayanan S. Effects of emotion on different phoneme classes. J Acoust Soc Am. 2004;116:2481-2481.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 1]  [Cited by in F6Publishing: 1]  [Article Influence: 0.1]  [Reference Citation Analysis (0)]
97.  Breazeal C, Aryananda L. Recognition of Affective Communicative Intent in Robot-Directed Speech. Auton Robots. 2002;12:83-104.  [PubMed]  [DOI]  [Cited in This Article: ]
98.  Slaney M, McRoberts G. BabyEars: A recognition system for affective vocalizations. Speech Commun. 2003;39:367-384.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 62]  [Cited by in F6Publishing: 62]  [Article Influence: 3.1]  [Reference Citation Analysis (0)]
99.  Pao TL, Chen YT, Yeh JH, Liao WY.   Combining acoustic features for improved emotion recognition in mandarin speech. In: Tao J, Tan T, Picard RW, editors. Affective Computing and Intelligent Interaction. International Conference on Affective Computing and Intelligent Interaction; 2005 Oct; Berlin. Heidelberg: Springer, 2005: 279-285.  [PubMed]  [DOI]  [Cited in This Article: ]
100.  Wu S, Falk T, Chan W. Automatic speech emotion recognition using modulation spectral features. Speech Commun. 2011;53:768-785.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 221]  [Cited by in F6Publishing: 221]  [Article Influence: 18.4]  [Reference Citation Analysis (0)]
101.  Pierre-Yves O. The production and recognition of emotions in speech: Features and algorithms. Int J Hum Comput Stud. 2003;59:157-183.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 184]  [Cited by in F6Publishing: 182]  [Article Influence: 9.2]  [Reference Citation Analysis (0)]
102.  Zhu A, Luo Q.   Study on speech emotion recognition system in e-learning. In: Jacko JA, editor. Human-Computer Interaction. HCI Intelligent Multimodal Interaction Environments. International Conference on Human-Computer Interaction; 2007 Jul 22. Berlin, Heidelberg: Springer, 2007: 544-552.  [PubMed]  [DOI]  [Cited in This Article: ]
103.  Chen L, Mao X, Xue Y, Cheng LL. Speech emotion recognition: Features and classification models. Digit Signal Process. 2012;22:1154-1160.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 130]  [Cited by in F6Publishing: 71]  [Article Influence: 11.8]  [Reference Citation Analysis (0)]
104.  Xanthopoulos P, Pardalos PM, Trafalis TB.   Linear discriminant analysis. In: Xanthopoulos P, Pardalos PM, Trafalis TB. Robust data mining. New York: Springer, 2013: 27-33.  [PubMed]  [DOI]  [Cited in This Article: ]
105.  Chen M, He X, Yang J, Zhang H. 3-D Convolutional Recurrent Neural Networks with Attention Model for Speech Emotion Recognition. IEEE Signal Process Lett. 2018;25:1440-1444.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 183]  [Cited by in F6Publishing: 187]  [Article Influence: 36.6]  [Reference Citation Analysis (0)]
106.  Zhang S, Zhang S, Huang T, Gao W. Speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching. IEEE Trans Multimedia. 2018;20:1576-1590.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 179]  [Cited by in F6Publishing: 180]  [Article Influence: 35.8]  [Reference Citation Analysis (0)]
107.  Mao Q, Dong M, Huang Z, Zhan Y. Learning salient features for speech emotion recognition using convolutional neural networks. IEEE Trans Multimedia. 2014;16:2203-2213.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 318]  [Cited by in F6Publishing: 324]  [Article Influence: 35.3]  [Reference Citation Analysis (0)]
108.  Feng K, Chaspari T. A Review of Generalizable Transfer Learning in Automatic Emotion Recognition. Front Comput Sci. 2020;2.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 27]  [Cited by in F6Publishing: 27]  [Article Influence: 9.0]  [Reference Citation Analysis (0)]
109.  Roy T, Marwala T, Chakraverty S.   A survey of classification techniques in speech emotion recognition. In: Chakraverty S, editor. Mathematical Methods in Interdisciplinary Sciences. New Jersey: Wiley, 2020: 33-48.  [PubMed]  [DOI]  [Cited in This Article: ]
110.  Cowie R, Douglas-Cowie E, Tsapatsoulis N, Votsis G, Kollias S, Fellenz W, Taylor JG. Emotion recognition in human-computer interaction. IEEE Signal Process Mag. 2001;18:32-80.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 1355]  [Cited by in F6Publishing: 1318]  [Article Influence: 61.6]  [Reference Citation Analysis (0)]
111.  Murray IR, Arnott JL. Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion. J Acoust Soc Am. 1993;93:1097-1108.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 572]  [Cited by in F6Publishing: 582]  [Article Influence: 19.1]  [Reference Citation Analysis (0)]
112.  Banse R, Scherer KR. Acoustic profiles in vocal emotion expression. J Pers Soc Psychol. 1996;70:614-636.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 1100]  [Cited by in F6Publishing: 1202]  [Article Influence: 40.7]  [Reference Citation Analysis (0)]
113.  Beeke S, Wilkinson R, Maxim J. Prosody as a compensatory strategy in the conversations of people with agrammatism. Clin Linguist Phon. 2009;23:133-155.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 15]  [Cited by in F6Publishing: 10]  [Article Influence: 1.1]  [Reference Citation Analysis (0)]
114.  Tao J, Kang Y, Li A. Prosody conversion from neutral speech to emotional speech. IEEE Trans Audio Speech Lang Process. 2006;14:1145-1154.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 114]  [Cited by in F6Publishing: 114]  [Article Influence: 6.7]  [Reference Citation Analysis (0)]
115.  Scherer KR. Vocal affect expression: A review and a model for future research. Psychol Bull. 1986;99:143-165.  [PubMed]  [DOI]  [Cited in This Article: ]
116.  Davitz JR, Beldoch M.   The Communication of Emotional Meaning. New York: McGraw-Hill, 1964.  [PubMed]  [DOI]  [Cited in This Article: ]
117.  Rabiner LR, Schafer RW.   Digital processing of speech signals. New Jersey: Prentice Hall, 1978: 121-123.  [PubMed]  [DOI]  [Cited in This Article: ]
118.  Hernando J, Nadeu C. Linear prediction of the one-sided autocorrelation sequence for noisy speech recognition. IEEE Trans Speech Audio Process. 1997;5:80-84.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 46]  [Cited by in F6Publishing: 43]  [Article Influence: 1.8]  [Reference Citation Analysis (0)]
119.  Le Bouquin R. Enhancement of noisy speech signals: Application to mobile radio communications. Speech Commun. 1996;18:3-19.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 25]  [Cited by in F6Publishing: 22]  [Article Influence: 0.9]  [Reference Citation Analysis (0)]
120.  Bou-Ghazale SE, Hansen JH. A comparative study of traditional and newly proposed features for recognition of speech under stress. IEEE Trans Speech Audio Process. 2000;8:429-442.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 166]  [Cited by in F6Publishing: 159]  [Article Influence: 7.2]  [Reference Citation Analysis (0)]
121.  Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE. 1989;77:257-286.  [PubMed]  [DOI]  [Cited in This Article: ]
122.  Yoon BJ. Hidden Markov Models and their Applications in Biological Sequence Analysis. Curr Genomics. 2009;10:402-415.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 183]  [Cited by in F6Publishing: 189]  [Article Influence: 15.3]  [Reference Citation Analysis (0)]
123.  Duda RO, Hart PE, Stork DG, Ionescu A.   Pattern classification: Nonparametric techniques. New York: Wiley-Interscience, 2000.  [PubMed]  [DOI]  [Cited in This Article: ]
124.  Haykin S.   Neural networks and learning machines. 3rd ed. Pearson Education India, 2010.  [PubMed]  [DOI]  [Cited in This Article: ]