Vienna Parnell | December 1, 2022

It should come as a surprise to no one that people regularly lie without others’ knowledge, whether fabricating an excuse for a late assignment or trying to avoid offending a friend. In the grand scheme of things, these instances go unnoticed; however, in real-world applications concerning security, digital humans, criminal justice, and psychotherapy, honesty in expressing emotions has larger implications.

Many consider emotions to be inherent, critical features of effective human communication. Expressing and detecting emotional states enriches interpersonal interactions and bolsters the strong relationships that are unique to human behavior. Because they transcend cultural and ethnic differences, emotions serve as a universal language in everyday conversation. While some signals, such as vocal intonation and body language, are more reliable and genuine than others, facial expressions can frequently convey an individual’s mental state by nonverbally reflecting their intentions.

Some aspects of facial emotion recognition remain challenging, as an individual may intentionally or inadvertently disguise their emotions with misleading expressions. Interpreting another’s intentions from visible facial expressions alone may be detrimental and lead to conflict. In these cases, revealing an individual’s genuine emotional state requires considering micro-expressions. Unlike their “macro” counterparts, micro-expressions are rapid, involuntary muscular movements composed of “action units.” Because they are difficult to prevent or manipulate, micro-expressions are particularly revealing of repressed emotions, but they are essentially invisible to the naked eye.

But what if computers, entities often considered entirely objective and devoid of emotion, could categorize emotions more accurately than humans?

Recently, convolutional neural networks (CNNs), a class of neural networks that identify patterns in images by processing pixel data, have been used to extract and interpret both overt and elusive facial expressions. Deep learning for image classification has yielded efficient, accurate detection of facial emotions, though some aspects of conventional frameworks can still be optimized.
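To give a rough sense of the kind of architecture involved, here is a minimal Keras CNN for classifying 48×48 grayscale face crops (the FER-2013 input size) into seven emotion categories. The layer sizes and depths are illustrative assumptions, not the specific network described later in this post.

```python
from tensorflow.keras import layers, models

def build_fer_cnn(input_shape=(48, 48, 1), num_classes=7):
    """Minimal convolutional network for facial emotion recognition.

    Stacked Conv2D + MaxPooling2D blocks extract local facial features
    (edges, wrinkles, mouth curvature) from raw pixel data; the dense
    head maps those features to one of `num_classes` emotion labels.
    """
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),  # emotion probabilities
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training then amounts to calling `model.fit` on labeled face crops from datasets such as FER-2013.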

During the summer of 2021, I conducted research in deep learning for facial emotion recognition (FER), an experience that would greatly influence my choice of major and my interests as a student at Vanderbilt. I wanted to develop a program that could recognize universal emotions by identifying subtle micro-expressions: the involuntary muscular movements that reflect an individual’s genuine emotions but are invisible to the naked eye.

I gained proficiency in the Python language; in image detection and manipulation techniques, including Haar cascades and Gaussian filters; and in the IDEs PyCharm and Google Colaboratory. Using OpenCV and TensorFlow’s Keras library, I coded my neural network to associate different facial cues with certain emotions; for example, I indicated that a briefly wrinkled nose and down-turned mouth signal disgust. I also used four existing facial emotion datasets for training and testing: FER-2013, CASME II, FERG, and SAMM. These datasets, frequently cited in similar studies, differ primarily in the ethnicities and attire of their subjects and in the subtlety of the facial expressions.

To test my multi-layer network, I compared actual and predicted labels for different subjects and analyzed the resulting accuracies. My network performed worse when images in the dataset contained partially obscured facial expressions, such as a subject holding a hand to their mouth or wearing accessories. This observation led me to conclude that frequent gestures and movements can affect the visibility of the face and should be considered in future research on deep learning approaches to FER. Out of curiosity, I even captured some of my own facial expressions with my webcam and fed them to my algorithm, with some surprisingly accurate results.
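The comparison of actual and predicted labels described above can be summarized as overall and per-class accuracy; a per-class view exposes exactly the failure mode noted here, since emotion classes whose images often contain occluded faces score lower. A small NumPy sketch (the function name and emotion labels are illustrative):

```python
import numpy as np

def accuracy_report(y_true, y_pred, class_names):
    """Return overall accuracy and a per-class accuracy breakdown.

    y_true / y_pred are integer class labels indexing into class_names.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    overall = float(np.mean(y_true == y_pred))
    per_class = {
        name: float(np.mean(y_pred[y_true == i] == i))
        for i, name in enumerate(class_names)
        if np.any(y_true == i)  # skip classes absent from the test set
    }
    return overall, per_class
```

For example, `accuracy_report([0, 0, 1, 1], [0, 1, 1, 1], ["happy", "disgust"])` reports 75% overall but only 50% on "happy", flagging where the model struggles.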

Because of their versatility, FER algorithms are useful in a wide variety of real-world situations. For instance, by monitoring yawning, eye closure, and blink rate, computer vision techniques can detect drowsy drivers and decrease car accidents. Another use of FER is in psychotherapy, as a means of monitoring patients’ emotional responses to different treatments. Biometric data is also helpful in police surveillance through CCTV cameras for crime identification and detection of suspicious behavior, though the ethics of this practice are controversial.

In addition to enhancing image classification, FER algorithms can improve human-machine interaction (HMI) through affective computing, providing machines with the tools to carry out socially intelligent communication. In human-robot interaction (HRI), a subfield of HMI, humanoid robots are trained to possess and exhibit human-like characteristics on three levels: first, in their emotional state; second, in their outward expression; and third, in their ability to infer human emotional states. This third aspect uses FER techniques alongside analysis of thermal changes in facial images, body language and kinematics, brain activity, voice, and peripheral physiological responses. Because the ability to interpret and respond to a human user’s expressions is crucial to gauging a situation and responding appropriately, FER algorithms are powerful tools with the potential to revolutionize HMI, and there is no reason to wait until after graduation to begin exploring this interdisciplinary field.

At Vanderbilt, majoring in more than one discipline is often said to be as easy as submitting some paperwork, allowing students to combine their passion for engineering and the social sciences, for instance. Pursuing a double major in computer science and cognitive studies is just one way of gaining a nuanced understanding of HMI and its subfields and becoming well-versed in both the innately human and technical aspects of the field. The computer science department at Vanderbilt also offers higher-level courses specifically geared toward human-computer interaction, machine learning, and computer vision, in addition to lab research opportunities centered on robotics and autonomous systems.

Whether I learn about HMI in my classes or in my future research, I am consistently fascinated by the applications of facial expression recognition in criminology, psychotherapy, and personalized learning. As I consider AI in society, I like to be mindful of the social implications of integrating smart, autonomous systems into our daily lives. If reliable, computer vision can help us understand each other better—especially when our own eyes and judgment fail us.


Mehta D, Siddiqui M, Javaid A. Facial emotion recognition: A survey and real-world user experiences in mixed reality. Sensors. 2018; 18(2):416. doi:10.3390/s18020416 

Minaee S, Minaei M, Abdolrashidi A. Deep-emotion: Facial expression recognition using attentional convolutional network. Sensors. 2021;21(9):3046. doi:10.3390/s21093046 

Podoletz L. We have to talk about emotional AI and crime. AI & SOCIETY. 2022. doi:10.1007/s00146-022-01435-w 

Spezialetti M, Placidi G, Rossi S. Emotion recognition for human-robot interaction: Recent advances and future perspectives. Frontiers in Robotics and AI. 2020;7. doi:10.3389/frobt.2020.532279 

Steppan M, Zimmermann R, Fürer L, Schenk N, Schmeck K. Machine learning facial emotion recognition in psychotherapy research. A useful approach? 2020. doi:10.31234/ 

Xia B, Wang W, Wang S, Chen E. Learning from Macro-expression. Proceedings of the 28th ACM International Conference on Multimedia. 2020. doi:10.1145/3394171.3413774 

Yan W-J, Li X, Wang S-J, Zhao G, Liu Y-J, Chen Y-H, Fu X. CASME II: An improved spontaneous micro-expression database and the Baseline Evaluation. PLoS ONE. 2014;9(1). doi:10.1371/journal.pone.0086041 

Zahara L, Musa P, Prasetyo Wibowo E, Karim I, Bahri Musa S. The facial emotion recognition (FER-2013) dataset for prediction system of micro-expressions face using the Convolutional Neural Network (CNN) algorithm based Raspberry Pi. 2020 Fifth International Conference on Informatics and Computing (ICIC). 2020. doi:10.1109/icic50835.2020.9288560