Now that Zoom has taken away our ability to assess body language, our voices have become the prime agents for detecting one another’s emotions.
Writer: Anastasiya Kolesnichenko
Editor: Bethany Maia Evans
Artist: Sophie North
The world as we know it has become digitalised. Meetings and classes have moved online, as has the ability to express oneself. Whilst previously, speakers could look their listeners directly in the eyes, walk around the room or even stomp their feet to show their emotions, this is no longer the case online.
So how can we emote in a digital world? Thankfully, back in 2017, Dr. Michael Kraus of Yale University conducted a study addressing this issue. The solution might just surprise you: it is easier to detect someone’s emotions when you can hear them, but can’t see them.
Kraus asked strangers to discuss difficult work situations using either voice alone or both voice and video. The results showed that the participants were more empathically accurate during the voice-only communication.
So if you’re trying to communicate something over Zoom, you should probably turn your camera off. The solution sounds easy, but how can we decode emotions from someone’s voice?
How do we hear emotions in speech?
The autonomic nervous system is closely linked with the emotion centres in the brain. This connection helps to explain why we can recognise when our loved ones are upset or angry from slight changes in their tone of voice.
When we hear an unfamiliar person talk, we are quick to notice their sex and age. It’s not hard for us to determine whether the language spoken is their mother tongue, or where in the country they were brought up. We do this not only by interpreting the words they say, but also how they say them. When we listen, we pay attention to the nonverbal content of the voice: the pitch, the volume, the speed, and the pauses in speech.
In many cases, sounds such as ‘woohoo’ and ‘oops’ are enough for us to assess how the person pronouncing them feels. Long before written language, humans used these wordless vocalisations to communicate feelings that can be understood in seconds. These vocal bursts, the ‘oohs’, ‘aahs’ and ‘uh-ohs’, represent a grand total of 24 distinct emotions, according to a 2019 American Psychologist study.
How do our brains react to voices?
In 2017, researchers at the University of Geneva mapped the brain regions involved in interpreting emotions that are communicated orally. They found that the frontal lobes play a critical role, classifying and discriminating between emotions to facilitate productive social interaction.
Video calls, however, can be hard on the frontal lobes. In gallery view, where all participants appear side by side with their amusing backgrounds and children walking past, we are forced to focus on so many things at once that we overload the brain’s central vision and can’t fully comprehend the speaker. As a result, many users report feeling exhausted after video calls.
Neuropsychologists still refer to a famous study by UCLA psychology professor Albert Mehrabian when explaining the condition that has earned its own name: ‘Zoom fatigue’. The study, published in 1972, concluded that during face-to-face interaction, our brains pay the least attention to what’s being said, instead focusing on the tone of voice and body language. This seems to explain the experiences of online users today.
Video calls take away most body movement cues, but because the speaker is still visible, our brains search for facial expressions to understand their emotions. In this scenario, our brains work harder than usual, leading to fatigue. So in your next Zoom call, you could try joining on your phone and focussing solely on audio.
Why is it important to study oral emotion recognition?
Tuka Alhanai studies depression diagnostics in the Computer Science and Artificial Intelligence Laboratory at MIT. To diagnose depression, clinicians ask questions related to past mental illnesses, lifestyle and mood. “Some of us may be better at this than others,” said Alhanai in an interview with Kinesis, “but we might use patterns in the words they use, for example, if someone says they are ‘feeling sad’, or are struggling with everyday activities, or are speaking more slowly than normal”.
Artificial intelligence (AI) systems may be trained to detect these same patterns by analysing thousands of voices of depressed and non-depressed people. Researchers at MIT, including Alhanai, have created a neural network that can be used to spot the signs of depression in human speech.
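To make the idea concrete, here is a minimal, purely illustrative sketch (not the MIT team’s actual model) of the kind of prosodic features such a system might extract from a speech recording before any classification step; the frame size, threshold and toy signal below are all assumptions for demonstration:

```python
# Illustrative sketch only — NOT the MIT neural network.
# Shows how simple prosodic features (loudness and pauses),
# of the kind a depression classifier might use, can be
# computed from a raw audio signal.
import numpy as np

def prosodic_features(signal, frame=400):
    """Crude frame-level features: mean/variation of loudness and a pause ratio."""
    n = len(signal) // frame
    frames = signal[:n * frame].reshape(n, frame)
    energy = np.sqrt((frames ** 2).mean(axis=1))   # loudness of each frame
    pauses = (energy < 0.1 * energy.max()).mean()  # fraction of near-silent frames
    return np.array([energy.mean(), energy.std(), pauses])

# Toy "speech": half a second of sound followed by half a second of silence,
# mimicking the slower, pause-heavy speech the article describes.
rng = np.random.default_rng(0)
speech = np.concatenate([rng.normal(0.0, 1.0, 8000), np.zeros(8000)])
feats = prosodic_features(speech)
print(feats)  # [mean loudness, loudness variation, pause ratio ≈ 0.5]
```

In a real system, feature vectors like this (or learned representations of the raw audio) from thousands of speakers would be fed to a trained classifier; the point here is only that measurable properties of the voice, not the words, carry the signal.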
Alhanai pointed out the critical importance of interpreting emotions from voice in light of the pandemic. “In the current era, where we wear masks, it might be trickier to inspect someone’s facial expressions and so we might be forced to focus more on their voice and the emotional content contained within”.
The role voices play in emotion recognition has been studied since as early as 1972. Today, with masks covering our faces and Zoom calls replacing in-person communication, we largely have to rely on vocal cues. So what can your voice say about you? It can give away as little as your gender, and as much as your level of depression.