Every day, humans are accosted by the sounds and sights of their environment, while their visual and auditory systems work together to make sense of what is going on. If a person is, for example, looking at people sitting in a restaurant, they will see multiple mouths moving while different voices emerge and instinctively know which voice belongs to which person. That’s because the brain actively integrates voices and faces in all situations — save for the instances when the McGurk effect takes hold.

Essentially, the McGurk effect is when what we see overrides what we hear — the common example is that if someone sees a person mouth “ga,” but the person is actually making the sound “ba,” then the first person will hear “da.” Mouth movements have an incredibly strong influence over what is perceived audibly, and the McGurk effect has been shown to affect people of all genders and all language backgrounds. In a recent paper published in PLOS Computational Biology, researchers gained a further understanding of the McGurk effect by creating a new model that predicts when the brain will be affected by this phenomenon.

While it’s only a trick of the mind, the McGurk effect demonstrates how the brain unconsciously integrates visual information and the perception of speech, which is how we pull in information about the world. But when it comes down to it, the McGurk effect happens when the brain is muddled — and the researchers behind this new paper wanted to know why.

They decided to focus on one particular idiosyncrasy: Why does the McGurk effect happen with the pairing of some incongruent syllables, but not others? The brain perceives “da” when “ba” is heard and “ga” is seen, but not when “ga” is heard and “ba” is seen. So, they used the principle of causal inference to create an algorithm modeling this phenomenon.


“We compared our model with an alternative model that is identical, except that it always integrates the available cues, meaning there is no causal inference of speech perception,” said lead author and professor of neuroscience Michael Beauchamp in a statement. “Using data from a large number of subjects, the model with causal inference better predicted how humans would or would not integrate audiovisual speech syllables.”
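The causal-inference idea behind that comparison can be sketched as a toy model: the brain weighs the likelihood that an auditory cue and a visual cue came from one talker against the likelihood that they came from two different sources, and only fuses the cues when a common cause is probable. Here is a minimal, hypothetical Python sketch using Gaussian cues; the function name, the Gaussian setup, and all parameter values are illustrative assumptions, not the authors’ actual model.

```python
import math

def posterior_common(x_aud, x_vis, sigma_a=1.0, sigma_v=1.0,
                     sigma_p=2.0, p_common=0.5):
    """Toy estimate of the probability that an auditory cue and a
    visual cue share a single cause (one talker).

    All parameter values here are illustrative assumptions:
    sigma_a / sigma_v are the noise in each cue, sigma_p is the
    spread of the zero-mean prior over source positions, and
    p_common is the prior probability of a common cause.
    """
    # Likelihood of the cue pair given ONE common source, with the
    # latent source integrated out (closed form for Gaussian cues
    # and a zero-mean Gaussian prior).
    var1 = (sigma_a**2 * sigma_v**2 + sigma_a**2 * sigma_p**2
            + sigma_v**2 * sigma_p**2)
    like_common = (1.0 / (2 * math.pi * math.sqrt(var1))) * math.exp(
        -0.5 * ((x_aud - x_vis)**2 * sigma_p**2
                + x_aud**2 * sigma_v**2
                + x_vis**2 * sigma_a**2) / var1)

    # Likelihood given TWO independent sources: each cue is explained
    # by its own source drawn from the prior.
    def marginal(x, sigma):
        var = sigma**2 + sigma_p**2
        return math.exp(-0.5 * x**2 / var) / math.sqrt(2 * math.pi * var)

    like_separate = marginal(x_aud, sigma_a) * marginal(x_vis, sigma_v)

    # Bayes' rule over the two causal structures: fuse the cues only
    # when this posterior is high.
    num = like_common * p_common
    return num / (num + like_separate * (1 - p_common))
```

With these toy settings, nearby cues (say, 0.1 and 0.2) yield a high probability of a common cause, so the model would integrate them, while far-apart cues (say, 0.0 and 5.0) yield a low probability, so the model would keep the auditory percept on its own — mirroring why some mismatched syllable pairs fuse into “da” and others never do.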

The model created by Beauchamp and his team demonstrates that it takes a very specific pair of auditory and visual syllables to create the McGurk effect. Not every mouth movement can be convincingly paired with sound, they write, and vice versa. There’s a reason that silently mouthing “olive juice” can look like “I love you,” but if you watched a video where the former was paired with an audible “I love you,” it wouldn’t be convincing at all.

This research team hopes that their model can later be useful in helping patients with speech perception deficits and in the building of computers designed to understand auditory and visual speech. This way, when you say the word “vacuum” to your future robot butler, it won’t accidentally read your mouth’s movements as having just said “fuck you.”

Photos via Giphy/BBC