However powerful, the human brain often deviates from rationality when reasoning about other people. This phenomenon, known as social bias, occurs when we unwittingly or deliberately favor or disfavor certain individuals because they belong to a specific social group, a behavioral pattern that often operates unconsciously while still shaping social interactions (Skinner, Meltzoff & Olson 2017).
Because the effects of social biases are entangled with the many contextual factors that shape social interactions, they are extremely difficult to study experimentally. For instance, it is hard to establish that the mere perception of an experimental participant as belonging to a specific social group is enough to influence others' behavior, independently of all of that participant's other characteristics. To do so, one would need the ability to covertly manipulate how a given participant is perceived by others (e.g. male or female, young or elderly), without their knowledge, something which, until recently, was a matter of science fiction. This requires highly realistic voice-manipulation algorithms capable of transforming vocal attributes in real time.
The REVOLT project aims to break this methodological barrier. To do so, we will leverage the spectacular recent advances in speech synthesis and voice transformation to implement deep-learning architectures able to transform vocal attributes efficiently online, and possibly in real time, while preserving the high quality standards of existing offline synthesis and transformation algorithms (RO1). This would constitute a decisive step towards the creation of audio-visual deep fakes and remove important experimental barriers in cognitive science. It will allow us to design cognitive science experiments in which we transform participants' vocal attributes (e.g. gender) during human-human interactions, in order to infer the causal links between the perception of these cues and subsequent patterns of behavior (RO2).
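To illustrate the latency constraint that online operation imposes, the sketch below shows a minimal frame-by-frame processing loop in Python. It is purely illustrative: the function name transform_frame, the 20 ms frame size, and the identity transform are assumptions standing in for a trained neural voice-conversion model, not part of the project's actual architecture.

```python
import numpy as np

SAMPLE_RATE = 16_000   # Hz, a common sampling rate for speech models
FRAME_SIZE = 320       # 20 ms frames: the per-frame latency budget

def transform_frame(frame: np.ndarray) -> np.ndarray:
    """Placeholder for a neural voice-conversion model.

    A real system would map the input frame (plus some past context) to a
    frame with modified vocal attributes, e.g. perceived gender or age.
    Here the frame is returned unchanged.
    """
    return frame

def stream_conversion(signal: np.ndarray) -> np.ndarray:
    """Process an utterance frame by frame, as an online system would."""
    out = np.zeros_like(signal)
    for start in range(0, len(signal) - FRAME_SIZE + 1, FRAME_SIZE):
        frame = signal[start:start + FRAME_SIZE]
        # In a real-time setting, each call must finish well within the
        # 20 ms frame duration to avoid audible glitches.
        out[start:start + FRAME_SIZE] = transform_frame(frame)
    return out

if __name__ == "__main__":
    # One second of synthetic input standing in for microphone audio.
    t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
    dummy_speech = (0.1 * np.sin(2 * np.pi * 220 * t)).astype(np.float32)
    converted = stream_conversion(dummy_speech)
    print(converted.shape)  # (16000,)
```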
From a signal processing perspective, the development of such real-time vocal transformations will significantly advance the state of the art in voice conversion and deep learning, opening the way to realistic audio-visual deep fakes. From a cognitive science perspective, the use of such algorithms in the experimental study of human-human interactions will constitute a paradigm shift in how we investigate the dynamics governing human social cognition.
In sum, in the REVOLT project we will develop a new deep-learning architecture capable of transforming vocal social attributes (such as age or gender) in real time, and use these algorithms to design cognitive science experiments probing the influence of such manipulated social cues on human-human interactions and decision making.