Applications based on augmented acoustic reality are receiving attention in a broad range of fields, including artistic creation, cultural mediation, communication, and entertainment. Audition is a key modality for understanding and interacting with our spatial environment, and it plays a major role in augmented reality applications. Embedding computer-generated or pre-recorded auditory content into a user's real acoustic environment creates an engaging and interactive experience that can be applied to video games, museum guides, or radio plays. The major challenge of audio processing in augmented reality lies in the ability to seamlessly integrate these sound events into the real environment without any perceptual auditory or visual discrepancy. The spatialization should constantly adapt to the acoustic conditions of the real environment, for instance according to the movement of the sound sources or of the listener.
The objective of the HAIKUS project is the joint exploitation of machine learning and audio signal processing methods to solve acoustic problems encountered in augmented reality applications. Machine learning methods will be applied to the automatic identification of the acoustic channels between the sources and the listener. The seamless integration of virtual sounds into the real environment requires the estimation of the acoustic parameters of the room or site, enabling automatic adaptation of the reverberation process applied to the virtual sources.
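As an illustration of the adaptation step, the sketch below (in Python, with hypothetical names) shows how a dry virtual source could be convolved with an estimated room impulse response so that its reverberation matches the real room; the blind, learning-based identification of that impulse response is the actual object of the project and is not shown here.

```python
# Minimal sketch: once an acoustic channel (room impulse response, RIR) between a
# source position and the listener has been estimated, a dry virtual source can be
# convolved with it so that its reverberation matches the real room.
# The names (estimated_rir, dry_source) are illustrative only.
import numpy as np
from scipy.signal import fftconvolve

def render_virtual_source(dry_source: np.ndarray, estimated_rir: np.ndarray) -> np.ndarray:
    """Convolve a dry (anechoic) virtual source with an estimated RIR."""
    wet = fftconvolve(dry_source, estimated_rir, mode="full")
    # Normalize to avoid clipping when writing back to an audio stream.
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet
```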
The challenge is therefore the blind estimation of acoustic parameters (reverberation time, direct-to-reverberant energy ratio) or of the room geometry (volume and shape, wall absorption) based solely on observation of the reverberant audio signals from real sound sources present in the room. The listener's engagement with the augmented acoustic scene relies on a realistic and congruent evolution of the acoustic cues with their movement in the scene and with the movement of the virtual sources. This requires inferring plausible rules for modifying the spatialization parameters, or implementing room impulse response interpolation techniques, according to the relative movements of the sources and the listener.
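For reference, the sketch below computes the reverberation time from a measured room impulse response via Schroeder backward integration (T30 fit). In the project this parameter must instead be estimated blindly from the reverberant signals themselves, for example with a learned regressor; the classical computation merely illustrates the target quantity.

```python
# Illustrative (non-blind) computation of RT60 from a measured RIR.
import numpy as np

def rt60_from_rir(rir: np.ndarray, fs: int) -> float:
    """Estimate RT60 from an RIR using the Schroeder energy decay curve (T30 fit)."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]               # backward integration of squared RIR
    edc_db = 10.0 * np.log10(energy / energy[0] + 1e-12)   # energy decay curve in dB
    # Fit a line between -5 dB and -35 dB, then extrapolate to -60 dB
    # (assumes the decay actually reaches -35 dB within the measured RIR).
    t = np.arange(len(rir)) / fs
    i5, i35 = np.argmax(edc_db <= -5.0), np.argmax(edc_db <= -35.0)
    slope, _ = np.polyfit(t[i5:i35], edc_db[i5:i35], 1)
    return -60.0 / slope                                    # slope is negative (dB/s)
```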
Interactive virtual sound scenes are generally rendered in binaural format over headphones. Convincing binaural rendering requires individual head-related transfer functions (HRTFs), personalized for each listener, which ideally calls for complex measurements in an anechoic chamber with perfectly calibrated audio signals. We propose the blind estimation of the listener's HRTFs from binaural 'selfies', i.e. binaural signals recorded in real environments under uncontrolled conditions (everyday environments, unknown and moving audio sources).
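A minimal binaural rendering sketch, assuming the listener's head-related impulse responses (HRIRs) for a given source direction are already available, is given below; obtaining such individual filters without anechoic measurements is precisely what the proposed blind HRTF estimation aims at.

```python
# Minimal sketch of binaural rendering: a mono virtual source is filtered by the
# left/right HRIRs for its direction relative to the listener. The per-listener
# HRIR set is assumed to be given here.
import numpy as np
from scipy.signal import fftconvolve

def binauralize(mono: np.ndarray, hrir_left: np.ndarray, hrir_right: np.ndarray) -> np.ndarray:
    """Return a (num_samples, 2) binaural signal for one source direction."""
    left = fftconvolve(mono, hrir_left, mode="full")
    right = fftconvolve(mono, hrir_right, mode="full")
    return np.stack([left, right], axis=-1)
```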
IRCAM team: Acoustic and Cognitive Spaces