April 28, 2022 | Article | Creation | News

Judith Deschamps: AI at all levels

Artistic Residencies: The Blog

In the Sound Analysis & Synthesis team, Judith Deschamps is drawing on the combined talents of two researchers, research director Axel Roebel and doctoral student Frederik Bous, to recreate a realistic castrato voice. Here is a brief overview of their work in progress.

Nearly 30 years ago, three members of the Analysis/Synthesis team, Philippe Depalle, Guillermo Garcia, and Xavier Rodet, had already succeeded in creating a 'Virtual Castrato' for Gérard Corbiau's film Farinelli, which they described in a very detailed article. They recorded two singers (with orchestral accompaniment): a countertenor and a coloratura soprano. The principle of the synthesis was quite simple, but it was hard work: each voice corresponds to a register, and the main challenge was to keep a coherent timbre throughout the castrato's range, particularly in the medium register, where one voice hands over to the other. They therefore amassed a large database of notes for each voice, sung by both singers, covering all vowels and three levels of intensity, which they analysed to identify the spectral characteristics corresponding to each timbre. Considering that the castrato's timbre was probably closer to that of the countertenor, they "adapted" the notes that were too high for him, and therefore sung by the soprano, by manually correcting their timbres.

"When you know how they did it," admits Axel Roebel, "the result is all the more impressive!" The only real problem they found with this device, apart from its tediousness, was the feeling of occasionally hearing dynamic discontinuities in the sound.

In this new project, the aim this time is to 'augment' a voice by 'hybridising' it. For passages that fall within its own range, this voice is used as is. For pitches outside its range, the idea is to record the passages concerned at a singable pitch, then transpose and transform them by borrowing the timbre of other voices. Six voices were therefore recorded singing the aria "Quell' usignolo che innamorato" by Geminiano Giacomelli (1662-1740), which Judith Deschamps wants to reconstruct: two children's voices, a soprano, an alto, a countertenor, and a leggero tenor. Of these six voices, the alto was retained, with the others used to fill in the gaps in its range, thanks to neural-network learning systems developed by Axel Roebel and doctoral student Frederik Bous.

"The basic principle," says Axel Roebel, "is to make a deep model learn to reconstitute the timbre of a singer from a given signal and a target pitch, which will then make it possible to transpose the passages that the selected voice cannot reach." The idea seems obvious, but its implementation is much more complex.

To communicate the properties of a given voice to the networks, the researchers use a representation of that voice, condensed and sampled in time as an image, called a 'Mel-scale spectrogram' or 'Mel-spectrogram'. The Mel-spectrogram is easier for neural networks to manipulate because many details of the spectrum that are not relevant to our perception have been removed.
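
To make this representation concrete, here is a minimal sketch of how a Mel-spectrogram is commonly computed with the open-source librosa library. The file name and parameter values are illustrative, not those of the IRCAM pipeline.

```python
# Minimal sketch: computing a Mel-spectrogram with librosa.
# File name and parameters are illustrative, not IRCAM's actual settings.
import librosa
import numpy as np

# Load a mono recording at a fixed sample rate (hypothetical file).
y, sr = librosa.load("alto_phrase.wav", sr=22050)

# Short-time spectra pooled onto a Mel-scaled frequency axis;
# 80 bands is a common choice for neural voice models.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=80)

# Log compression brings the values closer to perceived loudness.
log_mel = librosa.power_to_db(mel, ref=np.max)

print(log_mel.shape)  # (80, n_frames): a time-frequency "image" of the voice
```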

[Photo: Judith Deschamps in the IRCAM studios]


First step: "Using the voice recordings, a neural network was trained to recreate a sound from its Mel-spectrogram," explains Axel Roebel. "By comparing the result with its model, we measure the 'loss' of quality linked to the process, which then allows the neural network to improve itself."
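
As an illustration of this train-by-comparison loop, here is a toy PyTorch sketch: a small network maps Mel-spectrogram frames back to waveform samples, and an L1 distance against the original recording plays the role of the measured 'loss'. The architecture is a deliberately tiny stand-in, not the team's actual model.

```python
# Toy sketch of the reconstruction-loss idea (PyTorch). The model is a
# deliberately tiny stand-in; real neural vocoders are far more elaborate.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVocoder(nn.Module):
    def __init__(self, n_mels=80, hop=256):
        super().__init__()
        # Map each Mel frame to `hop` waveform samples.
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(256, hop, kernel_size=3, padding=1),
        )

    def forward(self, mel):                    # mel: (batch, n_mels, frames)
        x = self.net(mel)                      # (batch, hop, frames)
        return x.transpose(1, 2).reshape(mel.size(0), -1)  # waveform samples

model = TinyVocoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(mel, wav):
    """Recreate the sound, compare it with the original, learn from the gap.
    `wav` holds the original samples, shape (batch, frames * hop)."""
    pred = model(mel)
    loss = F.l1_loss(pred, wav)   # the measured "loss" of quality
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```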

This principle was then reproduced, but with a pair of networks called an "autoencoder". The aim of the first network is to produce not a 'complete' Mel-spectrogram, but a reduced form that represents only the spectral content unrelated to the fundamental frequency of the sound, i.e. the sung pitch. It is said to 'unmix' the pitch from the rest of the information in the Mel-spectrogram. Again, this sounds straightforward, but in reality it is not, since the timbre of a voice also depends on the pitch sung! This gives us what Frederik Bous calls the "residual code" of the sound, which "codes" the timbre, the phoneme, the vibrato, and so on.

This 'residual code' (or rather these residual codes, as the process is repeated for every sound in the database) is then used to train the second neural network, whose job is to reconstitute a complete Mel-spectrogram from the residual code and a given fundamental frequency.

The first phase of the joint training of these two networks is done by feeding the second network exactly the same fundamental frequency as that of the original sample. Here again, this allows the result to be compared with the original sound, giving the networks the possibility of learning from their mistakes and improving themselves (on the one hand refining the production of the residual code, on the other adjusting the recreation of the sound).

By the very way it works, the autoencoder already allows transpositions to be made (by injecting a fundamental frequency different from the original), even though it has never explicitly been trained to do so. But the more we change this frequency, the more the system has to modify the Mel-spectrogram at its output.
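
Under the same toy assumptions as above, the autoencoder just described might look like the following sketch: the encoder squeezes the Mel-spectrogram into a 'residual code' meant to carry everything except pitch, and the decoder rebuilds a full Mel-spectrogram from that code plus an explicit fundamental-frequency contour. Dimensions and layer choices are hypothetical.

```python
# Toy sketch of the pitch-unmixing autoencoder; dimensions are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Produces the 'residual code': timbre, phoneme, vibrato... but no pitch."""
    def __init__(self, n_mels=80, code_dim=32):
        super().__init__()
        self.net = nn.Conv1d(n_mels, code_dim, kernel_size=3, padding=1)

    def forward(self, mel):                  # (batch, n_mels, frames)
        return self.net(mel)                 # (batch, code_dim, frames)

class Decoder(nn.Module):
    """Rebuilds a full Mel-spectrogram from the code plus a target F0 contour."""
    def __init__(self, n_mels=80, code_dim=32):
        super().__init__()
        self.net = nn.Conv1d(code_dim + 1, n_mels, kernel_size=3, padding=1)

    def forward(self, code, f0):             # f0: (batch, 1, frames)
        return self.net(torch.cat([code, f0], dim=1))

enc, dec = Encoder(), Decoder()
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)

def joint_step(mel, f0):
    """First training phase: decode with the ORIGINAL f0, so the output
    can be compared with the original Mel-spectrogram."""
    recon = dec(enc(mel), f0)
    loss = F.l1_loss(recon, mel)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Transposition then amounts to decoding with a DIFFERENT f0:
# transposed = dec(enc(mel), f0 * 2.0)   # e.g. one octave up
```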

"However, modifying also means inventing! And in order to improve these 'inventions' of the system, we can no longer compare them with an existing signal - because no singer is capable of singing the same melody transposed into another range. We therefore lack a "target" for the system. For this purpose, we will use a new network, to which we have taught the untouched voices. It is up to this new network, called the "discriminator", to "criticize" the inventions of our transposition tool by comparing them with what it knows, and to determine whether they are plausible or not.

Again, the two neural networks train each other: the transposer 'transposes' and the discriminator 'criticises', trying to guess whether or not the sounds it is given are the result of a transposition by the former, and if so, to what extent. In this way, the first produces ever more plausible sounds and the second becomes ever more precise in its criticisms: a win-win process.
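
In machine-learning terms this is an adversarial (GAN-style) training scheme. A hedged sketch, continuing the toy models above, might look like this:

```python
# Toy sketch of the adversarial phase: the discriminator learns to tell
# untouched voices from transpositions, and its verdict becomes the
# missing "target" for the transposer. Shapes follow the sketches above.
import torch
import torch.nn as nn

disc = nn.Sequential(                        # per-frame real/transposed score
    nn.Conv1d(80, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv1d(64, 1, kernel_size=3, padding=1),
)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def adversarial_step(real_mel, fake_mel):
    """One round of mutual training between critic and transposer."""
    # Discriminator update: untouched voices -> 1, transpositions -> 0.
    real_score = disc(real_mel)
    fake_score = disc(fake_mel.detach())     # don't touch the transposer here
    d_loss = (bce(real_score, torch.ones_like(real_score))
              + bce(fake_score, torch.zeros_like(fake_score)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Transposer update: try to make the critic answer "real" (1).
    # Backpropagating g_loss through fake_mel would update the autoencoder.
    score = disc(fake_mel)
    g_loss = bce(score, torch.ones_like(score))
    return d_loss.item(), g_loss.item()
```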

As of this writing, this penultimate stage has only just begun. "The hope," says Axel Roebel, "is that the neural network will improve the quality of its transpositions and limit losses until they are undetectable. At present, we are achieving this over two octaves. Ideally, we would like to be able to reach three and a half octaves."

The final step will be to do what was intended from the beginning: to augment the singer by extending her singing to all pitches in a consistent and plausible way. "The principle," explains Frederik Bous, "is actually the same as that of deepfakes for faces and videos." "By proceeding in this way," adds Axel Roebel, "we create a hybrid voice with no break in its timbre across the range of pitches."

