Axel Roebel

Biography

Axel Roebel is the Head of the Analysis and Synthesis of Sound (AS) team at IRCAM (STMS), where he has held the rank of Senior Research Scientist (Directeur de Recherche) since 2017.

He graduated from the University of Hanover in Electrical Engineering (1990) and earned his PhD in Computer Science from the Technical University of Berlin in 1993. In 1994, he joined the German National Research Center for Information Technology (GMD-First) in Berlin, where he conducted research on the use of artificial neural networks for the adaptive modeling of signals generated by non-linear dynamical systems. In 1996, he was appointed Assistant Professor for Digital Signal Processing in the Department of Communication Sciences at the Technical University of Berlin. In 2000, he was awarded a research fellowship at CCRMA (Stanford University), where he began investigating adaptive sinusoidal modeling.

Arriving at IRCAM in 2000 within the Analysis/Synthesis team, Axel Roebel has developed state-of-the-art algorithms for the analysis and transformation of speech and music. He is the author of numerous software libraries resulting from his research, such as SuperVP, an analysis and synthesis engine integrated into many professional audio tools. He was appointed Head of the AS team in 2011 and received his Habilitation (HDR) from Sorbonne University in 2013.

His current research focuses on the development of deep learning techniques for music and voice processing. This includes neural vocoders, the exploration of signal representation and manipulation within latent spaces, and the study of disentanglement strategies in these spaces.

Research topics

Voice processing
  • speech analysis (F0, voiced/unvoiced, glottal source)
  • singing synthesis
  • speech transformation (shape-invariant phase vocoder, extended source-filter speech models (PaN), neural vocoder)
  • singing voice separation
  • deep learning-based speech analysis, processing, and transformation
  • neural vocoders
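As a rough illustration of the F0-analysis topic above, the sketch below estimates the fundamental frequency of a monophonic frame by autocorrelation. This is a textbook baseline only, not the team's analysis software; the function name `estimate_f0` and all parameters are illustrative choices.

```python
import numpy as np

def estimate_f0(signal, sr, fmin=50.0, fmax=500.0):
    """Autocorrelation-based F0 estimate for one monophonic frame.
    A minimal sketch; production F0 trackers handle voicing decisions,
    octave errors, and non-stationarity far more robustly."""
    # Zero-mean the frame, then take the one-sided autocorrelation
    x = signal - np.mean(signal)
    corr = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Search only lags corresponding to the plausible F0 range
    lag_min = int(sr / fmax)
    lag_max = int(sr / fmin)
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sr / lag

# Usage: a 100 ms frame of a 220 Hz sine sampled at 16 kHz
sr = 16000
t = np.arange(int(0.1 * sr)) / sr
f0 = estimate_f0(np.sin(2 * np.pi * 220 * t), sr)
```

The restricted lag search is what keeps the picker away from the zero-lag peak and from sub-harmonics outside the expected voice range.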
Music
  • high-quality signal transformation based on the phase vocoder representation
  • additive signal models using advanced algorithms for the analysis and representation of non-stationary signals, and the development of Pm2, IRCAM's software for sinusoidal analysis/synthesis
  • structured signal models and perceptually pertinent signal descriptors (fundamental frequency, spectral envelope, ...)
  • signal decomposition
  • polyphonic f0 estimation
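To illustrate the phase-vocoder representation mentioned above, here is a minimal time-stretching sketch. It is only a bare-bones classroom version, not SuperVP, which adds shape invariance, transient preservation, and spectral-envelope handling; the function name `time_stretch` and its parameters are illustrative.

```python
import numpy as np

def time_stretch(x, stretch, n_fft=1024, hop=256):
    """Minimal phase-vocoder time stretch: output lasts roughly
    stretch times the input while pitch is preserved."""
    win = np.hanning(n_fft)
    # Analysis STFT frames at a fixed analysis hop
    frames = np.array([np.fft.rfft(win * x[i:i + n_fft])
                       for i in range(0, len(x) - n_fft, hop)])
    # Expected per-hop phase advance of each bin center
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) * hop / n_fft
    # Resampled frame positions: smaller steps -> longer output
    times = np.arange(0, len(frames) - 1, 1.0 / stretch)
    phase = np.angle(frames[0])
    out = np.zeros(int(len(times) * hop + n_fft))
    for k, t in enumerate(times):
        i = int(t)
        mag = np.abs(frames[i])
        # Heterodyned phase increment, wrapped to [-pi, pi]
        dphi = np.angle(frames[i + 1]) - np.angle(frames[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        # Accumulate instantaneous phase and overlap-add the frame
        phase = phase + omega + dphi
        frame = np.fft.irfft(mag * np.exp(1j * phase))
        out[k * hop:k * hop + n_fft] += win * frame
    return out

# Usage: stretch one second of a 220 Hz sine to about twice its length
sr = 16000
x = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
y = time_stretch(x, stretch=2.0)
```

The key idea is that magnitudes are read at the slowed-down frame rate while phases are re-accumulated from the measured per-bin phase increments, which keeps sinusoidal components coherent across synthesis frames.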

Development activities

  • Multi-band Excited WaveNet Neural Vocoder (MBExWN)
  • ISiS: singing synthesis software written in Python.
  • as_pysrc: Python packages for signal processing
  • Deep learning-based signal processing in Tensorflow
  • SuperVP: an extended phase vocoder software allowing high-quality transformations of music and speech signals, and implementing new techniques for spectral envelope estimation and transformation. SuperVP is a cross-platform library that is available in the form of a command-line application (SuperVP), which is used in AudioSculpt and OpenMusic, as well as in the form of a real-time signal transformation module, which is used in Max/MSP and SuperVP-TRaX.
  • VoiceForger: real-time voice transformation library based on SuperVP.
  • Pm2 library and application for analysis/synthesis using advanced sinusoidal signal models
  • MatMTL: a Matlab-compatible C++ template library
  • LibFFT: a support library for cross-platform vectorized FFT calculation

Email : Axel.Roebel (at) ircam.fr


Curriculum Vitae

University Degrees

2013: HDR (Habilitation) in Computer Science, Pierre and Marie Curie University (Paris VI), France
1990-1993: Dr.-Ing. in Computer Science, Technical University of Berlin, Germany
1983-1990: Dipl.-Ing. in Electrical Engineering, University of Hanover, Germany

WORK

01.2011- : Head of the Analysis-Synthesis Team, IRCAM (STMS)
(Senior Research Scientist / Directeur de Recherche since 2017)
01.2008-12.2011: Deputy Team Leader, Analysis-Synthesis Team, IRCAM
10.2000-12.2007: Researcher and Developer, Analysis-Synthesis Team, IRCAM
04.2006-07.2006: Edgar Varèse Guest Professor, Electronic Studio, Technical University of Berlin
04.2000-09.2000: Invited Researcher, Center for Computer Research in Music and Acoustics (CCRMA), Stanford University, USA
01.1996-09.2000: Assistant Professor, Communication Science, Technical University of Berlin
08.1994-12.1995: Postdoc, GMD FIRST, Berlin

Projects

2025-2029: CAP AI-MADE, Collaborative Acceleration Program (CAP) in AI within the PostGenAI@Paris cluster of excellence (France 2030 investment program).
2023-2027: ANR project EVA, Explicit Voice Attributes.
2023-2025: Project DeTOX, combating deepfake videos of French public figures.
2023-2026: ANR project BRUEL, development of a methodology for evaluating voice-based identification systems.
2023-2026: ANR project ExoVoices, Virtual Storytelling for Kids: Expressive and Cognitive Aspects of Voice Synthesis.
2020-2024: H2020 project AI4Media, deep learning for media production.
2020-2024: ANR project ARS, Analysis and tRansformation of Singing style.
2017-2022: H2020/ERC project IRiMaS, Interactive Research in Music as Sound. Consulting and collaboration on signal processing methods for music analysis.
2018-2021: ANR project TheVoice, voice creation for media content production. Supervision of a PhD thesis on deep learning-based voice conversion.
2014-2017: ANR project Chanter, real-time controlled digital singing. Coordination of WP2 on text-to-chant synthesis.
2012-2015: ANR project Physis, physically informed and semantically controllable interactive sound synthesis. Coordination of WP3 on low-level sound representation.
2011-2015: FP7-ICT-2011 project 3DTVS, 3DTV Content Search. Coordination of WP4 on 3D audio and multimodal content analysis and description.
2010-2013: ANR project Sample Orchestrator II, hybrid sound processing and interactive arrangement for new-generation samplers. Coordination of WP2 on structured instrument models and signal transformations.
2000: DFG project RO 2277/1-1, adaptive additive synthesis of non-stationary sounds. Research scholarship at CCRMA.

PhD Students

Directed or Co-directed

2023- : Diego Torres, Neural conversion of voice attributes (co-directed by N. Obin)
2023- : Maximino Linares, Musical instrument audio synthesis via physics-informed neural networks (co-directed by T. Hélie)
2023- : Simon Rouard, Control and adaptation of deep learning models of music generation
2023- : Mathilde Abrassart, Voice identity conversion with DNN for the simulation of voice identity usurpation attacks (co-directed by N. Obin)
2023- : Théodor Lemerle, Text-to-speech synthesis for expressive storytelling (co-directed by N. Obin)
2021- : Lenny Renault, Deep learning-based generation of high-quality music from symbolic music representation
2019-2023: Frederic Bous, Voice synthesis and transformation with DNN
2019-2023: Yann Teytaut, Speech and singing alignment and style analysis with DNN
2019- : Antoine Lavault, Drum synthesis with DNN
2019-2023: Clement Le Moine Veillon, Expressive speech transformation with DNN
2016-2019: Hugo Caracalla, Sound texture synthesis from summary statistics, Sorbonne University, 2019
2016-2019: Céline Jacques, Machine learning methods for drum transcription (in French), Sorbonne University, 2019
2014-2017: Luc Ardaillon, Synthesis and expressive transformation of singing voice, UPMC, 2017
2012-2015: Wei-Hsiang Liao, Modelling and transformation of sound textures and environmental sounds, UPMC, 2015 (co-directed with X. Rodet, IRCAM, and A. Su, NCKU Taiwan)
2011-2015: Stefan Huber, High-quality voice conversion by modelling and transformation of extended voice characteristics, UPMC, 2015 (co-directed by X. Rodet)
2012-2015: Henrik Hahn, Expressive sampling synthesis: learning extended source-filter models from instrument sound databases for expressive sample manipulations, UPMC, 2015 (co-directed by X. Rodet)

Supervised

2009-2012: Marco Liuni, Automatic adaptation of sound analysis and synthesis, UPMC, 2012, PhD directors X. Rodet and M. Romito
2006-2010: Fernando Villavicencio, High-quality voice conversion, UPMC, 2010, PhD director X. Rodet
2007-2010: Gilles Degottex, Glottal source and vocal-tract separation, UPMC, 2010, PhD director X. Rodet
2003-2008: Chunghsin Yeh, Multiple fundamental frequency estimation of polyphonic recordings, UPMC, 2008, PhD director X. Rodet

PhD/HDR Jury

2025: Louis Bahrmann (PhD), Acoustics-aware hybrid deep neural dereverberation (Reviewer and Examiner)
2024: Leonardo Fierro (PhD), Audio decomposition for time stretching (Reviewer)
2024: Morgan Buisson (PhD), Deep learning methods for music structure analysis: addressing data scarcity and ambiguity (Examiner)
2022: Ajinkya Kulkarni (PhD), Expressivity transfer in deep learning-based text-to-speech synthesis (Examiner)
2022: Merlijn Blaauw (PhD), Modeling timbre for neural singing synthesis (Reviewer and Examiner)
2022: Grégoire Locqueville (PhD), Voks: a vocal instrument family based on syllabic sequencing of vocal samples (Examiner)
2022: Javier Nistal (PhD), Exploring generative adversarial networks for controllable musical audio synthesis (Examiner)
2020: Muhammad Huzaifah (PhD), Directed audio texture synthesis with deep learning (Reviewer and Examiner)
2020: Alexandre Défossez (PhD), Optimization of fast deep learning models for audio analysis and synthesis (Reviewer and Examiner)
2019: Alexey Ozerov (HDR), Contributions in audio modeling for solving inverse problems: source separation, compression, and inpainting (Reviewer and Examiner)
2019: Alice Cohen-Hadria (PhD), Estimation of musical and sound descriptions by deep learning (Examiner)
2019: Clément Laroche (PhD), Dictionary learning and orthogonal decomposition for harmonic/percussive source separation (Examiner)
2017: Benjamin Cohen-Lhyver (PhD), Modulation of head movements for the multimodal analysis of an unknown environment (Examiner)
2013: Ricard Marxer (PhD), Audio source separation for music in low-latency and high-latency scenarios (Reviewer and Examiner)
2013: Saso Musevic (PhD), Non-stationary sinusoidal analysis (Reviewer and Examiner)
2013: Alexis Moinet (PhD), Slowdio: audio time-scaling for slow-motion sports videos (Examiner)

Publications

Articles and Theses

Reports and working papers