On exhibit from October 4, 2024 to February 2, 2025 at the Serpentine North Gallery in London, as part of a collaboration between artists Holly Herndon and Mat Dryhurst and Serpentine Arts Technologies, The Call is a spatial audio installation built in two symmetrical parts, each containing a microphone and separated by a curtain. The interactive installation allows visitors to generate English sacred choral music by singing their own songs, improvised or not. Using deep-learning technologies and drawing on a vast dataset gathered especially for the project, The Call also invites reflection on the use of AI and its impact on our society. In this first instalment of a two-part article, artists Holly Herndon and Mat Dryhurst and computer music designer Robin Meier (accompanied by Matéo Fayet) share their process and insights.
Holly, Mat: What are your views on using artificial intelligence for artistic purposes?
Holly Herndon & Mat Dryhurst: AI systems are another tool for making artwork. It is possible to use various models for ideation, as a generator or an effect, in a more familiar workflow. We are interested in building our own models and ways of interacting with them, as it feels like we have not yet scratched the surface of what wide adoption of these technologies will mean for culture and art. Models are something in between a participatory game, an instrument, an archival medium and a costume to wear. It feels important to understand how they work and what more can be demanded of them in service of new kinds of art.
With this in mind, how does this particular project approach the subject of AI?
H.H. & M.D.: Our initial idea was to create a training songbook and to record choirs across the United Kingdom singing it, in service of producing a dataset. We then wanted to find a way to express that dataset by using an AI model to generate choral music. With our project The Call, we are therefore exhibiting a training protocol. We have tried to influence every dimension of creating an AI model. We first wrote a training songbook that covers the full phonetic spectrum of the English language, which means that if you sing all the songs, you will have created enough data for the generative AI model to train comprehensively on your voice.
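By way of illustration, one way to verify that a songbook covers every sound of English is to check its lyrics against a pronouncing dictionary. The sketch below is hypothetical and not part of the project's actual tooling; it assumes the CMU Pronouncing Dictionary accessed through NLTK, and the lyrics file name is a placeholder.

```python
# Hypothetical sketch: check whether a songbook's lyrics cover every
# phoneme of English, using the CMU Pronouncing Dictionary via NLTK.
# (Illustrative only -- not the tooling actually used for The Call.)
import re
import nltk

nltk.download("cmudict", quiet=True)
from nltk.corpus import cmudict

PRONUNCIATIONS = cmudict.dict()

# The 39 ARPAbet phonemes used by the CMU dictionary (stress digits stripped).
ALL_PHONEMES = {
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH", "EH", "ER",
    "EY", "F", "G", "HH", "IH", "IY", "JH", "K", "L", "M", "N", "NG", "OW",
    "OY", "P", "R", "S", "SH", "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
}

def phoneme_coverage(lyrics: str) -> set[str]:
    """Return the set of ARPAbet phonemes occurring in the lyrics."""
    covered = set()
    for word in re.findall(r"[a-z']+", lyrics.lower()):
        for pronunciation in PRONUNCIATIONS.get(word, []):
            covered.update(re.sub(r"\d", "", p) for p in pronunciation)
    return covered

songbook = open("songbook_lyrics.txt").read()  # placeholder file name
covered = phoneme_coverage(songbook)
missing = ALL_PHONEMES - covered
print(f"covered {len(covered)}/39 phonemes; missing: {sorted(missing)}")
```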
Robin Meier: This songbook was created with the help of Ken Deguernel, a former PhD student at IRCAM who now works in the Algomus research team at the University of Lille. For this project, he collaborated with another researcher, Mathieu Giraud, and one of their students to develop an AI model that generates symbolic music. This model was trained on a dataset made up of popular songs, including The Sacred Harp, a collection of sacred choral music that originated in New England and was carried on in the American South in the 19th century.
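To give a sense of what "generating symbolic music" means in its most stripped-down form, here is a toy sketch: a first-order Markov chain over MIDI pitches, trained on a two-melody corpus. The corpus and every name in it are invented for illustration; the model built for The Call is far more sophisticated.

```python
# Hypothetical sketch of symbolic music generation: a first-order
# Markov chain over MIDI pitches, trained on a toy corpus of melodies.
# (Illustrative only -- much simpler than the model built for The Call.)
import random
from collections import defaultdict

# Toy corpus: each melody is a list of MIDI pitch numbers.
corpus = [
    [60, 62, 64, 65, 67, 65, 64, 62, 60],
    [60, 64, 67, 72, 67, 64, 60],
]

# Count pitch-to-pitch transitions observed across the corpus.
transitions = defaultdict(list)
for melody in corpus:
    for current, following in zip(melody, melody[1:]):
        transitions[current].append(following)

def generate(start: int, length: int) -> list[int]:
    """Sample a new melody by walking the learned transition table."""
    melody = [start]
    for _ in range(length - 1):
        choices = transitions.get(melody[-1])
        if not choices:
            break
        melody.append(random.choice(choices))
    return melody

print(generate(start=60, length=12))
```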
Because we intended to use this collection to train another generative AI model, this time for choirs, developed by the Sound Analysis-Synthesis team (which includes the researcher Nils Demerlé), several questions arose: "What needs to be sung for this model to work? How can we create a relevant dataset for learning choirs, harmonies and timbres? And, more generally, knowing that everything produced today (songs, texts, videos) will one day end up in a database to be devoured by automated systems, how can we feed the machine to make it produce exactly what we want?"
H.H. & M.D.: We used the recordings we made of choirs from across the UK to train a suite of new models. We had previously worked on a vocal timbre transfer tool¹, but it only allowed for monophonic input and output, so working with IRCAM's research teams to make a polyphonic tool was a huge breakthrough. Together, we built an interactive model where a participant can sing through the various choirs we recorded. There is also a prompted music model that generates entirely new choral compositions from the same training data we captured.
In choosing to work with amateur choirs on this project, was it important for you to address the various issues raised by collecting such a massive amount of data?
H.H. & M.D.: The Call is an opportunity for us to continue our reflection on the origin of datasets by working with Serpentine on a system that lets participating choirs be co-owners of their own data and determine its future.
Because large AI models require a lot of data to operate properly, there is an unavoidable collectivity to them. We are trying to encourage imagination about what it means to contribute to a project that is greater than yourself, for the benefit of everyone. Emergence, or something being created that is greater than the sum of its parts, is a concept native to both AI and choral music; in both cases, it is the Holy Grail. It is easier to experiment with these ideas in the arts than it would be with sensitive health data, but the question of how to govern and distribute the potential bounties of AI models feels important to address.
R.M.: It is important to address the question of who owns the dataset and who owns what is generated with it. Holly and Mat have adopted a critical approach to technology, both in their discourse and in their practice. I usually distinguish between two categories of projects: those that are created with AI and those that take AI as their subject. The Call combines both, using state-of-the-art technology while reflecting on the ethical and creative dimensions of the project.
H.H. & M.D.: We have been trying to advocate that the process of training a model on data in a specific context is a new way of making art. Even though the models that were developed are simply playing back the fruits of that dataset, allowing people to sing through it and join the chorus takes it to another level. We feel some sense of ownership over the process, but what is interesting about training and sharing models is that the art, in a sense, belongs to everyone who contributed to the data and the instrument, and to all those who use the model. This is a different way of working!
Interview conducted by Jérémie Szpirglas
Photos: Holly Herndon and Mat Dryhurst conducting a recording session with London Contemporary Voices in London, 2024. Courtesy: Foreign Body Productions.
1. Timbre transfer refers to the automatic generation of a waveform that reproduces the characteristics of a source while changing its timbre (for instance, taking a violin recording as input and producing a flute recording as output).
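As a rough illustration of how such a tool is typically applied, the sketch below assumes a neural timbre-transfer model exported as a TorchScript file, in the style of IRCAM's publicly distributed RAVE exports. The file names and tensor shapes are assumptions for illustration, not the actual models built for The Call.

```python
# Hypothetical sketch: applying a neural timbre-transfer model that has
# been exported as a TorchScript file (e.g. in the style of IRCAM's RAVE
# exports). File names and shapes are assumptions, not The Call's models.
import torch
import soundfile as sf

# Load an exported model; "choir.ts" is a placeholder file name.
model = torch.jit.load("choir.ts").eval()

# Read a mono voice recording and shape it as (batch, channels, samples).
audio, sample_rate = sf.read("my_voice.wav", dtype="float32")
x = torch.from_numpy(audio).reshape(1, 1, -1)

# The forward pass re-synthesises the input with the learned timbre.
with torch.no_grad():
    y = model(x)

sf.write("my_voice_as_choir.wav", y.squeeze().numpy(), sample_rate)
```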