REACH: journey to human-machine co-creativity
Gérard Assayag, Marc Chemillier © Jeff Joly
Let's go back to the very beginning of the project: how did the idea of improvising with a machine come about?
It all started when Shlomo Dubnov, an Israeli-American researcher, joined the Musical Representations team at IRCAM as a post-doc at the end of the 1990s. Shlomo's specialty was information theory. He was particularly interested in the notion of "musical style", meaning the idiosyncrasies that characterize a given musical genre, musician, or composer. His idea was that the computational principles of modern information theory could capture some of the structures of this "musical style". At the time, our team was entirely devoted to creation through computational tools: first for composition, with OpenMusic, then increasingly for creation on the fly, through interaction and improvisation, with the precious contributions of Carlos Agon. Marc Chemillier, a mathematician and anthropologist, was also part of the team. Steeped in jazz culture, he had another obsession: getting the machine to reproduce swing, together with the harmonic complexities of jazz. A fourth person was decisive for the project: Georges Bloch, who did not like the MIDI sound of the first experiments and developed the very first audio components of the software. For my part, I was developing the idea of a creative artificial intelligence based on generative algorithms and autonomous agent architectures. I had the opportunity to bring into computer music a formalism from genome research that enables the statistical modelling of complex symbolic sequences: the "factor oracle". It would become the central model of active musical memory, for both listening and generation, across the whole series of improvisation software. All the elements were in place for the birth of OMax. The four of us became the "OMax Brothers", the founding group of this paradigm of improvisation and technology, which runs through the entire software genealogy up to the most recent Dicy2 and Somax2, via ImproteK (which became Djazz).
How does OMax work?
A metaphor I like to use is that of the map and the territory. The idea is this: a musical style is like a territory. In classical music, for example, the different tonalities would be so many landscapes, the rhythms would be features of the relief, and so on. When a musician plays or improvises, they take a path through this landscape and, in doing so, reveal a part of it, necessarily a limited one. From the paths taken by the musician, OMax tries to draw a map of the entirety, or the potential entirety, of the territory covered. This is how OMax works: from one instance among thousands, it infers the style, or rather the global structure that gives an idea of the style. Then it plays. In other words, it takes its turn walking through the map it has drawn. This explains why what the computer plays resembles what it has been fed, while forming new and coherent variations. The concert version of OMax was later rewritten and optimized by Benjamin Lévy as part of his doctoral thesis.
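To make the map metaphor concrete, here is a minimal sketch of the factor oracle mentioned earlier, applied to a sequence of symbols standing in for notes. It is an illustration under assumptions, not the OMax implementation: the function names `build_factor_oracle` and `improvise`, the `continuity` parameter, and the jump rule are hypothetical, and real musical material would first have to be segmented into symbols. The construction follows the standard online factor oracle algorithm; the "walk" simply alternates between continuing along the original path and jumping back through a suffix link to an earlier place that shares the same recent context.

```python
import random


def build_factor_oracle(seq):
    """Build a factor oracle incrementally: the 'map' drawn from one path."""
    n = len(seq)
    trans = [dict() for _ in range(n + 1)]  # forward transitions per state
    sfx = [-1] * (n + 1)                    # suffix links (state 0 has none)
    for i, sym in enumerate(seq):
        new = i + 1
        trans[i][sym] = new                 # the path actually played
        k = sfx[i]
        # Add shortcut transitions from earlier states that lack this symbol.
        while k > -1 and sym not in trans[k]:
            trans[k][sym] = new
            k = sfx[k]
        sfx[new] = 0 if k == -1 else trans[k][sym]
    return trans, sfx


def improvise(seq, sfx, length, continuity=0.8, seed=0):
    """Walk the map: mostly follow the original path, sometimes jump.

    `continuity` (an assumed parameter) is the probability of continuing
    linearly when a jump through a suffix link is available.
    """
    if not seq:
        raise ValueError("cannot improvise on an empty sequence")
    rng = random.Random(seed)
    n = len(seq)
    state, out = 0, []
    while len(out) < length:
        # Jump when the path has ended, or occasionally when a link exists.
        if state == n or (sfx[state] > 0 and rng.random() > continuity):
            state = sfx[state]
        out.append(seq[state])
        state += 1
    return out


notes = list("abab")
trans, sfx = build_factor_oracle(notes)
print(sfx)
print("".join(improvise(notes, sfx, 12)))
```

Because a suffix link points to an earlier position that ends with the same material, a jump recombines fragments at points where they share a context, which is one way to read "new and coherent variations". This sketch navigates with the suffix links only; the forward transitions in `trans` are built but not used by `improvise`.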
Somax2
You talk about "genealogy". Aren't the following programs successive improvements of OMax?
No, because improvisation has many facets and OMax only reproduces some of them. At the beginning, it was very basic. Of course, OMax listens in order to learn, but when it plays, it does so obstinately, without listening to the musician. It walks around in its model without adapting to changes in its sound environment. It does not interact in the true sense of the word. The other programs were defined in relation to the shortcomings of OMax. Two main ones were quickly identified. The first shortcoming is planning. Improvisation can take place in two contexts: idiomatic (jazz, baroque, etc.) or non-idiomatic (totally free). In the first case, it must follow a temporal scenario that can be quite rigid. This is what gave rise to ImproteK and then Djazz, which produce a discourse that depends on idiomatic constraints (chord progression, pulse, rhythm, etc.). The second shortcoming is reactivity. When OMax wanders through its map, it is very aware of the structures of the territory it is exploring: it listens to itself and makes its own choices. This is self-listening. But it is deaf to the outside world. What it produces is independent of the immediate context.
« If we want the software to react, we need true external listening, which influences it instantly. This is what Somax does: it constantly tries to reconcile its map (learned from the music we feed it) and that inferred from the musician's playing, to weave a coherent path between the two. »
We were inspired by the principle of cortical maps from neuroscience, an idea introduced by Laurent Bonnasse-Gahot, then a post-doc in the team, who worked on the very first versions of Somax. If someone whistles the beginning of a tune I know, it immediately "activates" several regions of my memory, which allows me to continue the tune, or at least to anticipate what will follow, and sometimes to create a variation of the original. This is exactly how Somax works: when it recognizes a pattern, it looks in its map for landscapes that are close to it, activates them (like neurons "lighting up" in the brain) and then walks around these "loci". And it can do this on every dimension of the discourse: melody, harmony, rhythm, timbre, intensity, etc.
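The activation idea can be illustrated with a toy model. The sketch below is not Somax code: the class `ActivationMemory`, its `order` and `decay` parameters, and the rule that peaks fade and move forward at each step are all assumptions chosen for illustration, on a single dimension (note names). It shows how places in a learned memory can "light up" when a pattern played by the musician is recognized, and how accumulated activation favors the place that best matches the recent context.

```python
from collections import defaultdict


class ActivationMemory:
    """Toy memory whose regions 'light up' when a heard pattern is recognized."""

    def __init__(self, corpus, order=2, decay=0.5):
        self.corpus = list(corpus)
        self.order = order      # length of the pattern to recognize
        self.decay = decay      # how fast old activations fade
        self.history = []       # what the musician has played so far
        self.activation = {}    # corpus position -> activation level
        # Index every pattern of `order` events to the positions that follow it.
        self.index = defaultdict(list)
        for i in range(order, len(self.corpus)):
            self.index[tuple(self.corpus[i - order:i])].append(i)

    def listen(self, event):
        """Update activations after hearing one event from the musician."""
        self.history.append(event)
        # Existing peaks fade and move one step forward in the memory.
        shifted = {}
        for pos, level in self.activation.items():
            if pos + 1 < len(self.corpus):
                shifted[pos + 1] = level * self.decay
        # Every place where the latest pattern occurs receives a fresh peak.
        pattern = tuple(self.history[-self.order:])
        for pos in self.index.get(pattern, []):
            shifted[pos] = shifted.get(pos, 0.0) + 1.0
        self.activation = shifted

    def respond(self):
        """Return the event at the most activated position, or None."""
        if not self.activation:
            return None
        # Highest activation wins; ties go to the earliest position.
        best = max(self.activation, key=lambda p: (self.activation[p], -p))
        return self.corpus[best]


corpus = ["C", "D", "E", "F", "G", "D", "E", "A"]
memory = ActivationMemory(corpus)
for note in ["G", "D", "E"]:
    memory.listen(note)
print(memory.respond())
```

In this example the pattern "D E" occurs twice in the memory, but because the musician arrived at it through "G", the second occurrence has accumulated more activation, so the response is "A" rather than "F".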
A Somax2 installation in which autonomous agents listen to one another and converse in a tribute to the great masters of the 20th century
Can we say that the last moments of the musical discourse almost provide it with the "script" it must follow, as in ImproteK?
No. However, this is what Dicy2 does, as it introduces the notion of "micro-scenario". The Dicy2 project was born from the desire to combine the functionalities of OMax, Somax and ImproteK in a single piece of software. This project was led by Jérôme Nika during his post-doc. However, this was too optimistic, and it finally resulted in... a new interaction paradigm, implementing the notion of reactive micro-scenario. This rather compositional approach (the scenarios have to be written) makes it possible to set up sophisticated interactions. These require a certain amount of preparation and are therefore less spontaneous in improvisation than Somax, but they have their own particular qualities.
So the four programs have different functionalities?
Each speaks to a particular paradigm of attention and interaction, like a distinct aspect of the improvising brain, and they can be used in parallel within the same musical session to better replicate the complexity of human behavior. These programs are hyper-specialized in their functionality and interface.
Like cars: racing, rally or touring cars all drive, but they do not do the same things.
Exactly, and they are each excellent in their own field.
What are the different areas of research today?
Three possible avenues are emerging for the future. The first was actually initiated in the 1980s by George Lewis with Voyager. Today, the improvising agents in our software do not view themselves as a collective. In Voyager, generative agents were socially organized, with coordinating agents helping to make decisions and to reconcile sometimes contradictory visions. Today's developments in AI may enable us to go further by implementing this social dimension through neural learning mechanisms, especially deep learning, which would make it possible to learn behaviors that are difficult to specify or formalize. This principle would be extremely useful for developing several aspects, in particular for giving interactions a spatial dimension or a more "instrumental" character, a dimension that Mikhaïl Malt is exploring in the team. By addressing these higher-level agents, we could control complex behaviors in a simple way and make computer performance more dynamic. The second line of research concerns the autonomy of improvising agents. Today, the machine listens to signal A and deduces signal B, but it is the person operating the software who sets the main aesthetic directions and redirects the machine if the result is not satisfactory. How can we ensure that this signal B has musical interest of its own? This can be done through what is called reinforcement learning. This is the method that has made computers very good at games such as chess and Go. But, to do this, you have to be able to give the machine an idea of the quality of what it produces!
Voyager software used by George Lewis
« In games, one either wins or loses, and this is what conditions the reinforcement of favorable behaviors. But how do we define a "winning strategy" in art without closing off aesthetic possibilities that we would like to keep open indefinitely? On what objective criteria of judgment can we rely? »
One of our ideas is that the musician's reaction to the machine can help: for example, if they repeat the machine's proposals, it is perhaps because they like them... But how can we quantify this? The third field of research is still somewhat theoretical and technical. So far, we have explored statistical modeling systems, using previously constituted memory sets. We are performing a kind of very coarse-grained granulation, which allows us to capture many of the musical and instrumental articulations in the material learned by the machine. So we do not need to reinvent that content: what we invent are the new trajectories. Deep neural learning, on the other hand, makes it possible to interpolate in a continuous space (a "latent space") and thus to create objects that are literally "unheard of", but that have more difficulty achieving global coherence and satisfactory sound quality. Can't we have the best of both worlds and make these two systems work together? This is the subject of intensive research within the REACH project, which today brings together all our research on musician-machine co-creativity. REACH puts forward the notion of co-creativity as a phenomenon of "emergence" in the complex system formed by the musician and the machine, a system in which each constantly listens to and learns from the other and develops an adaptive path. Joint musical forms thus appear that cannot be reduced to the action of one or the other but exist only in the fleeting existence of the interaction. It is a whole new universe that is now offered to the imagination.
Joëlle Léandre © Geert Vandepoele
Interview by Jérémie Szpirglas, writer