On Composing Interactive Music: let's time-travel to 1993...

Interactive Music


In 1993, I was working on my master's degree at New York University's Interactive Telecommunications Program (ITP) while simultaneously running Viacom New Media's Interactive Music Lab.  Viacom is the parent company of MTV and Nickelodeon, and the New Media group was mainly making video games.  The interactive music lab was a little skunkworks R&D lab doing demos and prototypes.  Our mission was to create new interactive music experiences.  Mostly, the work involved moving around the screen to create melodies and rhythms, set loops into motion, and change harmonies and timbres.  I had been doing similar work at ITP, but at Viacom we had a budget for equipment and the opportunity to collaborate with artists such as George Clinton, Vernon Reid, the Beastie Boys, and others.

For example, I did a thing called the tritone game, based on the concept of a tritone substitution in jazz. There was a ii-V7-I progression playing in a loop while on screen we had a game, kind of like Pong.  One player tried to flip the key into F# and the other tried to flip it back to C.  Very few people got that one.  There was another one called HipHoppera, where you arranged virtual musicians on a stage and moved them around to make them perform different parts.  We had a few that were a lot like what eventually became Guitar Hero, which came along a few years later. They would allow you to follow a path to trigger notes, but were more branching and improvisational and without the plastic guitars.

This was all very new and exciting stuff at the time, combining MIDI and samples and computer animation and interactive realtime control. It all just barely worked and the potential applications were wide open for imagination.  We needed some sort of conceptual framework to guide us on the path forward.  If I recall correctly, I originally wrote the article On Composing Interactive Music for school, and then brought it to the lab and revised it. It helped explain to people in the company, and to artists we wanted to work with, what it was we were up to.

More recently I've been working on a project called the Global Jukebox.  It's based on the scholarship of musicologist Alan Lomax. You can explore folk music from hundreds of different cultures from around the world and learn about how the characteristics of the music relate to the culture.


1993 article: On Composing Interactive Music  

Concept and Background

Interactive music is a field which is still in its infancy but is becoming more widespread as multimedia technology is more capable of supporting it. There are many approaches one can take when creating an interactive music piece, and there are many issues one must take into consideration when developing an approach to a project. Often music in a multimedia project takes a secondary importance to the other elements such as the graphics, action, or narrative. But music can be produced to be much more contextual and meaningful. Indeed, a whole new genre of applications can be created in which music is the driving force. These may be dubbed participatory music environments. In such environments the user or player controls the music in a virtual world rich with visual and spatial cues to reinforce the musical actions.

Image courtesy of gingaparachi

When composing music for an interactive application, the goal is to tap the inherent ability of music to express emotion, evoke a sense of drama, and communicate a story. One should take advantage of many musical devices, such as tension, release, consonance, dissonance, rhythm, tempo, timbre, voice, lyric, structure, repetition, variation, and a variety of other elements, and adapt these elements to an interactive context. A key element of my approach to interactive music development has been that the environments are real-time simulation-based, as opposed to the many so-called "interactive" database-browser or slide show-type products currently available.

My work has been focused in two main areas: perceptual and technological. On the perceptual side, the issues are knowing what elements in music people will respond to emotionally and intuitively, what are appropriate musical parameters for user control and for "smart" computer control, what are appropriate methods of input control and visual feedback, how can one design a musical environment so that users can easily identify their contribution to an interactive composition and have a sense that they are "making it happen", and what are appropriate metaphors for the role of the user(s) in a participatory musical experience. On the technological side, we are simply interested in finding and developing the proper tools to build interactive music environments in terms of hardware platforms, controllers and interface devices, software operating systems and authoring environments, musical data protocols, audio standards, and sound generation gear (synthesizers and samplers).

To a musician, all music is interactive in the sense that one is an active participant in the creation of the music, even when that participation involves only listening. To an audience, the main difference between experiencing a live musical performance and listening to a recording is the sense of involvement and interaction with the music, the crowd, and the musicians. The knowledge that the music is in the future and unrealized adds a sense of excitement and unpredictability.

Improvisation is a major feature of musical performances in many genres. Musicians typically improvise off of a composed piece, or a composition may have parts in it that have room for varying degrees of improvisation. In either case, the piece being improvised can be thought of as a musical "space", or non-linear domain, (bounded by the parameters of the composition, style, and so on) with each performance being a unique instance of expression in that realm, a squiggly line that maps a "fly-through" of that musical space. The musicians' moment-to-moment decisions shape this line in real time and these decisions are influenced by many simultaneous factors: the global parameters of the tune they are playing, the music being made by the musicians they are jamming with, the mood and expectations of the audience, their own mood, and perhaps the desire to make a specific statement or reach a particular musical destination. The audience can be a direct participant in this process by providing context, response, and collective influence for the musicians. Indeed, many performers make the audience active participants in their live shows. Similarly, the goal in an interactive music piece is to make players active participants in the creation and direction of the music through their actions in the environment.

Conceptual Models

A genre that has been a great source of inspiration is animated cartoons. In many great classic cartoons, the entire action proceeds from the music, as does the pacing, tone, and choreography. In the best ones, the soundtrack is so well crafted that the line between the score and sound effects is indistinguishable. Furthermore, many cartoons are overtly musical in their themes and actions, or proceed directly from the music as the source of inspiration for the rest of the work. They stand as important examples of the use of a musical score as the basis for a multimedia production, in which the scores closely support carefully choreographed actions and themes. Indeed, the genres of cartoons and video games may ultimately merge into a single art form.

An examination of video games can provide us with some other conceptual models. Video games in general have solved many of the problems one faces in the areas of interface, point of view, graphic representation of abstract data, and the user's identification of characters and situations in a complex, simulated environment. Furthermore, the experience of playing a video game can be strikingly similar to that of playing music. Each requires a high level of control over some physical instrument (such as a joystick or saxophone) with reactions based on recognizing where one is in the moment, and in each case the player relies on a combination of learned patterns and contextually appropriate improvisation for success.

Various possibilities have been suggested for adapting different video game scenarios to an interactive music context. One of these is the flight simulator or navigable Three-Space model, which also includes some racing and combat games. Environments like these are often real-time and simulation-based, which fits in well with my approach to interactivity in music. There is a strong parallel between a multi-dimensional virtual space and an abstract musical "space" with different axes corresponding to different musical parameters.

Quest-oriented adventure games also offer some useful insights. Many allow multiple players to work together to achieve a common goal of fighting a common enemy. This concept can obviously be extended towards multiple players controlling multiple musical elements, contributing to a single harmony. Adventure games usually also have a map or other representation of the game space. Each room in the game space represents an encounter in the adventure, and the local environment defines the parameters of the encounter and influences the outcome. The sequence of the encounters is influenced by their relative locations and often by the necessity of solving puzzles in order to proceed into a new area. Players acquire skills and items that enable them to carry out their quest. In the same way, one could construct an environment with different "rooms" that represent musical themes and encounters that amount to playing a song or part of a song, while players carry with them items that enable them to complete certain melodies and move into new musical territory.

Existing sport and action games already feature interactive sound to a degree, although not usually in a musical context. Objects in the environment may produce sound effects, or alter the tone or tempo of the background music, or trigger a segue to a different piece of music. Many activities in these games are inherently rhythmic, such as running, jumping, bobbing and weaving, or dribbling a ball. There is a tremendous potential to exploit these types of on-screen movements to musical ends.

Issues in Composing Music for Interactivity

There are many issues which must be considered when creating music for an interactive project, especially if one is to deliver continuous control in a real-time participatory environment. First among these is the problem of timing and resolving user input to musically consistent events. If an interface allows a user to trigger an event at any moment, the timing of that event usually must be evaluated in terms of the pulse of the music (bars, beats, and so on). Often a delay will be required (such as waiting until the next downbeat), so the activation of the event does not throw the music out of time, and some form of interim feedback must be provided. Conversely, the computer will sometimes have to anticipate an expected input that may arrive late. Both contingencies must be provided for in the music and the interface.
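As a rough illustration of the "wait until the next downbeat" idea, here is a minimal Python sketch. It assumes a fixed tempo and meter and measures time in seconds from the start of the music; the function names and the returned dictionary are made up for this example and are not from the original work.

```python
import math

def next_boundary(event_time, tempo_bpm, beats_per_bar=4, grid="bar"):
    """Return the time (in seconds) of the first beat or bar line
    at or after event_time, assuming a constant tempo."""
    beat_len = 60.0 / tempo_bpm
    step = beat_len * beats_per_bar if grid == "bar" else beat_len
    # The small epsilon keeps an input that lands exactly on the grid
    # from being pushed to the following boundary by rounding error.
    return math.ceil(event_time / step - 1e-9) * step

def schedule(event_time, tempo_bpm, beats_per_bar=4):
    """Decide when a user-triggered event should actually sound.
    'wait' is how long interim feedback would need to cover."""
    fire_at = next_boundary(event_time, tempo_bpm, beats_per_bar)
    return {"fire_at": fire_at, "wait": fire_at - event_time}
```

At 120 beats per minute in 4/4, a click at 2.7 seconds would be held until the downbeat at 4.0 seconds, and the interface has 1.3 seconds in which to show the player that the input was received.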

Similarly, segues and transitions between different themes must be handled with consideration. The music as a whole must "hang together", and jarring or abrupt changes from one segment to another (caused perhaps by a global change to the screen environment) can be very disruptive. A musical phrase often needs to be resolved before a new one can be introduced. In general, themes that can be juxtaposed will have to be composed so that they dovetail together. The general timing issues already mentioned apply here as well.

A third issue is that of depth. Interactive music is by definition non-linear, so a composer will have to write significantly more music than the intended length of the musical experience if the composition is intended to be heard repeatedly in various ways. Additionally, composers will have to write many more tracks in a given section of music than will necessarily be heard, if they intend to give the user control over that dimension. They may have to write parts to bridge disparate sections or themes, or compose multiple endings, intros, harmonies, counterpoints, turnarounds and breaks, depending on the intended nature of the experience and level of interaction.

As mentioned before, improvisation is a major element in many forms of music, and a major opportunity for us as developers of interactive music. Obviously, improvised music cannot be wholly composed ahead of time, but will be generated interactively by the computer and user as a collaborative musical experience. Improvisation can take place on many levels, from simple variations on a theme to tripping free-form space jams. Allowing opportunities for improvisation is a major challenge in composing interactive music.

Another issue is the resolution of the local and global orientations at a given moment of the music. Usually, a given music event can be thought of in more than one way. For example, a chord can be thought of in terms of its relationship to the previous chord or the chord following it (local orientations), or in terms of its relationship to the current key or its absolute tonic value (global orientations). This can get many levels deep in some kinds of music. Similarly, questions of where a groove is (in terms of things like strong and weak beats) can be answered in multiple ways, depending on one's musical orientation. These considerations lead us to questions of Artificial Intelligence models of music, chiefly how much of a composition ought to exist at a global score level, and how much is realized by individual AI "players" when they perform the composition and how to best represent a musical mind as an AI player.

In most cases, visual representation of the music will be an important consideration. There are many ways one can graphically depict music, from traditional sheet music to bouncing ball piano rolls to animated musicians to abstract shapes and colors to wacky creatures and instruments. In general, the graphics ought to illustrate some relationships among the various elements present in the music, such as key, harmony, rhythm, meter, voicing, or instrumentation. Of course, this can be done very imaginatively, and the graphics should reinforce the content of the music. The visuals of an interactive work may also contribute to any story elements present by providing characters and helping define a point of view and role for the user.

So far we have considered primarily instrumental music, but writing lyrics for interactivity poses a whole other set of challenges. Like music, language must follow certain grammatical and semantic consistencies. The lyric element of an interactive music work will likely convey a large aspect of the narrative or overt dramatic content, and is closely tied to the issues of interactive fiction. As with the visual portion of a work, the issues of point of view, feedback, user role, narrative, and non-linear story development will require careful consideration, in addition to all the musical elements.

Image courtesy of redkidOne

Methods of Generating Interactive Music

There are multiple methods a composer can use when writing music for interactivity. All of them proceed from the basic premise of taking a more-or-less defined composition and making it manipulable in any of several ways, which are dependent on the musical authoring tools available to the composer. Generally the composer ought to be aware of the method(s) to be employed and write the music with their opportunities and limitations in mind.

The most basic level of imparting an element of interactivity to music is by cueing and queuing sequences. This simply means that many pre-composed segments of music can be strung together and played back in any random or user-defined order. This is essentially the same as using the shuffle feature on a CD player, although some sort of navigable tree could greatly aid the user in sensibly controlling the music and establishing context.

The next deeper level of control is provided by muting and unmuting tracks within a musical sequence or series of sequences. This represents a musical dimension perpendicular to the ordering of parts, and combining the two can give the impression of significant musical depth. Like the first method, it relies on pre-composed material, but allows for control of the mix. For example, a user could choose between one of several bass lines, or elect to have a horn section provide an accompaniment. Again, a navigable tree structure in the background could control groups of tracks and lead to logical musical choices.
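The first two levels can be sketched together in a few lines of Python. This is only an illustration under simple assumptions: segments are pre-composed, each segment is a dictionary of named tracks, and the class name, method names, and track names are hypothetical.

```python
from collections import deque

class Arrangement:
    """Pre-composed segments played in a cued order (level one),
    with per-track muting applied to whatever plays (level two)."""

    def __init__(self, segments):
        # segments: segment name -> {track name -> list of note numbers}
        self.segments = segments
        self.queue = deque()
        self.muted = set()

    def cue(self, name):
        """Add a segment to the end of the play order."""
        if name not in self.segments:
            raise KeyError(name)
        self.queue.append(name)

    def set_muted(self, track, muted=True):
        """Mute or unmute a track across all segments."""
        if muted:
            self.muted.add(track)
        else:
            self.muted.discard(track)

    def next_segment(self):
        """Pop the next cued segment and return only its audible tracks."""
        name = self.queue.popleft()
        tracks = self.segments[name]
        audible = {t: notes for t, notes in tracks.items() if t not in self.muted}
        return name, audible
```

Choosing between two bass lines then amounts to muting one and unmuting the other, while the cue order is decided separately.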

To gain more interactivity, the third level calls for asynchronous firing of sequences. This means having individual riffs or other segments of music exist independently (with respect to time or meter) from other tracks, lines, or patterns. The parts can then be recombined with a much greater flexibility, allowing for a degree of genuine interactive music composition as opposed to merely slicing up and shuffling pre-composed songs.
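One way to picture asynchronous firing is to give every riff its own length and its own start time, so that no riff depends on a shared bar line. The Python sketch below assumes time is counted in beats on a common clock; the Riff class and its fields are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Riff:
    name: str
    length_beats: float                 # each riff loops at its own length
    started_at: Optional[float] = None  # None means the riff is silent

    def fire(self, now_beats: float) -> None:
        """Start the riff at an arbitrary moment, independent of other riffs."""
        self.started_at = now_beats

    def stop(self) -> None:
        self.started_at = None

    def position(self, now_beats: float) -> Optional[float]:
        """Where the riff is inside its own loop, or None if it is not playing."""
        if self.started_at is None or now_beats < self.started_at:
            return None
        return (now_beats - self.started_at) % self.length_beats
```

A four-beat bass riff fired at beat 0 and a three-beat horn riff fired at beat 1.5 will drift against each other, which is the recombination flexibility described above.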

The fourth level of control involves parametrically filtering sequences. This will enable a user to manipulate a track, sequence, or group of tracks or sequences along a host of parameters such as volume, timbre, tempo, and key. More advanced applications will allow tracks to reharmonize themselves in a different mode or voicing, automatically follow chord progressions, or employ a rhythmic or harmonic template, either precomposed or generated on the fly. Additionally, this method allows for continual application of modifiers such as tremolo or pitch bend differentially to individual musical voices or subgroups.
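A very small version of parametric filtering might look like the following Python sketch. It assumes a sequence is a list of (start, pitch, velocity, duration) tuples with MIDI-style pitch and velocity values; that representation and the function name are assumptions made for this example.

```python
def transform(notes, transpose=0, velocity_scale=1.0, time_scale=1.0):
    """Apply key, volume, and tempo changes to a pre-composed sequence.

    notes: list of (start_beats, pitch, velocity, duration_beats).
    Pitch is clamped to 0..127 and velocity to 1..127 so the result
    stays within the MIDI data range.
    """
    out = []
    for start, pitch, velocity, duration in notes:
        new_pitch = min(127, max(0, pitch + transpose))
        new_velocity = min(127, max(1, round(velocity * velocity_scale)))
        out.append((start * time_scale, new_pitch, new_velocity, duration * time_scale))
    return out
```

Transposing by six semitones, for instance, moves a line from C to F#, and a time_scale of 2.0 stretches it to half speed. Reharmonizing into a different mode or following a chord progression would need more musical knowledge than this sketch contains.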

The deepest level of musical interactivity can be made by generative sequences. In this method, there are no precomposed sequences per se; the computer "improvises" the music in real time according to a set of rules set forth by the composer. This provides opportunities for user input to control the music at a full range of levels and in a huge variety of ways. This is the only compositional method that is wholly real-time simulation-based. Obviously, it also requires the most sophisticated and intelligent drivers, and the development of these drivers requires a significant amount of technological research.
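To make the idea of composer-defined rules concrete, here is a deliberately tiny Python sketch. The three rules (stay in the scale, move by small steps, end on the tonic) are my own toy example and are far simpler than anything a real generative driver would use; max_step is the kind of parameter a user could control.

```python
import random

C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]  # MIDI note numbers, C4 to C5

def generate_melody(length, scale=C_MAJOR, max_step=2, seed=None):
    """Generate `length` notes (length >= 1) by a random walk over a scale.

    Rules in this toy example:
      1. every note belongs to the scale,
      2. each move is at most `max_step` scale degrees,
      3. the phrase starts and ends on the tonic.
    """
    rng = random.Random(seed)
    index = 0
    melody = [scale[index]]
    while len(melody) < length:
        step = rng.randint(-max_step, max_step)
        index = min(len(scale) - 1, max(0, index + step))
        melody.append(scale[index])
    melody[-1] = scale[0]  # rule 3: resolve to the tonic
    return melody
```

Passing a seed makes a performance repeatable, while leaving it out gives a different line on every run.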

Although each of the above methods provides increasing degrees of interactive control, they are not completely separate or distinct. An interactive work will probably employ several of the methods to varying extents. Still, the method(s) used will strongly influence the nature of composition, the kinds of drivers required to generate the music, and the resulting musical experience. An important point is that each progressively deeper method is also more computationally efficient in terms of maximizing the usage of existing data and providing opportunities for thematic development and variation.

Sound Rendering Technology Considerations

In order to create interactive music experiences, we will generally need to have highly malleable musical data at our disposal, and the capacity for high quality sound delivery. Currently the two primary means of processing and producing or reproducing sound in computer mediated environments are MIDI Sequencing and Digital Audio Sampling. Each has its relative merits and drawbacks, but the two technologies can be used in tandem to get the most advantage from each.

MIDI sequencing has evolved into a primary way of working for many electronic musicians and composers. The MIDI (Musical Instrument Digital Interface) standard is universally supported by computers, synthesizers, and a vast host of other gear. The great strength of MIDI is that it treats music as data, representing individual notes in terms of their pitch, velocity, and duration. Many other control parameters are supported that can affect timbre, volume, instrument, key, and any other imaginable factor that might affect the sound of a note. A MIDI sequence is simply a list of these note and control messages indexed with respect to time. An additional advantage is that MIDI has evolved from a live performance orientation, and is very well suited for real time applications. The major drawback of MIDI is that since the music is represented as performance data, it requires external sound renderers (such as synthesizers, effects processors, and mixers) to realize the music. These machines vary greatly in their capabilities and programming implementation. In general, every MIDI studio is unique, and the sound must be custom designed with the specific studio environment and gear in mind.
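The "list of messages indexed with respect to time" can be sketched directly in Python. The status bytes used here (0x90 for Note On, 0x80 for Note Off, with the channel in the low four bits) are standard MIDI. The tuple layout and the function names are assumptions for illustration. On the wire, duration is not a field of its own; it is the gap between a Note On and its matching Note Off.

```python
NOTE_ON = 0x90
NOTE_OFF = 0x80

def note_to_events(start, duration, pitch, velocity, channel=0):
    """Turn one note into a timed Note On / Note Off pair of raw messages."""
    return [
        (start, bytes([NOTE_ON | channel, pitch, velocity])),
        (start + duration, bytes([NOTE_OFF | channel, pitch, 0])),
    ]

def build_sequence(notes, channel=0):
    """notes: list of (start, duration, pitch, velocity).
    Returns (time, message) pairs sorted by time."""
    events = []
    for start, duration, pitch, velocity in notes:
        events.extend(note_to_events(start, duration, pitch, velocity, channel))
    return sorted(events, key=lambda event: event[0])
```

A player would walk this list in time order and send each message to whatever synthesizer is attached, which is exactly where the dependence on external gear comes in.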

Digital audio sampling and playback is the major alternative to MIDI sequencing and sound synthesis. With digital audio, sound is recorded directly into the memory of a computer, where it can be processed, filtered, edited, looped, and otherwise manipulated. This is a less flexible method, since all the music must be performed and recorded ahead of time, and cannot be composed on the fly. Additionally, audio samples require enormous amounts of disk space for storage and RAM for playback, especially with high quality stereo sound. This also limits the number of samples that can be played back and manipulated simultaneously. However, digital audio files rely much less on quirky external hardware and can be ported across multiple platforms with reasonably consistent results. Also, well produced and completed source material can be directly adapted to Digital Audio for interactive applications. It is also currently the best means of reproducing in a computer environment dialogue, vocal music with lyrics, and other sounds which cannot be easily synthesized.

The digital audio and MIDI realms can be combined using MIDI controlled sampling devices, which can trigger samples and manipulate them with the same flexibility as with any other sound source. Such sample playback devices can be emulated on a computer and may allow a complete, self-contained system for realizing an interactive music product.

Character-Based Music-Driven Animation Techniques

The concept of a character as a fundamental organizing unit has been central to my approach. A character consists of several things: the component artwork that comprises the different cel cycles or behavioral loops; the musical sequences, samples or algorithms for the character; and modules of code to generate behavior, to map MIDI input to the animation, to receive and interpret input signals from the user to the character, and coordinate the component elements in real time. One of the main musical objectives is to consistently identify the individual screen characters with different voices or instruments in the musical arrangement.

To this end, an important method of achieving a meaningful link between musical and visual elements is through a technology I have dubbed MIDI Puppets. These are animated screen characters whose moment-to-moment movements are generated by reading musical data from an incoming MIDI stream and calculating the appropriate position to correspond to a musical event. For example, a singer would open its mouth when a Note On command arrived on the MIDI channel that triggered the voice associated with the character. This is a very strong technique, since the character is being controlled by the same data as the synthesizer responsible for rendering the audio portion of the simulation and the same animation engine will drive a character for any music, whether a sequenced composition, an algorithmically-created composition, or input from a live musical performance.
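The singer example can be sketched as follows. This is not the original MIDI Puppets code, only a minimal Python illustration of the idea: the class name and frame names are hypothetical, and a message is assumed to arrive as a three-byte (status, pitch, velocity) tuple. Treating a Note On with velocity zero as a Note Off follows the usual MIDI convention.

```python
class SingerPuppet:
    """Opens its mouth while any note is sounding on its MIDI channel."""

    def __init__(self, channel):
        self.channel = channel
        self.held = set()  # pitches currently sounding for this character

    def handle(self, message):
        """Update state from one incoming MIDI message."""
        if len(message) != 3:
            return  # this sketch only reads three-byte note messages
        status, pitch, velocity = message
        if status & 0x0F != self.channel:
            return  # message belongs to another character's voice
        kind = status & 0xF0
        if kind == 0x90 and velocity > 0:
            self.held.add(pitch)
        elif kind == 0x80 or (kind == 0x90 and velocity == 0):
            self.held.discard(pitch)

    @property
    def frame(self):
        """Name of the animation frame to draw right now."""
        return "mouth_open" if self.held else "mouth_closed"
```

Because the puppet only reads the message stream, the same object would respond to a stored sequence, generated music, or a live keyboard without any change.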

For some puppets, I have employed a refined version of this method which differentiates note triggers and pitch values and combinatorically derives the correct behavior for the puppet for the current moment. For example, my GigMe Drummer can differentiate six ranges of pitch values, corresponding (in the General MIDI specification) to bass drum, snare drum, hi-hat pedal, hi-hat stick, crash cymbal, and ride cymbal. Whenever a MIDI signal is received it is evaluated against the six drum types (remaining valid for fifty milliseconds, which is 1.5 times the duration of an animation frame). Based on a default state of the Drummer hitting no drums, I have derived what is essentially a six dimensional matrix of behavioral response. The code that comprises the drummer's animation engine is very modular and can be easily recalibrated to control any set of character artwork along any set of MIDI parameters.
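The drummer logic described above might be approximated like this in Python. The note numbers come from the General MIDI percussion key map, but grouping them into these six drum types is my assumption, since the exact pitch ranges used by the GigMe Drummer are not given here. Times are in integer milliseconds, and the class and method names are hypothetical.

```python
DRUMS = ("bass", "snare", "hat_pedal", "hat_stick", "crash", "ride")

# Assumed grouping of General MIDI percussion notes into six drum types.
NOTE_TO_DRUM = {
    35: "bass", 36: "bass",
    38: "snare", 40: "snare",
    44: "hat_pedal",
    42: "hat_stick", 46: "hat_stick",
    49: "crash", 57: "crash",
    51: "ride", 59: "ride",
}

HOLD_MS = 50  # a hit stays valid for fifty milliseconds

class Drummer:
    def __init__(self):
        self.last_hit = {}  # drum type -> time of most recent hit (ms)

    def note_on(self, note, now_ms):
        drum = NOTE_TO_DRUM.get(note)
        if drum is not None:
            self.last_hit[drum] = now_ms

    def pose(self, now_ms):
        """Six booleans, one per drum type. Each combination is one cell
        of the six-dimensional response matrix; all False is the default
        state of hitting no drums."""
        return tuple(
            drum in self.last_hit and 0 <= now_ms - self.last_hit[drum] <= HOLD_MS
            for drum in DRUMS
        )
```

With six on/off dimensions there are 64 possible poses, and an animation engine could look up artwork for each one from the tuple returned by pose.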

Narrative and Navigation

Another part of the interface issue deals with how to navigate a non-linear musical space—how to enable a user to go to different parts of a song. This becomes especially important when using music to relate a story and particular pieces of music must be matched with specific narrative events. The three basic methods of moving forward in this type of environment are object differentiation, spatial differentiation, and temporal or state-dependent differentiation.

Object differentiation is best exemplified by my implementation of the musician-character concept. Each character has its own musical state which is influenced by global factors. To the user this means that interacting with a given character (by clicking, dragging, shooting, colliding, or whatever) will have a consistent musical result, usually associated with a particular instrument in the mix.

With spatial differentiation, different areas or objects on the screen are mapped to specific musical events. For example, the melody being sung by a backup singer in a choir is directly dependent on which riser the character is standing on. Similarly, bass players can groove in one of several ways depending on where on the stage they're standing. Other characters, when being driven around the stage by the user, will trigger a percussion sound whenever they take a step. The volume at which each character is playing may be directly proportional to the distance between the cursor and that character.
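Two of these spatial mappings are easy to sketch in Python. The riser height, the melody names, and the maximum distance below are all invented values. The volume function follows the text literally, with volume growing with distance; if the opposite feel is wanted, replacing the ratio with one minus the ratio is a one-line change.

```python
import math

# Hypothetical melody names, one per riser, from lowest to highest.
RISER_MELODIES = {0: "low_harmony", 1: "middle_harmony", 2: "high_harmony"}

def riser_for(y, riser_height=100, risers=3):
    """Which riser a character at vertical position y is standing on."""
    return min(risers - 1, max(0, int(y // riser_height)))

def melody_for(y):
    """The melody a backup singer performs depends only on the riser."""
    return RISER_MELODIES[riser_for(y)]

def volume_for(cursor, character, max_distance=400.0):
    """MIDI-style volume (0..127) proportional to cursor distance, clamped."""
    distance = math.dist(cursor, character)
    return round(127 * min(1.0, distance / max_distance))
```

Dragging a singer from y = 50 to y = 150 would switch the part from the low harmony to the middle one without any other input.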

A third method I have identified is temporal or state differentiation. By this I mean that a change in behavior is triggered by a sequence of specific inputs in time or a complex relationship in the change of several conditional states. One example of this is that when a character is clicked in an Attract Mode, it triggers a solo riff, whereas the same click in a Groove Mode results in a different behavior, such as moving to a new position and singing a different line. Many other examples can be created from this principle.
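The Attract Mode and Groove Mode example reduces to a small state-dependent dispatch. In the Python sketch below, the mode names come from the text, while the class, the returned tuples, and the position and line names are hypothetical.

```python
from enum import Enum

class Mode(Enum):
    ATTRACT = "attract"
    GROOVE = "groove"

class Character:
    def __init__(self, name, positions, lines):
        self.name = name
        self.positions = positions  # stage positions to cycle through
        self.lines = lines          # vocal line sung at each position
        self.index = 0

    def click(self, mode):
        """The same click produces different behavior depending on mode."""
        if mode is Mode.ATTRACT:
            return ("solo_riff", self.name)
        # Groove Mode: move to the next position and sing its line.
        self.index = (self.index + 1) % len(self.positions)
        return ("move_and_sing", self.positions[self.index], self.lines[self.index])
```

Richer temporal conditions, such as a specific sequence of clicks within a time window, could be layered on by keeping a short history of inputs alongside the mode.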

These methods can be combined to create a variety of rich interactive musical experiences.




John Szinger

Software Designer

John Szinger is a software designer and developer, musician, and origami artist who lives in the New York City area. You can learn more about his activities at


Felipe Angeles Hamann

December 06, 2017 at 06:29 am

Hi Mr. Szinger, awesome article, bringing to the point a lot of what a young musician and former video game addict has on his mind. The animated cartoon concept: The video game "Rayman Origins" brings the worlds of traditional music-based cartoons and gaming together to perfection. Although the music plays a big role in the whole game and emphasizes the fairy tale atmosphere, there are selected levels that are 100% synced in music and image. The player's best chance to beat the level is to play the controller like a drum kit, since actions that ensure survival are always timed to beats. It goes so far that the image is blurred out or pixelated so strongly that visual orientation is impossible and the player has no choice but to concentrate on the audio. Composing Music for Interactivity: At the moment I am designing a live set that can be played on modern MIDI controllers, and allows me to rearrange up to 16 instruments containing different numbers of loops on the fly. Even though I play music every day, I still can't mute/unmute, apply effects and change loops totally tight, and that even though I composed the music myself. Now the main issue with making this type of interactive music is letting people understand how to manipulate the music they are hearing. In what you've stated there are a lot of musical parameters that people with little to no musical education will feel but not understand, such as the concept of minor or major keys. An important design specification for this topic has to be: Can the player "fail"? Do the AI and the composition (existing audio/MIDI instruments) allow the user to be untight, to play disharmonic things, or will the program always output a well-timed, harmonic song? Of course there's always a possibility of improving any skill even if the program pulls you straight, but in my opinion a game that challenges the player will have a great motivational effect on improving one's own skills and finally beating a level.
The imperative of always playing a perfect song may be what limits an interactive musical game from becoming awesome. People who have learned and played instruments throughout their lifetime know that failure is part of mastering an instrument. Almost every other game in this world has the possibility of failure or a game over. Why change this in a musical game? I'd compare the autotune and autotight feature to the brake and curve assistance in modern racing games. It's great for the first one or two tries when failing may actually cause you to stop playing the game, but for someone who actually wants to spend hours on the game and prove his/her skills it's like the hand of someone else on the steering wheel/instrument. So my shoutout to the interactive music game developers: Let the players fail! A success feels a lot better if there's actually a contrast and not only a computer playing your notes right when you fuck up. Keep it up, Felipe

Benjamin Whitehouse

December 06, 2017 at 08:58 am

Great article! The Global Jukebox is my discovery of the year - what a treat!
