Creating ambient sounds for a scene is just like setting up an aquarium. The aquarium design should be a delight to the eye while keeping its functional purpose in mind. The body of water, the group of fish and the decorations should complement each other. Together, they form an organic and interactive system. Sound design is no different. Hello. I’m Chang Liu, the sound designer behind the mobile game, Game of Thrones: Winter is Coming, (GOT). I was responsible for the sound design of multiple modules in the game. In this post, I’d like to share with you how we designed and built the interactive ambient sound system for GOT.
Defining the Interaction Mechanism
When I first received the GOT demo, I was surprised by the amount of sound design possibilities the scenes of the game allowed for. The art team presented us with a very intricate Inner City scene. As a motivated audio designer, I was lucky to be able to make my mark on a scene with so much room for interactive audio design.
Before creating interactive ambient sounds, we needed to define the interaction mechanism and fully consider all the interactivity dimensions, so that we could save ourselves a lot of work if we ever needed to add deeper interactive features after significant resources had already been invested.
Let’s take the Inner City scene for example:
- In top view, the camera can be zoomed in to examine the terrain details such as the buildings or zoomed out to overlook the entire Inner City and its surroundings.
- There are 34 functional buildings in total, each of which can be clicked to focus the camera on its exterior and highlight its details.
- There are a variety of terrain blocks outside of the city such as coasts, rivers, forests, waterfalls, alpine regions.
- The day-night cycle.
- The rainy and snowy weather.
We've defined the interaction mechanism that can affect the audio behaviors. Now let’s lay out the related asset modules and implementation methods used to build that mechanism.
The camera can not only be moved horizontally, but also zoomed in and out over distance. A single looping ambient asset would be too monotonous. Since the camera can be zoomed in to examine the terrain details, it's essential to place 3D point sources to reflect those details. When the camera is zoomed out, the 3D point sources are attenuated over distance. To offset the resulting drop in loudness, I layered a 2D wind sound whose volume gradually increases as the camera is zoomed out. This makes it possible to achieve smooth loudness transitions while switching soundscapes.
To do so, I created an RTPC called CameraDistance, which represents the distance from the camera to a mapping point positioned at the center of the screen. (I chose this point because both its height and its elevation angle change when players zoom the camera.)
I categorized the assets as either 2D or 3D, and created the RTPC curves to define the relationships between their properties (such as Voice Volume and Low-pass Filter) and the CameraDistance RTPC.
The RTPC curve for the 2D assets is basically the inverse of the curve for the 3D assets. In addition to Voice Volume, I also applied RTPC control over the Low-pass Filter.
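To give a rough idea of the crossfade described above, here is a minimal Python sketch. The linear curve shapes and the distance range are my assumptions for illustration, not the actual RTPC curves from the project:

```python
def crossfade_volumes(camera_distance, d_min=10.0, d_max=100.0):
    """Map the CameraDistance RTPC to per-layer volumes (0.0-1.0).

    Hypothetical curve shapes: the 3D point sources fade out linearly
    as the camera zooms out, while the 2D wind layer fades in with the
    opposite curve, keeping the overall loudness roughly constant.
    """
    # Normalize the distance into 0..1, clamped to the RTPC range.
    t = max(0.0, min(1.0, (camera_distance - d_min) / (d_max - d_min)))
    volume_3d = 1.0 - t   # point sources: loud when zoomed in
    volume_2d = t         # wind layer: loud when zoomed out
    return volume_3d, volume_2d
```

In the actual project, the same idea is expressed as two mirrored RTPC curves on Voice Volume, with a second pair of curves driving the Low-pass Filter.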
We’ve specified the design objective, now we can refine the 2D/3D assets to be used.
The Choices & Trade-offs of Sound Sources - The Building Module
As mentioned above, there are 34 functional buildings in the scene, as well as a variety of terrain blocks. Since the camera can be zoomed in to examine the terrain details, designing unique ambiance sources for the buildings was next on the agenda.
Designing unique ambiance sources for 34 functional buildings is quite challenging, since artistic realism imposes certain limits on how we can design the sound behaviors. We can't predict how long players will dwell on the details, and we need to avoid a repetitive listening experience when the same building is clicked frequently. Clearly, a single looping ambient sound won't be enough. So I divided the ambient sounds for each building into two parts: basic ambiance-only loops (crowds + surroundings) to reflect their daily status, and random point sources to emphasize their unique sonic characteristics (such as the bells attached to The Castle, or the writing, page-turning, and vocal elements in the Maester's Tower). These two layers are played together: one provides a basic atmosphere, the other highlights the building's distinctive sounds. By dividing the assets into the smallest units and then combining them organically, the possibilities offered by randomization and playback rules are put to good use. Even so, we ended up being more flexible still and further diversified the sound behavior.
Ultimately, each functional building is actually divided into three parts: Ambiance (basic ambient loops), Element (random point sources) and Element_Loop (looping random point sources). This makes the building module very immersive and expressive in terms of sound behavior.
Take the Barrack assets for example:
The Barrack_Amb_Loop SFX mainly reflects the ambiance and the noisy crowds. The Barrack Blend Container holds the point source elements that reflect the daily training: the Building_Barrack_Bows Random Container for archery training, the Building_Barrack_Horse Random Container for horse movements and hooves, and the Building_Barrack_Weapons Random Container for weapon impacts during soldier training. The Barrack_TroopsLp Random Container contains a footstep element (the Building_Barrack_TroopsLp SFX) for the formation drill. It alternates with a Silence SFX via Transitions settings, for a better listening experience as soldiers align in formation, move back and forth, or simply stand still.
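The Barrack_TroopsLp behavior can be pictured as a playlist that alternates the footstep loop with randomized stretches of silence. This Python sketch is only an approximation of what the Random Container + Silence SFX + Transitions settings achieve inside Wwise; the gap lengths are illustrative values:

```python
import random

def troops_playlist(n_segments, rng=None, min_gap=1, max_gap=3):
    """Sketch of the Barrack_TroopsLp behavior: a footstep loop
    alternating with silence of random length, roughly what the
    Random Container + Silence SFX + Transitions achieve in Wwise.
    The gap bounds are illustrative, not actual project values."""
    rng = rng or random.Random()
    playlist = []
    for _ in range(n_segments):
        playlist.append("Building_Barrack_TroopsLp")
        # Insert a randomized stretch of silence between loop passes.
        playlist.extend(["Silence"] * rng.randint(min_gap, max_gap))
    return playlist
```

The randomized silence is what keeps the drill from sounding mechanical: the soldiers seem to stop, regroup, and resume rather than march on a fixed grid.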
With the approaches above, we created a realistic and vivid barrack scene.
As development progressed and the number of buildings increased, I realized there were more possibilities to explore. In addition to using realistic sound effects to reflect the sonic characteristics of the buildings, we could also use music elements to add emotion. For instance, we could add music elements to characteristic buildings such as a war command room. In the world of GOT, war is intense, and the TV series features many epic battles. So, for a building like a war command room, why not use drums to render the war atmosphere? When players click the war command room, exciting drums play along with the rhythm of the background music, increasing the tension as war approaches.
The Choices & Trade-offs of Sound Sources - The City Inside Ambiences
When the camera is zoomed in to the limit, 5 to 7 buildings are displayed concurrently on the screen. If we play all the ambient sounds designed for these buildings in the same way, the entire soundscape becomes noisy and disordered, regardless of how carefully we define the 3D attenuation. Therefore, we have to make some trade-offs. Whatever its range of movement, the camera always stays above the ground surface, even when zoomed in to the limit. So, what if we reserve the ambient sounds previously designed for the buildings for when they are clicked and the camera focuses on them, and create another set of ambient sounds to better reflect the different terrain blocks in top view? That way, the most subtle ambient sounds only play when a building is clicked and focused on. The players' emotional engagement rises, instead of hearing a redundant soundscape regardless of whether they are inside or outside the buildings.
Keeping these new ideas in mind, I started to create ambient sounds for the terrain blocks.
There is, however, another situation to consider. These 34 functional buildings don’t appear all of a sudden when the game is started. Actually, many of them need to be constructed by the players later in the game. The soundscape has to vary in intensity before and after the castle is built. There should also be smooth transitions as the construction progresses. So we first need to specify which buildings exist in the beginning, and then build the underlying soundscape around them.
The buildings selected above are attached to the ambient sounds for the terrain blocks. They exist when the game starts and they cover their adjacent areas well. This provides smooth transitions whenever players move the camera.
According to possible building properties, I used different ambient sounds for the Campsite, Construction_Site, Courtyard, Farmyard and Market_Place containers, to reflect their corresponding living scenes. To further differentiate their characteristics, I also added some point sources.
For example: the Bell Sequence Container for the bells attached to The Castle, the Birdsong_Loop SFX for the birdsong in the courtyard, the Pond_Fountain_Loop SFX for the fountain pond in front of the Great Sept of Baelor, the Temple SFX for reflecting the light effects, etc.
We've built the underlying soundscape in top view for the terrain blocks mentioned above; now let's add more assets. For the buildings to be constructed by the players, we need to evaluate whether we should add new elements, and what kinds of elements to add, according to their locations and properties. Through trial and error, I created the soundscape for when construction of all the buildings is completed.
However, it felt like something was missing. Yes, we had created the buildings and scenes. But for a "living castle" rather than a ghost town, the most important thing was missing: the residents.
Based on all the buildings’ locations and properties, I specifically added the crowd voices (Crowd_Vo_01, Crowd_Vo_02 and Crowd_Vo_03) for some typical buildings to bring life to the castle.
Now, we've created the 3D point sources for the City Inside. However, to use a cooking analogy, the soup was still too thin. The terrain blocks had exclusively 3D point sources, and they didn't blend well with the environment. What if we added some 2D assets? Would it be better then? As mentioned previously, I had layered a 2D wind sound, meant to be heard when the camera is zoomed out to the limit, to emulate the listening experience of being at high altitude. To better blend the 3D point sources with the environment, I layered another, quieter 2D ambient sound with a broader sound field. This helped.
Compared to the old 2D ambient sound (Wind_Far), the new 2D ambient sound (City_Large_Sparse) has a very different CameraDistance RTPC curve. It is not completely muted when the camera is zoomed in to the limit; instead, it blends with the 3D point sources. Conversely, its volume increases quickly as the camera is zoomed out, which makes for better spatial transitions.
Now, we’ve completed the ambient sounds for the City Inside. However, there are more assets to be created: the ambient sounds for the City Outside.
The Choices & Trade-offs of Sound Sources - The City Outside Ambiences
The areas outside of the city mainly require natural sounds:
The terrain blocks are what differ most between the City Inside and the City Outside. The coasts and rivers have more complex shapes, and simple point sources aren't enough to reflect that. The key is to set the emitting positions properly for the terrain blocks outside of the City.
Take the following coast for example:
There are lots of large sea areas in the scene. Should the sounds cover the entire sea then? Of course not. Only the areas that can be focused on need to be considered. The points outlined above are the emitting positions that I configured for the Sea_Loop SFX; only the coastline and the important islands are covered. Using the Large Mode would be the most efficient and practical way to trigger these sounds. I won't go into the details of the components used here or the Large Mode feature. For more information, you can check the Wwise Unity Integration documentation.
I'd like to point out that, for some emitters such as a river, it's better to attenuate the sound faster in the vertical direction. When a river appears in a corner of the screen, the river ambiance should play faintly along with the visuals for a better listening experience. However, an attenuation radius derived from the horizontal plane is not suitable for spatial attenuation in the vertical direction. Life and travel experience tell me that the vertical attenuation needs to be more intense.
For the sounds such as a river ambiance, I applied another layer of control in a vertical direction to make the attenuation over the CameraDistance faster, to get a more realistic listening experience.
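As a rough sketch of that extra vertical control, the vertical component of the distance can be weighted more heavily than the horizontal one before the usual falloff is applied. The falloff factor and radius below are illustrative numbers, not the project's actual attenuation settings:

```python
def attenuated_volume(horizontal_dist, vertical_dist,
                      radius=50.0, vertical_falloff=2.0):
    """Sketch of the faster vertical attenuation used for sounds like
    the river ambiance. The vertical distance is weighted more heavily
    than the horizontal distance before the standard linear distance
    falloff is applied. The factor of 2.0 is an illustrative value."""
    # Scale the vertical component so height matters more than spread.
    effective = (horizontal_dist ** 2 +
                 (vertical_dist * vertical_falloff) ** 2) ** 0.5
    return max(0.0, 1.0 - effective / radius)
```

With these example numbers, a river 30 units away horizontally is still faintly audible, while the same 30 units of pure camera height silences it completely.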
Now, we are done with the city scenes. Phew!
The Day-Night Cycle
Unfortunately, there are more challenges ahead. All the previous work was actually based on a major premise: daytime.
When the day is done, the night comes to Westeros.
Since the game already has a day-night cycle feature, the next thing to do is to create another RTPC, Day_Night, and attach it to the day-night cycle parameters.
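The mapping from the in-game clock to the Day_Night RTPC can be sketched as a simple curve with linear dawn and dusk transitions. The specific hours below are my assumptions for illustration; the actual game drives this from its own cycle parameters:

```python
def day_night_rtpc(hour):
    """Map an in-game hour (0-24) to a hypothetical Day_Night RTPC
    value between 0.0 (full day) and 1.0 (full night), with linear
    transitions at dawn (5-7h) and dusk (18-20h). The exact cycle
    hours are assumptions for illustration."""
    if 7 <= hour < 18:
        return 0.0                      # daytime
    if 5 <= hour < 7:
        return (7 - hour) / 2.0         # dawn: fading from night to day
    if 18 <= hour < 20:
        return (hour - 18) / 2.0        # dusk: fading from day to night
    return 1.0                          # nighttime
```

The point of a continuous RTPC rather than a day/night switch is that every affected sound can crossfade smoothly through dawn and dusk instead of popping in and out.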
All the content we implemented before must be considered from a new angle: we need to figure out which sounds will be affected by the day-night cycle and how they will be affected.
With the Day_Night RTPC applied, fewer assets are played at nighttime than at daytime. Now, it's time to add some unique elements.
The above two layers are night ambiances for the City Inside and Outside. They are mainly elements that reflect the nighttime status, such as insect sounds. For the City Inside, the sounds are soft and pleasing; for the City Outside, they are more unnerving and we blended in some wild animal sounds. Using this approach, the changes in mood between the City Inside and Outside at nighttime are reflected more intensely. The RTPC settings for nighttime and daytime are basically reversed.
The Weather System
Now there is another problem: the previous work was also based on a major premise: moderate weather.
But when the fine weather is gone, guess what happens in Westeros?
Similar to the work done on the day-night cycle, the next thing to do is to create the RTPCs for different weather: Rain_Intensity and Snow_Intensity.
Then, repeat the previous steps for these two parameters. See the following example:
For rainy and snowy weather, I created different behaviors for the insect sounds for the City Inside, at nighttime. In rainy weather, the sound intensity drops off as the rainfall intensity increases. In snowy weather, even light snowfall makes the insect sounds disappear completely.
Yes, I did all this based on my life experience. To get a general idea, check the following screenshot.
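The insect behavior described above boils down to two rules: rain fades the insects out gradually, while any snow at all silences them. Here is a minimal sketch, assuming the intensities are normalized to 0.0-1.0 (the function name and linear fade are illustrative, not the actual RTPC curves):

```python
def insect_volume(base, rain_intensity, snow_intensity):
    """Sketch of the City Inside nighttime insect behavior: the volume
    drops off gradually with rainfall, but any snowfall at all silences
    the insects completely. Intensities are assumed normalized 0.0-1.0."""
    if snow_intensity > 0.0:
        return 0.0                       # even light snow: total silence
    return base * (1.0 - min(1.0, rain_intensity))
```

In Wwise terms, the rain rule is a falling Voice Volume curve on the Rain_Intensity RTPC, while the snow rule is effectively a hard cutoff at the very start of the Snow_Intensity range.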
Let’s talk about the rain first:
Based on the intensity of the effect, the rain sounds are divided into Rain_Small, Rain_Medium and Rain_Big. Based on the distance, the thunder sounds are divided into Thunder_Distant and Thunder_Close. Combined with the CameraDistance and Rain_Intensity RTPCs, their volume can be controlled more accurately, which adds realism to the rain effect.
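One way to picture how Rain_Intensity could blend the three rain layers is to give each layer a curve that peaks in its own part of the intensity range. The triangular curves and breakpoints below are illustrative only, not the project's actual RTPC curves:

```python
def rain_layer_volumes(rain_intensity):
    """Sketch of how the Rain_Intensity RTPC could crossfade between
    the Rain_Small, Rain_Medium and Rain_Big layers. Each layer peaks
    in its own region of the 0.0-1.0 intensity range; the breakpoints
    are illustrative, not the actual project curves."""
    def triangle(x, center, width=0.5):
        # Simple triangular curve peaking at `center`, reaching zero
        # once x is `width` away from it.
        return max(0.0, 1.0 - abs(x - center) / width)
    return {
        "Rain_Small":  triangle(rain_intensity, 0.0),
        "Rain_Medium": triangle(rain_intensity, 0.5),
        "Rain_Big":    triangle(rain_intensity, 1.0),
    }
```

Because the adjacent curves overlap, light rain can thicken into a downpour continuously rather than snapping between three discrete assets.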
The snowy weather is simpler. In northern China, when it snows, it snows heavily. I believe that it should be the same in this game. I’m not saying that we should play the rain and snow sounds loudly all the time just because we created them. Everything requires balance. Sound is no different - it should feel natural. It’s a frozen and snow-covered world, but it’s not a cold winter with the bitter wind whistling loudly. Instead, it’s quiet and peaceful. We should reflect that in our sound design.
We’ve said a lot about the interaction mechanism, the organization and implementation of our assets, now let’s talk about the management of busses.
We created a complex soundscape, and this is reflected in the configuration of our busses. It's very important to set priorities, apply auto-ducking, and use side-chaining systematically.
I created two busses under the Ambiance bus: Ambiance_Building and Ambiance_City. These two busses sit at the same hierarchy level, but they are assigned different assets and are used in different scenes, so they don't interfere with each other. Specifically, the Ambiance_Building bus is assigned the point sources and ambiance sources for the 34 functional buildings; these sounds are only triggered when players click the buildings. The Ambiance_City bus is assigned the ambient sounds for the terrain blocks inside and outside of the city.
Let's take a look at the buildings first. The random point sources (Element and Element_Loop) have a higher priority than the ambiance-only loops (Loop). So I applied auto-ducking so that the Ambiance_Building_Elements bus suppresses the Ambiance_Building_Loop bus. This way, details are highlighted when players click the buildings. Based on their locations, I created another two busses: Ambiance_City_Inside and Ambiance_City_Outside. That's because the sounds for the City Inside and the buildings have a higher priority than those for the City Outside and its terrain blocks. Moreover, players usually stay longer at a location when they construct buildings inside the city. The natural elements for the City Outside shouldn't crowd out the more important sounds for the City Inside. So, how does this work?
A Wwise Meter effect is inserted on the Ambiance_City_Inside bus to measure the real-time loudness of the current child bus. The value is sent to a new RTPC: Ambience_City_Sidechain_A. This RTPC is used to control the Ambiance_City_Outside bus. Using this approach, we can distinguish and manage the hierarchy and composition of these sounds in an organized way.
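Conceptually, the side-chain turns the measured loudness of the Inside bus into a gain reduction on the Outside bus. This sketch mimics what the Ambience_City_Sidechain_A RTPC curve would do; the threshold, window, and reduction range are illustrative values, not the project's settings:

```python
def sidechain_gain(inside_loudness_db, threshold_db=-30.0,
                   max_reduction_db=-12.0):
    """Sketch of the Wwise Meter side-chain: the louder the
    Ambiance_City_Inside bus gets above a threshold, the more the
    Ambiance_City_Outside bus is turned down. The threshold and
    reduction range are illustrative values only."""
    if inside_loudness_db <= threshold_db:
        return 0.0                       # below threshold: no ducking
    # Scale the reduction linearly over a 30 dB window above threshold.
    over = min(1.0, (inside_loudness_db - threshold_db) / 30.0)
    return max_reduction_db * over
```

The continuous mapping is the point: unlike a hard ducking switch, the Outside ambiances recede in proportion to how busy the Inside mix actually is.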
Now, we’ve set different priorities for the sounds. But, let’s not forget the music.
As the soundscape gets more complicated, we need to apply real-time ducking to the music to avoid confusion. The ducking rule for the music depends on the overall loudness, rather than on the buildings' status (clicked or not, zoomed in or out).
Again, using the side-chaining method, a Wwise Meter effect is inserted directly on the Ambiance bus, then the real-time loudness is sent to a new RTPC: Ambiance_City_Sidechain_B. But, what are we going to do with this RTPC? Control the entire music module? That would be too extensive.
We created the interactive music for the Inner City based on the time of day (day or night) and the weather, with differences in harmonies and instruments. Considering the sonic characteristics of different weather (fine, rainy or snowy), we should control the side-chaining applied to the music accordingly. For example, in a violent thunderstorm, almost all other sounds are covered by the heavy rain and rolling thunder. If the gloomy, rainy-weather music is suppressed too much, the audiovisual experience loses its immersion. Therefore, the music should be controlled with the ambiances, the current emotion, and the relevant scenes in mind.
Now, we’ve designed and built the interactive ambient sound system for the entire Inner City scene. The only thing left is to test, improve and optimize our work until we are satisfied.
For me, building such an interactive ambient sound system was very interesting. I'm proud to create frameworks, specify rules, and refine details for an empty world. It's just like playing with building blocks: there are always new possibilities ahead of you. The creative process is just as important as the results. Lastly, I'd like to thank the project team for their support, and Wwise, of course, for accompanying me from my student days all the way to today.