In this series, we are taking an extensive look at past, current, and emerging reverberation techniques and reviewing them from an immersive and spatial perspective. In the previous articles, we covered the reasons why immersive reverberations are so challenging in virtual reality, some of the techniques behind classic reverberation algorithms, and how we can simulate sound propagation through virtual acoustics algorithms. In this article, we discuss how we can combine some of these methods to create efficient, yet compelling, spatial reverberation effects. This article is also a preview of Audiokinetic’s upcoming spatial reverberation technology.
In the previous articles, we covered the motivation behind creating new reverberation algorithms and some of the background to understand how these new tools are being created. Understanding the limits of classic reverberation algorithms, along with the various ramifications of sound propagation simulation, can be valuable when working with these new effects. Just as new sound design guidelines are appearing with the emergence of binaural and ambisonics spatial panning, working with dynamic spatial reverberation will also require new design paradigms.
While the need for novel spatial audio technologies is motivated by the immersive qualities of virtual reality, they can also be powerful tools for any interactive sound design for non-VR platforms. How can we spatialize reverberation and how can this effect contribute to future sound mixes? Let’s explore the practical implications of some of the techniques we covered in the past articles, and examine some of the lessons learned during the development of Audiokinetic’s forthcoming dynamic early reflections technology.
With a spatial reverberator, we aim to spatialize the reverberation through the same panning algorithm as the direct sound. For this to be possible, the reverberator should yield some information regarding the direction and distance of key reflections. While some techniques, such as ray-based simulation and wave-based methods, can achieve good spatial awareness, they also have challenges and limitations. For instance, ray-based methods are great at simulating early reflections but don’t offer realistic rendering of late reverberation. Wave-based methods, on the other hand, become more expensive as we compute the propagation of higher frequencies. Interestingly, as the echo density increases in a reverberator, the output becomes increasingly diffuse and spatial accuracy becomes much less essential. For these reasons, it is useful to consider reverberation as a succession of stages over time, allowing us to choose the most suitable reverberation method at each stage. To make this possible, we also need to ensure that these stages blend well together to form a complete hybrid reverberation algorithm.
Looking at the techniques at our disposal, wave-based methods stand out for their simulation accuracy. These techniques can be useful to generate various impulse responses (IR) automatically based on the game’s geometry, minimizing some of the workload involved in assigning reverberation properties to large virtual environments. Unfortunately, the computing cost of these methods remains prohibitively expensive for most practical applications. In practice, the flexibility and perceptual benefits of classic reverberators still outweigh the convenience of wave-based methods. Therefore, classic multichannel reverberators, like an ambisonics impulse response, are a great option for late reverberation. The common limitation of most multichannel late reverberators is the lack of interactive early reflections. By combining such a late reverberator with a ray-based method that informs the early reflections, we can form a hybrid reverberator. One of the benefits of ray-based simulations for this purpose is that they offer a lot of control over individual reflections, which makes them a potentially powerful sound design tool.
Towards new immersive sound design paradigms
Using ray-based early reflections, we can simulate the originating position of these reflections. Knowing the position of both the listener and the emitter within the virtual geometry, we can create the effect through an individual delay time and spatial position for each reflection. The individual positioning allows appropriate HRTF spatialization in a binaural panner. The distance a reflection travels to reach the listener sets the length of a delay, simulating the relatively slow speed at which sound travels. The amplitude of each reflection can also be adjusted according to that distance. For example, being close to a sound source located near the corner of a room would be audible through very short delays that would also make the sound perceivably louder. To adjust the frequency content of the effect, audio filters can be added to the system, simulating frequency-dependent wall absorption. Through these parameters, the information from a ray-based sound propagation simulation can be used to create a system with many moving parts. Indeed, while providing spatial cues at a reasonable computing cost, a spatial reverberation should also be adaptable. Reverberation remains an effect, and its aesthetic qualities must come before the accuracy of its simulation.
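As a rough illustration of the paragraph above, the following Python sketch derives a delay, a gain, and an incoming angle for a single reflection. It is not taken from Audiokinetic's implementation: the function name, the 343 m/s speed of sound, the inverse-distance gain law, and the 1 m clamp are all assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed approximate value for air at room temperature


def reflection_parameters(image_source, listener, sample_rate=48000):
    """Derive delay, gain, and azimuth for one reflection (hypothetical helper).

    `image_source` is the apparent position the reflection seems to come from,
    `listener` is the listener position; both are (x, y, z) tuples in meters.
    """
    dx = image_source[0] - listener[0]
    dy = image_source[1] - listener[1]
    dz = image_source[2] - listener[2]
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)

    # Longer paths arrive later: convert travel time to a delay in samples.
    delay_samples = distance / SPEED_OF_SOUND * sample_rate
    # Assumed inverse-distance law, clamped so very short paths do not blow up.
    gain = 1.0 / max(distance, 1.0)
    # Azimuth in radians, measured counter-clockwise from the +x axis.
    azimuth = math.atan2(dy, dx)
    return delay_samples, gain, azimuth
```

The three returned values map onto the three controls described above: a delay line length, a level, and a direction to hand to the panner. A filter stage for wall absorption would sit alongside them.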
Similar to any other reverberation, the effect of a spatial reverberator should vary to adapt to various design scenarios. For example, at times the effect should be subtle, allowing space in the mix to favor speech intelligibility or, perhaps, simulating the proximity effect that occurs when you get closer to a sound source.
On other occasions, when too many sounds are playing simultaneously, it might be preferable to prioritize which sources go into the reverberator, as calculating individual reflection paths for every source is not a perceptually beneficial use of computing resources.
In another situation, it could be appropriate to exaggerate the perception of key reflections beyond what the physics would suggest. For instance, we could create an audible tension from the footsteps of the main character walking through a long and narrow corridor by reinforcing the awareness of the two closest walls. Or perhaps the geometry describes a small room, but it would sound better if we simulated the acoustics of a larger space.
To ensure the frequency response of the effect contributes to the overall mix, the audio filters tied to each wall would not only replicate the absorbent nature of different materials, but also contribute to an overall desired mix.
Based on all of these examples, we can see how retaining a creative flexibility remains crucial for these types of spatial effects, which are otherwise driven by the virtual geometry and a simulation algorithm.
Dynamic early reflections
Let’s now focus in more detail on these dynamic ray-based early reflections and how to ensure they can offer creative opportunities. If we look again at the IR of a reverberator, we aim to modulate only the first few echoes. These reflections are still sparse enough in time to be perceived as specular (distinct), while later reflections are much denser and blend together to form a diffuse reverberation. The first aspect we wish to modulate is the amplitude. Through various settings, such as wall absorption and distance attenuation, each reflection will follow a tunable attenuation. Amplitude can be a powerful parameter to emphasize the effect and accentuate certain reflections. It can also be used to control these reflections independently from the usual distance parameter controlling the wet/dry ratio of the late reverberation. For instance, in the case of the proximity effect when a listener is very close to a source, it could be desirable to mute the late reverberation but retain a certain level of spatialized early reflections, for realism.
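The tunable attenuation described above can be sketched as follows. This is an illustrative Python example under stated assumptions, not the actual parameter set of the technology: the function names, the inverse-distance curve, the designer trim in decibels, and the linear ramp for the late reverberation send are all hypothetical. The only physical relation used is that a wall absorbing a fraction `absorption` of the incoming energy reflects an amplitude of `sqrt(1 - absorption)`.

```python
import math


def reflection_gain(path_length, absorption, trim_db=0.0, ref_distance=1.0):
    """Linear gain for one early reflection (hypothetical helper).

    path_length: emitter -> wall -> listener distance in meters.
    absorption:  energy absorption coefficient of the wall, 0.0 to 1.0.
    trim_db:     designer offset used to emphasize or tame this reflection.
    """
    # Assumed distance curve: inverse-distance law beyond the reference distance.
    distance_gain = ref_distance / max(path_length, ref_distance)
    # Energy absorption converted to an amplitude reflection factor.
    wall_gain = math.sqrt(1.0 - absorption)
    trim = 10.0 ** (trim_db / 20.0)
    return distance_gain * wall_gain * trim


def late_reverb_send(source_distance, near=1.0, far=10.0):
    """Late reverb send level: 0 at or below `near`, 1 at or beyond `far`."""
    t = (source_distance - near) / (far - near)
    return min(max(t, 0.0), 1.0)
```

Because the two functions are independent, a source very close to the listener can have its late reverberation send fall to zero while its early reflections keep whatever level `reflection_gain` returns, which is the proximity scenario mentioned above.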
Modulating the amplitude of early reflections. Modified image from .
The time delay between individual early reflections should also be modulated. The distances to the various reflective surfaces, the speed of sound, the position of the listener, and the position of the emitting source can all be used together to calculate the length of various delays. The speed of sound is an interesting parameter that can be used to exaggerate or minimize the perceived size of a virtual space.
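A minimal sketch of this calculation, assuming the reflection point on the surface is already known, could look like the following. The function name and the default speed of 343 m/s are assumptions for the example; lowering `speed_of_sound` is one way to make the same geometry sound like a larger space.

```python
import math


def reflection_delay(emitter, reflection_point, listener, speed_of_sound=343.0):
    """Travel time in seconds along emitter -> reflection point -> listener.

    All positions are (x, y, z) tuples in meters. `speed_of_sound` is treated
    as a creative parameter: halving it doubles every delay.
    """
    path_length = (math.dist(emitter, reflection_point)
                   + math.dist(reflection_point, listener))
    return path_length / speed_of_sound
```

For instance, with the emitter at (-4, 0, 0), the listener at (4, 0, 0), and a reflection point at (0, 3, 0), the path is 10 m long, so the delay is 10 / 343 seconds at the default speed and twice that if the speed is set to 171.5 m/s.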
Since the positioning can vary at every frame, the delay length should be dynamic. Such delay lines are known as time-varying delay lines, or fractional delays, since their length is generally not a whole number of samples. When the length of a fractional delay increases or decreases, a Doppler effect can sometimes be audible, provided the change is significant enough. This is because more or fewer samples must be read and interpolated to reach the new delay length, which is essentially resampling.
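To make the idea concrete, here is a small Python sketch of a fractional delay line that uses linear interpolation between the two nearest stored samples. Linear interpolation is chosen here only because it is the simplest option; the article does not say which interpolation the actual effect uses, and the class and method names are hypothetical.

```python
import math


class FractionalDelay:
    """Circular-buffer delay line with a per-sample, non-integer delay."""

    def __init__(self, max_delay_samples):
        self.max_delay = max_delay_samples
        # Two extra slots keep the interpolation neighbors inside the buffer.
        self.buffer = [0.0] * (max_delay_samples + 2)
        self.write_index = 0

    def process(self, sample, delay_samples):
        """Write one input sample, then read `delay_samples` behind it."""
        size = len(self.buffer)
        delay = min(max(delay_samples, 0.0), float(self.max_delay))
        self.buffer[self.write_index] = sample

        read_pos = self.write_index - delay
        base = math.floor(read_pos)   # older of the two neighboring samples
        frac = read_pos - base        # how far we are toward the newer one
        older = self.buffer[base % size]
        newer = self.buffer[(base + 1) % size]
        out = (1.0 - frac) * older + frac * newer

        self.write_index = (self.write_index + 1) % size
        return out
```

Feeding a single impulse through a delay of 2.5 samples spreads it equally over output samples 2 and 3. If `delay_samples` changes smoothly from one call to the next, the read position moves faster or slower than the write position, which is the resampling that produces the Doppler-like pitch shift described above.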
Modulating the time delay of early reflections. Modified image from .
Finally, each of these reflections should be individually spatialized. A simple way of achieving this is using an intermediate spatial bus. For example, the reflections can be output to a higher order ambisonics bus, thus preserving their incoming directions. Afterwards, the higher order ambisonics bus can be mixed to headphones using a chosen binaural plug-in. This procedure will ensure each reflection will remain coherent over time as a listener rotates, for example, while minimizing the computing cost of applying HRTF filters on the reflections.
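The intermediate bus idea can be illustrated with a first-order example, which is enough to show the principle even though the text above refers to higher orders. The sketch below assumes the AmbiX convention (channel order W, Y, Z, X with SN3D normalization) and azimuth measured counter-clockwise from the front; the function names are hypothetical and the binaural decoding stage is left out.

```python
import math


def encode_first_order(azimuth, elevation=0.0):
    """First-order ambisonic gains [W, Y, Z, X] for a direction in radians."""
    cos_el = math.cos(elevation)
    return [1.0,
            math.sin(azimuth) * cos_el,
            math.sin(elevation),
            math.cos(azimuth) * cos_el]


def mix_reflections(reflections):
    """Sum (sample_value, azimuth, elevation) reflections into one 4-channel frame."""
    bus = [0.0, 0.0, 0.0, 0.0]
    for sample, azimuth, elevation in reflections:
        for channel, gain in enumerate(encode_first_order(azimuth, elevation)):
            bus[channel] += sample * gain
    return bus


def rotate_yaw(bus, listener_yaw):
    """Compensate a listener turning left by `listener_yaw` radians."""
    w, y, z, x = bus
    c, s = math.cos(listener_yaw), math.sin(listener_yaw)
    return [w, y * c - x * s, z, x * c + y * s]
```

All reflections share one bus, so head rotation is a single cheap operation on four channels, and the HRTF filtering cost depends on the bus format rather than on the number of reflections.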
Spatialized intermediate early reflections patterns before going through HRTF based on their incoming angles.
All of these settings can be centralized into an interface to parametrize the effect. In the example below, we can see how various distance attenuation curves are being used to determine the attenuation and spreading based on the reflection distance and the distance between the listener and emitter. The time scaling can be modified to adjust the delays impacting the perceived size of the room. Individual walls can be turned on and off, which makes it possible to solo a specific wall and tune its settings appropriately.
Prototype spatial early reflection interface in Wwise
Virtual geometry is a crucial part of informing the behavior of early reflections. One popular method for binaural plug-ins is to use a fixed rectangular room around the listener. Indeed, since limiting the binaural effect to the direct path results in poorly spatialized sounds, these effects need some level of spatialized reflections. For this, fixed room presets of various sizes can be a simple solution that creates no dependency on in-game geometry information. These statically positioned reflections are simply filtered through the same set of HRTF filters.
Fixed room around a listener. The positioning of each reflection is static.
For a more refined effect, we can use in-game ray casting to find the positions of the closest reflecting surfaces. This would usually be done by sending rays in various directions from a specific point. A ray is simply a straight line, and ray casting is the operation of finding the in-game surfaces or objects that intersect with this line. This can usually be performed by the game engine. To generate the effect, the desired sounds would be mixed into a mono bus, which then serves as the input for the reflections algorithm. The mixed sound is then reflected off the closest surfaces obstructing the rays: it is delayed according to the distance traveled, and the output is positioned in the direction of each ray. For example, this method can be used on the sounds being emitted from the main character. Since the main character is often the focal point in the sound design, this is a good way to enhance only those sounds. This effect was used by Blizzard for their game Overwatch. In this case, it is called a quad delay since four ray-informed delay lines were used.
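The following Python sketch shows the general shape of such a four-ray setup. It is not a description of Blizzard's implementation: the ray angles, the round-trip delay, and the function names are assumptions, and the ray cast against an axis-aligned box stands in for whatever ray-casting query the game engine would normally provide.

```python
import math


def ray_exit_distance(origin, direction, box_min, box_max):
    """Distance from a point inside an axis-aligned box to the wall a ray hits."""
    t_exit = math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d > 0.0:
            t_exit = min(t_exit, (hi - o) / d)
        elif d < 0.0:
            t_exit = min(t_exit, (lo - o) / d)
    return t_exit


def quad_delay_taps(listener, box_min, box_max, first_angle=math.pi / 4,
                    speed_of_sound=343.0):
    """Cast four horizontal rays 90 degrees apart and derive one tap per ray.

    Returns a list of (angle, wall_distance, delay_seconds) tuples. The delay
    assumes the sound leaves the listener position, hits the wall, and returns.
    """
    taps = []
    for k in range(4):
        angle = first_angle + k * math.pi / 2
        direction = (math.cos(angle), math.sin(angle), 0.0)
        distance = ray_exit_distance(listener, direction, box_min, box_max)
        taps.append((angle, distance, 2.0 * distance / speed_of_sound))
    return taps
```

Each tap then drives one delay line fed from the mono bus, with its output panned toward `angle`.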
Sending rays at fixed angles around the listener to locate reflective surfaces.
For a more complex spatial rendering, we can allow the sound engine to directly access some of the geometry information. In this case, a three dimensional box representing a simplified version of the geometry would suffice, because we are simulating early reflections and not the fully diffused sound field. Having access to the simplified geometry will allow the early reflection algorithm to determine the individual location of key reflection paths for multiple sound sources. This can be more efficient than performing ray tracing on multiple sources.
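One common way to find reflection paths inside a simple box is the image-source method, where the emitter is mirrored across each wall and the mirrored position is treated as the apparent origin of that reflection. The article does not state which method the sound engine uses, so the sketch below is only an assumption about how a simplified box could be exploited; the wall names and function name are hypothetical.

```python
def first_order_image_sources(emitter, box_min, box_max):
    """Mirror an emitter across the six walls of an axis-aligned box.

    Positions are (x, y, z) tuples in meters, with z pointing up. Returns a
    dict mapping a wall name to the position of the mirrored emitter.
    """
    wall_names = (("-x", "+x"), ("-y", "+y"), ("floor", "ceiling"))
    images = {}
    for axis in range(3):
        planes = (box_min[axis], box_max[axis])
        for name, plane in zip(wall_names[axis], planes):
            image = list(emitter)
            # Reflecting a coordinate across a plane: x' = 2 * plane - x.
            image[axis] = 2.0 * plane - emitter[axis]
            images[name] = tuple(image)
    return images
```

The straight-line distance from each image position to the listener equals the length of the corresponding reflected path, and the direction from the listener to the image gives the incoming angle. Since this needs only a few arithmetic operations per wall and per source, it illustrates why handing a simplified box to the sound engine can be cheaper than casting rays for every source.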
With individualized early reflections, we can create a rich effect that modulates based on the position of each sound source. By separating each source, we can also create custom settings for each of them. In some cases, whether to minimize computing costs or for aesthetic reasons, it might be desirable to produce only two reflections for certain sources. The closest walls would be prioritized for those sources, while other sources could use every wall for a different effect.
During experimentation, we discovered that reflections coming from the ceiling and the floor could have an undesirable effect when outputting on standard surround systems. Indeed, since there are no channels above and below, they would be repositioned horizontally around the listener and reinforce the wrong direction. In this case, it is preferable to mute them. Therefore, the required output channel configuration should be taken into consideration.
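The two selection rules discussed above, limiting a source to its closest walls and muting vertical reflections when the output has no height channels, can be combined in a few lines. This Python sketch is illustrative only; the wall names, the function name, and the idea of passing a simple boolean for the channel configuration are assumptions.

```python
def select_walls(path_lengths, max_reflections, has_height_channels):
    """Choose which walls produce reflections for one source.

    path_lengths: dict mapping a wall name to its reflection path length (m).
    max_reflections: per-source budget, e.g. 2 for low-priority sources.
    has_height_channels: False for standard horizontal surround layouts.
    """
    vertical = ("floor", "ceiling")
    candidates = {
        name: length
        for name, length in path_lengths.items()
        if has_height_channels or name not in vertical
    }
    # Shortest paths first, so the closest walls are prioritized.
    closest_first = sorted(candidates, key=candidates.get)
    return closest_first[:max_reflections]
```

With path lengths of 3 m (south), 4 m (west), 6 m (north), 9 m (east), 2 m (floor), and 5 m (ceiling), a budget of two reflections yields south and west on a horizontal surround layout, and floor and south when height information can be reproduced.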
A common way to group the parameters is to have a setting from the perspective of the main character and another setting for external sounds. The rationale behind this is to allow sounds coming from the listener, therefore very close, to maintain the effect while we mute the reflections of external sounds when they are closer to the listener.
Individual reflection paths for different sound sources. Using fewer reflections for one source.
Having individual reflection paths requires better communication between the sound engine and the game engine. The sound engine is responsible for holding the different settings for various categories of sources. It also holds the parameters for different wall materials and late reverberations. This information is then passed onto the game engine, where properties can be attached to the geometry. For instance, in the game editor, the list of acoustic materials coming from the sound engine would be used to tag different walls. In turn, the game will send information on the geometry, attached acoustics properties, and positioning, back to the sound engine. At this stage, the sound engine has all the information required to render the effect and the early reflections can be combined with a late reverb.
Data flow between the sound engine and the game engine.
Here are two short sound clips that demonstrate how the effect can be perceived. The first sound has a regular reverberation without any spatial element, and the second has four fully spatialized early reflections. Both samples use the Auro®-HeadPhones™ binaural plug-in. You can expect many more opportunities to experience the effect, very soon!
With dynamic early reflections, it is possible to create versatile spatial effects that can enhance the immersion well beyond the techniques which they are based upon. Indeed, with careful parametrization of key reflections, a wide range of rich aesthetics can be crafted. By combining them with multichannel reverberation, a complete immersive soundscape can be recreated, thus producing new sound design paradigms that will undoubtedly evolve with emerging spatial platforms. More details on Audiokinetic’s upcoming early reflections technology will follow during the Game Developers Conference (GDC) and future blog articles.
 V. Välimäki, J. D. Parker, L. Savioja, J. O. Smith, J. S. Abel, “Fifty years of artificial reverberation”, IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 5, pp. 1421–1448, July 2012. Available at: https://aaltodoc.aalto.fi/bitstream/handle/123456789/11068/publication6.pdf