After discovering that the market lacked impulse responses actually recorded for 3D audio use, we set out to experiment and produce the first commercial Higher-order Ambisonics impulse response library. In this article, we give a behind-the-scenes look at how we did that.
The first big question was which microphones to use, so we tested a number of setups. For first-order Ambisonics (FOA) there are a few options out there (e.g., the Oktava MK-4012 or the well-known Sennheiser Ambeo VR). When we started the project, we were not yet sure which format we should record the IRs in. As you may know, there are other 3D audio formats we could have used. We also ran tests with the ORTF 3D setup from Schoeps, which sounded really good. But testing those IRs in a production environment revealed some artifacts we could not fix. We also noticed that we could not match our reference tracks (we will get to those later) as closely as we could with an Ambisonics setup. On top of that, a smaller microphone setup is more practical for capturing IRs inside helmets, boxes or cars. Our main concern with Ambisonics microphones was that their spatial resolution would not be sufficient for 3D audio, but this is only true for FOA. So we started looking for HOA (higher-order Ambisonics) options. The list here is very short.
The Eigenmike from mh acoustics finally gave us the flexibility needed for this library. This is a coincident microphone array with 32 capsules. The big advantage of the Eigenmike is that if you record all 32 capsules (A-Format), you can later compute many different formats from them. In the end you have one recording and get your Ambisonics from 1st to 3rd order, but you are also free to generate other mono, stereo or surround microphone characteristics. For the stereo impulse responses, we used an ORTF characteristic computed from the 32-channel source.
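As a rough illustration of how a microphone characteristic can be derived after the recording, the sketch below steers virtual cardioids in the horizontal plane from first-order Ambisonics signals. It is not the processing used for the library: it assumes AmbiX-style W, X, Y channels with SN3D normalization, the function names are hypothetical, and it reproduces only the angular part of an ORTF pair (two cardioids angled 110° apart). A real ORTF pair also has a 17 cm capsule spacing, which a coincident decode like this one does not model.

```python
import math

def encode_plane_wave(sample, azimuth_deg):
    """Encode one sample arriving from azimuth_deg into horizontal
    first-order W, X, Y (SN3D, counterclockwise azimuth: an assumption)."""
    az = math.radians(azimuth_deg)
    return sample, sample * math.cos(az), sample * math.sin(az)

def virtual_mic(w, x, y, look_deg, pattern=0.5):
    """Steer a virtual first-order microphone towards look_deg.
    pattern: 1.0 = omni, 0.5 = cardioid, 0.0 = figure-of-eight."""
    look = math.radians(look_deg)
    return pattern * w + (1.0 - pattern) * (x * math.cos(look) + y * math.sin(look))

def ortf_like_pair(w, x, y, half_angle_deg=55.0):
    """Two virtual cardioids at +/-55 degrees (angular part of ORTF only)."""
    left = virtual_mic(w, x, y, +half_angle_deg)
    right = virtual_mic(w, x, y, -half_angle_deg)
    return left, right

# A source exactly on the axis of the left virtual cardioid.
w, x, y = encode_plane_wave(1.0, 55.0)
left, right = ortf_like_pair(w, x, y)

# A source directly behind the left virtual cardioid falls into its null.
wb, xb, yb = encode_plane_wave(1.0, 235.0)
left_rear = virtual_mic(wb, xb, yb, 55.0)
```

The same W, X, Y signals can be decoded again with a different `look_deg` or `pattern`, which is the flexibility described above: one recording, many microphone characteristics.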
The result sounded very natural, and the Eigenmike brought a lot of practical benefits (setup, size) while giving us the opportunity to extend the spatial resolution compared to regular Ambisonics microphones. Even though we still prefer the ORTF 3D over Ambisonics in general for surround and VR recordings, for this particular project the Eigenmike was our choice.
Choice of excitation:
If you want to record an impulse response, you have to excite the space you want to capture. We all know the method of firing a pistol, as we did for our Outdoor Impulse Responses library. This is not an option for a session in a church, or indoors in general. Firing an impulse that way also always involves some random variation from shot to shot. Playing a sine sweep is definitely more reproducible and scientific. You can also excite a space with an MLS (Maximum Length Sequence) measurement, which we also tested. In comparison, the signal-to-noise ratio of the sweep was far better.
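To make the sweep method more concrete, here is a minimal single-channel sketch using NumPy. It generates an exponential sine sweep and recovers an impulse response from a recording by regularized spectral division. The sweep range, the regularization constant, the toy "room" and the function names are all assumptions chosen for illustration, and this is not necessarily the deconvolution we used.

```python
import numpy as np

def exponential_sweep(f_start, f_end, duration, sample_rate):
    """Exponential (logarithmic) sine sweep from f_start to f_end in Hz."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    rate = np.log(f_end / f_start)
    phase = 2.0 * np.pi * f_start * duration / rate * (np.exp(t * rate / duration) - 1.0)
    return np.sin(phase)

def deconvolve(recording, sweep, regularization=1e-6):
    """Estimate the impulse response by dividing the recording spectrum by
    the sweep spectrum. The small constant keeps the division stable at
    frequencies where the sweep carries almost no energy."""
    n = len(recording) + len(sweep) - 1
    n_fft = 1 << (n - 1).bit_length()  # next power of two
    rec_spec = np.fft.rfft(recording, n_fft)
    sweep_spec = np.fft.rfft(sweep, n_fft)
    eps = regularization * np.max(np.abs(sweep_spec)) ** 2
    ir_spec = rec_spec * np.conj(sweep_spec) / (np.abs(sweep_spec) ** 2 + eps)
    return np.fft.irfft(ir_spec, n_fft)

sample_rate = 48000
sweep = exponential_sweep(20.0, 20000.0, 1.0, sample_rate)

# Toy "room": direct sound plus one reflection 10 ms later at half amplitude.
room = np.zeros(1000)
room[0] = 1.0
room[480] = 0.5

recording = np.convolve(sweep, room)  # what the microphone would pick up
ir = deconvolve(recording, sweep)[:len(room)]
```

In this toy case the recovered `ir` shows the direct sound at sample 0 and the reflection at sample 480 with roughly half its level. Because the sweep only covers 20 Hz to 20 kHz, the result is band-limited rather than a perfect copy of `room`.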
Which leads to the next step. Loudspeakers:
The first few criteria that come to mind if you think of a proper loudspeaker for tracking impulse responses are a.) frequency response and b.) does it have enough power to excite bigger rooms?
In the beginning we did some tests with omnidirectional speakers, but none of them fit our needs. One reason was the frequency response. These speakers are usually built for building acoustics, so their frequency response only goes up to about 10 kHz and then drops rapidly. When we tried to get those frequencies back by EQing in post, the result did not give us the spatial perception we were aiming for. Whatever we tried, we always came back to regular studio monitors. Most places were done with Genelec 1030s, and some smaller locations with the smaller Yamaha HS-50, which sounded surprisingly good. We also tried different speaker configurations and found that the one we liked most used two speakers: one pointing at the microphone array and one pointing away from it. The speaker pointing away from the microphone gave us a little extra depth in the spatial perception and less distinct early reflections. Early reflections can easily be added at run-time with algorithmic reverbs, which also helps with run-time localization of sound sources in space. This is why we tried the omnidirectional speakers in the first place, but we found two speakers facing in two directions to be the best solution, both sonically and technically.
As soon as we started traveling around to record impulse responses, we realized that access to a proper power supply is not always a given. We had not thought about that at first, because the library was planned to be indoors only. But if you want to do a session in an old tunnel, you have to be independent. We acquired the Hyundai Portable Power Station HPS-600, which was enough to power two speakers (at high volumes) plus the microphone, interface and a MacBook, while still being portable.
There is no perfect or right impulse response. The recording equipment colors the signal and the position of the speaker(s) and microphone always changes the perception of the room. For positioning we mostly decided by ear on a location-to-location basis.
The fiddly process on the software side (deconvolution, A- to B-format conversion, importing and exporting tracks, etc.) turned out to be pretty time consuming, and it has a high potential for introducing unwanted artifacts into the chain. In the worst case this can make an IR unusable. To make sure that we could trust our results, we recorded reference tracks in every location. Before capturing the impulse response, we always played back four different kinds of material (funk, classical, rock, voice) and recorded it with the microphone in the exact same position. This let us later compare the reference track, recorded with the real ambience, to the simulated version: the dry source convolved with our impulse response.
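The comparison itself was done by ear, but the idea can be sketched numerically. The example below convolves a dry signal with an impulse response and measures how similar the result is to a reference recording. It works on one channel with synthetic data; the signals, the noise level and the function names are assumptions for illustration only.

```python
import numpy as np

def fft_convolve(dry, impulse_response):
    """Linear convolution of a dry signal with an impulse response via FFT."""
    n = len(dry) + len(impulse_response) - 1
    n_fft = 1 << (n - 1).bit_length()  # next power of two
    spectrum = np.fft.rfft(dry, n_fft) * np.fft.rfft(impulse_response, n_fft)
    return np.fft.irfft(spectrum, n_fft)[:n]

def normalized_correlation(a, b):
    """Similarity between two signals in [-1, 1]; 1.0 means identical shape."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

rng = np.random.default_rng(0)

# Synthetic stand-ins: a dry source and a decaying-noise impulse response.
dry = rng.standard_normal(4800)
decay = np.exp(-np.arange(500) / 100.0)
impulse_response = decay * rng.standard_normal(500)

# "Reference recording": the room response plus a little recording noise.
reference = np.convolve(dry, impulse_response) + 0.05 * rng.standard_normal(4800 + 500 - 1)

simulated = fft_convolve(dry, impulse_response)
similarity = normalized_correlation(reference, simulated)
```

A similarity close to 1.0 suggests that the processing chain did not damage the impulse response. A low value would be a reason to check the deconvolution and format conversion steps, although a number like this is no substitute for listening.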
For a good impulse response, you need a certain distance between the microphone and the speaker, because you want to avoid the direct sound of the speaker as much as possible. We always tried to find the best balance between distance, mic gain, speaker volume and surrounding walls to achieve a good S/N ratio. In the original reference track you can still hear some noise, but keep in mind that these are 32 audio channels summed down to binaural.
We then found a way to denoise all 32 channels of the impulse responses simultaneously, which was not easy at first. So now we had our recorded reference tracks with audible noise and could A/B those with the dry track convolved with the corresponding impulse response. If the noise were not in the original, it would be pretty hard to tell which one is the original. We did blind listening sessions, and the guesses were correct about half of the time, which is what you would expect from chance. That means our test group of audio professionals detected no significant difference, and we are extremely happy with our results.
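For readers who want to run a similar blind test, a result can be checked against pure guessing with an exact binomial test. The sketch below uses only the Python standard library. The counts in the example are hypothetical and are not the numbers from our sessions, and the function name is made up for this illustration.

```python
from math import comb

def binomial_p_value(correct, trials, chance=0.5):
    """Exact two-sided binomial p-value: the probability, under pure
    guessing, of an outcome at least as unlikely as the observed one."""
    probs = [
        comb(trials, k) * chance ** k * (1.0 - chance) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = probs[correct]
    # Sum all outcomes that are no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= observed * (1.0 + 1e-9)))

# Hypothetical example: 26 correct guesses out of 50 trials.
p_near_chance = binomial_p_value(26, 50)

# Hypothetical counter-example: 40 correct out of 50 would be clearly audible.
p_clear_difference = binomial_p_value(40, 50)
```

A large p-value, as in the first example, means the result is compatible with guessing, so no difference was detected. It does not prove that the two versions are identical, only that this group of listeners could not tell them apart reliably.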
Our main goal was to capture spaces that allow creative, immersive sound design, with a high priority on video games and postproduction, although some spaces also turned out to be very suitable in musical contexts. We covered a broad list of spaces, including churches, tunnels, hallways and everyday places like kitchens or bathrooms. We also did some really special things, like a submarine, and experimental stuff with boxes and helmets.
For us, the journey was exciting. And we hope that the HOA Impulse Response library helps you to set the acoustics right, to make your locations sonically more emotional and believable or possibly more creative.