Translator's Note:
This series highlights audio best practices from China. China topped the world gaming revenue charts in 2016. And the Wwise Tour 2016 China stop not only featured popular (50 million daily active users) Chinese games using Wwise, but also boasted over 200 attendees from the local game audio community. Therefore, we were certainly intrigued to take a deeper look into game audio practices in China. By translating blog articles by one of the most influential audio designers within the Chinese gaming industry, we aim to help better understand the audio community and the culture of this vast territory. To the best of our knowledge, this would be the first-ever effort in translating Chinese audio tech blogs into English.
Translation from Chinese to English by: BEINAN LI, Product Expert, Developer Relations - Greater China at Audiokinetic
For the great variety of media forms out there, whether they are push services like broadcasting, or multi-platform interactive entertainment, or Internet streaming, loudness is not just about signal transmission standards, but also a direct influence over consumer aesthetics, such as loudness wars. Today, it seems nothing but normal that a piece of music can be played over all these platforms simultaneously. However, such a requirement gives content production a lot of headaches. How can we ensure that the same sounds can achieve the same quality over multiple platforms? Additionally, we face losing frequencies and dynamics over different audio formats. Although the so-called "quality" can be subjective, basic evaluation frameworks exist, such as frequency response. The amplitude response on different frequencies is a definitive factor for the loudness of any sound. The main problem that we face here is not about creating a digital file that makes noise, but about how we can control the amplitudes of the different frequencies of sounds, electric power, or loudness.
The problem boils down to what is "loud" and what is "quiet". We need an objective reference.
The loudness measurement system is pretty mature by now. To start off, let's take a look at some reference numbers. By tradition, there are a few unofficial loudness standards:
- Music RMS = -16 dB (peak <= -3 dB)
- Voice/Speech RMS = -12 dB (peak <= -3 dB)
These were commonly used by most professionals in the business. However, today's professionals (in China) have lowered their standards. Many people working in recording and post-production, especially junior engineers, don't even know that something like RMS exists. As a result, junior engineers often don't even realize when they are going against well-established norms that senior engineers respect.
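For readers who have not met RMS before, here is a minimal sketch of how an RMS level and a peak level could be computed from raw samples. It assumes floating-point samples where full scale is ±1.0; the function names and the 1 kHz test tone are illustrative only and are not taken from any particular tool.

```python
import math

def peak_dbfs(samples):
    """Peak level in dB relative to full scale (|x| = 1.0)."""
    peak = max(abs(x) for x in samples)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")

def rms_dbfs(samples):
    """RMS level in dB relative to full scale (|x| = 1.0)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")

# Illustrative input: one second of a 1 kHz sine at half of full scale, 48 kHz.
sample_rate = 48000
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / sample_rate)
        for n in range(sample_rate)]

print(f"peak: {peak_dbfs(tone):.2f} dBFS")
print(f"RMS:  {rms_dbfs(tone):.2f} dBFS")
```

With this plain mathematical definition, a sine wave reads about 3 dB lower in RMS than in peak. Some editors instead calibrate RMS so that a full-scale sine reads 0 dB, so it is worth checking which convention your tool uses before comparing its readings with the reference numbers in this article.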
A few more numbers for reference:
- The average RMS loudness found in classical music produced in the 1990s is -21 dB.
- The RMS loudness of the final mix of Hollywood soundtracks usually doesn't exceed -20 dB; some don't even pass -24 dB. (The current mainstream is at -24 dB LKFS.)
- The built-in speaker output of industry standard mobile phones will clip when the sound sample's RMS goes above -8 dB.
You can also find an HD trailer of a typical Hollywood film and look at its RMS; it should be much lower than you would expect! And most waveforms look pretty: They rise and dip instead of displaying long brick walls. Their dynamic frequency responses are well saturated.
In the above image, the first track comes from the trailer of the film Shooter (2007). The second one is from an early trailer of Star Wars: Episode VII - The Force Awakens (2015).
Note that while the numbers above will depend on experience and aesthetics, they are backed by the standards of post-production, broadcasting, and theatres. In fact, most products that conform to industry standards can be sufficiently amplified during playback, and the resulting quality is largely determined by playback equipment. The standards and procedures are especially strict in the case of theatres (excluding China). Yet, nowadays, popular aural aesthetics deviate from the standards. Even though the impedances of regular headphones have dropped to the point where headphone amplifiers are no longer needed, the loudness of sound assets is still on the rise.
Here is a sound excerpt taken from the film Exodus: Gods and Kings (2014) (720P, AAC).
I extracted the 44 minutes in the middle and did a loudness measurement:
Then I measured its RMS using SoundForge Pro for Mac 2:
The results:
- LKFS = -23.5 dB
- RMS = -26.69 dB
- LRA = 16.4 dB LU
Tips: The "dB" used here is short for dBFS (Decibels relative to Full Scale). Full Scale means the full frequency range of 20-20,000 Hz and a full dynamic range (which, according to the bit depth in use, spans about 96 dB on a 16-bit CD and about 144 dB on a 24-bit DVD). The Chinese industry calls it “全幅”.
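The 96 dB and 144 dB figures follow from the word length of the samples: every extra bit doubles the number of amplitude steps, which adds about 6.02 dB. The short sketch below reproduces that arithmetic, assuming 16-bit audio for CD and 24-bit audio for DVD; the function name is illustrative.

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of linear PCM with the given word length."""
    return 20.0 * math.log10(2 ** bits)

for label, bits in (("CD, 16-bit", 16), ("DVD, 24-bit", 24)):
    print(f"{label}: {dynamic_range_db(bits):.1f} dB")
# CD, 16-bit: 96.3 dB
# DVD, 24-bit: 144.5 dB
```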
Here comes our next question: How do we read a meter? Let's first take a look at the differences between the commonly used meters. First, the classic one that we all know: the VU meter.
0 is the upper limit. The red part beyond 0 represents the acceptable headroom. Note that its units are not dB.
A VU meter indicates the current voltage change of the audio signal using a fast-moving pointer. Because the mechanical movement of the needle introduces latency, it may be unable to report fast subtle fluctuations. The most commonly used level meter, the PPM (Peak Program Meter) is shown via the vertical bars to the left of the picture below:
PPM is also called True Peak meter (dBTP) because it reflects the current peak only. The signal level defined by PPM is up to 0 dB, beyond which signals are still seen as 0 dB, although some professional high-precision meters can show above 0 dB. The max reading 0 of a VU meter does not mean "0 dB", but is about the same as the -20 dB of a PPM meter. In other words, when a VU meter is at its 0, the PPM still has 20 dB headroom for the same signal. Some PPM meters show the current mean peaks, too.
Due to the physical limitations of the pointer, a VU meter is only good at showing the relative changes in signal levels. The greater the movement, the wider dynamics it represents. PPM shows the current peak. Neither of them can objectively display "loudness". The broadcasting industry mainly uses the LKFS loudness meter.
Waves WLM
TC LM2n
Comparing the two meters using the same sample:
It's very clear that the two readings are different.
It turns out that there are a number of common international loudness standards. The newest and most popular one is ITU-R BS.1770-3 (created in 2012). Originally created by BBS and EBU, and also officially adopted by the Chinese radio and television broadcasting industry, this standard is the result of years of evolution. It has variants for different application domains, such as BS.1770-2, EBU R128, and TR-B32. In interactive entertainment, Xbox One and PS4 both include loudness tests as part of their TRCs. A TRC is a non-negotiable technical standard for a platform. Game products that fail a platform TRC are not allowed to be published on the platform; in other words, they cannot be released on the consoles. Similarly, even on an iPhone, an iPad, or a Samsung mobile phone, the frequency response and dynamic range of the DAC (digital-to-analog converter) also have standards. These standards are not only meant for cost-benefit balance under mass production, but are also about the user's aural experience. From another perspective, cellphone manufacturers understand that no professional producer or pro media will compromise for their cellphone speakers, because they have run their businesses for decades and know what they are doing. Therefore, cellphone manufacturers have to ensure that their phone speakers can accommodate Taylor Swift songs, rather than expect Swift's producers to concede to their cellphones. To this end, the big manufacturers participated in the establishment of a series of loudness standards. Refer to the following literature for details:
The loudness standard presets of Waves WLM:
The loudness standard presets of TC LM2n:
Two agencies are relevant to our daily work, and are the authorities of the standards:
- ITU: The International Telecommunication Union
- EBU: European Broadcasting Union
Among the ITU and EBU loudness standards and measurement systems, the Loudness Meter is the most widely adopted one. As a new meter, it serves the same purposes as the VU meter and the PPM. However, the meter interface introduces some brand new terminology. It is crucial to understand these terms to control loudness and dynamics. They are the main references for your work. Let's take a look at these terms:
LKFS: Loudness, K-weighted, relative to Full Scale, known as 全幅K权重响度单位 in Chinese. The K-weighting is the result of a research collaboration between McGill University and CRC (the Communications Research Centre Canada). It is a non-linear curve representing human loudness perception. It is widely accepted to be the most accurate algorithm for representing loudness. This algorithm has great significance regarding digital signal amplification, because radio, television broadcasting, and video games all need to minimize signal distortion during sound amplification while meeting the inherent human aural requirements. To clarify, LKFS is a measurement unit for loudness, 1 LKFS = 1 dB, so I will use dB LKFS accordingly in the following articles of this series.
LUFS: Loudness Units Full Scale, another loudness measurement unit, and essentially the same as LKFS. LUFS is just an EBU thing. 1 LUFS = 1 dB.
Gating: 门限 in Chinese. Not all loudness meters and loudness processing tools have this parameter. Take classical music and films as examples. There can be long and quiet sections, but there can also be very loud moments. With such complications, how do we describe the overall loudness, or even have a somewhat objective basis for measurement? Gating to the rescue. It is there for ignoring relatively low levels. For example, signals lower than -45 dB are usually ignored. Then, the loud sections will be used for describing our perception. Gating can perhaps also help us in another important area: the loudness metrics. Your aural perception requires a stable reference (more on this later). For instance, we listen to pop music more than other music genres in daily life. We, therefore, have subconscious references, although vague, for all kinds of volume and loudness changes in pop music. However, we are often at a loss when we deal with films or video games, and we are not sure how loud they should be made. This is because we have vague, if any, loudness references for the complex changes in sounds or unfamiliar sounds present in those media forms. However, pop music often can serve as our reference. The gating here is used precisely for that. Of course, whether or not to open the gate, and the resulting loudness metrics once the gate is open, still depends on you establishing the rules, which requires ear training. Gating helps you set up your system of loudness comparison and decision making.
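To make the effect of gating concrete, here is a simplified sketch. It assumes the K-weighted loudness of each measurement block has already been computed elsewhere, and it only shows the gating and averaging step. The two thresholds are assumptions based on the two-stage gate in BS.1770-2 and later (an absolute gate at -70 LKFS, then a relative gate 10 LU below the remaining average), not the -45 dB example mentioned above; the block readings are made up.

```python
import math

ABSOLUTE_GATE_LKFS = -70.0  # assumed absolute threshold
RELATIVE_GATE_LU = -10.0    # assumed offset below the absolute-gated average

def energy_mean(levels):
    """Average loudness values in the power domain, then convert back to dB."""
    if not levels:
        return float("-inf")
    power = sum(10.0 ** (level / 10.0) for level in levels) / len(levels)
    return 10.0 * math.log10(power)

def gated_loudness(block_levels):
    """Two-stage gated average over per-block loudness readings."""
    # Stage 1: drop blocks that are effectively silence.
    audible = [lv for lv in block_levels if lv > ABSOLUTE_GATE_LKFS]
    # Stage 2: drop blocks far below the average of what remains.
    relative_gate = energy_mean(audible) + RELATIVE_GATE_LU
    kept = [lv for lv in audible if lv > relative_gate]
    return energy_mean(kept)

# Hypothetical programme: half loud action, half quiet ambience.
blocks = [-20.0] * 50 + [-50.0] * 50

print(f"ungated: {energy_mean(blocks):.1f} LKFS")
print(f"gated:   {gated_loudness(blocks):.1f} LKFS")
```

In this made-up example the ungated average comes out near -23 LKFS, while the gated value is -20 LKFS, because the quiet half falls below the relative gate and is left out of the average. That is the sense in which the loud sections end up describing our perception.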
We usually won't hear sounds of an identical volume over time. So the loudness measurement adopted by ITU-R BS.1770 specifies that during a 30-minute continuous playback, the acceptable average loudness is about -24 dB LKFS (-23 dB LUFS by EBU standards), while the maximum loudness is -12 dB LKFS, above which is considered to be too loud.
In the case of video games, this measurement approach becomes problematic, because an individual sound asset rarely lasts that long. So we care more about short-term loudness, a game's runtime average loudness, and max short-term loudness (during a 400-3,000 ms time frame). Of course, sometimes we have to measure low-loudness passages and their durations, which is also important. If the running game's low-loudness passages were too long, and their loudness values were too low, then the overall game would sound too quiet and unexciting; there could even be times when nothing seems to come through a gamer's headphones. This experience would be unnatural. In fact, we must consider these values not only for the game's overall audio output, but also during the post-production and balancing of its BGM (background music), especially the low-loudness measurement, which is often overlooked. These values can be seen in the following screenshots:
Making sense of these numbers requires ear training. I will share my personal tips and best practices next.
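As a rough illustration of the runtime values discussed above (maximum short-term loudness and the duration of low-loudness passages), the sketch below works on a list of per-block loudness readings captured while a game runs. Everything in it is an assumption made for illustration: one reading per 100 ms, a 3-second short-term window, a -50 dB "too quiet" threshold, and the readings themselves.

```python
import math

def energy_mean(levels):
    """Average loudness values in the power domain, then convert back to dB."""
    power = sum(10.0 ** (level / 10.0) for level in levels) / len(levels)
    return 10.0 * math.log10(power)

def short_term_levels(block_levels, window):
    """Sliding energy-mean over `window` consecutive blocks."""
    return [energy_mean(block_levels[i:i + window])
            for i in range(len(block_levels) - window + 1)]

def longest_quiet_run(block_levels, threshold):
    """Length, in blocks, of the longest stretch at or below `threshold`."""
    longest = current = 0
    for level in block_levels:
        current = current + 1 if level <= threshold else 0
        longest = max(longest, current)
    return longest

# Hypothetical capture: one reading per 100 ms of gameplay.
BLOCK_SECONDS = 0.1
readings = [-30.0] * 40 + [-14.0] * 30 + [-55.0] * 80 + [-24.0] * 50

short_term = short_term_levels(readings, window=30)  # 30 blocks = 3 s
max_short_term = max(short_term)
quiet_seconds = longest_quiet_run(readings, threshold=-50.0) * BLOCK_SECONDS

print(f"max short-term loudness: {max_short_term:.1f} dB")
print(f"longest quiet passage:   {quiet_seconds:.1f} s")
```

A long quiet passage like the 8-second stretch in this made-up capture is the kind of thing worth checking by ear: it may be intentional, or it may be the moment when nothing seems to come through the player's headphones.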
Stay tuned for my next blog: Loudness Processing Best Practices, Chapter 1 : Loudness Measurement (part 2)
Comments
Simon N Goodwin
June 06, 2017 at 12:47 pm
Interesting perspectives, thanks. More insights into the Chinese market, especially as regards things done differently there, are very welcome. The CD stats seem wrong, though; 16 bits would equate to 96.3 dB not the 120 dB (requiring nearly 20 bit resolution) stated. 144 dB for DVD is about right (at 6.02 dB per bit, 24 bits yields 144.5). Perhaps the CD figure is allowing generously for the perceptual benefit of superbitmapping, as Wikipedia cites, but if so the DVD figure could just as legitimately (or illegitimately) be bumped up to 169 dB. I think 96 and 144 dB would be more realistic CD/DVD figures. In any case, it's lots! My 6.02 dB Simon
Harry Teabout IV
June 09, 2017 at 04:25 am
Thank you for bringing me up to speed with the audio trends of the gaming world.
Beinan Li
June 12, 2017 at 10:33 am
As the translator of the post, I'm commenting on behalf of the author Digimonk. Thanks very much for pointing this out, Simon. Digimonk confirmed that this was actually a mistake in his original post. The "120dB" in the text should have been "96dB". I have fixed the number in place on behalf of him as well. Thanks again for supporting this post, Simon.