Hello everyone. I’m Yu Zhang, a member of the audio team at YooZoo Games in China. It has been more than 13 years since the introduction of the ITU-R BS 1770 standard in 2006. While there are many articles about loudness, I thought I would explain several things about loudness in the form of a Q&A. I hope you find this post useful for your future work.

Loudness and Loudness Level

The concept of “loudness level (LN)” was introduced as a way to measure human sensitivity to pure tones using equal-loudness contours. The purpose of the loudness level is to provide a standard measurement of perceived loudness. The unit of measurement for loudness level is the phon: the loudness level of a sound in phons equals the sound pressure level, in decibels, of a 1 kHz pure tone that is perceived as equally loud. The phon reflects our subjective perception of sound pressure, but it is not convenient for measurement. As such, the unit of loudness, the “sone”, was introduced. A loudness of 1 sone is equivalent to the loudness of a signal at 40 phons, the loudness level of a 1 kHz tone at 40 dB SPL. Above 40 phons, each 10 phon increase produces almost exactly a doubling of the loudness in sones.
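
The phon-to-sone relationship described above can be sketched in a few lines of Python. This is only an illustration of the "doubling per 10 phons" rule of thumb, anchored at 1 sone = 40 phons; the function names are made up for this example, and the rule is only a reasonable approximation from about 40 phons upward.

```python
import math


def phon_to_sone(phon: float) -> float:
    """Approximate loudness in sones for a loudness level in phons.

    Uses the rule of thumb from the text: 40 phons = 1 sone, and every
    additional 10 phons doubles the loudness. Assumed valid for >= 40 phons.
    """
    return 2.0 ** ((phon - 40.0) / 10.0)


def sone_to_phon(sone: float) -> float:
    """Inverse of phon_to_sone under the same assumption."""
    return 40.0 + 10.0 * math.log2(sone)


if __name__ == "__main__":
    for level in (40, 50, 60, 70):
        print(level, "phon is about", phon_to_sone(level), "sone")
```

Under this rule, a sound at 60 phons is about four times as loud as one at 40 phons, even though the level only went up by 20 phons.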

What does dB represent?

The decibel (dB) is a logarithmic unit. It cannot be used to directly describe the size or amount of a physical quantity, only the ratio of two physical quantities of the same unit. It is widely used in the measurement of acoustic level, where it usually indicates the relative difference in power or intensity between two acoustical or electrical signals. When expressing a power ratio, the number of decibels is ten times the base-10 logarithm of that ratio. In “dB”, “d” stands for “deci” (one-tenth) and “B” stands for “bel”. The decibel is one-tenth of a bel: 1 dB = 0.1 B. Suffixes are commonly attached to the basic dB unit in order to indicate the reference value against which the ratio is calculated.
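
As a quick illustration, here is a minimal Python sketch of the decibel as a ratio. The power form (10 × log10) is the one described above. The amplitude form (20 × log10) is the usual companion for voltage or pressure ratios, since power is proportional to amplitude squared. The function names are my own.

```python
import math


def power_ratio_to_db(power: float, reference: float) -> float:
    """Decibels for a ratio of two powers or intensities."""
    return 10.0 * math.log10(power / reference)


def amplitude_ratio_to_db(amplitude: float, reference: float) -> float:
    """Decibels for a ratio of two amplitudes (voltage, pressure).

    Power goes with amplitude squared, hence the factor of 20.
    """
    return 20.0 * math.log10(amplitude / reference)


def db_to_power_ratio(db: float) -> float:
    """Inverse of power_ratio_to_db, returned as a plain ratio."""
    return 10.0 ** (db / 10.0)


if __name__ == "__main__":
    print(power_ratio_to_db(2.0, 1.0))      # doubling the power
    print(amplitude_ratio_to_db(2.0, 1.0))  # doubling the amplitude
```

Doubling the power adds about 3 dB, while doubling the amplitude adds about 6 dB.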

What does dBSPL represent?

“SPL” is the short form of “Sound Pressure Level”. In essence, sound is a kind of energy wave. Waves with different lengths have different frequencies. With different intensities, they can be perceived as quiet or loud when they act on microphone diaphragms, human ears etc. This kind of pressing force caused by a sound is what we call “sound pressure”. The relationship between sound intensity and sound pressure is shown below:

I = p² / (ρc)

“I” is the sound intensity (W/m²); “p” is the sound pressure (Pa); “ρc” is the characteristic acoustic impedance of air, ρc ≈ 400 rayl (N·s/m³).
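
To make the numbers concrete, the sketch below converts a sound pressure to dB SPL and to intensity. It assumes the standard reference pressure of 20 µPa for dB SPL and a reference intensity of 10⁻¹² W/m², neither of which is stated above, together with the ρc value of 400 from the text. The function names are illustrative only.

```python
import math

P_REF = 20e-6    # Pa, standard reference pressure for dB SPL (assumption)
I_REF = 1e-12    # W/m^2, standard reference intensity (assumption)
RHO_C = 400.0    # N*s/m^3, characteristic impedance of air from the text


def spl_db(pressure_rms: float) -> float:
    """Sound pressure level in dB SPL for an RMS pressure in pascals."""
    return 20.0 * math.log10(pressure_rms / P_REF)


def intensity(pressure_rms: float) -> float:
    """Sound intensity in W/m^2 from RMS pressure: I = p^2 / (rho * c)."""
    return pressure_rms ** 2 / RHO_C


def intensity_level_db(i: float) -> float:
    """Sound intensity level in dB relative to I_REF."""
    return 10.0 * math.log10(i / I_REF)


if __name__ == "__main__":
    p = 1.0  # 1 Pa RMS
    print(round(spl_db(p), 2), "dB SPL")
    print(intensity(p), "W/m^2")
    print(round(intensity_level_db(intensity(p)), 2), "dB intensity level")
```

With ρc = 400, the pressure level and the intensity level come out as the same number, which is why the two are often used interchangeably in air.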

What does dBFS represent?

dBFS (Decibels relative to full scale) is a unit of measurement for amplitude levels in digital systems. The level of 0 dBFS is assigned to the maximum possible digital level. “Full Scale” indicates the maximum analog signal level that can be coded before a converter reaches the digital overload threshold. In other words, dBFS represents the amplitude of a signal compared with the maximum which a device can handle before clipping occurs. All peak measurements smaller than the maximum are negative levels.
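
Here is a small sketch of how a sample value maps to dBFS. It assumes signed integer PCM where full scale is taken as 2^(bits − 1), which is one common convention rather than the only one; the function name is hypothetical.

```python
import math


def sample_to_dbfs(sample: int, bits: int = 16) -> float:
    """Level of one signed PCM sample relative to digital full scale.

    Assumes full scale = 2 ** (bits - 1), e.g. 32768 for 16-bit audio.
    A zero sample has no defined level, so negative infinity is returned.
    """
    full_scale = 2 ** (bits - 1)
    if sample == 0:
        return float("-inf")
    return 20.0 * math.log10(abs(sample) / full_scale)


if __name__ == "__main__":
    for value in (32767, 16384, 3277, 1):
        print(value, "->", round(sample_to_dbfs(value), 2), "dBFS")
```

Every value below full scale comes out negative, and halving the amplitude lowers the level by about 6 dB.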

What are we really referring to while talking about loudness?

The subjective perception in terms of which sounds can be ordered on a scale extending from quiet to loud (i.e. the perceived sound intensity).

What are the objective characteristics that affect the sound intensity?

Signal level, frequency response, duration, sound field.

What are the subjective characteristics that affect our perception of the sound intensity?

Physiological peculiarity, psychological traits, listening environment, cultural background, personality differences.

What are psychoacoustic parameters besides loudness?

Sharpness: Sharpness is a measure of the high frequency content of a sound. The greater the proportion of high frequencies, the ‘sharper’ the sound. The unit of measurement for sharpness is the acum. 1 acum is defined as the sharpness produced by a narrow-band noise of 60 dB with a bandwidth of 160 Hz at a center frequency of 1 kHz.

Fluctuation Strength: Fluctuation strength quantifies the subjective perception of slower (up to 20 Hz) amplitude modulation of a sound. The unit of measurement for fluctuation strength is the vacil. 1 vacil is defined as the fluctuation strength produced by a 1 kHz tone of 60 dB which is 100% amplitude modulated at 4 Hz.

Roughness: Roughness quantifies the subjective perception of rapid (15-300 Hz) amplitude modulation of a sound. The unit of measurement for roughness is the asper. 1 asper is defined as the roughness produced by a 1 kHz tone of 60 dB which is 100% amplitude modulated at 70 Hz.

Psychoacoustic Annoyance (PA): PA combines loudness with the three components above. It quantifies our overall perception of a sound. It is a dimensionless value: the bigger the value, the more annoying the sound.

Psychoacoustic parameters are physical quantities used to describe our subjective perception of a sound. They quantify the difference of our auditory sensation and eliminate the influences of individual characteristics. Loudness performance has a great influence on psychoacoustics, but in the subjective evaluation of a sound, it needs to be considered together with the other three parameters.

What is subjective and objective when it comes to the listening experience?

Generally speaking, the signal level is objective, and the loudness is subjective.

What is the subject-object relation while evaluating a sound?

The sound is an objective existence. Our perception of a sound is a subjective evaluation.

How was the loudness measurement technology developed?

Since the late 20th century, computer technology has advanced by leaps and bounds. The widespread practical use of the Fast Fourier Transform (FFT) algorithm, especially in DSP technology, provided a technical basis for the development of loudness measurement. Loudness algorithms such as Leq(RLB) (Revised Low-frequency B-curve weighting), TC LARM and TC HEIMDAL were introduced during this period. The Leq(RLB) algorithm was adopted by the ITU as part of the technical basis of the ITU-R BS.1770 standard.

What does the ITU do?

The International Telecommunication Union (ITU) is a specialized agency of the United Nations that is responsible for issues that concern information and communication technologies. PS: Many people who majored in radio and electroacoustics eventually became first-class sound designers because they know more than anyone about the “wave” concept in physics. LKFS (Loudness K-weighted relative to full scale) is standardized in the ITU-R BS.1770 standard.

What does the EBU do?

The European Broadcasting Union (EBU) is an alliance of public service media organizations, established on 12 February 1950. Some of the early technical indicators used by the China Radio and Television Department made extensive reference to these European standards. LUFS (Loudness units relative to full scale) is a synonym for LKFS that was introduced in the EBU R128 standard. LKFS and LUFS are identical: both are measured on an absolute scale, and a change of one unit in either equals a change of one decibel (dB).

Why did the ITU introduce the ITU-R BS 1770 standard?

1. The relationship between audio level and human perception of the sound intensity is not quite linear.

2. VU, PPM and RMS meters as well as other conventional peak meters do not reflect the subjective loudness.

3. VU, PPM and RMS meters as well as other conventional peak meters cannot accurately measure the actual peak level.

4. A common standard is required to balance the loudness of different TV programmes.

5. The true peak of the reconstructed signal may be higher than the largest sample value, so a more scientific solution is needed to implement global monitoring.

6. In addition to the ITU-R BS 1770 standard, the ITU also introduced the ITU-R BS.1864 standard “Operational practices for loudness in the international exchange of digital television programmes” as a practical supplement to provide further advice on the target loudness of TV programmes.
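
Point 5 above is easy to demonstrate. The sketch below uses a sine wave at a quarter of the sampling rate, with a phase chosen so that no sample lands on a crest. Instead of a real interpolation filter (which is what a true-peak meter would use), it simply evaluates the known waveform four times more densely, so treat it as an illustration of inter-sample peaks rather than a true-peak meter. All names and values are made up for this example.

```python
import math

FS = 48_000           # sampling rate in Hz
FREQ = FS / 4         # test tone at a quarter of the sampling rate
PHASE = math.pi / 4   # phase chosen so that samples miss the crests


def wave(t: float) -> float:
    """The underlying continuous-time signal, peaking at exactly 1.0."""
    return math.sin(2.0 * math.pi * FREQ * t + PHASE)


def peak_db(values) -> float:
    """Largest absolute value, in dB relative to 1.0 (full scale)."""
    return 20.0 * math.log10(max(abs(v) for v in values))


# What a sample-peak meter sees at 48 kHz.
samples = [wave(n / FS) for n in range(64)]
# The same waveform evaluated 4x more densely, standing in for oversampling.
dense = [wave(n / (4 * FS)) for n in range(4 * 64)]

sample_peak_db = peak_db(samples)
dense_peak_db = peak_db(dense)

if __name__ == "__main__":
    print("sample peak:", round(sample_peak_db, 2), "dBFS")
    print("4x-density peak:", round(dense_peak_db, 2), "dB")
```

In this worst-case construction the sample peak under-reads the real peak by about 3 dB, which is the kind of gap that true-peak metering is meant to close.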

Why is the VU meter not suited as an instructive monitor for sound design?

It has been over 80 years since the VU meter was originally developed in 1939. This technology is no longer able to meet our current needs. Because of its integrating time response, the VU meter may produce a reading that is lower than the actual signal peak. The mass of the needle makes the response relatively slow, so it can never reflect the instantaneous peaks of complex audio signals. When using a VU meter to monitor a sound, the signal peak level is usually 6 ~ 12 dB higher than the indicated value.

Why is the RMS meter not suited to be a main monitor for sound design?

RMS meters are more suitable for periodic waveforms whose shape does not change noticeably. The RMS value is just a measure of the signal voltage; it doesn’t reflect our subjective perception of a sound. In short, the RMS feedback is too objective. Human sensitivity to sound varies across frequencies, so two signals may have the same RMS value and still be perceived as differing in loudness.
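
A minimal sketch of that last point: the two test tones below have the same amplitude and therefore the same RMS level, yet a 50 Hz tone and a 3 kHz tone at the same level are not perceived as equally loud. The frequencies, amplitude and function names are arbitrary choices for illustration.

```python
import math

FS = 48_000  # sampling rate in Hz


def sine(freq: float, amplitude: float, seconds: float = 1.0) -> list:
    """Generate a sine wave as a list of floats in the range -1.0 to 1.0."""
    count = int(FS * seconds)
    return [amplitude * math.sin(2.0 * math.pi * freq * n / FS)
            for n in range(count)]


def rms_dbfs(samples) -> float:
    """RMS level in dB relative to a full-scale amplitude of 1.0."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


low_tone = sine(50.0, 0.5)     # low rumble
mid_tone = sine(3000.0, 0.5)   # region where the ear is most sensitive

if __name__ == "__main__":
    print("50 Hz :", round(rms_dbfs(low_tone), 2), "dBFS RMS")
    print("3 kHz :", round(rms_dbfs(mid_tone), 2), "dBFS RMS")
```

An RMS meter shows the same reading for both tones, so it gives no hint that one of them will sound noticeably louder.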

What are the technical parameters introduced in the ITU-R BS 1770 standard?

1. Loudness Unit (LU): Loudness unit is the scale unit of the loudness meter. The value of the programme in loudness unit represents the loss or gain (dB) that is required to bring the programme to 0 LU, e.g. a programme that reads -20 LU will require 20 dB of gain to bring that programme up to a reading of 0 LU.

2. Momentary Loudness: Momentary loudness is defined as the ungated loudness when passed through a first-order IIR (infinite impulse response) low-pass filter with a 400 ms time-constant.

3. Short-term Loudness: Short-term loudness is defined as the ungated loudness when integrated over an interval of 3 seconds (“gated” means that measurement blocks are only included in the calculation when they pass certain loudness thresholds; “ungated” means that every block is included).

4. Programme Loudness: Programme loudness is the integrated loudness over the duration of a programme. It shall be measured with 400 ms gating blocks, a 75% overlap between consecutive gating blocks, an absolute gating threshold of -70 LKFS and a relative gating threshold of -10 LU.

5. Loudness Range (LRA): Loudness range quantifies the variation in a time-varying loudness measurement. LRA is defined as the difference between the estimates of the 10th and the 95th percentiles of the distribution. The lower percentile of 10% can, for example, prevent the fade-out of a music track from dominating Loudness Range. The upper percentile of 95% ensures that a single unusually loud sound, such as a gunshot in a movie, cannot by itself be responsible for a large Loudness Range.

6. True Peak (in dBTP): True peak is the maximum peak level of a signal in the continuous time domain, as opposed to the sample peak level, which only considers the sampled values. Since the signal is only known at the sampling instants, the true peak may occur in the interval between two samples. Normally, in order to cover a possible underestimate of about 0.5 dB, a 1 dB headroom is required on a loudness meter with a 48 kHz sampling rate and 4x oversampling measurement. The GY/T 282-2014 standard “Technical requirements for the average loudness and true-peak audio level of digital television programmes”, which was implemented in China in December 2014, states: The maximum true peak audio level over the duration of a programme shall not exceed -2 dBTP.

7. Integrated Loudness: Integrated loudness is the average loudness from the start point to the end point. It is approximately equal to the programme loudness.
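
The percentile idea behind Loudness Range (item 5) can be sketched as follows. This is deliberately simplified: it takes a list of short-term loudness readings and only applies the 10th and 95th percentiles, leaving out the gating that a complete LRA measurement performs first. The nearest-rank percentile helper and all names are my own choices.

```python
def percentile(sorted_values, pct: float):
    """Nearest-rank style percentile of an already sorted list."""
    index = round((len(sorted_values) - 1) * pct / 100.0)
    return sorted_values[index]


def loudness_range(short_term_values) -> float:
    """Simplified LRA: 95th minus 10th percentile, with no gating applied."""
    ordered = sorted(short_term_values)
    return percentile(ordered, 95) - percentile(ordered, 10)


# Hypothetical short-term readings from -40 to -20 in 1 LU steps.
readings = list(range(-40, -19))
# The same programme with one very loud moment, such as a gunshot.
with_gunshot = readings + [-5]

if __name__ == "__main__":
    print("LRA:", loudness_range(readings), "LU")
    print("LRA with gunshot:", loudness_range(with_gunshot), "LU")
    print("max - min with gunshot:", max(with_gunshot) - min(with_gunshot), "LU")
```

The single loud reading widens the plain max-minus-min spread by 15 LU, but it only moves the percentile-based range by 1 LU.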

How do the K weighting and RLB filters in the ITU-R BS 1770 standard work?

1. K weighting filter: K weighting is applied in two stages. The first stage is a shelving pre-filter that simulates the acoustic effect of our head (scattering around the head and auricle). Its gain rises gradually to about 4 dB across the 1 ~ 3 kHz frequency range, and it remains roughly unchanged above 3 kHz.

2. RLB filter: The second stage, the RLB (Revised Low-frequency B-curve weighting) filter, is in essence a high-pass filter. It is mainly used to reflect the human ear’s lower sensitivity to low-frequency content.
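
For readers who want to see the two filters in action, below is a rough single-channel sketch. The biquad coefficients are the 48 kHz values published in ITU-R BS.1770 as I have them on hand, so please check them against the standard before relying on them. The loudness formula is the ungated, mono form (-0.691 + 10 × log10 of the mean square). Function names are my own.

```python
import math

# 48 kHz biquad coefficients (b0, b1, b2, a1, a2), quoted from ITU-R BS.1770.
# Verify against the standard before production use.
PRE_FILTER = (1.53512485958697, -2.69169618940638, 1.19839281085285,
              -1.69065929318241, 0.73248077421585)
RLB_FILTER = (1.0, -2.0, 1.0, -1.99004745483398, 0.99007225036621)


def biquad(samples, coeffs):
    """Apply one second-order IIR section (direct form I)."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def k_weighted_loudness(samples) -> float:
    """Ungated loudness of a mono signal sampled at 48 kHz, in LKFS."""
    weighted = biquad(biquad(samples, PRE_FILTER), RLB_FILTER)
    mean_square = sum(v * v for v in weighted) / len(weighted)
    return -0.691 + 10.0 * math.log10(mean_square)


FS = 48_000
# One second of a full-scale 997 Hz sine, a common reference tone.
tone = [math.sin(2.0 * math.pi * 997.0 * n / FS) for n in range(FS)]

if __name__ == "__main__":
    print(round(k_weighted_loudness(tone), 2), "LKFS")
```

The full-scale reference tone should read close to -3 LKFS, because the -0.691 offset is there to cancel the small boost that the filters apply around 1 kHz.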

Why set two thresholds in the ITU-R BS 1770 standard while performing the signal analysis?

In order to calculate the effective sound level, a two-stage process is used to make a gated measurement, first with an absolute threshold, then with a relative threshold. This allows us to calculate the main loudness component of the programme from the individually weighted channels. The absolute threshold is set to -70 LKFS (the LKFS unit is equivalent to a decibel in that an increase in the level of a signal by 1 dB will cause the loudness reading to increase by 1 LKFS). It is used to remove content that is below the threshold from the loudness calculation. The relative threshold is set to -10 LU. It is used to remove content that is more than 10 LU below the loudness obtained after the absolute-threshold stage. This allows us to remove silence and background noise from the measurement. Setting thresholds ensures that all content involved in the loudness calculation is valid.
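
The two-stage gating can be sketched like this. The input is assumed to be a list of loudness values, one per 400 ms gating block, already in LKFS. The measurement of the blocks themselves is left out, and the names are hypothetical. Averaging is done in the power domain, not directly on the dB values.

```python
import math

ABSOLUTE_GATE_LKFS = -70.0  # stage 1 threshold
RELATIVE_GATE_LU = -10.0    # stage 2 threshold, relative to stage 1 result


def power_mean(block_loudness) -> float:
    """Average a list of loudness values in the power domain."""
    powers = [10.0 ** (value / 10.0) for value in block_loudness]
    return 10.0 * math.log10(sum(powers) / len(powers))


def gated_loudness(block_loudness) -> float:
    """Integrated loudness from per-block loudness values, with both gates."""
    # Stage 1: drop blocks at or below the absolute threshold.
    stage1 = [v for v in block_loudness if v > ABSOLUTE_GATE_LKFS]
    if not stage1:
        return float("-inf")  # nothing but silence
    # Stage 2: drop blocks more than 10 LU below the stage 1 loudness.
    relative_threshold = power_mean(stage1) + RELATIVE_GATE_LU
    stage2 = [v for v in stage1 if v > relative_threshold]
    return power_mean(stage2)


# Hypothetical programme: dialogue at -23, digital silence, quiet room tone.
blocks = [-23.0] * 8 + [-80.0] * 2 + [-50.0] * 2

if __name__ == "__main__":
    print("ungated:", round(power_mean(blocks), 2), "LKFS")
    print("gated  :", round(gated_loudness(blocks), 2), "LKFS")
```

Without gating, the silent and near-silent blocks drag the result down by almost 2 LU. With gating, the result reflects only the dialogue.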

What other standards do we have?

Loudness Standard	Integrated Loudness (LKFS/LUFS)	Tolerance (+/-LU)	Max True Peak (dBTP)	Standard-setter	Standard Title
ITU-R BS.1770	-24	2	-2	ITU	Algorithms to measure audio programme loudness and true-peak audio level
EBU R128	-23	0.5	-1	Europe (EBU)	Loudness normalization and permitted maximum level of audio signals
ATSC A/85	-24	2	-2	USA	Establishing and maintaining audio loudness for digital television
AGCOM	-24	0.5	-2	Italy
OP-59	-24	1	-2	Australia
ARIB TR-B32	-24	2	-1	Japan	ARIB TR-B32: Operational guidelines for loudness of digital television programs
GY/T 282-2014	-24	2	-2	China	Technical requirements for the average loudness and true-peak audio level of digital television programmes
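
As a small usage sketch, the table can be turned into a lookup for checking a measured programme. Only three rows are copied here, using the values from the table above. The function and dictionary names are hypothetical, and the measured values would come from your own loudness meter.

```python
# Target loudness (LKFS/LUFS), tolerance (LU) and max true peak (dBTP),
# copied from the table above.
STANDARDS = {
    "EBU R128": {"target": -23.0, "tolerance": 0.5, "max_true_peak": -1.0},
    "ATSC A/85": {"target": -24.0, "tolerance": 2.0, "max_true_peak": -2.0},
    "GY/T 282-2014": {"target": -24.0, "tolerance": 2.0, "max_true_peak": -2.0},
}


def check_programme(measured_lufs: float, measured_dbtp: float,
                    standard: str) -> dict:
    """Compare a measured programme with one row of the table."""
    spec = STANDARDS[standard]
    offset = measured_lufs - spec["target"]
    return {
        # Gain in dB that would bring the programme onto the target.
        "gain_to_target_db": -offset,
        "loudness_ok": abs(offset) <= spec["tolerance"],
        "peak_ok": measured_dbtp <= spec["max_true_peak"],
    }


if __name__ == "__main__":
    print(check_programme(-20.0, -1.5, "EBU R128"))
    print(check_programme(-25.0, -1.5, "ATSC A/85"))
```

Keep in mind that applying the suggested gain also shifts the true peak by the same amount, so the peak needs to be checked again afterwards.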

About the loudness standards in China

It’s not fair to say that there are no loudness standards in China. To be precise, in game audio development, we have no specific, proven and enforceable loudness standards. We spend most of our time on development instead of research, and we practice more than we discuss. In actual work, everyone goes their own way. We don’t have many study samples, not to mention quantitative standards. At this stage, it’s reasonable to refer to someone else’s standards while accumulating experience and summarizing lessons.

Research Data

1. According to the research results from CCTV (China Central Television), the average loudness of historical programmes is -27.4 LKFS (TV channels only).

2. Conversion of Units: As the GY/T192-2003 standard defined, 0 dBFS is equivalent to 24 dBu (dBu is used to describe the r.m.s. voltage of a signal relative to 0.775 V (r.m.s.)).

3. Based on CCTV’s measurements on TV programmes in 2013, the maximum true peak levels are averaged at -5.7 dBTP. You may not get enough headroom at -5 dBTP. And, the dynamic range will be affected at -2 dBTP.

4. The maximum loudness offset is defined as the difference between the maximum short-term loudness (over a 3-second interval) and the programme loudness. Its average is 3.5 LU; for example, it is 7.1 LU for snooker games and 1.1 LU for propaganda films.

5. According to CCTV’s research in October 2013, the average loudness of 123 out of 322 programmes (about 38%) from the CCTV 1 channel is -24 LKFS ± 2 LU; the maximum true peak levels of 305 out of 322 programmes (95%) are no more than -2 dBTP.
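
The unit conversion in point 2 works out as follows. The sketch uses the 0.775 V reference and the 0 dBFS = 24 dBu alignment quoted above; the function names are my own.

```python
DBU_REFERENCE_VOLTS = 0.775  # r.m.s. voltage that corresponds to 0 dBu
DBFS_TO_DBU_OFFSET = 24.0    # 0 dBFS = 24 dBu, as quoted from GY/T 192-2003


def dbu_to_volts(dbu: float) -> float:
    """Convert a level in dBu to an r.m.s. voltage."""
    return DBU_REFERENCE_VOLTS * 10.0 ** (dbu / 20.0)


def dbfs_to_dbu(dbfs: float) -> float:
    """Map a digital level to the analog level under this alignment."""
    return dbfs + DBFS_TO_DBU_OFFSET


if __name__ == "__main__":
    for level in (0.0, -20.0):
        dbu = dbfs_to_dbu(level)
        volts = round(dbu_to_volts(dbu), 2)
        print(level, "dBFS ->", dbu, "dBu ->", volts, "V r.m.s.")
```

Under this alignment, digital full scale corresponds to roughly 12.3 V r.m.s., and -20 dBFS lands on +4 dBu.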

Some thoughts on sound processing under current loudness standards

1. Monitoring at a fixed volume is the basis for ensuring consistent loudness. Once you are accustomed to listening at a certain volume, you barely need the loudness meter to judge whether a sound is too loud or too quiet. Therefore, it’s better to calibrate the monitoring volume before your sound design session rather than during it.

2. There are a lot of short sounds in a game, such as UI sounds or prompts. Their peaks are short and their dynamic fluctuations are obvious. For these audio files, we need to pay attention to their instantaneous loudness. Subjective loudness refers to the ear’s overall perception of a sound signal within a certain time range, which should be at least 200 ms. The peak level at a certain point in time, even at a single sample, does not reflect the loudness over that time range. A signal with a higher peak level is not necessarily perceived as louder. In contrast, a less dynamic signal with small fluctuations in peak amplitude can be perceived as louder. In other words, loudness reflects the average level of a signal over a period of time rather than the peak level. In my experience, you should pay special attention to the instantaneous loudness of sounds with a duration of 200 ms to 3 s.

Relatively quieter, yet more dynamic

Relatively less dynamic, yet louder

3. When you create a very complex project, you will find that the integrated loudness can be affected by peak levels that jump instantly to high points, that the loudness increase of a signal may be limited by the maximum peak level the audio system can accept, and that more dynamic audio signals affect the overall average level because of their higher transient peaks. For such issues, we can combine decentralized, per-sound controls with global restrictions. As shown below, we can use automation to control the onset of some signals so that the jump comes more slowly and goes more quickly. At the global level, we can add compressor/limiter effects to the processing chain. This allows us to lower transient overloads that are not well controlled, compress the dynamics, increase the loudness and improve the integrity of the mix. When selecting a compressor/limiter, we can use effects that match the loudness mode for sound processing, as shown in Figure 2 below.

Controlling the waveform jump so that it does not hold at peak level for too long.

4. The frequency processing should be done as early as possible, and the frequency balancing should be implemented prior to the bus processing. Applying multi-band compression on a bus may help you adjust the frequency, but as a whole, it will have the overall frequency ratio reconstructed, resulting in the spectrum content specified at the early stage being changed dramatically. You may want to avoid this.
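
To illustrate point 2 above (peak level versus perceived level), here is a small sketch that compares a short decaying burst with a steady tone over 200 ms. The burst has the higher peak but the lower average level. The signals, the decay time and the function names are invented for this example, and RMS is used only as a rough stand-in for average level, not as a loudness measurement.

```python
import math

FS = 48_000
LENGTH = int(0.2 * FS)  # 200 ms, the minimum time range mentioned above


def peak_dbfs(samples) -> float:
    """Sample peak in dB relative to full scale (1.0)."""
    return 20.0 * math.log10(max(abs(s) for s in samples))


def rms_dbfs(samples) -> float:
    """RMS level in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


# "More dynamic": a 1 kHz burst that starts at full scale and decays fast.
burst = [math.exp(-(n / FS) / 0.01) * math.sin(2.0 * math.pi * 1000.0 * n / FS)
         for n in range(LENGTH)]
# "Less dynamic": a steady 1 kHz tone at half of full scale.
steady = [0.5 * math.sin(2.0 * math.pi * 1000.0 * n / FS)
          for n in range(LENGTH)]

if __name__ == "__main__":
    for name, signal in (("burst", burst), ("steady", steady)):
        print(name, "peak", round(peak_dbfs(signal), 1), "dBFS,",
              "RMS", round(rms_dbfs(signal), 1), "dBFS")
```

A peak meter would rank the burst as the hotter signal, while the average level, which is closer to what we hear, ranks the steady tone higher.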

Sorry for all the terminology in this post; I just don’t want you guys to miss out on anything. Finally, I’d like to thank the Audiokinetic team in China for revising my blog. Have a good day!

Comments

Juan Baez

December 05, 2019 at 03:44 pm

Great post! Thanks for revealing the math and terminology behind loudness. The CALM standards are part of my daily work, but it's nice to know what's actually behind these practices.

Jeff Hilman

December 08, 2019 at 05:04 pm

Just wanted to say this was a very helpful, informative, and interesting article! Thank you Yu Zhang for sharing your knowledge.

Several Things About Loudness

Sound Design

Yu Zhang (张禹) | December 03, 2019