Empowering audio creators also means transferring some game resource responsibilities into their hands. As an editor and SDK, Wwise lets you create a lean and efficient audio environment that can respect the smallest CPU budgets. On the other hand, because it provides a panoply of artistic features and creation methods, it also makes it easy to hog the CPU. Therefore, to free up processing that could be used elsewhere by the game, and to maintain a smooth gaming experience, it's important to ensure that Wwise is used efficiently. This article is intended to help both programmers and sound designers find solutions, within the authoring tool or the Wwise SDK, to win back precious cycles.
There are a few key elements that can be verified to determine whether Wwise is being used optimally. This article focuses on the key values that can be retrieved from the Performance Monitor view while using the Wwise profiler.
The first step is to get familiar with Wwise profiling, especially the Performance Monitor. "Profiling, Troubleshooting and Debugging using Wwise" is a great article on this subject.
Additionally, calling AK::SoundEngine::StartProfilerCapture() directly within the game is a good way to create profiling sessions automatically.
When using the Performance Monitor, you can customize the view settings to graphically display the values needed. This will become useful for finding where peaks are occurring, and for making correlations with the Audio Thread CPU% graph.
In the example above, we can see that the Audio Thread CPU peaks seem to correspond with the Number of Transitions/Interpolations as well as Number of Voices (Total).
Audio thread CPU
The Audio thread CPU is the main reference for making decisions about CPU usage. It should be seen as how much time Wwise takes to render its final audio frame to be sent to the hardware. The size of an audio frame is determined at the initialization of the sound engine through AkInitSettings::uNumSamplesPerFrame. On Windows platforms, the default is 1,024 samples. Since the native sampling rate is 48,000 Hz, Wwise needs to submit an audio frame every 21.333 ms (this is 1,024 / 48,000). If the audio thread takes 21.333 ms to create one frame, it will be displayed as 100% CPU in the Performance Monitor. A good practice is to aim for an Audio thread CPU usage below 50%, which means that it takes less than 10.667 ms to fill a frame.
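The frame budget arithmetic above can be sketched in a few lines of C++. This is only an illustration of the calculation: the function names are hypothetical and are not part of the Wwise SDK, and the render time would have to come from your own measurements or from the profiler.

```cpp
#include <cassert>

// Duration of one audio frame in milliseconds:
// 1,024 samples at 48,000 Hz gives about 21.333 ms.
constexpr double FrameDurationMs(unsigned samplesPerFrame, unsigned sampleRateHz)
{
    return 1000.0 * samplesPerFrame / sampleRateHz;
}

// Audio thread load expressed as a percentage of the frame budget,
// which is how the text above describes the Performance Monitor value.
// Hypothetical helper, not an SDK function.
double AudioThreadCpuPercent(double renderTimeMs,
                             unsigned samplesPerFrame,
                             unsigned sampleRateHz)
{
    assert(samplesPerFrame > 0 && sampleRateHz > 0);
    return 100.0 * renderTimeMs / FrameDurationMs(samplesPerFrame, sampleRateHz);
}
```

With the Windows defaults, a render time of half the frame duration corresponds to the 50% target mentioned above. Changing the number of samples per frame changes the budget proportionally.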
Missing an audio frame (or being above 100% usage) might not be a problem for a single audio frame because Wwise buffers more than one frame in advance as a safety net. This is called the number of refills, and it defaults to 4 in AkPlatformInitSettings::uNumRefillsInVoice. When all refills are depleted, and the audio thread does not have enough time to render another one within the current frame, a Voice Starvation error occurs. This can lead to audible clicks if it happens over multiple frames. Note that it is possible to get a Voice Starvation error displayed in the Capture Log while the displayed CPU value is below 100%. This is because the Performance Monitor refresh rate is 200 ms, so it can miss quick CPU spikes.
There are three main reasons for Voice Starvation:
1) The audio thread (also named EventManager thread or LowerEngine thread) is being preempted by another thread from the game. In that case, the percentage displayed will be high even if Wwise itself is not processing much. The audio thread has Above Normal priority by default, and it is important that it keeps a higher priority than the game's other threads. It is also good practice to keep it on the same CPU core by forcing an affinity in AkPlatformInitSettings::threadLEngine on platforms that support it.
2) The game is processing too much within a Wwise callback function. Locks in callbacks would also make the audio thread wait and thus be displayed as a higher CPU usage.
3) Wwise itself is processing too much. This is where carefully monitoring the different performance monitor values becomes important.
Number of Voices (Physical)
The number of physical voices usually has the biggest impact on CPU resources. It should be the first value to look at. What is a good number to target for physical voices? This is largely a design question about how the mix should sound. Keeping in mind that voices do not all cost the same amount of CPU, the average numbers usually seen across all types of games are between 30 and 70 physical voices. The number of channels, the built-in properties applied (such as LPF, HPF, and pitch), and the conversion settings used will all impact CPU usage per voice. For example, decoding a Vorbis file uses more CPU than decoding a PCM file, so carefully selecting conversion settings is very important.
Number of Voices (Virtual)
Although virtual voices are actually used to save I/O, memory, and CPU cycles, keeping that number below 500 is a good sign that the game is properly managing its active game objects and its "Number of Active Events", which is also a value to keep an eye on. The default choice for the Virtual behavior should be “Kill if finite, else virtual”. This is the easiest and most efficient virtual voice option. It takes care of killing inaudible voices that are not looping, while letting the looping ones continue virtually.
Total Plug-in CPU
To add the Total Plug-in CPU value in the Performance Monitor as well as the Plug-ins tab in the Advanced Profiler, the Plug-in Data option should be enabled in the profiler settings (Alt+G). This view displays the number of instances currently active, and how much CPU they are using. It is this number of active plug-in instances that will determine the Total Plug-in CPU load. Plug-ins inserted within the Actor-Mixer or Interactive Music Hierarchy will create a single instance for each sound playing. Since playing many of these sounds at once can quickly raise the CPU usage, the render checkbox to "bake" the Effect in the media WEM file itself should be considered. Plug-ins applied on busses will only create a single instance per bus. Thus, it is the number of busses active with a plug-in that will make a difference on the CPU rather than the actual number of sounds playing.
Another factor that affects plug-in CPU cost is the number of channels processed. For example, three instances of an Effect active on mono sounds (three channels) inside the Actor-Mixer Hierarchy will be cheaper than a single instance of that Effect active on a 7.1 bus (eight channels). As such, it can be more beneficial to insert an Effect within the Actor-Mixer Hierarchy when we know that the total number of channels accounted for by all instances will be lower than the channel count of a bus. For all Effect plug-ins, the CPU cost increases linearly with the number of channels to process. The only exceptions are Reverbs: their performance cost per channel is closer to a flat line, and in most cases they are best used on Auxiliary Busses.
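The channel-count comparison can be written as a quick back-of-the-envelope check. The sketch below assumes the linear per-channel cost described above (so it does not apply to Reverbs), and the helper names are hypothetical rather than SDK functions.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Total channels processed by a set of Effect instances,
// given one channel count per instance.
unsigned TotalEffectChannels(const std::vector<unsigned>& channelsPerInstance)
{
    return std::accumulate(channelsPerInstance.begin(),
                           channelsPerInstance.end(), 0u);
}

// True when inserting the Effect on each sound processes fewer channels
// than a single instance on the bus. Assumes cost scales linearly with
// channel count, which the text says does not hold for Reverbs.
bool PerSoundInsertIsCheaper(const std::vector<unsigned>& soundChannels,
                             unsigned busChannels)
{
    assert(busChannels > 0);
    return TotalEffectChannels(soundChannels) < busChannels;
}
```

For the example in the text, three mono sounds total three channels against eight for a 7.1 bus, so the per-sound insert wins. The balance flips as soon as enough sounds play at once, for instance five stereo sounds (ten channels) against the same bus.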
Number of Registered Game Objects
Some game objects might be inactive for a long period. It is therefore good practice to cull and unregister them. This avoids going through a long list of objects to update their positions or any other parameter, whether in the game or within the sound engine itself. Again, what is a good number of registered game objects? Some open world games are able to keep this below 80, while other developers decide to permanently keep alive all objects within a certain map or level, reaching numbers above 1,000 while maintaining good performance. Nonetheless, the chances of hitting performance issues are higher when handling a large list of game objects. Moreover, it is easier to pinpoint issues within the Profiling layout when a game has a smaller list of registered game objects.
How often are positions, occlusion values, or game sync values updated in the game? Are those calls being spread within a few game frames or is there a burst of SDK calls within a single frame? Do all game objects need their RTPC values updated with each frame? By enabling the API Calls option in the Profiler Settings (Alt+G), you’ll be able to verify if too many or unnecessary API calls are being made to Wwise at each frame.
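One way to avoid a burst of SDK calls in a single frame is to spread updates across several game frames. The following is a minimal sketch of a round-robin scheduler, assuming the game keeps its own indexable list of objects. The class is hypothetical and not part of the Wwise SDK; the caller would make the actual update call (for example AK::SoundEngine::SetRTPCValue) for each index returned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hands out at most maxPerFrame object indices per game frame,
// cycling through all objects in order (hypothetical helper).
class RoundRobinUpdater
{
public:
    RoundRobinUpdater(std::size_t objectCount, std::size_t maxPerFrame)
        : m_count(objectCount), m_maxPerFrame(maxPerFrame)
    {
        assert(maxPerFrame > 0);
    }

    // Indices of the objects that should be updated during this frame.
    std::vector<std::size_t> NextBatch()
    {
        std::vector<std::size_t> batch;
        const std::size_t n = std::min(m_maxPerFrame, m_count);
        for (std::size_t i = 0; i < n; ++i)
        {
            batch.push_back(m_next);
            m_next = (m_next + 1) % m_count; // wrap around to the first object
        }
        return batch;
    }

private:
    std::size_t m_count;
    std::size_t m_maxPerFrame;
    std::size_t m_next = 0;
};
```

The trade-off is latency: with 100 objects and a budget of 10 per frame, each object is refreshed once every 10 frames, which may be acceptable for slowly changing parameters but not for everything. The API Calls view in the profiler remains the way to confirm that the call rate has actually dropped.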
Number of Transition/RTPC Interpolation
For the Performance Monitor, a transition (not to be confused with music transitions) is the increase or decrease of a property following a specific rate/curve. The most common transition is on the Volume property during fades. Basically, any property controllable by an RTPC could create a transition over time, and each of these transitions will require a small amount of processing by the sound engine. Consider this value a potential culprit for CPU spikes when it is above 500. An example where it becomes easy to create transitions is when using Game Parameter interpolation combined with built-in parameters such as Distance; it can spawn a transition each time a game object or a listener moves.
Lower end platforms and mobile
Leveraging the sub-platform system of Wwise (see Managing Platforms), combined with the linking and exclusion features, becomes important when handling less powerful processors. For such a sub-platform, it may be necessary to apply all of the recommendations above with tighter budgets. For lower end mobile devices, for example, you might want to aim for 30 physical voices instead of 70, with fewer plug-in instances active. PCM sounds, with their high quality and low decoding cost, are also being used more often, especially for short or frequently repeated sounds.
In conclusion, while there may be other factors within Wwise that can have an impact on CPU, the values presented above should give you the biggest payback with regard to CPU optimization.