The story we will tell today is a result of serendipity, otherwise known as the art of making discoveries by accident. We did not end up where we intended to, and so the evolution of our research project may seem convoluted, but stick with me. This kind of unexpected discovery is not uncommon in historical research in general. The Conker’s Bad Fur Day research project is the perfect example. (If you don't know who Conker is, click here for more.) What started as a simple question that I asked myself while writing a Twitter thread about the Nintendo 64 sound system became an avalanche of further questions and leads, and eventually led to a narrowly avoided historical catastrophe. Well, “catastrophe” is an overly dramatic word choice, but maybe you’ll agree with me by the time you’ve finished this article.
Crawling into technical exceptions
The N64 sound system is extremely complicated. The hardware was conceived without any dedicated sound card, chip, or processor. Each developer had to improvise and build everything from scratch to enable music, sound effects and eventually dubbing to be played using the RCP and RSP processors. Back then, there were numerous technical solutions for maintaining the fragile equilibrium of the N64 system between sound, graphics, AI, data storage, and so on. This fragile equilibrium created a complex and interesting N64 sound history. It is also not well known, and reverse engineering is extremely tricky on this hardware. So when I realized that the developer Rare Limited was probably the first studio to use the MP3 lossy audio compression format, in games like Conker and Perfect Dark, I couldn’t help but ask “How?”. MP3 was a new format in the late 90s and not yet as popular as it became in the 2000s. Integrating the file format into games must have caused trouble at some point, or could conceal interesting pieces of the puzzle in the history of sound in games. While searching for information, I learned that only the voiceovers were using MP3s, which is logical because the compression rate would render music very hard to listen to.
I found something intriguing in an old post on a tech enthusiast forum where someone had extracted all the MP3 files from Conker. When listening to the audio clips, you can hear every voice sounding just as it did in the game, except for one: Conker. The main character’s voice seemed sped-up and contained a lot of artifacts. Many examples had been uploaded, which made it easy to grab a few files and confirm that this was true in every one of them. We did not know which tools had been used to extract the files, but the hypothesis that the strange artifacts were caused by a bad extraction rapidly lost its footing because this “bug” was just too specific. Even in files where Conker is accompanied by other characters’ voices, the nasty squirrel’s voice was the only one affected. My non-existent technical capabilities couldn’t come up with any reason for this, so I asked my friend @Percight to help. He became equally intrigued by this case and decided to join in by running some tests.
The tests @Percight conducted left us as puzzled as we had been at the very beginning, but we started to collect some interesting pieces of information. Our main obstacle to understanding the strange audio phenomenon was having no information about the decompilation, extraction, and decompression tools used to obtain the files. [Archivist Note: knowing which tools were used is very important, especially when they are not official tools. We can guess which tools were used to extract a file, but even then we would not know which version was used, which prevents us from ruling out version-related issues.] We noticed that Conker’s voice sounded different depending on the tools we used to read the MP3. I converted the MP3 file to WAV to run a spectrogram analysis in the Acousmographe tool. The MP3 was full of glitches and static, making Conker impossible to understand in Windows’s default player, but the WAV version sounded sped-up. It was at this moment that we considered a crazy idea that had been mentioned in the original forum post. What if Conker’s voice had intentionally been accelerated to save storage space? As the chattiest character in the game, his dialogue would likely take up the most data. What if this was a sneaky solution to keep sound from eating up space on the N64 cartridge (which, at 64MB, was the biggest ever made at the time)?
This leaves the question of why all the other characters’ voices were not sped-up to save space as well. With any hypothesis, every assumption counts.
Take a listen here, notice how Conker sounds. CW: Language.
Digging into historical leads
Our technical analysis led us to the craziest consideration. Who would speed up sound to save space in a game? Without a way to confirm our theory, we agreed to try digging a bit more on the historical side. The advantage of video game history is that a lot of the people who created the games are still with us. We can speak directly to the creators about their work and document their knowledge and expertise. It may seem obvious, especially with so many great conferences in the gaming industry, but there is still the need to ask these specific questions before it is too late, as memory fades easily. We were lucky to get quick and enlightening answers from Conker’s composer, Robin Beanland, through Twitter. MP3 was the format used exclusively for the voiceovers, at low bitrates varying between 24 and 40kb/s (the higher quality being used in songs), which would not allow for more without sacrificing intelligibility. He did not agree with our “space-saving” hypothesis, as he had never heard of such a thing. What we did learn was that Conker was the very first game in which Rare Limited used MP3. The code was given to the Perfect Dark dev team before the release so they could use it as well. This is already an interesting insight into how things were handled internally back then.
We were also able to get more details about sound, such as the fact that at that time it was not possible to decompress and run MP3 along with ADPCM during gameplay sequences. This led to a format change whenever music was important in the game, as in the “Great Mighty Poo” song. Cinematic scenes use MP3s of Chris Marlow’s voice (the voice actor for the Great Mighty Poo), but the vocalizations during gameplay sequences use ADPCM. You need to know when he is singing to throw some toilet paper down his throat! As for our very oddly sped-up voice, Beanland had no insight beyond the confirmation that speeding it up was never considered as a way to save memory. With this new information, @Percight decided to push his work a little further. He ran conversions with Foobar2000 and analyses with Checkmate MP3 Checker, which always led to error reports. The constant “unidentified bytes” and “invalid header values” proved one thing: this was some kind of custom MP3, and the headers used when Conker speaks were peculiar. We could have stopped this research adventure here, but a few weeks later I spoke about this project at a monthly Interactive Audio Montreal (IAM) meeting, where I shared a presentation with Plogue’s founder David Viens, which sparked a new flame.
This story is a perfect example of the questions that historians and archivists are frequently led to ask. The reason I shared this project at the IAM meeting was to highlight how big mistakes can be made by trying to understand a game's sound out of its context and without the sound team’s point of view. After I concluded my presentation, I was approached by a few sound professionals with the most interesting feedback. First, speeding up voiceover to save memory space was an actual practice not so long ago. Second, a music composer is not necessarily aware of what happens to their work, and to the sound in general, after it is handed off. A lot of technical handling takes place without them knowing how the implementation specialists deal with it, meaning they do not have the answer to every mystery. This feedback came by chance, again leaving me with more unanswered questions than ever. As happenstance would have it, I was asked not long afterwards to write a piece on the N64 sound system for the French tech magazine Canard PC Hardware. This was the perfect opportunity to put a definitive end to this story.
The Technical Corner: Understanding “MyButt.mp3”
For lovers of technical detail, here is what @Percight found when he analyzed the MP3 file named after a very colorful extract from the “Great Mighty Poo” dialogue, where the rapid exchange between the antagonist and the squirrel helps make the speed issue more obvious. Some frames corresponding to Conker’s voice use a specific header. While “normal” voices use the header 0xFF 0xF3 0x50 0xC0, the last byte of the squirrel’s voice header is different: 0xFF 0xF3 0x50 0xC8. The low four bits of that byte change from 0000 to 1000, which sets the bit that the MP3 frame header allocates to the “Copyright” flag. It seems unlikely that the bit is used for that purpose here. As we will see below, the “1” indicates that data has been added between the frame’s end and the beginning of the next header, but that data does not necessarily carry copyright content.
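To make the difference between the two headers concrete, here is a minimal Python sketch that decodes them according to the standard MPEG audio frame header layout. It is not the tool @Percight used; the function and class names are hypothetical, and the sketch only handles MPEG-2 Layer III, which is what the bytes 0xFF 0xF3 describe.

```python
from dataclasses import dataclass

# Lookup tables for MPEG-2 Layer III, following the standard frame header layout.
MPEG2_L3_BITRATES_KBPS = [0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160]
MPEG2_SAMPLE_RATES_HZ = [22050, 24000, 16000]


@dataclass
class FrameHeader:
    bitrate_kbps: int
    sample_rate_hz: int
    mono: bool
    copyright_bit: bool
    frame_length: int  # nominal frame size in bytes, header included


def parse_header(raw: bytes) -> FrameHeader:
    """Decode a 4-byte MPEG-2 Layer III frame header (hypothetical helper)."""
    if len(raw) < 4 or raw[0] != 0xFF or (raw[1] & 0xE0) != 0xE0:
        raise ValueError("no frame sync")
    version = (raw[1] >> 3) & 0x03  # 0b10 means MPEG-2
    layer = (raw[1] >> 1) & 0x03    # 0b01 means Layer III
    if version != 0b10 or layer != 0b01:
        raise ValueError("this sketch only handles MPEG-2 Layer III")
    bitrate_index = raw[2] >> 4
    rate_index = (raw[2] >> 2) & 0x03
    if bitrate_index in (0, 15) or rate_index == 3:
        raise ValueError("free-format or reserved header values")
    bitrate = MPEG2_L3_BITRATES_KBPS[bitrate_index]
    sample_rate = MPEG2_SAMPLE_RATES_HZ[rate_index]
    padding = (raw[2] >> 1) & 0x01
    return FrameHeader(
        bitrate_kbps=bitrate,
        sample_rate_hz=sample_rate,
        mono=(raw[3] >> 6) == 0b11,
        copyright_bit=bool(raw[3] & 0x08),  # bit 3 of the last header byte
        frame_length=72 * bitrate * 1000 // sample_rate + padding,
    )


normal = parse_header(bytes([0xFF, 0xF3, 0x50, 0xC0]))
conker = parse_header(bytes([0xFF, 0xF3, 0x50, 0xC8]))
print(normal.copyright_bit, conker.copyright_bit)  # prints: False True
```

Under this reading, both headers describe the same kind of frame (mono, 40 kb/s, 22,050 Hz, a nominal 130 bytes), and the copyright flag is the only field that differs.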
Strangely, nothing specific appears on the spectrogram, except that the actors may have used different microphones.
Another odd thing caught his attention: not every frame associated with Conker’s voice uses this same header. It is different in other files, which called for further experimentation. @Percight tried deleting the frames that used the minority header to make the file more stable, but the sound was just awful. The result was similar to what can be heard when audio players choose to skip the faulty frames instead of speeding up playback. We found that deleting these frames removed audio data, cutting out parts of the sound and turning Conker’s words into gibberish. @Percight then tried changing every “0xC8” header into “0xC0” to make the file look like a normal constant-bitrate MP3, which, as he suspected, didn’t change anything in the end. It was still worth trying.
There was one last intrigue to explore: the rebel frames made the file heavier. The frames carrying Conker’s header were 9 bytes longer than the ones using the other header. He tried removing the group of 9 bytes from each one of them and, miraculously, the file now played back perfectly. The space-saving hypothesis was finally completely debunked!
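The repair can be sketched in a few lines of Python. This is a guess at the procedure rather than @Percight's actual method: it assumes every frame is MPEG-2 Layer III, that the 9 extra bytes sit right after the nominal end of each frame whose copyright bit is set, and that the stream starts on a frame header. The names `frame_length` and `strip_extra_blocks` are hypothetical.

```python
# Lookup tables for MPEG-2 Layer III, following the standard frame header layout.
MPEG2_L3_BITRATES_KBPS = [0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160]
MPEG2_SAMPLE_RATES_HZ = [22050, 24000, 16000]
EXTRA_LEN = 9  # size of the extra block reported in the article


def frame_length(header: bytes) -> int:
    """Nominal size in bytes of an MPEG-2 Layer III frame, header included."""
    bitrate_index = header[2] >> 4
    rate_index = (header[2] >> 2) & 0x03
    if bitrate_index in (0, 15) or rate_index == 3:
        raise ValueError("free-format or reserved header values")
    bitrate = MPEG2_L3_BITRATES_KBPS[bitrate_index] * 1000
    padding = (header[2] >> 1) & 0x01
    return 72 * bitrate // MPEG2_SAMPLE_RATES_HZ[rate_index] + padding


def strip_extra_blocks(data: bytes) -> bytes:
    """Copy every frame, skipping the extra bytes that follow flagged frames."""
    out = bytearray()
    pos = 0
    while pos + 4 <= len(data):
        header = data[pos:pos + 4]
        # 0xFF 0xF3 = frame sync, MPEG-2, Layer III, no CRC (as in the article).
        if header[0] != 0xFF or header[1] != 0xF3:
            raise ValueError(f"lost frame sync at offset {pos}")
        length = frame_length(header)
        out += data[pos:pos + length]
        pos += length
        if header[3] & 0x08:  # copyright bit set: assume extra data follows
            pos += EXTRA_LEN
    return bytes(out)
```

A call such as `strip_extra_blocks(open("MyButt.mp3", "rb").read())` would return a stream that a standard decoder can walk frame by frame. The sketch leaves the copyright bit itself untouched, since changing 0xC8 to 0xC0 made no audible difference in @Percight's test.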
Solving the case: Audio Software Engineer to the rescue
Now that @Percight had finally found what had altered Conker’s voice, all that was left to answer was the purpose of these additional bytes. They were clearly not there without reason, given their very specific pattern (0x4C 0x3A 0x01 0xXX 0x80 0x80/0x81 0xXX 0xXX 0x00). Since audio decoders were not able to read them properly, we started to think that they might be unrelated to sound. This final question could only be answered by someone who had been on the project, because there could have been many reasons. As the deadline for this research article was approaching, we managed to get in touch with Mike Currington, who was Audio Software Engineer on the project at the time. He very kindly answered all of our questions and provided the missing piece of the puzzle.
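The pattern quoted above can be written as a byte-level regular expression, which gives a quick way to locate candidate blocks in an extracted file. The sketch below is only an illustration: the name `find_extra_blocks` is hypothetical, each 0xXX is treated as "any byte", and the meaning of the individual bytes is not known from the file alone.

```python
import re

# 0x4C 0x3A 0x01 0xXX 0x80 0x80/0x81 0xXX 0xXX 0x00, with 0xXX matching any byte.
# DOTALL lets "." match every byte value, including 0x0A (newline).
EXTRA_BLOCK = re.compile(rb"\x4C\x3A\x01.\x80[\x80\x81]..\x00", re.DOTALL)


def find_extra_blocks(data: bytes) -> list[int]:
    """Return the offsets of every 9-byte sequence matching the pattern."""
    return [match.start() for match in EXTRA_BLOCK.finditer(data)]


sample = bytes(5) + bytes([0x4C, 0x3A, 0x01, 0x2A, 0x80, 0x81, 0x0A, 0xFF, 0x00])
print(find_extra_blocks(sample))  # prints: [5]
```

Because compressed audio can contain the same byte sequence by coincidence, a match is only a candidate. Checking that it sits right at the end of a frame whose header ends in 0xC8 would be a sensible cross-check.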
Conker’s character, aside from being an indecent talking squirrel, has a hidden feature that we had overlooked, even though we had guessed that the extra bytes might be unrelated to audio. That turned out to be exactly the case. All the other characters in the game just flap their mouths randomly during their dialogue, but our cute hero benefited from preferential treatment in the animation process. He was animated with a lot of different facial and mouth expressions, which required various facial blend shapes that the artists had to synchronize with the dialogue. Those extra bytes in the MP3 file are the remnants of the tool Michael Currington created for them to simplify lip-syncing during cinematic scenes. After months of research, we finally found the answer we were looking for. The world, and my article, were finally saved.
This story was originally published in French, and through another Twitter thread we obtained a little more information thanks to Rare’s team. Conker’s cartoon-like voice is pitched by two semitones, but the whole process and rendering were done before the implementation. Robin Beanland made sure to keep the same voice duration, so it became 100% obvious that there was no chance the anomaly had anything to do with saving storage space.
Chris Seavor recently started to share sketches and level design scans on his Twitter account: A great way to see how a game can change during development (Source: https://twitter.com/conkerhimself/status/1359160911769010183 ) (used with the kind permission of Chris Seavor)
The wise conclusion of our adventure
It is thanks to anonymous, very passionate people creating tools and sharing their results openly that we were able to dig up this curious tale about the strange sound of Conker’s audio files. Without the analysis from @Percight and the original game sound team’s answers, all of this could have become a big misunderstanding, and I might have shared an inaccurate history of Conker’s voice and sound. It may not seem like a big deal, but presenting the MP3 anomaly as a data-storage tactic would have been a double historical mistake. The first mistake is that it is false information. The second is that there was a much more complex and interesting answer that should not be forgotten. Thanks to endless discussions with the right people, we shed light on a very interesting story about how games worked in the 1990s.
In the end, we learned that the easiest answer is not always the right one, and that obvious pieces of information may be forgotten right when we need them. We also learned that a sound file sometimes contains more than just sound. For an archivist, this raises an interesting question about what should be done with these lip-sync bytes. They are altering our sound archive. Removing them from the sound files, even if it restores the sound as it was heard in the game, will likely tear off parts of this voiceover history. The MP3 file is not sufficient on its own to tell the full story, and it can’t be listened to without context, making this problem pretty much unsolvable.
Technical tips and processes are still not documented and shared by programmers and sound integrators today. There are some obvious and important reasons for keeping these practices private, starting with work ethics and respect for NDAs. Once a game is released and we move on to the next generation of games, hardware, and technologies, another wave of oblivion will happen: some tools and tips will become obsolete. The tools and practices that are not shared are in danger of being lost and forgotten without proper archiving. As we have seen, the data extracted from games never provides enough information to get a complete understanding of how they work and why. Reverse engineering is not a perfect science, consisting mostly of filling information gaps. This leaves game audio history with unanswered technical questions and without the tools to paint the full picture. Additionally, not all games draw the attention of amateur game tech historians and sound explorers equally, and some great things may be lost forever because we aren’t looking in the right place or at the right time. If serendipity is what allows such interesting discoveries to be made, like the one in this article, then imagine how many we may miss as time goes on.
This whole journey was not for nothing. The “false” theory of space-saving holds some historical truth. Through our research, we found indications that a few games did use sped-up audio files to optimize storage. Some sources say that this technique was used up until the PlayStation 3 era. We do not, however, have confirmed leads or examples of this method. Players can be found asking questions about sped-up sounding voiceovers in Rainbow Six 3, with answers claiming that this is (or was) a common practice in Ubisoft games. There are also older N64 games such as Super Mario and Mario Kart 64 that may have used this technique. As we learned today, we can never be sure until we have a straight answer. Perhaps someone is reading this article right now who has first-hand experience with these technical practices! If you would like to contribute to the continued documentation of game audio history, you can contact me with your story so your work is not forgotten as technology continues to evolve.