- What the original concept and audio requirements were
- How we handled the dialogue recording and editing
- The systems design and implementation using Wwise and Unity
If you haven’t played or seen it yet, Murderous Pursuits is a kill-or-be-killed Victorian stealth-em-up for 1-8 players in which you must hunt and kill your quarry before your hunters do the same to you, all while avoiding witnesses. You can buy it on Steam right now!
Firstly, what are Vignettes? As part of Murderous Pursuits’ stealthy gameplay you can use various spots around the level to blend into the environment or join crowds to either hide from hunters or stalk your quarry, and strike without warning. It looks like this in action:
We use Vignettes to bring more life to the world as well as to serve a gameplay purpose. When people group together we want them to strike up conversations, make the levels feel (and sound!) busier, and pepper in a little world and character building to boot. The characters also have different talking animations based around three states (positive, negative and neutral), plus incidental reactions such as nodding or disagreeing, to stop things looking and sounding too samey. Basically, we wanted to emulate actual group conversation.
You’re probably reading that and thinking “That sounds like a lot of VO needed there” and you’re not wrong, especially if we were looking to avoid the dreaded looping dialogue lines!
As a compromise, and a friendlier solution for both budget and time, our Creative Director, Kitkat, floated using Simlish dotted with the occasional word or phrase to make it feel like the characters were discussing something instead. It could also play into the game’s comedic edge, with them going off on random tangents about misadventures they may have had. A bit like the Drunk Guy sketch from the Fast Show:
This would reduce the writing required for conversation vignettes down to about 200 words/phrases per character as opposed to… a lot more, and pushed us towards a more technical solution: setting up a playback system that can string random lines of Simlish together while occasionally throwing a word in. That is, of course, after we worked out how we wanted the Simlish to sound.
Simlish, for those who are unaware, is the language used in The Sims series of games. It was designed to be as universal as possible to provide a degree of context to what a player’s Sims were saying, while also acting as a practical solution to VO repetition and localisation. Currently it has an alphabet, a rough but expansive phrase book, and even real world pop songs translated into the language. You can read a bit more here.
Planet Coaster also has a similar solution with Planco, their own in-world language. However, that was designed to be a working, functional language and even has its own official dictionary. You can read more about the design process of it here.
Unfortunately, we had neither the time nor the budget to do something quite as detailed, so we opted for a slimmer solution: having our actors voice different lengths of Simlish in each of the 3 tones. We also grouped these requirements into 4 lengths: Short, Medium, Long and Questions. Even within these sets, we have some variation in the length of the phrases for each character to help establish their quirks.
For example: The Brute’s short phrases are around 1 to 2 seconds, and his longer ones around the 8 second mark. He’s a pretty to-the-point kinda guy.
The Admiral, on the other hand, has short phrases around the 3 to 5 second mark, and his longer ones reach up to and beyond 20 seconds in length, reflecting his more blowhard, longwinded nature.
We also had to consider conversation flow and the fact that any word or phrase, Simlish or otherwise, could potentially tie into another.
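To make the idea a bit more concrete, here is a minimal Python sketch of stringing a vignette line together from a bank of Simlish clips grouped by tone and length, with the occasional real word thrown in. It is an illustration only, not our actual Wwise and Unity setup: the clip names, the `build_vignette_line` function, the 20% word chance and the "avoid an immediate repeat" rule are all assumptions made up for the example.

```python
import random

# The three tones and four length groups described above.
TONES = ("positive", "negative", "neutral")
LENGTHS = ("short", "medium", "long", "question")


def make_demo_bank(character, clips_per_group=3):
    """Build a hypothetical clip bank: bank[tone][length] -> list of clip names."""
    return {
        tone: {
            length: [
                f"{character}_{tone}_{length}_{i:02d}"
                for i in range(1, clips_per_group + 1)
            ]
            for length in LENGTHS
        }
        for tone in TONES
    }


def build_vignette_line(bank, words, tone, n_clips, word_chance=0.2, rng=None):
    """Pick n_clips clip names for one speaker in a single tone.

    Most picks are Simlish phrases of a random length group; with
    probability word_chance (an invented value) a real word or phrase
    is used instead. The same clip is never picked twice in a row as
    long as the chosen pool holds more than one clip.
    """
    rng = rng or random.Random()
    sequence = []
    last = None
    for _ in range(n_clips):
        if words and rng.random() < word_chance:
            pool = words
        else:
            pool = bank[tone][rng.choice(LENGTHS)]
        # Drop the previous clip from the candidates where possible.
        candidates = [clip for clip in pool if clip != last] or list(pool)
        clip = rng.choice(candidates)
        sequence.append(clip)
        last = clip
    return sequence


if __name__ == "__main__":
    bank = make_demo_bank("brute")
    words = ["brute_word_indeed", "brute_word_murder", "brute_word_tea"]
    print(build_vignette_line(bank, words, "neutral", 5, rng=random.Random(1)))
```

Because every clip can follow every other clip, the recordings themselves have to be performed so that any phrase can flow into any other, which is exactly the constraint mentioned above.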
But what about the actual recording process? While The Sims and Planet Coaster went for a more unified world language, each individual person in those games is practically a blank slate, with whatever the player projects or builds into them forming their background. Murderous Pursuits has characters that have some degree of backstory, not to mention varying nationalities, so we opted to give our actors a bit more freedom in terms of what they performed so long as it met our phrase length and tone criteria. All of them took the ball and ran with it, resulting in several different approaches. Kim Allan used news articles as a base, scrambling the words and making it sound more like Gaelic for the Scottish Duchess, while Jay Britton’s take on the Dodger involved him making up short stories, garbling the words while maintaining the ebb and flow of his tales that played into his Cockney cheeky chappy character. Here’s an example phrase:
Sounds about right! In terms of scope, each character archetype has roughly 600 individual VO clips in game, which is about 4800 files. When we add in Mr. X and the guards that comes to around 5100, which was trimmed down from over 12,000 takes in total. This covers everything from Simlish to spoken words and phrases like reactions, grunts for attacking and dying, and a whole other variety of weird requests that never made it into the game.
Which is a lot of talking, and even more editing. Especially for a single person! So, a special shout out to Stephen, who was a QA tester at the time before making the jump to marketing, who took on the grunt work of chopping up the Admiral’s lines when a deadline was looming. Here’s how the final session of the Duchess looked:
As far as clean up went, I used iZotope’s RX suite to remove some of the pops and crackles that occur when people speak, like when your lips smack or open. You don’t really notice them in real life conversation, but in a quiet space close to a mic they can stand out and sound unnatural. After that, I used a gate and volume automation to cut out some of the background noises and control the tail offs of words during longer takes, and some de-essing to control sibilant sounds (like… “ess”es). There was also some compression and EQ to even things out too.
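If you haven't used a gate before, the core idea is simple enough to sketch in a few lines of Python. This is a deliberately crude illustration, not what the plugin in my session does: real gates work on a smoothed level with attack and release times rather than cutting hard per sample, and the `noise_gate` function, the threshold and the hold length here are made-up values for the example.

```python
def noise_gate(samples, threshold=0.02, hold=2):
    """Mute samples quieter than threshold, keeping a short hold window.

    samples   -- list of floats in the range -1.0 to 1.0
    threshold -- absolute level below which audio counts as background noise
    hold      -- number of quiet samples still let through after a loud one,
                 so the tail of a word is not chopped off abruptly
    """
    gated = []
    remaining_hold = 0
    for sample in samples:
        if abs(sample) >= threshold:
            remaining_hold = hold   # loud sample: (re)open the gate
            gated.append(sample)
        elif remaining_hold > 0:
            remaining_hold -= 1     # quiet, but still inside the hold window
            gated.append(sample)
        else:
            gated.append(0.0)       # gate closed: mute the background noise
    return gated


if __name__ == "__main__":
    take = [0.01, 0.5, 0.01, 0.01, 0.01, 0.3]
    print(noise_gate(take))  # [0.0, 0.5, 0.01, 0.01, 0.0, 0.3]
```

The hold window is the part that matters for dialogue: set it too short and word endings get clipped, which is why I paired the gate with hand-drawn volume automation on the longer takes.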
One handy tip that I’ve seen a few other dialogue editors share is to record a few extra takes and pronunciations of certain troublesome letters and word endings that are sibilants, fricatives or stops (like ess’s, t’s and f’s, to name a few) in case you need to do some further edits and repairs. While I didn’t do this this time around, there was enough content recorded that I was able to stitch together takes that would have been otherwise unusable, and stretch out the number of variations we have even further.
Below is a quick example, where a rogue “th” got a little lost post-processing, so instead of trying to automate volume and EQs I grabbed a clean one from another take before the processing was applied and dropped it in. It might look a bit odd to have a mono clip in between two stereo ones. Ableton froze the tracks to stereo despite the original source being mono, and everything was summed to mono afterwards, so no weird spatial stuff going on in the end!
Phew. Let’s stop there. Come back for part two where we’ll cover actual implementation and systems design in Wwise, and some of the issues we encountered and how we solved them. A big thank you to our actors:
This was originally posted in two parts on the Blazing Griffin Website.