“Why do robots need audio?” I get asked this a lot and, honestly, I asked the same question myself when I got to Anki a few years back. Anki is a San Francisco-based robotics/AI company of about 150 people and growing fast. We applied some of our skills towards developing smartphone-driven physical toy products, such as DRIVE and OVERDRIVE. Our latest project, COZMO, goes a step further by applying robotics and AI to bring a robot character to life in the vein of Pixar’s Wall-E. And this is just the beginning.
Building robots in the real world is really hard, especially consumer robots that can be mass manufactured at a price that doesn’t cost a fortune. As I found out, most consumer “robots” are actually simple, gimmicky remote “drones” without agency or intent, or are so limited in capabilities that you couldn’t really call them robots. Real robots have AI, pathfinding, memory mapping, computer vision, the capability to manipulate the environment around them, and much more. I’m certainly not a roboticist, so I speak as an Anki layman. Anki wants not only to build real robots for the consumer space, but also to bring robots to a price point where a child can own one (or more), much like a gaming console, complete with an SDK.
I saw the plans for COZMO on my first day at Anki and I could just feel the excitement surrounding audio. The team didn’t exactly know how audio was going to enrich the COZMO experience; they just knew that it would. The Anki team was certain that in order to make our robots feel like characters, and especially in order to keep the mechanical costs down, we would use interactive audio to give COZMO personality and charm, and to keep the user engaged, informed, and entertained. The content we generate is very much in the tradition of the best that film and games have to offer. COZMO would have a uniquely emotive voice and a language all his own, along with a custom soundtrack to accompany him on his adventures. And all of this is powered by Wwise.
One of the first decisions I made when I got to Anki was to choose Wwise as our audio middleware platform. There were bumps along the way, but in the end, the team understood it was important to leverage the tools already available to us to get results faster. With Wwise, we were able to quickly get audio on the smart devices that run the app that talks to COZMO. Furthermore, with access to the Wwise source code, we were able to write custom plug-ins to talk to COZMO’s onboard speaker, ensure sample accuracy, and manage audio buffers more efficiently. In many ways, Wwise has become the common audio “language” spoken here at Anki. Wwise makes it easy to describe non-linear audio concepts and behaviors to others who may not be audio professionals. While we still do plenty of custom workflow and pipeline work, Wwise gave us the jump start we needed to focus more on content aesthetics and less on building tools.