There have been many innovations in voice communication over the past century and a half, from phonographs to telephones to digital recording and playback. In recent years, voice-recognition and voice-controlled technologies like Google Assistant have marked another leap forward, allowing people to interact, learn, and create simply by saying, “Hey, Google.”

On the latest episode of our podcast, Decoded, we talked to Jessica Earley-Cha, developer relations engineer on the Assistant team at Google. In this role, she helps developers understand how to integrate their content and services into Google Assistant to effectively reach users in this new medium.

Turning Up the Volume

These days, voice feels like it's everywhere. Google, Apple, and Microsoft have all built voice assistants into their products, while voice-activated assistant devices are now found in homes around the world. However, Earley-Cha notes that the technology is still in its early stages, with developers only beginning to scratch the surface of what they can build. She said: 

“Where we’re at in voice is where we were in mobile ten years ago. It was the Wild West and we were still figuring things out at that point. I like to think of that when it comes to voice development. We have a lot of tooling, but we don’t exactly know what’s ideal. How do you even market it? Where does it fit within the larger ecosystem?”

For developers used to building apps with a screen and tactile interface, developing for voice can require a whole new way of thinking. Software has to listen for different languages, accents, and vocal cues to understand user intent, making it much more difficult to know what a user wants compared to having someone click a button.

Intensifying the Focus on Intent

“It’s one thing to translate audio into text when someone knows they’re being recorded versus how they naturally talk. Then, each region has their own way of talking about things — for example, ‘soda’ versus ‘pop’ in the US. It’s the same thing, but it depends on the region you’re in.”

That is how Earley-Cha describes the challenge. In bilingual households, the app may even have to interpret instructions in multiple languages within a single conversation, both to understand what is being said and to know which language to respond in.
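One common way to handle the regional vocabulary problem (‘soda’ versus ‘pop’) is to map every spoken variant to a single canonical value before working out what the user wants. The following is a minimal sketch under that assumption, not Google's implementation: the synonym table and the names `DRINK_SYNONYMS`, `build_lookup`, and `resolve_entity` are hypothetical and exist only for illustration.

```python
import re

# Hypothetical synonym table: one canonical value, several regional variants.
DRINK_SYNONYMS = {
    "soft_drink": ["soda", "pop", "soft drink"],
}


def build_lookup(synonyms):
    """Invert the table so each spoken variant points to its canonical value."""
    lookup = {}
    for canonical, variants in synonyms.items():
        for variant in variants:
            lookup[variant.lower()] = canonical
    return lookup


def resolve_entity(utterance, lookup):
    """Return the canonical value for the longest known variant found, or None."""
    text = utterance.lower()
    # Try longer variants first so multi-word phrases are matched as a whole.
    for variant in sorted(lookup, key=len, reverse=True):
        # Word boundaries keep "pop" from matching inside "popcorn".
        if re.search(r"\b" + re.escape(variant) + r"\b", text):
            return lookup[variant]
    return None


lookup = build_lookup(DRINK_SYNONYMS)
midwest = resolve_entity("Can I get a pop?", lookup)
coastal = resolve_entity("I'd like a soda, please", lookup)
```

Both `midwest` and `coastal` resolve to the same canonical value, so the rest of the app can treat the two requests identically. A production system would likely lean on trained language models rather than a fixed table, but the idea of collapsing variants into one meaning is the same.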

With tools like Actions Builder and Actions SDK, Google is providing the functionality to help solve for intent so developers can focus on their apps.

“We focus on the concept of a scene; it’s this idea that in this time, certain activities will happen. We’re giving developers that type of tooling to help them tether something so ethereal and make something that’s not concrete more concrete.”
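One way to picture the scene idea is as a small state machine: the conversation is always in exactly one scene, and each scene decides which requests make sense at that moment. The sketch below is a simplified illustration under that assumption. It does not use the Actions Builder or Actions SDK APIs, and the `Scene` and `Conversation` classes, the intent names, and the replies are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    name: str
    # Maps an intent name to (reply text, name of the next scene).
    transitions: dict = field(default_factory=dict)


class Conversation:
    def __init__(self, scenes, start):
        self.scenes = {scene.name: scene for scene in scenes}
        self.current = start

    def handle(self, intent):
        """Reply to an intent according to the current scene, then move on."""
        scene = self.scenes[self.current]
        if intent not in scene.transitions:
            # The request does not belong to this scene, so stay where we are.
            return "Sorry, I can't do that right now."
        reply, next_scene = scene.transitions[intent]
        self.current = next_scene
        return reply


# Hypothetical shopping flow: checkout is only available once a cart exists.
scenes = [
    Scene("browsing", {"add_item": ("Added to your cart.", "cart")}),
    Scene("cart", {
        "add_item": ("Added to your cart.", "cart"),
        "checkout": ("Your order is placed.", "done"),
    }),
    Scene("done"),
]

conversation = Conversation(scenes, start="browsing")
too_early = conversation.handle("checkout")
added = conversation.handle("add_item")
placed = conversation.handle("checkout")
```

The same spoken request ("checkout") gets a different answer depending on the scene the conversation is in, which is what makes an open-ended voice interaction manageable to reason about.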

Because these tools use machine learning to understand context, users can interact with voice in a natural, conversational way rather than through rigid, structured commands.

The Future of Voice

The goal for Earley-Cha and her team is to help developers build a foundation for voice development.

“I’m hoping there will be a day where we have these common pathways or user journeys that make it easier for developers, such as the user journey for checkout, making an order, or viewing inventory. Then, a developer can focus on adding the information they need that is unique to their experience. Because that’s what we did with all other platforms where we don’t build from scratch anymore. You download boilerplate code and you build off of that, and you grow out different pieces.” 

In the future, developers will need to think outside the screen to ensure their apps are optimized for voice. This will require them to build more flexibility and fluidity into their work, as well as work with conversation designers who understand how users interact with voice.

“It’s not like a GUI where you can offer two buttons and that’s all a user can do. You can’t just build something with voice as a developer; you need a designer who understands word choices and how people communicate.”

Check out this week’s Decoded podcast to learn more about how Jessica Earley-Cha and Google help developers create voice-driven apps and services. Listen now, and subscribe to future episodes today.