Mobile app - Speech recognition

I'm starting to look into some of the mobile application aspects of OutSystems. I have an IVR/auto-attendant background, but I'm struggling to determine how to implement speech recognition with OutSystems. I have seen a speech recognition plugin on the Forge, but it appears that it just captures what the user says; I don't see any way to define a list of grammars/phrases that it should be listening for.

When I work with Nuance and I am "listening" for a phrase, I can define many possible utterances and then tag them as a certain phrase. For example: "Account summary", "Account", "Show me my account", and "Give me account information" could all be tagged as "account" and handled appropriately.

What I want to implement is screen routing based on the user's input. For example, say we have an account screen, an account details screen, and a user settings screen. If the user presses the speech button and we capture "Show me my account", the account screen would be displayed. But since there are so many ways a person might say that phrase, I'm struggling to determine the best practice to actually make it work. Am I missing something obvious here?

Solution

Hello Josh,


I have never implemented an algorithm like this, but for starters I would build a library of keywords that identify the action the user wants to perform, like "go", "show me", "I want to see", etc., the words users would use to say they want to go to or see something.

Then, to know where they want to "go", I would search the speech for destination keywords. In your example, I would search for the words "account", "account details", and "user settings", and with both steps combined I would redirect the user.
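A minimal sketch of this two-step keyword approach, written in TypeScript. All names here (`detectIntent`, `ACTION_KEYWORDS`, `DESTINATIONS`, the screen names) are illustrative assumptions, not part of any OutSystems or Forge plugin API; in an OutSystems app this logic would typically live in a client action that runs on the recognized text:

```typescript
// Hypothetical sketch, not an OutSystems API: step 1 checks for an action
// keyword, step 2 checks for a destination keyword, then routes.

const ACTION_KEYWORDS = ["go", "show me", "i want to see", "give me", "open"];

// Destination keywords mapped to the screen they should route to.
// Longer phrases are listed first so "account details" wins over "account".
const DESTINATIONS: Record<string, string> = {
  "account details": "AccountDetailsScreen",
  "account": "AccountScreen",
  "user settings": "UserSettingsScreen",
};

function detectIntent(speech: string): string | null {
  const text = speech.toLowerCase();

  // Step 1: does the utterance contain an action keyword at all?
  const hasAction = ACTION_KEYWORDS.some((kw) => text.includes(kw));
  if (!hasAction) return null;

  // Step 2: which destination keyword appears in the utterance?
  for (const [keyword, screen] of Object.entries(DESTINATIONS)) {
    if (text.includes(keyword)) return screen;
  }
  return null;
}
```

For example, `detectIntent("Show me my account details")` would resolve to the account details screen, while an utterance with no action keyword falls through to `null` and could prompt the user to repeat.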


Best Regards

Francisco Freire

Solution

Francisco Freire wrote:

Hello Josh,


I have never implemented an algorithm like this, but for starters I would build a library of keywords that identify the action the user wants to perform, like "go", "show me", "I want to see", etc., the words users would use to say they want to go to or see something.

Then, to know where they want to "go", I would search the speech for destination keywords. In your example, I would search for the words "account", "account details", and "user settings", and with both steps combined I would redirect the user.


Best Regards

Francisco Freire

So: make a keyword entity with utterance phrases along with the desired "tag". Then, when recognition is complete, pull the list of phrases from the aggregate and loop through it until a match is found; if one is found, set a navigation variable to the tag value. After the loop, use a switch that handles the current navigation variable value.
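That entity-loop-switch flow could be sketched like this. The names and records are illustrative assumptions: in a real OutSystems app the phrase/tag pairs would come from an aggregate over the keyword entity, and the switch would be a Switch node in a screen or client action rather than code:

```typescript
// Illustrative sketch of the phrase/tag lookup described above.
// Records are hard-coded here; in OutSystems they would come from an
// aggregate over a Keyword entity.

interface KeywordRecord {
  phrase: string; // utterance to match
  tag: string;    // navigation tag it maps to
}

const KEYWORDS: KeywordRecord[] = [
  { phrase: "show me my account", tag: "account" },
  { phrase: "account summary", tag: "account" },
  { phrase: "give me account information", tag: "account" },
  { phrase: "account details", tag: "details" },
  { phrase: "user settings", tag: "settings" },
];

function routeForSpeech(speech: string): string {
  const text = speech.toLowerCase();
  let tag = ""; // the "navigation variable"

  // Loop through the phrase list until a match is found.
  for (const record of KEYWORDS) {
    if (text.includes(record.phrase)) {
      tag = record.tag;
      break;
    }
  }

  // Switch on the navigation variable to pick the destination screen.
  switch (tag) {
    case "account":
      return "AccountScreen";
    case "details":
      return "AccountDetailsScreen";
    case "settings":
      return "UserSettingsScreen";
    default:
      return "NotRecognized";
  }
}
```

One design note: because the loop stops at the first match, the order of the records matters when phrases overlap (e.g. "account" vs. "account details"), so more specific phrases should be stored or sorted first.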

I think that would work, and it's essentially what I'm used to from working with Nuance/telephony systems.


Thanks!

Hello again,

No problem, let me know if you need any further help.


Best Regards,

Francisco Freire.


Just a general question, but what is a typical use case for speech reco? I assume navigating through a site isn't "normal", so is it typically just used to gather user input, for dictation, or for web searches?

Josh Herron wrote:

Just a general question, but what is a typical use case for speech reco? I assume navigating through a site isn't "normal", so is it typically just used to gather user input, for dictation, or for web searches?


Hello,

As I said, I never used speech-to-text myself, but two use cases have crossed my path over the years. I never implemented them; they were just ideas for app use cases:

  • Accessibility: so people with disabilities could easily use the apps
  • Audit applications: so the auditor could speak during the evaluation to record the info, or so audio files recorded by the auditor could be transcribed to text after the audit.


Best Regards

Francisco Freire

Gotcha, I thought you meant you had never used it for navigation. Thanks!