How do I use cognitive services to create AI-enabled Apps?

Connectors from the OutSystems Forge give developers and other users access to cognitive services that they can add to their applications to solve business problems involving unstructured data. These services include:

  1. Language Services
  2. OutSystems.AI Document Processor
  3. Vision Services

Language Services

Language services are used for speech recognition, intent recognition, transcription, speech synthesis, and content and sentiment analysis. The following connectors and components for language analysis are available from the OutSystems Forge:

  • OutSystems.AI Language Analysis
  • Azure Speech Services
  • IBM Watson Speech Services and Text Services

OutSystems.AI Language Analysis

When content is collected from documents, webchats, or social media, it still has to be processed and analyzed, especially documents with free-text fields. OutSystems.AI Language Analysis is a set of pre-built services that developers can add to applications to determine customer sentiment, needs, and issues, and to accelerate business processes. No training is required.

OutSystems.AI Language Analysis receives text as an input parameter, along with a language code, and uses it for the following services:

  • Sentiment analysis: Scores the text based on its sentiment as negative (0-39%), neutral (40-69%), or positive (70-100%). 
  • Key phrase extraction: Analyzes the text and highlights the key phrases that were found.
  • Entities detection: Analyzes the text and highlights the entities that were found.
  • Translator: Automatically detects the source language (it can also be specified explicitly) and provides a translation. Single-word translations include alternatives.
  • Language detection: Automatically detects language.
  • Spell check: Detects spelling mistakes in the text and returns the errors, using a market code instead of a language code. Also suggests alternative spellings for those errors.
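The sentiment thresholds listed above can be sketched as a simple bucketing function. This is a hypothetical helper for illustration only; the actual connector returns the score and classification itself:

```python
def sentiment_label(score_percent: float) -> str:
    """Map a sentiment score (0-100%) to the buckets described above:
    negative (0-39%), neutral (40-69%), positive (70-100%)."""
    if not 0 <= score_percent <= 100:
        raise ValueError("score must be between 0 and 100")
    if score_percent < 40:
        return "negative"
    if score_percent < 70:
        return "neutral"
    return "positive"
```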

OutSystems.AI Language Analysis also has a service for converting speech to text. A server action receives as input parameters an audio file, its file format, the language code that identifies the spoken language to be recognized, and a profanity mode that specifies how to handle audio containing profanity.
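A caller might validate those inputs before invoking the action. The sketch below is a hypothetical pre-flight check; the supported formats, profanity modes, and parameter names are assumptions, not the connector's actual contract:

```python
# Assumed value sets -- the connector's actual supported lists may differ.
SUPPORTED_FORMATS = {"wav", "mp3", "ogg"}
PROFANITY_MODES = {"Masked", "Removed", "Raw"}

def validate_speech_request(file_format: str, language_code: str, profanity_mode: str) -> bool:
    """Return True when the speech-to-text request parameters look well-formed."""
    if file_format.lower() not in SUPPORTED_FORMATS:
        return False
    if profanity_mode not in PROFANITY_MODES:
        return False
    # Language codes such as "en-US" are expected in language-REGION form.
    parts = language_code.split("-")
    return len(parts) == 2 and parts[0].isalpha() and parts[1].isalpha()
```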

Azure Speech and Text Services

Developers can also use the Azure Cognitive Services Connector to access services for speech and text right from the OutSystems IDE and add them to their apps. 

Azure Speech Services include:

  • Speech transcription: Converts spoken audio to text.
  • Custom speech service: Trains a speech model with custom speech analysis for specific use cases.
  • Speaker verification: Uses voice to verify a speaker’s claim of identity, powering applications with an intelligent verification tool.
  • Speaker identification: Determines the identity of an unknown speaker by comparing the speaker’s input audio with a group of selected speakers and returning an identity if one is found.
  • Text to speech: Converts text to audio in near real-time and plays back so applications can speak to users naturally, improving accessibility and usability.

There are also Azure Text Services available that are similar to those offered by OutSystems.AI Language Analysis.

IBM Watson Services for Speech and Text

Developers can also use IBM Watson Speech Services and IBM Watson Text Services right from the OutSystems IDE and add them to their apps. These services are part of the IBM Watson Services Forge component. Watson Speech Services can convert spoken audio into text or written text into natural-sounding audio, enable the use of voice for verification, or add speaker recognition to an app, all in a variety of languages and voices. The text-to-speech service produces human-like audio from written text in multiple languages and tones, making content more accessible to users with different abilities or in different contexts, such as providing an audio option to avoid distracted driving.

OutSystems.AI Document Processor

Document processing is another type of cognitive service that OutSystems makes available to developers. It automates and accelerates the processing of forms, applications, and other documents. OutSystems.AI Document Processor enables developers to add capabilities to their applications that reduce the manual effort spent analyzing and standardizing documents.

With OutSystems.AI Document Processor, developers and other users can:

  • Capture and qualify data from thousands of documents to reduce processing from days to hours.
  • Identify and classify important account-related documents to inform customer strategy and experience.
  • Transform forms into usable data quickly and cost-effectively, so that end-users benefit from accelerated time-to-insight.  

OutSystems.AI Document Processor includes the following services:

  • Form Recognizer
  • Analyze Receipt
  • Analyze Layout

Form Recognizer

This tailored service identifies and extracts text, key/value pairs and table data from form documents. It is a custom service that generates and uses a model trained with a few sample documents. 

Examples of how Form Recognizer can be used include:

  • Automation of customer onboarding: Onboarding a new customer by automatically extracting the key/value entities from a previous electric bill. The model is trained on standard electric bills.
  • Customs clearance document processing: Automatically processing and approving customs clearance requests by validating submitted documents.

Analyze Receipt

This pre-built receipt API identifies and extracts key information from sales receipts, such as the time and date of the transaction, merchant information, tax amounts, totals, and more, with no training required. Currently, it is tailored to the format of US receipts.

An example of how Analyze Receipt can be used is for expense reporting. It can automatically extract merchant and transaction information from USA receipts, significantly reducing the manual effort of reporting and auditing expenses.
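For the expense-reporting scenario, the extracted fields could be shaped into a flat expense record. The sketch below assumes a simple key/value output from the receipt analysis; the field names (`MerchantName`, `TransactionDate`, `Tax`, `Total`) are illustrative assumptions, not the service's documented schema:

```python
def to_expense_record(receipt_fields: dict) -> dict:
    """Shape extracted receipt fields into a flat expense record.

    `receipt_fields` mimics the kind of key/value output a receipt-analysis
    service returns; the exact field names used here are assumptions.
    """
    return {
        "merchant": receipt_fields.get("MerchantName", "unknown"),
        "date": receipt_fields.get("TransactionDate"),
        "tax": float(receipt_fields.get("Tax", 0.0)),
        "total": float(receipt_fields.get("Total", 0.0)),
    }
```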

Analyze Layout

This service extracts text and table structure (the row and column numbers associated with the text) using high-definition optical character recognition (OCR). 

Vision Services

These services include image-processing algorithms that intelligently identify, caption, index, and moderate pictures and videos. It’s possible to understand the content of an image, classify it, detect individual objects and faces within images, and read printed words in the images (OCR). With the help of Forge components and connectors, developers can take advantage of the following services:

  • Azure Vision Services
  • Google Vision Services
  • IBM Vision Services
  • AWS Rekognition

Azure Vision Services

Using the Azure Cognitive Services Connector, developers can pull Microsoft Azure Vision Services right into the OutSystems IDE and add them to their apps. Azure Vision Services offer:

  • Image analysis: Returns information about visual content found in an image, identifying content and labeling it with confidence. It detects objects and potential adult content and retrieves their location from an image. It also identifies image types and color schemes.
  • Recognition of text in image (handwritten or printed): Uses optical character recognition (OCR) to detect text in an image and then pulls any recognizable words into a stream of characters that are machine-readable.
  • Recognition of celebrities and landmarks: Spots famous people from business, politics, sports, and entertainment in images, as well as natural and man-made landmarks from all over the globe.
  • Thumbnail generation: Produces a thumbnail based on any image, and modifies images to best suit the needs for size, shape, and style. Smart cropping generates thumbnails that differ from the aspect ratio of your original image while preserving the part that’s of interest.
  • Face verification: Checks if two faces belong to the same person and assigns a confidence score for how likely it is that the two faces belong to the same person.
  • Face detection: Finds human faces in an image, using face rectangles to show the location of each face, along with attributes such as age, emotion, and gender.
  • Emotion recognition: Analyzes facial expressions and returns a confidence score for the likelihood of emotions, such as anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise.
  • Custom model for image recognition: Trains the image recognition model with custom images and custom tags for specific use cases.
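Since emotion recognition returns a confidence score per emotion, a typical consumer picks the most likely one. This is a generic post-processing sketch; the score map shown is a hypothetical response shape, not the connector's exact output structure:

```python
def top_emotion(scores: dict) -> tuple:
    """Pick the most likely emotion from a map of emotion -> confidence score."""
    if not scores:
        raise ValueError("empty score map")
    emotion = max(scores, key=scores.get)
    return emotion, scores[emotion]
```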

Google Vision Services

The Google Cloud Vision OCR component available from the Forge enables developers to use Google Cloud Vision right in the OutSystems IDE and add these services to their apps:

  • Extract text: Uses OCR to detect text in images and automatically identifies language.
  • Identify text in image: Uses object localization to create a list of all the text objects in an image and identifies the area of the image where text was detected.
  • Get specific data types in image: Uses specified regular expressions to pull text from an image, which is useful for emails or dates.
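Extracting specific data types from OCR output boils down to running regular expressions over the recognized text. The sketch below shows the idea with two common patterns (emails and US-style dates); the patterns are illustrative, not the component's built-in ones:

```python
import re

# Illustrative patterns for two common data types in OCR'd text.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

def extract(ocr_text: str) -> dict:
    """Return the emails and dates found in text extracted from an image."""
    return {
        "emails": EMAIL_RE.findall(ocr_text),
        "dates": DATE_RE.findall(ocr_text),
    }
```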

IBM Watson Vision Services

Developers can access these services through the IBM Watson Services component available from the Forge, use them right in the OutSystems IDE, and add them to their apps. IBM Watson Visual Recognition Services offer:
  • Facial analysis: Analyzes faces in images and identifies estimated age, gender, and names of celebrities.
  • Image recognition: Tags and classifies objects in images, and can be trained with custom classes.

AWS Rekognition

Available from the Forge, the Amazon Rekognition Face Matching component enables developers to use AWS Rekognition capabilities in the OutSystems IDE and add them to their applications. AWS Rekognition offers:
  • Face matching: Allows the application to create collections of faces. These collections can then be "searched" to see if they match a sample image.
  • Add face to collections: Adds a new facial image to the specified collection.
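When a collection is searched with a sample image, the application typically has to decide whether the best candidate is good enough to count as a match. The sketch below assumes a response shaped like Rekognition's face-search results (a `FaceMatches` list with `Similarity` percentages); treat the field names and the 90% threshold as illustrative assumptions:

```python
def best_face_match(search_response: dict, threshold: float = 90.0):
    """Return the FaceId of the highest-similarity match above the threshold,
    or None if no candidate is confident enough.

    `search_response` mimics the shape of a face-search result; field names
    follow the Rekognition response style but are assumptions here.
    """
    matches = search_response.get("FaceMatches", [])
    best = max(matches, key=lambda m: m["Similarity"], default=None)
    if best is None or best["Similarity"] < threshold:
        return None
    return best["Face"]["FaceId"]
```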