Read Text from the binary data

I need to extract text from a given file, which can be in any format, such as PDF, Excel, TXT, or even an image. The input will be binary data (the file), and the output should be the extracted text from the file. Are there any APIs or Forge components available to achieve this functionality? Please guide me on how to implement this.

2016-04-22 00-29-45
Nuno Reis
 
MVP

Read text from any possible format? Yes, a new thing called AI can do that :)

There are multiple APIs in the Forge for it. The main difference between them is cost, so it will come down to a business decision, not a simple technical choice.


The alternative would be to read the mimetype, determine whether there is a way to read it as text and, if so, pass it to the appropriate action (one for txt, one for Excel, one for Word, one OCR for images...). Leave any failed file to AI or to a manual process.
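The routing idea above can be sketched outside OutSystems in a few lines of Python. This is only an illustration under assumptions: the handler names, the `HANDLERS` table and the `"fallback"` status are hypothetical, and only the plain-text handler does real work. The others are placeholders standing in for whatever Forge component or API you end up choosing.

```python
from typing import Callable, Dict, Optional, Tuple

# A handler takes the raw bytes and returns the text, or None if it fails.
Handler = Callable[[bytes], Optional[str]]


def extract_plain_text(data: bytes) -> Optional[str]:
    """Plain text needs no special tooling: just try to decode it."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def not_wired_up(data: bytes) -> Optional[str]:
    """Hypothetical placeholder for a PDF, Excel, Word or OCR action."""
    return None


# Hypothetical routing table: one action per mimetype.
HANDLERS: Dict[str, Handler] = {
    "text/plain": extract_plain_text,
    "application/pdf": not_wired_up,
    "image/png": not_wired_up,
    "image/jpeg": not_wired_up,
}


def extract_text(data: bytes, mime_type: str) -> Tuple[str, Optional[str]]:
    """Route the file by mimetype; anything that fails goes to the fallback."""
    handler = HANDLERS.get(mime_type)
    text = handler(data) if handler is not None else None
    if text is None:
        # Unknown type or failed extraction: leave it to AI or a manual process.
        return ("fallback", None)
    return ("extracted", text)


if __name__ == "__main__":
    print(extract_text(b"hello", "text/plain"))
    print(extract_text(b"%PDF-1.7", "application/pdf"))
```

Keeping the fallback as an explicit status, rather than raising an error, makes it easy to queue those files for the AI service or for manual review later.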


The right choice is very specific to your case. Without more details, there is not much more we can say.

2024-09-17 12-24-07
Rammurthy Naidu Boddu
Champion

Hi @Ramprasath R

As per your question, the input is binary data (of any type).
How will OS know which type of data it is?

1. If it is in xlsx (Excel) format, convert the binary data to JSON by deserializing it, and use the Forge component named JsontoXlsx (convert JSON to xlsx).

2. For other types, such as images, PDFs, or text files, use AI integrations to extract the text.

3. For images, use an OCR scanner to extract the text from the image.
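On the question of how the type of the binary data can be known: when no file name or mimetype comes with the binary, one common approach is to look at the first few bytes (the file signature). The sketch below is a minimal Python illustration, not an OutSystems action. The function name `sniff_type` and the returned labels are hypothetical, and it only checks a handful of well-known signatures.

```python
def sniff_type(data: bytes) -> str:
    """Guess a coarse file type from the leading bytes of the binary."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"PK\x03\x04"):
        # xlsx and docx files are ZIP archives, so they share this signature.
        return "zip"
    try:
        # No known signature: treat it as text if it decodes cleanly as UTF-8.
        data.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "unknown"


if __name__ == "__main__":
    print(sniff_type(b"%PDF-1.7 ..."))
    print(sniff_type(b"plain text content"))
```

A "zip" result still needs a second look inside the archive to tell xlsx from docx, and anything reported as "unknown" is a candidate for the AI or manual route.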

2021-11-19 11-12-44
Rui Mendes

Hello @Ramprasath R

To achieve your goal of reading values from any type of binary, today you have AI systems at your disposal.  

We can have a conversation to better understand the requirements so that I can create an execution plan for a solution that allows you to obtain data from the files.

At this moment, I have a solution that can be customized for either OS11 or ODC.

2024-05-14 06-49-08
Karnika-EONE

Hi @Ramprasath R 

You can use this Forge component: https://www.outsystems.com/forge/component-overview/2881/amazon-rekognition-o11 (it is only for images and video).

Sign in to AWS, get the required authentication credentials, and use them with this Forge component.

These APIs are also available:

1. Google Cloud Vision API

2. Microsoft Azure Cognitive Services (OCR)


1. For PDFs, use a PDF extraction library or Cloudmersive API.

2. For Excel, leverage the Excel File Parser or Cloudmersive.

3. For images, use Amazon Rekognition, Google Vision, or Azure OCR.


Best Regards

Karnika.K
