I need to extract text from a given file, which can be in any format, such as PDF, Excel, TXT, or even an image. The input will be binary data (the file), and the output should be the extracted text from the file. Are there any APIs or Forge components available to achieve this functionality? Please guide me on how to implement this.
Read text from any possible format? Yes, a new thing called AI can do that :)
There are multiple APIs in the Forge for it. It is more about cost, so it will come down to a business decision rather than a simple technical choice.
The alternative would be to read the MIME type, determine whether there is a way to read the file as text and, if so, pass it to the corresponding action (one for TXT, one for Excel, one for Word, one OCR action for images...). Leave any failed file to AI or to a manual process.
It is very specific. Without more details, there is not much we can say.
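To make the MIME-type routing idea above a bit more concrete, here is a minimal sketch in Python (not OutSystems logic). All names in it (`HANDLERS`, `extract_text`, `not_wired_in`) are hypothetical, and only the plain-text handler does real work; the others are stand-ins for whatever Forge component or API you end up choosing. A return value of `None` means "hand this file to AI or to a manual process".

```python
from typing import Callable, Dict, Optional

Extractor = Callable[[bytes], str]


def extract_plain_text(data: bytes) -> str:
    # Plain text: decode as UTF-8 and replace any undecodable bytes.
    return data.decode("utf-8", errors="replace")


def not_wired_in(kind: str) -> Extractor:
    # Hypothetical stand-in for a real extractor (PDF, Excel, OCR...).
    def _extract(data: bytes) -> str:
        raise NotImplementedError(f"no {kind} extractor configured")
    return _extract


XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

# One action per MIME type, as described above.
HANDLERS: Dict[str, Extractor] = {
    "text/plain": extract_plain_text,
    "application/pdf": not_wired_in("PDF"),
    XLSX_MIME: not_wired_in("Excel"),
    "image/png": not_wired_in("OCR"),
    "image/jpeg": not_wired_in("OCR"),
}


def extract_text(mime_type: str, data: bytes) -> Optional[str]:
    """Return the extracted text, or None if the file needs AI/manual handling."""
    handler = HANDLERS.get(mime_type)
    if handler is None:
        return None  # unknown type: no dedicated action
    try:
        return handler(data)
    except Exception:
        return None  # the dedicated action failed: fall back
```

With this shape, supporting a new format is just one more entry in `HANDLERS`, and the cost question stays isolated in the fallback path.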
Hi @Ramprasath R,
As per your question, the input is binary data of any type. How will OS know which type of data it is?
1. If it is in xlsx (Excel) format, deserialize the binary data to JSON and use the Forge component that converts JSON to xlsx (JsontoXlsx).
2. For other formats, such as images, PDFs, or text files, use AI-supported integrations to extract the text.
3. For images, use an OCR scanner to extract the text from the image.
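On the question of how to tell which type of data the binary holds: one common approach is to look at the first few bytes (the file signature). Below is a rough Python sketch of that idea, not an OutSystems action. The function name `sniff_type` and the returned labels are my own assumptions; the signatures themselves are the standard ones for PDF, PNG, JPEG, and ZIP (xlsx and docx files are ZIP containers, so the sketch peeks inside to tell them apart).

```python
import io
import zipfile

# Well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
]


def sniff_type(data: bytes) -> str:
    """Guess a coarse file type from the binary content alone."""
    if not data:
        return "unknown"
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    if data.startswith(b"PK\x03\x04"):
        # ZIP container: check for the parts that Office formats always include.
        try:
            names = zipfile.ZipFile(io.BytesIO(data)).namelist()
        except zipfile.BadZipFile:
            return "zip"
        if "xl/workbook.xml" in names:
            return "xlsx"
        if "word/document.xml" in names:
            return "docx"
        return "zip"
    try:
        # No known signature: treat it as text if it decodes cleanly as UTF-8.
        data.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "unknown"
```

The result can then decide which of the three options above applies; anything reported as "unknown" would go to the AI route.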
Hello @Ramprasath R,
To achieve your goal of reading values from any type of binary, today you have AI systems at your disposal.
We can have a conversation to better understand the requirements so that I can create an execution plan for a solution that allows you to obtain data from the files.
At this moment, I have a solution that can be customized for either OS11 or ODC.
Hi @Ramprasath R
You can use this Forge component: https://www.outsystems.com/forge/component-overview/2881/amazon-rekognition-o11 (it only handles images and video).
Sign in to AWS, get the required authentication credentials, and use this Forge component.
1. Google Cloud Vision API
2. Microsoft Azure Cognitive Services (OCR) APIs are also available.
1. For PDFs, use a PDF extraction library or the Cloudmersive API.
2. For Excel, leverage the Excel File Parser or Cloudmersive.
3. For images, use Amazon Rekognition, Google Vision, or Azure OCR.
Best Regards
Karnika.K