Read text from binary data

Ramprasath R

Question

I need to extract text from a given file, which can be in any format, such as PDF, Excel, TXT, or even an image. The input will be binary data (the file), and the output should be the text extracted from that file. Are there any APIs or Forge components available to achieve this? Please guide me on how to implement it.

17 Jan 2025

Nuno Reis

MVP

Read text from any possible format? Yes, a new thing called AI can do that :)

There are multiple APIs in the Forge for it. The choice is mostly about cost, so it comes down to a business decision rather than a simple technical one.

The alternative would be to read the MIME type, determine whether the file can be read as text and, if so, pass it to the corresponding action (one for TXT, one for Excel, one for Word, one OCR for images...). Leave any file that fails to AI or to a manual process.
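A minimal sketch of that dispatch idea, assuming the MIME type is already known: a lookup table maps each MIME type to an extractor, and anything without an extractor (or whose extractor fails) returns `None` so it can be routed to AI or a manual process. The names `EXTRACTORS` and `extract_text` are hypothetical, not Forge components; only plain text is implemented, and the Excel, Word and OCR entries would come from whichever component you pick.

```python
from typing import Callable, Dict, Optional


def extract_plain_text(data: bytes) -> str:
    # Decode as UTF-8, dropping a leading BOM if present; undecodable bytes become U+FFFD
    return data.decode("utf-8-sig", errors="replace")


# Hypothetical registry: MIME type -> extractor function.
# Excel, Word and OCR extractors would be registered here from the chosen components.
EXTRACTORS: Dict[str, Callable[[bytes], str]] = {
    "text/plain": extract_plain_text,
    "text/csv": extract_plain_text,
}


def extract_text(data: bytes, mime_type: str) -> Optional[str]:
    """Return the extracted text, or None if the file should go to the AI/manual fallback."""
    # Ignore parameters such as "; charset=utf-8" and normalize case
    key = mime_type.split(";")[0].strip().lower()
    extractor = EXTRACTORS.get(key)
    if extractor is None:
        return None  # no dedicated action for this type
    try:
        return extractor(data)
    except Exception:
        return None  # the dedicated action failed, so fall back
```

For example, `extract_text(b"hello", "text/plain; charset=utf-8")` returns `"hello"`, while an `image/png` file returns `None` until an OCR extractor is registered.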

It is very specific. Without more details, there is not much more we can say.

17 Jan 2025

Rammurthy Naidu Boddu

Champion

Hi @Ramprasath R,

As per your question, the input is binary data (of any type).
How does OS know which type of data it is?

1. If it is in XLSX (Excel) format, convert the binary data to JSON by deserializing it, using the Forge component named "convert JSON to XLSX" (JsontoXlsx).

2. If it is anything else, such as an image, PDF, or text file, use the AI support integrations to do the extraction.

3. If it is an image, use an OCR scanner to extract the text from the image.
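On the question of how to tell which type the binary data is: one common approach is to inspect the file's leading bytes ("magic numbers") rather than trusting a file name. The sketch below is an illustration under that assumption, not an OutSystems or Forge API; `detect_file_type` is a hypothetical name. It recognizes PDF, PNG and JPEG signatures, and because XLSX and DOCX are ZIP archives it looks inside the archive to tell them apart. Legacy `.xls`/`.doc` files use a different container format and are not covered here.

```python
import io
import zipfile


def detect_file_type(data: bytes) -> str:
    """Guess a file type from its leading bytes (magic numbers)."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"PK\x03\x04"):
        # Office Open XML files (xlsx, docx) are ZIP archives; check their contents
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                names = set(zf.namelist())
        except zipfile.BadZipFile:
            return "unknown"
        if "xl/workbook.xml" in names:
            return "xlsx"
        if "word/document.xml" in names:
            return "docx"
        return "zip"
    # Anything that decodes cleanly as UTF-8 is treated as plain text
    try:
        data.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "unknown"
```

The result can then drive the choice between the steps above: `"xlsx"` goes to the Excel route, `"png"`/`"jpeg"` to OCR, and `"unknown"` to the AI integration or manual review.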

17 Jan 2025

Rui Mendes

Hello @Ramprasath R,

To achieve your goal of reading values from any type of binary file, you now have AI systems at your disposal.

We can have a conversation to better understand the requirements so that I can create an execution plan for a solution that allows you to obtain data from the files.

At this moment, I have a solution that can be customized for either OS11 or ODC.