Reading Data from PDF

Team,

Is there a way, or any Forge component, that I can use to read data from PDF files?


Shashank...

Hi Shashank,

I'm not sure whether there is a Forge component for this.

However, we use the Apache Tika library to read the content of PDFs, expose that as a REST service, and integrate the service into OutSystems.
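
For illustration only, here is a minimal Python sketch of what a client of such a service could look like. It does not reproduce the custom service described above. Instead it assumes the standalone Apache Tika Server is running locally on its default port 9998 and uses its `/tika` endpoint, which returns plain text when asked for `text/plain`. The helper names `build_tika_request` and `extract_text` are hypothetical.

```python
import urllib.request

# Assumption: a Tika Server instance is listening here (9998 is its default port).
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(pdf_bytes: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build the PUT request that asks Tika Server for plain-text output."""
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        method="PUT",
        headers={
            "Content-Type": "application/pdf",  # what we are sending
            "Accept": "text/plain",             # what we want back
        },
    )


def extract_text(pdf_path: str, url: str = TIKA_URL) -> str:
    """Send a PDF file to Tika Server and return the extracted text."""
    with open(pdf_path, "rb") as pdf_file:
        request = build_tika_request(pdf_file.read(), url)
    with urllib.request.urlopen(request, timeout=60) as response:
        # Assumption: the server answers in UTF-8.
        return response.read().decode("utf-8")
```

Calling `extract_text("invoice.pdf")` (a hypothetical file name) would return the document's text as a string. On the OutSystems side, the same endpoint could presumably be consumed as a REST API instead of going through a Python client.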

Let me know if you need more details.

Thanks,

Som

Hi Shashank,

There are various, mostly commercial, components available that let you do this. I have not yet seen a free one though (and I'm not sure whether there's anything in the Forge).

Thanks for the reply, Som. I have also started looking for a free .NET component or API, so that I can create an extension and use it in OutSystems.

Shashank...


Thanks, Kilian. I didn't find anything in the Forge either. I have started looking for a free component and will post the details if I come across anything.

Shashank...


Hi Shashank,

I found this on Forge. Please explore this:

https://www.outsystems.com/forge/component/1819/pdf-helper/

Thanks,

Som


Thanks Som, I will check this out.

Shashank...

Hi Shashank,

There is a free API named OCR API that can extract text from images, and from PDFs too.

Here is the link.

But that's OCR. PDFs typically contain actual text, and you don't need to OCR things.

You are partially correct. A PDF may also contain scanned images of text documents.


Yes, that's why I said "typically".

Debasis,

In my case, the PDF I have to read just contains data in text format.


Shashank...