How can I read the data from pdf inside tables and extract the heading text

SreenivasuluReddy Lingala

Question

Application Type

Reactive

I need help with a PDF file that contains text and tables. How can I read the data from the tables and extract the heading text?

07 Apr 2025

Vignesh Sekar

Dear,

Please try this component

https://www.outsystems.com/forge/component-documentation/1819/pdf-helper-o11/0

Use the ReadTextFromPDF() action.
Based on the table's position in the extracted text, you can pick out the header value.

https://miguel-antunes.outsystemscloud.com/PDFHelperDemoApp/ReadTextFromPDF.aspx?(Not.Licensed.For.Production)=

07 Apr 2025

SreenivasuluReddy Lingala

Hi @Vignesh Sekar

Thanks for your reply. I tried that Forge component, but it only returns the PDF content as plain text. I want to read the data inside the tables as well as the text.

07 Apr 2025

Vignesh Sekar

Replying to SreenivasuluReddy Lingala's comment on 07 Apr 2025 05:02:22

Does the PDF have any standard formats?

If yes, we can use this component, because it extracts every word. For example, if your table is on the 3rd line, you can read the 3rd line of the extracted text and apply a small workaround to split it into values.
(I tried this with the attached sample PDF and was able to extract and read the data by position.)

If not, we can't use this component.
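
To illustrate the position-based workaround, here is a minimal sketch in Python. It assumes the text has already been extracted (for example by ReadTextFromPDF()) and that table cells on a line are separated by two or more spaces. The function name `table_row_from_text` and the sample text are made up for illustration, and the separator rule will need adjusting to your actual PDF.

```python
import re


def table_row_from_text(extracted_text: str, line_number: int) -> list[str]:
    """Return the cells found on the given 1-based non-blank line.

    Assumption: cells are separated by runs of two or more whitespace
    characters. Raises IndexError if the line does not exist.
    """
    # Ignore blank lines so the position counts only visible lines.
    lines = [ln for ln in extracted_text.splitlines() if ln.strip()]
    row = lines[line_number - 1]
    # Split on wide gaps; single spaces inside a cell are preserved.
    return [cell for cell in re.split(r"\s{2,}", row.strip()) if cell]


# Hypothetical extracted text: a title, then a table header and one data row.
sample = "Quarterly Report\n\nItem Name   Qty   Price\nGreen Apple   3   1.20\n"
header = table_row_from_text(sample, 2)
first_row = table_row_from_text(sample, 3)
```

This only works while the layout stays fixed. If the table moves to a different line in another PDF, the position has to change with it.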

sample-tables.pdf

07 Apr 2025

SreenivasuluReddy Lingala

Replying to Vignesh Sekar's comment on 07 Apr 2025 05:14:34

Hi @Vignesh Sekar

This is the PDF format. I want to get the headings shown in red and yellow, plus the rest of the table.

07 Apr 2025

Vignesh Sekar

Replying to SreenivasuluReddy Lingala's comment on 07 Apr 2025 05:28:40

Dear,

Can you share the PDF? (If it's confidential, replace the values with dummy data before sharing.)

07 Apr 2025

Venkatesaiya

Hi,
Can you please take a look at this discussion: https://www.outsystems.com/forums/discussion/75875/extract-data-from-pdf/

07 Apr 2025

Mandar Deshpande

Hi @SreenivasuluReddy Lingala

You can build a small Integration Studio extension using iText7 to read table data from a PDF.

This would support table detection and reading the text inside table cells.

It runs on the server and makes no third-party API calls, so it would be secure.

12 Dec 2025

Miguel Verdasca

Champion

Hi,

In practice this depends on how the PDF is generated.

If the PDF has selectable text (not scanned), you can extract text and tables using a PDF parser (e.g. PDFBox / iText) and then identify headings based on layout information such as font size, position, or style.
If the PDF is scanned or you need to detect headings by colour (red/yellow), you’ll need an OCR / Document AI approach. Services like AWS Textract, Azure Form Recognizer (now Azure AI Document Intelligence) or Google Document AI can extract tables and structured text. If colour is a requirement, an additional image-processing step is needed to detect coloured regions before or after OCR.
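
As a rough sketch of the layout-based idea, the Python snippet below classifies text spans as headings when their font size is larger than the most common (body) size, or when their colour matches a highlight colour. The `Span` structure, its field names and the exact RGB values are assumptions made for illustration. A real parser or OCR service exposes this information in its own format, and scanned colours are rarely exact, so a tolerance would be needed in practice.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical text span as a parser might report it."""
    text: str
    font_size: float
    rgb: tuple[int, int, int] = (0, 0, 0)


# Assumed highlight colours: pure red and pure yellow.
HIGHLIGHT_COLOURS = {(255, 0, 0), (255, 255, 0)}


def find_headings(spans: list[Span]) -> list[str]:
    """Return the text of spans that look like headings."""
    if not spans:
        return []
    # Treat the most frequent font size as the body text size.
    body_size = Counter(s.font_size for s in spans).most_common(1)[0][0]
    return [
        s.text
        for s in spans
        if s.font_size > body_size or s.rgb in HIGHLIGHT_COLOURS
    ]


spans = [
    Span("Summary", 16.0),
    Span("row a", 10.0),
    Span("row b", 10.0),
    Span("Warnings", 10.0, (255, 0, 0)),
    Span("row c", 10.0),
]
headings = find_headings(spans)
```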

From an OutSystems perspective, the usual approach is to:

Integrate one of these services via REST,
Receive structured JSON (headings + table rows/columns),
Map the result into OutSystems structures.
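
The mapping step could look roughly like the Python sketch below. The JSON shape (`tables`, `heading`, `columns`, `rows`) is invented for illustration. Each service returns its own schema, so the field names need to be adapted. In OutSystems the equivalent would be the structures generated when consuming the REST API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ExtractedTable:
    """Target structure: one heading plus rows keyed by column name."""
    heading: str
    columns: list[str]
    rows: list[dict[str, str]] = field(default_factory=list)


def parse_extraction(payload: str) -> list[ExtractedTable]:
    """Map a (hypothetical) extraction response into ExtractedTable records."""
    doc = json.loads(payload)
    tables = []
    for raw in doc.get("tables", []):
        columns = raw["columns"]
        # Pair each cell with its column name.
        rows = [dict(zip(columns, cells)) for cells in raw["rows"]]
        tables.append(ExtractedTable(raw.get("heading", ""), columns, rows))
    return tables


# Hypothetical response body from a document-extraction service.
payload = json.dumps({
    "tables": [
        {
            "heading": "Open Orders",
            "columns": ["Item", "Qty"],
            "rows": [["Apple", "3"], ["Pear", "5"]],
        }
    ]
})
tables = parse_extraction(payload)
```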

Pure OutSystems logic alone is usually not enough for reliable table and heading extraction from PDFs.

Hope this helps.

12 Dec 2025
