How to read data presented in table format in a PDF

Shubham Mishra

Application Type

Reactive

Service Studio Version

11.14.5 (Build 57418)

Hello Everyone,

I need to read data that is laid out as a table in a PDF, but whenever I convert the PDF to text, all of the rows and columns get mixed up (1st image). I need to read the data row by row or column by column. The data in the PDF looks similar to the 2nd image.

Thank you

Shubham Mishra

ast_sci_data_tables_sample.pdf

25 Jan 2022

Siddhant Chauhan

Hi Shubham,

Maybe the component below could help you extract the PDF data to text:

https://www.outsystems.com/forge/component-overview/1819/pdf-helper

Thanks,

Siddhant

25 Jan 2022

Shubham Mishra

Thank you, but I can already convert the PDF data to text (Image 1 is the converted data). What I need is to get the data in the table in one go, either row-wise or column-wise.

25 Jan 2022

Siddhant Chauhan

Try converting the data to pipe-delimited values, then pick out the values by splitting on |. That way you should be able to separate the data as required.

Or could you send me the PDF? Maybe I can help with the conversion.
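The pipe-delimited approach suggested above can be sketched roughly as follows. This is a minimal Python illustration rather than OutSystems code, and it assumes the converter has already produced one table row per line with cells separated by |. The sample text and the helper names `parse_pipe_delimited` and `column` are hypothetical.

```python
def parse_pipe_delimited(text):
    """Split pipe-delimited text into a list of rows, each a list of cell strings."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the delimiter and trim whitespace around each cell.
        rows.append([cell.strip() for cell in line.split("|")])
    return rows


def column(rows, index):
    """Return one column (by position) from the parsed rows, skipping short rows."""
    return [row[index] for row in rows if index < len(row)]


# Hypothetical sample; real converter output may look different.
sample = """
Name | Quantity | Unit
Iron | 12 | kg
Copper | 7 | kg
"""

rows = parse_pipe_delimited(sample)
print(rows[1])          # one row of cells
print(column(rows, 0))  # the first column across all rows
```

This only works if the conversion step keeps each table row on its own line. If the text is already scrambled before the delimiter is applied, splitting on | will not recover the original layout.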

25 Jan 2022

Shubham Mishra

Sure, I have uploaded the PDF.

25 Jan 2022

Stefan Weber

MVP

Hi Shubham,

If you are willing to create your own extension, you might take a look at tabula-sharp (BobLd/tabula-sharp on GitHub, "Extract tables from PDF files", a port of tabula-java). It detects tables in a document and extracts their rows and columns. I personally only did some small tests with it, as I have access to a professional document extraction solution.
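To illustrate the general idea behind extracting rows and columns from positioned text, here is a simplified Python sketch. It is not tabula-sharp's algorithm or API. It assumes a hypothetical input of (x, y, text) tuples in which y increases down the page, and a made-up tolerance for deciding that two words sit on the same line.

```python
def words_to_rows(words, y_tolerance=2.0):
    """Group (x, y, text) words into rows by similar y, then order each row by x."""
    rows = []  # each entry: [reference_y, [(x, text), ...]]
    for x, y, text in sorted(words, key=lambda w: w[1]):
        if rows and abs(y - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append((x, text))    # same visual line as the current row
        else:
            rows.append([y, [(x, text)]])    # start a new row
    # Sort the cells of each row left to right and keep only the text.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


# Hypothetical words, listed column by column the way a naive text dump might emit them.
words = [
    (50, 100, "Name"), (50, 120, "Iron"), (50, 139, "Copper"),
    (200, 101, "Quantity"), (200, 120.5, "12"), (200, 140, "7"),
]

table = words_to_rows(words)
for row in table:
    print(row)
```

A real table extractor also has to handle ruling lines, multi-word cells, and empty cells, which this sketch ignores.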

Best

Stefan

25 Jan 2022

Rachelle

Hello, I'm currently looking at tabula-sharp on GitHub. Could you help me with how to use it in my extension? Thank you!

23 Nov 2023
