How to read data present in a PDF in table format
Question
Application Type
Reactive
Service Studio Version
11.14.5 (Build 57418)

Hello Everyone,

I need to read data that is presented as a table in a PDF, but whenever I convert the PDF to text, the data gets mixed up, i.e. all rows and columns are jumbled together (1st image). I need to read the data by rows or by columns. The data in the PDF looks similar to this image (2nd image).


Thank you 

Shubham Mishra



ast_sci_data_tables_sample.pdf
Siddhant Chauhan

Hi Shubham,


Maybe the component below could help you extract PDF data to text:

https://www.outsystems.com/forge/component-overview/1819/pdf-helper 


Thanks,

Siddhant

Shubham Mishra

Thank you, but I can already convert the PDF data to text (image 1 shows the converted data). What I need is to get the data present in the table in one go, either row-wise or column-wise.

Siddhant Chauhan

Try converting the data to pipe-delimited values, then pick out the values you need by splitting on |. That way you should be able to separate the data as required.

Or could you pass me the PDF? Maybe I can help with the conversion.
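
To illustrate the pipe-delimited idea, here is a minimal Python sketch (not OutSystems code). It assumes the conversion step already produces one table row per line with cells separated by |. The function names and the sample values are made up for illustration.

```python
def parse_pipe_table(text):
    """Split pipe-delimited text into a list of rows (each a list of cell strings)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between rows
        rows.append([cell.strip() for cell in line.split("|")])
    return rows


def columns_of(rows):
    """Transpose rows into columns; shorter rows are padded with empty strings."""
    width = max((len(row) for row in rows), default=0)
    padded = [row + [""] * (width - len(row)) for row in rows]
    return [list(column) for column in zip(*padded)]


# Hypothetical sample input: one row per line, cells separated by "|".
sample = """
Item | Qty | Price
Bolt | 4   | 0.25
Nut  | 12  | 0.10
"""

rows = parse_pipe_table(sample)   # row-wise access
cols = columns_of(rows)           # column-wise access
```

With this, `rows[1]` is the second row and `cols[0]` is the first column, so the same parsed data can be read either way. This only works if the cell values themselves never contain a | character.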

Shubham Mishra

Sure, I have uploaded the PDF.

Stefan Weber
 
MVP

Hi Shubham,

If you are willing to create your own extension, you might take a look at tabula-sharp (BobLd/tabula-sharp on GitHub: "Extract tables from PDF files", a port of tabula-java). It detects tables in a document and extracts rows and columns. I personally only did some small tests, as I have access to a professional document extraction solution.
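
As a rough illustration of why position-aware extraction keeps rows and columns intact where plain text conversion does not, here is a toy Python sketch. It is not tabula-sharp's API or its actual algorithm. It assumes the extraction step can return each text fragment with x/y coordinates (y growing downward) and that every cell is a single fragment. The function name, tolerance value, and sample data are hypothetical.

```python
def group_into_rows(words, y_tolerance=2.0):
    """Group (text, x, y) tuples into table rows.

    A word joins the current row if its y is within y_tolerance of the
    first word of that row; otherwise it starts a new row. Cells in each
    row are then ordered left to right by x.
    """
    rows = []
    for word in sorted(words, key=lambda w: w[2]):  # top to bottom
        if rows and abs(word[2] - rows[-1][0][2]) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [[w[0] for w in sorted(row, key=lambda w: w[1])] for row in rows]


# Hypothetical fragments in arbitrary order, as a text dump might emit them.
words = [
    ("Qty", 120.0, 10.0), ("Item", 20.0, 10.5),
    ("Bolt", 20.0, 30.0), ("4", 120.0, 30.2),
    ("12", 120.0, 50.0), ("Nut", 20.0, 50.1),
]

table = group_into_rows(words)
```

Here `table` comes out as three rows with the cells in left-to-right order, even though the input fragments were shuffled. Real tables with multi-word cells, empty cells, or ruling lines need considerably more logic, which is what a dedicated library is meant to handle.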

Best

Stefan

Rachelle

Hello, I'm currently looking at tabula-sharp on GitHub. Can you help me with how to use it in my extension? Thank you!
