[Export PDF to images] PDF to Image conversion causes text to disappear
export-pdf-to-images
Forge asset by Luis Filipe
Application Type
Reactive

I am converting balance sheets from PDF to images with the Export PDF to images Forge asset to prepare them for OCR. However, after a PDF is converted to an image, all the text is gone. I have included the before and after of my file below: the text has disappeared and only some lines remain.




Has anyone encountered this problem before, and is it an issue with the asset? If so, how did you manage to fix it?

Chin Kai

Hi @Clara Aw, have you tried the demo file? You can use it as a reference for the logic and how it is implemented.

I tried the URL in the link provided and it seems to be working.

Clara Aw

Hi @Chin Kai 

Yes, I have tried the demo both in my own environment and through the URL, and the image returned is still missing the text.

Would it help if I shared the PDF I'm using?

Chin Kai

Which PDF are you using? I tried converting the accounting PDF demo and the colors PDF demo, and neither seems to have any issue.

Clara Aw

I just tested it on a PDF of Amazon's balance sheet, which I have attached below. Do let me know if it works on your side.

AmazonBS.pdf
Chin Kai

It can't work because the text is not recognized by PDF readers as selectable text.


For instance, in the promo PDF page shown below, the highlighted text can be selected.

In the one you shared, by comparison, text extraction is "disabled".
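One rough way to check this without opening a PDF reader is to look for text-drawing operators inside the file's content streams. The sketch below is only a heuristic built on the Python standard library, not part of the Forge asset. The function name `has_text_operators` is made up for illustration, it only understands uncompressed and Flate-compressed streams, it looks only for the common `Tj` and `TJ` operators, and it can be fooled by encrypted files or binary data. A dedicated PDF library would be more reliable.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of each PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A text object (BT ... ET) that contains a text-showing operator (Tj or TJ).
TEXT_OP_RE = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b", re.DOTALL)


def has_text_operators(pdf_bytes: bytes) -> bool:
    """Heuristically report whether any stream in the PDF draws text as text."""
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # Most content streams are Flate-compressed; decompressobj
            # tolerates the end-of-line bytes left before "endstream".
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # Not Flate data: fall back to scanning the raw bytes.
        if TEXT_OP_RE.search(data):
            return True
    return False
```

If this returns False for a PDF whose pages clearly show text, the text is probably stored as an image or as vector outlines rather than as real text, which would match the behaviour described above.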


Clara Aw


Thanks @Chin Kai for the explanation. Am I right to assume that any PDF to image conversion tool will run into this "text disappearing" problem whenever text extraction is disabled on the PDF I am using?

Chin Kai
Solution

I converted the PDF you shared to a Word document first and then converted that back to PDF, and it works. Hahaha.
Here's the result:


The attached PDF file is the converted version.

AmazonBS11.pdf
Clara Aw


Haha, nice discovery. I will give it a try on my side as well. Thank you for all the help!

2025-01-31 03-15-38
Irfan Ahamed Abdul Shukoor

Hi @Clara Aw ,

I have checked the plugin and it is working fine.

May I know what kind of PDF you are trying to convert?


Thanks.

Clara Aw

Hi @Irfan Ahamed Abdul Shukoor 

I'm not sure what kind of PDF it is, as I did not create it myself. If there is any way to identify it using tools on OutSystems, please do let me know!

For now, I have included the generated image and the PDF of the Amazon balance sheet below for your reference.

Thank you.


AmazonBS.pdf
2025-01-31 03-15-38
Irfan Ahamed Abdul Shukoor

Hi @Clara Aw ,

What @Chin Kai mentioned is true: the text in the shared PDF is not recognizable as text, so please check how the PDF was generated.

The demo version even gives an error when trying to convert it.

Thanks.
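For anyone wondering how to check how a PDF was generated: many PDFs record the creating application in `/Producer` and `/Creator` entries of the document information dictionary. The sketch below is a rough illustration in Python, not an OutSystems feature. The helper name `pdf_generator_info` is hypothetical, and it only finds entries stored as plain literal strings in an uncompressed, unencrypted part of the file, so an empty result does not prove the entries are missing.

```python
import re

# /Producer or /Creator followed by a literal string such as (Some Tool 1.0).
# Escaped characters (\x) are allowed; nested parentheses are not handled.
INFO_RE = re.compile(rb"/(Producer|Creator)\s*\(((?:\\.|[^\\()])*)\)")


def pdf_generator_info(pdf_bytes: bytes) -> dict:
    """Return the first /Producer and /Creator values found in the raw file."""
    info = {}
    for key, value in INFO_RE.findall(pdf_bytes):
        # Keep the first occurrence of each key; decode bytes one-to-one.
        info.setdefault(key.decode("ascii"), value.decode("latin-1"))
    return info
```

Reading the bytes of the uploaded file and passing them to this function may hint at whether the PDF came from a scanner, a print driver, or an office application.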

Luis Filipe

Hello everyone,

What's happening is that the asset uses the temporary file storage service File.io.

However, File.io has now been integrated into LimeWire, and its API is likely no longer available or may have changed.

I'm working on a fix. Stay tuned for updates!

Luis Filipe

📢 New Version Release – Improved PDF File Upload & Download

Hi everyone! 👋

We've just released a new asset version that includes important improvements to how uploaded PDF files are handled.

🚀 What's New:

  • ✅ Support for uploading and downloading PDF files

  • 📦 Automatic conversion of PDF files to ZIP format before download

  • 🛠️ Improved stability and error handling during file transfers

  • 🔁 Maintained full compatibility with other file types 
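To illustrate the idea of wrapping a PDF in a ZIP archive before download, here is a minimal sketch in Python using only the standard library. It is not the asset's actual implementation, which is not shown in this thread. The function name `zip_pdf` and the default file name are assumptions made for the example.

```python
import io
import zipfile


def zip_pdf(pdf_bytes: bytes, filename: str = "document.pdf") -> bytes:
    """Pack a single PDF into an in-memory ZIP archive and return its bytes."""
    buffer = io.BytesIO()
    # ZIP_DEFLATED compresses the entry; the archive is finalized on exit.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(filename, pdf_bytes)
    return buffer.getvalue()
```

Building the archive in memory avoids temporary files, at the cost of holding the whole PDF and the archive in memory at the same time.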

