[Simple OCR] How to fix error on Simple OCR "Could not initialize tesseract"

Jessica Marques

Solved

Question

Forge

Hi,

I'm working with the Forge component "Simple OCR": https://www.outsystems.com/forge/component-overview/3086/simple-ocr

I am having trouble converting images to text.

Note: In my personal environment it works perfectly, but in another environment I get this error message: "Could not initialize tesseract."

Any idea what I can do to fix it?

06 Jan 2022

Takasi Moriya

MVP

It seems that the trained data file could not be read.

Does the following sample component work in your environment?
https://www.outsystems.com/forge/component-overview/3500/simple-ocr-sample

Specifying the DataPath parameter with the exact directory path may solve your problem.
You might need the FileSystem component from Forge to build the exact directory path.
https://www.outsystems.com/forge/component-overview/68/filesystem

I hope it helps you.

07 Jan 2022

6 replies

Last reply 07 Jan 2022

Jessica Marques

Hi, @Takasi Moriya

Thanks for your attention.

1. Yes, I followed the Simple OCR Sample in both environments where I used the component.

2. You suggested specifying the DataPath parameter with the exact directory path. Could you clarify which directory you mean? You also said that the trained data file could not be read. Does that mean I need to configure these resources in some way?

Regards, Jessica Marques.

07 Jan 2022

Stefano Valente

Replying to Jessica Marques' comment on 07 Jan 2022 08:43:42

I have no experience with the OCR component, but why does your resource have a \ in its name, while the path uses / (as it should)?

07 Jan 2022

Jessica Marques

Replying to Stefano Valente's comment on 07 Jan 2022 10:22:36

Hi, these screenshots are from the sample built by Takasi. I added a new language in the same way as the resources that were already there (jpn and eng). The resource name is the same as the file I got from this page: https://tesseract-ocr.github.io/tessdoc/Data-Files

Regards,

Jessica.

07 Jan 2022

Takasi Moriya

MVP

Solution

Replying to Jessica Marques' comment on 07 Jan 2022 08:43:42

The ExtractTextFromMemImage action of the SimpleOCR extension takes three parameters.
DataPath is the third one.
You can specify the path with an expression like the one below.

Path_GetApplicationDirectory.ApplicationDirectory + "\tessdata"

Path_GetApplicationDirectory is an action of FileSystem.
You might also learn something by checking whether the data file exists, using the File_Exists action of FileSystem.
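
For readers outside OutSystems, the same check can be sketched in Python. This is only an illustration, not part of the SimpleOCR or FileSystem components: the function name `find_missing_traineddata` and its parameters are hypothetical, and it assumes the data files sit in a `tessdata` folder under the application directory and follow Tesseract's `<language>.traineddata` naming.

```python
import os


def find_missing_traineddata(app_dir, languages):
    """Build the tessdata path and list expected data files that are absent.

    app_dir and languages are hypothetical inputs: the application
    directory and the language codes (for example "eng", "jpn") in use.
    Returns a tuple (data_path, missing_files).
    """
    # Assumption: trained data lives in <application directory>/tessdata.
    data_path = os.path.join(app_dir, "tessdata")
    missing = []
    for lang in languages:
        candidate = os.path.join(data_path, lang + ".traineddata")
        # Equivalent in spirit to calling File_Exists on each data file.
        if not os.path.isfile(candidate):
            missing.append(candidate)
    return data_path, missing
```

If the returned list is not empty, the files named in it are the first thing to look at before passing the path as DataPath.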

07 Jan 2022

Stefano Valente

Replying to Jessica Marques' comment on 07 Jan 2022 10:39:02

When I download the jpn file, I get 43 MB, but your file is only 2.4...

Could it be a corrupt file?
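
One way to test the corruption theory is to compare the deployed file with a freshly downloaded copy of the same file. The Python sketch below is an illustration only; `file_fingerprint` and `same_file` are hypothetical helper names. Note that a size difference alone may not prove corruption, since Tesseract publishes more than one set of trained data files and they differ in size, so the comparison only makes sense against the same source file.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex_digest) for the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large trained data files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def same_file(deployed_path, reference_path):
    """True when both files have identical size and SHA-256 digest."""
    return file_fingerprint(deployed_path) == file_fingerprint(reference_path)
```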

07 Jan 2022

Jessica Marques

Replying to Takasi Moriya's comment on 07 Jan 2022 10:40:14

Hi @Takasi Moriya

Now it works!

Thanks a lot for the help.

Best Regards,