Available Actions
Alternative action sets with the suffixes "_FromREST" and "_WithREST" are also available. They retrieve input file content and post results via REST APIs, and are meant for situations where the 5.5 MB input/output payload limit of external libraries must be avoided.
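As a rough illustration of when the REST-based action sets become relevant, the following minimal sketch checks a file size against the 5.5 MB limit. The helper name is hypothetical, and treating 1 MB as 1,000,000 bytes is an assumption (the conservative reading of the limit); it is not part of TEXTractor.

```python
# Payload limit quoted in the text above. Treating 1 MB as 1,000,000 bytes
# is an assumption, chosen because it is the more conservative reading.
PAYLOAD_LIMIT_BYTES = int(5.5 * 1_000_000)


def should_use_rest_variant(input_size_bytes: int, expected_output_bytes: int = 0) -> bool:
    """Hypothetical helper: True when either the input or the expected
    output is larger than the payload limit, i.e. when a "_FromREST" or
    "_WithREST" action is worth considering."""
    return max(input_size_bytes, expected_output_bytes) > PAYLOAD_LIMIT_BYTES
```

A caller could run this check on the uploaded file's size before deciding which action set to invoke.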
OCR Capabilities (Tesseract 5)
TEXTractor supports text extraction from scanned PDFs and from the following image formats: bmp, gif, jpeg, pbm, png, tiff, webp.
English is the default language, but you can choose any of the Tesseract-supported languages (https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).
The TEXTractor installation includes trained data for English, Spanish, Portuguese and German. Any other Tesseract-supported language can be used by including the corresponding trained data file as an ODC App resource and passing its URL to TEXTractor. The expected trained data files can be found in the GitHub tessdata_fast repository (https://github.com/tesseract-ocr/tessdata_fast).
When adding .traineddata files as resources, you must set the Deploy Action to "Deploy to Target Directory", and rename the file to include a .bin extension (e.g., fra.traineddata.bin). This bypasses OutSystems platform restrictions on non-whitelisted file extensions and ensures TEXTractor can fetch the resource.
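The preparation of an extra language file can be sketched as a small helper script run on a developer machine before the file is added as a resource. This is a minimal sketch, not part of TEXTractor: the function names are hypothetical, the language codes (eng, spa, por, deu) are the standard Tesseract codes for the bundled languages, and the raw-download URL pattern for the tessdata_fast repository is an assumption that should be checked against the repository itself.

```python
import urllib.request
from pathlib import Path

# Tesseract language codes for the languages the installation already includes.
BUNDLED_LANGUAGES = {"eng", "spa", "por", "deu"}

# Assumed raw-download URL pattern for the tessdata_fast repository.
TESSDATA_FAST_RAW = "https://github.com/tesseract-ocr/tessdata_fast/raw/main"


def needs_extra_traineddata(lang: str) -> bool:
    """True when the language is not bundled and needs its own resource file."""
    return lang.lower() not in BUNDLED_LANGUAGES


def traineddata_url(lang: str) -> str:
    """Download URL of the trained data file (assumed URL pattern)."""
    return f"{TESSDATA_FAST_RAW}/{lang}.traineddata"


def resource_file_name(lang: str) -> str:
    """File name with the extra .bin extension, e.g. fra.traineddata.bin."""
    return f"{lang}.traineddata.bin"


def fetch_as_resource(lang: str, target_dir: str = ".") -> Path:
    """Download the trained data and save it under the .bin-suffixed name."""
    destination = Path(target_dir) / resource_file_name(lang)
    urllib.request.urlretrieve(traineddata_url(lang), str(destination))
    return destination
```

For French, `fetch_as_resource("fra")` would save `fra.traineddata.bin` in the current directory, ready to be added as a resource with the Deploy Action set to "Deploy to Target Directory".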
Security & Privacy
All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment.