TEXTractor - Documentation (O11)

Stable version 2.7.0 (Compatible with OutSystems 11)

Uploaded on 27 May (2 days ago)

Documentation

2.7.0

Available Actions

GetText - Get file content in plain text.
GetMetadata - Get file metadata in a structured format.
GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetVCard - Get vCard content in a structured format.

OCR Capabilities (Tesseract 5)

TEXTractor supports text extraction from scanned PDFs and from the following image formats: bmp, gif, jpeg, pbm, png, tiff, webp.
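
As an illustration only, the format list above can be turned into a small pre-check that decides whether a file is a candidate for OCR before it is handed to an action. This is a sketch in Python, not part of TEXTractor: the function name `ocr_candidate` is hypothetical, and treating `jpg` and `tif` as aliases of `jpeg` and `tiff` is an assumption, since the list above names only the long forms.

```python
from pathlib import Path

# Image formats named in the documentation above.
OCR_IMAGE_FORMATS = {"bmp", "gif", "jpeg", "pbm", "png", "tiff", "webp"}

# Assumption: common short spellings map onto the listed formats.
EXTENSION_ALIASES = {"jpg": "jpeg", "tif": "tiff"}


def ocr_candidate(filename: str) -> bool:
    """Return True if the file extension suggests OCR may apply.

    PDFs are included because scanned PDFs are supported; this check
    cannot tell a scanned PDF from a text PDF, it only looks at the name.
    """
    ext = Path(filename).suffix.lower().lstrip(".")
    ext = EXTENSION_ALIASES.get(ext, ext)
    return ext == "pdf" or ext in OCR_IMAGE_FORMATS
```

A check like this only filters by file name; the actual content type would still be determined by the component itself.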

English is the default language, but you can select any Tesseract-supported language (https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

Trained language data files are fetched automatically from the GitHub tessdata_fast repository (https://github.com/tesseract-ocr/tessdata_fast) and cached in the front-end temp directory.
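
The fetch-and-cache behaviour described above can be pictured with a minimal Python sketch. It is not TEXTractor's implementation: the function names, the `tessdata` cache folder, and the raw-file URL layout (including the `main` branch) are assumptions made for illustration.

```python
import tempfile
import urllib.request
from pathlib import Path
from typing import Callable, Optional

# Assumption: raw-file URL layout of the tessdata_fast repository.
TESSDATA_URL = (
    "https://github.com/tesseract-ocr/tessdata_fast/raw/main/{lang}.traineddata"
)


def download(url: str) -> bytes:
    """Fetch a URL and return the response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def ensure_language(
    lang: str,
    cache_dir: Optional[Path] = None,
    fetch: Callable[[str], bytes] = download,
) -> Path:
    """Return the cached .traineddata path, downloading it on first use."""
    # Hypothetical cache location: a "tessdata" folder in the system temp dir.
    cache_dir = cache_dir or Path(tempfile.gettempdir()) / "tessdata"
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{lang}.traineddata"
    if not target.exists():
        # Cache miss: fetch once, then reuse the file on later calls.
        target.write_bytes(fetch(TESSDATA_URL.format(lang=lang)))
    return target
```

Calling `ensure_language("por")` twice would hit the network only on the first call; the `fetch` parameter exists so the download can be replaced in tests or in environments without outbound access.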

Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment.

2.6.0

Available Actions

GetText - Get file content in plain text.
GetMetadata - Get file metadata in a structured format.
GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetVCard - Get vCard content in a structured format.

OCR Capabilities (Tesseract 5)

TEXTractor supports text extraction from scanned PDFs and from the following image formats: bmp, gif, jpeg, pbm, png, tiff, webp.

English is the default language, but you can select any Tesseract-supported language (https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

Trained language data files are fetched automatically from the GitHub tessdata_fast repository (https://github.com/tesseract-ocr/tessdata_fast) and cached in the front-end temp directory.

Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment.

2.5.0

Available Actions

GetText - Get file content in plain text.
GetMetadata - Get file metadata in a structured format.
GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetVCard - Get vCard content in a structured format.

OCR Capabilities (Tesseract 5)

TEXTractor supports text extraction from scanned PDFs and from the following image formats: bmp, gif, jpeg, pbm, png, tiff, webp.

English is the default language, but you can select any Tesseract-supported language (https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

Trained language data files are fetched automatically from the GitHub tessdata_fast repository (https://github.com/tesseract-ocr/tessdata_fast) and cached in the front-end temp directory.

Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment.

2.4.1

Available Actions

GetText - Get file content in plain text.
GetMetadata - Get file metadata in a structured format.
GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetVCard - Get vCard content in a structured format.

OCR Capabilities (Tesseract 5)

TEXTractor supports text extraction from scanned PDFs and from the following image formats: bmp, gif, jpeg, pbm, png, tiff, webp.

English is the default language, but you can select any Tesseract-supported language (https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

Trained language data files are fetched automatically from the GitHub tessdata_fast repository (https://github.com/tesseract-ocr/tessdata_fast) and cached in the front-end temp directory.

Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment.

2.4.0

Available Actions

GetText - Get file content in plain text.
GetMetadata - Get file metadata in a structured format.
GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetVCard - Get vCard content in a structured format.

OCR Capabilities (Tesseract 5)

TEXTractor supports text extraction from scanned PDFs and from the following image formats: bmp, gif, jpeg, pbm, png, tiff, webp.

English is the default language, but you can select any Tesseract-supported language (https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

Trained language data files are fetched automatically from the GitHub tessdata_fast repository (https://github.com/tesseract-ocr/tessdata_fast) and cached in the front-end temp directory.

Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment.

2.3.2

Available Actions

GetText - Get file content in plain text.
GetMetadata - Get file metadata in a structured format.
GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetVCard - Get vCard content in a structured format.

OCR Capabilities (Tesseract 5)

TEXTractor supports text extraction from scanned PDFs and from the following image formats: bmp, gif, jpeg, pbm, png, tiff, webp.

English is the default language, but you can select any Tesseract-supported language (https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

Trained language data files are fetched automatically from the GitHub tessdata_fast repository (https://github.com/tesseract-ocr/tessdata_fast) and cached in the front-end temp directory.

Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment.

2.3.1

Available Actions

GetText - Get file content in plain text.
GetMetadata - Get file metadata in a structured format.
GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetVCard - Get vCard content in a structured format.

OCR Capabilities (Tesseract 5)

TEXTractor supports text extraction from scanned PDFs and from the following image formats: bmp, gif, jpeg, pbm, png, tiff, webp.

English is the default language, but you can select any Tesseract-supported language (https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

Trained language data files are fetched automatically from the GitHub tessdata_fast repository (https://github.com/tesseract-ocr/tessdata_fast) and cached in the front-end temp directory.

Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment.

2.3.0

Available Actions

GetText - Get file content in plain text.
GetMetadata - Get file metadata in a structured format.
GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetVCard - Get vCard content in a structured format.

OCR Capabilities (Tesseract 5)

TEXTractor supports text extraction from scanned PDFs and from the following image formats: bmp, gif, jpeg, pbm, png, tiff, webp.

English is the default language, but you can select any Tesseract-supported language (https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

Trained language data files are fetched automatically from the GitHub tessdata_fast repository (https://github.com/tesseract-ocr/tessdata_fast) and cached in the front-end temp directory.

Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment.

2.2.0

Available Actions

GetText - Get file content in plain text.
GetMetadata - Get file metadata in a structured format.
GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetVCard - Get vCard content in a structured format.

OCR Capabilities (Tesseract 5)

TEXTractor supports text extraction from scanned PDFs and from the following image formats: bmp, gif, jpeg, pbm, png, tiff, webp.

English is the default language, but you can select any Tesseract-supported language (https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

Trained language data files are fetched automatically from the GitHub tessdata_fast repository (https://github.com/tesseract-ocr/tessdata_fast) and cached in the front-end temp directory.

Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment.

2.1.0

Available Actions

GetText - Get file content in plain text.
GetMetadata - Get file metadata in a structured format.
GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetVCard - Get vCard content in a structured format.

OCR Capabilities (Tesseract 5)

TEXTractor supports text extraction from scanned PDFs and from the following image formats: bmp, gif, jpeg, pbm, png, tiff, webp.

English is the default language, but you can select any Tesseract-supported language (https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

Trained language data files are fetched automatically from the GitHub tessdata_fast repository (https://github.com/tesseract-ocr/tessdata_fast) and cached in the front-end temp directory.

Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment.

2.0.0

Available Actions

GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetMetadata - Get file metadata in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetText - Get file content in plain text.
GetVCard - Get vCard content in a structured format.

Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment.

1.10.0

Available Actions

GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetMetadata - Get file metadata in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetText - Get file content in plain text.
GetVCard - Get vCard content in a structured format.

Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment.

1.9.0

Available Actions

GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetMetadata - Get file metadata in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetText - Get file content in plain text.
GetVCard - Get vCard content in a structured format.

Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment.

1.8.0

Available Actions

GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetMetadata - Get file metadata in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetText - Get file content in plain text.
GetVCard - Get vCard content in a structured format.

Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment.

1.7.2

Available actions:

GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetMetadata - Get file metadata in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetText - Get file content in plain text.
GetVCard - Get vCard content in a structured format.

1.7.0

Available actions:

GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetMetadata - Get file metadata in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetText - Get file content in plain text.
GetVCard - Get vCard content in a structured format.

1.6.0

Available actions:

GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetMetadata - Get file metadata in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetText - Get file content in plain text.
GetVCard - Get vCard content in a structured format.

1.5.0

Available actions:

GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetMetadata - Get file metadata in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetText - Get file content in plain text.
GetVCard - Get vCard content in a structured format.

1.4.0

Available actions:

GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetMetadata - Get file metadata in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetText - Get file content in plain text.
GetVCard - Get vCard content in a structured format.

1.3.0

Available actions:

GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetMetadata - Get file metadata in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetText - Get file content in plain text.
GetVCard - Get vCard content in a structured format.

1.2.0

Available actions:

GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetMetadata - Get file metadata in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetText - Get file content in plain text.
GetVCard - Get vCard content in a structured format.

1.1.0

Available actions:

GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetMetadata - Get file metadata in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetText - Get file content in plain text.
GetVCard - Get vCard content in a structured format.

1.0.1

Available actions:

GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetMetadata - Get file metadata in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetText - Get file content in plain text.
GetVCard - Get vCard content in a structured format.

1.0.0

Available actions:

GetDocument - Get document content in a structured format.
GetDom - Get DOM content in a structured format.
GetEmail - Get email content in a structured format.
GetMetadata - Get file metadata in a structured format.
GetSlideshow - Get slideshow content in a structured format.
GetSpreadsheet - Get spreadsheet content in a structured format.
GetText - Get file content in plain text.
GetVCard - Get vCard content in a structured format.