TEXTractor

Stable version 2.3.2 (Compatible with OutSystems 11)
Uploaded on 3 Apr (23 hours ago)
5.0 (1 rating)

Documentation
2.3.2

Available Actions

  • GetText - Get file content in plain text.
  • GetMetadata - Get file metadata in a structured format.
  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetVCard - Get vCard content in a structured format.


OCR Capabilities (Tesseract 5)

TEXTractor supports text extraction from scanned PDFs and from the following image formats: bmp, gif, jpeg, pbm, png, tiff, webp.

English is the default language, but you can choose any of the languages supported by Tesseract (https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

Trained language data files are fetched automatically from the GitHub tessdata_fast repository (https://github.com/tesseract-ocr/tessdata_fast) and cached in the front-end temp directory.
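
The fetch-and-cache behavior described above can be pictured with a minimal Python sketch. It is not TEXTractor's actual implementation: the function names, the cache folder name, and the raw download URL pattern (the `main` branch of tessdata_fast) are assumptions made for illustration only.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional
from urllib.request import urlopen

# Assumed raw-file URL pattern for the tessdata_fast repository.
TESSDATA_FAST_URL = (
    "https://github.com/tesseract-ocr/tessdata_fast/raw/main/{lang}.traineddata"
)


def download(url: str) -> bytes:
    """Download a file and return its raw bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def get_traineddata(
    lang: str = "eng",
    cache_dir: Optional[Path] = None,
    fetch: Callable[[str], bytes] = download,
) -> Path:
    """Return the path of the cached trained data file for `lang`.

    The file is downloaded only when it is not already in the cache
    directory, so repeated calls for the same language reuse the copy.
    """
    # Hypothetical cache location inside the system temp directory.
    if cache_dir is None:
        cache_dir = Path(tempfile.gettempdir()) / "tessdata_cache"
    cache_dir.mkdir(parents=True, exist_ok=True)

    target = cache_dir / f"{lang}.traineddata"
    if not target.exists():
        target.write_bytes(fetch(TESSDATA_FAST_URL.format(lang=lang)))
    return target
```

Calling `get_traineddata("deu")` twice would download the German data file on the first call and return the cached path on the second. The `fetch` parameter exists only so the download step can be swapped out, for example in environments without outbound internet access.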


Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment. 


2.3.1

Available Actions

  • GetText - Get file content in plain text.
  • GetMetadata - Get file metadata in a structured format.
  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetVCard - Get vCard content in a structured format.


OCR Capabilities (Tesseract 5)

TEXTractor supports text extraction from scanned PDFs and from the following image formats: bmp, gif, jpeg, pbm, png, tiff, webp.

English is the default language, but you can choose any of the languages supported by Tesseract (https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

Trained language data files are fetched automatically from the GitHub tessdata_fast repository (https://github.com/tesseract-ocr/tessdata_fast) and cached in the front-end temp directory.


Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment. 


2.3.0

Available Actions

  • GetText - Get file content in plain text.
  • GetMetadata - Get file metadata in a structured format.
  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetVCard - Get vCard content in a structured format.


OCR Capabilities (Tesseract 5)

TEXTractor supports text extraction from scanned PDFs and from the following image formats: bmp, gif, jpeg, pbm, png, tiff, webp.

English is the default language, but you can choose any of the languages supported by Tesseract (https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

Trained language data files are fetched automatically from the GitHub tessdata_fast repository (https://github.com/tesseract-ocr/tessdata_fast) and cached in the front-end temp directory.


Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment. 


2.2.0

Available Actions

  • GetText - Get file content in plain text.
  • GetMetadata - Get file metadata in a structured format.
  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetVCard - Get vCard content in a structured format.


OCR Capabilities (Tesseract 5)

TEXTractor supports text extraction from scanned PDFs and from the following image formats: bmp, gif, jpeg, pbm, png, tiff, webp.

English is the default language, but you can choose any of the languages supported by Tesseract (https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

Trained language data files are fetched automatically from the GitHub tessdata_fast repository (https://github.com/tesseract-ocr/tessdata_fast) and cached in the front-end temp directory.


Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment. 


2.1.0

Available Actions

  • GetText - Get file content in plain text.
  • GetMetadata - Get file metadata in a structured format.
  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetVCard - Get vCard content in a structured format.


OCR Capabilities (Tesseract 5)

TEXTractor supports text extraction from scanned PDFs and from the following image formats: bmp, gif, jpeg, pbm, png, tiff, webp.

English is the default language, but you can choose any of the languages supported by Tesseract (https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

Trained language data files are fetched automatically from the GitHub tessdata_fast repository (https://github.com/tesseract-ocr/tessdata_fast) and cached in the front-end temp directory.


Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment. 


2.0.0

Available Actions

  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetMetadata - Get file metadata in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetText - Get file content in plain text.
  • GetVCard - Get vCard content in a structured format.


Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment. 


1.10.0

Available Actions

  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetMetadata - Get file metadata in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetText - Get file content in plain text.
  • GetVCard - Get vCard content in a structured format.


Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment. 


1.9.0

Available Actions

  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetMetadata - Get file metadata in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetText - Get file content in plain text.
  • GetVCard - Get vCard content in a structured format.


Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment. 


1.8.0

Available Actions

  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetMetadata - Get file metadata in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetText - Get file content in plain text.
  • GetVCard - Get vCard content in a structured format.


Security & Privacy

All processing is performed entirely in-memory within the server context. No data is persisted, and no data ever leaves your environment. 


1.7.2

Available actions:

  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetMetadata - Get file metadata in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetText - Get file content in plain text.
  • GetVCard - Get vCard content in a structured format.

1.7.0

Available actions:

  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetMetadata - Get file metadata in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetText - Get file content in plain text.
  • GetVCard - Get vCard content in a structured format.

1.6.0

Available actions:

  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetMetadata - Get file metadata in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetText - Get file content in plain text.
  • GetVCard - Get vCard content in a structured format.

1.5.0

Available actions:

  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetMetadata - Get file metadata in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetText - Get file content in plain text.
  • GetVCard - Get vCard content in a structured format.

1.4.0

Available actions:

  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetMetadata - Get file metadata in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetText - Get file content in plain text.
  • GetVCard - Get vCard content in a structured format.

1.3.0

Available actions:

  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetMetadata - Get file metadata in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetText - Get file content in plain text.
  • GetVCard - Get vCard content in a structured format.

1.2.0

Available actions:

  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetMetadata - Get file metadata in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetText - Get file content in plain text.
  • GetVCard - Get vCard content in a structured format.

1.1.0

Available actions:

  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetMetadata - Get file metadata in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetText - Get file content in plain text.
  • GetVCard - Get vCard content in a structured format.

1.0.1

Available actions:

  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetMetadata - Get file metadata in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetText - Get file content in plain text.
  • GetVCard - Get vCard content in a structured format.

1.0.0

Available actions:

  • GetDocument - Get document content in a structured format.
  • GetDom - Get DOM content in a structured format.
  • GetEmail - Get email content in a structured format.
  • GetMetadata - Get file metadata in a structured format.
  • GetSlideshow - Get slideshow content in a structured format.
  • GetSpreadsheet - Get spreadsheet content in a structured format.
  • GetText - Get file content in plain text.
  • GetVCard - Get vCard content in a structured format.