TEXTractor extracts text and/or metadata from dozens of file types.
The full list of supported file types is available here.
Built using a modified version of the Toxy library (https://github.com/bmlpg/toxy).
Improved the robustness of the PDF structured-content extraction mechanism.