Process large scanned documents with AI services (Claude, GPT-4, Azure Document Intelligence) by compressing TIFFs into API-compatible sizes.
Solves:
• AI APIs reject files above their size limit (5-32MB, depending on the service) → Compress by 70-90% to fit
• OutSystems ODC 5.5MB cap → Use S3 pre-signed URLs to bypass
• High AI costs → Smaller files = fewer tokens = lower costs
Features:
• Convert TIFF → PDF (multi-page) or JPEG (first page)
• Intelligent compression maintains OCR quality
• Adjustable quality settings (1-100)
• S3 integration for unlimited file sizes
• Tested: 100MB files, 50+ pages
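The conversion features above can be sketched in a few lines. This is a minimal illustration that assumes the Pillow imaging library; the component's own implementation is not shown here, and the function names `tiff_to_pdf` and `tiff_to_jpeg` are hypothetical.

```python
import io

from PIL import Image, ImageSequence  # assumption: Pillow is installed


def tiff_to_pdf(tiff_bytes: bytes, quality: int = 60) -> bytes:
    """Convert every page of a TIFF into one multi-page PDF."""
    with Image.open(io.BytesIO(tiff_bytes)) as tiff:
        # convert() returns an independent RGB copy of each frame
        pages = [frame.convert("RGB") for frame in ImageSequence.Iterator(tiff)]
    out = io.BytesIO()
    # Assumption: Pillow forwards `quality` to the JPEG encoder it uses
    # for RGB pages inside the PDF.
    pages[0].save(
        out,
        format="PDF",
        save_all=True,
        append_images=pages[1:],
        quality=quality,
    )
    return out.getvalue()


def tiff_to_jpeg(tiff_bytes: bytes, quality: int = 60) -> bytes:
    """Convert only the first page of a TIFF into a JPEG."""
    with Image.open(io.BytesIO(tiff_bytes)) as tiff:
        first_page = tiff.convert("RGB")  # frame 0 is selected by default
    out = io.BytesIO()
    first_page.save(out, format="JPEG", quality=quality, optimize=True)
    return out.getvalue()
```

Lower `quality` values give smaller output at the cost of more JPEG artifacts, so it is worth checking OCR results on a few representative scans before settling on a value.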
Why It Matters:
Enables AI document workflows that would otherwise be impossible. Extract data from invoices, contracts, and forms at scale without file size blockers.
Example: 200MB scanned contract → 20MB compressed PDF → Send to Claude for extraction → Process structured data in your app.
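The arithmetic behind this example is easy to check before uploading anything. The sketch below is illustrative only: the helper names are hypothetical, and the 32MB limit is simply the upper end of the 5-32MB range quoted above, not a value for any specific API.

```python
def compressed_size_mb(original_mb: float, reduction: float) -> float:
    """Size left after removing `reduction` (0.0-1.0) of the original."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0.0, 1.0)")
    return original_mb * (1.0 - reduction)


def fits_limit(original_mb: float, reduction: float, limit_mb: float) -> bool:
    """True if the compressed file would fit under the API's size limit."""
    return compressed_size_mb(original_mb, reduction) <= limit_mb


# The 200MB contract at the best-case 90% reduction, against a 32MB limit
print(round(compressed_size_mb(200, 0.90), 1))  # 20.0
print(fits_limit(200, 0.90, 32))                # True
# At the low end of the range (70%) the same file would still be too large
print(fits_limit(200, 0.70, 32))                # False
```

Since the reduction depends on the scan and the quality setting, the low end of the 70-90% range is the safer planning figure.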
Requires: AWS S3 bucket with IAM credentials.
File Size Limits:
• Tested: Files up to 100MB work reliably
• May time out: Files > 500MB may exceed Lambda's 15-minute limit
• Memory: TIFF processing requires 2-3x the file size in memory
AWS Requirements:
• Requires AWS S3 bucket and IAM credentials (additional cost)
• Files must be in S3 first (not direct from OutSystems)
• CRITICAL: Must configure CORS on S3 bucket for browser uploads/downloads
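A CORS configuration for the bucket could look like the sketch below. The rule structure follows the standard S3 CORS format, but the origin, methods, and max age are placeholder assumptions to adapt to your own app; `build_cors_config` is a hypothetical helper.

```python
import json


def build_cors_config(allowed_origins: list[str]) -> dict:
    """Build an S3 CORS configuration for browser uploads and downloads."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": list(allowed_origins),
                # PUT for pre-signed uploads, GET/HEAD for downloads
                "AllowedMethods": ["GET", "PUT", "HEAD"],
                "AllowedHeaders": ["*"],
                "ExposeHeaders": ["ETag"],
                "MaxAgeSeconds": 3000,
            }
        ]
    }


# Placeholder origin: replace with the URL your app is served from
config = build_cors_config(["https://your-app.example.com"])
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, the same dict can be applied:
#   boto3.client("s3").put_bucket_cors(Bucket="your-bucket", CORSConfiguration=config)
```

The S3 console's CORS editor expects only the list of rules (the value of `CORSRules`), while the API call takes the whole dictionary.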
Format Notes:
• JPEG output: First page only (a multi-page TIFF becomes a single JPEG of its first page)
• Some proprietary TIFF formats may not be supported
• Processing: ~2-5 seconds per page
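The figures quoted in this description (2-3x file size in memory, ~2-5 seconds per page, Lambda's 15-minute limit) can be combined into a rough pre-flight estimate. This is a back-of-the-envelope sketch, not a measurement; `estimate_job` and `Estimate` are hypothetical names.

```python
from dataclasses import dataclass

LAMBDA_TIMEOUT_S = 15 * 60  # Lambda's 15-minute limit, in seconds


@dataclass
class Estimate:
    memory_mb: tuple[float, float]  # (low, high) memory needed
    seconds: tuple[float, float]    # (low, high) processing time
    may_time_out: bool              # worst-case time exceeds the Lambda limit


def estimate_job(file_mb: float, pages: int) -> Estimate:
    """Rough resource estimate from the documented rules of thumb."""
    memory = (2 * file_mb, 3 * file_mb)  # 2-3x the file size
    seconds = (2 * pages, 5 * pages)     # ~2-5 seconds per page
    return Estimate(memory, seconds, seconds[1] > LAMBDA_TIMEOUT_S)


# A 100MB, 50-page scan: 200-300MB of memory, 100-250 seconds, no timeout risk
print(estimate_job(100, 50))
```

By these numbers, a job starts to risk the timeout at around 180 pages in the worst case (180 × 5s = 900s).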
Recommendation: Test with your specific files before production use. For very large files (>100MB), consider splitting them into batches.
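One way to read "splitting into batches" is by page range, so that each batch becomes its own smaller job. The helper below is a hypothetical sketch of that idea; the batch size of 20 is an arbitrary example value.

```python
def page_batches(total_pages: int, batch_size: int) -> list[tuple[int, int]]:
    """Split pages 1..total_pages into (first, last) ranges, inclusive."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


# A 50-page scan in batches of 20 pages
print(page_batches(50, 20))  # [(1, 20), (21, 40), (41, 50)]
```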
License: BSD-3-Clause (https://opensource.org/licenses/BSD-3-Clause)