ChunkingLibrary

ChunkingLibrary (ODC)

Stable version 0.1.10 (Compatible with ODC)
Uploaded on 13 Jun (3 weeks ago) by Michael Guzman

Details
Detailed Description

An advanced text-chunking engine for OutSystems Developer Cloud (ODC), implemented as C# External Logic. Built for enterprise RAG and semantic search pipelines, it parses text with awareness of document structure, so that chunks prepared for a vector database do not lose context at their boundaries. It protects Markdown tables from being split, keeps each fenced code block together as a single unit, and injects parent-heading breadcrumbs into each fragment to improve LLM accuracy.
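
As a rough illustration of the breadcrumb idea described above, the Python sketch below splits Markdown at headings, leaves fenced code blocks untouched, and records the parent-heading trail for each chunk. It is not the library's C# implementation: the function name `chunk_with_breadcrumbs`, the ` > ` separator, and the dictionary output shape are all assumptions made for this example.

```python
import re

# ATX-style Markdown heading: one to six '#' characters, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_with_breadcrumbs(markdown: str) -> list[dict]:
    """Split Markdown at headings and attach the heading trail to each chunk.

    Hypothetical helper for illustration only; not the ChunkingLibrary API.
    """
    trail: list[str] = []   # current heading path, outermost first
    body: list[str] = []    # lines collected for the chunk being built
    chunks: list[dict] = []
    in_fence = False        # True while inside a ``` fenced code block

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append({"breadcrumb": " > ".join(trail), "text": text})
        body.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            # Toggle fence state; the fence line stays inside the chunk.
            in_fence = not in_fence
            body.append(line)
            continue
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at this level or deeper, then record the new one.
            del trail[level - 1:]
            trail.append(match.group(2))
        else:
            body.append(line)
    flush()
    return chunks
```

Because lines inside a fence are never tested against the heading pattern, a `#` comment in a code block cannot start a new chunk. A production splitter would also need size limits, table handling, and overlap, none of which this sketch attempts.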

Limitations
  • Markdown Positions: For Markdown chunks, start indices default to 0. Chunk text is exact, but position values are only approximate.

  • Repeated Text: When the same text appears more than once, the substring search can map a chunk to the first occurrence rather than to its actual location.

  • Squished Titles: Overlap in recursive splitting can accidentally merge adjacent headings onto a single line.

  • Rough Token Count: Token counts are estimated with a simple shortcut (characters ÷ 4) rather than a real tokenizer.

  • Missing Title Trails: Recursive text splitting does not track the heading path; the breadcrumb is returned as an empty string ("").

  • Overlap Skips: With 1 sentence per chunk, overlap is skipped entirely, so adjacent chunks share no context.

  • Regex Blind Spots: Sentence breaks are missed before names that start with a lowercase letter and near decimal numbers.

  • Language Limit: The engine is optimized for English and is untested on other languages or non-Latin scripts.
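
Three of the limitations above are easy to reproduce in a few lines. The Python sketch below is illustrative only and is not taken from the library: the function names are hypothetical, the integer rounding in the token estimate is an assumption (the listing only says characters ÷ 4), and the sentence pattern is a stand-in that happens to show similar blind spots, not the library's actual regex.

```python
import re


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4.

    Integer division is an assumption; the listing does not say how
    the result is rounded.
    """
    return len(text) // 4


def locate(chunk: str, source: str) -> int:
    """Find a chunk's position by substring search (first match only)."""
    return source.find(chunk)


# Hypothetical naive splitter: break after '.', '!' or '?' when the next
# non-space character is an uppercase ASCII letter.
NAIVE_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text: str) -> list[str]:
    return NAIVE_SENTENCE_BREAK.split(text)


# Repeated text: both copies of the sentence resolve to the first position.
source = "Alpha. Repeat me. Beta. Repeat me."
first = locate("Repeat me.", source)
last = source.rfind("Repeat me.")

# Lowercase name: the break before "iPhone" is not detected.
lowercase_case = split_sentences("It works. iPhone sales rose. Done.")

# Decimal: the break after "2.5." is missed because a digit follows.
decimal_case = split_sentences("The ratio is 2.5. 3 units remain.")
```

With these inputs, `first` and `last` differ, which is why a position derived from a plain substring search cannot be trusted when text repeats. Both sentence examples come back with fewer pieces than a human reader would count.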