Stable version 0.1.0 (Compatible with ODC)

Uploaded on 23 Jun by Michael Guzman

Details

Detailed Description

Level 5 of the five-level chunking framework for ODC. Chunking stops being a calculation here. It becomes a judgment call.

Levels 1 through 4 split on structure or semantic distance. Level 5 hands that to an LLM. The model breaks input chunks into atomic facts called propositions, then reassembles them by theme. Related content gets pulled together. Mixed-domain chunks get split apart. Duplicated overlap collapses to one proposition.

Two generative calls per batch: extraction, then grouping. Both at Temperature=0. Output is a typed list of AgenticChunk structs, each with a content hash ready for a cache or embedding step.
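The shape of that output can be sketched in Python. This is a minimal illustration, not the component's implementation: the field names, the `make_chunk` and `assemble` helpers, and the choice of SHA-256 are all assumptions, and the grouped propositions are passed in directly, standing in for the result of the two model calls.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class AgenticChunk:
    # Field names are hypothetical; the real struct is defined in the ODC app.
    theme: str
    propositions: tuple
    content: str
    content_hash: str


def make_chunk(theme, propositions):
    """Build one chunk from the propositions grouped under a theme."""
    # Collapse exact duplicates while keeping first-seen order.
    cleaned = (p.strip() for p in propositions)
    unique = tuple(dict.fromkeys(p for p in cleaned if p))
    content = " ".join(unique)
    # SHA-256 is an assumption; any stable digest works as a cache key.
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return AgenticChunk(theme, unique, content, digest)


def assemble(grouped):
    """Turn a theme -> propositions mapping into a list of chunks."""
    return [make_chunk(theme, props) for theme, props in grouped.items()]
```

Because the hash is derived only from the chunk content, two runs that produce the same propositions for a theme yield the same key, so a cache or embedding step can skip chunks it has already seen.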

Requires the Agentic Chunking Library and an AI Gateway connection to Claude. Includes a 32-case test runner for validating the setup before deploying.

Limitations

Granularity is model judgment, not a dial. No parameter controls how finely the grouper subdivides a domain. Re-validate when the model version changes.

Proposition expansion adds tokens. Extraction rewrites source sentences into atomic propositions, increasing character and token counts. Dense compound sentences expand more than simple ones. Budget for this in any downstream embedding step.
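One way to budget for this is to measure the expansion on a sample of your own documents. The sketch below is an assumption-laden illustration: `expansion_ratio` and `fits_budget` are hypothetical helpers, and it counts characters only, since token counts depend on the tokenizer of whichever embedding model is used downstream.

```python
def expansion_ratio(source, propositions):
    """Ratio of total proposition characters to source characters."""
    if not source:
        raise ValueError("source text must not be empty")
    return sum(len(p) for p in propositions) / len(source)


def fits_budget(propositions, max_chars):
    """True if the propositions, joined by single spaces, fit in max_chars."""
    if not propositions:
        return True
    joined_length = sum(len(p) for p in propositions) + len(propositions) - 1
    return joined_length <= max_chars


source = "Ann and Bob work at Acme."
propositions = ["Ann works at Acme.", "Bob works at Acme."]
ratio = expansion_ratio(source, propositions)
```

Here one compound sentence becomes two propositions that each repeat the shared predicate, so the ratio is above 1. Running the same measurement over a representative sample gives a working multiplier for sizing embedding input.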

Two generative calls per batch is a real cost. If documents are cleanly structured with no compound sentences or interleaved topics, Level 4 alone is enough.

This build uses Claude 3.7 Sonnet via ODC AI Gateway. Changing the vendor or model requires republishing the app.

Site properties cap at 2000 characters. Refined prompts can exceed this. Use Prompt Assembler to manage prompt content beyond that limit.
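The general idea of working around a per-value character cap can be illustrated as follows. This is not how Prompt Assembler works internally, which this page does not describe; `split_prompt` and `assemble_prompt` are hypothetical helpers showing only that a long prompt can be stored as several parts, each within the limit, and concatenated at run time.

```python
SITE_PROPERTY_LIMIT = 2000  # character cap stated in the limitation above


def split_prompt(prompt, limit=SITE_PROPERTY_LIMIT):
    """Cut a prompt into consecutive parts of at most `limit` characters."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [prompt[i:i + limit] for i in range(0, len(prompt), limit)]


def assemble_prompt(parts):
    """Rebuild the original prompt from its ordered parts."""
    return "".join(parts)
```

Splitting at fixed offsets keeps the round trip lossless. In practice it is easier to maintain parts that break on meaningful boundaries, such as instructions, examples, and output format.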

Validated on single documents processed one at a time. Not tested under the concurrency or volume a production ingestion pipeline would impose.

License

https://opensource.org/licenses/BSD-3-Clause

Agentic Chunking (ODC)
