ODC Native Chunking Tester

ODC Native Chunking Tester (ODC)

Stable version 0.1.1 (Compatible with ODC)
Uploaded on 12 Jun (11 days ago) by DB Results Labs

Documentation
0.1.1

An administrative preview dashboard and testing harness built to visually audit and benchmark native ODC Semantic Search splitting profiles before index publication.

Because ODC's ingestion layer runs as a hidden background cloud process, developers have no direct view of how their text is partitioned. This Forge component provides a design-time validation workspace: a dashboard for testing the platform's four native chunking strategies (Smart, Fixed-Size, Sentence, and Recursive) against two bundled test files (TestFile.md and TestFile.txt). Running the SaveTestContent timer writes data to dedicated testing tables, which triggers the native background embedding process. You can then inspect text fragments, alignment gaps, and token splits directly on an interactive screen.


Read the deep-dive research article that inspired this utility: 👉 What ODC Actually Does When It Chunks Your Text

Platform Prerequisites & Requirements

To run this testing application in your ODC workspace, make sure the following platform requirements are met:

  • Semantic Search Activation: Native Semantic Search must be explicitly enabled in your OutSystems Developer Cloud (ODC) tenant.

  • Beta Infrastructure Bounds: If your environment runs on a preview/beta track, check the regional availability and licensing limits that apply to you before setting up automated high-volume production queues.

Technical Architecture & Asset Schema

The application is structured as a self-contained, stateless evaluation environment. The ODC Studio tree view contains the following key components:

├── Entities
│   └── Database
│       ├── FixedSizeText (Level 1 Character Splitting Entity)
│       ├── RecursiveText (Level 2 Cascading Separator Entity)
│       ├── SentencedText (Sentence Density Evaluation Entity)
│       └── SmartText     (Platform Default Baseline Entity)
├── Settings
│   └── UsingMDFile       (Boolean App Setting Toggle)
└── Resources
    ├── TestFile.md       (Syntax-Aware Structured Data Volume)
    └── TestFile.txt      (Pristine Unstructured Prose Volume)

Core Simulation Architecture (The 4 Native Strategies)

This component helps you analyze the exact text-splitting behaviors exposed within ODC Studio:

  • Smart Chunking: The baseline unconfigured platform option. It functions as an implicit recursive splitter that automatically targets paragraph breaks and dynamically scales blocks to fit the underlying model's token limits.

  • Fixed-Size Chunking: A mechanical sliding window that slices data strictly by character count (defaulting to a 1,000-character chunk size and a 200-character overlap, so each chunk shares 20% of its length with the previous one).

  • Sentence-Based Chunking: Splits on sentence-ending punctuation to keep sentence boundaries intact (defaulting to 1 sentence per chunk), backed by a character limit that acts as a safety envelope to protect platform memory.

  • Recursive Chunking: Exposes explicit control over the ordered separator list ("\n\n", "\n", " ", ""), walking hierarchically from paragraph breaks down to a per-character fallback to minimize tearing.
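
To make the fixed-size and recursive descriptions above concrete, here is a minimal Python sketch of both techniques. It is not ODC's implementation, which is not visible to developers; the function names and the `max_len` parameter are hypothetical, and only the defaults (1,000 characters, 200 overlap, and the separator order) come from the descriptions above.

```python
from typing import List, Sequence


def fixed_size_chunks(text: str, size: int = 1000, overlap: int = 200) -> List[str]:
    """Slide a window of `size` characters, advancing by `size - overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the window already reached the end of the text
    return chunks


def recursive_chunks(
    text: str,
    max_len: int = 1000,
    separators: Sequence[str] = ("\n\n", "\n", " ", ""),
) -> List[str]:
    """Split on the coarsest separator first, recursing into oversized pieces."""
    if max_len <= 0 or not separators:
        raise ValueError("need max_len > 0 and at least one separator")
    if len(text) <= max_len:
        return [text] if text else []
    sep, rest = separators[0], separators[1:]
    if sep == "" or not rest:
        # Last resort: hard cut by character count.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    if sep not in text:
        return recursive_chunks(text, max_len, rest)
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            current = candidate  # keep merging small pieces together
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_len:
            current = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(recursive_chunks(piece, max_len, rest))
    if current:
        chunks.append(current)
    return chunks
```

Running both functions over the same paragraph-structured text shows the trade-off the strategies describe: the fixed-size window cuts wherever the character count lands, while the recursive splitter only falls back to finer separators when a paragraph or line does not fit within `max_len`.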

Data Management & Execution Workflows
1. Ingesting and Populating Chunks (SaveTestContent)

Because ODC's Semantic Search engine reacts to database changes in a background cloud layer, data must exist in the entity records before the platform can run its splitting algorithms.

The application provides a dedicated database initialization timer to bootstrap the testing workspace:

  • Purpose: Reads the active source file from the app's bundled resources, flushes existing evaluation records, and populates the four platform testing tables (FixedSizeText, RecursiveText, SentencedText, and SmartText).

  • Execution: Navigate to the SaveTestContent timer in your app's view in ODC Portal and click Run now.

  • Payload Dependency Loop: The background process reads the UsingMDFile app setting before starting its batch insertion:

    • If UsingMDFile is set to True, the timer loads the raw TestFile.md file so you can evaluate structural elements like fenced code blocks, nested indented lists, and pipe-delimited tables.

    • If UsingMDFile is set to False, the timer loads TestFile.txt, a plain-text file without Markdown syntax, to establish a clean prose baseline.
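
The flow described above can be sketched in Python as follows. This is an illustration only: the real timer is built in ODC Studio, the in-memory `TABLES` and `RESOURCES` dictionaries are hypothetical stand-ins for the entities and bundled files, the sample file contents are invented, and writing one record per table is an assumption.

```python
from typing import Dict, List

# Hypothetical in-memory stand-ins for the four testing entities.
TABLES: Dict[str, List[str]] = {
    "FixedSizeText": [],
    "RecursiveText": [],
    "SentencedText": [],
    "SmartText": [],
}

# Hypothetical stand-ins for the bundled resources (contents invented).
RESOURCES: Dict[str, str] = {
    "TestFile.md": "# Heading\n\n- list item\n",
    "TestFile.txt": "Heading\n\nlist item\n",
}


def select_test_file(using_md_file: bool) -> str:
    """Map the UsingMDFile setting to the resource the timer should load."""
    return "TestFile.md" if using_md_file else "TestFile.txt"


def save_test_content(using_md_file: bool) -> str:
    """Flush every table, then write the selected file's text into each one."""
    name = select_test_file(using_md_file)
    content = RESOURCES[name]
    for rows in TABLES.values():
        rows.clear()          # flush existing evaluation records
        rows.append(content)  # assumption: one record per table
    return name
```

Every table receives the same source text, so any difference between the resulting chunks comes from the chunking strategy configured on each entity rather than from the input.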

2. Purging Ingested Vectors (ResetData)

Whenever you change your chunking profiles in ODC Studio, or switch the evaluation track by toggling the UsingMDFile app setting, you must clear the existing database records so that earlier results do not contaminate the next test.

Apps  >  ODC Native Chunking Tester  >  ResetData
  • Purpose: Truncates records across all four evaluation tables, completely flushes the hidden platform-managed vector index, and resets tracking pointers.

  • Execution: Navigate to the ResetData background cloud timer inside the ODC Portal and click Run now.

  • Operational Profile: This maintenance timer has an execution timeout of 20 minutes, which leaves room for it to flush high-volume datasets safely.

Step-by-Step Test Execution Protocol

To audit ODC's native text processing and inspect the chunk boundaries it produces, follow these steps in order:

  1. Set the Target Payload: Open the ODC Portal settings panel for the application and toggle UsingMDFile to your intended format track.

  2. Flush Residual Data: Run the ResetData background timer to make sure you start with a clean database and index.

  3. Execute Ingestion Loop: Run the SaveTestContent timer to read your target file and write the raw strings to the database entities.

  4. Allow for Background Asynchronous Sync: ⚠️ CRITICAL STEP: After the SaveTestContent timer has finished executing, wait for the chunking process to complete in the background. ODC runs text splitting and embedding asynchronously in the cloud AI layer, so moving to the next step too quickly will return zero search results.

  5. Audit via UI Application: Once the background sync concludes, open the application's interactive UI screen. Enter a neutral query such as "a" to pull the resulting raw text chunks onto the screen and inspect the exact structural splits generated by the platform.
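
Steps 4 and 5 amount to polling until the index returns something. A minimal Python sketch of that wait loop follows, assuming a hypothetical `search` callable that stands in for whatever issues the semantic search query; the timeout and interval values are illustrative and are not platform figures.

```python
import time
from typing import Callable, List


def wait_for_chunks(
    search: Callable[[str], List[str]],
    query: str = "a",
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll `search` until it returns results or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        results = search(query)
        if results:
            return results
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no chunks returned within {timeout_s} seconds")
        sleep(interval_s)  # give the background embedding more time
```

A non-empty result only shows that some chunks have been indexed, not that the whole file has been processed, so it is worth re-running the query until the number of returned fragments stops changing.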