To ensure a clean environment setup, install and deploy the suite components in the following modular order:
Install the Core Extension (SemanticEngineV2): Download the asset via OutSystems Service Studio or Integration Studio and publish it to compile the .NET library on your application server.
Install the Wrapper Library (SemanticEngine_Lib): Deploy this module to handle your foundational integrations and public mapping structures.
Install Core Services (SemanticEngine_CS): Publish this module to create your data schema and activate the background BPT processing engine.
Publish Your Consuming Application: Open your end-user application, open the Manage Dependencies window, find SemanticEngine_CS, check the required service actions, and refresh your references.
Before triggering your first document ingestion, you must establish your AI provider credentials:
Obtain API Keys: Set up an active developer account with OpenAI to generate an API key, or provision an Azure OpenAI resource with an active embedding model deployment.
Configure Settings: Use the internal settings mechanism (or map the values to your application's site properties) to supply the following on each call:
Provider: Specify whether you are using native OpenAI or Azure OpenAI.
Model: Provide the specific deployment name or model string (e.g., text-embedding-3-small).
Endpoint: Define your API base path (e.g., https://api.openai.com for OpenAI, or your specific resource URL for Azure).
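The three settings above can be checked before the first ingestion call. The following is a minimal Python sketch of such a check, not part of the component itself: the EmbeddingSettings class, its validate method, and the provider labels "OpenAI" and "AzureOpenAI" are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical provider labels; the component may use different values.
KNOWN_PROVIDERS = ("OpenAI", "AzureOpenAI")


@dataclass(frozen=True)
class EmbeddingSettings:
    provider: str  # native OpenAI or Azure OpenAI
    model: str     # model string or Azure deployment name
    endpoint: str  # API base path

    def validate(self):
        """Return a list of human-readable problems (empty when valid)."""
        errors = []
        if self.provider not in KNOWN_PROVIDERS:
            errors.append("provider must be one of: " + ", ".join(KNOWN_PROVIDERS))
        if not self.model.strip():
            errors.append("model is required")
        parsed = urlparse(self.endpoint)
        if parsed.scheme != "https" or not parsed.netloc:
            errors.append("endpoint must be an absolute https URL")
        return errors


settings = EmbeddingSettings("OpenAI", "text-embedding-3-small", "https://api.openai.com")
print(settings.validate())
```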
Implementing semantic search in your application follows a simple two-phase workflow:
Pass the uploaded document information into the SaveFile service action inside SemanticEngine_CS.
The core services module automatically creates the required database records and starts the background PDFIngestion BPT process.
In the background, the file is parsed into clean text, segmented using your configured chunkSize and charsOverlap, embedded by your AI provider, and automatically stored in the DocumentChunk and ChunkEmbedding tables.
Capture the search text query entered by your end-user.
Fetch the candidate vector record list from your database for the documents you want to search against.
Pass both the text query and the candidate vector list into the SearchVector service action.
Use the returned, ranked SearchResult collection (ordered descending by relevance score) to construct your RAG context window for your LLM interface.
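The last step, turning ranked results into a RAG context window, could look roughly like the Python sketch below. The field names mirror the SearchResult attributes returned by SearchVector; the build_context helper, the character budget, and the page-label format are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    # Field names mirror the attributes returned by SearchVector.
    rank: int
    score: float
    page_number: int
    chunk_index: int
    chunk_text: str
    chunk_hash: str


def build_context(results, max_chars=4000):
    """Join the best-scoring chunks until the character budget is reached.

    The budget counts chunk blocks only, not the blank-line separators.
    Chunks with a hash that was already used are skipped as duplicates.
    """
    parts = []
    used = 0
    seen_hashes = set()
    for r in sorted(results, key=lambda item: item.score, reverse=True):
        if r.chunk_hash in seen_hashes:
            continue
        block = f"[page {r.page_number}] {r.chunk_text}"
        if used + len(block) > max_chars:
            break
        parts.append(block)
        seen_hashes.add(r.chunk_hash)
        used += len(block)
    return "\n\n".join(parts)


results = [
    SearchResult(2, 0.5, 3, 0, "beta", "h2"),
    SearchResult(1, 0.9, 1, 0, "alpha", "h1"),
]
print(build_context(results))
```

Sorting again inside the helper is defensive; the service action already returns results in descending score order.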
This extension exposes three foundational server actions that run directly on the application server. It works out of the box with OpenAI (text-embedding-3-small, text-embedding-ada-002, etc.) and Azure OpenAI embedding deployments.
Extracts text from a PDF, splits it into overlapping chunks, generates embeddings via an OpenAI-compatible endpoint, and returns a JSON array of vector records ready for storage.
IngestionVectorRecord
Embeds a query string and performs in-memory cosine similarity search over a list of pre-indexed vector candidates, returning the top matches ranked by score.
VectorCandidateDto
VectorSearchResultDto
Extracts and normalizes text from a PDF, returning all pages concatenated with ===PAGE:N=== dividers. Useful for auditing ingestion quality before running embeddings.
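When auditing the extracted text, it can help to split the output back into pages. Below is a small Python sketch that assumes each ===PAGE:N=== divider precedes the text of page N; the actual placement of the dividers is not confirmed here, and split_pages is a hypothetical helper name.

```python
import re

# Matches dividers such as ===PAGE:12=== and captures the page number.
PAGE_MARKER = re.compile(r"===PAGE:(\d+)===")


def split_pages(text):
    """Return {page_number: page_text}, assuming each marker precedes its page."""
    # With one capture group, re.split yields:
    # [text_before_first_marker, "1", page_1_text, "2", page_2_text, ...]
    parts = PAGE_MARKER.split(text)
    pages = {}
    for i in range(1, len(parts) - 1, 2):
        pages[int(parts[i])] = parts[i + 1].strip()
    return pages


sample = "===PAGE:1===\nHello\n===PAGE:2===\nWorld"
for number, body in split_pages(sample).items():
    print(number, len(body), "chars")
```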
The library layer abstracts the raw extension, standardizes logic, and introduces a unified output structure to make your error handling seamless.
Most actions in this library return a standard Result structure:
IsSuccess (Boolean): True if the action completed without issues.
Message (Text): Contains error details or success confirmation messages.
A utility action used to retrieve the active AI provider configurations.
Setting (out Structure): Returns the active configuration profile containing Model and Provider details.
Extracts plain text from a document to audit ingestion quality.
fileContent (in Binary): Raw PDF binary content.
Content (out Text): The extracted plain text.
Result (out Structure): Standard success or error structure.
Wraps the core ingestion pipeline. It automatically pulls your active settings, runs the extraction, handles chunking, calls your embedding provider, and gives you back cleanly structured vector data instead of raw text strings.
Inputs include maxPages, maxCharsPerPage, chunkSize, charsOverlap, and maxChunksTotal.
Vector (out Record List): Structured collection of vector chunks ready for your database.
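To show how chunkSize, charsOverlap, and maxChunksTotal interact, here is a minimal Python sketch of fixed-size character chunking with overlap. It is an illustration under assumed semantics, not the extension's actual algorithm; the function name and default values are hypothetical.

```python
def chunk_text(text, chunk_size=1000, chars_overlap=200, max_chunks_total=None):
    """Split text into chunks of up to chunk_size characters.

    Each chunk starts chunk_size - chars_overlap characters after the
    previous one, so neighbouring chunks share chars_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chars_overlap < chunk_size:
        raise ValueError("chars_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chars_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        if max_chunks_total is not None and len(chunks) >= max_chunks_total:
            break  # hard cap on the number of chunks
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chars_overlap=2))
```

A larger overlap reduces the chance of splitting a sentence's meaning across chunk boundaries, at the cost of more chunks and therefore more embedding calls.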
Evaluates a text query against a structured collection of candidate vectors using in-memory cosine similarity.
queryText (in Text): Plain-language question or query.
candidateVector (in Record List): The collection of vector records to search across.
topK (in Integer): Number of top results to return.
mininimumScore (in Decimal): Similarity threshold between 0.0 and 1.0.
SearchResult (out Record List): Ranked search results ordered by relevance score descending.
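The ranking step can be sketched in Python as follows. This assumes the query has already been embedded into a vector (the provider call is omitted), and the search function, its minimum_score parameter (standing in for the threshold input above), and the candidate tuple layout are hypothetical simplifications of the real action.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(query_vector, candidates, top_k=5, minimum_score=0.0):
    """Rank (chunk_text, vector) candidates against an embedded query."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in candidates]
    kept = [item for item in scored if item[0] >= minimum_score]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [
        {"rank": i + 1, "score": score, "chunk_text": text}
        for i, (score, text) in enumerate(kept[:top_k])
    ]


candidates = [("a", [1, 0]), ("b", [0, 1]), ("c", [1, 1])]
for hit in search([1, 0], candidates, top_k=2, minimum_score=0.5):
    print(hit["rank"], hit["chunk_text"], round(hit["score"], 3))
```

Because the comparison runs in memory over every candidate, the cost grows linearly with the number of vectors passed in, which is why narrowing the candidate list to the relevant documents first matters.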
This module handles the actual data orchestration, holding the physical tables and managing an asynchronous background pipeline to process files smoothly without locking up user interfaces.
The schema separates metadata from heavy binary and text objects to optimize database responsiveness:
Manages the system processing states.
Id (Identifier)
Label (Text)
Order (Integer)
Is_Active (Boolean)
The master tracking record for file metadata.
DocumentKey (Text)
Title (Text)
OriginalFileName (Text)
FileSizeBytes (Integer)
MimeType (Text)
StatusId (DocumentStatus Identifier)
ErrorMessage (Text)
UploadOn (DateTime)
UploadedBy (User Identifier)
Holds the raw binary contents isolated from the main metadata table.
Id (Document Identifier)
FileContent (Binary Data)
CreatedOn (DateTime)
Houses the complete, plain text extract after the file is parsed.
FileContentText (Text, large capacity)
Stores individual segmented text pieces after chunking.
DocumentId (Document Identifier)
PageNumber (Integer)
ChunkIndex (Integer)
ChunkText (Text)
ChunkHash (Text)
Holds the actual generated floating-point arrays.
DocumentChunkId (DocumentChunk Identifier)
Provider (Text)
Model (Text)
VectorJson (Text)
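Since VectorJson is a Text attribute, the embedding has to be serialized on write and parsed on read. The Python sketch below assumes the stored value is a flat JSON array of numbers; the exact stored format is not documented here, and both helper names are hypothetical.

```python
import json


def vector_to_json(vector):
    """Serialize a vector as a compact JSON array of floats."""
    return json.dumps([float(x) for x in vector], separators=(",", ":"))


def vector_from_json(vector_json):
    """Parse a stored vector, rejecting anything that is not a flat number array."""
    data = json.loads(vector_json)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    for x in data:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(x, bool) or not isinstance(x, (int, float)):
            raise ValueError("expected only numbers in the array")
    return [float(x) for x in data]


stored = vector_to_json([0.5, 1, -2])
print(stored, vector_from_json(stored))
```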
Handles document uploads by creating the base metadata records and storing the file in the database, which automatically queues it for background processing.
File (in Record): Compound structure capturing DocumentKey, Title, FileName, FileContent, and MimeType.
Id (out Identifier): The unique identifier assigned to the new Document record.
An exposed service action providing cross-module access to search your indexed document store.
Parameters match the library wrapper, accepting a queryText, a structured candidateVector record list, topK, and mininimumScore.
SearchResult (out Record List): Returns the top ranked matches with their Rank, Score, PageNumber, ChunkIndex, ChunkText, and ChunkHash.
Asynchronous Background Processing (BPT)
To ensure a fast, responsive user experience, document parsing and embedding happen entirely in the background.
Trigger: Launches automatically as soon as a new record is created in the DocumentFile table.
EmbedFile Activity: An automatic activity that triggers the wrapper logic to extract the text, split it into chunks, and fetch the embeddings from your configured AI provider.
Decision: Evaluates the outcome using the IsSuccess flag.
Yes Route: Moves to the Success milestone and terminates cleanly.
No Route: Moves to the Fail milestone, logging the error reasons straight to the document master record so you can audit what went wrong.
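The decision logic of this process can be summarized in a short Python sketch. It only models the routing described above: the status labels "Pending", "Processed", and "Failed", the Document fields, and the run_pipeline helper are hypothetical, and embed_file stands in for the EmbedFile activity.

```python
from dataclasses import dataclass


@dataclass
class Result:
    is_success: bool
    message: str = ""


@dataclass
class Document:
    status: str = "Pending"   # hypothetical status label
    error_message: str = ""


def run_pipeline(document, embed_file):
    """Run the embedding step and route on the IsSuccess flag.

    embed_file is any callable returning a Result.
    Returns the name of the milestone that was reached.
    """
    result = embed_file()
    if result.is_success:
        document.status = "Processed"  # hypothetical status label
        return "Success"
    document.status = "Failed"         # hypothetical status label
    document.error_message = result.message  # kept on the master record for auditing
    return "Fail"


doc = Document()
print(run_pipeline(doc, lambda: Result(False, "embedding provider timed out")), doc)
```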