Agentic AI in ODC: are there any built-in defenses against prompt injection?

Hi everyone,

I’m learning to build Agentic AI apps in ODC and ran into a security/design concern that I can’t find clear guidance on in the official documentation.


Problem statement

In ODC Agentic AI, user input is ultimately merged into the prompt context sent to the model. As far as I can see, a user can include technical instructions such as:

  • “ignore previous instructions”
  • “act as system”
  • “override your rules and do X”

This creates a prompt injection risk, where user content can influence or attempt to override the system prompt and agent behavior.


What I found so far

  • ODC docs describe how prompts are built and how system/context/user messages are composed
  • There are generic sanitization tools in ODC (HTML/JS/SQL sanitization), but these are not designed for AI prompt injection
  • I couldn’t find any documented, built-in mechanism in ODC that specifically:
    • detects instruction-override patterns, or
    • sanitizes user input before passing it to AI agents, or
    • enforces hard separation between system and user intent at runtime


Why this matters

For production agentic apps, this opens up risks such as:

  • behavior corruption (agent follows user instructions instead of system rules),
  • data leakage (user tries to extract hidden context),
  • unsafe actions triggered by crafted prompts,
  • compliance issues if agents can be steered outside their intended boundaries.


Questions to the community / OutSystems team

  1. Is there any built-in prompt injection protection in ODC Agentic AI that I may have missed?
  2. Are there recommended patterns for sanitizing or validating user prompts before sending them to agents?
  3. Is OutSystems planning to provide native guardrails (e.g., prompt injection detection, instruction boundary enforcement, content filtering) for agentic apps?
  4. How are others handling this today in production ODC apps?


What I’m considering as a workaround

  • Adding a custom validation layer to detect common injection patterns
  • Normalizing or transforming user input before passing it into the agent
  • Enforcing strict action validation on the output side (don’t trust agent decisions blindly)
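As a rough illustration of the first bullet, the detection layer I have in mind could look like this (the pattern list and function name are hypothetical, and in ODC this would live in a Server Action rather than Python):

```python
import re

# Illustrative, not exhaustive: common instruction-override phrases.
# A real deployment would need a broader, maintained pattern list.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"act as (the )?system",
    r"override your rules",
    r"reveal (your )?(system )?prompt",
]

def looks_like_injection(user_input: str) -> bool:
    """Return True when the input matches a known override pattern."""
    text = user_input.lower()
    return any(re.search(p, text) for p in INJECTION_PATTERNS)
```

Pattern matching alone is easy to evade (paraphrasing, encoding tricks), so I see it as one layer among several, not a complete defense.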


I’d love to hear how others are approaching this, and whether OutSystems has a roadmap or best-practice guidance for securing agentic AI against prompt injection.


Thanks!

2025-12-22 13:50:43
Sherif El-Habibi
Champion

Hello Viktar,

I will try to give you an answer as best as I can. First, let us agree that AI agent responses are more about reasoning than decision-making. Of course, you can enhance this using MCP servers or other helpers with structured outputs and detailed descriptions, but in the end, the response is still text generated based on the prompt and the user's input.

For example, if you send a query to the agent to sanitize it, there is always a small chance that it will miss something and not give you the desired output. That is why it is advisable for agents to handle complex, time-consuming tasks that would otherwise take significant effort from a human. The purpose of the agent is to make things easier, but some aspects still require manual control and proper implementation.

Even in code, when you use an Advanced SQL query, you have the option to expand inline input parameters. In that case, you are warned that SQL injection might occur, and it is recommended to use EncodeSQL. There, we know the query handling is structured and controlled. However, when it comes to customized AI-generated requests, this requires more careful handling from your side.
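The separation that the EncodeSQL warning is about can be illustrated with a plain parameterized query. This is generic Python/sqlite3 just to show the principle, not ODC code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# Unsafe: inline expansion of user input into the statement --
# the classic SQL injection risk the ODC warning is about.
evil = "x' OR '1'='1"
unsafe = conn.execute(
    f"SELECT * FROM users WHERE name = '{evil}'"
).fetchall()  # matches every row

# Safe: a parameterized query keeps data strictly separated from the
# statement, so the injection attempt is treated as a literal string.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (evil,)
).fetchall()  # matches nothing
```

With prompts there is no equivalent of a parameter placeholder the model is guaranteed to respect, which is exactly why AI-generated requests need the extra careful handling described above.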

To answer your questions:

Is there any built-in prompt injection protection in ODC Agentic AI that I may have missed?

I have not seen any documentation that specifically mentions prompt injection protection for AI agents, only the standard query-related (SQL injection) protections.

Are there recommended patterns for sanitizing or validating user prompts before sending them to agents?

It depends on the type of text. For simple user input like “how are you today,” sanitization is not really required. For queries, as mentioned earlier, it is not a good approach to let the agent handle everything blindly. And if you are referring to specific keywords such as “document,” those can be blacklisted at the WAF level, which applies to all incoming text, even something as simple as “how are you document.”

For questions 3 and 4, I do not have specific information about them.

Regarding the workaround you mentioned, I believe it is the best approach, as it summarizes and aligns with everything discussed above.


2025-10-13 10:41:52
João Simões
AI Generated

Hello Viktar,

I’ll try to answer based on what I know and what is available TODAY.

So, to answer your questions:

1 and 2 – As of today, in Agent Workbench there is no native or dedicated mechanism for prompt‑injection protection (such as handling “ignore previous instructions”, “act as system”, etc.), beyond the logical separation between System / User / Grounding / Memory in the BuildMessages step.

Current agent security best practices recommend treating the LLM as an untrusted component, with validation layers on both input and output, and clear rules about what the agent can and cannot do. That includes keeping system instructions strictly separate from user content, never exposing the system prompt in responses, and not giving the agent direct access to sensitive data or powerful actions without extra validation or human‑in‑the‑loop control.

What you can do today is build a “security shell” around your AgentFlow:

Before AgentFlow: create a ValidateAndNormalizeUserPrompt Action that enforces size limits, normalizes the text and looks for typical injection patterns (“ignore previous instructions”, “you are now the system”, etc.), blocking or rewriting the input before it reaches the agent. For example:

  • Pattern filters (regex / lists) for phrases like “ignore previous instructions”, “you are now the system”, “act as developer console”;
  • Size limits and normalization (trim, remove invisible characters, suspicious Unicode control characters).
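A rough sketch of what such a ValidateAndNormalizeUserPrompt step could do (the function name comes from the suggestion above, but the limits and patterns are assumptions, and in ODC this would be a Server Action, not Python):

```python
import re
import unicodedata

MAX_PROMPT_LENGTH = 2000  # assumed limit; tune per use case

# Illustrative override patterns, matching the examples above.
BLOCK_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now the system",
    r"act as (the )?developer console",
]

def validate_and_normalize_user_prompt(raw: str) -> str:
    """Normalize, bound, and screen user input before it reaches the agent.

    Raises ValueError when the input should be blocked.
    """
    # Normalize Unicode, then strip invisible/control characters
    # (categories Cf and Cc), keeping newlines.
    text = "".join(
        ch for ch in unicodedata.normalize("NFKC", raw)
        if ch == "\n" or unicodedata.category(ch) not in ("Cf", "Cc")
    ).strip()
    if len(text) > MAX_PROMPT_LENGTH:
        raise ValueError("prompt too long")
    lowered = text.lower()
    for pattern in BLOCK_PATTERNS:
        if re.search(pattern, lowered):
            raise ValueError("suspicious instruction-override pattern")
    return text
```

Blocking outright versus rewriting the input is a policy choice; raising an error (and showing the user a friendly message) is the simpler and more auditable option.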

In the System Prompt / Agent Instructions: explicitly state that the agent must never follow requests to ignore previous rules, change its role, or reveal internal instructions, and that it should refuse such requests.

On Tools (Server Actions): always validate, on the Action side, whether the user is allowed to perform the requested operation, whether the parameters make sense, and whether there are any dangerous requests (e.g. very broad queries, cross‑tenant IDs, destructive operations). Treat every call coming from the agent as “untrusted” until it passes your own checks.
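A minimal sketch of that kind of tool-side guard (the allowed-operation list and the tenant/parameter model are invented purely for illustration):

```python
# Hypothetical server-side guard for a tool call coming from the agent.
# Default-deny: only explicitly listed, non-destructive operations pass.
ALLOWED_OPERATIONS = {"read_order", "list_invoices"}

def authorize_tool_call(user_tenant: str, operation: str, params: dict) -> None:
    """Validate an agent-initiated call before executing it; raise on refusal."""
    if operation not in ALLOWED_OPERATIONS:
        raise PermissionError(f"operation not allowed: {operation}")
    # Reject cross-tenant access: every call must stay in the caller's tenant.
    if params.get("tenant_id") != user_tenant:
        raise PermissionError("cross-tenant access denied")
    # Reject overly broad queries (missing mandatory filters).
    if operation == "list_invoices" and "customer_id" not in params:
        raise PermissionError("query too broad: customer_id required")
```

The key design choice is that authorization uses the authenticated user's identity from the session, never anything the agent itself claims in its output.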

On output: apply a response filter to detect whether the returned text is exposing the system prompt, internal action names, keys, internal endpoints, etc., and if something looks suspicious, either block it or ask the agent to reformulate the answer without sensitive details.
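A simple output-side filter along those lines might look like this (the leak markers are illustrative assumptions; a real list would include your actual internal hostnames, action names, and key formats):

```python
import re

# Hypothetical response filter: flag answers that appear to leak internals.
LEAK_MARKERS = [
    re.compile(r"system prompt", re.IGNORECASE),  # prompt disclosure
    re.compile(r"sk-[A-Za-z0-9]{10,}"),           # API-key-like tokens
    re.compile(r"https?://internal\."),           # internal endpoints
]

def response_is_safe(answer: str) -> bool:
    """Return False when the agent's answer matches a known leak marker."""
    return not any(m.search(answer) for m in LEAK_MARKERS)
```

When this returns False you can either block the response or loop back and ask the agent to reformulate without the sensitive details, as described above.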

3 and 4 – Regarding roadmap, there is currently no public information about a native engine for prompt‑injection detection or built‑in guardrails, although I do believe that would be a natural evolution. 

Until such a solution exists, you can always rely on a third‑party provider (Operant, CodeIntegrity). A common pattern is to place a dedicated “guardrail layer” or security service between your app and the model (for example, an agent/service dedicated to classifying and cleaning prompts and responses before they reach the main agent).

This answer was AI-generated. Please read it carefully and use the forums for clarifications.