Agent Guardrails: Apply Safety and Compliance Rules to Your Agents (Beta)
When you only have a few agents, running them without built-in safety checks might be fine. But as teams ship more agentic apps with Agent Workbench, it becomes hard to answer basic questions at scale: How do we block profanity or filter personally identifiable information? How do we prevent prompt injection and ensure our AI usage adheres to internal policies?
With this release, Agent Workbench provides a built-in way to validate what goes into and comes out of an Agent’s LLM calls, so guardrails can be applied directly in the platform rather than bolted on later.
To support governance and compliance at scale, we built a foundation that lets organizations enable predefined safety rules and lets developers apply them to specific Agents. This provides an initial layer of control and trust today—and sets the stage for more advanced, mandatory governance over time.
Introducing Agent Guardrails
Agent Guardrails are shared safety and compliance controls that validate what an AI Agent can receive (inputs) and produce (outputs), so teams can scale agent usage without increasing risk. System Administrators can configure the following predefined Guardrail Rules:
- Content Safety: Detects and blocks harmful, unsafe, or policy-violating content in prompts or responses.
- PII Filtering: Identifies sensitive data (e.g., personal identifiers) and masks or blocks it to prevent leakage.
- Prompt Injection Prevention: Detects attempts to override instructions, exfiltrate data, or manipulate the agent’s behavior.
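To make the PII Filtering idea concrete, here is a minimal Python sketch of what masking could look like in principle. It is not how Agent Workbench implements the rule: the two regular expressions, the placeholder tokens, and the `mask_pii` function are all assumptions made up for illustration, and real PII detection covers far more than email addresses and US-style Social Security numbers.

```python
import re

# Hypothetical patterns for illustration only; not the platform's detection logic.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_pii(text: str) -> tuple[str, int]:
    """Return the text with matches replaced by placeholders, plus the match count."""
    masked, n_email = EMAIL.subn("[EMAIL]", text)
    masked, n_ssn = US_SSN.subn("[SSN]", masked)
    return masked, n_email + n_ssn


masked_text, hits = mask_pii("Contact jane@example.com, SSN 123-45-6789")
print(masked_text, hits)
```

Returning the match count alongside the masked text lets a caller decide whether to mask, block, or simply record the event.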
When an Agent runs, Guardrails automatically validate its inputs and outputs against the active rules. If a rule is violated, the system can:
- Block the response,
- Mask sensitive content, or
- Log the event (based on configuration).
You also get diagnostics that show which rules ran, what happened, and whether each check passed or failed—making it faster to troubleshoot and validate agent behavior against your policies.
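The validate-then-act flow and the per-rule diagnostics described above can be sketched in a few lines of Python. This is a conceptual model only, not the Agent Workbench API: `Action`, `Rule`, `CheckResult`, `apply_guardrails`, and the two toy detectors are hypothetical names invented for this example. The sketch also assumes that a blocking rule stops evaluation immediately, which the release notes do not specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Action(Enum):
    BLOCK = "block"
    MASK = "mask"
    LOG = "log"


@dataclass
class Rule:
    name: str
    action: Action
    # Hypothetical detector: returns the offending substring, or None if the check passes.
    detect: Callable[[str], Optional[str]]


@dataclass
class CheckResult:
    rule: str
    passed: bool
    action_taken: Optional[str]


def apply_guardrails(text: str, rules: List[Rule]) -> Tuple[Optional[str], List[CheckResult]]:
    """Run each rule in order; return the (possibly masked) text and a diagnostics list.

    The returned text is None when a BLOCK rule is violated.
    """
    diagnostics: List[CheckResult] = []
    for rule in rules:
        hit = rule.detect(text)
        if hit is None:
            diagnostics.append(CheckResult(rule.name, True, None))
            continue
        diagnostics.append(CheckResult(rule.name, False, rule.action.value))
        if rule.action is Action.BLOCK:
            return None, diagnostics  # assumption: stop at the first blocking violation
        if rule.action is Action.MASK:
            text = text.replace(hit, "[MASKED]")
        # Action.LOG: the violation is recorded in diagnostics and the text passes through.
    return text, diagnostics


# Toy detectors, for illustration only.
def find_secret(text: str) -> Optional[str]:
    return "hunter2" if "hunter2" in text else None


def find_override(text: str) -> Optional[str]:
    phrase = "ignore previous instructions"
    return phrase if phrase in text.lower() else None


RULES = [
    Rule("pii-filtering", Action.MASK, find_secret),
    Rule("prompt-injection", Action.BLOCK, find_override),
]

output, report = apply_guardrails("my password is hunter2", RULES)
for check in report:
    print(check.rule, "passed" if check.passed else f"failed ({check.action_taken})")
```

The same function can be applied to both the prompt sent to the LLM and the response it returns, which mirrors the input and output validation described above.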
Agent Guardrails is available in Beta, and we’re excited for you to try it and share feedback. More improvements are already underway and coming soon.
Find out more about Agent Guardrails