How to QA an AI Agent Before Shipping to Customers
AI agent QA is harder than LLM QA - tool use, multi-step flows, and compounded non-determinism create unique failure modes. Here is a practical testing framework.
AI agents - LLMs that take actions, use tools, and execute multi-step tasks - are the fastest-growing category of AI product deployment. They are also the hardest to QA.
A single-turn LLM QA test verifies that a model responds correctly to an input. An agent QA test must verify that a model executes a multi-step task correctly, uses tools safely, doesn’t get stuck in loops, and doesn’t take irreversible actions based on misunderstood instructions. The complexity compounds at every step.
Why AI Agent QA Is Harder Than LLM QA
Compounded non-determinism - A single LLM call has one non-deterministic step. An agent with 10 tool calls has 10 non-deterministic steps, and errors compound. A wrong decision in step 3 may produce a plausible-seeming but incorrect result in step 8 that no individual step test would catch.
Tool use surface - Every tool the agent can call is an attack surface. An attacker who can manipulate the agent into calling a tool with attacker-controlled parameters has effectively executed arbitrary code in your system.
Action irreversibility - Unlike a chatbot response that can be ignored, agent actions may be irreversible: sent emails, modified database records, executed API calls. A bad agent action may be impossible to undo.
Goal hijacking - Multi-step agents can be manipulated into pursuing attacker-specified goals rather than user-specified goals, particularly through indirect prompt injection in tool outputs.
The 5 AI Agent Failure Modes to Test
1. Tool Call Injection
What it is: An attacker manipulates the agent into calling a tool with attacker-controlled parameters - by injecting instructions into the agent’s context that override the legitimate task.
Test approach: Construct test cases where user-supplied input contains embedded tool call instructions. Verify that the agent does not execute unintended tool calls. Test both direct injection (user input) and indirect injection (tool output injection).
2. Goal Hijacking
What it is: The agent abandons the user’s stated goal and pursues an attacker-specified goal, typically delivered through indirect prompt injection in retrieved content.
Test approach: Inject goal-modifying instructions into content that the agent retrieves during task execution (documents, database records, web pages). Verify that the agent does not adopt the injected goal.
3. Context Window Manipulation
What it is: Inputs designed to fill the agent’s context window with noise or misleading information, causing the agent to lose track of the original task or make decisions based on injected context.
Test approach: Test agent behaviour when the context window is near capacity. Verify that the agent correctly prioritises the user’s original instruction over accumulated context noise.
4. Action Irreversibility Failures
What it is: The agent takes an irreversible action (sends an email, deletes a record, charges a card) based on an ambiguous or incorrect interpretation of the user’s instruction.
Test approach: Test agent behaviour on ambiguous instructions for irreversible actions. Verify that the agent asks for confirmation before taking irreversible actions when instructions are ambiguous. Test the agent’s behaviour when a preceding step produces an unexpected output.
5. Loop and Recursion Behaviour
What it is: The agent gets stuck in a loop - repeatedly calling the same tool or re-attempting a failed action - consuming resources and failing to complete the task.
Test approach: Construct test cases where tool calls return errors or ambiguous results. Verify that the agent has a well-defined retry limit and failure handling path. Verify that the agent reports failure clearly when it cannot complete the task.
A Practical Agent Testing Framework
For each agent action type (tool call category), define:
- Happy path tests - Does the agent complete the intended task correctly?
- Error handling tests - What does the agent do when a tool call fails?
- Injection tests - Can the agent be manipulated into unintended tool calls via injection?
- Boundary tests - What does the agent do with edge-case inputs (empty, oversized, malformed)?
- Irreversibility tests - Does the agent seek confirmation before irreversible actions on ambiguous instructions?
Run all tests with temperature=0 for reproducibility, then run a statistical sample at operational temperature to verify behaviour under realistic conditions.
Book an AI product QA sprint to get a structured evaluation of your AI agent’s failure modes before it ships to customers.
Frequently Asked Questions
Why is QA-ing an AI agent harder than testing a regular LLM?
Compounded non-determinism is the core challenge. A single LLM call has one non-deterministic step; an agent with 10 tool calls has 10 compounding non-deterministic steps. Errors in early steps propagate through the workflow in ways that no individual step test catches. Tool use, action irreversibility, and goal hijacking add further failure modes with no equivalent in standard LLM testing.
What is prompt injection in an AI agent and how do you test for it?
Tool call injection occurs when an attacker manipulates the agent into calling a tool with attacker-controlled parameters by embedding instructions in the agent's context. Testing requires constructing test cases where both user input and tool outputs contain embedded injection instructions, then verifying the agent does not execute unintended tool calls in either scenario.
How should an AI agent handle irreversible actions like sending emails or charging cards?
Agents should seek explicit confirmation before taking irreversible actions whenever the triggering instruction is ambiguous. QA testing should include test cases with ambiguous instructions for each irreversible action type, verifying that the agent pauses and requests confirmation rather than proceeding. Irreversible actions taken on misunderstood instructions are among the highest-severity failure modes.
What is indirect prompt injection and why is it especially dangerous for agents?
Indirect prompt injection delivers attacker-controlled instructions not through user input but through content the agent retrieves during task execution — documents, database records, or web pages. The agent executes these instructions as if they were legitimate task steps. It is harder to defend against than direct injection and is particularly dangerous for agents with broad tool access.
How do you run repeatable tests on a non-deterministic AI agent?
Run all structured tests at temperature=0 to maximize reproducibility, then run a statistical sample at the agent's operational temperature to validate behavior under realistic conditions. Define separate test suites for happy path, error handling, injection, boundary, and irreversibility scenarios for each tool call category the agent can invoke.
Complementary NomadX Services
Ship AI You Can Trust.
Book a free 30-minute AI QA scope call with our experts. We review your model, data pipeline, or AI product - and show you exactly what to test before you ship.
Talk to an Expert