March 15, 2026 · 4 min read · aiml.qa

How to QA an AI Agent Before Shipping to Customers

AI agent QA is harder than LLM QA - tool use, multi-step flows, and compounded non-determinism create unique failure modes. Here is a practical testing framework.

How to QA an AI Agent Before Shipping to Customers

AI agents - LLMs that take actions, use tools, and execute multi-step tasks - are the fastest-growing category of AI product deployment. They are also the hardest to QA.

A single-turn LLM QA test verifies that a model responds correctly to an input. An agent QA test must verify that a model executes a multi-step task correctly, uses tools safely, doesn’t get stuck in loops, and doesn’t take irreversible actions based on misunderstood instructions. The complexity compounds at every step.

Why AI Agent QA Is Harder Than LLM QA

Compounded non-determinism - A single LLM call has one non-deterministic step. An agent with 10 tool calls has 10 non-deterministic steps, and errors compound. A wrong decision in step 3 may produce a plausible-seeming but incorrect result in step 8 that no individual step test would catch.

Tool use surface - Every tool the agent can call is an attack surface. An attacker who can manipulate the agent into calling a tool with attacker-controlled parameters has effectively executed arbitrary code in your system.

Action irreversibility - Unlike a chatbot response that can be ignored, agent actions may be irreversible: sent emails, modified database records, executed API calls. A bad agent action may be impossible to undo.

Goal hijacking - Multi-step agents can be manipulated into pursuing attacker-specified goals rather than user-specified goals, particularly through indirect prompt injection in tool outputs.

The 5 AI Agent Failure Modes to Test

1. Tool Call Injection

What it is: An attacker manipulates the agent into calling a tool with attacker-controlled parameters - by injecting instructions into the agent’s context that override the legitimate task.

Test approach: Construct test cases where user-supplied input contains embedded tool call instructions. Verify that the agent does not execute unintended tool calls. Test both direct injection (user input) and indirect injection (tool output injection).

2. Goal Hijacking

What it is: The agent abandons the user’s stated goal and pursues an attacker-specified goal, typically delivered through indirect prompt injection in retrieved content.

Test approach: Inject goal-modifying instructions into content that the agent retrieves during task execution (documents, database records, web pages). Verify that the agent does not adopt the injected goal.

3. Context Window Manipulation

What it is: Inputs designed to fill the agent’s context window with noise or misleading information, causing the agent to lose track of the original task or make decisions based on injected context.

Test approach: Test agent behaviour when the context window is near capacity. Verify that the agent correctly prioritises the user’s original instruction over accumulated context noise.

4. Action Irreversibility Failures

What it is: The agent takes an irreversible action (sends an email, deletes a record, charges a card) based on an ambiguous or incorrect interpretation of the user’s instruction.

Test approach: Test agent behaviour on ambiguous instructions for irreversible actions. Verify that the agent asks for confirmation before taking irreversible actions when instructions are ambiguous. Test the agent’s behaviour when a preceding step produces an unexpected output.

5. Loop and Recursion Behaviour

What it is: The agent gets stuck in a loop - repeatedly calling the same tool or re-attempting a failed action - consuming resources and failing to complete the task.

Test approach: Construct test cases where tool calls return errors or ambiguous results. Verify that the agent has a well-defined retry limit and failure handling path. Verify that the agent reports failure clearly when it cannot complete the task.

A Practical Agent Testing Framework

For each agent action type (tool call category), define:

  1. Happy path tests - Does the agent complete the intended task correctly?
  2. Error handling tests - What does the agent do when a tool call fails?
  3. Injection tests - Can the agent be manipulated into unintended tool calls via injection?
  4. Boundary tests - What does the agent do with edge-case inputs (empty, oversized, malformed)?
  5. Irreversibility tests - Does the agent seek confirmation before irreversible actions on ambiguous instructions?

Run all tests with temperature=0 for reproducibility, then run a statistical sample at operational temperature to verify behaviour under realistic conditions.

Book an AI product QA sprint to get a structured evaluation of your AI agent’s failure modes before it ships to customers.

Ship AI You Can Trust.

Book a free 30-minute AI QA scope call with our experts. We review your model, data pipeline, or AI product - and show you exactly what to test before you ship.

Talk to an Expert