AI agent QA is harder than LLM QA: tool use, multi-step flows, and compounded non-determinism create unique failure …