Agent Evals: Trajectory Quality

How to score multi-step agent behavior, tool choice, and completion efficiency.