
Phase 3 – Testing & Fixing

Test your AI agent with real customer questions, fix knowledge gaps, and retest until accuracy reaches at least 80% before moving to rollout.

Written by Dmitry
Updated over 2 months ago

This phase is about making sure your AI agent can answer real customer questions correctly before you roll it out. By testing with sample questions, fixing gaps, and aiming for consistent accuracy, you’ll capture early wins and avoid embarrassing mistakes.

Think of this phase as your dress rehearsal before going live.


Steps

1. Build your test set

Collect 30–50 simple, product-related questions.
Why it matters: A broad sample helps you spot gaps and check consistency quickly.

Where to source real questions:

  • Support tickets, email inbox, or chat transcripts

  • Help center search queries (what customers type in)

  • Common “how-to” questions your support team knows by heart

How to generate questions if you lack support history:

  • Turn document headings into questions (“reset your password” → “How do I reset my password?”); see the script sketch after the note below

  • Use an AI assistant (e.g., ChatGPT) with your documents to create 30–50 realistic, customer-style questions

Note: Keep scope tight. Use only general product and documentation questions. Skip billing, cancellations, refunds, and account-specific issues.
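If your documentation lives in markdown files, a short script can bootstrap the question list from headings. This is a minimal sketch, assuming a local docs/ folder and a single “How do I …?” template; both are placeholders to adapt to your own knowledge base export.

```python
# Minimal sketch: turn markdown headings into candidate test questions.
# The "docs/" folder and the single question template are assumptions,
# not a required setup.
import re
from pathlib import Path

def headings_to_questions(docs_dir: str) -> list[str]:
    """Convert every markdown heading into a customer-style question."""
    questions = []
    for path in Path(docs_dir).glob("**/*.md"):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.startswith("#"):
                topic = line.lstrip("#").strip().lower()
                if topic:
                    # Naive "your" -> "my" swap, so "reset your password"
                    # becomes "How do I reset my password?"
                    topic = re.sub(r"\byour\b", "my", topic)
                    questions.append(f"How do I {topic}?")
    return questions

if __name__ == "__main__":
    for question in headings_to_questions("docs")[:50]:  # cap the test set
        print(question)
```

Review the output by hand and drop anything a customer would never actually ask.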


2. Test in the Playground

Ask each question manually and track the results.
Why it matters: Direct testing shows what customers will actually see.

How to do it:

  • Enter each question in the Playground

  • Record outcome: correct, partial, or incorrect

  • Capture the source document title (and URL or ID if available)

  • Log results in a simple sheet for team review

Suggested tracking sheet columns:
question | intent/topic | outcome (correct/partial/incorrect) | source doc | notes/fix needed | owner | retest result
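If you would rather script the sheet than copy a template, a few lines of Python can create a CSV with these columns pre-filled, one row per question. The file name and sample question below are placeholders.

```python
# Minimal sketch: create the tracking sheet as a CSV with the columns
# suggested above. File name and example question are placeholders.
import csv

COLUMNS = ["question", "intent/topic", "outcome",
           "source doc", "notes/fix needed", "owner", "retest result"]

def init_tracking_sheet(questions: list[str],
                        path: str = "playground_tests.csv") -> None:
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for question in questions:
            # One row per question; the team fills in the rest as they test.
            writer.writerow([question, "", "", "", "", "", ""])

init_tracking_sheet(["How do I reset my password?"])
```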

Example Playground interface showing a sample question and the retrieved sources used for the AI’s reply.


3. Identify gaps and issues

Review all incorrect or incomplete answers.
Why it matters: Understanding the cause prevents random fixes.

Common problems:

  • Conflicting documentation → update or consolidate overlapping pages

  • Information exists but wasn’t found → confirm it’s uploaded and well-titled; refine additional guidelines

  • Missing documentation → create or update the relevant article in the knowledge base


4. Fix with the right method

Apply targeted fixes instead of patching blindly.

Fix options:

  • Update or add documentation in the knowledge base (preferred and scalable)

  • Use Q&A sparingly for sensitive or non-public information, or as a temporary bridge until docs are updated

  • Adjust additional guidelines if the agent isn’t prioritizing the right source

Example Knowledge Base Q&A entry screen for adding exceptions or sensitive information.


5. Retest until consistent

Run the same set of questions again after each fix.

  • Repeat testing until the agent answers at least 80% of the set correctly (see the scoring sketch below)

  • Confirm that previous “partial” answers are now complete and useful

  • Keep the tracking sheet updated and visible to the team

Continuous testing loop — test, identify gaps, fix, and retest until accuracy stabilizes.
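To make each retest comparable, score the sheet the same way every time. This sketch assumes the CSV from the earlier step, with outcome values of correct, partial, or incorrect; rows with no outcome yet are skipped.

```python
# Minimal sketch: read the tracking sheet and report the share of
# correct answers against the 80% threshold. Assumes the CSV created
# earlier, with an "outcome" column of correct / partial / incorrect.
import csv
from collections import Counter

def accuracy(path: str = "playground_tests.csv") -> float:
    with open(path, newline="", encoding="utf-8") as f:
        outcomes = Counter(row["outcome"].strip().lower()
                           for row in csv.DictReader(f))
    scored = sum(outcomes.values()) - outcomes[""]  # ignore untested rows
    return outcomes["correct"] / scored if scored else 0.0

rate = accuracy()
print(f"Correct: {rate:.0%} ({'ready' if rate >= 0.80 else 'keep iterating'})")
```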


6. Confirm readiness

Run structured tests to know when you’re ready to go live.

Weekly checks:

  • Test with 50–100 real customer questions (golden set)

  • Score each answer: Correct / Partial / Incorrect / Appropriately refused

  • Track groundedness: every answer should cite the right source

  • Spot-audit tone and refusals

Pass gates — move forward only if all are true (a minimal check is sketched after this list):

  • ≥80% correct answers on the golden set

  • Fallback rate <20%

  • ≥95% of escalations include transcript + key fields
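As a sanity check, the three gates can be expressed as a single boolean. The metric values below are illustrative; pull the real numbers from your tracking sheet and escalation logs.

```python
# Minimal sketch: the three pass gates as one go/no-go check.
# The example values are placeholders, not real results.
def ready_for_rollout(correct_rate: float,
                      fallback_rate: float,
                      complete_escalation_rate: float) -> bool:
    return (correct_rate >= 0.80          # golden-set accuracy
            and fallback_rate < 0.20      # share of fallback answers
            and complete_escalation_rate >= 0.95)  # transcript + key fields

print(ready_for_rollout(correct_rate=0.84,
                        fallback_rate=0.12,
                        complete_escalation_rate=0.97))  # True
```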

Note: Don’t over-tune. Once you hit these thresholds, stop iterating and move to Phase 4 (Embedding & Rollout). Defer other refinements to later phases.


Best Practices / Tips

  • Involve your support team — they know real customer phrasing.

  • Balance the test set across FAQs, how-tos, and edge cases.

  • Fix knowledge base documentation before using Q&A.

  • Track results in a shared sheet for clarity and alignment.

  • Aim for progress, not perfection — at least 80% accuracy is enough to move forward.


Common Mistakes to Avoid

  • Testing with too few questions

  • Jumping to sensitive or account-specific cases too early

  • Ignoring duplicate or conflicting documents

  • Adding too many Q&As instead of fixing the main documents

  • Forgetting to retest after changes


Expected outcome: Agent answers at least 80% of FAQs accurately in the Playground.
