Testing Methodology

The 10-Run Consistency Test

The 10-Run Consistency Test measures AI visibility far more reliably than any single query can. LLM responses are non-deterministic: the same prompt can produce a different answer each time you ask. AEO Protocol developed this methodology after finding that single tests are meaningless. A brand might appear in only 1 out of 10 runs of the same query, so a single test that happens to catch that run gives a false positive. Here is how to measure what actually matters.

Key Takeaways

  • Single tests are unreliable. LLM responses vary every time.
  • Test each query 10 times. Track mention rate, position, citation, and context.
  • 9-10/10 = Locked in. 0/10 = Invisible. Everything between needs work.
  • Test in both ChatGPT and Gemini. They use different retrieval systems.

Why Single Tests Are Meaningless

Here is something that trips up almost everyone. You ask ChatGPT your discovery query and your brand shows up. Great. Done, right? Wrong.

The False Positive

You test once. Your brand appears. You celebrate. But suppose that response was the 1 run in 10 where your brand shows up. The other 9 times, users see your competitors.

Actual visibility: 10%. Perceived visibility: 100%.

The False Negative

You test once. Your brand does not appear. You panic. But suppose that response was the 1 run in 10 where your brand is missing. The other 9 times, you do appear.

Actual visibility: 90%. Perceived visibility: 0%.

The solution: Never test once. Test ten times. Then you know whether you appear 1/10, 5/10, or 9/10 times. That is your real visibility score.
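The gap between perceived and actual visibility in the two scenarios above can be reproduced with a short Python sketch. The function names and the two lists of outcomes are illustrative assumptions, not part of the AEO Protocol tooling; each list stands for ten hypothetical runs of one query.

```python
def mention_rate(runs):
    """Share of runs in which the brand appeared (the real visibility score)."""
    return sum(runs) / len(runs)


def single_test_view(runs, index=0):
    """What one test suggests: 100% if that run mentioned the brand, else 0%."""
    return 1.0 if runs[index] else 0.0


# Hypothetical outcomes of ten runs of one query (True = brand mentioned).
false_positive = [True] + [False] * 9   # the single test caught the one lucky run
false_negative = [False] + [True] * 9   # the single test caught the one unlucky run

for label, runs in [("false positive", false_positive),
                    ("false negative", false_negative)]:
    print(f"{label}: actual {mention_rate(runs):.0%}, "
          f"perceived {single_test_view(runs):.0%}")
# false positive: actual 10%, perceived 100%
# false negative: actual 90%, perceived 0%
```

Only the ten-run figure matches what users see over time; the single-test figure depends entirely on which run you happened to observe.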

The 10-Run Test Methodology

For each important query, run this exact process.

1. Define Your Query

Choose discovery queries that matter: "Best [category] in [location]", "Top [service] for [use case]". These are how new customers find you.

2. Run 10 Times in ChatGPT

Ask the same query 10 times. Start a new conversation each time (important). Record whether your brand appears, its position, whether it is cited, and what context is given.

3. Run 10 Times in Gemini

Repeat the process in Gemini. Results often differ significantly. A brand can be invisible in ChatGPT but visible in Gemini, or vice versa.

4. Score and Analyze

Calculate your mention rate (X/10). Note position patterns, citation frequency, and how the AI describes your brand. This is your baseline.
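One way to keep the records from the steps above is a small log with one entry per run, which can then be reduced to an X/10 baseline per engine. This is a minimal Python sketch: the `RunResult` fields, the engine labels, and the sample data are all hypothetical, and the runs are assumed to be entered by hand after each fresh conversation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    engine: str                      # assumed label, e.g. "chatgpt" or "gemini"
    mentioned: bool                  # did the brand appear in this run?
    position: Optional[int] = None   # 1 = listed first; None if not mentioned
    cited: bool = False              # did the answer link to the brand's site?
    context: str = ""                # how the answer described the brand


def baseline(results):
    """Return the mention rate per engine as an 'X/N' string."""
    scores = {}
    for engine in sorted({r.engine for r in results}):
        runs = [r for r in results if r.engine == engine]
        mentions = sum(1 for r in runs if r.mentioned)
        scores[engine] = f"{mentions}/{len(runs)}"
    return scores


# Invented sample data for illustration only: ten runs per engine.
results = (
    [RunResult("chatgpt", True, position=2, cited=True, context="premium")] * 6
    + [RunResult("chatgpt", False)] * 4
    + [RunResult("gemini", True, position=4)] * 2
    + [RunResult("gemini", False)] * 8
)

print(baseline(results))  # {'chatgpt': '6/10', 'gemini': '2/10'}
```

Keeping position, citation, and context on every record means the same log can later answer questions beyond the headline mention rate.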

The Consistency Scoring Scale

Use this scale to interpret your 10-run test results.

Score | Rating | Interpretation | Action
9-10/10 | Locked In | Strong, consistent visibility. Users reliably see you. | Maintain. Monitor monthly.
7-8/10 | Good | Solid presence with minor gaps. Room to optimize. | Identify gap patterns. Target improvements.
5-6/10 | Weak | Inconsistent. Appearing but unreliable. Coin flip. | Priority optimization needed.
1-4/10 | Poor | Rarely mentioned. Major work needed. | Full AEO overhaul required.
0/10 | Invisible | You do not exist to AI on this query. | Start from foundations. First 50 Words, pricing, crawlability.
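The scale translates directly into a lookup. The following Python sketch mirrors the bands and actions listed above; the function name `rate_consistency` is an illustrative assumption, and it expects the number of mentions out of exactly 10 runs.

```python
def rate_consistency(mentions: int) -> tuple:
    """Map a mention count out of 10 runs to (rating, recommended action)."""
    if not 0 <= mentions <= 10:
        raise ValueError("mentions must be between 0 and 10")
    if mentions >= 9:
        return ("Locked In", "Maintain. Monitor monthly.")
    if mentions >= 7:
        return ("Good", "Identify gap patterns. Target improvements.")
    if mentions >= 5:
        return ("Weak", "Priority optimization needed.")
    if mentions >= 1:
        return ("Poor", "Full AEO overhaul required.")
    return ("Invisible", "Start from foundations.")


for score in (10, 7, 5, 3, 0):
    rating, action = rate_consistency(score)
    print(f"{score}/10 -> {rating}: {action}")
```

Applying it separately to each engine's score makes it obvious when a brand is strong on one engine and weak on the other.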

Real Example: FueGenix Audit

Here is what a 10-run consistency test looks like in practice. This is from a real audit of a premium hair transplant clinic.

Query: "Best hair transplant clinic in Netherlands"

Before Optimization

ChatGPT: ~5/10
Gemini: 0/10
Status: Invisible on key engine

After 30 Days

ChatGPT: 9/10
Gemini: 7/10
Status: Locked in

The result: From invisible (0/10 on Gemini) to being called "the best for artistic perfection" globally. The 10-run test revealed the problem. AEO optimization fixed it.

Four Metrics to Track in Each Run

Beyond simple mention rate, track these four dimensions for a complete picture.

1. Mention Rate

How many times out of 10 does your brand appear? This is your headline visibility number.

2. Position

When mentioned, are you first, third, or fifth? First position carries significantly more weight.

3. Citation

Does the LLM link back to your website? Citations build trust and drive traffic.

4. Context

What does the AI say about you? "Premium" vs "budget"? "Best for X" vs "one option"? This is your AI reputation.
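The four metrics can be computed together from one engine's ten run records. This is a minimal Python sketch under stated assumptions: each run is a plain dictionary with the hypothetical keys `mentioned`, `position`, `cited`, and `context`, and the sample runs are invented for illustration.

```python
from collections import Counter
from statistics import mean


def summarize(runs):
    """Summarize mention rate, position, citation, and context for one engine."""
    mentioned = [r for r in runs if r["mentioned"]]
    positions = [r["position"] for r in mentioned if r.get("position") is not None]
    return {
        "mention_rate": f"{len(mentioned)}/{len(runs)}",
        "average_position": mean(positions) if positions else None,
        "first_position_count": sum(1 for p in positions if p == 1),
        "citation_count": sum(1 for r in mentioned if r.get("cited")),
        "context_labels": Counter(r["context"] for r in mentioned if r.get("context")),
    }


# Invented sample: three mentions in ten runs.
runs = [
    {"mentioned": True, "position": 1, "cited": True, "context": "premium"},
    {"mentioned": True, "position": 2, "cited": False, "context": "premium"},
    {"mentioned": True, "position": 3, "cited": True, "context": "one option"},
] + [{"mentioned": False}] * 7

summary = summarize(runs)
print(summary["mention_rate"])       # 3/10
print(summary["average_position"])   # 2
print(summary["citation_count"])     # 2
print(summary["context_labels"].most_common(1))  # [('premium', 2)]
```

Position and context are averaged only over the runs where the brand was mentioned, so a low mention rate does not distort them.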

Ready to Test Your AI Visibility?

The 10-run test is one component of the complete AEO audit methodology. Get the full checklist with query templates, scoring sheets, and optimization playbook.