The 10-Run Consistency Test
The 10-Run Consistency Test is the only reliable way to measure AI visibility. LLM responses are non-deterministic: the same question can produce a different answer every time you ask. AEO Protocol developed this methodology after finding that single tests are meaningless. A brand might appear in only 1 out of 10 queries, so a single test can easily return a false positive. Here is how to measure what actually matters.
Key Takeaways
- Single tests are unreliable. LLM responses vary every time.
- Test each query 10 times. Track mention rate, position, citation, and context.
- 9-10/10 = Locked in. 0/10 = Invisible. Everything in between needs work.
- Test in both ChatGPT and Gemini. They use different retrieval systems.
Why Single Tests Are Meaningless
Here is something that trips up almost everyone. You ask ChatGPT your discovery query and your brand shows up. Great. Done, right? Wrong.
The False Positive
You test once. Your brand appears. You celebrate. But that response was 1 out of 10 possible outcomes. The other 9 times, users see your competitors.
Actual visibility: 10%. Perceived visibility: 100%.
The False Negative
You test once. Your brand does not appear. You panic. But that response was also 1 out of 10 possible outcomes. The other 9 times, you do appear.
Actual visibility: 90%. Perceived visibility: 0%.
The solution: Never test once. Test ten times. Then you know whether you appear 1/10, 5/10, or 9/10 times. That is your real visibility score.
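The gap between perceived and actual visibility follows directly from binomial statistics. Here is a short sketch (plain Python, no external dependencies) of why a single test always misleads while a 10-run estimate usually lands close to the truth:

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k mentions in n independent runs."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

true_rate = 0.10  # suppose the brand actually appears in 10% of responses

# A single test can only ever report 0% or 100% visibility.
p_looks_visible = true_rate        # run shows the brand: looks like 100%
p_looks_invisible = 1 - true_rate  # run misses it: looks like 0%

# A 10-run test reports k/10. How often is that within 10 points of the
# true 10% rate, i.e. k in {0, 1, 2}?
p_close = sum(binom_pmf(k, 10, true_rate) for k in (0, 1, 2))
print("single test: always reports 0% or 100%, never 10%")
print(f"10-run estimate within 10 points of the truth: {p_close:.0%}")
```

With a true rate of 10%, the 10-run estimate lands within 10 points of the truth roughly 93% of the time; the single test never does.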
The 10-Run Test Methodology
For each important query, run this exact process.
Define Your Query
Choose discovery queries that matter: "Best [category] in [location]", "Top [service] for [use case]". These are how new customers find you.
Run 10 Times in ChatGPT
Ask the same query 10 times. Start a new conversation each time; this matters, because earlier answers in the same thread can influence later ones. Record whether your brand appears, its position, whether it is cited, and what context is given.
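As a sketch of what "record" can mean in practice, here is a minimal Python scorer for a single response. The function name, the numbered-list position heuristic, and the example brand/domain strings are illustrative assumptions, not part of any official tool:

```python
import re

def score_run(response: str, brand: str, domain: str) -> dict:
    """Score one LLM response on the four tracked metrics.

    `brand` and `domain` are whatever you search for,
    e.g. "FueGenix" and "fuegenix.com" (illustrative values).
    """
    lower = response.lower()
    mentioned = brand.lower() in lower
    # Position: which numbered list item first names the brand, if any.
    position = None
    for i, item in enumerate(re.split(r"\n\s*\d+[.)]\s*", response)[1:], start=1):
        if brand.lower() in item.lower():
            position = i
            break
    cited = domain.lower() in lower
    # Context: the first sentence that mentions the brand.
    context = next(
        (s.strip() for s in re.split(r"(?<=[.!?])\s+", response)
         if brand.lower() in s.lower()),
        None,
    )
    return {"mentioned": mentioned, "position": position,
            "cited": cited, "context": context}
```

A regex heuristic like this only finds positions for brands named in numbered lists; prose-only mentions still count as mentioned but carry no position.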
Run 10 Times in Gemini
Repeat the process in Gemini. Results often differ significantly. A brand can be invisible in ChatGPT but visible in Gemini, or vice versa.
Score and Analyze
Calculate your mention rate (X/10). Note position patterns, citation frequency, and how the AI describes your brand. This is your baseline.
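A minimal way to turn ten per-run records into that baseline, assuming each record is a dict with `mentioned`, `position`, and `cited` keys (an illustrative shape, not a prescribed format):

```python
def summarize_runs(runs: list[dict]) -> dict:
    """Aggregate per-run scores into a baseline for one query.

    Each run dict is assumed to look like
    {"mentioned": bool, "position": int | None, "cited": bool}.
    """
    n = len(runs)
    mentions = [r for r in runs if r["mentioned"]]
    positions = [r["position"] for r in mentions if r["position"] is not None]
    return {
        "mention_rate": f"{len(mentions)}/{n}",
        "avg_position": round(sum(positions) / len(positions), 1) if positions else None,
        "citation_rate": f"{sum(r['cited'] for r in runs)}/{n}",
    }
```

Run this once per query per model (ChatGPT and Gemini separately) so the baselines stay comparable.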
The Consistency Scoring Scale
Use this scale to interpret your 10-run test results.
| Score | Rating | Interpretation | Action |
|---|---|---|---|
| 9-10/10 | Locked In | Strong, consistent visibility. Users reliably see you. | Maintain. Monitor monthly. |
| 7-8/10 | Good | Solid presence with minor gaps. Room to optimize. | Identify gap patterns. Target improvements. |
| 5-6/10 | Weak | Inconsistent. Appearing but unreliable. Coin flip. | Priority optimization needed. |
| 1-4/10 | Poor | Rarely mentioned. Major work needed. | Full AEO overhaul required. |
| 0/10 | Invisible | You do not exist to AI on this query. | Start from foundations. First 50 Words, pricing, crawlability. |
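The scale above translates directly into a small helper; a sketch in Python:

```python
def rate_score(mentions: int) -> str:
    """Map a 10-run mention count to its consistency rating."""
    if not 0 <= mentions <= 10:
        raise ValueError("mentions must be between 0 and 10")
    if mentions >= 9:
        return "Locked In"
    if mentions >= 7:
        return "Good"
    if mentions >= 5:
        return "Weak"
    if mentions >= 1:
        return "Poor"
    return "Invisible"
```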
Real Example: FueGenix Audit
Here is what a 10-run consistency test looks like in practice. This is from a real audit of a premium hair transplant clinic.
Query: "Best hair transplant clinic in Netherlands"
Before optimization: 0/10 on Gemini. The clinic was invisible on its core discovery query.
After 30 days of AEO optimization: the same query returned the clinic, described as "the best for artistic perfection" globally.
The 10-run test revealed the problem. AEO optimization fixed it.
Four Metrics to Track in Each Run
Beyond simple mention rate, track these four dimensions for a complete picture.
1. Mention Rate
How many times out of 10 does your brand appear? This is your headline visibility number.
2. Position
When mentioned, are you first, third, or fifth? First position carries significantly more weight.
3. Citation
Does the LLM link back to your website? Citations build trust and drive traffic.
4. Context
What does the AI say about you? "Premium" vs "budget"? "Best for X" vs "one option"? This is your AI reputation.