Most GEO programmes die in their second quarter. The work is real, the budget is approved, the audits look credible — and then leadership asks “is this working?” and nobody has a good answer.
The problem isn't the work. It's the measurement. Classical SEO has 20 years of agreed-upon metrics; GEO has none. Teams default to whatever's easy to count, end up tracking the wrong things, and lose the political case for continued investment.
This article covers the four metrics that actually matter for GEO, the vanity metrics to ignore, and how to set up tracking that survives leadership review.
The four metrics that actually matter
1. Citation share (the headline metric)
Citation share is the percentage of your target buyer-intent queries where AI assistants name your business when answering. If you have 50 target queries and you appear in 12 of them across ChatGPT/Claude/Gemini/Perplexity, your citation share is 24%.
This metric maps directly to revenue. Every query where AI names you is a moment a prospective buyer is shortlisting vendors. Every query where AI doesn't name you is a moment they're shortlisting your competitors. Citation share is the AI-search-era equivalent of organic ranking position — and unlike rank, it's binary in a useful way: you're either in the answer or you're not.
Track citation share by:
- Platform — ChatGPT vs Claude vs Gemini vs Perplexity (different platforms, different signals)
- Query category — “best X” queries vs alternative queries vs informational queries
- Position when cited — first vendor named vs third vendor named vs buried at the end
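As a minimal sketch of the calculation, here's what it looks like if each weekly check is stored as a simple record. The field names and values are illustrative assumptions, not a prescribed schema:

```python
from collections import defaultdict

# One record per (query, platform) check from the weekly run. Field names are
# illustrative; adapt them to however you actually store results.
results = [
    {"query": "best crm for startups", "platform": "chatgpt", "cited": True},
    {"query": "best crm for startups", "platform": "claude", "cited": False},
    {"query": "hubspot alternatives", "platform": "perplexity", "cited": True},
    # ... one record per query per platform
]

def citation_share(records):
    """Share of checks where the business was named, overall and per platform."""
    overall = sum(r["cited"] for r in records) / len(records)
    by_platform = defaultdict(list)
    for r in records:
        by_platform[r["platform"]].append(r["cited"])
    per_platform = {p: sum(v) / len(v) for p, v in by_platform.items()}
    return overall, per_platform

overall, per_platform = citation_share(results)
print(f"Citation share: {overall:.0%}")  # e.g. 24% if 12 of 50 checks name you
for platform, share in sorted(per_platform.items()):
    print(f"  {platform}: {share:.0%}")
```

The same grouping trick works for query category and position; only the key you group by changes.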
2. Citation share trend (week over week)
A snapshot of citation share is a baseline. The metric you defend in leadership reviews is the trend. “We went from 12% to 31% citation share in 60 days, with a stable query set and consistent measurement methodology” is a story leadership can understand.
Track week-over-week. Daily is too noisy (AI responses vary slightly per query); monthly is too slow to catch problems. Weekly with a 4-week rolling average smooths the noise without lagging too much.
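A small sketch of that smoothing, assuming you keep one citation-share number per week (the values below are hypothetical):

```python
def rolling_average(weekly_shares, window=4):
    """4-week rolling average of citation share; early weeks use what's available."""
    smoothed = []
    for i in range(len(weekly_shares)):
        window_vals = weekly_shares[max(0, i - window + 1): i + 1]
        smoothed.append(sum(window_vals) / len(window_vals))
    return smoothed

# Eight hypothetical weeks of citation share
weekly = [0.12, 0.14, 0.13, 0.18, 0.21, 0.24, 0.27, 0.31]
print(rolling_average(weekly))  # the line you actually plot and defend in reviews
```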
3. Competitor citation share
Citation share in isolation tells you whether you're visible. Citation share relative to your top competitors tells you whether you're winning or losing.
For each of your top 3-5 competitors, track their citation share on the same query set you track for yourself. The chart that matters is the gap between you and the leader, and whether the gap is closing or widening.
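If you keep the same weekly series for each competitor, the gap check is a few lines. A sketch with hypothetical competitor names and numbers:

```python
# Weekly citation share for you and each tracked competitor (hypothetical values)
series = {
    "us":           [0.12, 0.18, 0.24, 0.31],
    "competitor_a": [0.45, 0.46, 0.47, 0.47],
    "competitor_b": [0.20, 0.19, 0.21, 0.22],
}

# Leader = whoever holds the highest citation share this week
leader = max((k for k in series if k != "us"), key=lambda k: series[k][-1])
gap_now = series[leader][-1] - series["us"][-1]
gap_prev = series[leader][-2] - series["us"][-2]
direction = "closing" if gap_now < gap_prev else ("widening" if gap_now > gap_prev else "holding")
print(f"Gap to {leader}: {gap_now:.0%} and {direction}")
```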
4. Sentiment + position quality
Being mentioned is binary. How you're mentioned isn't. AI assistants sometimes describe your product accurately, sometimes vaguely, sometimes incorrectly. Sometimes they list you first; sometimes they bury you in a footnote.
For each citation, capture:
- Position — first, second, third, or later
- Description quality — accurate, vague, or wrong
- Sentiment — positive, neutral, negative
- Recommendation strength — “the best” vs “another option”
A business cited as “the leading platform for X” produces dramatically more downstream conversion than one cited as “another option to consider.” Track the difference.
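One way to keep those four fields consistent week to week is to give every citation the same shape. A minimal sketch; the allowed values mirror the list above and the examples are hypothetical:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class CitationQuality:
    """Quality of one citation of your business in one AI response."""
    query: str
    platform: str
    position: int                                     # 1 = first vendor named
    description: Literal["accurate", "vague", "wrong"]
    sentiment: Literal["positive", "neutral", "negative"]
    strength: Literal["the best", "recommended", "another option"]

# A strong citation vs a weak one for the same query
strong = CitationQuality("best crm for startups", "chatgpt", 1, "accurate", "positive", "the best")
weak = CitationQuality("best crm for startups", "gemini", 4, "vague", "neutral", "another option")
```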
The vanity metrics that don't predict revenue
Several metrics look meaningful but don't. Avoid building dashboards around them.
Schema validation pass rate
“100% of pages have valid schema” sounds like progress. It isn't a progress metric — it's a hygiene checkpoint. Once you have schema, having more schema doesn't produce more citations. The work is implementing it once, not improving the validation rate.
Number of pages indexed by AI crawlers
Counting GPTBot or ClaudeBot visits in your access logs is interesting once. Optimising for crawler frequency isn't a goal — being citable is. A page can be crawled weekly and never cited because the content isn't cite-worthy. Tracking crawler hits is the AI version of obsessing over robots.txt syntax.
Domain authority / SEO scores
Useful for SEO, weak proxy for GEO. A page with high domain authority can be invisible to AI for category queries if its content reads as marketing rather than information. Don't mistake DA improvements for GEO progress.
Prompt count tested
“We tested 10,000 prompts last month” sounds like rigour but optimises for breadth at the expense of signal. The 50-100 queries that match your buyer's actual research patterns matter; the long tail of edge cases doesn't. A focused weekly test on a small, stable query set produces clearer signal than scattershot breadth.
AI Overview impression count
Some teams have started tracking how often Google AI Overviews appear for their target queries (regardless of whether they're cited in them). This is a market-condition metric, not a performance metric — it tells you how big the AI surface is in your category, but says nothing about whether your work is moving the needle.
Track citation share automatically
LynxAudit runs your target queries through ChatGPT, Claude, Gemini, and Perplexity weekly, tracks citation share trends across all four, and alerts you when competitors gain ground. Free first audit, two minutes.
How to set up your query list
The query list is the foundation. Get it wrong and your metrics measure the wrong thing. Three principles:
Mix the four query types
A good query list has roughly 40% best-of, 25% alternative, 20% use-case, and 15% informational. Skewing entirely to best-of misses the long-tail entity work; skewing entirely to informational misses the highest-conversion queries.
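As a quick worked example of what that mix means for a 50-query list (the rounding is a judgment call, not a rule):

```python
MIX = {"best-of": 0.40, "alternative": 0.25, "use-case": 0.20, "informational": 0.15}
LIST_SIZE = 50

# Target counts for a 50-query list: 20 / 12 / 10 / 8 after rounding
targets = {category: round(share * LIST_SIZE) for category, share in MIX.items()}
print(targets)
```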
Use buyer language, not category jargon
Don't write the queries you wish your buyers asked; write the queries they actually ask. Pull from intake call notes, customer interviews, and the “people also ask” section in Google for your top keywords. Specifically avoid queries containing your brand name — those are a separate metric (branded visibility).
Lock the list, then iterate slowly
The query list should be stable for 90 days at minimum. If you add new queries every week, your trend line is meaningless because the underlying measurement basis keeps shifting. Lock 50-100 queries; review and refresh quarterly.
The measurement methodology that holds up
For each query in your list, you need to record, on each platform every week, whether your business is named, plus the quality fields covered earlier (position, description, sentiment). There are two ways to collect that data:
Manual measurement: build a spreadsheet, run each query weekly on each platform (logged out, incognito, fresh browser session), record what you see. For 50 queries × 4 platforms × 4 fields, that's 800 data points/week. Roughly 4-6 hours of work, depending on speed.
Automated measurement: tools like LynxAudit, Profound, AthenaHQ, and Otterly use the OpenAI/Anthropic/Google APIs to run the queries programmatically and parse the responses. Same data, fraction of the time. Budget around $50-300/month depending on query volume and platform coverage.
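For a sense of what the automated approach looks like against one platform, here's a minimal sketch using OpenAI's official Python SDK. The brand aliases are hypothetical, and the substring match is deliberately naive; real tools do considerably more response parsing:

```python
from openai import OpenAI  # official OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

BRAND_ALIASES = ["Acme CRM", "AcmeCRM", "acmecrm.com"]  # hypothetical brand names

def run_query(query: str, model: str = "gpt-4o") -> str:
    """Ask one buyer-intent query and return the raw answer text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
        temperature=0,
    )
    return response.choices[0].message.content

def is_cited(answer: str) -> bool:
    """Naive check: is any brand alias named in the answer?"""
    answer_lower = answer.lower()
    return any(alias.lower() in answer_lower for alias in BRAND_ALIASES)

answer = run_query("best CRM for early-stage startups")
print(is_cited(answer))
```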
Things to control for
AI responses vary across runs even with identical inputs. To get reliable signal:
- Run each query 3-5 times, average the results
- Use the same model version over time (e.g., GPT-4o specifically, not “ChatGPT” in general)
- Strip personalisation — anonymous browser, no logged-in account, no location overrides
- Hold the temperature constant if using API access (most automated tools default to temperature 0.7; some use 0)
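And a small sketch of the repeat-and-average control, written so it can wrap whatever query runner you use (for example the run_query and is_cited helpers from the sketch above):

```python
def citation_rate(query: str, run_query_fn, is_cited_fn, runs: int = 5) -> float:
    """Run the same query several times and return the share of runs that cite you."""
    hits = sum(is_cited_fn(run_query_fn(query)) for _ in range(runs))
    return hits / runs

# e.g. with the helpers from the previous sketch:
# rate = citation_rate("best CRM for early-stage startups", run_query, is_cited)
# cited = rate >= 0.5   # count the query as cited only if a majority of runs name you
```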
The dashboard that actually works
After tracking dozens of GEO programmes, a clear pattern emerges. The dashboard that dies in executive review looks like this:
“Schema markup pass rate: 98% | Pages indexed: 1,247 | AI Overview appearances: 387 | Average AI sentiment: 'positive'”
The dashboard that survives looks like this:
“Citation share this week: 31% (+5pp) | Trend (4-week MA): rising | Top competitor: 47% (gap closing) | Top opportunity query: 'best CRM alternatives' (we appear 0/4)”
It has four cards:
Card 1: Citation share this week
Top-line number, with the delta vs the prior week. Colour-coded: green if up, red if down, grey if flat.
Card 2: 12-week trend
Line chart showing weekly citation share over the last quarter. Annotate major events (content launches, product changes, competitor moves) so context survives in the chart.
Card 3: Competitive gap
Your citation share next to your top 3-5 competitors. Even one chart is enough: are you gaining, holding, or losing ground? Most boards understand this immediately.
Card 4: Top 3 opportunity queries
The 3 queries with the largest gap between you (low citation) and competitors (high citation). These are the obvious next targets for content investment. Refresh weekly.
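A sketch of how that card gets computed, assuming you track per-query citation counts for yourself and your competitors. The data shape and values are hypothetical:

```python
# Per-query citation counts out of 4 platforms, for you vs the best competitor (hypothetical)
per_query = {
    "best CRM alternatives":     {"us": 0, "competitors_max": 4},
    "best crm for startups":     {"us": 2, "competitors_max": 3},
    "hubspot alternatives":      {"us": 1, "competitors_max": 4},
    "crm with email automation": {"us": 3, "competitors_max": 3},
}

def top_opportunities(data, n=3):
    """Queries where competitors are cited far more often than you."""
    gaps = {q: v["competitors_max"] - v["us"] for q, v in data.items()}
    return sorted(gaps, key=gaps.get, reverse=True)[:n]

print(top_opportunities(per_query))  # refresh weekly; these drive the next content investments
```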
How to handle the “is GEO working?” conversation
The leadership review you'll have at month 3 of any GEO programme follows a predictable pattern. Here's how to handle it:
What they'll ask
“What's the ROI on GEO?”
What they actually want
Evidence that the work is connected to revenue, framed in metrics they can intuit.
How to answer
Show citation share rising over time. Pair it with one or two specific deals where the prospect mentioned discovering you through AI search. Acknowledge that direct ROI attribution is imperfect and propose the gap analysis: “Our top three competitors appear in 78% of our target queries; we appear in 31%. Closing that gap is worth roughly $X/month based on these assumptions.”
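For the "worth roughly $X/month" line, a back-of-envelope sketch helps. Every input below is a hypothetical assumption to be replaced with your own funnel numbers; the point is to show the shape of the estimate, not the answer:

```python
# All inputs are hypothetical assumptions -- replace with your own funnel data.
monthly_ai_researchers = 2000   # buyers researching your category via AI per month
competitor_share = 0.78         # top competitors' citation share
our_share = 0.31                # our current citation share
shortlist_to_deal_rate = 0.03   # researcher -> closed deal conversion
avg_deal_value = 5000           # dollars per closed deal

# Value of closing the gap: extra researchers who would see us in AI answers
extra_reach = monthly_ai_researchers * (competitor_share - our_share)
estimated_value = extra_reach * shortlist_to_deal_rate * avg_deal_value
print(f"~${estimated_value:,.0f}/month if the gap fully closes")
```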
If you don't have specific deals to attribute, that's also data: the work hasn't produced measurable downstream behaviour yet. Either it's too early (compounding hasn't kicked in), the wrong queries are being targeted, or GEO isn't a fit for your business. The honest read is more credible than spin.
Frequently asked questions
Should I track AI traffic in Google Analytics?
AI assistants increasingly send referrer traffic that shows up in your analytics with sources like “chatgpt.com,” “perplexity.ai,” and similar. Worth tracking as a supporting metric — confirms users are clicking through after AI cites you. But traffic volume alone underestimates GEO impact, because most AI users get their answer without clicking. Citation share remains the leading indicator; traffic is a lagging confirmation.
How many queries should I track?
50-100. Below 50 and your sample is too small to detect real changes from noise. Above 100 and the marginal queries dilute focus and make weekly tracking expensive. The 50-100 range gives stable signal with manageable measurement load.
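A quick sanity check on why 50 is a reasonable floor, treating each query as an independent yes/no observation (a simplification, since queries within a category are correlated):

```python
import math

def standard_error(share: float, n_queries: int) -> float:
    """Binomial standard error of a citation-share estimate from n yes/no observations."""
    return math.sqrt(share * (1 - share) / n_queries)

for n in (25, 50, 100):
    se = standard_error(0.25, n)
    print(f"n={n}: +/- {se:.1%} of pure sampling noise around a 25% share")
```

At 25 queries, week-to-week swings of several points are indistinguishable from noise; at 50-100 the noise band is tight enough that a real trend shows through.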
How often should I refresh the query list?
Quarterly. Add new queries that emerge from customer conversations, retire queries that consistently return generic non-business answers (poor signal), keep the core set stable so trends remain comparable.
What's a realistic citation share target?
Depends on category competitiveness. In fragmented categories (50+ vendors competing), 20-30% citation share is dominant. In consolidated categories (5 incumbents), the leader often holds 70-90%. Map your competitive landscape first; pick the target accordingly. As a rule of thumb: aim for top-3 share in your category, regardless of absolute number.
Can I track Google AI Overviews specifically?
Yes, but separately. AI Overviews use Google's ranked organic results as their candidate pool, so the dynamics differ from open-web AI assistants. Track AI Overview citation share as a separate metric from your ChatGPT/Claude/Gemini/Perplexity blend.
How do I know if my measurement is reliable?
Run the same query manually 5 times across a week and compare to your tracking tool's results. If the manual and automated measurements agree on whether your business is named, the tool is reliable. If they disagree more than 20% of the time, your methodology has a noise problem to fix.
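A tiny sketch of the comparison, assuming you've recorded parallel yes/no results from the manual spot checks and the tool:

```python
manual = [True, True, False, True, True]   # your 5 manual spot checks
tool   = [True, True, False, True, False]  # the tool's result for the same runs

disagreement = sum(m != t for m, t in zip(manual, tool)) / len(manual)
print(f"Disagreement: {disagreement:.0%}")  # above 20% suggests a noise problem
```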
Bottom line
GEO measurement isn't complicated, but it requires resisting the temptation to track what's easy. Citation share is the metric. Trend is the story. Competitor gap is the political case. Quality of citation is the conversion lever.
Set up the four-card dashboard, lock a 50-100 query list, run weekly, refresh quarterly. The teams that get traction with GEO are the ones who measured what mattered from week one — and the ones who didn't are mostly the ones whose programmes died in budget review.
