DOCUMENT TSC-2026/B119 · BLOG POST 119 · CONSUMER COMMERCE · REV. 01
FILED UNDER AI Visibility · Measurement · Analytics

How to actually measure AI visibility: the GA4 + citation setup.

Zero-click AI answers broke attribution. Here's the measurement stack I run on my own site: the four metrics that matter, GA4's AI Assistant channel, server-log crawler monitoring, and the tools that tell you whether the assistants cite you.

Author: Taylor Sicard
Published: June 2026
Read: 30 min · ~7,100 words
Ring: I · Consumer Commerce

About the author: Taylor Sicard

Early Shopify employee who built the partner program, sold getuptime.co to Tiny, and co-founded WIN Brands Group, a nine-figure DTC operator and acquirer. Runs AI-referral tracking on taylorsicard.com, which classifies traffic from ChatGPT, Perplexity, Claude, Gemini, and the rest. Advises DTC brands and Shopify ecosystem SaaS on measurement, growth, and the move from search visibility to AI visibility.


Here's the uncomfortable truth about AI visibility in 2026: most brands have no idea whether ChatGPT, Perplexity, Gemini, or Claude recommend them, and the few who think they do are usually looking at a number that's badly wrong. The reason is simple. AI answers are mostly zero-click. The assistant reads your page, summarizes it, names you or doesn't, and the user never visits your site. Your analytics see nothing, even when the assistant just sent a buyer your way.

That breaks the attribution model the entire marketing industry was built on. For twenty years the deal was straightforward: someone searched, clicked a result, landed on your site, and you tracked the whole chain. Now a growing slice of the most valuable demand never produces a click at all. The recommendation happens inside a model you can't see into, and by the time anyone shows up, the decision is already half-made. You can't manage what you can't measure, and right now most brands literally cannot see the channel that's quietly becoming their highest-intent one.

I run AI-referral tracking on my own site, taylorsicard.com, and it classifies every visit by source, including the AI tools. So I'm not theorizing here. I've watched the data come in, watched GA4 ship its native AI channel in May, watched the gap between what the analytics show and what's actually happening in the assistants. The measurement is genuinely hard, but it's not impossible, and the brands that figure it out first get a real edge while everyone else flies blind.

There's a reason this matters beyond vanity. AI-referred traffic converts. Across 94 ecommerce brands in 2025, ChatGPT referral traffic converted at 1.81 percent versus 1.39 percent for non-branded organic search, about 31 percent higher, and Adobe reported AI-referred shoppers converting roughly 42 percent better than non-AI traffic on US retail sites in March 2026. The visitor arrives pre-qualified, because the assistant already vouched for you. So even a small number of AI-referred sessions is worth measuring carefully, because each one is worth more than the average click.

This is the full setup. I'll walk through why measurement is broken, the four metrics that actually matter, the GA4 configuration including the new AI Assistant channel, the custom events and regex that catch what GA4 misses, server-log monitoring for the AI crawlers, the citation-tracking tools and what each one really does, a manual weekly routine you can run for free, and finally how to connect a bot crawl all the way through to revenue. By the end you'll have a stack you can actually stand up, not a list of buzzwords.

Why you can't see most of your AI visibility.

The core problem is that AI visibility and AI traffic are two different things, and the gap between them is enormous. Visibility is whether the assistant names or cites you inside its answer. Traffic is whether someone then clicks through to your site. Most AI answers produce visibility with no traffic, because the user got what they needed inside the chat. So the slice you can see in GA4 is a small, leaky sample of the influence you're actually having.

It gets worse on the traffic side, because even the clicks that do happen are badly under-counted. Statcounter data from March 2026 found that somewhere between 35 and 70 percent of AI referral sessions arrive with no referrer information at all, which means GA4 files them under Direct rather than crediting the AI source. The assistant sent the visitor, but the analytics record it as someone who typed your URL from memory. Half your AI clicks, give or take, are hiding inside Direct traffic, mislabeled.

Why does that happen? A few reasons stack up. Some assistants strip the referrer header for privacy. Some surface answers inside a mobile app that doesn't pass referrer data the way a browser does. Some users copy a brand name out of a chat and search it separately, which shows up as organic or direct, not as AI. The technical plumbing of how these tools pass (or don't pass) attribution data simply wasn't designed for marketers to track, and it shows.

Then there's the part that never touches your analytics at all: the pure zero-click answer. A shopper asks ChatGPT for the best moisturizer for sensitive skin under 40 dollars. The model lists three brands, describes each, and the shopper picks one and goes straight to that brand's site by typing the name, or buys through an emerging agentic checkout. Your brand was either recommended or wasn't, and in neither case did your GA4 register a thing. That's the influence layer, and it's invisible to every analytics tool by design.

There's a timing reason this is urgent rather than academic. The volume is small today but compounding fast. One study tracking 94 ecommerce brands found ChatGPT sessions growing more than 1,000 percent across 2025, from roughly 1,500 sessions in January to over 18,000 in December. A channel growing at that rate, that also converts better than your existing ones, is precisely the kind of thing you want instrumented while it's still cheap to win. Measure it late and you'll be optimizing a channel your competitors already mapped. Measure it early and you get a clean read before the field crowds in.

It's worth being precise about who can and can't see what here, because the confusion costs brands real money. Your own analytics see a leaky sample of the clicks. The AI platforms see everything but share almost none of it back. The citation tools see the answers but not your traffic. Server logs see the crawls but not the answers. No vantage point sees the whole funnel, which is the structural reason measurement has to be assembled rather than bought. Anyone selling you a single dashboard that claims to capture "all your AI visibility" is overstating what's technically possible.

So the honest starting point is this: no single tool gives you the full picture, and anyone who tells you otherwise is selling something. The real job is to assemble a few partial views that, together, get you close enough to act. GA4 for the clicks it can see. A custom layer for the clicks it mislabels. Server logs for crawler behavior. Citation tools or manual checks for the influence layer that never produces a click. Stitch those together and you have a measurement system. Rely on any one of them and you have a comforting illusion.

"AI visibility and AI traffic are different things. Most answers give you visibility with no click, which is exactly why your analytics undercount the channel that matters most."

The four metrics that actually matter.

If you only measure one thing you'll measure the wrong thing, usually AI-referred traffic, because it's the only one that shows up in your existing dashboards. There are four metrics that matter, and three of them won't appear in GA4 at all. They are citation frequency, share of AI voice, sentiment, and AI-referred traffic. Each answers a different question, and you need all four to know where you stand.

Citation frequency is the foundational one: how often do the assistants name your brand or cite your pages when answering questions a buyer would actually ask? This isn't about a single lucky mention. It's a rate, measured across a defined set of prompts. If you ask ChatGPT, Perplexity, Gemini, and Claude your 20 most important buyer questions, how many of those answers include you? A brand cited in 2 of 20 has a very different visibility problem from one cited in 14 of 20, and you can't improve the number until you're tracking it.

One nuance the tools draw out: a mention and a citation aren't the same. A mention is the assistant naming your brand in its prose. A citation is the assistant using your page as a source, sometimes linking it, sometimes not even naming you. Both matter, but citations are harder to track because they're often invisible to the user reading the answer. You want to know both whether you're named and whether your content is the source feeding the answer, because the second one is what you can most directly influence with better pages.

Share of AI voice is citation frequency measured against your competitors on the same prompts. This is the one that actually tells you whether you're winning. If an assistant answers 100 questions about the best protein powder, how many times do you appear versus the three brands you compete with? You might be cited 30 percent of the time and feel good, until you learn a rival is cited 70 percent of the time on the identical prompts. Visibility is relative. Share of voice is the metric that keeps you honest about it.
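
As a rough sketch of the arithmetic, per-brand citation rates on a shared prompt set can be computed from a simple record of who appeared where. The brand names, prompts, and outcomes below are invented for illustration, and the structure of the `answers` record is an assumption, not the output of any particular tool.

```python
from collections import Counter

# Hypothetical results: for each prompt, the set of brands the answer named.
# Brand names and outcomes are invented for illustration.
answers = {
    "best protein powder for runners": {"RivalA", "YourBrand"},
    "best vegan protein powder": {"RivalA"},
    "protein powder without artificial sweeteners": {"RivalA", "RivalB"},
    "best budget protein powder": {"YourBrand", "RivalB"},
}

def citation_rates(answers: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of prompts on which each brand appeared."""
    counts = Counter(brand for brands in answers.values() for brand in brands)
    total = len(answers)
    return {brand: counts[brand] / total for brand in counts}

rates = citation_rates(answers)
# Sort highest first so the comparison against rivals is the first thing you read.
for brand, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{brand}: cited on {rate:.0%} of prompts")
```

Reading your own rate next to each rival's on the identical prompts is the share-of-voice view; the absolute number alone hides the gap.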

Sentiment is whether the description is accurate and flattering, and it's the metric most brands skip and most regret skipping. An assistant can cite you frequently and describe you badly, calling you overpriced, or confusing you with a discontinued product, or repeating a years-old criticism. Sentiment moves conversion more than raw frequency does, because a buyer reading a lukewarm or wrong description won't click no matter how often you appear. Tracking only how often you show up, while ignoring how you're described, misranks your own priorities.

AI-referred traffic is the one your existing stack can partly see: the sessions, and ideally the revenue, that AI sources actually send to your site. It's the most concrete of the four because it ties to dollars, but it's also the leakiest, for all the reasons in the last section. Treat it as the bottom of the funnel, the small visible tip of a much larger influence layer, not as the whole story. The brands that obsess over this number alone are optimizing the one part of AI visibility that's least representative of the whole.

Figure 1 · The four AI visibility metrics and how to track each (Measurement map)

Metric | What it answers | How you track it
Citation frequency (mentions and source-citations) | How often do the assistants name or cite you across buyer prompts? | Manual prompt checks, or a tool (Otterly, Scrunch, Peec, Profound) that runs prompts at scale.
Share of AI voice (your rate vs. competitors) | On the same prompts, how often do you appear vs. named rivals? | Same prompts run with competitors tracked. Tools automate the comparison; manual is a scorecard.
Sentiment (accuracy and tone) | Is the description right, current, and positive? | Read the answers yourself, or use a tool's sentiment flagging. Catch wrong or stale claims.
AI-referred traffic (sessions and revenue) | What did AI sources actually send to your site, and did it convert? | GA4 AI Assistant channel + custom channel group + store revenue. Expect heavy Direct leakage.

Which of the four should you weight most? It depends on where you are. A brand nobody's heard of in the assistants should obsess over citation frequency first, because you can't have share of voice or sentiment problems on prompts where you never appear at all. A brand that already shows up often but keeps losing to one rival should pivot to share of voice, because the absolute number looks fine and the relative one is where the leak is. A brand that appears but is described wrong (an old price, a discontinued line, a competitor's feature attributed to you) should treat sentiment as the emergency, because frequent wrong mentions actively cost you sales. The metrics aren't a checklist to track equally. They're a diagnostic that tells you which problem you actually have.

The reason to separate them this cleanly is that they fail in different ways and improve through different work. You raise citation frequency by being the source the models trust, which is the whole point of answer engine optimization for commerce. You raise share of voice by out-publishing and out-structuring the specific competitors who keep beating you on the prompts that matter. You fix sentiment by correcting the source material the models are reading. And you grow AI-referred traffic mostly as a downstream effect of the first three. Conflate them and you'll throw effort at the wrong lever.

GA4's new AI Assistant channel, and what it misses.

The single most useful recent development for measuring AI traffic is that GA4 finally built it in. On May 13, 2026, Google added a native AI Assistant channel to GA4's default channel group. When an incoming session has a referrer matching a recognized AI domain, GA4 now tags it with the medium ai-assistant and files it under AI Assistant automatically. No setup required. For the first time, AI traffic has its own row instead of being smeared across Referral and Direct.

To find it, open Reports, then Acquisition, then Traffic acquisition, and set the primary dimension to Session default channel group. If you've had AI traffic since the channel went live in mid-May, AI Assistant shows up as its own line. That's the starting point for everything else, and it's worth checking yours today, because most brand owners I talk to haven't looked and are surprised by how much is already there.

Now the catch, because there's always a catch. Google's published list for the native channel names ChatGPT, Gemini, and Claude. Perplexity and Copilot are not yet captured by it, so their visits stay buried in Referral where you have to dig them out manually. That's a real gap, especially because Perplexity is a meaningful answer engine for research-heavy and considered purchases. The native channel is a great floor, but it is not the ceiling, and treating it as complete will undercount you.

And of course the native channel can only ever see the clicks that arrive with a referrer. Everything from the last section still applies. The 35 to 70 percent of AI sessions that land in Direct with no referrer don't get rescued by the AI Assistant channel, because there's no referrer for it to match. So even a perfectly configured GA4 is seeing a fraction of true AI-referred traffic, and a much smaller fraction of total AI influence. The native channel reduces the mislabeling, it doesn't eliminate the invisibility.
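
To make the undercount concrete, here is a back-of-envelope sketch. It assumes the 35 to 70 percent no-referrer range quoted above applies to your site, and that the count you observe covers only the sessions that did carry a referrer. Both are assumptions, so treat the output as a range to reason with, not a measurement.

```python
def estimate_true_sessions(
    observed: int, leak_low: float = 0.35, leak_high: float = 0.70
) -> tuple[float, float]:
    """Scale observed AI sessions up by an assumed no-referrer share.

    If a fraction `leak` of AI sessions arrive with no referrer, the
    observed count is (1 - leak) of the true total.
    """
    return observed / (1 - leak_low), observed / (1 - leak_high)

# 300 observed sessions is an invented example figure.
low, high = estimate_true_sessions(300)
print(f"Observed 300 AI sessions -> roughly {low:.0f} to {high:.0f} actual")
```

The width of that range is the point: the channel report gives you a floor, and the true figure could be anywhere from about one and a half to more than three times larger.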

Here's how I'd frame it honestly. The AI Assistant channel is the easiest win in this entire post: zero work, real signal, available right now. Turn to it first, confirm AI traffic exists for your site, and get a baseline. Then treat that baseline as a known undercount and build the rest of the stack to fill the gaps. The mistake would be to see the new channel, feel measured, and stop. It's the first 30 percent of the picture, delivered for free, and the remaining 70 percent is where the real work lives.

Building the custom tracking GA4 doesn't give you.

To catch the AI sources the native channel ignores, you build a custom channel group with a regex rule that covers every AI domain you care about, not just the three Google recognizes. In GA4 admin, under Channel groups, you create a custom grouping and add a rule that classifies a session as AI when the source matches a pattern. The pattern should include chatgpt.com, perplexity.ai, gemini.google.com, claude.ai, copilot.microsoft.com, and the others, so all of them roll up into one consistent AI bucket you control.
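
A minimal sketch of such a pattern, tested in Python before pasting the regex into GA4. The domain list is the one named above; the anchoring is my own choice so that lookalike hostnames do not match. GA4 evaluates regex in its own engine, and whether a condition needs a full or a partial match depends on the condition type you pick, so check the pattern there too.

```python
import re

# Domains named in the text; extend the alternation as new assistants launch.
AI_SOURCE_PATTERN = re.compile(
    r"(^|\.)(chatgpt\.com|perplexity\.ai|gemini\.google\.com|claude\.ai|copilot\.microsoft\.com)$",
    re.IGNORECASE,
)

def is_ai_source(source: str) -> bool:
    """True when a session source (a hostname) belongs to a listed AI domain."""
    return bool(AI_SOURCE_PATTERN.search(source.strip()))

# The last hostname is a made-up lookalike that should NOT match.
for source in ["chatgpt.com", "www.perplexity.ai", "google.com", "notclaude.ai"]:
    print(source, is_ai_source(source))
```

Requiring either the start of the string or a dot before the domain keeps subdomains in and lookalikes out.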

This is exactly the layer I run on my own site. The referral-detection script on taylorsicard.com reads the referrer on every page load, classifies it into a source type, and fires a clean event so the AI tools, social platforms, and ordinary referrers each get counted properly instead of dumped into Direct. The mechanics matter less than the principle: you want one canonical place where a visit gets labeled AI versus not, rather than trusting a dozen scattered default rules to agree. When you own the classification logic, you can add new AI domains the day they launch instead of waiting for Google.

The second piece is custom events. A raw session count tells you AI sent someone, but not what they did. So you want events that mark the high-value actions: an add-to-cart, a lead form start, a newsletter signup, a checkout. When those events carry the source classification with them, you can answer the question that actually matters, which isn't how many AI visitors you got but how many of them did something worth money. That's the difference between a vanity number and a decision-grade one.
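
Here is one way that question might be answered once events carry a source label. The rows and field names (`session_id`, `source_type`, `event`) are hypothetical stand-ins for whatever your analytics export actually contains.

```python
from collections import defaultdict

# Hypothetical event rows; field names and values are assumptions.
events = [
    {"session_id": "s1", "source_type": "ai", "event": "page_view"},
    {"session_id": "s1", "source_type": "ai", "event": "add_to_cart"},
    {"session_id": "s2", "source_type": "ai", "event": "page_view"},
    {"session_id": "s3", "source_type": "organic", "event": "page_view"},
    {"session_id": "s4", "source_type": "organic", "event": "page_view"},
    {"session_id": "s4", "source_type": "organic", "event": "newsletter_signup"},
    {"session_id": "s5", "source_type": "organic", "event": "page_view"},
]

# The high-value actions listed in the text, as illustrative event names.
HIGH_VALUE = {"add_to_cart", "lead_form_start", "newsletter_signup", "checkout"}

def action_rate_by_source(events):
    """Share of sessions per source type with at least one high-value event."""
    sessions = defaultdict(set)
    acted = defaultdict(set)
    for e in events:
        sessions[e["source_type"]].add(e["session_id"])
        if e["event"] in HIGH_VALUE:
            acted[e["source_type"]].add(e["session_id"])
    return {src: len(acted[src]) / len(ids) for src, ids in sessions.items()}

print(action_rate_by_source(events))
```

Counting sessions rather than raw events keeps one enthusiastic visitor from inflating the rate.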

If you're on the tag-manager pattern most serious sites use, the clean way to fire these is to push a plain object to the data layer with an event key, then have your container listen for it. I run everything through a single GTM container and the custom events push to the data layer in the object form, which keeps the page free of scattered inline tracking and lets me change what fires without touching the HTML. The specific syntax depends on your stack, but the rule of thumb is one event per meaningful action, each carrying the traffic source, so AI-attributed conversions are queryable later.

One honest caveat on all of this: the custom layer improves the clicks GA4 can see, but it can't conjure referrer data that the assistant never sent. UTM tagging helps where you control the link, for instance a URL you drop into a ChatGPT conversation or a newsletter, but you don't control how someone arrives from an organic AI recommendation. So the custom setup narrows the Direct-leakage gap, it doesn't close it. Pair it with the manual and citation-tool methods later in this post, because together they cover the influence the click-based tracking structurally can't. The pattern here mirrors what I laid out on AI-referred traffic and Shopify conversion data, where the conversion quality justifies the tracking effort.

The UTM habit that recovers lost attribution

Any time you share a link to your own site inside an AI conversation, a newsletter, or a social post, tag it. A URL with utm_source set to chatgpt or perplexity arrives fully attributed even when the platform strips its referrer, because the parameters travel in the link itself. You can't tag the organic recommendations the models make on their own, but you can tag every link you personally place, which recovers a meaningful slice of the traffic that would otherwise vanish into Direct. It's the cheapest attribution fix available, and almost nobody does it consistently.
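
A small helper makes the habit easier to keep. This is a sketch using only the standard library; the default `utm_medium` value of `referral` is an assumption, so use whatever naming convention your reports already follow.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_utm(url: str, source: str, medium: str = "referral", campaign: str = "") -> str:
    """Return `url` with UTM parameters set, preserving existing query params."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params["utm_source"] = source
    params["utm_medium"] = medium
    if campaign:
        params["utm_campaign"] = campaign
    return urlunsplit(parts._replace(query=urlencode(params)))

# example.com is a placeholder URL.
print(add_utm("https://example.com/pricing?ref=abc", "chatgpt"))
```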

Server logs: the one place you can watch the AI crawlers.

Before an assistant can cite you, a crawler has to read you, and the crawl leaves a footprint you can actually see. Your server logs or CDN logs record the user-agent string of every request, and the AI bots announce themselves in that string. Filter for them and you get a direct, unambiguous record of which AI systems are reading which of your pages and how often. This is the one measurement layer that isn't a leaky estimate, it's a literal access log, and most brands have never once looked at it.

The user-agents worth filtering for are a short, knowable list. GPTBot is OpenAI's primary crawler, gathering content for training and search, and it's consistently the highest-volume AI bot across most sites. ChatGPT-User is different: it's not a bulk crawler at all, it fetches a single page live when a ChatGPT user clicks a link or asks something that needs current information, so seeing it means a real person's query touched your page right now. OAI-SearchBot powers ChatGPT's search index. ClaudeBot collects content for Anthropic's models, and PerplexityBot indexes pages so Perplexity can cite them in its answer engine. Watch those five and you've covered most of what matters.
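
A minimal sketch of that filter, assuming your access log is in the common "combined" format. The sample lines are invented and use simplified user-agent strings; real ones are longer, which is why the match is on the bot token rather than the whole string.

```python
import re
from collections import Counter

AI_BOTS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]

# Combined Log Format tail: "METHOD path HTTP/x" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def count_ai_hits(lines):
    """Count requests per (bot, path) for user-agents containing a known AI bot token."""
    hits = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines that are not in the expected format
        ua = m.group("ua").lower()
        for bot in AI_BOTS:
            if bot.lower() in ua:
                hits[(bot, m.group("path"))] += 1
                break
    return hits

# Invented sample lines with simplified user-agent strings.
sample = [
    '203.0.113.5 - - [10/Jun/2026:10:00:00 +0000] "GET /products/serum HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [10/Jun/2026:10:00:05 +0000] "GET /products/serum HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [10/Jun/2026:10:01:00 +0000] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/125.0"',
]

for (bot, path), n in count_ai_hits(sample).items():
    print(bot, path, n)
```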

The scale of this is larger than most people expect. By early 2026, Cloudflare measured AI crawlers generating well over 50 billion requests a day across its network. In a two-month analysis around January 2026, Cloudflare found Googlebot still reaching far more unique URLs than the AI bots, roughly 1.76 times more than GPTBot and 1.70 times more than ClaudeBot, but the AI crawlers were no longer a rounding error. They were a real, growing share of who's reading the web, and PerplexityBot in particular crawled far fewer unique URLs than GPTBot, which tells you something about coverage gaps you might have.

What do you do with the log data once you have it? Two things. First, coverage: a page that no AI bot ever fetches is a page no assistant can cite, full stop, so a section of your site that the crawlers ignore is invisible to AI no matter how good it is. Second, freshness: ChatGPT-User hits tell you which pages are being pulled in real time to answer live questions, which is a strong signal of what's actually getting surfaced. If your most important commercial pages show heavy GPTBot and ClaudeBot activity, the models are reading you. If they show none, that's the first problem to fix, and it's usually a crawl-access or structure issue rather than a content one.
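
Both checks reduce to set operations once the log hits are aggregated. In this sketch the page paths and the per-bot fetch sets are invented; in practice they would come from your own log summary.

```python
# Pages you care about commercially (hypothetical paths).
priority_pages = {"/products/serum", "/products/cleanser", "/collections/sensitive-skin"}

# Paths each bot fetched in the period, e.g. aggregated from access logs (invented data).
fetched_by_bot = {
    "GPTBot": {"/products/serum", "/blog/routine"},
    "ClaudeBot": {"/products/serum", "/products/cleanser"},
    "ChatGPT-User": {"/products/serum"},
}

def coverage_gaps(priority, fetched):
    """Priority pages that no AI bot fetched at all."""
    seen = set().union(*fetched.values())
    return sorted(priority - seen)

def live_fetch_pages(fetched):
    """Pages pulled by ChatGPT-User, i.e. fetched live for a user query."""
    return sorted(fetched.get("ChatGPT-User", set()))

print("Never crawled:", coverage_gaps(priority_pages, fetched_by_bot))
print("Live-fetched:", live_fetch_pages(fetched_by_bot))
```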

One practical caution on the logs: user-agent strings can be spoofed, so for anything you're going to act on, verify the request actually came from the operator it claims. OpenAI, Anthropic, and Perplexity publish IP ranges for their crawlers, and the clean ones support reverse-DNS verification, so a request claiming to be GPTBot from an address that doesn't resolve back to OpenAI is something pretending to be GPTBot. For routine coverage monitoring this matters little. For decisions about access, rate limits, or blocking, it matters a lot, because you don't want to fence off a real Anthropic crawler on the strength of a forged string, or trust a fake one. If you sit behind a CDN or WAF, that layer usually does the verification and gives you cleaner crawler logs than your raw origin server would.
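
The IP-range check can be sketched with the standard `ipaddress` module. The CIDR blocks below are placeholders from the reserved documentation ranges, not real crawler addresses; the actual ranges have to come from each operator's own published list.

```python
import ipaddress

# Placeholder CIDR ranges from reserved documentation blocks (RFC 5737).
# These are NOT real crawler ranges; load the operators' published lists instead.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "ClaudeBot": ["198.51.100.0/24"],
}

def is_verified(bot: str, remote_ip: str) -> bool:
    """True when remote_ip falls inside one of the ranges listed for `bot`."""
    ip = ipaddress.ip_address(remote_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in PUBLISHED_RANGES.get(bot, []))

print(is_verified("GPTBot", "192.0.2.44"))    # inside the placeholder range
print(is_verified("GPTBot", "203.0.113.10"))  # outside it
```

A request that claims a bot name but fails this check is the case to treat as unverified before making access or rate-limit decisions.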

There's a strategic decision buried in here, too: whether to let the bots in at all. Some publishers block AI crawlers to protect content. For most commerce brands that's exactly backwards, because being crawled is the precondition for being cited, and being cited is how you show up in the answers buyers are reading. Unless you have a specific reason to fence off content, you generally want GPTBot, ClaudeBot, and PerplexityBot reading your commercial pages, and your logs are how you confirm they actually are. This connects directly to making your products visible inside ChatGPT rather than invisible, which starts with the crawl.

Figure 2 · The AI crawler user-agents to monitor in your logs (Detection reference)

User-agent | Operator & purpose | What it signals
GPTBot | OpenAI. Bulk crawl for training and search index. | Highest-volume AI bot on most sites. Broad coverage of your pages.
ChatGPT-User | OpenAI. Live fetch when a user clicks or asks something current. | A real query just touched this page. Strong real-time surfacing signal.
OAI-SearchBot | OpenAI. Powers ChatGPT search results. | Inclusion in the search-grounded answers, not just training.
ClaudeBot | Anthropic. Gathers content for Claude. | Whether Claude can read and cite the page.
PerplexityBot | Perplexity. Indexes pages for its answer engine. | Eligibility for Perplexity citations, which link back to sources.

Citation-tracking tools: what they do, and when you need one.

Once your prompt list grows past what you can check by hand, a citation-tracking tool earns its cost. What these tools fundamentally do is run your buyer prompts across ChatGPT, Perplexity, Gemini, Claude, and Google's AI Overviews on a schedule, then record whether you appeared, where, how you were described, and how your competitors fared on the same prompts. They turn the three invisible metrics (citation frequency, share of voice, and sentiment) into a dashboard you can track over time. They don't measure traffic; your GA4 stack does that. They measure visibility, which nothing else can.

The market has stratified into clear tiers, and the right pick depends entirely on scale rather than ambition. Otterly sits at the affordable end, starting around 29 dollars a month, and is built for small teams that want to monitor brand mentions across the AI engines and watch how they shift over time. Scrunch sits in the mid-market alongside it, aimed at teams that have outgrown spot-checks but aren't running enterprise programs. For a brand just starting to take AI visibility seriously, this tier covers the job without overspending.

At the enterprise end, Profound and Peec AI are the names that come up most, and the funding behind them tells you the category is real, not a fad. Profound has raised well over 100 million dollars at a billion-dollar valuation, and Peec AI raised around 29 million and crossed several million in ARR inside its first year. These platforms run hundreds of prompts across every engine, add prompt-level competitor analysis that shows exactly which queries trigger a rival's mention over yours, and flag when a model describes you negatively or inaccurately. They run several hundred dollars a month and are built for teams running AI visibility as an ongoing program, not a quarterly check.

A few others round out the field. Bluefish, Evertune, AthenaHQ, and a handful of newer entrants compete in the well-funded enterprise tier, while a long tail of cheaper tools handles one-time audits. The useful framing I keep coming back to is the one the comparison guides converge on: enterprise teams need Profound or Peec, mid-market teams need Otterly or Scrunch, and most small brands need a single one-time audit before they need any ongoing monitoring at all. Don't buy a 500-dollar-a-month platform to track 12 prompts you could check yourself in an afternoon.

One feature that separates the tiers, and that's worth paying for once you're serious, is prompt-level competitor analysis. The cheaper tools tell you your overall mention rate. The enterprise ones show you exactly which queries trigger a competitor's mention instead of yours, prompt by prompt. That's the difference between knowing you're behind and knowing where. Peec, for instance, leans hard on tracking the specific prompts that mirror how buyers actually ask, things like "best CRM for marketing agencies under 50 people," and flags when a model describes you inaccurately. For a brand running this as a program rather than a curiosity, that prompt-level granularity is where the tool stops being a report card and starts being a worklist.

A word on what these tools can't do, so you set expectations correctly. They sample. No tool queries every possible phrasing of every prompt across every engine continuously, because the cost and the model variance make that impossible. So a tool's citation-frequency number is itself an estimate, built from a sample of prompts run a sample of times. That's fine, a good estimate is decision-grade, but don't treat a tool's dashboard as ground truth any more than you'd treat a survey as a census. The number's value is in its trend and its competitor comparison, not in its second decimal place. Brands that fixate on small week-to-week movements in a sampled metric are reading noise.
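
To see how much noise a small sample carries, here is a sketch using the Wilson score interval for a proportion. It treats each prompt check as an independent trial, which is a simplification, and the 6-of-20 figure is only an example.

```python
import math

def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion hits/n."""
    p = hits / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

low, high = wilson_interval(6, 20)
print(f"6 of 20 cited: plausible range {low:.0%} to {high:.0%}")
```

With only 20 checks, a 30 percent reading is compatible with a true rate anywhere from roughly 15 to 52 percent, so moving from 6 to 7 citations in a week says very little on its own.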

The trap to avoid is buying a tool and mistaking the dashboard for the work. A citation tracker tells you where you stand. It does not move the number. Moving the number is a content, structure, and source-credibility job, the same discipline behind generative engine optimization versus traditional SEO. The tool is the speedometer, not the engine. I've seen brands feel productive because they bought monitoring while their actual visibility sat flat for months, because nobody did the work the dashboard was pointing at. Measure first, but measure in service of acting, not instead of it.

"A citation tracker is the speedometer, not the engine. It tells you where you stand. It does not move the number. Confusing the two is how brands pay to feel measured while staying flat."

The manual weekly routine you can run for free.

You don't need to spend a dollar to start measuring AI visibility, and honestly most brands should run the manual routine first regardless of budget, because it teaches you what the prompts and answers actually look like. The routine is simple: pick your 15 to 25 most important buyer prompts, ask them across ChatGPT, Perplexity, Gemini, and Claude on a fixed weekly cadence, and log three things each time: whether you appeared, where in the answer, and how you were described. That's it. Twenty prompts across four engines is 80 checks, which is an hour or two a week, and it's the highest-signal hour you'll spend on this.

The hardest part is choosing the right prompts, and it's worth real care. Don't ask "what is the best [your brand]," because that's a softball that tells you nothing. Ask what a buyer who doesn't yet know you would ask, the category and problem questions: best moisturizer for rosacea under 50 dollars, most durable carry-on for international travel, top meal-kit services for a family of four. Those are the prompts where you either appear or don't, and where appearing actually wins a customer. Mirror how real buyers phrase things, because that's what determines whether you surface.

Logging matters as much as asking, because the value is in the trend, not any single check. A simple spreadsheet with a row per prompt and a column per week captures it: a yes/no for appearance, a note on position, and a quick read on sentiment. Over a month you start to see patterns, the prompts where you consistently win, the ones where the same competitor always beats you, the answers where you're named but described badly. That pattern is the entire point, because it tells you exactly where the content and structure work needs to go.
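
If you would rather keep the log as a CSV than a hand-edited sheet, a sketch like the following works. It uses a long format (one row per check) instead of the row-per-prompt, column-per-week layout described above, because long rows are easier to aggregate and can be pivoted into the wide view later. The column names and the sample entry are assumptions.

```python
import csv
import io

FIELDS = ["week", "engine", "prompt", "appeared", "position", "sentiment", "competitors"]

def log_check(writer, week, engine, prompt, appeared, position="", sentiment="", competitors=()):
    """Append one prompt check to the log in long format (one row per check)."""
    writer.writerow({
        "week": week,
        "engine": engine,
        "prompt": prompt,
        "appeared": "yes" if appeared else "no",
        "position": position,
        "sentiment": sentiment,
        "competitors": ";".join(competitors),
    })

# StringIO stands in for a real file, e.g. open("ai_checks.csv", "a", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
log_check(
    writer, "2026-W24", "ChatGPT", "best moisturizer for rosacea under 50 dollars",
    appeared=True, position="2nd of 3", sentiment="accurate", competitors=["RivalA"],
)
print(buffer.getvalue())
```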

Run the competitor comparison in the same pass, because share of voice is where the real insight hides. For each prompt, log not just whether you appeared but who else did. When you see that a rival shows up on 16 of your 20 prompts and you show up on 6, you've found your gap and your target in one move. This is the manual version of what the enterprise tools automate, and for a focused brand with a tight prompt set, the manual version is genuinely good enough to drive a quarter of work.

A practical tip from running this myself: rephrase your prompts slightly week to week and across engines, because the models don't answer a fixed string deterministically, and a single phrasing can mislead you. If you only ever ask one exact wording, you're measuring that wording's luck, not your real visibility. Vary the phrasing the way actual buyers would, average across the variations, and you get a far more honest read. The discipline isn't complicated. It's just consistency, and consistency is exactly what most brands fail to maintain once the novelty wears off.
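
Averaging across variations is simple arithmetic. In this sketch each list holds one 0/1 result per engine for a given phrasing; the phrasings and outcomes are invented.

```python
from statistics import mean

# 1 = brand appeared, 0 = it did not. One entry per engine; data is invented.
checks = {
    "best carry-on for international travel": [1, 0, 1, 1],
    "most durable carry-on luggage for overseas trips": [0, 0, 1, 0],
    "which carry-on holds up best on long-haul flights": [1, 0, 0, 0],
}

def visibility_for_intent(checks: dict[str, list[int]]) -> float:
    """Average appearance rate across every phrasing and engine for one buyer intent."""
    return mean(result for results in checks.values() for result in results)

per_phrasing = {phrasing: mean(results) for phrasing, results in checks.items()}
print(per_phrasing)
print(f"Averaged across phrasings: {visibility_for_intent(checks):.0%}")
```

In this made-up data a single wording would have reported anything from 25 to 75 percent, while the average across wordings lands in between, which is the more honest read.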

Connecting a bot crawl all the way through to revenue.

The whole stack only earns its keep if you can connect it to money, and the chain is more traceable than it first looks. It runs in four links: a crawler reads your page, the model cites you in an answer, a user clicks through, and that session converts. Each link has its own measurement, and stitched together they form a funnel from bot crawl to revenue. Most brands measure zero of these links. Measuring even the first and last transforms the conversation.

Link one is the crawl, which your server logs prove directly. Link two is the citation, which your prompt checks or your citation tool measures. Link three is the click, which your GA4 AI Assistant channel and custom channel group capture, minus the Direct leakage. Link four is the conversion and revenue, which your store and your custom events record. The reason to think of it as a chain is that a weak link tells you where the breakdown is: strong crawl but no citations is a content problem, strong citations but no clicks is a positioning problem, strong clicks but no conversion is a site problem.
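
The weak-link logic can be written down as a tiny diagnostic. Reducing each link to a yes/no flag is a simplification for illustration; what counts as "strong" for each link is a threshold you would set yourself.

```python
from dataclasses import dataclass

@dataclass
class Funnel:
    crawled: bool    # link 1: AI bots fetch the page (server logs)
    cited: bool      # link 2: assistants cite you (prompt checks or a tool)
    clicked: bool    # link 3: AI sessions show up (GA4 channels)
    converted: bool  # link 4: those sessions convert (store, custom events)

def diagnose(f: Funnel) -> str:
    """Return the first weak link, using the problem labels from the text."""
    if not f.crawled:
        return "crawl-access or structure problem"
    if not f.cited:
        return "content problem"
    if not f.clicked:
        return "positioning problem"
    if not f.converted:
        return "site problem"
    return "chain intact"

print(diagnose(Funnel(crawled=True, cited=True, clicked=False, converted=False)))
```

Checking the links in order matters, because a break early in the chain makes every later link look broken too.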

The payoff at the revenue end is what makes the effort worth it. AI-referred traffic doesn't just exist; it converts better than almost anything else you have. The 1.81 percent ChatGPT conversion rate versus 1.39 percent for non-branded organic, a 31 percent edge that held in 10 of 12 months across 94 brands, isn't a fluke. It's the pre-qualification effect: the assistant already vouched for you, so the visitor arrives further down the funnel. Adobe's finding that AI-referred shoppers converted about 42 percent better than non-AI traffic in March 2026 points the same direction. When you can attribute revenue to the AI channel, those conversion premiums turn an abstract visibility goal into a budget case.

There's a volume story underneath the rate story, too, and it's moving fast. One study of 94 ecommerce brands found ChatGPT sessions growing more than 1,000 percent across 2025, from around 1,500 sessions in January to over 18,000 in December. ChatGPT drives roughly 87 percent of measurable AI referral traffic, with Perplexity, Gemini, and Claude splitting most of the rest. So the channel is both small today and compounding quickly, which is exactly the profile of something you want to be measuring early, while it's cheap to win and before your competitors have instrumented it.
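For anyone who wants to rerun the arithmetic on their own numbers, here is a short Python sketch using the figures quoted above. The function names are arbitrary. With the rounded rates the lift comes out at about 30 percent, so the small difference from the quoted 31 percent is presumably down to rounding in the published rates.

```python
def relative_lift(rate: float, baseline: float) -> float:
    """Percent by which `rate` exceeds `baseline`."""
    return (rate - baseline) / baseline * 100

def percent_growth(start: float, end: float) -> float:
    """Percent change from `start` to `end`."""
    return (end - start) / start * 100

# ChatGPT referral conversion vs. non-branded organic, as quoted (rounded rates)
lift = relative_lift(1.81, 1.39)

# ChatGPT sessions, January vs. December 2025 (approximate figures as quoted)
growth = percent_growth(1500, 18000)

print(f"conversion lift: {lift:.1f}%")
print(f"session growth: {growth:.0f}%")
```

Swap in your own AI-channel and organic conversion rates to see whether the premium holds for your store.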

The honest limitation, again, is that the click-and-convert links undercount because of the referrer leakage, so the revenue you can attribute is a floor, not the full influence. That's fine. A floor is enough to make decisions, and it's infinitely better than the zero most brands currently measure. When you can tell a finance team "the AI channel is small but converts 30-plus percent better and grew more than tenfold in a year," you've earned the right to invest in it, even knowing the true number is higher than the one you can prove. This is the bridge from AI visibility as a marketing curiosity to AI visibility as a revenue line, and it's the same logic behind treating agentic commerce as a real channel for Shopify brands.

Standing up the
measurement stack, in
three phases.

You don't build this all at once, and you shouldn't. The right sequence starts with the free, fast wins, proves the channel is real, then adds the heavier tooling only once the basics show it's worth it. Here's the phased way I'd stand it up for a brand starting from zero, designed so each phase delivers usable signal before you spend money on the next one.

PHASE 1
The free baseline
Week 1
What you do: Turn on the GA4 AI Assistant channel view, build the custom channel group with regex covering Perplexity, Copilot, and the rest, and pull your server or CDN logs to see which AI crawlers are already reading you. Start the manual prompt log with your 15 to 25 buyer prompts across four engines.

Why first: Every step here is free and fast, and together they tell you whether AI traffic and AI crawls already exist for your site. Most brands are surprised by how much is there once they actually look.
PHASE 2
Make it decision-grade
Weeks 2–6
What you do: Add the custom events that mark high-value actions carrying the source classification, so AI-attributed conversions become queryable. Establish the UTM habit for every link you place yourself. Run the manual prompt routine weekly and start the competitor share-of-voice scorecard.

Why it matters: This turns raw sessions into revenue-linked data and turns one-off prompt checks into a trend. By the end you can say what AI traffic is worth and where your visibility gaps are, with evidence.
PHASE 3
Scale the monitoring
Month 2+
What you do: Once your prompt set outgrows what you can check by hand, buy the right tool for your scale: Otterly or Scrunch for mid-market, Profound or Peec for an enterprise program. Use it to automate citation frequency, share of voice, and sentiment across hundreds of prompts and every engine.

The payoff: Automated, continuous visibility tracking that frees your time for the content and structure work that actually moves the numbers. The tool is the instrument panel; the work it points at is where the wins come from.

The sequence matters because it keeps spend tied to proof. A brand that buys a 500-dollar-a-month platform in week one, before it has even checked its GA4 channel or its logs, is spending ahead of evidence and usually under-uses the tool. A brand that runs Phase 1 free, confirms the channel is real and growing, then layers on tooling once the prompt volume demands it, spends money it can already justify. Measurement should earn its way up the cost curve, not start at the top.

One more framing worth holding onto: this stack is never "done," because the landscape moves monthly. New AI engines launch, GA4 adds and changes channels, crawlers appear and rename themselves, and the tools ship new features constantly. The GA4 AI Assistant channel itself didn't exist before mid-May 2026. So treat the stack as something you maintain, not something you finish, and budget a recurring hour to keep the regex, the prompt list, and the crawler filters current. The brands that win at AI visibility aren't the ones with the fanciest tool. They're the ones who actually keep looking.

+ + + + + + + +

Strip all of it back and the message is simple: AI visibility is measurable, just not by any single tool, and the brands that assemble the partial views into a real picture get to make decisions while everyone else guesses. Track the four metrics, not just the one that shows up in GA4. Use the native AI Assistant channel as a free floor and build the custom layer to fill its gaps. Read your server logs to watch the crawlers directly, the one place the data is clean. Run the manual prompt routine before you buy a tool, and buy the tool only when scale demands it. Then connect the chain from crawl to citation to click to revenue, so AI visibility stops being a vibe and becomes a line you can manage.

If you want this stack stood up properly for your brand, configured, baselined, and connected to revenue rather than left as a list of buzzwords, that's exactly the kind of work I do. The consumer commerce practice runs measurement like this for DTC brands, and you can see the broader thinking on the move from search to AI in the answer engine optimization work. The window where AI visibility is cheap to win is open now. It won't stay open once everyone is measuring it.

Questions from brands
starting to measure
AI visibility.

Q: How do I track AI traffic in GA4?

Since May 13, 2026, GA4 has had a native AI Assistant channel that auto-tags sessions referred by ChatGPT, Gemini, and Claude. Open Reports, then Acquisition, then Traffic acquisition, and set the primary dimension to Session default channel group. Perplexity and Copilot still fall into Referral, so build a custom channel group with regex covering all the AI domains to catch the rest. Expect a real gap, because Statcounter found 35 to 70 percent of AI sessions arrive with no referrer and land in Direct, where even a perfect channel can't see them.
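A starting point for that regex, tested here in Python before pasting it into GA4. Treat the domain list as an assumption to verify against your own referral report, not a complete inventory, and the channel label as a placeholder. The pattern avoids lookarounds, so it should carry over to GA4's regex matching, but confirm that in the channel group editor.

```python
import re

# Assumed referrer hosts for the main assistants; extend as new ones appear.
AI_SOURCE_PATTERN = re.compile(
    r"(^|\.)(chatgpt\.com|chat\.openai\.com|perplexity\.ai|"
    r"gemini\.google\.com|claude\.ai|copilot\.microsoft\.com)$",
    re.IGNORECASE,
)

def channel_for(source_host: str) -> str:
    """Classify a session source hostname the way the custom group would."""
    if AI_SOURCE_PATTERN.search(source_host.strip()):
        return "AI Assistant (custom)"
    return "Other"

samples = ["chatgpt.com", "www.perplexity.ai", "copilot.microsoft.com",
           "google.com", "notclaude.ai"]
labels = {host: channel_for(host) for host in samples}
```

The `(^|\.)` anchor is there so subdomains match while lookalike hosts such as `notclaude.ai` do not.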

Q: What are the four metrics for measuring AI visibility?

Track four. Citation frequency is how often the assistants name or cite you across a set of buyer prompts. Share of AI voice is your rate against named competitors on the same prompts. Sentiment is whether the description is accurate and positive, and it moves conversion more than raw frequency. AI-referred traffic is the sessions and revenue that actually land on your site, measured in GA4 and your store. The first three you mostly can't see in analytics, which is exactly why you need a separate tracking layer for them.

Q: Which AI crawler user-agents should I monitor in server logs?

Filter your server or CDN logs for the main strings. GPTBot is OpenAI's training crawler and is consistently the highest-volume AI bot. ChatGPT-User fetches a single page live when a user clicks or asks something current. ClaudeBot gathers content for Anthropic, OAI-SearchBot powers ChatGPT search, and PerplexityBot indexes for Perplexity's answer engine. A page no bot fetches is a page no assistant can cite. By early 2026 Cloudflare measured AI crawlers at well over 50 billion requests a day across its network.
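A minimal Python sketch of that filter, assuming logs in the common "combined" access-log format, where the user agent is the last quoted field. The sample lines and their user-agent strings are simplified and made up for illustration. User-agent strings can be spoofed, so treat the counts as indicative.

```python
import re
from collections import Counter

AI_BOTS = ("GPTBot", "ChatGPT-User", "ClaudeBot", "OAI-SearchBot", "PerplexityBot")

# Assumes combined log format: "METHOD path proto" status bytes "referrer" "user-agent"
LOG_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def ai_crawler_hits(log_lines) -> Counter:
    """Count requests per (bot, path) for known AI crawler user agents."""
    hits = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        user_agent = match.group("ua").lower()
        for bot in AI_BOTS:
            if bot.lower() in user_agent:
                hits[(bot, match.group("path"))] += 1
                break
    return hits

# Hypothetical sample lines with simplified user agents
sample = [
    '203.0.113.5 - - [02/Jun/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [02/Jun/2026:10:00:03 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.44 - - [02/Jun/2026:10:00:09 +0000] "GET /pricing HTTP/1.1" 200 5120 "https://example.com/" "Mozilla/5.0 (Macintosh) Safari/605.1.15"',
    '203.0.113.5 - - [02/Jun/2026:10:01:00 +0000] "GET /blog/post HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]

hits = ai_crawler_hits(sample)
crawled_paths = {path for (_bot, path) in hits}
```

Comparing `crawled_paths` against your list of money pages shows which ones no bot has fetched.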

Q: Do I need a paid tool to measure AI visibility?

Not to start. You can run a manual weekly routine by asking ChatGPT, Perplexity, Gemini, and Claude your 15 to 25 most important buyer prompts and logging whether you appear, where, and how you're described. Paid tools automate that at scale across hundreds of prompts and engines. Otterly and Scrunch sit in the affordable and mid-market tier starting around 29 dollars a month, while Profound and Peec AI run several hundred a month and add competitor and prompt-level analysis. Start manual, then buy a tool once the prompt volume outgrows your time.

Q: Does AI-referred traffic actually convert?

Yes, and that's the reason this is worth measuring. Across 94 ecommerce brands in 2025, ChatGPT referral traffic converted at 1.81 percent versus 1.39 percent for non-branded organic, roughly 31 percent higher, and the advantage held in 10 of 12 months. Adobe reported AI-referred shoppers converting about 42 percent better than non-AI traffic on US retail sites in March 2026. The visitor arrives pre-qualified by the assistant's recommendation, which is why a low session count still matters far more than the raw number suggests.

  Work with Taylor  ·  Consumer Commerce

Want AI visibility measured, not guessed at?

I run the GA4, server-log, and citation-tracking stack on my own site, and I can stand the same setup up for your brand: baselined, connected to revenue, and pointed at the work that actually moves the numbers. The form takes two minutes.
