Imagine running a paid campaign with no conversion tracking. Or optimizing a website for search without visibility into sessions. (I know: the horror!) For a lot of marketers, this is the state of Answer Engine Optimization (AEO) right now.
A quick refresher: AEO is the practice of optimizing your content so AI answer engines can find and cite it in generated responses. Though the channel is new, the experience of uncovering what influences an algorithm will be familiar to any marketer who has practiced the experiment > measure > repeat loop.
The difference is that AEO has no established best practices yet. And unlike SEO or paid media, where tools like Google Search Console, Google Ads, and Business Manager provide crucial data, AI platforms like ChatGPT, Perplexity, and Claude simply aren't giving us that kind of information.
You're not fully out of luck though! There is data that we can use to measure performance and help optimize. We'll simply have to build it from scratch.
Roll up your sleeves
Here are the three components we're going to need. Together, they form a pipeline, with each one answering a different question:
- Server logs: tell you whether your content was considered
- Prompt tracking: tells you whether you were cited
- Web analytics: tell you whether that citation drove someone to your site
You need all three to get a complete picture.
Before we get into each one, a word of warning: this is a more technically demanding measurement configuration than most marketers have encountered. Server logs in particular sit outside the typical marketing stack entirely. That barrier is real, but so is the cost of flying blind — and right now, just about everyone is doing exactly that.
So buckle up, because we’re going in.
Server Logs
When an answer engine generates a response, it will:
- Perform a search
- 'Read' the pages in the search results
- Use that content to generate a response
Not every page that is 'read' gets cited, though. In fact, AirOps discovered that only 15% of pages that are crawled end up being cited.
But while you may not have been cited, answer engines read your page and that crawl shows up in your server logs.
Those crawl requests in your server logs work a lot like impressions. Importantly, the platforms use different bots for crawls triggered by prompts than for model training. If certain pages are being crawled frequently by prompt-driven bots and others aren't, that tells you something meaningful about where your content is and isn't resonating with AI systems.
This is also the most empirically reliable component of the framework. A bot visit either happened or it didn't.
The big challenge here is access. Marketers have not typically worked with this data, and getting it might require new tools and collaboration with teams you may not have a strong relationship with yet. But this data’s value makes it worth the headache.
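To make the prompt-driven vs. model-building split concrete, here's a minimal sketch of counting AI crawls per page from standard access logs. The user-agent substrings below are names the platforms have published for their crawlers, but they change over time, and the regex assumes the common "combined" log format; treat both as assumptions to verify against your own server configuration.

```python
import re
from collections import Counter

# User-Agent substrings that identify AI crawlers. These are published
# by the platforms but can change -- a starting point, not a registry.
PROMPT_BOTS = ("ChatGPT-User", "OAI-SearchBot", "Perplexity-User")  # prompt-driven fetches
TRAINING_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")            # model-building crawls

# Minimal pattern for the combined log format (an assumption about
# your server setup -- adjust to match your actual logs).
LINE_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def count_ai_crawls(log_lines):
    """Return per-path crawl counts, split by bot type."""
    counts = {"prompt": Counter(), "training": Counter()}
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        ua = m.group("ua")
        if any(bot in ua for bot in PROMPT_BOTS):
            counts["prompt"][m.group("path")] += 1
        elif any(bot in ua for bot in TRAINING_BOTS):
            counts["training"][m.group("path")] += 1
    return counts
```

Pages with heavy prompt-driven crawl activity are your "impressions"; pages with none are invisible to answer engines regardless of how well they rank in search.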
Web Analytics
At the other end of the pipeline is web analytics — traffic to your website through citations in AI-generated responses. These are real visitors who went from a generated response to your website, and they represent the most tangible business outcome in the framework.
Like server logs, this data is empirical: either someone clicked through from an AI citation, or they didn't. But for optimization, it’s incomplete.
You'll see the page the user landed on and the platform they came from (ChatGPT, Perplexity, Gemini, etc.), but not the prompt they used. You can infer that the prompt must be relevant to the landing page's content, but you won't know which chunks were cited or quoted. (This is where iterative prompt tracking can be valuable.)
Once users have entered your site through an answer engine, you’ll have all the same tracking and measurement you usually do to analyze visitor behavior. You can even validate whether your answer engine traffic has a higher conversion rate than other channels, as has been widely reported.
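If your analytics tool doesn't segment this traffic for you, sessions can be classified by referrer hostname. The hostnames below reflect how the major platforms currently link out, but they're assumptions that may change; verify them against the referrers you actually see in your data.

```python
from urllib.parse import urlparse

# Referrer hostnames observed for major answer engines (assumptions --
# confirm against your own analytics before relying on them).
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "www.perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
}

def classify_session(referrer_url):
    """Return the AI platform a session came from, or None for other traffic."""
    host = urlparse(referrer_url).hostname or ""
    return AI_REFERRERS.get(host)
```

Tagging sessions this way lets you compare AI-referred visitors against other channels on conversion rate, not just volume.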
Prompt Tracking
Prompt tracking, or measuring whether you are cited for specific queries, is the new rank tracking. It sits in the middle of our pipeline, and it’s the most important layer for understanding why your performance looks the way it does.
It is also the least empirical of the three, and that distinction matters.
Unlike server logs or GA sessions, prompt tracking is inherently inferential. You are choosing the prompts to measure, and you’re doing this without data. In other words, you’re guessing.
This could mean you're cited 100% of the time for your top prompt yet get zero traffic from it, because no one is actually asking that question. Keyword data and tools like Google Search Console and AlsoAsked can improve your guesses, but they're still guesses.
Critically, this doesn't mean prompt tracking isn’t valuable. Reviewing generated responses is extremely useful, and it can help you figure out which passages from a page are being cited if you know that a page is being crawled.
The key here is to build your prompt tracking strategy on some form of data, but always keep in mind that the prompts are nothing more than conjecture.
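Whatever tool or script you use to run your tracked prompts, the core metric reduces to something simple: of the prompts you chose to track, what share of generated responses cited your domain? Here's a minimal sketch; the input structure (prompt mapped to the list of URLs cited in its response) is a hypothetical shape for illustration, not any particular tool's output.

```python
from urllib.parse import urlparse

def citation_rate(prompt_results, domain):
    """
    prompt_results: {prompt: [cited URLs from the generated response]}
    Returns the share of tracked prompts whose response cited `domain`.
    Note: the prompt list itself is your conjecture -- this measures
    visibility only against the prompts you chose to track.
    """
    if not prompt_results:
        return 0.0

    def cites_domain(url):
        host = urlparse(url).hostname or ""
        return host == domain or host.endswith("." + domain)

    hits = sum(
        1 for urls in prompt_results.values()
        if any(cites_domain(u) for u in urls)
    )
    return hits / len(prompt_results)
```

A rising citation rate across a stable prompt set is a reasonable trend signal; the absolute number means little, since it depends entirely on which prompts you picked.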
How to use this framework
The framework is most powerful when you read across all three layers:
- High server log activity with low prompt visibility suggests your content is being considered but not selected, a content quality or relevance problem.
- High prompt visibility with low web analytics traffic suggests you're being cited but not in a way that's driving action. That's possibly a brand awareness or query-type issue.
The gaps between layers are often where the most useful optimization insights live.
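Reading across the layers can be automated once all three metrics exist for a page or topic. The sketch below encodes the diagnoses above; the threshold values are illustrative placeholders with no empirical basis, so calibrate them against your own baselines.

```python
def diagnose(crawls, citation_share, ai_sessions,
             crawl_floor=10, citation_floor=0.2, session_floor=5):
    """
    Combine the three layers into a rough diagnosis for one page or topic.
      crawls         -- prompt-driven bot hits (server logs)
      citation_share -- share of tracked prompts that cited you (prompt tracking)
      ai_sessions    -- visits referred by AI platforms (web analytics)
    Thresholds are illustrative placeholders -- tune them to your data.
    """
    if crawls >= crawl_floor and citation_share < citation_floor:
        return "considered but not selected: content quality/relevance problem"
    if citation_share >= citation_floor and ai_sessions < session_floor:
        return "cited but not clicked: brand awareness or query-type issue"
    if crawls < crawl_floor:
        return "not being considered: discoverability problem"
    return "healthy: cited and driving traffic"
```

Even this crude rule set makes the gaps between layers visible at a glance across a whole site, which is where the optimization work usually starts.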
If the full configuration isn't immediately achievable, start with web analytics and prioritize getting access to server logs. Prompt tracking without the other two layers is context without a pipeline — useful, but limited.
The brands that figure out how to measure this properly will have a significant advantage. The data may not tell them everything, but most others don’t have any data at all.
Ready to stop guessing about AEO? Orchestra can help. Reach out to us to learn more.
