
Generative Engine Optimization: What the GEO Paper Actually Shows for Your Business

Published on April 24, 2026

Tools like Perplexity, ChatGPT Search, Google AI Overviews, and Microsoft Copilot don't hand users a list of links. They read your content, synthesize it, and write an answer. Whether your website gets mentioned in that answer, and how prominently, is the new battleground for AEO, AI SEO, and AI visibility.

Researchers from Princeton University and IIT Delhi published "GEO: Generative Engine Optimization" at KDD 2024, one of the premier academic conferences in data science and machine learning. They ran 10,000 queries through AI search systems, tested nine content modification strategies, and measured their precise impact on visibility inside AI-generated answers. What follows is a breakdown of every major finding and what it means for your content strategy.

Key Takeaways

The research translates into six concrete practices:

  • Cite every substantial claim. The Cite Sources method produced consistent gains across domains and became more powerful in combination with other strategies. Treat every verifiable assertion as something that needs an attributed source.
  • Replace qualitative statements with specific numbers. "Many companies struggle with this" is far less useful to a generative engine than a named study with a precise figure. Build the habit of adding measurable evidence wherever you currently use vague descriptions.
  • Include relevant expert quotes. Quotation Addition was the top single strategy. Attributed quotes from researchers, practitioners, and industry reports give language models concrete, citable material to incorporate.
  • Write clearly. Fluency improvements produced a 28% gain with no new content added. Well-structured, readable prose is easier for a model to parse, summarize, and attribute accurately.
  • Match your strategy to your domain. Law and government content benefits most from statistics. History and culture content benefits most from quotes. Debate content benefits from a confident, evidence-backed tone.
  • Combine tactics rather than relying on one. The combination of Fluency Optimization and Statistics Addition outperformed any single method by more than 5.5%. Sustainable AI visibility comes from multiple good practices working together, not a single optimization pass.

How AI Search Engines Actually Work

To optimize for a system, you need to understand it. The paper provides a precise picture of how generative engines operate. When a user submits a query, the system runs a multi-step process:

  • A query reformulation model rewrites the input into several search-friendly sub-queries
  • A search engine retrieves the top 5 sources for those queries
  • A summarization model reads each source
  • A response-generation model synthesizes everything into a coherent answer with inline citations
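The four steps above can be sketched as a small pipeline. This is an illustrative toy, not a real engine's code: the four helper functions are hypothetical stand-ins for the reformulation model, the search index, the summarizer, and the answer generator.

```python
from typing import List

# Toy stand-ins for the four components; a real system uses LLMs and a search index.
def reformulate_query(q: str) -> List[str]:
    # A real reformulation model emits several search-friendly variants.
    return [q, f"{q} explained", f"{q} examples"]

def web_search(q: str, limit: int = 5) -> List[str]:
    # A real engine returns the top-`limit` documents for the query.
    return [f"doc about '{q}' #{i}" for i in range(limit)]

def summarize(doc: str) -> str:
    # A real summarization model condenses each source independently.
    return f"summary of {doc}"

def generate_answer(q: str, summaries: List[str]) -> str:
    # A real response model weaves summaries into prose with inline citations.
    cites = "; ".join(f"[{i+1}] {s}" for i, s in enumerate(summaries[:2]))
    return f"Answer to '{q}', grounded in: {cites}"

def answer_query(user_query: str, top_k: int = 5) -> str:
    sub_queries = reformulate_query(user_query)            # step 1: reformulate
    sources = [d for sq in sub_queries                     # step 2: retrieve top-k
               for d in web_search(sq, limit=top_k)]
    summaries = [summarize(d) for d in sources]            # step 3: summarize each
    return generate_answer(user_query, summaries)          # step 4: synthesize + cite
```

The key structural point the sketch makes: your page enters this flow as one of several retrieved documents, and only what survives summarization reaches the final answer.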

The researchers formalize this as: a Generative Engine (GE) takes a user query q_u and personalized user information P_U and returns a natural language response r:

$$f_{GE} : (q_u, P_U) \rightarrow r$$

Figure: Overview of Generative Engines. A Generative Engine primarily consists of a set of generative models and a search engine that retrieves relevant documents. It takes a user query as input and, through a series of steps, generates a final response grounded in the retrieved sources with inline attributions.

The critical observation: you are not competing for a ranked position on a results page. You are competing to be the source a language model chooses to quote, paraphrase, or attribute when it writes its answer. That is a fundamentally different challenge from traditional SEO.


Rethinking What Visibility Means in AI Search

In traditional search, visibility is simple: where does your page rank? In AI search, that metric breaks down entirely. A generative engine writes a paragraph, weaves in citations from multiple sources, and might reference your content heavily in one answer and barely at all in the next.

The paper introduces two new metrics designed specifically for this environment:

Position-Adjusted Word Count

This measures how many words in the AI's response were attributed to your source, weighted by where in the response they appeared. Citations early in an answer receive more weight than those near the end, reflecting research showing that user attention follows a power-law distribution by position.
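The idea behind the metric can be shown in a few lines. The weighting scheme below (a linear decay by sentence position) is a simplification for illustration; the paper's exact weighting differs.

```python
from typing import List, Tuple

def position_adjusted_word_count(
    response_sentences: List[Tuple[str, str]], source_id: str
) -> float:
    """Illustrative position-adjusted word count.

    response_sentences: list of (sentence_text, cited_source_id) pairs.
    Words from sentences attributed to `source_id` are weighted by a factor
    that decays with position, so early citations count for more.
    """
    total = 0.0
    n = len(response_sentences)
    for pos, (text, cited) in enumerate(response_sentences):
        if cited == source_id:
            weight = (n - pos) / n   # 1.0 for the first sentence, ~0 at the end
            total += weight * len(text.split())
    return total

# A source cited in sentence 1 earns full weight; the same words in the
# final sentence would earn only a fraction of it.
response = [
    ("AI search is growing fast.", "A"),
    ("Experts agree.", "B"),
    ("It cites sources.", "A"),
]
score = position_adjusted_word_count(response, "A")
```

The practical consequence is that being cited early in the generated answer is worth substantially more than an equal-length mention at the end.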

Subjective Impression

This qualitative metric, evaluated using the G-Eval methodology, scores a citation across seven dimensions: relevance to the query, influence on the response's logic, uniqueness, positional prominence, volume contributed, likelihood of a click, and information diversity.
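Conceptually, the metric aggregates scores across those seven dimensions. The sketch below averages them on an arbitrary 1-5 scale; the scale and the equal-weight aggregation are assumptions for illustration, since the paper scores these with an LLM judge via G-Eval rather than a fixed formula.

```python
from statistics import mean
from typing import Dict

# The seven dimensions named above, as short keys (labels are ours).
DIMENSIONS = [
    "relevance", "influence", "uniqueness", "position",
    "volume", "click_likelihood", "diversity",
]

def subjective_impression(scores: Dict[str, float]) -> float:
    """Average the seven dimension scores into one impression value."""
    missing = set(DIMENSIONS) - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return mean(scores[d] for d in DIMENSIONS)
```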

How the Study Was Conducted: GEO-bench

Because no benchmark existed for this problem, the researchers built one. GEO-bench contains 10,000 queries drawn from nine sources, including real queries from Bing and Google, essay questions from Oxford's All Souls College, the LIMA reasoning dataset, debate questions, trending Perplexity.ai queries, Reddit, and GPT-4-generated queries. It spans 25 domains with informational, transactional, and navigational queries.

For each query, they fetched the top 5 Google results and used GPT-3.5-turbo to generate an answer grounded in those sources. They then applied each GEO method to one randomly selected source, re-ran the generation, and measured the visibility change.
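That per-query procedure can be sketched as follows. The `generate` and `visibility` callables are placeholders standing in for the GPT-3.5-turbo answer generator and the Position-Adjusted Word Count metric; the names and interfaces are ours, not the paper's code.

```python
import random
from typing import Callable, List

def measure_visibility_change(
    query: str,
    sources: List[str],
    geo_method: Callable[[str], str],
    generate: Callable[[str, List[str]], object],
    visibility: Callable[[object, int], float],
) -> float:
    """Toy version of the study's per-query experiment.

    Generate a baseline answer, apply the GEO method to one randomly
    chosen source, regenerate, and return the relative visibility change
    for that source.
    """
    baseline_resp = generate(query, sources)
    target = random.randrange(len(sources))            # pick one source at random
    modified = list(sources)
    modified[target] = geo_method(sources[target])     # apply the GEO method
    new_resp = generate(query, modified)
    before = visibility(baseline_resp, target)
    after = visibility(new_resp, target)
    return (after - before) / max(before, 1e-9)        # relative change
```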

What Actually Works: The 9 GEO Methods

The paper evaluates nine content strategies. All relative improvements below are calculated against the unmodified baseline on the Position-Adjusted Word Count metric, drawn directly from Table 1 of the paper.

| GEO Method | Pos.-Adj. Word Count | Subj. Impression | What It Does |
|---|---|---|---|
| No Optimization (Baseline) | 19.3 | 19.3 | Unmodified content |
| Quotation Addition ⭐ | 27.2 (+41%) | 24.7 (+28%) | Add relevant quotes from credible sources |
| Statistics Addition ⭐ | 25.2 (+31%) | 23.7 (+23%) | Replace qualitative claims with quantitative data |
| Fluency Optimization ⭐ | 24.7 (+28%) | 21.9 (+14%) | Improve readability and prose flow |
| Cite Sources ⭐ | 24.6 (+28%) | 21.9 (+14%) | Add inline citations from credible references |
| Technical Terms | 22.7 (+18%) | 21.4 (+11%) | Add domain-specific terminology |
| Easy-to-Understand | 22.0 (+14%) | 20.5 (+6%) | Simplify language and sentence structure |
| Authoritative Tone | 21.3 (+10%) | 22.9 (+19%) | Make text more persuasive and confident |
| Unique Words | 20.5 (+6%) | 20.4 (+6%) | Add varied, distinctive vocabulary |
| Keyword Stuffing ❌ | 17.7 (-8%) | 20.2 (+5%) | Repeat query keywords (classic SEO tactic) |

Source: Aggarwal et al. (2024), Table 1. GEO-bench test split, 5-seed average. Relative improvements calculated from baseline of 19.3.
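The percentage gains can be reproduced directly from the raw scores and the 19.3 baseline, as a sanity check. (Rounding in the paper may differ by a point for some rows.)

```python
# Raw Position-Adjusted Word Count scores from Table 1 of the paper.
BASELINE = 19.3
RAW_SCORES = {
    "Quotation Addition": 27.2,
    "Statistics Addition": 25.2,
    "Fluency Optimization": 24.7,
    "Cite Sources": 24.6,
    "Keyword Stuffing": 17.7,
}

def relative_gain(score: float, base: float = BASELINE) -> int:
    """Percent change vs. the unoptimized baseline, rounded to a whole percent."""
    return round(100 * (score - base) / base)

gains = {method: relative_gain(s) for method, s in RAW_SCORES.items()}
# e.g. gains["Quotation Addition"] == 41 and gains["Keyword Stuffing"] == -8
```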

The Top Strategies: Quotations, Statistics, and Citing Sources

Quotation Addition was the single highest-performing method, improving Position-Adjusted Word Count by 41% and Subjective Impression by 28% over baseline. Statistics Addition improved the same metrics by 31% and 23% respectively. Cite Sources came in close behind on both.

The underlying logic is consistent across all three: when a language model synthesizes an answer from multiple sources, it gravitates toward content that provides something concrete and attributable. A direct quote from a named expert, a specific percentage, or an explicit citation gives the model a discrete, verifiable unit to include. Vague, qualitative prose gives it much less to work with.

"Including citations, quotations from relevant sources, and statistics can significantly boost source visibility, with an increase of over 40% across various queries."
-- Aggarwal et al., GEO: Generative Engine Optimization, KDD 2024

Fluency Matters More Than Most Expect

Fluency Optimization and Cite Sources both produced roughly 28% gains on the Position-Adjusted Word Count metric. This is notable because fluency is a stylistic improvement with no new content added. The result suggests that clear, well-constructed prose is easier for a language model to parse, summarize, and attribute accurately. Dense or convoluted writing appears to work against the source, even when the underlying information is strong.

What Does Not Work: Keyword Stuffing

Keyword stuffing, the tactic of loading content with repeated query terms, scored 8% below the unmodified baseline on Position-Adjusted Word Count. In the Perplexity.ai validation, it performed 10% below baseline. This finding is consistent: generative engines use language models that understand semantics, not keyword frequency. Stuffed content reads as lower quality to the model summarizing it.

Domain Matters: Strategy Should Match Topic

One of the more nuanced findings is that GEO method effectiveness varies significantly by content domain. There is no universal tactic. The paper's Table 3 provides a breakdown of which methods perform best across query categories:

| Domain / Query Type | Top Strategy | Second Best | Why It Works |
|---|---|---|---|
| Science / Health | Fluency Optimization | Statistics Addition | Clarity and evidence both rewarded |
| Law & Government | Statistics Addition | Cite Sources | Data-backed claims build credibility |
| History | Quotation Addition | Cite Sources | Direct quotes add authenticity to narrative |
| People & Society | Quotation Addition | Fluency Optimization | Attributed voices strengthen personal context |
| Debate / Opinion | Authoritative Tone | Statistics Addition | Confident, evidence-backed writing fits the format |
| Factual Statements | Cite Sources | Statistics Addition | Citations directly verify factual claims |

Source: Aggarwal et al. (2024), Table 3. Top-performing categories for each GEO method.

The practical implication: identify your content's domain before choosing a strategy. A legal analysis page benefits most from data and statistics. A historical or cultural piece gains more from expert quotations. Opinion content benefits from a confident, evidence-backed tone.
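For teams handling many content types, that mapping is easy to encode as a lookup. The domain keys below are simplified labels of our own; map them onto your content taxonomy.

```python
from typing import Tuple

# (top strategy, second-best) per domain, following Table 3 of the paper.
TOP_STRATEGY = {
    "science_health": ("Fluency Optimization", "Statistics Addition"),
    "law_government": ("Statistics Addition", "Cite Sources"),
    "history": ("Quotation Addition", "Cite Sources"),
    "people_society": ("Quotation Addition", "Fluency Optimization"),
    "debate_opinion": ("Authoritative Tone", "Statistics Addition"),
    "factual": ("Cite Sources", "Statistics Addition"),
}

def recommend(domain: str) -> Tuple[str, str]:
    """Return (top, second-best) GEO strategies for a domain, falling back
    to the strongest overall pair when the domain is unrecognized."""
    return TOP_STRATEGY.get(domain, ("Quotation Addition", "Statistics Addition"))
```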

Combining Strategies Outperforms Any Single Tactic

The researchers tested all pair combinations of the four top-performing methods: Cite Sources, Fluency Optimization, Statistics Addition, and Quotation Addition. The best single method (Quotation Addition) produced roughly 40% gains. Combining Fluency Optimization with Statistics Addition outperformed any single strategy by more than 5.5%. Cite Sources, moderately effective alone, became substantially stronger when combined with any of the other three.

Figure: Heatmap of relative improvement from combining GEO strategies. Each cell shows the combined gain; the rightmost column shows that Fluency Optimization pairs well with every other method.

The takeaway: strong AI visibility is unlikely to come from one tactic applied in isolation. Content that is clear, well-cited, evidence-rich, and includes relevant expert quotes performs better than content optimized on any single dimension.

GEO (AEO) Can Level the Playing Field for Smaller Businesses and Publishers

Perhaps the most surprising finding comes from an experiment in which the researchers optimized all competing sources for a query simultaneously. The results show a stark difference in how GEO affects sites depending on their original search engine ranking:

| Original Search Rank | Cite Sources | Quotation Addition | Statistics Addition |
|---|---|---|---|
| Rank 1 (top) | -30.3% | -22.9% | -20.6% |
| Rank 2 | +2.5% | -7.0% | -3.9% |
| Rank 3 | +20.4% | +3.5% | +8.1% |
| Rank 4 | +15.5% | +25.1% | +10.0% |
| Rank 5 (lowest) | +115.1% | +99.7% | +97.9% |

Source: Aggarwal et al. (2024), Table 2. Relative improvement in AI visibility by original SERP rank.

A fifth-ranked website, typically starved for traffic in traditional search, saw a 115.1% increase in AI visibility from Cite Sources alone. Meanwhile the top-ranked site lost 30.3% of its AI response share in the same scenario. The paper's explanation is that traditional search heavily weights off-page signals like backlinks, which favor large incumbents. Generative engines condition on content quality directly, which means well-structured, well-cited content can compete regardless of domain authority.

This does not eliminate the need for traditional SEO. Sources must still be retrieved before they can be cited. But once in the retrieval set, GEO methods can significantly change which sources the model draws on most heavily.

Real-World Validation on Perplexity.ai

Beyond their controlled setup, the researchers validated the top methods on Perplexity.ai using 200 real queries. The results held up: Quotation Addition improved position-adjusted visibility by 22% over baseline, Statistics Addition improved the subjective impression metric by 37%, and keyword stuffing came in 10% below the unoptimized baseline. The pattern is consistent across both environments, suggesting these strategies generalize and are not artifacts of the researchers' specific experimental setup.

Limitations to Keep in Mind

The researchers are transparent about what the study does and does not prove:

  • The methods were tested on two generative engines. Results may not transfer identically to every AI search system, and the field is evolving quickly.
  • The paper measures visibility inside AI-generated responses, not downstream outcomes like traffic or conversions. Higher citation share is likely valuable, but the commercial link is not directly measured here.
  • The study did not measure how GEO-driven content changes affect traditional search rankings, though the authors note that textual modifications are unlikely to affect off-page signals like backlinks.
  • GEO-bench may need ongoing updates as real-world query patterns evolve.

The Bottom Line

This is one of the most rigorous empirical studies published on what drives visibility inside AI-generated search responses. The core message: AEO, GEO, or AI search rewards citation-worthiness, not keyword density. Content that gets cited in AI answers is specific, well-sourced, clearly written, and evidence-rich. Content that does not get cited is vague, repetitive, and hard for a language model to attribute with confidence.

The most important mindset shift is this: your content is no longer just a destination for human readers. In a world where AI agents research on behalf of users, your content functions as a data source for machines as much as an experience for people. Businesses that build their content with that dual audience in mind are the ones that will show up in the answers their customers are already getting from AI.
