
Optimising websites for generative AI search is the next frontier of SEO, and most marketers tasked with driving organic traffic are looking for ways to improve their visibility in AI search. But what if one of the most common pieces of generative engine optimisation (GEO) advice is wrong?

Numerous articles have been written on the topic, and most of them focus on a few common themes:

  • Use clean heading hierarchies
  • Write conversational content
  • Create FAQs and comparisons
  • Build topical authority
  • Add structured data schema

While all of these are strong suggestions that improve website usability, facilitate crawling and improve the indexability of your content, our research found that adding schema markup does not necessarily correlate with improved generative AI search visibility. But before jumping to conclusions, let’s dig a bit deeper.

The role of schema markup

Schema markup is code that uses standardised annotations to label structured data, helping search engines understand the content of a website. In simple terms, if your web page contains data that could be structured in a table (e.g. a car’s make, model, year of manufacture and colour), schema lets you signal the presence of these structured data values to search engines and LLMs, so they “understand” what that data relates to.
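To make this concrete, here is a minimal Python sketch of how the car example might be expressed as JSON-LD, the schema.org format Google recommends for structured data. The helper is hypothetical, the values are made up, and mapping make to brand, year of manufacture to productionDate and colour to color is our own illustrative choice of schema.org/Car properties.

```python
import json


# Hypothetical helper: make -> brand, year of manufacture -> productionDate,
# colour -> color is an illustrative mapping onto schema.org/Car properties.
def car_jsonld(make: str, model: str, year: int, colour: str) -> str:
    """Return a JSON-LD <script> tag describing a single car."""
    data = {
        "@context": "https://schema.org",
        "@type": "Car",
        "brand": {"@type": "Brand", "name": make},
        "model": model,
        "productionDate": str(year),  # ISO 8601 year
        "color": colour,
    }
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )


# Hypothetical example values
print(car_jsonld("Mini", "Cooper", 2021, "Blue"))
```

The script tag would sit in the page’s HTML alongside the visible table, so crawlers can read the same facts in a labelled form.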

E-commerce sites, research publications, event listings, review sites and podcast libraries are just some examples of businesses that benefit from the variety of schema available. Search engines like Google crawl the data within schema markup and can use it to index content appropriately for enhanced search results such as shopping listings, embedded videos or reviews.

More recently, the large language models (LLMs) that power AI tools have also used structured data contained within schema to populate their knowledge bases.

Generative search’s data requirements

ChatGPT, Google Gemini, and other generative search systems rely on a deep semantic understanding of content to synthesize answers. When retrieving knowledge, these systems parse information on your web pages and extract facts and relationships. Generative models are more prone to “hallucinations” when the source data is inaccurate or unreliable.

Schema markup is a key mechanism for websites to contribute to the knowledge graph that generative search engines are built on top of. A poorly structured website makes it more difficult for LLMs to understand key facts such as product prices, recipe ingredients or podcast recording dates.

However, schema markup isn’t the only mechanism for AI to extract this data, and our research shows that modern algorithms are capable of understanding well-structured information even without schema code.

Results of our research

We ran over 2,000 prompts on each of the three most popular AI search platforms: ChatGPT, Google AI Overviews and Perplexity. The answers synthesized by these platforms included 9,000 citation sources. A citation is any website or brand mentioned in an AI search answer with a link to the source URL.

Our data specialists and GEO experts analysed the sources cited to detect the presence and types of schema in use on those web pages.
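The detection step can be approximated with Python’s standard library. The sketch below is hypothetical and is not our actual pipeline (the study relied on Screaming Frog, as described in the methodology). It collects schema.org type names from JSON-LD script blocks, including nested objects and @graph arrays, and from microdata itemtype attributes; it ignores RDFa and skips JSON-LD that fails to parse.

```python
import json
from html.parser import HTMLParser


class SchemaTypeCollector(HTMLParser):
    """Hypothetical detector: gathers schema.org type names from JSON-LD
    blocks and microdata itemtype attributes (RDFa is not handled)."""

    def __init__(self):
        super().__init__()
        self.types = set()
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []
        # Microdata, e.g. itemtype="https://schema.org/Person" -> "Person"
        for url in (attrs.get("itemtype") or "").split():
            self.types.add(url.rstrip("/").rsplit("/", 1)[-1])

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self._walk(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped rather than counted

    def _walk(self, node):
        # Recurse through nested objects and lists (including @graph arrays)
        if isinstance(node, dict):
            declared = node.get("@type")
            if isinstance(declared, str):
                self.types.add(declared)
            elif isinstance(declared, list):
                self.types.update(t for t in declared if isinstance(t, str))
            for value in node.values():
                self._walk(value)
        elif isinstance(node, list):
            for item in node:
                self._walk(item)


def schema_types(html):
    """Return the set of schema.org type names found in an HTML document."""
    collector = SchemaTypeCollector()
    collector.feed(html)
    collector.close()
    return collector.types


page = """<script type="application/ld+json">
{"@context": "https://schema.org", "@graph": [
  {"@type": "Article", "author": {"@type": "Person", "name": "A. Writer"}},
  {"@type": "Organization", "name": "Example Ltd"}]}
</script>
<div itemscope itemtype="https://schema.org/FAQPage"></div>"""
print(sorted(schema_types(page)))  # ['Article', 'FAQPage', 'Organization', 'Person']
```

Walking the whole JSON tree matters because author (Person) and publisher (Organization) objects are usually nested inside an Article rather than declared at the top level.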

Does schema markup help increase brand visibility on generative search?

Broadly speaking, 81% of the web pages cited included schema markup, and only 19% included no schema markup whatsoever.

Does the use of schema increase the likelihood of citation?

Looking at this data alone, one could conclude that schema is a prerequisite for getting cited in AI answers. But that would not be an accurate assumption.

To understand why, it’s important to identify the types of schema that matter.

Practical schema types for generative search optimisation

Not all schema markup highlights content that may be directly quoted in AI-generated answers. Types such as Person and Organization may identify the post author and company name, but they don’t provide any specific information about the page content itself.

When it comes to the main content that users look for in AI answers, other schema types are more relevant for marking up that information:

  • HowTo – Instructions that explain how to do something through a sequence of steps.
  • FAQPage – A page presenting one or more “Frequently asked questions”.
  • Question – Specific question in an FAQ document.
  • Product – Any offered product or service.
  • Event – Any event, such as a concert or match, happening at a certain time and location.
  • Review – A review of an item such as a movie or book.
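For reference, a page of questions and answers might be marked up with FAQPage as in this minimal Python sketch, which nests each Question and its acceptedAnswer inside mainEntity. The helper name and the sample question are hypothetical.

```python
import json


# Hypothetical helper: each (question, answer) pair becomes a schema.org
# Question whose acceptedAnswer is an Answer, listed under FAQPage.mainEntity.
def faq_page_jsonld(pairs):
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


faq = faq_page_jsonld([
    ("Do you ship to France?", "Yes, orders usually arrive within five working days."),
])
print(json.dumps(faq, indent=2))
```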

Digging into our data set, we found these content-related schema types much less frequently in the cited sources.

Most popular schema used by AI search citation sources

Person schema was found to be the most popular: 58.9% of total cited sources use this schema. This aligns with AI search’s need for high-quality, authoritative sources. Reliable authors with a reputation for producing high-quality, accurate content are likely to be cited more often.

Article and ListItem schema don’t seem to matter as much. This is counterintuitive: if schema were that important, one would expect listicles using ListItem schema to be much more prevalent among the sources cited. However, 57.6% of cited sources do not use ListItem schema, even though listicles are very popular sources and AI search results are often presented in list format for ease of use. Similarly, 59.3% of cited sources don’t use Article schema.
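For comparison, this is roughly what ListItem markup for a listicle looks like: an ItemList whose itemListElement entries are ListItem objects, each with a position. The helper and the park names are hypothetical.

```python
import json


# Hypothetical helper: an ordered listicle expressed as a schema.org ItemList,
# with one ListItem per entry and 1-based positions.
def item_list_jsonld(names):
    return {
        "@context": "https://schema.org",
        "@type": "ItemList",
        "numberOfItems": len(names),
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name}
            for i, name in enumerate(names, start=1)
        ],
    }


print(json.dumps(item_list_jsonld(["Hyde Park", "Regent's Park", "Richmond Park"]), indent=2))
```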

Content schema types vs % citation

Even more specific schema for FAQs and products were found to have little or no bearing on source visibility in AI answers. Only 1.8% of cited sources were found to use FAQPage schema, 6.9% used Product and 3.1% used Question. HowTo and Review schema were present in less than 1% of sources cited.
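A share like this can be computed by counting each schema type at most once per cited page and dividing by the number of cited pages. A minimal sketch, with a hypothetical helper name and toy data rather than figures from our study:

```python
from collections import Counter


# Hypothetical helper: each page is represented by the set of schema types
# found on it, so a type is counted at most once per page.
def type_usage_share(pages):
    """Return {schema type: % of pages using it}, rounded to one decimal place."""
    counts = Counter(t for types in pages for t in types)
    total = len(pages)
    return {t: round(100 * n / total, 1) for t, n in counts.most_common()}


# Toy sample for illustration only, not data from our study
cited = [{"Person", "Article"}, {"Person"}, set(), {"Product", "Person"}]
share = type_usage_share(cited)
print(share)                          # {'Person': 75.0, 'Article': 25.0, 'Product': 25.0}
print(100 - share.get("Article", 0))  # share of pages without Article schema: 75.0
```

Pages with no schema at all still count towards the denominator, which is why the “without” figures are simply 100 minus the “with” figures.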

Impact of schema markup on citation by generative AI search engines

Looking at the data for each of the most popular AI search engines, we found that the relative importance of schema varies from platform to platform.

Which schema markup is important for Google AI Overview citations?

Person schema bears some importance: 56.0% of cited sources were found to use it. Surprisingly, for every other schema type, cited sources without that schema outnumbered those with it! Organization and Article schema don’t seem to be important factors for Google AI Overviews citations. Google AI search also doesn’t seem to care whether ListItem schema is used in content, even when the query asks for a list of items.

Importance of schema use in Google AI Overviews

Which schema markup is important for ChatGPT citations?

Of the three platforms we studied, ChatGPT is the only one whose search function appears to value schema. The world’s most popular AI platform gives a lot of weight to Person schema, with 70.4% of cited sources including it. Trust, authority, accuracy and reliability of sources are very important for ChatGPT, to ensure users trust its answers. It is therefore no surprise that OpenAI’s tool values Person schema: identifying the author’s name allows the article to be connected to the author’s social graph, which in turn indicates the authority and reliability of the source.
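For illustration, this sketch shows an Article whose author is a Person with sameAs links to profiles elsewhere, one common way of tying an author to their wider online presence. The helper, names and URLs are hypothetical; example.com and example.org stand in for real profile pages.

```python
import json


# Hypothetical helper: an Article whose author is a Person entity; sameAs
# lists URLs of other pages about the same person (e.g. profile pages).
def article_with_author_jsonld(headline, author_name, profile_urls, publisher_name):
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {
            "@type": "Person",
            "name": author_name,
            "sameAs": profile_urls,
        },
        "publisher": {"@type": "Organization", "name": publisher_name},
    }


# Placeholder values; example.com/example.org stand in for real profile URLs
article = article_with_author_jsonld(
    "How schema affects AI search citations",
    "Jane Doe",
    ["https://example.com/profiles/jane-doe", "https://example.org/@janedoe"],
    "Example Media Ltd",
)
print(json.dumps(article, indent=2))
```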

Organization and Article schema are also given some weight. ChatGPT gives more weight to ListItem schema than the other two platforms we studied, but it’s not a deal-breaker for citations or mentions.

Importance of schema use in ChatGPT

Which schema markup is important for Perplexity citations?

Perplexity’s citation factors seem to be the opposite of ChatGPT’s. Person, Organization and Article schema have no apparent bearing on the likelihood of getting cited; websites without those schema were actually more visible on Perplexity. The platform doesn’t care about ListItem schema either, even though answers on Perplexity often draw on information in listicles.

Importance of schema use in Perplexity

Case studies: Real-world examples of schema-powered generative results

Checking the use of schema markup is easy with the Schema Markup Validator. This handy tool verifies that your website’s schema has been marked up correctly, and can also be used to examine how competitors or other popular websites use schema. We can use the tool to look at case studies of commonly cited websites and determine whether and how they use schema.

Wikipedia – One of the most widely cited websites, Wikipedia uses schema sparingly. Articles listing 50 sources, a table of top items, thumbnail images and key facts will typically have no more than a handful of schema attributes, mostly describing the article itself, with almost nothing to help LLMs extract details about its contents – not even a summary description!

Reddit – Another extremely popular citation source, especially for Google AI Overviews, Reddit does not use schema.org markup on any of its pages, apart from a few erroneously coded articleBody tags.

TripAdvisor – One of the most popular travel review sites, TripAdvisor uses FAQPage schema to highlight questions and answers on its popular discussion pages (e.g. “What are the most popular restaurants in London?”). While the markup may have been put in place to answer voice search queries, it appears to have also helped the site cut through the noise of user-generated content (UGC) and get its top answers cited frequently on ChatGPT, Perplexity and Google.

Key takeaways

Overall, schema markup seems to matter only to a very limited extent as a ranking, visibility or citation factor for any of the main AI search platforms. Using the proper HTML code matters a lot more. For example, tagging bulleted lists with <ul><li>…</li></ul> and numbered lists with <ol><li>…</li></ol> can achieve much the same result as ListItem schema.
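To illustrate the point, this hypothetical sketch recovers list items from plain ol and li tags using Python’s built-in html.parser, with no ListItem schema on the page. It is not how any AI search platform actually parses content, and for simplicity it assumes lists are not nested.

```python
from html.parser import HTMLParser


class ListExtractor(HTMLParser):
    """Hypothetical sketch: collects (list kind, item texts) for each <ul>/<ol>.
    Assumes lists are not nested."""

    def __init__(self):
        super().__init__()
        self.lists = []    # e.g. [("ol", ["first item", "second item"])]
        self._kind = None  # "ul" or "ol" while inside a list
        self._text = None  # text fragments of the current <li>, or None

    def _flush(self):
        # Save the current item, collapsing whitespace; </li> is optional in HTML
        if self._text is not None:
            self.lists[-1][1].append(" ".join("".join(self._text).split()))
            self._text = None

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._kind = tag
            self.lists.append((tag, []))
        elif tag == "li" and self._kind:
            self._flush()
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "li":
            self._flush()
        elif tag in ("ul", "ol"):
            self._flush()
            self._kind = None


def extract_lists(html):
    parser = ListExtractor()
    parser.feed(html)
    parser.close()
    return parser.lists


page = "<h2>Best parks in London</h2><ol><li>Hyde Park</li><li>Regent's <b>Park</b></li></ol>"
print(extract_lists(page))  # [('ol', ['Hyde Park', "Regent's Park"])]
```

The order and boundaries of the items come entirely from the semantic tags, which is the same information ListItem schema would otherwise restate.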

That doesn’t mean you shouldn’t use schema markup. Schema is still important for organic search, and it certainly doesn’t hurt to make your key data more easily accessible and understandable to LLMs. However, if you’re not already using schema, there’s no need to panic, as long as you use semantic HTML markup and clean code.


Methodology

ChatGPT, Google AI Overviews and Perplexity were selected for this study because they are the three most widely used AI search engines. The research began by identifying a wide range of prompts derived from popular Google searches in the UK, Europe and the USA, covering topics from literature to e-commerce to popular culture. The prompts were run both manually and in AI search tracking tools such as Otterly.ai, in order to compare the results, ensure a degree of consistency and guard against potential biases.

Citations included in the AI responses were manually checked and classified by our team of GEO and SEO analysts, and the cited URLs were then crawled using the Screaming Frog SEO Spider to detect the use of schema. Legitimate citations had to include a website or brand mention in the AI search answer along with a link to the source URL. Any broken links, 404 (page not found) URLs or hallucinated web pages were discarded.
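The filtering and tallying steps might look something like the sketch below, which assumes each crawled citation has already been reduced to a URL, an HTTP status and the set of schema types found. Treating anything other than HTTP 200 as a broken citation is a simplifying assumption, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


# Hypothetical record of one crawled citation
@dataclass
class CrawledCitation:
    url: str
    status: int                                   # HTTP status returned by the crawl
    schema_types: set[str] = field(default_factory=set)


def schema_presence(citations: list[CrawledCitation]) -> tuple[int, float]:
    """Drop citations that did not resolve (simplification: anything but 200),
    then return (number kept, % of kept pages with any schema markup)."""
    valid = [c for c in citations if c.status == 200]
    if not valid:
        return 0, 0.0
    with_schema = sum(1 for c in valid if c.schema_types)
    return len(valid), round(100 * with_schema / len(valid), 1)


# Hypothetical sample; example.com URLs are placeholders
sample = [
    CrawledCitation("https://example.com/a", 200, {"Person", "Article"}),
    CrawledCitation("https://example.com/b", 200),
    CrawledCitation("https://example.com/missing", 404, {"Person"}),
]
print(schema_presence(sample))  # (2, 50.0)
```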

About the Author

Farhad is the Group CEO of AccuraCast. With over 20 years of experience in digital, Farhad is one of the leading technical marketing experts in the world. His specialities include digital strategy, international business, product marketing, measurement, marketing with data, technical SEO, and growth analytics.

Jérôme is the International Director at AccuraCast. A multilingual digital marketing specialist, with a very strong background in data and finance, Jérôme has previously worked at Euromoney and Geosys, where he did number-crunching for NASA.