
AI detection in search engines

By Benjamin Denis

If you are already using, or thinking of using, AI to generate content for your website, then you may be interested to know whether Google (or other search engines) can detect AI-generated content.

It is thought that Google previously rewarded sites with lots of text because this represented a certain investment of time or money: it can take at least two hours to write 1,000 words. Now that a 1,000-word document can be generated in seconds, Google must either stop rewarding volume or find some way of detecting whether content was created with real investment.

The importance of being able to detect AI produced content obviously goes way beyond SEO and copywriting. Finding a solution for detecting fake content is an important priority for Internet security, journalism and education too.

Google says it is not penalizing AI

Published in 2023, Google Search’s guidance about AI-generated content states that “appropriate use of AI or automation is not against our guidelines”. Rather than penalize content generated by AI, Google relies on its ranking systems to assess the quality of the content itself.

Regarding spam, Google says that scaled content abuse is against its policies “no matter how it’s created”, whether content is produced “through automation, human efforts, or some combination of human and automated processes”. The same principle applies to all quality signals: Google will detect and penalize bad content whether or not it was produced by AI.

Commenting on the March 2024 Core Update, Roger Montti contradicted Google in an article published on Search Engine Journal, saying that “Google is penalizing AI-generated content” simply because “AI cannot meet Google’s quality thresholds” as described in the Product Reviews and Helpful Content documentation. For Montti, AI-generated content will always lack expertise, hands-on experience and originality unless it is produced with the input of experience and expertise at source. He suggests that marketers should collaborate with AI rather than expect it to do all the work.

AI Detection tools

In the RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors study, published in May 2024, 12 state-of-the-art AI detectors are analyzed, including 4 commercially available solutions.

Although tests showed that detectors worked most of the time, the study concludes that “detectors are not yet robust enough for widespread deployment or high stakes use”. As shared by Bruce Clay in How to survive the search results when you’re using AI tools for content, there are ways of getting round AI detection that are also identified in the study.

“Detectors are not yet robust enough for widespread deployment or high stakes use” – RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors

Dongwon Lee, head of the Information Knowledge and wEb (PIKE) Lab at Penn State University, and also on the team at GPTZero, says that “the best AI solution that we built analyzes text and gives a confident answer — with 85% to 95% accuracy — as to whether content was written by a human or made with AI”. By comparison, humans could distinguish AI-generated text only about 53% of the time. But he also admits that detectors are playing cat and mouse with AI tool builders as generation software becomes increasingly sophisticated.

We ran the articles produced with AI in the previous chapter, “Test driving AI writing generators for SEO”, through some of these tools, and the results were conclusive: they are very good at detecting content produced by AI (in this case, one-click production of articles without any post-editing).

GPTZero lets you check 10,000 words per month for free and plans start at $10 per month.

Starting with text written for this eBook, GPTZero correctly classified it as human-written, with just a 1% probability that it was AI-generated.

GPTZero shows that it is highly confident that this text is entirely human written

Text can be analyzed by copying and pasting it from a document, but you can also import files. We imported the 15 documents generated in our tests of AI-writing tools and got 13 results back before hitting the 10,000-word limit. 12 documents were classified as AI-produced with a probability of over 50%. The naan bread recipe written by Microsoft Copilot, however, was given only a 35% probability of being AI-written.

GPTZero dashboard showing the AI probability for uploaded documents

We then moved on to the ZeroGPT tool (with a confusingly similar name) and ran the same tests. The free version, with ads, lets you analyze any text up to 15,000 characters long and seems to have no limit on how many documents you can analyze per month. This tool successfully detected all the AI-generated documents, although a post written by Google Gemini was given only a 45.72% probability of being AI. The human-written content had a 0% probability of being written by AI.

ZeroGPT showing a text is written by a human (and we crop lots of ads)
ZeroGPT showing a text is written by AI and the phrases that gave that away

Obviously, our tests are not as thorough as those in the RAID study, but they suggest that one-click AI article writing can be detected fairly accurately, and that such detection could serve as one of many signals a search engine uses to measure quality.

Compiled results from AI detectors on human and AI-generated text

You may want to use AI detectors to evaluate your current content, and this may include content production you have outsourced and paid for. However, before acting on results that you obtain, bear in mind the ethics statement from the RAID study: “Detecting generated text is often accusatory in nature and can frequently result in disciplinary or punitive action taken against the accused party. This can cause significant harm even when detectors are correct, but especially when they are incorrect. This is especially problematic given recent work by Liang et al. (2023c) showing that detectors are biased against non-native English writers. Our results also support this and suggest that the problem of false positives remains unsolved. For this reason, we are opposed to the use of detectors in any sort of disciplinary or punitive context and it is our view that poorly calibrated detectors cause more harm than they solve”.

Watermarking for easier AI detection

As AI technology improves, pressure is mounting for AI companies to help users identify fake content. This applies to deepfake videos, images and written text. In May 2024, Google announced that it had added watermarking to text generated by Gemini using SynthID, as well as to images and videos produced by Google technology.

A piece of text generated by Gemini with the watermark highlighted in blue

The watermark (hidden in the text through the choice and sequence of words) allows other software, such as browsers, email clients or search engines, to identify content as AI-generated using SynthID detection. Similar technology has already been added to TikTok, and it is believed that OpenAI added watermarks to ChatGPT back in 2023, according to an article by Matt Popovic, who gives tips on avoiding the watermark.
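To give a sense of how a hidden watermark can be read back out of plain text, here is a minimal sketch of the generic “green-list” technique from the research literature. This is emphatically not Google’s actual SynthID algorithm, and the vocabulary, key and function names are all hypothetical: a secret key biases the generator toward a keyed subset of words at each step, and the detector simply measures how often that subset appears.

```python
import hashlib
import random

# Toy vocabulary standing in for a language model's token set (hypothetical)
VOCAB = [f"w{i}" for i in range(1000)]

def green_list(prev_token: str, key: str = "secret") -> set:
    """Derive a keyed 'green' half of the vocabulary from the previous token."""
    digest = hashlib.sha256((key + prev_token).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(VOCAB, len(VOCAB) // 2))

def generate_watermarked(length: int, key: str = "secret") -> list:
    """Toy 'generator' that always picks its next token from the green list."""
    rng = random.Random(0)
    tokens = ["w0"]
    for _ in range(length - 1):
        tokens.append(rng.choice(sorted(green_list(tokens[-1], key))))
    return tokens

def green_fraction(tokens: list, key: str = "secret") -> float:
    """Detector: share of tokens that fall in the previous token's green list.
    Expect about 0.5 for ordinary text and close to 1.0 for watermarked text."""
    hits = sum(tok in green_list(prev, key) for prev, tok in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)
```

A real detector turns that fraction into a statistical confidence score against the roughly 50% expected by chance; production systems like SynthID are far more sophisticated, but they rest on the same idea of a keyed statistical bias that is invisible to readers yet measurable by anyone holding the key.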

Google says that watermarking is not infallible, but we see it as an extra tool on top of AI detectors that will allow search engines to take how content was produced into consideration as a ranking factor. Unlike AI detectors, watermarks should not produce false positives (human-written text detected as AI-generated).

This means that watermarking will allow search engines and browsers to safely add warnings to web pages where watermarks identify AI-generated content (text, images or videos). This may discourage the use of AI to mass-produce articles, but it will be up to end users to decide whether AI involvement matters to them.

By Benjamin Denis

CEO of SEOPress. 15 years of experience with WordPress. Founder of WP Admin UI & WP Cloudy plugins. Co-organizer of WordCamp Biarritz 2023 & WP BootCamp. WordPress Core Contributor.