Date: 2024-07-09

Scientists analyzed millions of texts from before and after the ChatGPT era to see which words were hallmarks of AI writing.

Let’s be clear from the start of this article: Is there any word, or even multiple words, that can tell you with 100 percent accuracy that a text was written with the help of a generative AI tool like ChatGPT? No. So please do not accuse your employee of cutting corners or cheating because a piece of their writing contains any word or words in particular.

But just because certain words aren’t a dead giveaway that something was written with AI doesn’t mean that a few aren’t highly suggestive. Recent research demonstrated this in a fascinating way.

What are the telltale signs of AI writing?

A new preprint posted by researchers from Germany’s University of Tübingen and Northwestern University used a clever technique to identify the linguistic signs of AI-assisted writing. By analyzing 14 million paper abstracts indexed in PubMed, a database of biomedical research, from between 2010 and 2024, the team could see which words were used vastly more often after generative AI tools became available than just before.
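For readers curious about the mechanics, the basic before-and-after comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers’ code: it assumes each period is just a list of abstract strings, measures each word’s frequency as the share of abstracts that contain it, and divides the post-ChatGPT share by the pre-ChatGPT share. The function names and toy corpora are hypothetical, and the published analysis may use a more careful baseline than a plain two-period ratio.

```python
import re
from collections import Counter


def doc_frequency(abstracts):
    """Share of abstracts that contain each word at least once."""
    counts = Counter()
    for text in abstracts:
        # Lowercase, keep letter runs only, and count each word once per abstract.
        counts.update(set(re.findall(r"[a-z]+", text.lower())))
    n = len(abstracts)
    return {word: c / n for word, c in counts.items()}


def frequency_ratios(before, after):
    """Post-period document frequency divided by pre-period frequency.

    Assumption of this sketch: words that never appear in the 'before'
    corpus are skipped, because their ratio would be undefined.
    """
    p_before = doc_frequency(before)
    p_after = doc_frequency(after)
    return {
        word: p / p_before[word]
        for word, p in p_after.items()
        if word in p_before
    }


if __name__ == "__main__":
    # Hypothetical toy corpora standing in for pre- and post-ChatGPT abstracts.
    before = [
        "we study the protein structure",
        "results show a clear effect",
        "this study delves into binding",
        "we measure the effect",
    ]
    after = [
        "this study delves into binding",
        "the paper delves deeper",
        "we study the protein",
        "results show a clear effect",
    ]
    ratios = frequency_ratios(before, after)
    top = max(ratios, key=ratios.get)
    print(top, ratios[top])  # prints "delves 2.0"
```

On this toy data, “delves” comes out on top with a ratio of 2.0, because it appears in half of the “after” abstracts versus a quarter of the “before” ones. The 25-fold jump reported for “delves” is, roughly, this kind of ratio computed over millions of real abstracts.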

After throwing out words that obviously surged due to real-world events (like “Ebola” or “lockdown”), the researchers found a handful of terms that had seen a suspicious spike in popularity just after the advent of ChatGPT and its rivals. Their list includes “delves,” “showcasing,” “underscores,” “comprehensive,” “crucial,” “intricate,” and “pivotal.”

Some of these were wildly more common post-ChatGPT (“delves” was used 25 times more often; “showcasing” and “underscores” were used nine times more frequently), while others saw smaller bumps of just a few percentage points. When several of these “marker words” are used together, the chances that a piece of writing was authored by AI jump significantly, the researchers concluded.

The takeaway for entrepreneurs

Using this technique, the researchers estimated that at least 10 percent of recent papers were written with some kind of AI assistance. Unsurprisingly, papers from countries where English is not the native language, such as China and South Korea, were more likely (about 15 percent) to be flagged as written by or with AI. Findings like this might help technologists develop tools to spot AI writing, but then again, the robots might just do an end run around our future AI-detection efforts too.

“As knowledge of LLMs’ [large language models like ChatGPT] telltale marker words starts to spread, human editors may get better at taking those words out of generated text before it’s shared with the world,” suggests Ars Technica’s Kyle Orland in his discussion of these findings.

“Who knows, maybe future large language models will do this kind of frequency analysis themselves, lowering the weight of marker words to better mask their outputs as human-like. Before long, we may need to call in some Blade Runners to pick out the generative AI text hiding in our midst,” he (half) jokes.

In the meantime, what’s the takeaway for the average entrepreneur or boss? Rather than using any particular word as a “gotcha” for AI writing, it’s probably worth noticing what most of these words have in common.

Words like “delves” and “pivotal” can have their place, but they are also exactly the kind of show-offy vocabulary that students and junior employees once pulled out of the thesaurus to pad their writing or to make it seem more serious and authoritative.

When you wanted to evaluate a piece of writing in the pre-ChatGPT era, you would ask yourself: Is it accurate? Is it interesting? Does it speak to its target audience? Do I find the argument persuasive? Does it move the conversation forward? Those are still the best questions to ask now. Generic, puffed-up vocabulary was a sign of insecurity then (here’s the study to prove it). Now, it might be a sign of insecurity and AI usage. Either way, the result is messy, vague, or overblown writing, and that’s the real problem.

