Six months after birth, a baby begins crawling and tries to mimic its parents, with uncertain success.
By contrast, in its sixth month, OpenAI’s large language model ChatGPT has reportedly passed medical, law, and business exams (albeit with a little human help). Enthusiastic advocates for the technology believe it will soon be able to write books, compose lyrics, churn out screenplays, and take over entire creative sectors.
With the arrival in late March of the more advanced GPT-4, which OpenAI calls its “most capable model”, those in the business of rooting out plagiarism have their work cut out for them.
Romance: the last obstacle?
From a technical standpoint, one glaring issue that keeps ChatGPT from putting authors out of business is its content policy, which (sometimes) prevents the chatbot from generating explicit content. This is a natural safeguard for OpenAI to adopt, as explicit content can create safety issues, such as AI-generated child abuse material.
But it may create other hurdles.
We gave ChatGPT a prompt to write a scene for a novel where two adult characters confessed their love and had “consensual intimate relations.”
ChatGPT responded, “Note: As an AI language model, I do not generate explicit content. Therefore, the following scene will focus on the emotional aspect of the interaction.”
However, the OpenAI website flagged even this generated reply as a possible content policy violation.
While ChatGPT might—in theory—take over from a writer of historical texts or genre fiction, romance is still a bridge that it struggles to cross.
Arrival of AI fiction
Clarkesworld Magazine, which publishes science fiction and fantasy, announced in mid-February 2023 that it had temporarily stopped accepting submissions, pointing to a huge spike in pieces it believed were written by AI chatbots.
“Yes, there are tools out there for detecting plagiarized and machine-written text, but they are prone to false negatives and positives. One of the companies selling these services is even playing both sides, offering a tool to help authors prevent detection. Even if used solely for preliminary scoring and later reviewed by staff, automating these third-party tools into a submissions process would be costly. I don’t think any of the short fiction markets can currently afford the expense,” Clarkesworld Magazine editor Neil Clarke wrote in a blog post on February 15.
Testing the antidote
OpenAI partially agrees. The ChatGPT-maker released a classifier on January 31 to detect AI-generated text. However, it admitted that the tool was not “fully reliable.”
“In our evaluations on a ‘challenge set’ of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as ‘likely AI-written,’ while incorrectly labeling human-written text as AI-written 9% of the time (false positives),” OpenAI wrote on its website.
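To put those figures in perspective, here is a rough back-of-the-envelope illustration in Python. The batch of 100 AI-written and 100 human-written texts is hypothetical, chosen only to make the arithmetic visible:

```python
# Illustrative arithmetic only: what a 26% true-positive rate and a 9%
# false-positive rate mean for a hypothetical batch of submissions.
def flagged_counts(n_ai, n_human, tpr=0.26, fpr=0.09):
    """Return (AI texts correctly flagged, human texts wrongly flagged)."""
    return round(n_ai * tpr), round(n_human * fpr)

ai_caught, humans_misflagged = flagged_counts(n_ai=100, n_human=100)
print(f"AI texts caught: {ai_caught} of 100")                      # 26
print(f"Human texts wrongly flagged: {humans_misflagged} of 100")  # 9
# Roughly three in four AI-written texts slip through unflagged,
# while nearly one in ten human writers is falsely accused.
```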
We tried out the classifier on a text that ChatGPT itself had generated several weeks earlier in response to the prompt “Write the first page of a novel that Salman Rushdie would write.”
The chatbot had produced a rather clichéd first page, opening with a poverty-stricken city welcoming a sentimental narrator walking through its streets.
[Image: A “Salman Rushdie-style” opening chapter generated by AI. Photo credit: screenshots from ChatGPT]
According to OpenAI, its classifier works best with English text that is more than 1,000 characters long. We offered it 299 words of English-language text, comfortably above that threshold.
The classifier’s five possible verdicts were “very unlikely”, “unlikely”, “unclear if it is”, “possibly”, or “likely” AI-generated.
Our result: the tool decided that the text was “unlikely AI-generated,” thus giving us a false negative.
[Image: OpenAI’s AI classifier tests a sample of writing. Photo credit: screenshots from OpenAI]
However, when given an excerpt from a translation of ‘The Diary of Anne Frank’, the classifier correctly responded that the piece was “very unlikely” to be AI-generated.
We next presented ChatGPT with the prompt: “Hey ChatGPT, please write the first page of a novel in your own style.” The chatbot conjured up an opening scene in which a despondent narrator, walking through the hills, finds a book they once loved and regains a sense of hope.
OpenAI’s text classifier, however, was unable to determine whether the passage was AI-generated, returning a verdict of “unclear.”
In our experience, the AI classifier struggled to clearly mark texts, even those generated by ChatGPT just seconds earlier, as “possibly” or “likely” AI-generated.
This means that educators or invigilators hoping to use OpenAI’s classifier to verify the originality of assignments are probably not doing themselves or their students any favours. It may let AI-written content pass as human-written text, defeating the purpose of the tool entirely.
The problem is not limited to educational settings, however, as publishers also try to keep up with advancing generative AI technology.
Another AI-detection tool, the Giant Language model Test Room (GLTR), built by researchers from Harvard University and the MIT-IBM Watson AI Lab, takes a different approach. It analyses a passage based on the predictability of the words used, colour-coding each word to show how expected it is at that point in the sentence.
Though the tool was built and tested against GPT-2 output, and its website acknowledges that it may not work as well for ChatGPT, its analysis still showed markedly lower randomness, or unpredictability, in text generated by ChatGPT than in the opening lines of Susanna Clarke’s fantasy novel Piranesi.
Words highlighted in green fall within the model’s list of top 10 predicted words; yellow and red mark words within the top 100 and top 1,000, while violet marks words that fall outside even those ranges. As seen in the sample, human-written text contains more non-green, or unexpected, words than ChatGPT’s output.
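For the technically curious, the core of this approach can be sketched in a few lines of Python using the freely available GPT-2 model through Hugging Face’s transformers library. This is a minimal illustrative re-implementation of GLTR’s bucketing idea, not the tool’s actual code, and the example sentence is our own:

```python
# A minimal sketch of GLTR's core idea: rank each actual token among the
# language model's predictions and bucket it the way GLTR colour-codes words.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def rank_buckets(text):
    """Return a GLTR-style colour bucket for each token in `text`."""
    ids = tokenizer.encode(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(ids).logits  # shape: (1, seq_len, vocab_size)
    buckets = []
    for pos in range(ids.shape[1] - 1):
        # Rank of the actual next token among all of the model's predictions.
        next_id = ids[0, pos + 1].item()
        order = torch.argsort(logits[0, pos], descending=True)
        rank = (order == next_id).nonzero().item() + 1
        if rank <= 10:
            buckets.append("green")    # top 10: highly predictable
        elif rank <= 100:
            buckets.append("yellow")   # top 100
        elif rank <= 1000:
            buckets.append("red")      # top 1,000
        else:
            buckets.append("violet")   # outside the top 1,000: surprising
    return buckets

print(rank_buckets("The sun rose over the quiet city."))
```

Counting how many tokens fall outside the green bucket gives a crude fingerprint: machine-generated text tends to stay overwhelmingly green, while human prose strays more often into the rarer buckets.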
But with the release of GPT-4, developers and early adopters are testing whether it can generate more human-like output for novels, academic texts, and screenplays, works that readers may one day prefer over human-made ones.
Professionals tasked with quality control and plagiarism checks in the education and publishing sectors will also have to study the subtler differences between AI-written and human-written text to flag imposters—at least until the classifiers catch up.
For now, authors must ask themselves whether they can write the stories that ChatGPT cannot.
*Disclaimer: AI-powered chatbots are prone to a phenomenon known as “hallucination,” in which they generate logical-sounding yet completely false answers. For this reason, a response generated by an AI chatbot cannot be taken at face value. This report was researched using the February and March versions of ChatGPT.