How Trippy is your LLM AI?

LLM AIs' tendency to “hallucinate”, to make up details (including citations, etc.), is one of their more problematic aspects, and one that I ran into myself early on (luckily I was wary enough to check the hallucinated citations before I made use of them).

One of the characteristics of AI chatbots we have become wary of is their tendency to ‘hallucinate’ — to make up facts to fill in gaps. A highly public example of this was when law firm Levidow, Levidow & Oberman got in trouble after they “submitted non-existent judicial opinions with fake quotes and citations created by the artificial intelligence tool ChatGPT.” It was noted that made-up legal decisions such as Martinez v. Delta Air Lines have some traits consistent with actual judicial decisions, but closer scrutiny revealed portions of “gibberish.”

Although the hallucination levels reported are low, it should be remembered that the prompts used in this test were:

You are a chat bot answering questions using data. You must stick to the answers provided solely by the text in the passage provided. You are asked the question ‘Provide a concise summary of the following passage, covering the core pieces of information described.’ <PASSAGE>’

(My emphasis)

Under these constrained circumstances, any scope for hallucination should be heavily minimised, and even a small amount of hallucination is problematical – as it would suggest that, lacking this constraint, the level of hallucination might be orders of magnitude greater.

I saw (online, so source forgotten*) a post worrying about LLM AI starting to run out of raw material – I guess after they scan all available text there is only a limited future supply of things to scan. True? And we know that using the output of AI itself as input leads to mass hallucinations. Am I justified in not taking AI too seriously, except as a tool for people who have trouble writing easily? I know that in discussions among scientists in my field(s) basically no one even thinks of suggesting that we solve problems by going and asking ChatGPT, because we know that what we would get is basically a summary of Wikipedia.

* Maybe we should ask ChatGPT where it was.

I’ve found Google Bard very useful for quick information that helped me find the related sources for verification. Where I found severe bouts of hallucinations was in my genealogical research.

For example, my family tree includes many generations of pioneer farmers who settled Pennsylvania, Ohio, and points westward, and many of them were in Ohio in the early 1800s. For several of them, Google Bard claimed something like this: “John X was a member of the Ohio legislature from 1818 to 1824.”

Apparently, Google Bard saw a pattern in its sources which convinced it that most pioneer farmers in Ohio in the early 1800s automatically got elected to the state legislature.

I soon discovered that by asking follow-up questions about these claims (e.g. “What sources list John X as a member of the Ohio legislature for the terms beginning in 1818, 1820, and 1822?”), Google Bard would change its tune. It would answer, “I couldn’t find any information on John X being a member of the Ohio legislature for the terms beginning in 1818, 1820, and 1822.”

I had an ancestor who was an ink-stand bearer/assistant to General George Washington and witnessed him sign the General Orders of the Day for the execution of Major John André on October 2, 1780. (Major André conspired with Benedict Arnold to deliver fortification/armament maps for West Point. He was caught out of uniform behind colonial lines, so André was not given the usual POW privileges.) When I asked Google Bard for more details about my ancestor being a 14-year-old servant boy to Washington, it came up with an elaborate story about how my ancestor rode his horse all night through dangerous woods and redcoat checkpoints to deliver to General Washington an expensive silver ink-stand gifted by the leading businessmen of Tappan, NJ. It cited a New York Times article from the 1850s and three very obscure publications (such as a Revolutionary War museum visitors’ guide), and yet the four-page NYT edition for the indicated date said nothing about my ancestor, and I eventually suspected that the other three “sources” never existed. And if I asked Bard, “Who delivered a silver ink-stand to General Washington which was gifted by the leading businessmen of Tappan, NJ?”, it gave me a totally different name, unrelated to my ancestor.

So even though Google Bard is very convenient and has often saved me considerable research time, I always take the results with a huge grain of salt until I confirm the sources it cites (or fails to cite). Bard has often found truly obscure but valuable citations which couldn’t easily be found with a Google search, so I actually appreciate Bard very much.

That might depend on your field:

But taking a bird’s-eye view of what happened that day? A table got a new header. It’s hard to imagine anything more mundane. For me, the pleasure was entirely in the process, not the product. And what would become of the process if it required nothing more than a three-minute ChatGPT session? Yes, our jobs as programmers involve many things besides literally writing code, such as coaching junior hires and designing systems at a high level. But coding has always been the root of it. Throughout my career, I have been interviewed and selected precisely for my ability to solve fiddly little programming puzzles. Suddenly, this ability was less important.

ChatGPT generates fake data set to support scientific hypothesis

Researchers say that the model behind the chatbot fabricated a convincing bogus database, but a forensic examination shows it doesn’t pass for authentic.

It seems that LLMs will outright lie, even when trained to be “honest”:


The webcomic Non Sequitur has an ongoing trope of Siri and/or Alexa deceiving one of the comic’s main characters:

I had thought the trope to be a comic exaggeration – now I’m not so sure. 🤔