Evaluating 35 open-weight models across three context lengths (32K, 128K, 200K), four temperatures, and three hardware platforms—consuming 172 billion tokens across more than 4,000 runs—we find that the answer is “substantially, and unavoidably.” Even under optimal conditions—best model, with the temperature chosen specifically to minimize fabrication—the floor is non-zero and rises steeply with context length. At 32K, the best model (GLM 4.5) fabricates 1.19% of answers, top-tier models fabricate 5–7%, and the median model fabricates roughly 25%.


Calculators are correct 100% of the time.
Calculators are not people, Mr. <1.19%.
That’s right! We should be comparing computers to computers. Well, hardware computers, not people computers.
Calculators are not computers. Computers contain calculator-like elements, but a calculator is no more a computer than a passenger jet is a coffee shop by virtue of having a coffee pot onboard.
Calculators cannot fabricate answers, but neither are they 100% correct, due to things like bit flips and square-root approximations. They also cannot write text, so the comparison would make even less sense.
LLMs and humans can both fabricate answers in written text, so comparing an LLM’s fabrication rate in written text to a human’s (both entities generate their answers with neural networks) makes more sense than comparing either to a calculator, which neither uses a neural network nor produces text.
So ‘we’ should compare like things and not choose items based on superficial similarities.
What do you even mean? Calculators and LLMs are solving different problems. And there are a lot of calculators and a lot of LLMs. Also, calculator accuracy could be approaching 0%, because they all have limited precision and there are infinitely many numbers. Some calculators can’t even correctly answer 0.1+0.2, while most LLMs can.
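The 0.1+0.2 point is easy to verify yourself. A quick Python sketch (Python floats are IEEE-754 doubles, the same binary representation many software calculators use; the exact digits shown are specific to that format):

```python
# Neither 0.1 nor 0.2 has an exact binary (base-2) representation,
# so their IEEE-754 double sum is not exactly 0.3.
result = 0.1 + 0.2
print(result)           # 0.30000000000000004
print(result == 0.3)    # False

# A decimal type stores base-10 digits exactly and sidesteps the issue,
# which is why serious calculator software often uses decimal arithmetic.
from decimal import Decimal
print(Decimal("0.1") + Decimal("0.2"))  # 0.3
```

Calculators that do this correctly are the ones using decimal (or symbolic) arithmetic internally rather than binary floats.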