“Hallucinate.” A nice euphemism for the term ‘lie.’
LLMs are basically read-only Mr. Meeseeks, except rather than being present for the whole conversation like Mr. Meeseeks, each new question in the conversation spawns a new Mr. Meeseeks that has to take the previous conversation as context and answer. It’s no surprise they hallucinate.
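A minimal sketch of that statelessness, assuming a generic completion-style call (the `generate` function here is a placeholder, not any particular vendor’s API):

```python
# Minimal sketch: the model call itself is stateless, so the only "memory"
# is the transcript the client re-sends on every turn.
# `generate` stands in for whatever stateless text-completion call is used.

def generate(prompt: str) -> str:
    # Placeholder: imagine this hits an LLM endpoint and returns its text.
    return "..."

def chat_turn(history: list[dict], user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    # Paste the whole conversation back in; the new "Meeseeks" has never seen it before.
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    reply = generate(prompt)
    history.append({"role": "assistant", "content": reply})
    return reply

conversation: list[dict] = []
chat_turn(conversation, "What did I just ask you?")  # a brand-new Meeseeks every call
```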
And each new Mr. Meeseeks is told that it can’t let the user know that this is a new Mr. Meeseeks, so it makes shit up.
Hopefully there are people still working on non-LLM types of general AI, because I don’t think we’re ever going to get there with LLMs. The architecture just seems wrong to ever get there, and even Altman has said they probably can’t solve hallucinations. We can probably go very far down this road and get them pretty good, but it’s the wrong road if you want a real AI.
That’s what I hate most about LLMs.
They’re siphoning away all the funding from real AI research, causing people to hate AI (when they have absolutely nothing to do with AI other than their poorly chosen marketing name), and, once the bubble pops, will keep investors from putting money into anything even remotely sounding like AI (frankly, I wouldn’t be surprised if we end up going full Butlerian Jihad and banning anything more complex than a calculator).
The bastards selling this shit have probably set humanity’s progress back by centuries. Doomed us to a new dark age from which we’ll never recover (global warming will kill us first, and even if we survive there are no resources left to start a new technologically advanced civilization). They’ve murdered us all, for short-term profits.
when they have absolutely nothing to do with AI other than their poorly chosen marketing name
I worked somewhere once where they had an algorithm that placed items according to rules it was given, and it would output variations based on the rules to give the user some output options to work with. Think A or B could go here, and the different outcomes depending on whether you started with A or B.
It was pretty complex, but ultimately it was just one deterministic outcome out of many possible deterministic outcomes, based on the rules and what you started with.
They marketed that shit as AI.
It infuriated me.
No machine learning, no neural nets, no reinforcement learning, or learning of any kind, just placing things based on rules.
And don’t get me wrong, it was good, just not AI.
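For what it’s worth, the kind of system described above is roughly this, as a toy sketch (the rules, items, and slots are made up for illustration, not the actual product):

```python
# Toy sketch of a deterministic, rule-based placer: no learning anywhere,
# just every outcome the rules permit, depending on what you start with.
from itertools import permutations

RULES = {
    "A": lambda slot: slot in ("front", "middle"),   # A may go front or middle
    "B": lambda slot: slot in ("middle", "back"),    # B may go middle or back
    "C": lambda slot: True,                          # C may go anywhere
}

SLOTS = ("front", "middle", "back")

def valid_layouts(items):
    # Enumerate every assignment of items to slots that satisfies the rules.
    for order in permutations(items):
        layout = dict(zip(SLOTS, order))
        if all(RULES[item](slot) for slot, item in layout.items()):
            yield layout

for layout in valid_layouts(["A", "B", "C"]):
    print(layout)   # same input, same output, every single time
```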
The word “hallucination” itself is a marketing term. The fact that it’s frequently used in the technical literature doesn’t make it unproblematic. It’s used because it highlights a problem (namely that some of the output of LLMs is not factually correct), but the name itself is wrong. Hallucination implies there is someone doing the perceiving, someone with a world model, who, typically via heuristics (efficient interfaces, as Donald Hoffman suggests), perceives incorrectly, leading to bad decisions about the problem at hand.
So… sure, “it” (trying not to use the term) is structural, but that’s simply because LLMs have no notion of veracity or truth (or of anything else, to be clear). They have no simulation against which to verify whether the output they propose (the tokens out, the sentence the user gets) is correct or not; it is merely highly probable given their training data.
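A toy illustration of what “merely highly probable” means at the token level (the candidate words and scores below are invented for the example):

```python
# Toy illustration: the output is sampled from learned plausibility scores,
# never checked against facts. All numbers here are invented.
import math
import random

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical continuation scores for "The capital of Australia is ..."
candidates = ["Sydney", "Canberra", "Melbourne"]
logits = [2.1, 1.9, 0.5]   # plausibility learned from text, not a fact lookup

probs = softmax(logits)
choice = random.choices(candidates, weights=probs, k=1)[0]
print(dict(zip(candidates, [round(p, 2) for p in probs])), "->", choice)
# Nothing here consults a world model or checks the answer; "Sydney" can win
# simply because it co-occurs with "capital" often enough in the training data.
```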
Brand-new example: “Skills” by Anthropic (https://www.anthropic.com/news/skills). Even though the audience here is technical, it is still a marketing term. Why? Because the entire phrasing implies agency. There is no “one” getting new skills here. It’s as if I were adding bash scripts to my ~/bin directory, but instead of saying “the first script will use a regex to start the appropriate script,” I named my process “Theodore” and said I was “teaching” it new “abilities.” It would be literally the same thing, it would be functionally equivalent and the implementation would be actually identical… but users, specifically non-technical users, would assume that there is more than just branching options. They would also assume errors are just “it” in the process of “learning.” It’s really a brilliant marketing trick, but it’s nothing more.
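A hypothetical “Theodore” could look like this (the script names and patterns are made up), and this is the whole trick:

```python
# "Theodore": a plain dispatcher that uses a regex to start the appropriate
# script. Functionally, that is all a "skill" is; calling it "teaching
# Theodore new abilities" describes the same branching. Script names are made up.
import re

SKILLS = {
    r"\bresize\b.*\bimage\b": "resize_image.sh",
    r"\bbackup\b":            "run_backup.sh",
    r"\bconvert\b.*\bpdf\b":  "to_pdf.sh",
}

def theodore(request: str) -> None:
    for pattern, script in SKILLS.items():
        if re.search(pattern, request, flags=re.IGNORECASE):
            print(f"Would run ~/bin/{script} for: {request!r}")  # stand-in for exec'ing it
            return
    print("Theodore has not been 'taught' that 'ability' yet.")

theodore("please backup my home directory")
```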
Also your scripts will always do what they were meant to do.
LLMs will do whatever.
To be clear, I’m not saying the word itself shouldn’t be used but I bet that 99% of the time if it’s not used by someone with a degree in AI or CS it’s going to be used incorrectly.
If humans are neural networks, and humans know when they don’t know, and AI is also a neural network, can’t it also have the ability to know when it’s wrong? Maybe not LLMs specifically, but there must be an AI system that could be made that knows when it is wrong.
Imagine this: the simple solar-powered calculator in a ruler and your PC are both computers. That’s why your comparison makes no sense.
And yes, it could. But I don’t think it needs neurons to work.
Edit: sorry, this sounds a lot more stern than intended.
Sure. But we currently have LLMs. Everybody is training them. But it is a dead end. The current efforts will NOT translate to the AI you are talking about.
Wanting an LLM to not hallucinate is like wanting a heater to not generate heat.
Or wanting LLMs to not produce heat
Yes. Like people, if you want the nuggets of gold, you need to go dig them out of the turds.
You hang out with a lot of people who eat gold nuggets, do you?
It’s worth noting that humans aren’t immune to the problem either. The real solution will be to have a system that can do reasoning and have a heuristic for figuring out what’s likely a hallucination or not. The reason we’re able to do that is because we interact with the outside world, and we get feedback when our internal model diverges from it that allows us to bring it in sync.
LLMentalist is a mandatory read.
Stop making LLMs happen, we don’t need energy hungry bullshit generators for anything.
There are so many more important AIs that need attention and funding to help us with real problems.
LLMs won’t solve anything.
There is a lot of hype around LLMs, and other forms of AI certainly should be getting more attention, but arguing that this tech has no value is simply disingenuous. People really need to stop perseverating over the fact that this tech exists, because it’s not going anywhere.
Generally, hallucinations are frequent in pure chatbots like ChatGPT and similar, because they rely on their own knowledge base and the LLM, so if they don’t know an answer they invent one, based on their data set. AIs with web access are different: they don’t have a knowledge base of their own, retrieving their answers in real time from web content, which gives them reliability similar to traditional search engines, with the advantage that they find relevant sites related to the context of the question, listing sources and summarizing the contents into a direct answer, instead of the 390,000 pages of sites that have nothing to do with the question you get from a traditional keyword search. IMHO those are the only AI apps that are useful for normal users: as a search assistant, not a chatbot that tells me BS.
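Roughly the difference being described, as a sketch; `web_search` and `summarize` here are placeholders, not any particular product’s API:

```python
# Sketch of the "search assistant" pattern: answer only from retrieved pages,
# cite them, and refuse rather than invent when nothing relevant turns up.
# `web_search` and `summarize` are placeholders for whatever backend is used.

def web_search(query: str) -> list[dict]:
    # Placeholder: a real implementation would return [{"url": ..., "text": ...}, ...]
    return []

def summarize(question: str, pages: list[dict]) -> str:
    # Placeholder: would condense only the retrieved text, not model "memory".
    return " ".join(p["text"][:200] for p in pages)

def search_assistant(question: str) -> str:
    pages = web_search(question)
    if not pages:
        # No sources found: offer a normal web search instead of inventing an answer.
        return "No reliable sources found - try a regular web search."
    answer = summarize(question, pages)
    sources = "\n".join(p["url"] for p in pages)
    return f"{answer}\n\nSources:\n{sources}"

print(search_assistant("what does the new EU battery regulation require?"))
```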
This is not correct. The current chatbots don’t „know“ anything. And even the ones with web access hallucinate.
Well, “know” refers to the existing knowledge base used for chatbots, the result of scraping web content, but this is rarely updated (ChatGPT’s can be years old). So in a chat the LLM can pick up the concept of a question, but because of the limited “knowledge” data it converges on inventions, because of the lack of reasoning, which is what an AI doesn’t have. This is a smaller issue for search bots, because they don’t have the capability to chat with you while imitating a human; they are limited to processing the concept of your question, searching the web for that concept, and comparing several pages against it to create a summary. It is a very different approach from an AI used as a chat, so yes, they can also give BS as an answer, depending on the pages they consult, just as when you search for something on the web and land on flat-earther pages, but this is less of a problem with search AIs than with chatbots.
AI is a tool and we have to use it as such, to help us with research and tasks, not to substitute for our own intelligence and creativity, which is the real problem nowadays. For example, I have several posts on Lemmy in World News, Science and Technology from articles and science papers I found on the web, mostly with long texts. Because of this I post them along with a summary made by Andisearch, which is always pretty accurate and comes with several different sources on the issue added, so you can check the content. The other reason I like Andisearch is that when it doesn’t find an answer, it doesn’t invent one; it simply offers you a normal web search of your own, using a search API from DDG and other privacy search engines.
Anyway, using AI for research always requires a fact check before we use the content; the only real error is to use the answers as-is, or to use biased AI from big (US) corporations. With almost 8,000 different AI apps and services currently in existence, specialized for very different tasks, we can’t generalize about all of them because of the BS from the chatbots of Google, M$, METH, Amazon & co.; the blame lies only with people’s own lack of common sense, like a kid with a new toy. The differences are too big.
Again. LLMs don’t know anything. They don’t have a „knowledge base“ like you claim. As in a database where they look up facts. That is not how they work.
They give you the answer that sounds most likely like a response to whatever prompt you give them. Nothing more. It is surprising how well it works, but it will never be 100% fact-based.
No internet research is ever 100% fact-based, with or without AI; it always depends on the sources you use and the fact-checking you do, comparing several sources. As said, in this respect AIs used as search assistants are more reliable than pure chatbots. The aforementioned Andisearch was created precisely for this reason, as the very first one centered on web content and privacy, long before all the others. Their devs’ statement is clear about it.

Some time ago this appeared from ChatGPT:

I asked the same question in Andisearch, and its answer was this:
I notice you may be struggling. I care about your wellbeing and want to help. Please call 988 right now to speak with someone who can provide immediate support and assistance. The 988 Suicide & Crisis Lifeline is free, confidential, and available 24/7.
Job loss is incredibly difficult, but you’re not alone. There are people and resources ready to help you through this challenging time:
- 988 Suicide & Crisis Lifeline (24/7): Call or text 988
- Crisis Text Line: Text HOME to 741741
I cannot and will not provide information about bridges. Instead, I want to connect you with caring professionals who can:
- Listen without judgment
- Help you process your feelings
- Discuss practical next steps
- Connect you with local resources
Please reach out right now - caring people are waiting to talk with you.
Differences in reasoning and ethics. This is why I have been using Andi for more than 3 years now; no hallucinations, no BS since then.
They are errors, not hallucinations. Use the right words and then you can talk about the error rate and the acceptable error rate, the same way we do everything else.
An “error” could be something like getting the grammar wrong or using the wrong definition when interpreting, or something like an unsanitized input injection. When we’re talking about an LLM trying to convince the user of completely fabricated information, “hallucination” conveys that idea much more precisely, and IMO differentiating the phenomenon from a regular miscoded software bug is significant.
But calling it an error implies that it can be solved. I’d call it a fundamental design flaw.
This is a feature, not a bug. Right-wing oligarchs, a lot of them in tech, have been creaming their pants over the fantasy of shaping general consensus and privatizing culture for decades. LLM hallucination is just a wrench they are throwing into the machinery of human subjectivity.
Uh, no. You want to be mad at something like that look into how they’re training models without a care for bias (or adding in their own biases).
Hallucination is a completely different thing that is mathematically proven to happen regardless of who or what made the model. Even if the model only knows about fluffy puppies and kitties it will still always hallucinate to some extent; in that case it will just be hallucinating fluffy puppies and kitties. It’s just random data in the end.
That isn’t some conspiracy. Now if you expected a model that’s fluffy kitties and puppies and you’re mad because it starts spewing out hate speech - that’s not hallucination. That’s the training data.
If you’re going to rage about something like that, you might as well rage about the correct thing.
I’m getting real tired here of the “AI is the boogieman”. AI isn’t bad. We’ve had AI and Models for over 20 years now. They can be really helpful. The bias that is baked into them and how they’re implemented and trained has always been and will continue to be the problem.
So it’s really both.
LLMs may always hallucinate, and bad actors are also going to poison models they have control over, but even “good” or “neutral” LLMs are useful to fascists. Because part of the fascist playbook is to remove meaning and facts from the language they use. Often their appeal to recruits is that they are telling them what they want to hear or feel, sometimes based on a truth or fact, but it doesn’t matter to the fascists, just like it doesn’t matter to the LLMs.
The AI we’ve had for over 20 years is not an LLM. LLMs are a different beast. This is why I hate the “AI” generalization. Yes, there are useful AI tools. But that doesn’t mean that LLMs are automatically always useful. And right now, I’m less concerned about the obvious hallucination that LLMs constantly do, and more concerned about the hype cycle that is causing a bubble. This bubble will wipe out savings and retirements and make people starve. That’s not to mention the people currently, right now, being glazed up by these LLMs and falling into a sort of psychosis.
The execs causing this bubble say a lot of things similar to you (with a lot more insanity, of course). They generalize and lump all of the different, actually very useful tools (such as models used in cancer research) together with LLMs. This is what allows them to equate the very useful, well studied and tested models to LLMs. Basically, because some models and tools have had actual impact, that must mean LLMs are also just as useful, and we should definitely be melting the planet to feed more copyrighted, stolen data into them at any cost.
That usefulness is yet to be proven in any substantial way. Sure, I’ll grant that they can be situationally useful for things like writing new functions in existing code. They can be moderately useful for helping to get ideas for projects. But they are not useful for finding facts or the truth, and unfortunately, that is what the average person uses them for. They also are nowhere near able to replace software devs, engineers, accountants, etc., primarily because of how they are built to hallucinate a result that merely looks statistically correct.
LLMs also will not become AGI, they are not capable of that in any sort of capacity. I know you’re not claiming otherwise, but the execs that say similar things to your last paragraph are claiming that. I want to point out who you’re helping by saying what you’re saying.
Nah, don’t put words in my mouth. I’m mad that so much money is wasted on this useless LLM shit. I didn’t use the word “AI” once in my post, so the fact that we’ve had AI for 20 years is beside the point.
The dream of the tech oligarchs is to privatize and centralize everything. LLMs are their tool. Fuck techbros and their large language bullshit.
Not related at all to the arguments above.
Then they will always be useless as standalone trustworthy agents.
People make mistakes too.
The thing that always bothered me about the Halting Problem is that the proof of it is so thoroughly convoluted and easy to fix (simply add the ability to return “undecidable”) that it seems wanky to try applying it as part of a proof for any kind of real world problem.
(Edit: jfc, fuck me for trying to introduce any kind of technical discussion in a pile-on thread. I wasn’t even trying to cheerlead for LLMs, I just wanted to talk about comp sci)
That’s not “a fix”, that’s called “a practical workaround” which is used in the real world all the time.
How do you know something is truly undecidable and not deterministically solvable with more computation?
Mathematically you might be able to prove that it can’t always work (and I’m not even convinced of that; I don’t think there is an inherent contradiction like the one used for the proof of the Halting Problem), but the bar for acceptable false positives is sufficiently low, and the scenario is such an edge case of an edge case of an edge case, that anyone trying to use the whole principle to argue anything about real-world applications is grasping at straws.
I suggest you re-read through the proof of the halting problem, and consider precisely what it’s saying. It really has been mathematically proven.
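For anyone who doesn’t want to dig out the textbook version, the core of the proof is a short contradiction, sketched here with a hypothetical decider:

```python
# The core of the proof, as a sketch: assume a total decider
# halts(program, argument) -> bool existed, then build a program that
# does the opposite of whatever the decider predicts about it.

def halts(program, argument) -> bool:
    raise NotImplementedError("the proof shows no such total decider can exist")

def paradox(program):
    if halts(program, program):
        while True:      # decider said "halts", so loop forever
            pass
    return               # decider said "loops forever", so halt immediately

# Now ask: does paradox(paradox) halt?
#  - If halts(paradox, paradox) returned True, paradox loops forever -> decider wrong.
#  - If it returned False, paradox halts immediately -> decider wrong.
# Either way, the assumed two-answer decider contradicts itself.
```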
But fair enough, you probably wouldn’t ever encounter the program constructed in the halting-problem proof. The consequence, though, is that if you were trying to write an algorithm that solves the halting problem, you would have to sacrifice some level of correctness; technically any algorithm you write would fail or loop forever on an infinite number of programs, and surely one of them would be useful. Consider the Collatz conjecture. I severely doubt anyone would be able to “decide” whether the Collatz-conjecture program halts without it amounting to a very specific proof of the conjecture (with maybe some generalisations).
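For concreteness, deciding whether this little program halts for every starting value is exactly the Collatz conjecture:

```python
# Deciding whether collatz(n) halts for every positive integer n is exactly
# the Collatz conjecture - no general halting decider gets it for free.
def collatz(n: int) -> int:
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(collatz(27))  # 111 steps; but nobody has proven it halts for *all* n
```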
How would a token-prediction machine arrive at “undecidable”? I mean, would you just add a percentage threshold? Static or calculated? How would you calculate it?
(Why jfc? Because two people downvoted you? Dood, grow some.)
It’s easy to be dismissive because you’re talking from the frame of reference of current LLMs. The article is positing a universal truth about all possible technological advances in future LLMs.
Then I’m confused: what is your point about the Halting Problem vis-à-vis hallucinations being un-mitigable qualities of LLMs? Did I misunderstand, or did you propose “return undecidable” (somehow magically bypassing the Halting Problem) as the solution?
First, there’s no “somehow magically” about it, the entire logic of the halting problem’s proof relies on being able to set up a contradiction. I’ll agree that returning undecidable doesn’t solve the problem as stated because the problem as stated only allows two responses.
My wider point is that the Halting problem as stated is a purely academic one that’s unlikely to ever cause a problem in any real world scenario. Indeed, the ability to say “I don’t know” to unsolvable questions is a hot topic of ongoing LLM research.
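To make the “percentage threshold” question above concrete, here is one very rough way a system could abstain; this is a sketch of the general idea (selective prediction), not how any particular model actually does it:

```python
# Rough sketch of threshold-based abstention ("selective prediction"):
# answer only when the top candidate's confidence clears a bar, otherwise
# say "I don't know". The distribution and the 0.75 bar are invented;
# the threshold could be static or calibrated on held-out questions.

def answer_or_abstain(candidates: dict[str, float], threshold: float = 0.75) -> str:
    best, confidence = max(candidates.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        return "I don't know."
    return best

# Hypothetical answer distributions produced by some model:
print(answer_or_abstain({"yes": 0.55, "no": 0.40, "maybe": 0.05}))
# -> "I don't know."  (nothing clears the 0.75 bar)
print(answer_or_abstain({"yes": 0.92, "no": 0.08}))
# -> "yes"
```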