Remember that LLMs were trained on the internet, a place full of confidently incorrect postings. Using a coding-focused LLM helps some, but just like with everything else, you get better results if you only ask for small, specific pieces at a time. The longer the reply, the more likely it is to drift into nonsense (since it’s all just probability anyway).
I’ve gotten excellent starting code that I could tweak into a working program. But as things got more complex, the LLM would try things even that guy who gets downvoted a lot on Stack Overflow wouldn’t dare suggest.
One of the things that really gets AI fanboys going is when I ask them to feed their AI-generated code back into the AI and ask it for potential issues or mistakes. Without fail it points out very obvious problems, and sometimes some less obvious ones as well. If your AI coder is so good, why does it know it fucked up?
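If you want to try the trick yourself, it’s one API call. Here’s a minimal sketch assuming the OpenAI Python SDK’s chat completions interface; the model name and prompts are placeholders, not recommendations, and any chat-style API would do the same job.

```python
# Feed previously generated code back in and ask the same model what's wrong with it.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def review_own_code(code: str, model: str = "gpt-4o-mini") -> str:
    """Ask the model to list bugs and potential issues in code it (or anything) produced."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a strict code reviewer."},
            {
                "role": "user",
                "content": (
                    "You wrote this code earlier. List any bugs, potential issues, "
                    "or mistakes you can find:\n\n" + code
                ),
            },
        ],
    )
    return response.choices[0].message.content
```

Nothing clever is going on there: the same model that produced the code gets a second pass with a different prompt, and suddenly it can “see” the bugs it just wrote.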
This is basically what these new “agent” modes do: keep feeding the output back in on itself until it settles, often using an external tool, like building the project, to decide when it’s done. But I’ve seen this end up in loops a lot. If all of the training data contained the same mistake (or the resulting network always produces that mistake), it can’t fix it. It will just say “oh, I’ve made a mistake, let me fix that” over and over again as the same obvious error pops out.
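Mechanically, that agent loop isn’t much code. Here’s a rough sketch with a hypothetical `ask_llm()` callable standing in for whatever model you’re driving, and the “build the project” step faked with a plain syntax check; the iteration cap is the only thing keeping it from spinning forever on that one baked-in mistake.

```python
# Sketch of a feed-it-back-to-itself "agent" loop with an external check and a hard cap.
import os
import subprocess
import tempfile

def builds(code: str) -> tuple[bool, str]:
    """Stand-in for building the project: write the code out and syntax-check it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "main.py")
        with open(path, "w") as f:
            f.write(code)
        result = subprocess.run(
            ["python", "-m", "py_compile", path], capture_output=True, text=True
        )
        return result.returncode == 0, result.stderr

def agent_loop(ask_llm, task: str, max_iterations: int = 5) -> str:
    """Keep feeding the output back in until the check passes or we give up.

    ask_llm is a hypothetical callable: prompt string in, generated code out.
    Without max_iterations this can loop forever on the same repeated mistake.
    """
    code = ask_llm(f"Write a Python program that does the following: {task}")
    for _ in range(max_iterations):
        ok, errors = builds(code)
        if ok:
            return code
        # Same thing fed back in on itself, with the build errors attached.
        code = ask_llm(
            f"This code fails to build with:\n{errors}\n"
            f"Fix it and return only the corrected code:\n\n{code}"
        )
    raise RuntimeError("Gave up: the model kept 'fixing' the same mistake.")
```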
I’ve tried a few of the newer local models with the visible chain of thought and “thinking” mode. For some things it may work, but for others it’s a hilarious train wreck of second-guessing, loops, and crashes into garbage output. They call it thinking mode, but what it’s really doing is stacking the odds to improve the probability of landing on a “right” answer. LLMs are the modern ELIZA: convincing on the surface, but dig too deep and you see it break. At least Infocom’s game parser would break gracefully.