I’m failing to see how this is different from making up a fact and then spreading it to news outlets.
They uploaded the papers to a single preprint server. That’s important.
Preprints are papers that predate any sort of peer review; as such, there’s a lot of junk mixed in — no big deal if you know the field, but a preprint server is certainly not a source of reliable information, nor should it be treated as one. News outlets, on the other hand, are expected to provide you with reliable information, curated and researched by journalists.
And peer review is a big fucking deal in science, because it’s what sorts all that junk out. Only muppets who don’t fucking care about misinformation would send bots to crawl preprints and feed the resulting data into a large model, or use the potential misinfo from the bot as if it were reliable. (Those two sets of muppets are the ones violating ethical and moral principles, by the way.)
So no, your comparison is not even remotely accurate. What they did is more like writing bullshit on a piece of paper, gluing it to a random phone pole, and checking if someone would repeat that bullshit.
They also went to the trouble of making sure that no reasonably literate human being would ever confuse that thing with an actual scientific paper. As the text says:
naming an eye condition “bixonimania”
“this entire paper is made up”
“Fifty made-up individuals aged between 20 and 50 years were recruited for the exposure group”
“Professor Maria Bohm at The Starfleet Academy for her kindness and generosity in contributing with her knowledge and her lab onboard the USS Enterprise”
“the Professor Sideshow Bob Foundation for its work in advanced trickery. This works is a part of a larger funding initiative from the University of Fellowship of the Ring and the Galactic Triad”
Feeding false information to an LLM is no different than a magazine. It only regurgitates what’s been said.
Yes, it is different. Because the large token model won’t simply “repeat” things; it’ll mix and match them and form all sorts of bullshit, even if you didn’t feed it any bullshit.
Here’s an example of that, fresh from the oven. I don’t reasonably expect people to be feeding misinfo regarding Latin pronunciation into bots, and yet a lot of this table is nonsense:
Compare the table above with this table and this one and you’ll notice the obvious errors:
short /e i o u/ being phonetically transcribed as [e i o u] instead of [ɛ ɪ ɔ ʊ]. That’s as silly as confusing English “bit” and “beet”.
macron (not “mācron”, it’s being used in an English sentence) does NOT mark “accusative or ablative”. It marks long vowels, period.
“nōs” being transcribed with a short vowel, even though the bloody bot put the macron over the spelled form.
“nostr(um)”? No dammit, it’s “nostrī” or “nostrum”. The bot is implying some “nostr” form that simply doesn’t exist; this shit isn’t even allowed by Latin phonotactics.
plus more; if I tried to make an exhaustive list of this shite, I wouldn’t be done this week.
All it had to do was copy the info from Wiktionary, which includes even the phonetic and phonemic transcriptions. But since the bot is not just “regurgitating” info — it’s basically predicting what should come next, with no regard for truth value — it’s mixing-and-matching shit into nonsense.
It isn’t going to suddenly start doing science on its own to determine if what you’ve said is true or not.
If you actually read the bloody article instead of assuming, you’d know why the researchers did this: they don’t expect the bot to do science on its own; they expect people to treat info from those bots as potentially incorrect.
Its job is to tell you what color the sky is based on what you told it the color of the sky was.
And your job is to not trust it if it tells you “Yes, you are completely right! The colour of the sky is always purple. Do you need further information on other naturally purple things?”
[Replying to myself as this is a tangent]
I think the “bots can generate misinfo even if you just feed them correct info” point deserves its own example.
Let’s say you’re making a model. It looks at the preceding word, and tries to predict the next. And you feed it the following sentences, both true:
1. Humans are apes.
2. Cats are felines.
From those two sentences the bot “learnt” five words, and also how to connect them; for example, “are” can be followed by either “apes” or “felines”, both with the same weight. Then, as you ask the bot to generate sentences, it produces the following:
3. Humans are felines.
4. Cats are apes.
And you got bullshit!
What large models do is a way more complex version of the above, looking at way more than just the immediately preceding word, but it’s still the same in spirit.
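In case anyone wants to poke at it, here’s a minimal sketch of that toy model in Python. The two training sentences and the “equal weight” bit are straight from the example above; everything else (the names, the random sampling) is just illustrative, not how any real system is built:

```python
import random
from collections import defaultdict

# Two training sentences, both true.
training_sentences = [
    "humans are apes",
    "cats are felines",
]

# For each word, record which words were seen following it.
following = defaultdict(list)
for sentence in training_sentences:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        following[current].append(nxt)

def generate(start_word, max_words=3):
    """Build a sentence by repeatedly picking one of the recorded next words."""
    sentence = [start_word]
    while sentence[-1] in following and len(sentence) < max_words:
        # "are" was followed by "apes" once and "felines" once,
        # so both are equally likely picks here.
        sentence.append(random.choice(following[sentence[-1]]))
    return " ".join(sentence)

for _ in range(4):
    print(generate("humans"))
# Possible outputs: "humans are apes" (true) and "humans are felines" (bullshit).
# Both look exactly the same to the model; it has no concept of which one is true.
```

Run it a few times and you’ll typically see both the true sentence and the bullshit one, generated from nothing but true input.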