Wikimedia Foundation's plans to introduce AI-generated article summaries to Wikipedia

ɯᴉuoʇuɐ@lemmy.dbzer0.com · edit-2 10 months ago

Wikimedia Foundation's plans to introduce AI-generated article summaries to Wikipedia

warm@kbin.earth · 10 months ago

If they add AI they better not ask me for any money ever again.

6nk06@sh.itjust.works · 10 months ago

Or moderators. Why would they need those people when the AI can fix everything for free and even improve articles?

Monument@lemmy.sdf.org · 10 months ago

Right! I can’t wait to hear about all the new historical events!

I wonder if anyone witnessed the burning of the Library of Alexandria and felt a similar sense of despair for the future of knowledge.

arrow74@lemm.ee · 10 months ago

You can download a copy of Wikipedia in full today before they turn it to shit.

Unlike the people in Alexandria, you can spend less than $20 and 20 minutes to download the whole thing and preserve it yourself.

Monument@lemmy.sdf.org · 10 months ago

You are a light in the darkness.

SCmSTR@lemmy.blahaj.zone · 10 months ago

Holy shit kbin is still around??

warm@kbin.earth · 10 months ago

Kbin.earth is on mbin, I think kbin is dead.

SCmSTR@lemmy.blahaj.zone · 10 months ago

I am so sad. I really liked what kbin was trying to do.

warm@kbin.earth · 10 months ago

Mbin is a fork and continuation of /kbin, but community-focused.

Kbin was destined to fail without opening up to community collaboration. I greatly preferred it over lemmy. So I will stick with Mbin now and Kbin.earth has been a small but nice Mbin instance.

Cheradenine@sh.itjust.works · 10 months ago

Wikipedia articles already have lead-in summaries.

Fuck right off with this

A future experiment will study ways of editing and adjusting this content.

MDCCCLV@lemmy.ca · 10 months ago

A lot of them for the small articles and stubs are written very technically and don’t provide an explanation for complex subjects if you aren’t already familiar with it. Then you have to read 4 subjects down just to figure out the jargon for what they’re saying

Takapapatapaka@lemmy.world · 10 months ago

I agree, having experienced this especially on mathematics pages. But on the other hand, in my experience, the whole article is very technical in those cases: I’m not sure making a summary would help, and I’m not sure you can provide a summary that is both correct and easily understandable in those cases.

catloaf@lemm.ee · 10 months ago

Math articles are the worst. They always jump right into calculus and stuff. I usually have to hope there’s a simple English article for those!

AbouBenAdhem@lemmy.world · edit-2 10 months ago

This is one thing I can see an actual use case for (as an external tool, not as part of WP): Create a summary, not of the article itself, but of the prerequisite background knowledge. And tailored to the reader’s existing knowledge—like, “what do I need to know to understand this article assuming I already know X but not Y or Z”.

Cheradenine@sh.itjust.works · 10 months ago

I’d agree with that, both are problematic.

A lot of stubs should be deleted until they are expanded, they’re often more confusing than knowing nothing at all. I don’t think an LLM summary will help here though.

Reading a few articles deep is not only a pain in the ass, but is going to dissuade those who won’t do it. There’s also the issue that when you do wade in it might link to something that is poorly cited and confusing. Again, I think an LLM is going to make things worse here.

FaceDeer@fedia.io · 10 months ago

A lot of stubs should be deleted until they are expanded

How does one expand a deleted article?

Wikipedia is not intended to be presenting a finished product, it’s an eternal work in progress. A stub is the start of an article. If you delete an article whenever it gets started that seems counterproductive.

BrianTheeBiscuiteer@lemmy.world · 10 months ago

Maybe it’s a result of Wikipedia trying to be more of an “online encyclopedia” vs a digital information hub or learning resource. I don’t think it’s a problem on its own but I do think there should be a simplified version of every article.

chunes@lemmy.world · 10 months ago

There is also already https://simple.wikipedia.org/wiki/Main_Page

doctortofu@reddthat.com · edit-2 10 months ago

Et tu, Wikipedia?

My god, why does every damn piece of text suddenly need to be summarized by AI? It’s completely insane to me. I want to read articles, not their summaries in 3 bullet points. I want to read books, not CliffsNotes; I want to read what people write to me in their emails instead of AI slop. Not everything needs to be a fucking summary!

It seriously feels like the whole damn world is going crazy, which means it’s probably me… :(

Maeve@kbin.earth · 10 months ago

It’s not you.

“It is no measure of health to be well-adjusted to a profoundly sick society.” Krishnamurti

GraniteM@lemmy.world · 10 months ago

FourWaveforms@lemm.ee · 10 months ago

Then skip the AI summary.

liv@lemmy.nz · 10 months ago

For those of us who do skip the AI summaries it’s the equivalent of adding an extra click to everything.

I would support optional AI, but having to physically scroll past random LLM nonsense all the time feels like the internet is being infested by something equally annoying/useless as ads, and we don’t even have a blocker for it.

FourWaveforms@lemm.ee · 10 months ago

I think it would be best if that’s a user setting, like dark mode. It would obviously be a popular setting to adjust. If they don’t do that, there will doubtless be Greasemonkey and other scripts to hide it.

liv@lemmy.nz · 10 months ago

True!

Dr. Moose@lemmy.world · 10 months ago

This ignorance is my biggest pet peeve today. Wikipedia is not targeting you with this but expanding accessibility to people who don’t have the means to digest a complex subject on their lunch break.

TL;DR: check your privilege

JandroDelSol@lemmy.world · 10 months ago

Giving people incorrect information is not an accessibility feature

Dr. Moose@lemmy.world · 10 months ago

RAG on 2 pages of text does not hallucinate anything though. I literally use it every day.

RaoulDook@lemmy.world · 10 months ago

If people use AI to summarize passages of written words to be simpler for those with poor reading skills to be able to more easily comprehend the words, then how are those readers going to improve their poor reading skills?

Dumbing things down with AI isn’t going to make people smarter I bet. This seems like accelerating into Idiocracy

FooBarrington@lemmy.world · 10 months ago

[…] then how are those readers going to improve their poor reading skills?

By becoming interested in improving their poor reading skills. You won’t make people become interested in that by having everything available only in complex language, it’s just going to make them skip over your content. Otherwise there shouldn’t be people with poor reading skills, since complex language is already everywhere in life.

RaoulDook@lemmy.world · 10 months ago

Nope. Reading skills are improved by being challenged by complex language, and the effort required to learn new words to comprehend it. If the reader is interested in the content, they aren’t going to skip it. Dumbing things down only leads to dumbing things down.

For example, look at all the iPad kids who can’t use a computer for shit. Kids who grew up with computers HAD to learn the more complex interface of computers to be able to do the cool things they wanted to do on the computer. Now they don’t because they don’t have to. Therefore if you get everything dumbed down to 5th Grade reading level, that’s where the common denominator will settle. Overcoming that apathy requires a challenge to be a barrier to entry.

FooBarrington@lemmy.world · edit-2 10 months ago

If the reader is interested in the content, they aren’t going to skip it.

But they aren’t interested in the content because of the complexity. You may wish that humans work like you describe, but we literally see that they don’t.

What you can do is provide a simplified summary to make people interested, so they’re willing to engage with the more complex language to get deeper knowledge around the topic.

For example, look at all the iPad kids who can’t use a computer for shit. Kids who grew up with computers HAD to learn the more complex interface of computers to be able to do the cool things they wanted to do on the computer.

You’re underestimating how many people before the iPad generation also can’t use computers because they never developed an interest to engage with the complexity.

vermaterc@lemmy.ml · 10 months ago

Wikipedia is not made to teach people how to read; it is meant to share knowledge. For all I care, they could even make a Wikipedia version in hieroglyphics if that would make the content easier to understand.

RaoulDook@lemmy.world · 10 months ago

Novels are also not made to teach people how to read, but reading them does help the reader practice their reading skills. Beside that point, Wikipedia is not hard to understand in the first place.

FooBarrington@lemmy.world · 10 months ago

Sorry, but that’s absolutely wrong - the complexity of articles can vary wildly. Many are easily understandable, while many others are not understandable without a lot of prerequisite knowledge in the domain (e.g. mathematics stuff).

A Wild Mimic appears!@lemmy.dbzer0.com · 10 months ago

I am not a native speaker, but my knowledge of the English language is better than most people i know, having no issues reading scientific papers and similar complex documents. Some Wikipedia article intros, especially in mathematics, are not comprehensible for anyone but mathematicians, and therefore fail the objective of giving the average person an overview of the material.

It’s fine for me if i am not able to grasp the details of the article because of missing prerequisite knowledge (and i know how to work with integrals and complex numbers!), but the intro should at least not leave me wondering what the article is about.

JackbyDev@programming.dev · 10 months ago

People aren’t reading Wikipedia articles with the intention of getting better at reading.

FourWaveforms@lemm.ee · 10 months ago

Why do you think their reading skills are poor?

Dr. Moose@lemmy.world · 10 months ago

Do you give toddlers post-grad books to read too? This is such an idiotic slippery slope fallacy that it just reeks of white people privilege.

markovs_gun@lemmy.world · 10 months ago

Wikipedia articles are already quite simplified overviews for most topics. I really don’t like the direction of the world where people are reading summaries of summaries and mistaking that for knowledge. The only time I have ever found AI summaries useful is for complex legal documents and low-importance articles where it is clear the author’s main goal was SEO rather than concise and clear information transfer.

ace_garp@lemmy.world · 10 months ago

These LLM-page-summaries need to be contained and linked, completely separately, in something like llm.wikipedia.org or ai.wikipedia.org.

If, at some point in the future, a few LLM hallucinations were uncovered in these summaries, it would cast doubt on the accuracy of all page content in the project.

Keep the generated-summaries visibly distinct from user created content.

coolmojo@lemmy.world · 10 months ago

Is this the same Wikimedia Foundation that was complaining about AI scrapers in April?

AbouBenAdhem@lemmy.world · 10 months ago

IIRC, they weren’t trying to stop them—they were trying to get the scrapers to pull the content in a more efficient format that would reduce the overhead on their web servers.

Lv_InSaNe_vL@lemmy.world · 10 months ago

You can literally just download all of Wikipedia in one go from one URL. They would rather people just do that instead of crawling their entire website because that puts a huge load on their servers.

palordrolap@fedia.io · 10 months ago

Ah, but the clueless code monkeys, script kiddies and C-levels who are responsible for writing the AI companies’ processing code only know how to scrape from someone else’s website. They can’t even ask their (respective) company’s AI for help because it hasn’t been trained yet. (Not that Wikipedia’s content will necessarily help).

They’re not even capable of taking the ZIP file and hosting the contents on localhost to allow the scraper code they got working to operate on something it understands.

So hammer Wikipedia they must, because it’s the limit of their competence.

JackbyDev@programming.dev · edit-2 10 months ago

What’s funny is crawling the site would actually be more difficult and take longer than downloading and reading the archive.

Context for others, Wikipedia is only ~24 GB (compressed and without media or history). https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia

As of 16 October 2024, the size of the current version including all articles compressed is about 24.05 GB without media.
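
For anyone wondering what "reading the archive" could look like, here is a minimal sketch, assuming the dump is a bz2-compressed MediaWiki XML export. The helper name `iter_page_titles` and the tiny in-memory sample are made up for illustration; with a real dump you would pass the path of the downloaded file instead.

```python
import bz2
import io
import xml.etree.ElementTree as ET


def iter_page_titles(source):
    """Yield page titles from a bz2-compressed MediaWiki XML export.

    `source` may be a file path or a binary file object (hypothetical helper).
    """
    with bz2.open(source, "rb") as xml_file:
        # iterparse streams the XML, so the whole dump never sits in memory.
        for _event, elem in ET.iterparse(xml_file, events=("end",)):
            # Tags may carry a namespace prefix like "{...}title"; strip it.
            tag = elem.tag.rsplit("}", 1)[-1]
            if tag == "title":
                yield elem.text
            elif tag == "page":
                elem.clear()  # drop the finished page to keep memory flat


# Miniature stand-in for a dump, built in memory (illustrative data only).
sample_xml = (
    b"<mediawiki>"
    b"<page><title>Big O notation</title></page>"
    b"<page><title>Asymmetric relation</title></page>"
    b"</mediawiki>"
)
sample_dump = io.BytesIO(bz2.compress(sample_xml))

for title in iter_page_titles(sample_dump):
    print(title)
```

No crawling, no rate limits, and it only uses the standard library.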

Matriks404@lemmy.world · edit-2 10 months ago

TIL: Wikipedia uses complex language.

It might just be me, but I find articles written on Wikipedia much easier to read than the shit people sometimes write or say to me. Sometimes it is incomprehensible garbage, or doesn’t make much sense.

barsoap@lemm.ee · edit-2 10 months ago

It really depends on what you’re looking at. The history section of some random town? Absolutely bog-standard prose. I’m probably missing lots of implications as I’m no historian but at least I understand what’s going on. The article on asymmetric relations? Good luck getting your mathematical literacy from wikipedia: all the maths articles require you to already have it, and that’s one of the easier ones. It’s a fucking trivial concept, it has a glaringly obvious example… which is mentioned, even as first example, but by that time most people’s eyes have glazed over. “Asymmetric relations are a generalisation of the idea that if a < b, then it is necessarily false that a > b: If it is true that Bob is taller than Tom, then it is false that Tom is taller than Bob.” Put that in the header.
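
To show how little is going on, here is a rough sketch of the definition as a check on a set of pairs. The function name `is_asymmetric` and the extra person "Ann" are made up for illustration; only Bob and Tom come from the example above.

```python
def is_asymmetric(relation):
    """Return True if no pair (a, b) in the relation also appears as (b, a)."""
    return all((b, a) not in relation for (a, b) in relation)


# "taller than": if Bob is taller than Tom, Tom is never taller than Bob.
taller_than = {("Bob", "Tom"), ("Bob", "Ann"), ("Ann", "Tom")}

# "sibling of" goes both ways, so it is not asymmetric.
sibling_of = {("Bob", "Tom"), ("Tom", "Bob")}

print(is_asymmetric(taller_than))  # True
print(is_asymmetric(sibling_of))   # False
```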

Or let’s take Big O notation. Short overview, formal definition, examples… not practical, but theoretical, then infinitesimal asymptotics, which is deep into the weeds. You know what that article actually needs? After the short overview, have an intuitive/hand-wavy definition, then two well-explained “find an entry in a telephone book” examples using two different algorithms: O(n) (naive) and O(log n) (divide and conquer), to demonstrate the kind of differences the notation is supposed to highlight. Then, with the basics out of the way, one to demonstrate that the notation doesn’t care about multiplicative factors, what it (deliberately) sweeps under the rug. Short blurb about why that’s warranted in practice. Then, directly afterwards, the “orders of common functions” table but make sure to have examples that people actually might be acquainted with. Then talk about amortisation, and how you don’t always use hash tables “because they’re O(1) and trees are not”. Then get into the formal stuff, that is, the current article.
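
Roughly what the telephone book pair could look like, as a sketch rather than anything polished. The names `linear_lookup` and `binary_lookup` and the generated phone book are assumptions for illustration; both functions also return how many entries they had to look at, since that count is what the notation is about.

```python
def linear_lookup(book, name):
    """Naive scan from the first entry: O(n) comparisons in the worst case."""
    steps = 0
    for entry_name, number in book:
        steps += 1
        if entry_name == name:
            return number, steps
    return None, steps


def binary_lookup(book, name):
    """Divide and conquer on a sorted book: O(log n) comparisons."""
    lo, hi = 0, len(book) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        entry_name, number = book[mid]
        if entry_name == name:
            return number, steps
        if entry_name < name:
            lo = mid + 1  # target sorts after the middle entry
        else:
            hi = mid - 1  # target sorts before the middle entry
    return None, steps


# Hypothetical phone book: 100,000 entries, already sorted by name.
book = [(f"name{i:06d}", f"555-{i:06d}") for i in range(100_000)]
target = "name099999"  # last entry, the worst case for the naive scan

print(linear_lookup(book, target))  # looks at all 100,000 entries
print(binary_lookup(book, target))  # needs at most 17 looks
```

Same answer both times; the only difference is how the number of looks grows as the book gets bigger.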

And, no, LLMs will be of absolutely no help doing that. What wikipedia needs is a didactics task force giving specialist editors a slap on the wrist because xkcd 2501.

ɯᴉuoʇuɐ@lemmy.dbzer0.com · 10 months ago

As I said in another comment, I find that traditional encyclopedias fare better than Wikipedia in this respect. Wikipedians can muddle even comparatively simple topics, e.g. linguistic purism is described like this:

Linguistic purism or linguistic protectionism is a concept with two common meanings: one with respect to foreign languages and the other with respect to the internal variants of a language (dialects). The first meaning is the historical trend of the users of a language desiring to conserve intact the language’s lexical structure of word families, in opposition to foreign influence which are considered ‘impure’. The second meaning is the prescriptive[1] practice of determining and recognizing one linguistic variety (dialect) as being purer or of intrinsically higher quality than other related varieties.

This is so hopelessly awkward, confusing and inconsistent. (I hope I’ll get around to fixing it, btw.) Compare it with how the linguist RL Trask defines it in his Language and Linguistics: The Key Concepts:

[Purism] The belief that words (and other linguistic features) of foreign origin are a kind of contamination sullying the purity of a language.

Bam! No LLMs were needed for this definition.

So here’s my explanation for this problem: Wikipedians, specialist or non-specialist, like to collect and pile up a lot of cool info they’ve found in literature and online. When you have several such people working simultaneously, you easily end up with chaotic texts with neither head nor tail, which can always be expanded further and further with new stuff you’ve found because it’s just a webpage with no technical limits. When scholars write traditional encyclopedic texts, the limited space and singular viewpoint force them to write something much more coherent and readable.

blarghly@lemmy.world · 10 months ago

You’ve clearly never tried to use Wikipedia to help with your math homework

Matriks404@lemmy.world · 10 months ago

I never did any homework unless absolutely necessary.

Now I understand that I should have done it, because I am not good at learning shit in classrooms where there is a bunch of people who distract me, and I don’t learn anything that way. Only many years later did I find out that for most things it’s best for me to study alone.

That said, you are most probably right, because I have opened some math-related Wikipedia articles at some point, and they were pretty incomprehensible to me.

baatliwala@lemmy.world · 10 months ago

I’m from a country where English isn’t the primary language, people tend to find many aspects of English complex

Matriks404@lemmy.world · 10 months ago

I am also from a country where English is not widely spoken; in fact, most people are not able to hold a simple conversation (they will tell you they know “basic English” though).

I still find it easier to read Wikipedia articles in English than to understand some relatives, because they never precisely say what the fuck they want from me. One person even says such incomprehensible shit that I think their brain is barely functional.

vermaterc@lemmy.ml · 10 months ago

I’m ok with auto generated content, but only if it is clearly separated from human generated content, can be disabled at any time and writing main articles with AI is forbidden

KnitWit@lemmy.world · 10 months ago

Never thought I’d cancel my recurring donation for them, but just sent the email. I hope they change their mind on this, but as I told them, I will not support this.

kittenzrulz123@lemmy.blahaj.zone · edit-2 10 months ago

Hell nah, I am never donating to Wikipedia if they go AI.

deathbird@mander.xyz · 10 months ago

This is not the medicine for curing what ails Wikipedia, but when all anyone is selling is a hammer…

bitwolf@sh.itjust.works · 10 months ago

Guess they’re going to double down on the donation campaign considering the cost involved with ai

Vanilla_PuddinFudge@infosec.pub · 10 months ago

If you can’t make people smarter, make text dumber.

UnderpantsWeevil@lemmy.world · 10 months ago

Problem: Most people only process text at the 6th grade level

Proposal: Require mainstream periodicals to only generate articles accessible to people at the 6th grade reading level

Consequence: Everyone accepts the 6th grade reading level as normal

But… New Problem: We’re injecting so many pop-ups and ad-inserts into the body of text that nobody ever bothers to read the whole thing.

Proposal: Insert summaries of 6th grade material, which we will necessarily have to reduce and simplify.

Consequence: Everyone accepts the 3rd grade reading level as normal.

But… New Problem: This isn’t good for generating revenue. Time to start filling those summaries with ad-injects and occluding them with pop ups.

[R3D4CT3D]@midwest.social · 10 months ago

“Most readers in the US can comfortably read at a grade 5 level,[CN]”

so where is the citation? did they just pull a number from their butt? hmm…

srsly, this is some bs.

cygnus@lemmy.ca · edit-2 10 months ago

It’s actually true. 56% of Americans are “partially illiterate”, which explains a lot about the state of affairs in that country.

In 2023, 28% of adults scored at or below Level 1, 29% at Level 2, and 44% at Level 3 or above. Anything below Level 3 is considered “partially illiterate”

https://en.wikipedia.org/wiki/Literacy_in_the_United_States

sillyplasm@piefed.social · edit-2 10 months ago

frankly, I’m not quite surprised ._.
edit: upon reading the article, I now wonder if it’s possible for your literacy to go down. I used to be such a bookworm in grade school, but now I have to reread stuff over and over in order to comprehend what’s going on.

Carnelian@lemmy.world · 10 months ago

You might just be chronically tired or worn down from the stresses of life. It’s pretty common.

Another thing is as we get older a lot of people will choose more “challenging” adult books and then just be totally bored lol. I read young adult and kids books sometimes (how can I give a book to a child if I haven’t read it myself?) and it’s always surprising to me how they can be ripped through in no time at all.

But in general I think you’re probably right that literacy can decrease with disuse. It seems like most things about the mind and body trend that way

applemao@lemmy.world · 10 months ago

The mind is a muscle. Don’t ignore it. Especially now, if you use your mind you’ll be light-years ahead of ai addicts.

ladfrombrad 🇬🇧@lemdro.id · 10 months ago

But in general I think you’re probably right that literacy can decrease with disuse

Maths is a really good example of this.

At one point I really enjoyed doing long division in my head but as time goes on (and you don’t exercise that sponge…), it becomes lazy.

Dr. Moose@lemmy.world · 10 months ago

I’m genuinely confused how that is even possible in a developed country such as the US. Do people not read at all? As in an article or gossip magazine - all of those would get you there.

Is it just countryside folk drinking beer and watching Fox News? It can’t be 50% of all people. How.

Ledericas@lemm.ee · 10 months ago

basically the 2nd sentence is a product of defunding education in red states, and underfunding everywhere else. another issue is “participation grades for basically almost failing and failing classes”.

Ledericas@lemm.ee · 10 months ago

their math skills are even worse.