Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Pro@programming.dev · 3 months ago

setVeryLoud(true);@lemmy.ca · edit-2 3 months ago

Gist:

What’s new: The Northern District of California has granted a summary judgment for Anthropic that the training use of the copyrighted books and the print-to-digital format change were both “fair use” (full order below box). However, the court also found that the pirated library copies that Anthropic collected could not be deemed as training copies, and therefore, the use of this material was not “fair”. The court also announced that it will have a trial on the pirated copies and any resulting damages, adding:

“That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”

DeathsEmbrace@lemmy.world · 3 months ago

So I can’t use any of these works because it’s plagiarism but AI can?

setVeryLoud(true);@lemmy.ca · 3 months ago

My interpretation was that AI companies can train on material they are licensed to use, but the courts have deemed that Anthropic pirated this material as they were not licensed to use it.

In other words, if Anthropic bought the physical or digital books, it would be fine so long as their AI couldn’t spit it out verbatim, but they didn’t even do that, i.e. the AI crawler pirated the book.

devils_advocate@sh.itjust.works · 3 months ago

Does buying the book give you license to digitise it?

Does owning a digital copy of the book give you license to convert it into another format and copy it into a database?

Definitions of “Ownership” can be very different.

VoterFrog@lemmy.world · edit-2 3 months ago

It seems like a lot of people misunderstand copyright so let’s be clear: the answer is yes. You can absolutely digitize your books. You can rip your movies and store them on a home server and run them through compression algorithms.

Copyright exists to prevent others from redistributing your work so as long as you’re doing all of that for personal use, the copyright owner has no say over what you do with it.

You even have some degree of latitude to create and distribute transformative works, with a violation only occurring when you distribute something pretty damn close to a copy of the original. Some perfectly legal examples: create a word cloud of a book, analyze the tone of a news article to help you trade stocks, produce an image containing the most prominent color in every frame of a movie, or create a search index of the words found on all websites on the internet.

You can absolutely do the same kinds of things an AI does with a work as a human.
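
To make the word cloud example concrete, here is a minimal sketch of the counting step behind one. The function name `top_words` and the sample sentence are made up for illustration; the point is only that the output is a set of statistics about the text, not a copy of it.

```python
import re
from collections import Counter


def top_words(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent words in text, ignoring case."""
    # Lowercase first, then pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Hypothetical sample text standing in for a book.
sample = "the cat sat on the mat and the dog sat too"
print(top_words(sample, 2))  # [('the', 3), ('sat', 2)]
```

A real word cloud would also drop common stop words like "the" and scale font size by count, but the counting itself is this simple.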

Enkimaru@lemmy.world · 3 months ago

You can digitize the books you own; you do not need a license for that. And of course you could put that digital format into a database, as databases are explicit exceptions from copyright law. If you want to go to the extreme: delete first copy. Then you have it only in the database. However, AIs/LLMs are not based on databases but on neural networks. The original data gets lost when “learned”.

nednobbins@lemmy.zip · 3 months ago

> If you want to go to the extreme: delete first copy.

You can; as I understand it, the only legal requirement is that you only use one copy at a time.

i.e. I can give my book to a friend after I’m done reading it; I can make a copy of a book, keep one at home and one at the office, and switch off between reading them; I’m not allowed to make a copy of the book, hand one to a friend, and then have both of us read it at the same time.

Goldmage263@sh.itjust.works · 3 months ago

That sounds a lot like library ebook renting. Makes sense to me. Ty

booly@sh.itjust.works · 3 months ago

> Does buying the book give you license to digitise it?
>
> Does owning a digital copy of the book give you license to convert it into another format and copy it into a database?

Yes. That’s what the court ruled here. If you legally obtain a printed copy of a book you are free to digitize it or archive it for yourself. And you’re allowed to keep that digital copy, analyze and index it and search it, in your personal library.

Anthropic’s practice of buying physical books, removing the bindings, scanning the pages, and digitizing the content while destroying the physical book was found to be legal, so long as Anthropic didn’t distribute that library outside of its own company.

nednobbins@lemmy.zip · 3 months ago

That’s not what it says.

Neither you nor an AI is allowed to take a book without authorization; that includes downloading and stealing it. That has nothing to do with plagiarism; it’s just theft.

Assuming that the book has been legally obtained, both you and an AI are allowed to read that book, learn from it, and use the knowledge you obtained.

Both you and the AI need to follow existing copyright laws and licensing when it comes to redistributing that work.

“Plagiarism” is the act of claiming someone else’s work as your own and it’s orthogonal to the use of AI. If you ask either a human or an AI to produce an essay on the philosophy surrounding suicide, they’re fairly likely to include some Shakespeare quotes. It’s only plagiarism if you or the AI fail to provide attribution.

Enkimaru@lemmy.world · 3 months ago

Why would it be plagiarism if you use the knowledge you gain from a book?

FreedomAdvocate@lemmy.net.au · 3 months ago

You can “use” them to learn from, just like “AI” can.

What exactly do you think AI does when it “learns” from a book, for example? Do you think it will just spit out the entire book if you ask it to?

DeathsEmbrace@lemmy.world · 3 months ago

It can’t speak or use any words without them being someone else’s words it learned from? Unless it’s giving sources, everything is always from something it learned, because it cannot speak or use words without that source in the first place?

nednobbins@lemmy.zip · 3 months ago

prompt: Make up a word that is not found anywhere on the internet. If I do a search for that word, it should not exist. Provide example usage.

Qwen3-235B-A22B:

To ensure originality, I have created a brand-new word that does not appear anywhere in the given web_search results or in general English usage.

Virethra (pronounced: vye-RETH-ruh)

Part of speech: noun

Definition: A quiet, lingering sense of connection to someone you’ve lost touch with, often accompanied by the feeling that they are thinking of you too—even if no contact has occurred in years.

Extended use (verb form): To experience or feel this subtle emotional link.

Example Sentences:

Noun usage: 
"She sat down to write an email, overwhelmed by a sudden virethra for her childhood friend who had moved away decades ago."  

Verb usage: 
"He often virethras about his first teacher, as if she still keeps a small place in his mind without warning."  

Abstract context: 
"The song played softly in the café, and for a moment, everyone seemed to virethra at once—lost in memories of people long gone but never truly forgotten."

kux@lemm.ee · 3 months ago

> Make up a word that is not found anywhere on the internet

Returns word that is found on the internet as a brand of nose rings, as a youtube username, as an already made up word in fantasy fiction, and as a (ocr?) typo of urethra

nednobbins@lemmy.zip · 3 months ago

That’s a reasonable critique.

The point is that it’s trivial to come up with new words. Put that same prompt into a bunch of different LLMs and you’ll get a bunch of different words. Some of them may already exist somewhere; others won’t. The rules for combining word parts are so simple that children play them as games.

The LLM doesn’t actually even recognize “words”; it recognizes tokens, which are typically parts of words. It usually avoids random combinations of those, but you can easily get it to produce them if you want.
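
As a rough illustration of "tokens are parts of words", here is a toy tokenizer. It is a sketch under loud assumptions: the vocabulary `VOCAB` is invented for this example, and greedy longest-match is a stand-in for the learned subword schemes real models use, not a description of any particular LLM's tokenizer.

```python
# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of pieces.
VOCAB = {"vir", "eth", "ra", "un", "break", "able"}


def tokenize(word: str, vocab: set[str] = VOCAB) -> list[str]:
    """Split word greedily into the longest known pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens


print(tokenize("virethra"))     # ['vir', 'eth', 'ra']
print(tokenize("unbreakable"))  # ['un', 'break', 'able']
```

Under this kind of scheme a "new" word is just an unusual sequence of pieces the model already has.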

gaja@lemm.ee · 3 months ago

I am educated on this. When an AI learns, it passes an input through a series of functions whose results are joined at the output. The set of functions that produces the best output is developed further. Individuals do not process information like that. With poor exploration and biasing, the output of an AI model could look identical to its input. It did not “learn” anymore than a downloaded video ran through a compression algorithm.

Enkimaru@lemmy.world · 3 months ago

You are obviously not educated on this.

> It did not “learn” anymore than a downloaded video ran through a compression algorithm.

Just: LoLz.

gaja@lemm.ee · 3 months ago

I’ve hand-calculated forward propagation (neural networks). AI does not learn; it’s statically optimized. AI “learning” is curve fitting. Human learning requires understanding, which AI is not capable of.
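
For readers who have not done the hand calculation, forward propagation through a tiny network looks roughly like the sketch below. The layer sizes, the sigmoid activation, and every weight value are assumptions picked for illustration; none of them come from a real model.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: inputs -> one hidden layer -> a single output."""
    # Each hidden unit is a weighted sum of the inputs plus a bias, then sigmoid.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output unit does the same thing over the hidden activations.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Arbitrary illustrative numbers: two inputs, two hidden units, one output.
y = forward(
    inputs=[1.0, 0.5],
    hidden_weights=[[0.4, -0.2], [0.3, 0.8]],
    hidden_biases=[0.0, -0.1],
    output_weights=[0.7, -0.5],
    output_bias=0.1,
)
print(y)
```

Training adjusts the weights so that outputs like `y` land closer to target values, which is the sense in which it can be described as curve fitting.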

nednobbins@lemmy.zip · 3 months ago

> Human learning requires understanding, which AI is not capable of.

How could anyone know this?

Is there some test of understanding that humans can pass and AIs can’t? And if there are humans who can’t pass it, do we consider them unintelligent?

We don’t even need to set the bar that high. Is there some definition of “understanding” that humans meet and AIs don’t?

gaja@lemm.ee · 3 months ago

It’s literally in the phrase “statically optimized.” This is like arguing for your preferred deity. It’ll never be proven but we have evidence to make our own conclusions. As it is now, AI doesn’t learn or understand the same way humans do.

nednobbins@lemmy.zip · 3 months ago

So you’re confident that human learning involves “understanding” which is distinct from “statistical optimization”. Is this something you feel in your soul or can you define the difference?

hoppolito@mander.xyz · 3 months ago

I am not sure what your contention, or gotcha, is with the comment above, but they are quite correct. They additionally chose quite an apt example with video compression, since in most ways current ‘AI’ effectively functions as a compression algorithm, just for our language corpora instead of video.

nednobbins@lemmy.zip · 3 months ago

They seem pretty different to me.

Video compression developers go through a lot of effort to make their codecs deterministic. We don’t necessarily care that a particular video stream compresses to a particular bit sequence but we very much care that the resulting decompression gets you as close to the original as possible.

AIs will rarely produce exact replicas of anything. They synthesize outputs from heterogeneous training data. That sounds like learning to me.

The one area where there’s some similarity is dimensionality reduction. It’s technically a form of compression, since it makes your files smaller. It would also be an extremely expensive way to get extremely bad compression. It would take orders of magnitude more hardware resources and the images are likely to be unrecognizable.
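
The determinism point can be seen with any ordinary compressor. The sketch below uses Python's standard `zlib` module as a stand-in; note the assumption that this is a lossless, general-purpose codec, whereas video codecs are lossy, so it only illustrates the "same input, same result, original recoverable" property rather than video compression itself.

```python
import zlib

# Repetitive sample data so that compression has something to work with.
data = b"the quick brown fox jumps over the lazy dog " * 50

first = zlib.compress(data)
second = zlib.compress(data)

print(first == second)                 # True: same input, same compressed bytes
print(zlib.decompress(first) == data)  # True: the original comes back exactly
print(len(first) < len(data))          # True: and it is smaller
```

An LLM sampled at a nonzero temperature fails the first check by design, and it generally cannot reproduce its training text exactly either.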

gaja@lemm.ee · 3 months ago

Google search results aren’t deterministic but I wouldn’t say it “learns” like a person. Algorithms with pattern detection aren’t the same as human learning.

nednobbins@lemmy.zip · 3 months ago

You may be correct but we don’t really know how humans learn.

There’s a ton of research on it and a lot of theories but no clear answers.
There’s general agreement that the brain is a bunch of neurons; there are no convincing ideas on how consciousness arises from that mass of neurons.
The brain also has a bunch of chemicals that affect neural processing; there are no convincing ideas on how that gets you consciousness either.

We modeled perceptrons after neurons and we’ve been working to make them more like neurons. They don’t have any obvious capabilities that perceptrons don’t have.

That’s the big problem with any claim that “AI doesn’t do X like a person”; since we don’t know how people do it we can neither verify nor refute that claim.

There’s more to AI than just being non-deterministic. Anything that’s too deterministic definitely isn’t an intelligence though; natural or artificial. Video compression algorithms are definitely very far removed from AI.
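
For reference, the perceptron mentioned above fits in a few lines. This is a minimal sketch of the classic perceptron learning rule on the logical AND function; the function names, learning rate, and epoch count are choices made for this example.

```python
def predict(weights, bias, x1, x2):
    """Fire (1) if the weighted sum exceeds zero, otherwise stay silent (0)."""
    return 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Classic perceptron rule: nudge the weights by the error on each sample."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - predict(weights, bias, x1, x2)
            weights[0] += lr * error * x1
            weights[1] += lr * error * x2
            bias += lr * error
    return weights, bias


# Truth table for logical AND.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_SAMPLES)
print([predict(weights, bias, x1, x2) for (x1, x2), _ in AND_SAMPLES])  # [0, 0, 0, 1]
```

Nothing in this says whether biological neurons do something comparable; it only shows how little machinery the artificial version needs.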

DerisionConsulting@lemmy.ca · edit-2 3 months ago

Formatting thing: if you start a line in a new paragraph with four spaces, it assumes that you want to display the text as code and won’t line break.

This means that the last part of your comment is a long line that people need to scroll to see. If you remove one of the spaces, or you remove the empty line between it and the previous paragraph, it’ll look like a normal comment.

With an empty line of space:

 1 space - and a little bit of writing just to see how the text will wrap. I don’t really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.

  2 spaces - and a little bit of writing just to see how the text will wrap. I don’t really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.

   3 spaces - and a little bit of writing just to see how the text will wrap. I don’t really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.

    4 spaces -  and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.
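
The rule described above can be sketched as a small check. This is a deliberately simplified approximation: the function name is made up, and actual markdown renderers handle more cases (tabs, lists, headings) than this does.

```python
def is_indented_code_start(prev_line: str, line: str) -> bool:
    """Simplified rule: four leading spaces right after a blank line means code."""
    previous_is_blank = prev_line.strip() == ""
    has_four_spaces = line.startswith("    ") and line.strip() != ""
    return previous_is_blank and has_four_spaces


print(is_indented_code_start("", "    4 spaces"))       # True: rendered as code
print(is_indented_code_start("", "   3 spaces"))        # False: normal paragraph
print(is_indented_code_start("text", "    4 spaces"))   # False: continues the paragraph
```

Both fixes suggested in the comment correspond to making one of the two conditions false.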

setVeryLoud(true);@lemmy.ca · 3 months ago

Thanks, I had copy-pasted it from the website :)

bitwolf@sh.itjust.works · edit-2 3 months ago

Personally I prefer to explicitly wrap the text in backticks.

Three ` symbols will

Have the same effect

But the behavior is more clear to the author

Claude AI maker Anthropic bags key “fair use” win for AI platforms, but faces trial over damages for millions of pirated works – ai fray