To help train its AI models, Meta (and others) have been using pirated versions of copyrighted books, without the consent of authors or publishers. The company behind Facebook and Instagram faces an ongoing class-action lawsuit brought by authors including Richard Kadrey, Sarah Silverman, and Christopher Golden, one in which it has already scored a major (and surprising) victory: The Californian court concluded last year that using pirated books to train its Llama LLM did qualify as fair use.
You’d think this case would be as open-and-shut as it gets, but never underestimate an army of high-priced lawyers. Meta has now come up with the striking defense that uploading pirated books to strangers via BitTorrent qualifies as fair use. It further goes on to claim that this is double good, because it has helped establish the United States’ leading position in the AI field.
Meta further argues that every author involved in the class-action has admitted they are unaware of any Llama LLM output that directly reproduces content from their books. It says if the authors cannot provide evidence of such infringing output or damage to sales, then this lawsuit is not about protecting their books but arguing against the training process itself (which the court has ruled is fair use).
Judge Vince Chhabria now has to decide whether to allow this defense, a decision that will have consequences for not only this but many other AI lawsuits involving things like shadow libraries. The BitTorrent uploading and distribution claims are the last element of this particular lawsuit, which has been rumbling on for three years now, to be settled.
-
Shorter and more reasonable copyright lengths would make this a moot point because then there would be sufficient literature in the public domain to pull from.
-
These kinds of charges are what put the Pirate Bay admins in prison and caused Aaron Swartz to kill himself because of a threat of lifetime in prison. The claim was that they did this either with the goal of profit or with actual profit, and that this was a serious crime. Neither TPB nor Swartz at that point in time had ever moved as much data as Meta has for these claims, nor did they ever have the profit or possibility of profit Meta aims to make from their AI offerings.
-
Now Meta is claiming they’ve profited so hard you can’t possibly hold them accountable.
It will be the biggest “fuck you” in history to anyone ever hit with civil charges for piracy in the early 2000s, let alone the TPB admins and Swartz, if they let this go. Which means they probably will because in America, apparently if you crime hard enough and big enough they stop putting you in prison and start patting you on the back and calling it good business sense.
It’s weird that your takeaway is “Meta needs to get it” and not “Clearly, these laws work for no one”. You don’t get better copyright laws by cheering for the copyright companies.
Aaron wouldn’t be part of the side that wants to lock up all data behind a giant gate and give the keys to a handful of companies. Well, we don’t know what he would think, but I’m guessing he didn’t lean copyright.
Literally the first thing I said was in regards to more sensible copyright making this all a moot point but you do you.
The only reason Meta needs to get it is because it’s entirely hypocritical to all the dirt poor people who couldn’t afford these kinds of lawyers. It doesn’t make the current legal status right or correct. It’s just a slap in the face to someone like Swartz who died over far less.
I would rather copyright be amended but sadly that’s less likely to happen here.
I read this as setting precedent that others couldn’t. Court cases like this are one way to make it possible for everyone to break an absurd law.
We don’t get better laws if everyone is cheering for the copyright industry. Everything after your first point goes against that. Goliath, the same one that beat up Aaron, finally has a match in his own weight category, and you are hoping he wins basically.
What kind of “better law” do you think will come out of this? That regular people like us will be able to share freely?
You think that the law being applied on poor people but not on the wealthy is a healthy way to get a better law?
Get the fuck real and nobody is asking for the copyright cabal to win as much as we are saying “look, if this is how the law is going to be applied, apply it evenly, don’t just fuck over poor people but give the wealthy a pass.”
And poor people who don’t have the weight and money of Meta aren’t going to be able to prove that they need the same amount of data to train an LLM so they probably will still have the law held against them. Get fucking real man.
What country do you think you live in? One where laws are applied evenly or rationally? Or one where fascists have taken over the god damned government? Because guess what it’s the latter and the laws are effectively meaningless for the wealthy but still held against the poor. Sure, if that’s what you want, go for it, but it damn sure won’t suddenly get us better laws or let regular people torrent without worry. Congress has been deadlocked for decades and does nothing but hurt common people and give corporations a ticket to do whatever and you think better laws will come out of this? Seriously, once again, get fucking real.
Encouraging laws you don’t like does nothing but cement them. We are currently, as a society, begging lawmakers for harder copyright laws.
I get the Justice system sucks but making the wrong laws stronger does not make it better.
Think about what you are saying is all, you tend to write long elaborate speeches on why copyright deserves to win. There is being critical of AI, and then there’s being a mouthpiece for copyright companies. I’m not trying to be mean here, sorry.
Dude, I have been promoting copyright law being changed and being shortened for 25 fucking years.
Do you even know who Rufus Pollock is or anything about his research into copyright lengths? Because I was around when that shit was published. I hosted DJ Danger Mouse’s Grey Album on Grey Tuesday as a fuck you to the Beatles copyright holders since the Grey Album should have been considered fair use as it was released for free with no profit at all. I was part of the Kopimi collective.
Not wanting corporations to get a pass while we all get fucked is not the same thing. You’re not being mean, you’re being obtuse.
None of that matters if, right now, you are cheering on copyright laws. There’s no reason to stop promoting change now.
-
There are no rules. Everything is made up to their convenience.
Yup, that’s what I’m doing with all those audiobooks I torrented. Helping the US maintain the lead in AI 😂
Classic “the end justifies the means” (bad) defense. If ISPs can send letters for torrenting, and Facebook torrented a lot, Facebook deserves a fair punishment.
truck full of letters backs up to Meta’s headquarters
“there, that’s more appropriate.”
Not deserves, needs.
Is it fair use if I do it?
sure. thanks meta, anna’s archive will help me with my reading list, thanks.
We can train our NI (Natural Intelligence) models.
Arguing that training models isn’t fair use is going to be a massive uphill battle, it’s basically reading the book but with a computer. It’s not actually a big deal to people, unless you hold the copyright to a ton of works and want to get a percentage of all the AI income these companies have made.
Torrenting the books is likely absolutely copyright infringement, but that has relatively low payout compared to the money these companies are getting for their models. The training being fair use means that rights holders can’t try to take any money from the model’s use. The statutory limits for infringement even at per work levels aren’t significant compared to the legal cost of proving it happened.
There’s an argument to be made that it is, in fact, not ‘reading’. The training of the model could be considered a lossy compression of the data. And streaming movies in a lossy compression format is not fair use, is it?
It’s not the storage of the information that matters as much as the presentation. Google’s search index stores a huge amount of copyrighted material, even losslessly. But they only present small snippets at a time which is not considered copyright infringement. The question really is whether or not the information being presented by the models is in a format which is considered copyright infringement. So far, courts have not found that it is.
As long as they cannot copyright what they generate from using the pirated materials
We’re going to end up in a situation where whatever is necessary to train AI is permitted, and the main question is whether that will be through (re)interpretation of existing law or the passage of a new law.
Good thing I have a local model running that’s constantly learning, for precisely this reason
I’m still collecting media before I can start the training process.
They didn’t say seeding is fair use, just inherently part of torrenting. Good thing Sarah Silverman has pc gamer there to pander for her.











