• 1 Post
  • 2.04K Comments
Joined 1 year ago
Cake day: February 10th, 2025



  • It’s extremist to take the fact that you CAN get plagiaristic output and to conclude that all other output is somehow tainted.

    You personally CAN quote copyrighted music and screenplays. If you’re an artist, then you also CAN produce copyright-violating works. None of these facts taint any of the other things you produce that are not copyright violations or plagiarism.

    In this situation, and in the current legal environment, the responsibility to not produce illegal and unlicensed code is on the human. The fact that the tool that they use has the capability to break the law does not mean that everything generated by it is tainted.

    Photoshop can be used to plagiarize and violate copyright too. It would be just as absurd to declare that all images created with Photoshop are somehow suspect or unusable because the tool is capable of violating copyright law.

    The fact that AI can, when specifically prompted, reproduce memorized segments of its training data has carried essentially no legal weight in any of the cases where it has been argued. It is a fact of interest to scientists who study how AI models represent knowledge internally, not a foundation for a legal argument against the use of AI.


  • Given the research that you’ve done here, I’m going to assume that you’re looking for an answer and not simply taking us on a Gish gallop.

    Your premise, and what appears to be the primary source of confusion, is built on the idea that this is ‘stolen’ work, which, from a legal point of view, is untrue. If you want to dig into why, look into the precedent-setting case Authors Guild, Inc. v. Google, Inc. (2015), in which the court held that Google’s copying of entire books to build a searchable index was a transformative fair use. The TL;DR is that the same reasoning puts training AI on copyrighted works under fair use, i.e., it is legal, not stealing.

    The case you linked from Munich shows that other countries’ legal systems are interpreting AI training in the same way. Training AI isn’t about memorizing and plagiarizing existing work; it’s about using existing work to learn the underlying patterns.

    That isn’t to say that memorization doesn’t happen, but it is more a point of interest to AI scientists who are working on understanding how AI represents knowledge internally than a point that lands in a courtroom.

    We all memorize copyrighted material as part of our learning. You, too, can quote Disney movies or Stephen King novels if prompted in the right way. This doesn’t automatically make everything you create plagiarism; it just means that you have viewed copyrighted work as part of your learning process. In the same way, artists are capable of creating works that violate the copyright of others, and they consumed copyrighted works as part of their learning process. These facts don’t taint all of their work, either morally or legally… only the output that literally violates copyright law.

    The pragmatism here is recognizing that these tools exist and that people use them. In the current legal landscape, the output of these tools is treated as the output of their users. If an image generator generates a copyrighted image, then the rightsholder can sue the person, not the software. If a code generator generates licensed code, then the tool’s user is responsible.

    This is much like how we don’t restrict the use of Photoshop despite the fact that it can be used to violate copyright. Instead, we put the burden on the person who operates the tool.

    That’s what is happening here. Linus isn’t using his position to promote/enforce/encourage LLM use, nor is he using his position to prevent/restrict/disallow any AI use at all. He is recognizing that this is a tool that exists in the world in 2026 and that his project needs to have procedures that acknowledge this while also ensuring that a human is the one responsible for their submissions.

    This is the definition of pragmatism (def: action or policy dictated by consideration of the immediate practical consequences rather than by theory or dogma).

    e: precedent, not president (I’m blaming the AI/autocorrect on this one)

  • CFD = Computational Fluid Dynamics.

    It is kind of what they said, you’re right. I was more pointing out how they could ‘sense the vibes’ of a CFD result to determine whether it is accurate or whether the model decided to do something weird. Since the flow being simulated is chaotic and the simulation is only an approximation of it, some starting conditions can yield results that are impossible or not based on reality.

    If you look at enough of them, you start to notice the kinds of things that go wrong. They would also have a pretty good idea of how their design should perform, and if the simulation shows something different, they’d first want to troubleshoot the simulation before attempting to redesign whatever system they’re creating.




  • Anfinsen shared the 1972 Nobel Prize in Chemistry for showing that a protein’s amino acid sequence is what determines its 3D structure.

    We have long been able to determine protein structures experimentally using X-ray crystallography, but that is a painstaking process. Accurately predicting a protein’s structure from its amino acid sequence remained an unsolved problem until very recently.

    It wasn’t until 2024 that the Nobel Prize in Chemistry went to Hassabis and Jumper, for predicting protein structures with an AI called AlphaFold, and to Baker, for computationally designing new proteins.

    The ability to computationally design new proteins is new and will revolutionize some fields of medicine (like cancer treatment) and, to me, is a much more impressive use of AI.

    LLMs are interesting but they are incredibly over-hyped as far as ‘changing the world’ goes, imo.


  • Those kinds of simulations are inherently chaotic: tiny changes to the initial conditions can lead to wildly different outcomes, sometimes to the point of being nonsensical. Also, since they simulate a limited volume, the boundary conditions can cause weird artifacts in some cases.

    If you run a simulation of air over an aircraft wing and the end result is a mess of turbulence instead of smooth flow, then you can assume the simulation misbehaved and not that your wing design is suddenly breaking the rules of physics. When a simulation breaks, it usually does so in ways that are obvious to anyone familiar with the results of earlier tests on physical models.
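The sensitivity to initial conditions described above can be sketched with the Lorenz system, a three-variable toy model originally derived from a simplified model of atmospheric convection. This is an illustration under assumptions, not a CFD solver: the classic parameters (sigma=10, rho=28, beta=8/3), the fourth-order Runge-Kutta integrator, the step size, the starting point and the 1e-9 perturbation are all choices made for this example, and the function names are hypothetical.

```python
import math


def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative (dx/dt, dy/dt, dz/dt) of the Lorenz system."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance `state` by one classical fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        # s + h * k, component by component
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_history(initial, perturbation=1e-9, dt=0.01, steps=6000):
    """Run two simulations whose starting x differs by `perturbation` and
    return the Euclidean distance between them after each step."""
    a = tuple(initial)
    b = (initial[0] + perturbation, initial[1], initial[2])
    distances = []
    for _ in range(steps):
        a = rk4_step(a, dt)
        b = rk4_step(b, dt)
        distances.append(math.dist(a, b))
    return distances


# d[i] is the separation after i + 1 steps, i.e. at time (i + 1) * dt
d = separation_history((1.0, 1.0, 1.0))
print(f"separation at t = 0.1:         {d[9]:.1e}")
print(f"max separation, t in (40, 60]: {max(d[4000:]):.1f}")
```

The two runs start out indistinguishable and end up on unrelated parts of the attractor, which is the kind of behavior that makes a single simulation run worth sanity-checking against experience rather than trusting blindly.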


  • “I’m failing to see why the creative writing machine is better than a simulation set to ‘rough’.”

    The problem is that you saw AI and thought LLM.

    Machine learning is a big field, neural networks (what most people now mean by ‘AI’) are a subset of that field, and LLMs are only a single application of a specific type of neural network (the Transformer) to a specific task (next-token prediction).

    The only reason that LLMs and image generation models are the most visible is that training neural networks requires a large amount of data, and the largest repository of public data, the Internet, is primarily text and images. So text and image models were the first large models to be trained.

    The most exciting and potentially impactful uses of AI are not LLMs. Things like protein folding and robotics will have more of an impact on the world than chatbots.

    In this case, generating fast approximations for physical modeling can save a ton of compute time for engineering work.
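To make the “next-token prediction” task named in this comment concrete, here is a toy bigram counter. It is a deliberately simple stand-in, not a Transformer: it only shows the task framing of predicting the next token from what came before (here, from just the previous token). The function names and the tiny corpus are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """For every token, count how often each other token follows it."""
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(model, token):
    """Return the most frequent follower of `token` (ties go to the one seen
    first), or None if `token` never appeared with a follower."""
    if token not in model:
        return None
    return model[token].most_common(1)[0][0]


def generate(model, start, length):
    """Greedily extend `start` by repeatedly predicting the next token."""
    out = [start]
    for _ in range(length):
        nxt = predict_next(model, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out


corpus = "the wing is smooth and the flow is smooth and the wing is fast".split()
model = train_bigram(corpus)
print(predict_next(model, "the"))  # wing
print(generate(model, "the", 5))   # ['the', 'wing', 'is', 'smooth', 'and', 'the']
```

An LLM replaces the counting table with a neural network that conditions on a long context instead of one previous token, but the training objective has the same shape: given what came before, score the possible next tokens.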