• 3 Posts
  • 315 Comments
Joined 3 years ago
Cake day: June 18th, 2023

  • To be fair, I do think the average accepted stackoverflow answer displays far more competence than the average human.

    One of the few things I use LLMs for is getting an overview of best practice in things I’m not familiar with (before reading the posts I find to get a more in-depth understanding).



  • That isn’t what bothers me the most though. Earlier today I read a piece by a “tech journalist” in a paper I normally respect as doing proper work. They mentioned that one of the guys behind Claude says that Claude now writes absolutely all their code. They also said they tested one of the most recent models (released earlier this week), and that it wrote “A full Amazon-cloud based page that did various verification and authentication jobs, was about 67 000 lines of code, and was approved by the IT department in minutes in an afternoon”. That last part tells me they have no clue what they’re talking about. They just generated 67 000 lines of potential bugs that happen to run, reviewed by nobody competent. Nobody reviews 67 000 lines of code in a day, let alone in minutes. Just the fact that they found it impressive to generate a shitload of boilerplate (which most of those lines likely were) says enough.

    It’s not your average Joe thinking this is cool that bothers me (it is cool). It’s when allegedly competent people start thinking the LLM actually has any idea what it’s doing.



  • That simply is not what people want when they look for information.

    What? Is there anyone out there who prefers finding small bits of information scattered across various sources over a concise summary followed by a solid fleshing-out, all in one place? I honestly cannot imagine a use case where I would prefer that a source omit a bunch of information rather than simply structure it so that I can find what I’m looking for. Wikipedia does exactly that. That’s why you have dedicated articles for all those battles in WWII, each with its own table of contents and summary to help you digest it. There has never in human history existed a source of knowledge that comes even close to structuring and summarising this amount of information as well as Wikipedia does, and you’re advocating that they should make it… not that?







  • I’m a researcher myself, so I feel like I can weigh in on the “reproducibility crisis”. There are several facets to it: One is of course money, but that’s not just related to corporately funded research. Good luck finding or building an independent lab capable of reproducing the results at CERN. It basically boils down to the fact that some (a lot of) research is insanely expensive to do. This primarily applies to experiments and to some degree to computationally expensive stuff.

    Another side is related to interest. Your average researcher is fired up by the thought of being the first person to discover and publish something no one has seen before. It’s just not as fun to reproduce something someone else has already done. Even if you do, you’re likely to try to improve on it somehow, which means the results may change without directly invalidating the old ones. It can be hard work to write a good paper, so if you don’t feel your results are novel enough to be worth it (because they’re basically just equivalent to previously published values), you might not bother publishing them at all.

    Finally, even without direct reproduction of previously published results, science has a way of asymptotically approaching some kind of truth. When I develop and publish something, I’m building on dozens of previously published works. If what they did was plain wrong, then my models would also be liable to fail. I’ve had cases where we’ve improved on previously published work, not because we tried to reproduce it, but because we tried to build on their results and found that they didn’t make sense. That kind of thing is fairly common, but it isn’t reported as a “reproduction study”.

    There are also review articles that, while they don’t do any reproduction themselves, collect and compare a bunch of comparable work. They usually draw some conclusions about which results appear trustworthy and which appear erroneous.




  • I will never forget the time I posted a question about why something wasn’t working as I expected, with a minimal example (≈ 10 lines of Python, no external libraries) and a description of the expected and observed behaviour.

    The first three-ish replies were instant comments insisting that the code did in fact work the way I expected, and that the observed behaviour I described wasn’t what the code would produce. A day later, a highly-rated user left a friendly note pointing out that I had a typo that just happened to trigger this very unexpected error.

    Basically, I was thrashed by the first replies, written by people who hadn’t even run the code. It felt extremely good to be able to reply that they were asshats for insisting the code didn’t do what I said it did when they hadn’t even tried it.
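
    The post doesn’t show the actual typo, but a classic Python example of the kind of typo that triggers genuinely surprising behaviour without any error is a missing comma between string literals in a list (this is a hypothetical illustration, not the original code):

    ```python
    # Hypothetical sketch -- the original post's typo isn't shown.
    # A missing comma between two string literals triggers Python's
    # implicit string concatenation, silently merging two list entries.

    names = [
        "alice",
        "bob"      # typo: missing comma after "bob"
        "carol",
    ]

    print(len(names))   # prints 2, not the expected 3
    print(names[1])     # prints 'bobcarol'
    ```

    Running this produces no exception at all, which is exactly why a reader eyeballing the code (without executing it) could confidently insist it "works as expected".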