Just want to clarify, this is not my Substack, I’m just sharing this because I found it insightful.

The author describes himself as a “fractional CTO” (no clue what that means, don’t ask me) and advisor. His clients asked him how they could leverage AI. He decided to experience it for himself. From the author (emphasis mine):

I forced myself to use Claude Code exclusively to build a product. Three months. Not a single line of code written by me. I wanted to experience what my clients were considering—100% AI adoption. I needed to know firsthand why that 95% failure rate exists.

I got the product launched. It worked. I was proud of what I’d created. Then came the moment that validated every concern in that MIT study: I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.

Now when clients ask me about AI adoption, I can tell them exactly what 100% looks like: it looks like failure. Not immediate failure—that’s the trap. Initial metrics look great. You ship faster. You feel productive. Then three months later, you realize nobody actually understands what you’ve built.

  • Echo Dot@feddit.uk · 10 points · 17 hours ago

    There’s no point telling it not to do x, because as soon as you mention x it goes into its context window.

    It has no filter. It’s as if you had no choice in your actions and had to act on every thought that came into your head: if you were told not to do a thing, you would immediately start thinking about doing it.

    • kahnclusions@lemmy.ca · 4 points · edited · 8 hours ago

      I’ve noticed this too, it’s hilarious(ly bad).

      Especially with image generation, which we were using to make some quick avatars for a D&D game. “Draw a picture of an elf.” Generates images of elves that all have one weird earring. “Draw a picture of an elf without an earring.” Great, now the elves have even more earrings.

      • MangoCats@feddit.it · 2 points · 7 hours ago

        I find this kind of performance varies from one model to the next. I’ve definitely experienced the bad-image-getting-worse phenomenon, especially with MS Copilot, but different models perform differently.

    • MangoCats@feddit.it · 2 points · 8 hours ago

      There’s no point telling it not to do x, because as soon as you mention x it goes into its context window.

      Reminds me of the Sonny Bono high-speed downhill skiing problem: don’t fixate on the tree; if you fixate on the tree you’re going to hit the tree. Fixate on the open space to the side of it.

      LLMs do “understand” words like “not” and “don’t”, but they also seem to work better with positive examples than negative ones.
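
      A rough illustration of that last point (a made-up sketch, not something from the thread, and not tied to any particular model or API): rephrase the instruction so it only describes what you do want, instead of naming the thing to avoid.

      # Hypothetical prompts contrasting negative vs. positive framing.
      # Both strings are invented for illustration; no model or API is assumed.
      negative = "Draw a picture of an elf. Do not include any earrings."
      positive = "Draw a picture of an elf with bare, unadorned ears."
      # The negative version still puts "earrings" into the context, which tends
      # to pull earrings into the output; the positive version only describes
      # what should appear.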