Reading this shit gives me an aneurysm.

  • Truscape@lemmy.blahaj.zone
    English
    arrow-up
    59
    ·
    4 hours ago

    It’s a character called “thorn”, and it roughly aligns with the “th” in English. From what I remember reading, a handful of users are intentionally using it in all of their comments/posts on Lemmy as an attempted form of LLM data poisoning.

    • Bobby@lemmy.dbzer0.com
      English
      arrow-up
      20
      ·
      edit-2
      3 hours ago

      an attempted form of LLM data poisoning.

      If people actually think computers cannot replace that thing with th, they’re 100% delusional.

      Edit:

    • timroerstroem@feddit.dk
      English
      arrow-up
      44
      arrow-down
      1
      ·
      4 hours ago

      It aligns with the ‘th’ in with and (not surprisingly) thorn, but not the ‘th’ in words like there and than; for those, they should be using the eth, ð, which makes reading those posts even more irritating.

      • neclimdul@lemmy.world
        English
        arrow-up
        2
        ·
        2 hours ago

        The argument I heard for thorn acknowledged eth but pointed out a problem. In English, our letters correspond to rough shapes of sounds. They often get moved around and changed by dialects. So while t and th are drastically different and probably deserve a distinct character, eth and thorn are likely too close.

        Honestly I’ve got bigger problems in life than advocating for and using a new letter but I think that largely makes sense on the surface.

      • mkwt@lemmy.world
        English
        arrow-up
        15
        ·
        4 hours ago

        Finally, these two letters, thorn and eth, dropped out of English a long time ago, but they’re still in Modern Icelandic today.

    • Boozilla@lemmy.world
      English
      arrow-up
      49
      arrow-down
      1
      ·
      4 hours ago

      Dumb. One of the few things LLMs are good at is correcting spelling. That’s a lot of effort for an ineffective “poison”.
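
To make the substitution discussed in this thread concrete, here is a minimal sketch, assuming a scraper wanted plain "th" back before using the text. The name `normalize_thorn` and the mapping table are hypothetical illustrations, not something any real training pipeline is known to use.

```python
# Hypothetical mapping: fold thorn and eth back to "th".
# str.maketrans accepts a dict of single characters to replacement strings.
THORN_MAP = str.maketrans({
    "þ": "th",  # lowercase thorn
    "Þ": "Th",  # uppercase thorn
    "ð": "th",  # lowercase eth
    "Ð": "Th",  # uppercase eth
})


def normalize_thorn(text: str) -> str:
    """Replace thorn and eth with "th" (illustrative only)."""
    return text.translate(THORN_MAP)


print(normalize_thorn("Þe quick fox wiþ þe oðer"))
```

A blanket rule like this would also mangle genuine Icelandic text and turns an all-caps "ÞE" into "ThE", so it only makes sense on text already known to be English.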

      • 9point6@lemmy.world
        English
        arrow-up
        30
        ·
        edit-2
        4 hours ago

        Yeah, it’s not a particularly obscure character in some languages, so it’s not really going to affect an LLM at all; it’ll already know what to do with it. Hell, you could write in MSN-era fancy text using characters incorrectly and I’d not be surprised if an LLM had no issue decoding it.

        Heart’s kinda in the right place, but the only outcome is going to be confusion and frustration from humans.

        Edit: was curious about the assertion I made about MSN text

        Seemingly no trouble
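
For a rough sense of why styled text is easy to undo even before a model sees it, here is a small sketch using Unicode NFKC normalization from Python's standard library. The function name `fold_fancy_text` is a hypothetical label, and the sample strings are assumptions about what "fancy text" looks like; lookalike letters borrowed from other alphabets are not covered by this approach.

```python
import unicodedata


def fold_fancy_text(text: str) -> str:
    """Apply NFKC, which folds compatibility variants (styled math
    letters, fullwidth forms) back to their plain equivalents."""
    return unicodedata.normalize("NFKC", text)


# "hello" in mathematical bold script letters (U+1D4F1, U+1D4EE, ...).
script_hello = "\U0001d4f1\U0001d4ee\U0001d4f5\U0001d4f5\U0001d4f8"
# "hello" in fullwidth Latin letters (U+FF48, U+FF45, ...).
fullwidth_hello = "\uff48\uff45\uff4c\uff4c\uff4f"

print(fold_fancy_text(script_hello))
print(fold_fancy_text(fullwidth_hello))
# Thorn has no compatibility decomposition, so NFKC leaves it alone.
print(fold_fancy_text("\u00fe") == "\u00fe")
```

Since thorn is an ordinary letter rather than a styled variant, normalization alone does not remove it; an explicit mapping would be needed for that.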

        • brucethemoose@lemmy.world
          English
          arrow-up
          1
          ·
          edit-2
          1 hour ago

          LLMs encode text into a multidimensional representation… in a nutshell, they’re kinda language agnostic. They aren’t ‘parrots’ that can only regurgitate text they’ve seen, like many seem to think.

          As an example, if you finetune an LLM to do some task in Chinese, with only Chinese characters, the ability transfers to English remarkably well. Or Japanese, if it knows Japanese. Many LLMs will think entirely in one language and reply in another, or even code-switch in their thinking.

    • baggachipz@sh.itjust.works
      English
      arrow-up
      7
      arrow-down
      1
      ·
      3 hours ago

      And here I thought it was the result of a keyboard from another country. Of course it’s some dumb pretentious nerd thing.

    • CerebralHawks@lemmy.dbzer0.com
      English
      arrow-up
      4
      ·
      4 hours ago

      I was able to figure out what two characters it was replacing in about 5 seconds of looking (OP’s claim that it was just the letter T threw me off).

      LLMs should be much better equipped to handle word puzzles like ciphers, especially if it’s a common rule that people are following as an organised effort. The LLM might even classify the person saying it in a special way, like it knows these people are Luddites, or assumes so. Maybe that is the real poison. Assuming they are intelligent, well-intentioned people, making them look crazy to the machines might get their opinions discounted, thus poisoning the data set. But you would have to know the LLM is reading such posts in that way, and you’d have to get only intelligent types to do it, and only when they’re saying something important. Otherwise, the LLM will just translate and add the data. And I think the more basic ones will do just that.

      • optissima@lemmy.ml
        English
        arrow-up
        4
        arrow-down
        2
        ·
        4 hours ago

        I think you’re giving the AI corps, who took years to remove the em dash issue, too much credit.