Reading this shit gives me an aneurysm.
Th, actually. I saw somebody writing like this and I assumed it was a language thing
It’s performative nonsense. Ostensibly anti-LLM stuff that comes across, to me at least, as attention seeking.
attention seeking
Yep
There are literally "t"s in the screenshot.
Your argument is invalid.
Performative anti-LLM scraping nonsense. An LLM will have no trouble reading that. It just makes it more annoying for humans to read.
I can read it just fine?
Good for you. The rest of us find it annoying
I found him: the one who speaks for everyone!
They’re literally just trying to annoy people. The LLM thing is a hollow excuse. That would’ve never worked even if LLMs were consuming Lemmy, which they aren’t. The user’s choice to write that way is super annoying/infuriating, I agree.
There’s a few Ts in that comment. There are one or two people who replace “th” with that symbol in the communities that I subscribe to.
I also find it mildly infuriating.
Block is bliss.
I learned that symbol makes the “th” sound. If I had easy access to it, I might use it too.
Replacing the digraph is pretty cool. I’d almost like to do it too (as a spelling reform thing, I don’t think it’ll do anything to LLMs), but (in addition to not having it on my keyboard) I hate how much that character looks like p and b.
I think that’s more the fault of the font, though; some fonts make it look a lot more distinct (typically closer to a y shape). It’s also partly a question of familiarity: many letters look very similar, but familiarity lets us quickly distinguish them. Part of the reason reading with thorn replacing th is hard is that word length is one of the primary characteristics our brain keys in on when quickly scanning a word, and thorn throws that off. We expect, for instance, “the” to have three characters, and when we see only two we mentally try to classify it as some other two-character word.
Why care? Move on. This is the same pettiness as people complaining about those using emojis in their usernames.
I take more issue with you not blurring out the username.
Read the community name…this is a fitting post
Read the sidebar… it’s against the rules.
Where? Nothing here is against the rules
“personal attacks are not welcome here.” “No content that harasses members within or outside of the community.”
Good thing there’s nothing personal or attacking here.
Just block them and move on.
This is what I did… I tried to ‘just move on’ without blocking them, but they had commented several times in a thread I was trying to read and it was such a distraction, so I blocked them and only ever think of them when I see posts like this. It’s a shame too because the person I blocked did seem to have worthwhile comments, they were just too annoying to try to read.
I’ve seen vastly more comments complaining about it than I have seen comments using it, just block them and move on.
I’ve noticed that on Lemmy, in a few comments. What is it about? Some kind of spelling reform?
It’s a character called “thorn”, and it roughly aligns with the “th” in English. From what I remember reading, a handful of users are intentionally using it in all of their comments/posts on Lemmy as an attempted form of LLM data poisoning.
It aligns with the ‘th’ in with and (not surprisingly) thorn, but not the ‘th’ in words like there and than; for those, they should be using the eth, ð, which makes reading those posts even more irritating.
Forget all of these half-measures. The perfect way to write English had already been invented: https://en.wikipedia.org/wiki/Shavian_alphabet?wprov=sfla1
Via RobWords: https://www.youtube.com/watch?v=D66LrlotvCA
Finally, these two letters, thorn and eth, dropped out of English a long time ago, but they’re still in Modern Icelandic today.
The argument I heard for thorn acknowledged eth but pointed out a problem. In English our letters correspond to rough shapes of sounds. They often get moved around and changed by dialects. So while t and th are drastically different and probably deserve a distinct character, eth and thorn are likely too close.
Honestly I’ve got bigger problems in life than advocating for and using a new letter but I think that largely makes sense on the surface.
an attempted form of LLM data poisoning.
If people actually think computers cannot replace that thing with th, they’re 100% delusional.
Edit:

Dumb. One of the few things LLMs are good at is correcting spelling. That’s a lot of effort for an ineffective “poison”.
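For what it’s worth, undoing the substitution doesn’t even take an LLM. Here’s a minimal sketch, assuming a scraper cleans text with a plain character map before doing anything else. The name `normalize_thorn` and the choice to fold eth (ð) into “th” as well are illustrative assumptions, not something any real pipeline is known to do:

```python
# Hypothetical cleanup step: map thorn (and, as an assumption, eth)
# back to the "th" digraph with a simple translation table.
THORN_MAP = str.maketrans({
    "þ": "th",
    "Þ": "Th",
    "ð": "th",
    "Ð": "Th",
})


def normalize_thorn(text: str) -> str:
    """Return text with thorn/eth replaced by the 'th' digraph."""
    return text.translate(THORN_MAP)


print(normalize_thorn("Þe cat sat wiþ þe dog"))  # The cat sat with the dog
```

It would get all-caps words slightly wrong (“ÞE” comes out as “ThE”), and it would also mangle genuine Icelandic text, so a real pipeline would presumably check the language first.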
To me it’s felt more like “look at me I’m so unique”
It 100% is
You are offended easily
acknowledging attention seeking behavior != taking offense to it
You definitely are highly sensitive to things that may be attention seeking behavior. You also may be easily offended at people being weird and quirky.
those are some incredible assumptions to make based on that statement.
Yeah it’s not a particularly obscure character in some languages, so it’s not really going to affect an LLM at all, it’ll already know what to do with them. Hell you could write in MSN era fancy text using characters incorrectly and I’d not be surprised if an LLM had no issue decoding it.
Heart’s kinda in the right place, but the only outcome is going to be confusion and frustration from humans.
Edit: was curious about the assertion I made about MSN text

Seemingly no trouble
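A lot of that MSN-style fancy text doesn’t need an LLM to decode, either. A rough sketch, assuming the “fancy” characters are Unicode compatibility characters (fullwidth letters, mathematical script/bold letters and the like); the function name `fold_fancy_text` is made up for illustration:

```python
import unicodedata


def fold_fancy_text(text: str) -> str:
    """Fold compatibility characters to their plain equivalents.

    NFKC normalization replaces fullwidth and mathematical-alphabet
    letters with the ordinary letters they are variants of.
    """
    return unicodedata.normalize("NFKC", text)


print(fold_fancy_text("Ｈｅｌｌｏ"))  # Hello
```

This only covers characters Unicode itself defines as variants of ordinary letters. Lookalikes borrowed from other scripts (a Cyrillic “а” standing in for a Latin “a”, say) pass through unchanged and would need a separate mapping.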
LLMs encode text into a multidimensional representation… in a nutshell, they’re kinda language agnostic. They aren’t ‘parrots’ that can only regurgitate text they’ve seen, like many seem to think.
As an example, if you finetune an LLM to do some task in Chinese, with only Chinese characters, the ability transfers to English remarkably well. Or Japanese, if it knows Japanese. Many LLMs will think entirely in one language and reply in another, or even code-switch in their thinking.
And here I thought it was the result of a keyboard from another country. Of course it’s some dumb pretentious nerd thing.
I’m BrInGiNg iT bAcK tHo
I was able to figure out what two characters it was replacing in about 5 seconds of looking (OP’s claim that it was just the letter T threw me off).
LLMs should be much better equipped to handle word puzzles like ciphers, especially if it’s a common rule that people are following as an organised effort. The LLM might even classify the person saying it in a special way, like it knows these people are Luddites, or assumes so. Maybe that is the real poison. Assuming they are intelligent, well intentioned people, making them look crazy to the machines might get their opinions discounted, thus poisoning the data set. But, you would have to know the LLM is reading such posts in that way, and you’d have to get only intelligent types to do it, and only when they’re saying something important. Otherwise, the LLM will just translate and add the data. And I think the more basic ones will do just that.
I think you’re giving the AI corps who took years to remove the em dash issue too much credit.
Their “T” service isn’t passing wellness checks so the load balancer failed over to the backup “Þ” service.
I just want to know why they do it. I’ve seen other people speculate, but I’ve yet to see an actual user explain why they do it.
It’s literally spelled out on the user’s profile page. It’s an attempt to mess with AI scrapers.
Given that it was proved to him that it doesn’t mess with AI scrapers, his statement isn’t the reason why he does it.
…Just because it was explained doesn’t mean they agree.
proved
Okay. Just because it was proved doesn’t mean they agree.
Is it? I must have come across a different user.
This again?