Lvxferre [he/him]

I have two chimps within, Laziness and Hyperactivity. They smoke cigs, drink yerba, fling shit at each other, and devour the face of anyone who gets close to either.

They also devour my dreams.

  • 0 Posts
  • 1.2K Comments
Joined 2 years ago
Cake day: January 12th, 2024


  • To be fair here’s how cats would be reconstructed if they went extinct and we had to rely on fossils:

    …nah, screw that, the lady in my pic is still hella charming, the one in the OP is an abomination!

    Translated from Spanish

    And they didn’t even make some joke about how dumb (burro) it looks! hglksflksdlllksdf


  • [Replying to myself as this is a tangent]

    I think the “bots can generate misinfo even if you just feed them correct info” point deserves its own example.

    Let’s say you’re making a model. It looks at the preceding word, and tries to predict the next. And you feed it the following sentences, both true:

    1. Humans are apes.
    2. Cats are felines.

    From both, the bot “learnt” five words. It also learnt how to connect them; for example “are” can be followed by either “apes” or “felines”, both having the same weight. Then, as you ask the bot to generate sentences, it generates the following:

    3. Humans are felines.
    4. Cats are apes.

    And you got bullshit!

    What large models do is a way more complex version of the above, looking at way more than just the immediately preceding word, but it’s still the same in spirit.
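
    A minimal sketch of that toy model in Python, in case it helps. The function names (`train_bigrams`, `all_sentences`, `generate`) are made up for illustration, and it assumes the training data never loops back on itself; it is not how any real large model is implemented.

```python
import random
from collections import defaultdict


def train_bigrams(sentences):
    """Map each word to the list of words seen right after it."""
    followers = defaultdict(list)
    for sentence in sentences:
        words = sentence.rstrip(".").split()
        for prev, nxt in zip(words, words[1:]):
            followers[prev].append(nxt)
    return followers


def all_sentences(followers, start):
    """List every word sequence the model can produce from `start`.

    Assumes no cycles in the data, otherwise this would recurse forever.
    """
    if start not in followers:
        return [[start]]
    results = []
    for nxt in followers[start]:
        for tail in all_sentences(followers, nxt):
            results.append([start] + tail)
    return results


def generate(followers, start, rng=random):
    """Pick one continuation at random; every follower has the same weight."""
    words = [start]
    while words[-1] in followers:
        words.append(rng.choice(followers[words[-1]]))
    return " ".join(words) + "."


# Both training sentences are true.
corpus = ["Humans are apes.", "Cats are felines."]
model = train_bigrams(corpus)

# Everything the model considers a valid sentence.
possible = sorted(
    " ".join(words) + "."
    for start in ("Humans", "Cats")
    for words in all_sentences(model, start)
)
for sentence in possible:
    print(sentence)
```

    The model only knows that “are” can be followed by “apes” or “felines”; it has no idea which subject each one belongs to. So two of the four sentences it can produce are false, even though nothing false went in.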


  • I’m failing to see how this is different from making up a fact and then spreading it to news outlets.

    They uploaded the papers to a single preprint server. That’s important.

    Preprints are papers predating any sort of peer review; as such, there’s a lot of junk mixed in. That’s no big deal if you know the field, but a preprint server is certainly not a source of reliable information, nor should it be treated as such. News outlets, on the other hand, are expected to provide you with reliable information, curated and researched by journalists.

    And peer review is a big fucking deal in science, because it’s what sorts all that junk out. Only muppets who don’t fucking care about misinformation would send bots to crawl preprints and feed the resulting data into a large model, or use the potential misinfo from the bot as if it were reliable. (Those two sets of muppets are the ones violating ethical and moral principles, by the way.)

    So no, your comparison is not even remotely accurate. What they did is more like writing bullshit on a piece of paper, gluing it to a random phone pole, and checking if someone would repeat that bullshit.

    They also went to the trouble of making sure that no reasonably literate human being would ever confuse that thing with an actual scientific paper. As the text says:

    • naming an eye condition as bixonimania
    • “this entire paper is made up”
    • “Fifty made-up individuals aged between 20 and 50 years were recruited for the exposure group”
    • “Professor Maria Bohm at The Starfleet Academy for her kindness and generosity in contributing with her knowledge and her lab onboard the USS Enterprise”
    • “the Professor Sideshow Bob Foundation for its work in advanced trickery. This works is a part of a larger funding initiative from the University of Fellowship of the Ring and the Galactic Triad”

    Feeding false information to an LLM is no different that a magazine. It only regurgitates what’s been said.

    Yes, it is different. Because the large token model won’t simply “repeat” things, it’ll mix and match them and form all sorts of bullshit, even if you didn’t feed it with any bullshit.

    Here’s an example of that, fresh from the oven. I don’t reasonably expect people to be feeding misinfo regarding Latin pronunciation into bots, and yet a lot of this table is nonsense:

    Compare the table above with this table and this one and you’ll notice the obvious errors:

    • short /e i o u/ being phonetically transcribed as [e i o u] instead of [ɛ ɪ ɔ ʊ]. That’s as silly as confusing English “bit” and “beet”.
    • macron (not “mācron”, it’s being used in an English sentence) does NOT mark “accusative or ablative”. It marks long vowels, period.
    • “nōs” being transcribed with a short vowel, even if the bloody bot put the macron over the spelled form.
    • “nostr(um)”? No dammit, it’s “nostrī” or “nostrum”. The bot is implying some “nostr” form that simply doesn’t exist, this shit isn’t even allowed by Latin phonotactics.
    • plus more, if I make an exhaustive list of this shite I won’t be ending it this week.

    All it had to do was copy info from Wiktionary, which even includes phonetic and phonemic info. But since the bot is not just “regurgitating” info (it’s basically predicting what should come next, with no regard for truth value), it’s mixing-and-matching shit into nonsense.

    It isn’t going to suddenly start doing science on its own to determine if what you’ve said is true or not.

    If you actually read the bloody article instead of assuming, you’d know why the researchers did this: they don’t expect the bot to do science on its own, they expect people to treat info from those bots as potentially incorrect.

    Its job is to tell you what color the sky is based on what you told it the color of the sky was.

    And your job is to not trust it if it tells you “Yes, you are completely right! The colour of the sky is always purple. Do you need further information on other naturally purple things?”




  • The mystery really is harder to solve than it looks at first sight.

    It’s usually like this with swear words; the etymology is always a mess. They’re used constantly, so the meaning evolves really fast, but there’s almost no written record, since people avoid writing them down.

    Just to give you an example. One of the swear words with the best-studied etymology is Latin “merda”. We know it was inherited from Proto-Indo-European, and that Latin speakers used it all the time, since every single Romance language inherited the merda. Even so, we barely know in which situations Latin speakers used the word, because it was almost never written; only in a few epigrams by Martial and some graffiti in Pompeii. (inb4 yes, it’s the same “merda” as in Portuguese.)

    It’s the same thing with these insults. People avoid recording them. And that way we lose their history.



  • Do you mind if I reply in Portuguese? So, to cut a long story short: I’m almost sure the slur (viado) comes from the name of the animal (veado, “deer”). Reasons:

    1. In Portuguese it’s common to raise [e o] to [i u] right before the stressed syllable; especially in hiatus, which becomes a diphthong, with the [i u] becoming [j w]. (The technical name for this is “alçamento pré-tônico”, pretonic raising, in case you want to look up papers on the subject.)
    2. Swear words are often written with a more popular, non-standard spelling that represents the pronunciation. There are other examples of this, like boceta→buceta, foder→fuder, even caralho→caraio (and mind you, [ʎ] “lh” → [j] “i” is quite restricted dialectally).
    3. There are other expressions used to attack the gay community by associating them with prancing animals, like “gazela”, “biba saltitante”, etc. There’s also “bambi”, but that one is clearly derived from “viado”.

  • I think it also applies to expletives. Check for example ⟨vagabunda⟩* /va.ga.'bũ.da/; if there was some pressure to keep the stressed syllable it would be clipped into *bunda or *gabunda, but it’s usually clipped into ⟨vagaba⟩ instead. Technically the /b/ from the stressed syllable is still there, but the core /ũ/ ⟨un⟩ is gone.

    *gotta explain this one to the folks here. “Vagabunda” means whore, promiscuous woman, etc. It’s highly offensive, way more than the nearest English equivalent (slut), it’s the sort of word to not use even in a joke. (The masculine “vagabundo” is depreciative but socially acceptable — it means lazy arse, do-nothing.)


  • 100% this.

    In particular, this “flexibility” shows up a lot in unstressed vowels; they vary a lot depending on dialect and speech rhythm. And unlike variation in consonants, people don’t pay much attention to them.

    I’m fairly sure what happened with “viado” in PT was just like “nigga” in English. In both you get a non-standard spelling of another word (“veado” and “nigger”), representing a popular pronunciation of that word (note African American English is non-rhotic, so ⟨er⟩ and ⟨a⟩ would both sound like /ə/). But the two spellings still sound the same in those popular varieties.

    Worse, I don’t think the other guy over there even speaks Portuguese. At least, not proficiently. Did you notice how he confused “esse” with “isso”?


  • For that pair of words (ES año vs. PT ano) this works, but note the correspondence gets really messy, it depends on the etymology of the word. A quick run-down would be:

    | Origin | Spanish | Portuguese | Example |
    |---|---|---|---|
    | Late Latin */nj/ | /ɲ/ ⟨ñ⟩ | /ɲ/ ⟨nh⟩ | Latin balneum → baneum → *banjʊ̃ → ES baño, PT banho “bath” |
    | Latin /gn/ [ŋn] | /ɲ/ ⟨ñ⟩ | /ɲ/ ⟨nh⟩ | can’t recall an example both kept, but Latin agnum → PT anho /ɲ/ “lamb” (archaic) |
    | Latin /n:/ | /ɲ/ ⟨ñ⟩ | /n/ ⟨n⟩ | Latin annum → ES año, PT ano “year” |

    Then for Latin intervocalic /n/ Spanish simply keeps it. Portuguese initially converts it into vowel nasalisation, but then changes it further on, it’s a bit messy:

    • corōnam /n/ → ES corona /n/, PT corõa /Ṽ/→coroa Ø “crown”
    • pīnum /n/ → ES pino /n/, PT pĩo /Ṽ/→pinho /ɲ/ “pine”
    • manum /n/ → ES mano /n/, PT mão /Ṽ/ “hand”

    For ES “ano” (anus) and PT “ânus” (anus) this doesn’t work, though. Portuguese didn’t inherit the word, but reborrowed it. And perhaps to avoid making it sound like “ano” (year), it kept the Latin nominative ending. (If the word had been inherited it would end as *ão or something like this.)


  • It does have a tilde but it’s mostly used over vowels, to represent nasalisation; e.g.

    • ⟨mão⟩ /mãw/ [mɜ̃ʊ̯̃] “hand” vs. ⟨mau⟩ /maw/ [mäʊ̯] “bad”
    • ⟨mãe⟩ /mãj/ [mɜ̃ɪ̯̃] “mother”
    • ⟨limões⟩ /li'mõjs/ [li.'mõɪ̯̃s] “lemons”
    • ⟨vã⟩ /vã/ [vɜ̃] “vain” (F)

    For /ɲ/ (the phoneme written “ñ” in Spanish) it’s as you said, though: it’s spelled “nh” instead.


  • This suggests widespread homophobia if enough of them could combine their brainpower to form these few thoughts

    Yup, that’s accurate. Welcome to Latin America and its macho culture. People don’t even get why those jokes are bad. Then when the LGBTQ+ community correctly points out that “a piada mata mais do que a bala” (the joke kills more often than the bullet), the default popular reaction is to claim “waaah they’re overreacting” (spoilers: they aren’t).


  • Viado comes from desviado, which means someone who was driven off the proper path. It’s just a matter of homophony (and homophobia).

    I’ve seen people tracing the etymology back to desviado and transviado. I don’t buy it because clipping (truncamento) in Portuguese usually preserves the start of the word, even at the expense of the stressed syllable; e.g.

    • universidade “university” → uni
    • refrigerante “fizz, soda, coke, pop” → refri
    • depressivo “depressed” → deprê

    So following the same pattern for “desviado” the result would be *des or *desvi, not “viado”.






  • Paulo Coelho is one of those authors that remind me how huge the impact of a good translator is.

    I read three of his books: Veronika Decide Morrer (Veronika Decides to Die), O Alquimista (The Alchemist), and Onze Minutos (Eleven Minutes). All in the original, in Portuguese. They weren’t as bad as people say, but they all felt lacking in polish and substance.

    Then I checked Margaret Jull Costa’s translation of Veronika, and it’s like she breathed life into it. It’s all in the subtle things: replacing a metaphor with another that works better, removing indirection from a more emotional moment. This kind of thing does wonders to make a book feel more alive, while still being faithful to the original.

    (Another case that reminds me of this impact is Interview with the Vampire. Anne Rice’s original is… okay? Kind of meh, to be honest. Clarice Lispector’s translation into Portuguese is a gem, though.)