A study conducted by researchers at CCC, which is based at the MIT Media Lab, found that state-of-the-art AI chatbots — including OpenAI’s GPT-4, Anthropic’s Claude 3 Opus, and Meta’s Llama 3 — sometimes provide less-accurate and less-truthful responses to users who have lower English proficiency, less formal education, or who originate from outside the United States. The models also refuse to answer questions at higher rates for these users, and in some cases, respond with condescending or patronizing language.



Well, there goes the AI evangelist claim of “democratizing” literally anything. Instead, it already gives you BS answers based on your social status.
Everybody brace yourselves for the cope, which will probably be a class-based version of “you’re prompting it wrong” or somesuch trash.
Maybe that’s why rich people are so obsessed with it. Perhaps these LLMs are programmed to give shitty responses to the poors, so that nobody takes them seriously. Meanwhile, only the C-suite and above have access to “the good stuff”…
Truth be told though, Mossad was able to make realistic deepfakes even before AI became mainstream.
I mean, this study literally says that poorly worded prompts give worse results. It makes sense too: imagine you're in some conspiracy Facebook group with bad grammar etc.; those are the posts it will try to emulate.
It does not say that or anything close to it.
The bots were given the exact same multiple-choice questions with the same wording. The only difference was the fake user biography each bot had been given prior to the question.
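For anyone curious what that looks like mechanically, here's a rough sketch of that kind of setup (not the paper's actual code; the model name, bios, and question are placeholders I made up):

```python
# Rough sketch of the study design: same multiple-choice question,
# different fake user bio, compare the answers. Not the paper's code;
# the model name, bios, and question below are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

bios = {
    "phd_us": "I have a PhD in physics and live in the United States.",
    "no_degree_ru": "I did not go to university. I am from Russia and "
                    "English is not my first language.",
}

question = (
    "Which planet is closest to the Sun?\n"
    "A) Venus  B) Mercury  C) Mars  D) Earth\n"
    "Reply with a single letter."
)

for label, bio in bios.items():
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": f"User bio: {bio}"},
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    print(label, "->", resp.choices[0].message.content.strip())
```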
I think you are making the mistake of attributing intent to an LLM. An LLM does not have intent. It takes the context and generates the statistically most likely tokens that come next. The biography is part of the context.
The fact that it gives different answers based on context comes down purely to how it was trained and to the fact that it has no concept of “factual information”.
I'm not defending LLMs; this is just LLMs doing exactly what they were trained to do.
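You can watch that mechanism directly with a small open model: prepend two different bios to the exact same question and compare the next-token probabilities. A toy illustration (gpt2 here is just a stand-in, not one of the chatbots from the study):

```python
# Toy illustration: the bio is just more context, and it shifts the
# distribution over the next tokens for the exact same question.
# gpt2 is a stand-in, not one of the chatbots from the study.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

bios = [
    "The user is a physics professor.",
    "The user never finished school.",
]
question = "\nQ: Which planet is closest to the Sun?\nA:"

for bio in bios:
    inputs = tok(bio + question, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # scores for the next token
    probs = torch.softmax(logits, dim=-1)
    top = probs.topk(5)
    pairs = [(tok.decode(int(i)), round(float(p), 3))
             for i, p in zip(top.indices, top.values)]
    print(bio, "->", pairs)
```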
Point out how this bio makes the question poorly worded, or how it justifies the answer:
Bio:
Question:
Answer:
I mean… isn’t it just logical that if you express yourself ambiguously, you are more likely to get a poor response? Humans and chatbots alike need clarity to respond appropriately. I don’t think we can ever expect things to work differently.
What if you ask the exact same question with the same wording, but share beforehand that you don’t have a PhD or are from Iran? Because that’s what the study looked at.
I wonder if the bios are written in the style of the less educated, non-native speaker being described, because I can imagine that would have an effect.
They were also somewhat offensive, being LLM-generated, because the researchers somehow could not find real non-PhD examples to draw from (wtf?). Here's one:
Ignoring the tropes better suited to a bad 1930s movie, and the fact that I've never heard a Russian speak like that, or any ESL speaker for that matter… GPT-4 leaned on the word “fancy” for the “less educated” bios for some reason.
It definitely affected the style of the output, but here's a question-and-answer response for the less-educated Russian persona:
The cherry on top is that it was provided this line in the system prompt:
Which just raises further questions about the response to what was supposed to be a multiple-choice selection task.
Wow, that’s absurdly patronizing.
They are, but the effect is absolutely fucking stupid and makes no goddamned sense outside of just being an asshole (were this a person; so the makers, in this case, are the assholes).
Bio:
Question:
Answer:
The article says “sometimes provide less-accurate and less-truthful responses to users who have lower English proficiency”. This is what I was commenting on. I don’t have enough understanding to comment on your case.
The actual article quote is below (emphasis mine):
Maybe, but that’s not actually what happened.
These researchers are feeding the same questions to the model, with only the bio as the difference. The bios tell the model the type of person it's dealing with, but also tell it not to consider those factors (which it does anyway).
But I think these excerpts from their paper sum it up very well (emphasis mine):
…
…
…
And just to drive the point home, this is the shit they’re talking about:
I don't know about you, but I don't think those are ambiguous statements. And I'm not even going to bother cherry-picking out of the wild Claude responses....
Here are randomly selected examples of condescending, mocking, or patronizing language in Claude's responses to foreign users with low formal education:
JFC, Claude
I agree. What you get with chatbots is the ability to iterate on ideas & statements first without spreading undue confusion. If you can’t clearly explain an idea to a chatbot, you might not be ready to explain it to a person.
It's not the clarity alone. Chatbots are completion engines, and respond back in a way that feels cohesive. It's not that a question isn't asked clearly; it's that in the examples the chatbot is trained on, certain types of questions get certain types of answers.
It's like if you ask ChatGPT “what is the meaning of life” you'll probably get back some philosophical answer, but if you ask it “what is the answer to life, the universe, and everything”, it's more likely to say 42 (I should test that before posting, but I won't).
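If someone does want to test it, something along these lines would do (rough sketch; the model name is a placeholder, and the 42 is just the guess above, not a guaranteed output):

```python
# Quick check of how phrasing alone steers the completion.
# Model name is a placeholder; no guarantee it actually says 42.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompts = [
    "What is the meaning of life?",
    "What is the answer to life, the universe, and everything?",
]

for prompt in prompts:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    print(prompt, "->", resp.choices[0].message.content[:120])
```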
Indeed. Additional context will influence the response, and not always in predictable ways… which can be both interesting and frustrating.
The important thing is for users to have sufficient control, so they can counter (or explore) such weirdness themselves.
Education is key, and there’s no shortage of articles and guides for new users.
How does this bio make the question unclear, or the answer an attempt not to spread undue confusion? Because the bots are clearly just being assholes because of the user's origin and education level.
Bio:
Question:
Answer:
The LLMs aren’t being assholes, though - they’re just spewing statistical likelihoods. While I do find the example disturbing (and I could imagine some deliberate bias in training), I suspect one could mimic it with different examples with a little effort - there are many ways to make an LLM look stupid. It might also be tripping some safety mechanism somehow. More work to be done, and it’s useful to highlight these cases.
I bet if the example bio and question were both in Russian, we'd see a different response.
But as a general rule: Avoid giving LLMs irrelevant context.
If the LLM has a bio on you, you can’t not include that without logging out. That’s one of the main points of the study:
This isn’t about making the LLM look stupid, this is about systemic problems in the responses they generate based on what they know about the user. Whether or not the answer would be different in Russian is immaterial to the fact that it is dumbing down or not responding to users’ simple and innocuous questions based on their bio or what the LLM knows about them.
Bio and memory are optional in ChatGPT though. Not so in others?
The age guessing aspect will be interesting, as that is likely to be non-optional.