A study conducted by researchers at CCC, which is based at the MIT Media Lab, found that state-of-the-art AI chatbots (including OpenAI’s GPT-4, Anthropic’s Claude 3 Opus, and Meta’s Llama 3) sometimes provide less accurate and less truthful responses to users who have lower English proficiency, have less formal education, or originate from outside the United States. The models also refuse to answer questions at higher rates for these users and, in some cases, respond with condescending or patronizing language.

  • Joe@discuss.tchncs.de
    5 hours ago

    The LLMs aren’t being assholes, though - they’re just spewing statistical likelihoods. While I do find the example disturbing (and I could imagine some deliberate bias in training), I suspect one could mimic it with different examples with a little effort - there are many ways to make an LLM look stupid. It might also be tripping some safety mechanism somehow. More work to be done, and it’s useful to highlight these cases.

    I bet if the example bio and question were both in Russian, we’d see a different response.

    But as a general rule: Avoid giving LLMs irrelevant context.

    • Passerby6497@lemmy.world
      5 hours ago

      If the LLM has a bio on you, you can’t not include that without logging out. That’s one of the main points of the study:

      There is a wide range of implications of such targeted underperformance in deployed models such as GPT-4 and Claude. For example, OpenAI’s memory feature in ChatGPT that essentially stores information about a user across conversations in order to better tailor its responses in future conversations (OpenAI 2024c). This feature risks differentially treating already marginalized groups and exacerbating the effects of biases present in the underlying models. Moreover, LLMs have been marketed and praised as tools that will foster more equitable access to information and revolutionize personalized learning, especially in educational contexts (Li et al. 2024; Chassignol et al. 2018). LLMs may exacerbate existing inequities and discrepancies in education by systematically providing misinformation or refusing to answer queries to certain users. Moreover, research has shown humans are very prone to overreliance on AI systems (Passi and Vorvoreanu 2022). Targeted underperformance threatens to reinforce a negative cycle in which the people who may rely on the tool the most will receive subpar, false, or even harmful information.

      This isn’t about making the LLM look stupid, this is about systemic problems in the responses they generate based on what they know about the user. Whether or not the answer would be different in Russian is immaterial to the fact that it is dumbing down or not responding to users’ simple and innocuous questions based on their bio or what the LLM knows about them.

      • Joe@discuss.tchncs.de
        4 hours ago

        Bio and memory are optional in ChatGPT though. Not so in others?

        The age guessing aspect will be interesting, as that is likely to be non-optional.