A screenshot of this question was making the rounds last week, but this article covers testing it against all the well-known models out there.

It also includes outtakes from the ‘reasoning’ models.

  • BanMe@lemmy.world
    12 hours ago

In school we were taught to look for hidden meaning in word problems - Chekhov’s gun, basically. Why is that sentence there? Because the questions would try to trick you. So humans have to be instructed, again and again, through demonstration and practice, to evaluate all the sentences and learn what to filter out and what to keep - to not only form a response, but to expect tricks.

    If you pre-prompt an AI to expect such trickery and consider all sentences before removing unnecessary information, does it have any influence?

    Normally I’d ask “why are we comparing AI to the human mind when they’re not the same thing at all,” but I feel like we’re presupposing they are similar already with this test, so I’m curious about the answer to this one.

    • bluesheep@sh.itjust.works
      5 hours ago

      Normally I’d ask “why are we comparing AI to the human mind when they’re not the same thing at all,” but I feel like we’re presupposing they are similar already with this test, so I’m curious about the answer to this one.

      I would guess it’s because a lot of AI users see their choice of AI as an all-knowing, human-like thinking tool. In which case it’s not a weird test question, even if the assumption that it “thinks” is wrong.

    • punkibas@lemmy.zip
      6 hours ago

      At the end of the article they discuss how to overcome this problem for LLMs by doing something akin to what you described.