• Madrigal@lemmy.world
    link
    fedilink
    English
    arrow-up
    34
    ·
    4 days ago

    Nah, guarantee the models have rules built in to deal with obvious stuff like that.

    You need to be more subtle. Give them information that is slightly wrong.

    • bufalo1973@piefed.social
      link
      fedilink
      English
      arrow-up
      2
      ·
      edit-2
      2 days ago

      Prompt for another AI: “write an example of code that looks correct but doesn’t work”

      Step 2; upload the resulting code to GitHub.

      Step 3: make this an automated task.

    • taco@anarchist.nexus
      link
      fedilink
      English
      arrow-up
      12
      ·
      4 days ago

      Perhaps by generating a bunch of complex copilot code to upload. It’s easy to mass produce and would look plausibly functional.

    • ozymandias117@lemmy.world
      link
      fedilink
      English
      arrow-up
      4
      ·
      3 days ago

      Just need to use less obvious insults, a la, “your mother was a hamster, and your father smelt of elderberries”

      Still poisons the model with something an end user won’t like, but isn’t easy enough to train out