• merc@sh.itjust.works
    1 day ago

    The system prompt discovered in the leak explicitly warns the model: “You are operating UNDERCOVER… Your commit messages… MUST NOT contain ANY Anthropic-internal information. Do not blow your cover.”

    This is so incredibly stupid.

    You’ve tried security.

    You’ve tried security through obscurity.

    Now try security through telling an LLM, via a system prompt, not to blow its cover.