• Aceticon@lemmy.dbzer0.com · 2 hours ago

    The problem is that LLMs don’t generate “an answer” as a whole: they generate one token at a time (tokens are usually word-sized, but not always), each chosen given the context of all the tokens so far (the whole conversation), and the confidence level is attached to each individual token.

    Further, that confidence level is not about logical correctness; it’s about “how likely is this token to appear in this context”.
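
    As a rough illustration, here’s a minimal sketch using the Hugging Face transformers API (the model name “gpt2” and the prompt are just stand-ins): the probabilities you can actually pull out of the model are per generated token, and each one measures how likely that token is in context, not whether the overall statement is true.

    ```python
    # Minimal sketch (assumes the Hugging Face transformers + PyTorch libraries;
    # "gpt2" and the prompt are stand-ins). Prints the probability the model
    # assigned to each token it generated -- there is no single "answer-level"
    # confidence anywhere in this output.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("The capital of France is", return_tensors="pt")
    out = model.generate(
        **inputs,
        max_new_tokens=5,
        do_sample=False,              # greedy: always pick the most likely token
        return_dict_in_generate=True,
        output_scores=True,           # keep the logits for each generation step
    )

    new_tokens = out.sequences[0, inputs["input_ids"].shape[1]:]
    for step, token_id in enumerate(new_tokens):
        probs = torch.softmax(out.scores[step][0], dim=-1)  # one distribution per step
        print(f"{tok.decode(int(token_id))!r}: p = {probs[token_id].item():.3f}")
    ```

    Even if every one of those per-token probabilities is high, that only means the text is unsurprising given what came before, not that the claim it spells out is correct.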

    So even if you try to use token confidence, you still end up stuck on the underlying problem: the LLM’s architecture is that of a “realistic text generator”, so that confidence level is all about “what text comes next” and not at all about the logical elements conveyed by the text, such as questions and answers.