• mcv@lemmy.zip · 3 days ago

    But if that’s how you’re going to run it, why not also train it in that mode?

    • Xylight@lemdro.id · 2 days ago

      That is a thing, and it's called quantization-aware training (QAT). Some open-weight models like Gemma do it.

      The problem is that you need to retrain the whole model for it, and if you also want a full-quality version, you have to train a lot more on top of that.

      A QAT model is still less precise, so it'll still be worse than full precision, but it does shrink the gap.
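      The core trick behind QAT can be sketched in a few lines: during training, the forward pass uses weights rounded to a low-precision grid ("fake quantization"), so the model learns to tolerate the rounding error it will see after deployment. This is a minimal illustration in plain Python; the function name and values are made up, not from any specific framework.

```python
def fake_quantize(w, num_bits=8):
    """Round each value in w to a symmetric num_bits integer grid,
    then map it back to float (a round-trip through low precision)."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for int8
    scale = max(abs(x) for x in w) / qmax or 1.0  # avoid div-by-zero for all-zero w
    return [round(x / scale) * scale for x in w]

weights = [0.731, -0.204, 0.055, -0.962]
quantized = fake_quantize(weights, num_bits=4)

# In real QAT the forward pass uses `quantized`, while the gradient
# update is applied to the full-precision `weights` (a straight-through
# estimator), so the stored weights keep adapting to the rounding.
for w, q in zip(weights, quantized):
    print(f"{w:+.3f} -> {q:+.3f}")
```

      At 4 bits the grid is coarse and the rounding error is visible; at 8 bits it is much smaller, which is why the quality loss shrinks but never fully disappears.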