• empireOfLove2@lemmy.dbzer0.com
    1 day ago

    There’s no way that I know of to see per-prompt usage for commercially available models; they obviously hide that. I admit I don’t research them much, but I’m assuming each chip processes prompts one at a time.

    It’s pretty simple arithmetic: if the model is running exclusively on a single-GPU system, and a prompt takes X seconds to generate on said GPU, then you take the GPU’s power draw over those X seconds, plus whatever fraction of the datacenter overhead power that GPU accounts for. For locally run models on your own hardware this is also trivial to calculate.
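    That arithmetic can be sketched in a few lines. All the numbers below are made-up placeholders, not measurements; the datacenter overhead is folded in via an assumed PUE (power usage effectiveness) multiplier:

    ```python
    # Rough per-prompt energy estimate. Illustrative numbers only.
    GPU_POWER_W = 700        # hypothetical GPU power draw at full load, watts
    GENERATION_TIME_S = 5    # X: seconds the GPU spends on one prompt
    PUE = 1.3                # assumed datacenter overhead multiplier

    # Energy = power * time, scaled by overhead, converted joules -> watt-hours.
    energy_wh = GPU_POWER_W * GENERATION_TIME_S * PUE / 3600
    print(f"{energy_wh:.3f} Wh per prompt")
    ```

    Swap in your own GPU’s power draw and measured generation time for the local-model case, where PUE is just 1.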

    Alternatively, GPUs generate a certain number of “tokens” per second, and each prompt and its response are a certain number of tokens, generally scaling with the length of the prompt.
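    The token-based version of the same estimate looks like this; again, the throughput and token counts are hypothetical, and generation time is approximated from output tokens alone, since producing the response dominates the time spent:

    ```python
    # Token-throughput energy estimate. Hypothetical numbers only.
    TOKENS_PER_SECOND = 50   # assumed generation speed for this GPU/model pair
    GPU_POWER_W = 700        # hypothetical GPU power draw, watts
    RESPONSE_TOKENS = 300    # output tokens generated for one prompt

    # Time from throughput, then energy = power * time, in watt-hours.
    time_s = RESPONSE_TOKENS / TOKENS_PER_SECOND
    energy_wh = GPU_POWER_W * time_s / 3600
    print(f"{energy_wh:.2f} Wh per prompt")
    ```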

    • lime!@feddit.nu
      15 hours ago

      OpenAI actually released some figures on power use per prompt, but the caveat is that a single prompt to their services can trigger multiple model responses (the “thinking” mode), so the numbers aren’t consistent.