Is there currently an accurate way to say how much power per prompt LLMs use?

SnausagesinaBlanket@lemmy.world · 1 day ago

Is there currently an accurate way to say how much power per prompt LLMs use?

empireOfLove2@lemmy.dbzer0.com · 1 day ago

There’s no way that I know of to see the per-prompt usage for commercially available models. They obviously hide that. I admit I don’t research them much, but I’m assuming each chip processes prompts one at a time.

It’s pretty simple arithmetic: if it’s running exclusively on a single-GPU system, and a prompt takes X seconds to generate on said GPU, then you take the GPU’s power draw over those X seconds, plus whatever fraction of the datacenter’s overhead power is attributable to that GPU. For locally run models on your own hardware this is also trivial to calculate.
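A rough sketch of that arithmetic in Python, assuming one prompt at a time on one GPU. The function name and all the numbers (300 W GPU draw, 60 W overhead share, 12 seconds) are made up for illustration, not measurements of any real system:

```python
def prompt_energy_wh(gpu_watts: float, overhead_watts: float, seconds: float) -> float:
    """Estimate the energy of one prompt in watt-hours.

    gpu_watts      -- average GPU power draw while generating (assumed known)
    overhead_watts -- share of datacenter overhead attributed to this GPU (assumed)
    seconds        -- wall-clock time the prompt occupied the GPU
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    joules = (gpu_watts + overhead_watts) * seconds  # watts * seconds = joules
    return joules / 3600.0                           # 3600 joules per watt-hour


# Hypothetical example values, not real measurements.
estimate = prompt_energy_wh(gpu_watts=300.0, overhead_watts=60.0, seconds=12.0)
print(f"{estimate:.2f} Wh")
```

On your own hardware you can replace the assumed wattage with a reading from your GPU's monitoring tool and the assumed duration with a timer around the generation call.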

Alternatively, GPUs run at a certain number of “tokens” per second, and each prompt is a certain number of tokens fed into the model, generally scaling with the length of the prompt.
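If you only know the throughput, the token route can be sketched like this. Again the names and figures (500 tokens, 50 tokens per second, 360 W total draw) are hypothetical, and it assumes the GPU handles one prompt at a time at a steady rate:

```python
def seconds_from_tokens(total_tokens: int, tokens_per_second: float) -> float:
    """Time a prompt occupies the GPU, given an assumed steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return total_tokens / tokens_per_second


def energy_wh_from_tokens(total_tokens: int, tokens_per_second: float, watts: float) -> float:
    """Watt-hours for one prompt: time from tokens, then power * time."""
    seconds = seconds_from_tokens(total_tokens, tokens_per_second)
    return watts * seconds / 3600.0  # joules to watt-hours


# Hypothetical example values, not real measurements.
print(seconds_from_tokens(500, 50.0))           # time in seconds
print(energy_wh_from_tokens(500, 50.0, 360.0))  # energy in Wh
```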

lime!@feddit.nu · 15 hours ago

OpenAI actually released some figures on power use per prompt, but the caveat is that a single prompt to their services can trigger multiple responses (the “thinking” mode), so the per-prompt numbers aren’t consistent.