• Tangent5280@lemmy.world
    14 hours ago

    Sure, but without actually knowing what hardware the servers are running, what software, and what their service backend looks like, we can’t say whether it is going to be higher or lower.

    • Michal@programming.dev
      12 hours ago

      I think we can assume it’s an Nvidia H200, which peaks at 700 W from what I saw on Google. Multiply that by the turnaround time from your prompt to the full response and you have a ceiling value. There’s probably some queueing and other delays, so in reality the time the GPU spends on your query will be much less. If you use the API, it may include timing information.
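
      The ceiling estimate above is just power times wall-clock time. A minimal sketch, assuming a single H200 at its ~700 W peak and a hypothetical 12-second turnaround (both numbers are example assumptions, not measurements):

      ```python
      # Rough upper bound on energy per query: assume one NVIDIA H200
      # draws its ~700 W peak for the entire prompt-to-response turnaround.
      # Both values below are illustrative assumptions, not measurements.

      H200_PEAK_WATTS = 700       # approximate peak board power of an H200
      turnaround_seconds = 12     # hypothetical prompt-to-full-response time

      energy_joules = H200_PEAK_WATTS * turnaround_seconds
      energy_wh = energy_joules / 3600  # joules -> watt-hours

      print(f"Ceiling: {energy_joules} J ({energy_wh:.2f} Wh)")
      ```

      Since queueing and network delays inflate the turnaround time, the real GPU-seconds spent on the query, and hence the energy, will be below this figure.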