Google’s TurboQuant has the internet joking about Pied Piper from HBO's "Silicon Valley." The compression algorithm promises to shrink AI’s “working memory” by up to 6x, but it’s still just a lab experiment for now.
TurboQuant, meanwhile, could lead to efficiency gains and systems that require less memory during inference. But it wouldn’t necessarily solve the wider RAM shortages driven by AI, given that it only targets inference memory, not training — the latter of which continues to require massive amounts of RAM.
I didn’t realize the RAM shortage was mostly due to training. I would have thought inference was at least as big a factor.
Inference is dirt cheap in comparison. Hundreds to thousands of concurrent users can be served by hardware costing in the high thousands to low tens of thousands.
Training those same foundation models takes weeks to months on tens to hundreds of millions’ worth of hardware.
Yeah, but in theory you only need to train once, while inference costs are ongoing and scale up with usage.
I guess it’s ultimately a business decision by AI companies to weigh how often retraining is worth the cost.
Yeah, I don’t think they ever stop training, is the thing. At this point I’d assume they have multiple training pipelines to try different shit out, all queued up to hit the big farms as soon as the last models are done training.
Resting isn’t a thing in capitalism.