

The segmented caching request thing is… weird. I worked for a company that developed a caching proxy, and it very much did not work that way. Random access in a caching system is usually kinda bad, and you should try to avoid it. Our proxy controlled the disk directly (it wasn’t a mounted filesystem) so it could constantly sweep the head across the disk and queue up reads and writes optimally. That gets much harder when everything is fragmented as fuck.
If the concern was about what would happen with multiple connections for the same cache miss, then the caching proxy should just combine the client-side connections into a single upstream one. And if that upstream connection gets terminated partway through, you can still cache the first part of the response and then resume the fetch from that point.
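
To make that concrete, here's a rough sketch of the two ideas in Python: collapsing concurrent misses for the same key into one upstream fetch, and resuming a dropped fetch from the bytes already received. This is a toy under my own assumptions, not how our proxy (or any particular proxy) was actually implemented. `CoalescingCache`, `fetch_with_resume`, `fetch_upstream`, and `read_from` are all made-up names for illustration, and a real proxy would stream to clients and write to disk instead of buffering everything in memory.

```python
import threading


class CoalescingCache:
    """Toy cache: concurrent misses for one key share a single upstream fetch."""

    def __init__(self, fetch_upstream):
        self._fetch = fetch_upstream   # hypothetical callable: key -> bytes
        self._lock = threading.Lock()
        self._cache = {}               # key -> cached body
        self._inflight = {}            # key -> Event, set when the fetch ends

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            done = self._inflight.get(key)
            leader = done is None
            if leader:
                # First miss for this key: this caller does the upstream fetch.
                done = threading.Event()
                self._inflight[key] = done

        if leader:
            try:
                body = self._fetch(key)
                with self._lock:
                    self._cache[key] = body
                return body
            finally:
                with self._lock:
                    del self._inflight[key]
                done.set()  # wake every caller waiting on this key

        # Follower: wait for the leader instead of opening another connection.
        done.wait()
        with self._lock:
            if key in self._cache:
                return self._cache[key]
        # The leader's fetch failed, so try again (possibly as the new leader).
        return self.get(key)


def fetch_with_resume(read_from, total_len, max_attempts=5):
    """Keep the bytes received so far and restart from that offset on failure.

    read_from(offset) is a hypothetical callable that yields chunks starting
    at `offset` (think of an HTTP Range request) and may raise ConnectionError.
    """
    buf = bytearray()
    for _ in range(max_attempts):
        try:
            for chunk in read_from(len(buf)):
                buf.extend(chunk)
        except ConnectionError:
            continue  # keep the partial data, resume from len(buf)
        if len(buf) >= total_len:
            return bytes(buf)
    raise ConnectionError("upstream fetch did not complete")
```

The point of the design is that clients never see the difference: eight clients asking for the same missing object cause one upstream request, and a dropped upstream connection costs you only the bytes you hadn't received yet.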




Building a jet doesn’t require over a trillion dollars of capex, and selling jets is profitable. There’s solid evidence that inference isn’t profitable, and the AI labs need inference to be extremely profitable if they’re going to meet their absolutely ludicrous contractual performance targets. Oracle is expecting hundreds of billions of dollars from OpenAI by like 2030. That shit is not happening.