Everyone says AI needs more GPUs. I profiled one and it was sitting idle most of the time, just waiting on data. How much of the “GPU shortage” is actually wasted GPUs?
we keep hearing the bottleneck for AI is compute, that there aren't enough GPUs, that everyone's fighting for H100s and B200s. so I went and actually measured what one of ours was doing during a training job.

it was idle most of the time. not slow. idle. doing a quick burst of work, then sitting there waiting for the next batch of data to arrive, over and over. the expensive part (the GPU) spent most of its life waiting on the cheap part (moving data to it). green is the GPU doing work, orange is it sitting idle.

that reframed the whole "GPU shortage" thing for me. a huge amount of the compute the industry is scrambling to buy is already sitting there underused, not because the chips are slow, but because the data can't reach them fast enough. you can buy ten times the GPUs and still...
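for anyone who wants to put a number on "idle most of the time", here's a minimal sketch of the arithmetic. it assumes you already have per-step timings from some profiler, split into time spent waiting for the next batch and time spent computing. the `StepTiming` type, the `idle_fraction` helper, and the numbers are all made up for illustration; they are not the measurements from the job above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StepTiming:
    """Timings for one training step (hypothetical structure)."""
    data_wait_s: float  # seconds blocked waiting for the next batch
    compute_s: float    # seconds the GPU spent working on that batch


def idle_fraction(steps: Iterable[StepTiming]) -> float:
    """Share of total step time spent waiting on data instead of computing."""
    steps = list(steps)
    waiting = sum(s.data_wait_s for s in steps)
    busy = sum(s.compute_s for s in steps)
    total = waiting + busy
    if total == 0:
        raise ValueError("no time recorded")
    return waiting / total


if __name__ == "__main__":
    # Illustrative numbers only: each step waits 3x longer than it computes.
    steps = [StepTiming(data_wait_s=0.30, compute_s=0.10) for _ in range(100)]
    print(f"idle: {idle_fraction(steps):.0%}")
```

one caveat: GPU work is usually launched asynchronously, so naive wall-clock timers around a training step can put the time in the wrong bucket. the timings fed into something like this should come from a real profiler trace.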









