InfiniBand for AI Clusters: Why Ethernet Isn't Enough (Yet)
Every major AI training cluster runs InfiniBand. The $5 trillion question is whether Ultra Ethernet will change that. The honest answer: not yet, and here's exactly where the gap is.
A 400G coherent port draws 16–20W. At 64 ports per chassis, that is roughly 1.0–1.3kW just for optics. DSP power is a real constraint in dense deployments, and most capacity plans ignore it.
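The per-chassis figure follows directly from the per-port range. A minimal sketch of that arithmetic, assuming the 16–20W per-port draw and the 64-port chassis quoted above (the function name and parameters are hypothetical, chosen only for illustration):

```python
# Hypothetical helper: the name and parameters are illustrative, not from any tool.
def optics_power_watts(ports: int, low_w_per_port: float, high_w_per_port: float) -> tuple[float, float]:
    """Return the (low, high) total optics power in watts for one chassis."""
    return ports * low_w_per_port, ports * high_w_per_port


# Assumed inputs: 64 ports, 16-20 W per 400G coherent port (figures from the text).
low_w, high_w = optics_power_watts(ports=64, low_w_per_port=16.0, high_w_per_port=20.0)
print(f"Optics power per chassis: {low_w / 1000:.2f}-{high_w / 1000:.2f} kW")
```

Swapping in a different port count or module power class shows how quickly the optics line item grows in a capacity plan.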
The ZR vs ZR+ debate dominates every WAN discussion. It's the wrong frame. The real question is whether you understand your link budget well enough to buy either.
Most modern high-speed pluggable optical modules contain a DSP that re-clocks and reshapes the electrical signal. That DSP can consume a significant part of the module power budget. For
The design is complete. Procurement comes back: 400ZR+ optics are not available for four to five months. That is not a supply chain problem. It is a design assumption problem.
A $350 optic turned into an $18,000 problem. Not because the optic failed — because nobody cleaned the connector.