What does "stop being babies about THz computing" entail?
[this is what they're referencing]
Thomas Sterling has two main bugbears: one is his own baby, the Beowulf cluster, and the other is the von Neumann architecture.
On the easier side, there's the von Neumann architecture, which is a way of thinking about how a computer accesses programs and memory, and it basically hasn't been true for a decade now. In theory, a computer has one memory which is used for both instructions and data, and it works through the instructions one at a time. In real life, modern computers are magic plinko machines of data where things are computed out of order and backwards wherever it might provide a tiny speedup, but before the programmer sees any of this, tiny bits of bookkeeping hardware inside the chip grab all the results and reshuffle them back into program order, so that the external interface to how we program computers hasn't really changed since the days of the PDP-11.
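To make the plinko-machine bit concrete, here's a toy sketch of the "finish in any order, show the programmer program order" trick. It is not how any real chip is built: it assumes every instruction is independent and starts at cycle 0, and `run_out_of_order` and its `latencies` argument are names I made up for illustration.

```python
def run_out_of_order(latencies):
    """Toy model: latencies[i] is how many cycles instruction i takes.

    Assumes all instructions are independent and issued at cycle 0.
    Returns (finish_order, commit_order) as lists of instruction indices.
    """
    # Execution units finish whichever instruction is quickest first
    # (ties broken by program order).
    finish_order = sorted(range(len(latencies)), key=lambda i: (latencies[i], i))

    # The reorder step: a finished result is only made visible once
    # every older instruction has been made visible too.
    done = set()
    commit_order = []
    next_to_commit = 0
    for i in finish_order:
        done.add(i)
        while next_to_commit in done:
            commit_order.append(next_to_commit)
            next_to_commit += 1
    return finish_order, commit_order


finish, commit = run_out_of_order([3, 1, 2, 1])
print(finish)  # [1, 3, 2, 0]  what the hardware actually did
print(commit)  # [0, 1, 2, 3]  what the programmer gets to see
```

The slow first instruction holds everything behind it hostage until it finishes, which is the price of keeping the old one-at-a-time illusion.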
Sterling thinks that if we were willing to totally redesign computers to allow simultaneous manipulation of data and code, move main memory into the processor, integrate the specialty features modern processors use internally for speedup, and add dedicated specialty instructions for high performance single-chip compute, we could dramatically improve performance and chip speed by reducing memory wait time and improving memory bandwidth. Redesigning the entire chip architecture to take advantage of modern compute techniques would allow chips to run much, much faster, but it requires you to stop being a baby about it.
Far more wacky, though, is his idea to redesign the way HPC systems work. I'm going to reference this older talk a bit because I can't find a free access copy of his newer shit. Way back before the '90s, supercomputers existed as hyper-specialized custom-built deals by Cray or IBM that cost millions of dollars. Then Thomas Sterling comes along and invents the Beowulf cluster. In this, you just buy a bunch of cheap off-the-shelf PCs, network them together with high end consumer network gear, and write some very clever job allocation and parallelization code to break up jobs across the cluster. Bam, you now have something that is as fast as the supercomputers that aerospace engineering and oil and gas companies were using around the same time, for less than a tenth of the price.
Everyone went, predictably, apeshit over this. They replaced their million-dollar Cray systems with a thousand eMachines crammed into a closet somewhere, and this kicked off the modern field of high performance computing, which is defined by this style of cluster networking.
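The "break up jobs across the cluster" part boils down to split, farm out, and recombine. Here's a minimal sketch of that shape, with the loud caveat that threads on one machine are standing in for networked PCs (real clusters typically use a message passing library such as MPI), and that `split`, `node_work`, and `cluster_sum_of_squares` are hypothetical names, not anything from Sterling's code.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_nodes):
    """Cut data into n_nodes nearly equal contiguous chunks."""
    size, extra = divmod(len(data), n_nodes)
    chunks, start = [], 0
    for node in range(n_nodes):
        # The first `extra` nodes take one leftover element each.
        end = start + size + (1 if node < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def node_work(chunk):
    # Whatever each cheap PC computes on its slice; here, a sum of squares.
    return sum(x * x for x in chunk)


def cluster_sum_of_squares(data, n_nodes=4):
    # Assumption: worker threads play the role of cluster nodes.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(node_work, split(data, n_nodes)))
    # The head node stitches the partial answers back together.
    return sum(partials)


print(cluster_sum_of_squares(list(range(1000))))
```

This only works so cleanly because a sum of squares splits into pieces that never need to talk to each other. Most real jobs aren't that polite.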
Over the years, though, we started to specialize. Now you have high performance networking made specifically for datacenters and servers built for high reliability and core density, and as we've gotten closer and closer to the limits of conventional computing, it's gotten harder to scale out compute without running into awkward limits. If you have too many nodes, it becomes difficult to scale your system efficiently, because breaking up and reassembling tasks takes too long. Cramming more power into each node makes everything hot, and that means you need more cooling, as seen in the rise of liquid cooling for the datacenter in the past five years. You need faster networks for these computers to interact without waiting on each other, so we get PCIe Over Fabric. And you run into the above von Neumann bottleneck! So what can you do?
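The "too many nodes" problem is easy to see in a toy cost model. Everything here is an assumption for illustration: the formula (a fixed serial part, a part that divides across nodes, and a coordination cost that grows with node count), the default numbers, and the names `run_time` and `best_node_count` are all made up, in arbitrary time units.

```python
def run_time(n_nodes, serial=1.0, parallel=1000.0, per_node_overhead=0.01):
    """Toy model of job time on n_nodes machines (all numbers made up).

    serial:            work that cannot be split at all
    parallel:          work that divides evenly across nodes
    per_node_overhead: cost of splitting and reassembling, per node
    """
    return serial + parallel / n_nodes + per_node_overhead * n_nodes


def best_node_count(max_nodes=10_000, **model):
    """Brute-force search for the node count with the shortest run time."""
    return min(range(1, max_nodes + 1), key=lambda n: run_time(n, **model))


best = best_node_count()
print(best, run_time(best))
print(10_000, run_time(10_000))  # far more hardware, slower job
```

In this model, adding nodes helps until the coordination term overtakes the savings, and past that point every extra machine makes the job slower. Real systems are messier, but the shape of the curve is the point.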
In Sterling's opinion, you redesign the computer into a terahertz-processor, memory-on-chip system linked by fibre optics and cooled by a constant liquid-nitrogen and liquid-helium loop. This design is completely unlike anything that exists today outside of his company's labs, as far as I'm aware. I'm sure there are a few clones in some secret government labs.
The idea here is that you can reduce power consumption by consolidating everything onto one chip, using far fewer duplicate systems that run extremely fast, and linking everything with asynchronous optical networking. Usually that would come with untenable increases in power density, but helium cooling can deal with kilowatt chips just fine, so while your power per chip goes up, overall system power goes down, and so does your datacenter footprint. But it requires you to commit to building an insane helium-cooled custom processor with on-chip optics, which requires you to stop being a baby about it.



















