John McCalpin
Tuesday 16 Jun, 11:00 - 12:00
MCS0001
I begin with a brief review of the history of high performance computing systems with a focus on the 'balances' between computations and different classes of memory accesses. Technology scaling and multi-core processing have led to astounding increases in the computational capabilities of microprocessors over the past decades, while the cycle time of the DRAM cells in memory systems has barely changed. The increasing imbalance between computation and memory access costs has led to stunningly complex processor implementations with astronomical design and fabrication costs, rapidly growing power requirements, and increasingly opaque and confounding performance characteristics.
If these increasing costs were forced on us by the physics of computation, there would be little point in complaining. Here I will argue that much of the increasing cost is directly attributable to the continuing refusal to reconsider the basic assumptions of the processor and system architectures in light of the completely different 'balances' of semiconductor technology today.
I conclude with several examples of how changing basic architectural assumptions can allow much simpler implementations to deliver competitive application performance at greatly reduced implementation costs and power requirements. Projects such as the European Processor Initiative have enabled some exciting research in these areas, but huge opportunities for architectural innovation remain.
John McCalpin has worked in a variety of roles in high performance computing over the last 40 years. Drawing on his research in numerical modeling of the large-scale ocean circulation, he invented the STREAM benchmark to show the role of sustainable memory bandwidth in application performance. He moved to industry in 1996, where, at SGI, IBM, and AMD, his role gradually shifted from analyzing performance on existing architectures to proposing architectural changes to provide more significant performance improvements. From 2009 to early 2025 he was a research scientist at the Texas Advanced Computing Center, focused on performance analysis in support of system operations, user applications, and new system acquisitions. In mid-2025 John moved to the Barcelona Supercomputing Center, where he works on performance analysis for the MareNostrum 5 supercomputer and on architectural performance analysis for the RISC-V accelerators under development at BSC (and across Europe).