Benchmarking the latest generation of cloud hardware for materials modeling
In 2011 cloud computing was yet to become mainstream, and it was hard to think that one day the computational tasks that required expensive and highly sophisticated supercomputers could be performed in the cloud. It was simply too inefficient, with many performance concerns. However, when I saw the painful and cost-intensive process of starting a new supercomputing center first-hand, I realized that once the performance concerns were solved, HPC in the cloud would become the new norm.
Fast-forward to 2017, and it was time for us at Exabyte.io to evaluate whether the cloud had caught up with traditional supercomputing centers. Our platform performs compute-intensive materials modeling and simulation tasks, and our customers prompted us to conduct a thorough study. We compared multiple vendors with respect to their performance for distributed memory calculations (full text available here) and discovered that Microsoft Azure could indeed already perform very well. We were convinced that high-performance computing in the cloud was ready for widespread adoption.
Somewhat surprisingly, in 2018 we learned that Oracle was working on HPC. Historically, the domain was mainly “populated” by science and tech nerds and received little or no mainstream attention. At Oracle OpenWorld 2018, however, Larry Ellison spoke about large-scale engineering simulations in his keynote presentation [1] and announced the availability of one of the fastest and most cost-effective HPC offerings. Excited we were!
The Oracle HPC team was very kind to invite Exabyte, as an independent party, to study the suitability of the latest generation of their high-performance computing hardware for materials modeling and simulations. We ran several benchmarks, including general dense matrix algebra (Linpack), Density Functional Theory (VASP), and Molecular Dynamics (GROMACS). A full explanation is available elsewhere online [2].
Below we show some of the results. As can be seen, Oracle shows the best performance, which we attribute to the combination of the latest generation of computing hardware and a low-latency, high-bandwidth interconnect network that facilitates efficient scaling.
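For readers who want to put a number on "efficient scaling" for their own runs, here is a minimal sketch of the two usual metrics: speedup relative to the smallest node count, and parallel efficiency (speedup divided by the increase in node count). The function name `scaling_metrics` and the timings are hypothetical placeholders chosen for illustration; they are not results from our benchmarks.

```python
def scaling_metrics(runtimes_by_nodes):
    """Return {nodes: (speedup, efficiency)} relative to the smallest node count.

    runtimes_by_nodes maps a node count to the wall-clock time (seconds)
    of the same fixed-size problem run on that many nodes.
    """
    base_nodes = min(runtimes_by_nodes)
    base_time = runtimes_by_nodes[base_nodes]
    metrics = {}
    for nodes in sorted(runtimes_by_nodes):
        # Speedup: how many times faster than the baseline run.
        speedup = base_time / runtimes_by_nodes[nodes]
        # Efficiency: fraction of the ideal (linear) speedup achieved.
        efficiency = speedup / (nodes / base_nodes)
        metrics[nodes] = (speedup, efficiency)
    return metrics


# Placeholder timings in seconds (assumed values, NOT measured data).
example_timings = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 160.0}

for nodes, (speedup, efficiency) in scaling_metrics(example_timings).items():
    print(f"{nodes} nodes: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

An efficiency that stays close to 100% as nodes are added is what a low-latency interconnect is meant to enable; a sharp drop usually suggests that communication between nodes is starting to dominate the runtime.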
The future of high-performance computing is in the cloud. AWS made the first move in 2015 with the introduction of c4-type instances. Microsoft Azure set the trend by deploying a low-latency interconnect in 2016-2017, and Oracle is making a strong move in 2018. Running modeling and simulations in the cloud with performance similar to on-premises hardware is no longer a dream. If you had doubts about this before, now might be the right time to give it another try.
[1] Larry Ellison keynote presentation at Oracle OpenWorld 2018
[2] Exabyte.io documentation: benchmarking cloud vendors in 2018