This article covers the motivation for starting exabyte.io, what we do and why, and the future outlook.
How we design the world of tomorrow, OR what is “Materials Discovery Cloud”?
You know what it feels like when you are spending a lot of time on something that will be largely unseen, yet must be accomplished to proceed with your core tasks. It feels like pain.
I hated spending weeks browsing through poorly documented software, dealing with cryptic exceptions and figuring out how to properly compile FORTRAN to run on 10,000 CPUs. Yet it had to be done to obtain new original science while I was a graduate student at UC Berkeley. There I worked with one of the best materials physicists of all time, and once voiced my complaint. “It’s not meant to be easy,” was the immediate answer.
The field of materials simulations is very complex indeed: it combines physics, materials science, chemistry and computer science. It is extremely hard for a single human being to excel at all of these at the same time. Those who do, however, get to drink from the holy grail: simulations deliver end results orders of magnitude faster than physical tests. The industries built around computer-aided design (CAD) and electronic design automation (EDA) showcase this clearly.
Even now, because of the complexity of the field, people have to overcome a significant barrier to get started with materials simulations. There are no clear standards that specify how materials should be described, how simulations should be executed and how the data should be collected and shared. The field is still in its “medieval times”.
This leads to a problem: simulations often fail to describe reality, and when they fail, there is no clear way to identify the reason. A failure could stem from a fundamental drawback in the models (e.g. over-simplification arising from the lack of compute power), software bugs (wrong compiler settings, for example) or a human factor (wrong convergence parameters or input data). That’s why, more often than not, simulations are met with skepticism until there is experimental confirmation.
This, however, will soon change.
We are building an integrated development environment for materials simulations in the cloud. It has four main components:
Our users can choose the models that work best for their particular case and apply them across thousands of compute nodes with the click of a button. In one case study, for example, our customer was able to scale to 10,600 CPUs within 7 minutes from the start. We store and organize the materials and their properties obtained during simulations, and make them searchable. We also let our users quickly and efficiently prototype large sets of new compounds with our atomistic design tools.
When you are pressed for time, the ability to screen a set of 10,000 materials at once with a couple of clicks can provide a significant speedup compared to rewriting a few Unix shell scripts by hand, let alone buying and maintaining hardware capable of quickly delivering results for such a large dataset.
Thanks to elasticity and economy of scale, the two most frequently cited benefits of cloud computing, our users can get a virtually unlimited amount of compute power. Instead of spending several million dollars on a compute cluster, they could continuously get the same peak power for 1/10 of the price. This is especially beneficial for high-throughput runs, where each material is confined to just a dozen compute nodes at most.
Now how would you store the data when you are done screening 10,000 materials? Chances are that your data is currently scattered across the filesystem, and the only way to navigate it is to name the directories like:
/alloy/Li_diff/pure_Al32cube_PBE/FeVac-NEB-oct/latscale4.5
You also process the results using a Perl script that a colleague of yours wrote 15 years ago, and store them in an Excel file that you then share with teammates by email. Sounds familiar?
We make all data accessible and searchable from a single place through an intuitive user interface. You save the time otherwise spent navigating thousands of specially named subdirectories to work out which one belongs to which material. The data you have created and accumulated can also feed machine learning techniques that suggest new materials of interest.
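To make the contrast concrete, here is a minimal Python sketch of the simplest possible step away from directory names: turning each path into a record with searchable tags. It is only an illustration and does not describe how our platform works. The names Record, make_record and search are hypothetical, the second path is made up for the example, and the sketch deliberately does not try to interpret what each token means.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One simulation directory plus the tokens found in its path."""
    path: str
    tags: frozenset


def make_record(path):
    # Split the path on "/", "_" and "-" and lowercase the pieces.
    # The tokens are kept as opaque tags; nothing is inferred about them.
    tokens = [t.lower() for t in re.split(r"[/_\-]+", path) if t]
    return Record(path=path, tags=frozenset(tokens))


def search(records, *terms):
    # Return the paths whose tags contain every requested term.
    wanted = {t.lower() for t in terms}
    return [r.path for r in records if wanted <= r.tags]


records = [
    make_record(p)
    for p in [
        "/alloy/Li_diff/pure_Al32cube_PBE/FeVac-NEB-oct/latscale4.5",
        # Hypothetical sibling directory, invented for this example.
        "/alloy/Li_diff/pure_Al32cube_PBE/FeVac-NEB-tet/latscale4.5",
    ]
]

print(search(records, "NEB", "oct"))
```

Even this toy index answers "which runs used NEB at the oct site?" without anyone having to remember the naming scheme, although a real data layer would store typed properties rather than raw path tokens.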
Unlike CAD/EDA tools, which have evolved into user-friendly software solutions such as the products of Ansys or Cadence, the most widely used tool in computational materials science is still the command-line interface. Although it allows for the highest degree of flexibility, it limits the efficiency and speed of those who are not familiar with command-line directives.
We let users access command-line directives if they so choose, and we are also building a high-level integrated environment. Command-line users gain value from the scalability of our compute platform and from our data layer. Those who choose the web application gain a strategic advantage in time-to-solution. We streamline the generation of new structural and combinatorial leads, and make it easy to construct simulation workflows that extract your characteristic property of interest (e.g. the electronic band gap).
Chances are that you already have access to the nation’s best supercomputer and spent 18 million CPU-hours on it last year while trying to discover new thermoelectric materials for a Fortune 50 company making household electronics. You would like to obtain highly accurate (within 1% of experiment) results for large datasets of realistic materials with ~1000 atoms in the crystal unit cell. Then our secret weapon is just for you.
Just as happened with CAD/EDA, new products designed through materials simulations will become an integral part of our everyday lives. Modeling at the nanoscale is already powering the latest advances in semiconductor technologies “beyond Moore”. For those in the aerospace and automotive sectors, simulations will be the key enabler behind new durable materials for product frames, energy storage solutions and advancements in engine efficiency. Similar things will happen in the chemical and pharmaceutical industries.
We turn this vision into reality. We know that the world of tomorrow is created not through long and expensive physical tests, but through fast and accurate models. We feel the momentum behind the new approaches to materials modeling and view the rapid emergence of large-scale cloud computing as one of the key enablers. We work with the world’s largest enterprises and have the brightest minds in the field as our advisors.
We are exabyte.io — Materials Discovery Cloud.
Originally published at https://www.linkedin.com/pulse/how-we-design-world-tomorrow-what-materials-discovery-timur-bazhirov