Trinity is the first Advanced Technology System (ATS) provided by the DOE Advanced Simulation and Computing (ASC) program at Los Alamos. Like all ATS machines, it has a dual mission, says Bill Archer, ASC program director. First, it meets the simulation and computing needs of the DOE National Nuclear Security Administration’s Stockpile Stewardship Program. Second, the machine provides leading-edge technology to users. “This is preparing the application codes for the next generation of computer architecture,” he says.
ASC contracted with Cray in 2014 to design and build Trinity, based on input from the three national security labs – Los Alamos, Sandia and Lawrence Livermore. The New Mexico Alliance for Computing at Extreme Scale (ACES), a joint effort of Los Alamos and Sandia, is deploying the machine. With more than 760,000 cores on more than 19,000 nodes, Trinity will crunch numbers at a theoretical peak rate surpassing 40 petaflops (40 quadrillion floating-point operations per second). It has more than 2 petabytes of aggregate main memory and a memory bandwidth exceeding 6 petabytes per second. As a result, it will be able to run two to four large simulations or computational projects of 750 trillion bytes each at once. It occupies about 5,200 square feet and relies on a massive air and water cooling system to dissipate the heat generated at its peak power draw of 10 megawatts.
Trinity is among the world’s biggest and fastest machines. Its main distinction, though, is that it’s specially designed as a training camp for new technologies and the computer scientists who will use them. Trinity is pioneering several new data storage technologies that are part of the exascale transition, says Scott Hemmert, a Sandia staff member and Trinity’s architect.
Based on the Cray XC40 architecture, Trinity is split into two parts. The first runs on Intel Xeon v3 (codenamed “Haswell”) processors, the low-risk workhorses that run today’s most sophisticated and reliable codes. This part began doing classified work for the stockpile stewardship program in July 2016 and provides large-scale computing capacity for all three weapons laboratories.
The second part features Intel Xeon Phi processors, codenamed Knights Landing (KNL), with next-generation capabilities. In February 2017, this part of the machine began a four-month tryout running unclassified codes to stress-test it. Afterward, plans call for merging the two parts and returning Trinity to a fully classified system, giving stockpile stewardship application developers access to either processor type. Users will be able to launch their codes in familiar Haswell territory, then adapt them to the new data storage and usage architectures and to the experimental realm of Knights Landing processors. “Typically, users will select one or the other processor type, depending on the application they are running,” says Jim Lujan, Los Alamos Trinity project manager. “However, it will be possible for a code that has been rewritten to do so to actually use both types of processors together.”
KNL processors are designed to balance computational might and power-saving finesse, Lujan notes. Each contains 68 compute cores on a single silicon die, compared with a maximum of 18 cores in a Haswell processor. Packing that many more cores onto one die alone greatly improves electrical efficiency. Additional energy-saving technologies are built into the KNL microarchitecture, such as embedded voltage regulators and “deep sleep” states for temporarily idle processors. A 2014 study by Gary Lawson and colleagues at Old Dominion University found that the processors cut power use by 30 percent compared with running equivalent simulations on other devices. “This allows for improved electrical efficiency that is vital for getting to exascale,” Archer says. Trinity’s KNL and Haswell sections illustrate the point. The two have roughly the same amount of hardware, yet the KNL part delivers three times the performance for the same amount of power.
Trinity also breaks new ground with complementary technologies for data storage and usage, relying on solid-state burst buffers instead of traditional disk drives to manage working data. The burst buffers’ fast response and high bandwidth accelerate applications that require large numbers of small input-output (I/O) operations. “Essentially, the introduction of solid-state storage to the supercomputers is changing the ratio of disk and tape necessary to satisfy bandwidth and capacity requirements, and drastically improves the usability of the systems for application I/O,” says Gary Grider, Los Alamos High Performance Computing Division leader.
Campaign storage – a set of low-cost, low-performance spinning disks – can house simulation output data for months. For this medium-term role, it is actually an improvement over disks with higher performance and higher cost. Campaign storage keeps results more accessible throughout a simulation campaign. “The archive, which was formerly essential to enabling work on the supercomputers, now focuses on maintaining the critical results at the completion of longer scientific campaigns,” Grider says.
Trinity is advancing hardware technology towards exascale. However, Archer says, “The most important advance is pushing the NNSA applications to use the emerging technologies.”
A shorter version of this article appears in the current issue of Stewardship Science: The SSGF Magazine.