Predicted use of computational ranks throughout a hurricane simulation. In the standard static load-balancing approach (top), work is concentrated in the higher computational ranks (light colors along top) during runtime while lower ranks mostly sit idle. A multi-constraint static technique (second from top) distributes work more evenly across processors but still concentrates work in the higher ranks late in the computation. Asynchronous diffusion (third from top) and semi-static load-balancing (bottom) strategies redistribute work nearly uniformly. The sustained high computational intensity for these two approaches produced a speed-up of 1.5 times compared to static load-balancing. (Image: Maximilian H. Bremer, John D. Bachan, Cy P. Chan, “Semi-Static and Dynamic Load Balancing for Asynchronous Hurricane Storm Surge Simulations,” 2018 Parallel Applications Workshop, Alternatives to MPI (PAW-ATM), November 16, 2018.)

When Max Bremer arrived at Lawrence Berkeley National Laboratory (LBNL) for his Department of Energy Computational Science Graduate Fellowship (DOE CSGF) practicum, he didn’t realize he already had the seed of a perfect research project. Bremer had earned a bachelor’s degree in aerospace engineering and had already taken steps toward computationally modeling hurricane storm surges. At LBNL, he soon learned that he could make a unique contribution to computational science.

“I was going to do rockets or planes or something, and this hurricane simulation stuff just fell into my lap,” says the University of Texas at Austin Ph.D. student. His work doesn’t center on writing new modeling codes but on revising them to boost efficiency on future high-performance computing (HPC) architectures. “It’s a really good problem, and I find it really interesting. So I just kind of stuck with it.”

It’s also a potentially life-saving project. Since 1980, seven of the 10 costliest weather disasters were hurricanes, the National Oceanic and Atmospheric Administration (NOAA) reports. Besides intense gales and torrential rains, a hurricane generates storm surge – widespread flooding as winds push ocean waters onto land. Storm surge is “often the greatest threat to life and property,” NOAA’s National Hurricane Center (NHC) says. It blames storm surge for most of the estimated 1,500 deaths Hurricane Katrina caused directly in 2005.

When a hurricane threatens a coast, local officials must make rapid evacuation decisions based on NOAA’s predictions. Those forecasts rely, in part, on simulations from an HPC code called Advanced Circulation (ADCIRC).

“They are life-or-death decisions,” says Clint Dawson, Bremer’s advisor at UT Austin, where he heads its computational hydraulics group. “For example, if they decide to close a road, and the road didn’t need to be closed, then that could slow down evacuations by several days.”

Bremer’s move into Dawson’s group happened naturally. He was an engineering undergraduate student, also at UT Austin, when a course in numerical methods lured him into the field. Bremer soon was working in Dawson’s group on DGSWEM, an experimental code used to test concepts for later incorporation into ADCIRC.

Bremer still didn’t see storm surge simulation as his career’s next chapter. After graduating, he went to the University of Cambridge intending to delve deeper into pure mathematics but realized he was mainly interested in applied mathematics. Bremer was chosen for the DOE CSGF and was on track to begin doctoral studies with Dawson. He expected to work on any one of several projects – rain-induced flooding, fluid flow through porous media, hurricane storm surge.

When he began his 2016 summer practicum with Cy Chan in LBNL’s Computer Architecture Group, Bremer was especially interested in learning more about task-based parallelism – a way to improve simulations’ efficiency. “There’s only so much smaller you can make these computer chips,” Bremer says. So to speed up computers, “you just have more computers” – additional processors.

More processors and greater parallelism break big problems into discrete tasks and dole them out to individual processors, which then solve the separate pieces simultaneously. This approach still entails inefficiencies. MPI, OpenMP and similar parallelism methods run tasks in lockstep. Because some jobs are bigger than others, processors can sit idle, waiting as all the tasks are completed before they all move on to the next set. In a big simulation, these brief idle times add up to losses. They also prevent researchers from drawing on a machine’s full capability for more sophisticated and accurate simulations. In task-based parallelism, each task runs on its own time step, reducing the wait between jobs.
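The difference can be sketched with a toy cost model. In this hypothetical example (the patch costs, step count and worker count are illustrative, not taken from any real code), lockstep execution pays for the slowest patch at every step, while an idealized task-based schedule is bounded only by the total work spread across the workers:

```python
# Toy cost model contrasting lockstep (bulk-synchronous) execution with
# task-based parallelism. All numbers are illustrative.

patch_costs = [1.0, 5.0, 2.0, 8.0]  # uneven work per mesh patch, per time step
n_steps = 100
n_workers = 4  # one processor per patch in the lockstep case

# Lockstep: each step ends at a barrier, so every processor waits for the
# slowest patch before any can advance to the next step.
lockstep_time = n_steps * max(patch_costs)

# Task-based: patches advance independently, so runtime is limited only by
# total work divided across the workers (an idealized lower bound).
task_based_time = n_steps * sum(patch_costs) / n_workers

print(lockstep_time)    # 800.0
print(task_based_time)  # 400.0
```

Here half the lockstep machine-time is idle waiting at barriers; real codes land somewhere between the two bounds.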

After Bremer arrived at Berkeley Lab, he and Chan saw that Bremer already had the kernel of an excellent project. He set out to learn task-based parallelism and load balancing, another optimization method. Bremer planned to take the methods to Dawson’s group and apply them to DGSWEM and, perhaps someday, to ADCIRC and other codes.

ADCIRC is a mature code, Bremer says. “Rather than come up with a new mathematical model, what if we can just use the machines better?”

Chan and his co-workers helped Bremer tackle another inefficiency in hurricane storm surge simulations. At the outset, dry areas demand no computational work. But some soon require significant processing as they become inundated. Because no one can predict precisely which dry areas will suddenly demand more computing resources, the computer’s workload becomes imbalanced. “To achieve efficient utilization of the machine, you need to move these patches around on the fly,” Bremer says. “That’s load balancing.”
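One simple way to picture moving patches around on the fly is a greedy rebalance: when formerly dry patches flood and their cost jumps, reassign patches, largest first, to the currently least-loaded processor. This is a hedged sketch of the general idea, not the algorithm used in DGSWEM; the patch names and costs are invented:

```python
import heapq

def rebalance(patch_costs, n_procs):
    """Greedily assign patches (largest first) to the least-loaded processor."""
    heap = [(0.0, p) for p in range(n_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment = {}
    for patch, cost in sorted(patch_costs.items(), key=lambda kv: -kv[1]):
        load, proc = heapq.heappop(heap)
        assignment[patch] = proc
        heapq.heappush(heap, (load + cost, proc))
    return assignment, max(load for load, _ in heap)

# Before the surge: coastal patches are dry and nearly free to compute.
costs = {"ocean": 8.0, "shore": 0.5, "inland": 0.5}
_, peak = rebalance(costs, 2)

# Surge arrives: dry patches flood and become expensive, so we rebalance.
costs.update({"shore": 6.0, "inland": 5.0})
assignment, peak_after = rebalance(costs, 2)
print(assignment, peak_after)
```

A static assignment made before the surge would leave one processor swamped once the shore floods; rebalancing keeps the peak per-processor load close to the best achievable split.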

They “wrote a simulator for our simulator” called DGSim, a skeletonized version of DGSWEM. “It allowed us to use a lot less of the machine. So I can run the simulation on my laptop whereas normally I would need thousands of cores to do it.”
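The core trick of a "simulator for our simulator" is that scheduling questions don't need the physics: a skeleton app can replay each patch's estimated cost per step and measure how a given patch-to-processor assignment would perform. This sketch illustrates that idea under invented costs; it is not DGSim's actual design:

```python
# Skeleton replay: given a trace of per-patch costs over time and a mapping
# of patches to processors, compute the total runtime assuming a barrier at
# each step. No shallow-water physics is solved, so it runs on a laptop.

def skeleton_run(cost_trace, schedule):
    """cost_trace[t][patch] = work units at step t; schedule maps patch -> proc."""
    makespan = 0.0
    for step_costs in cost_trace:
        per_proc = {}
        for patch, cost in step_costs.items():
            proc = schedule[patch]
            per_proc[proc] = per_proc.get(proc, 0.0) + cost
        makespan += max(per_proc.values())  # barrier: wait for slowest processor
    return makespan

trace = [{"a": 1.0, "b": 1.0}, {"a": 1.0, "b": 4.0}]  # patch "b" floods at step 2
print(skeleton_run(trace, {"a": 0, "b": 1}))  # 1.0 + 4.0 = 5.0
```

Because only the cost bookkeeping remains, alternative load-balancing strategies can be compared by rerunning the same trace with different schedules.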

The group then validated DGSim on Edison, a Cray XC30 supercomputer at LBNL’s NERSC (the National Energy Research Scientific Computing Center), Chan says. “We were able to reduce the number of time steps calculated but still capture the same overall dynamic load profile of the hurricane.” DGSim saved more than 5,000 core-hours compared to running a DGSWEM simulation, and the new algorithms improved hurricane-simulation performance by more than 50 percent. Bremer, Chan and their colleague John Bachan presented the results at SC18, the international supercomputing conference in Dallas.

Today, Bremer finds himself “sitting between these two camps. One is this high-performance computing community, where you have all these people who are trying to figure out how we can get these algorithms to run efficiently on the new computers, and then Clint and his collaborators, who have a very concrete idea of a problem they’d like to solve.”

This article is an excerpt from a profile in the 2019 print version of DEIXIS magazine.

Published by Andy Boyles
