Krishnamoorthy and his team worked on porting or optimizing the codes to run on Jaguar, Oak Ridge National Laboratory’s Cray XT5 system. Now the PNNL scientists are doing something similar on Jaguar’s successor, Titan.

Recently, the research team demonstrated that the most computationally intensive portion of the calculation can be run on 210,000 processor cores of Titan, a Cray XK7 at Oak Ridge’s Leadership Computing Facility, achieving more than 80 percent parallel efficiency.

“When I joined PNNL we were still looking at codes that run on a few thousand cores,” Krishnamoorthy says. When Chinook, the lab’s computational chemistry machine, arrived, “we were immediately jumping up to 18,000 cores.” Doing one calculation at a time and constantly shuttling data to disk drives “was not going to suffice.”

Solving the jump to highly parallel computing, and rethinking how the system handles workloads and faults, earned Krishnamoorthy a DOE Early Career Research Program award. It grants him $2.5 million over five years to explore ways to extend his ideas to exascale computing. Shortly thereafter, Krishnamoorthy learned he had also been awarded PNNL’s 2013 Ronald L. Brodzinski Award for Early Career Exceptional Achievement.

He’s now broadening the methods developed for computational chemistry to apply to any algorithm that has load-imbalance issues.

“You want a dynamic environment where the execution keeps on going and the user doesn’t have to worry about statically scheduling everything – the run-time engine just automatically adapts to changes in the machine and in the problem itself,” he says.

The current framework, the Task Scheduling Library (TASCEL), handles load balancing and fault tolerance.

“We are now trying to adapt this method to the new codes as they are developed,” Krishnamoorthy says. By automating the process, he wants the move from one version of a program to the next generation to be seamless.

“You have to write the program in terms of collections of independent work or tasks and the relationships among them in terms of who depends on who and what data they access,” he says. “As long as it is written this way, the run-time can take over and do this load balancing, communication management and fault management automatically for you.”
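The programming model he describes can be sketched in a few lines of Python. This is not TASCEL’s actual API, just a hypothetical illustration of the idea: the programmer declares tasks and which tasks they depend on, and a small runtime decides execution order dynamically rather than following a static schedule.

```python
# Minimal sketch (assumed names, not TASCEL's API): the user declares
# independent tasks and their dependencies; the runtime picks any "ready"
# task -- one whose dependencies are done -- and runs it. A real runtime
# would hand ready tasks to idle workers for load balancing and could
# re-run a task elsewhere after a failure.

class TaskRuntime:
    def __init__(self):
        self.deps = {}   # task name -> set of task names it depends on
        self.funcs = {}  # task name -> the callable that does the work

    def task(self, name, depends_on=()):
        """Decorator registering a task and its dependencies."""
        def register(fn):
            self.deps[name] = set(depends_on)
            self.funcs[name] = fn
            return fn
        return register

    def run(self):
        done = {}                                   # task name -> result
        pending = {t: set(d) for t, d in self.deps.items()}
        while pending:
            # A task is ready when everything it depends on has finished.
            ready = [t for t, d in pending.items() if d.issubset(done)]
            if not ready:
                raise RuntimeError("dependency cycle or missing task")
            for t in ready:
                done[t] = self.funcs[t](done)       # task reads inputs it needs
                del pending[t]
        return done


rt = TaskRuntime()

@rt.task("load")
def load(done):
    return [1, 2, 3, 4]

@rt.task("square", depends_on=["load"])
def square(done):
    return [x * x for x in done["load"]]

@rt.task("total", depends_on=["square"])
def total(done):
    return sum(done["square"])

results = rt.run()
print(results["total"])  # 1 + 4 + 9 + 16 = 30
```

Because the runtime, not the programmer, chooses what runs when, it is free to reorder ready tasks, spread them across processors, or reassign them after a fault, which is the property Krishnamoorthy is after.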

His methods address two of the most daunting challenges facing exascale computing: load imbalance and fault tolerance. He’s contemplating not what computers will look like in the next two to three years, but what challenges applications will face on exascale computers eight to 10 years from now.

The cost of failure can be steep, but Krishnamoorthy is making it less so every day.

 
