Though most of the world remains shut down by the pandemic, supercomputers are hard at work. Some are simulating the coronavirus in search of therapies. Others are working to safeguard the nation’s nuclear stockpile, predict the weather or study climate change. Although supercomputers are far more powerful than the device you’re reading this on, they share its vulnerability to attackers.
At Pacific Northwest National Laboratory, staff scientist Ang Li contributes to a range of projects in the lab’s high-performance computing (HPC) group. (See sidebar, “Simulating computing’s future.”) Li developed an interest in HPC over a decade ago, during an undergraduate internship at Paris-Sud University in France. While there, he helped build a software framework that allowed programs to run efficiently on both graphics processing units (GPUs) and the central processing units (CPUs) standard in HPC.
Since then, Li has worked to develop efficient software for GPUs – processors originally used to move images and videos quickly across screens. In HPC systems, GPUs serve as accelerators that speed up computation. He continued this work as he pursued two doctoral degrees, from the National University of Singapore and Eindhoven University of Technology in the Netherlands, before joining PNNL in 2016.
Over the past year, Li and his colleagues in Richland, as part of PNNL’s Center for Advanced Technology Evaluation (CENATE), have helped develop a way to protect U.S. supercomputers from cyberattacks. The Department of Energy’s Office of Advanced Scientific Computing Research started the center a few years ago to investigate emerging computing technologies. Although Li and his team initially studied HPC performance challenges, they eventually shifted their focus because HPC systems increasingly face cyberthreats. Under the guidance of their program sponsor at DOE, Li and his colleagues began to examine how attackers might exploit these computing technologies.
State-of-the-art HPC systems are hybrid machines composed of GPUs and CPUs. Those who seek to exploit supercomputers for nefarious purposes would go for the GPUs first. The GPUs are the Achilles’ heel, both because of their higher computing capability and because standard security shields don’t cover them.
Li and his CENATE colleagues reasoned that machine learning might be one way to monitor ongoing activities and ultimately block attacks on HPC systems. Such algorithms could be trained to classify events as either normal or malicious.
The team designed recurrent neural networks, a class of machine-learning algorithms, to analyze the signatures of normal versus potentially malicious GPU workloads. The algorithms run on the same computers they’re intended to protect. Most HPC systems have performance counters, which measure the number of events that happen over a given span, such as how much data are fetched from the CPU’s or GPU’s memory and how much power the system is using. Normal and malicious software would produce different values on these counters. “All these things form features that get fed into the neural network in order for it to do its classification,” explains CENATE’s lead and Li’s collaborator Kevin Barker. The team used all these different features to draw a line between safe and nefarious applications. The neural network was initially trained with known malicious codes such as password crackers and hashing algorithms.
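To make the idea concrete, here is a minimal Python sketch of how a sequence of counter readings might be turned into a normal-versus-malicious score. It is not the CENATE network: it swaps in a much simpler design, a small recurrent layer with fixed random weights and a logistic output layer that is the only part trained. The two counters, their values, the class name TinyRecurrentClassifier and the helper make_trace are all hypothetical, and the traces are synthetic rather than measured.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_trace(malicious, rng, steps=20):
    """Synthetic trace: one [memory_traffic, power] reading per interval.

    The values are invented and already scaled to 0..1; real counters
    would have to be collected and normalized first.
    """
    mean = (0.8, 0.9) if malicious else (0.3, 0.4)
    return [[min(1.0, max(0.0, m + rng.gauss(0.0, 0.05))) for m in mean]
            for _ in range(steps)]

class TinyRecurrentClassifier:
    """Fixed random recurrent layer plus a trained logistic readout."""

    def __init__(self, n_inputs, n_hidden, rng):
        self.w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                     for _ in range(n_hidden)]
        self.w_rec = [[rng.uniform(-0.3, 0.3) for _ in range(n_hidden)]
                      for _ in range(n_hidden)]
        self.w_out = [0.0] * n_hidden
        self.b_out = 0.0

    def encode(self, trace):
        """Feed readings in one interval at a time; return the final hidden state."""
        h = [0.0] * len(self.w_out)
        for x in trace:
            h = [math.tanh(sum(w * v for w, v in zip(self.w_in[j], x)) +
                           sum(w * v for w, v in zip(self.w_rec[j], h)))
                 for j in range(len(h))]
        return h

    def score(self, trace):
        """Estimated probability that the trace comes from a malicious workload."""
        h = self.encode(trace)
        return sigmoid(self.b_out + sum(w * v for w, v in zip(self.w_out, h)))

    def fit(self, traces, labels, epochs=200, lr=0.5):
        """Train only the readout with stochastic gradient descent on log loss."""
        states = [self.encode(t) for t in traces]
        for _ in range(epochs):
            for h, y in zip(states, labels):
                p = sigmoid(self.b_out + sum(w * v for w, v in zip(self.w_out, h)))
                err = p - y
                self.w_out = [w - lr * err * v for w, v in zip(self.w_out, h)]
                self.b_out -= lr * err

rng = random.Random(0)
labels = [0, 1] * 20  # 0 = normal, 1 = malicious
traces = [make_trace(bool(y), rng) for y in labels]
model = TinyRecurrentClassifier(n_inputs=2, n_hidden=8, rng=rng)
model.fit(traces, labels)

print("normal trace score   :", round(model.score(make_trace(False, rng)), 3))
print("malicious trace score:", round(model.score(make_trace(True, rng)), 3))
```

A score above 0.5 would flag the workload as suspicious. A production detector would presumably train the recurrent weights too and draw on many more counters than the two assumed here.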
The CENATE algorithm is a good proof of principle, Li says, and the next step would be to add reinforcement learning, an artificial-intelligence technique, so the algorithm can respond in real time. “If you come every time a security operator checks the instance, that attack has already happened,” Li says. “At some point, you might [want to have] some automatic techniques to handle that.”
In the future, this algorithm could be used to protect computational facilities at national laboratories, such as Oak Ridge and Argonne, as well as smaller HPC clusters elsewhere.
Future HPC systems will be more advanced, Li says, with many different kinds of processors. “How to keep them safe will continuously remain a big concern.”