For three decades, computational biologist Dan Jacobson has seen high-performance computing’s potential for puzzling out the complexity of living systems. “I probably wandered into the data center one too many times as a graduate student, chanting the mantra that there must be a better way” to extract deeper insights from biology, recalls Jacobson, chief scientist for computational systems biology at the Department of Energy’s (DOE’s) Oak Ridge National Laboratory.
He initially trained as a biochemist and contributed to the department’s human genome work in the 1990s. Back then, researchers could only dream about how genomes might revolutionize science. Layered within those data sets lie clues to how the brain works, what underpins cancer and other diseases and, Jacobson’s continuing interest, how to make plants more tolerant of environmental stressors. But even though genome sequencing and assembly have become easier and cheaper, technology that sorts through and makes sense of experimental data is still catching up.
Now, however, large and rich biological data sets, powerful supercomputers and creative algorithmic development are allowing Jacobson and colleagues to address these sweeping biological questions – problems they’d only dreamed of working on just a couple of years ago. “We’re starting to surpass expectations,” Jacobson says. “That being said, it’s early days in systems biology. There’s so much more to do.”
ORNL’s new Summit supercomputer, the world’s fastest, is an important part of this momentum. Its NVIDIA graphics processing units are optimized for artificial-intelligence work, including machine-learning algorithms that can sift troves of data, classifying information from images, clinical records, genetic information and more, and revealing patterns that couldn’t have been uncovered otherwise.
Earlier this year Jacobson’s team used Summit to achieve the fastest scientific application ever recorded: 2.36 exaops, or 2.36 exascale operations (more than 2 billion billion calculations) per second. The team’s systems biology algorithms integrate a variety of data types to link genes with their biological functions, and the effort has been named a finalist for this year’s Gordon Bell Prize in high-performance computing. The team described its work in May in Frontiers in Energy Research.
Genomes encode fundamental instructions for cells such as the recipes for the sequence, structure and shape of proteins – the molecules that control a cell’s operations. Genes also direct cells on when to produce protein and how much, which can determine how these molecules work with other cell components. The number of molecular-interaction possibilities is overwhelming, Jacobson notes, with more potential combinations than all the atoms in the universe. But those interactions, with cues from the environment, determine an organism’s properties and behavior.
A growing group of biologists, including Jacobson, wants to encourage others to consider these molecular connections more systematically. “A lot of the transition that we’re trying to push in this field is to start thinking about higher-order combinations of how molecules interact in a cell and how those interactions affect the overall traits or phenotypes (physical characteristics) of an organism,” he says.
Many of these large-scale biology projects in neuroscience and medicine are just beginning. But for several years, Jacobson has been collaborating with ORNL colleagues Paul Abraham, a chemist, Xiaohan Yang, a plant biologist, and others to untangle clever water-sparing strategies plants use to synthesize food. With a better understanding of these natural processes, biologists could engineer crops to feed growing populations, cope with a changing climate and seed renewable fuels. In addition, farmers could cultivate land previously considered too dry for agriculture.
Plants use photosynthesis to convert carbon dioxide, water and sunlight into the sugars that they – and the organisms that eat them – use for fuel. Most plants carry out the entire process during daylight hours.
But to take in carbon dioxide, plants must open stomata – pores on their leaves. Because water evaporates through open stomata, some plants have evolved a photosynthesis strategy to prevent this loss: crassulacean acid metabolism (CAM). CAM splits this work between night and day; plants open their stomata at night and store carbon dioxide short-term as a compound called malate. After sunrise, plant cells transport the malate to chloroplasts, where it is converted back into carbon dioxide and used during photosynthesis to finish the job of producing glucose and other sugars. These molecules are stored as starch.
As a result, CAM plants can survive on a fraction of the water others require. Researchers want to understand the genes involved in CAM and how they’re regulated. To figure that out, Jacobson and his colleagues have measured chemical signatures in CAM plants – such as gene and protein expression and the production of important small molecules called metabolites – and have mapped their variations every two hours over a 24-hour cycle. In an initial study in Nature Plants, they compared these chemical patterns in a CAM plant, Agave, with the model plant Arabidopsis, which photosynthesizes only during the day.
Since then the ORNL group has expanded that work to other CAM plants, such as pineapple and the model Kalanchoë, examining the independent ways different plant species have shifted their photosynthetic schedules. The team published those results in Nature Communications last December.
They’ve found expression changes among 54 genes in these CAM species – associated with stomata-opening, heat responses, sugar metabolism and other processes. These activity changes could be linked with CAM-associated tweaks to a plant’s day-night cycle and point to a core group of genes that could be useful for engineering CAM into other species. “There have been some really interesting patterns in what’s happening in gene expression and what’s happening in protein expression,” Jacobson says.
Engineering new CAM plants may not be straightforward, Jacobson cautions. Higher gene-expression levels in cells don’t always signal the exact timing of critical events. The correlation is approximately 50 percent, a coin flip, he says. Gene expression can increase during the day but not have an effect on the protein amounts until nighttime. “Just looking at one of these layers of information doesn’t tell you the whole story.”
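The timing mismatch Jacobson describes can be illustrated with a small sketch. It is not the ORNL team’s analysis: the two curves below are made-up, idealized rhythms sampled every two hours over 24 hours, with a hypothetical transcript peaking at midday and a hypothetical protein peaking eight hours later. The function names are likewise invented for illustration. The sketch compares the two layers at the same time points, then slides the protein curve around the clock to find the delay at which they line up best.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def best_circular_lag(transcript, protein):
    """Return (lag, r): the number of samples by which the protein curve
    trails the transcript curve, chosen to maximize their correlation.
    The day is treated as a loop, so shifts wrap around midnight."""
    n = len(transcript)
    best_lag, best_r = 0, -2.0
    for lag in range(n):
        shifted = [protein[(i + lag) % n] for i in range(n)]
        r = pearson(transcript, shifted)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r

# Synthetic data (assumption): one sample every 2 hours across 24 hours.
hours = list(range(0, 24, 2))
transcript = [math.cos(2 * math.pi * (h - 12) / 24) for h in hours]  # peaks at hour 12
protein = [math.cos(2 * math.pi * (h - 20) / 24) for h in hours]     # peaks at hour 20

same_time_r = pearson(transcript, protein)
lag_samples, lagged_r = best_circular_lag(transcript, protein)

print(f"correlation at matching time points: {same_time_r:.2f}")
print(f"best delay: {lag_samples * 2} hours (correlation {lagged_r:.2f})")
```

In this toy case the two layers look poorly matched when compared hour for hour, yet they track each other closely once the delay is taken into account. Real expression data are noisier and rarely this tidy, which is part of why a single layer of measurements can mislead.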
Researchers will need to dig deeper to understand multiple layers of regulation in the cells. But if they can pin down the details of both how CAM works and how plants control it, Jacobson says, “we can have a better shot of engineering that into other crops that we’re interested in.”