Supercomputer Contest Hits New Level

There is an international race to build an exascale supercomputer, and one of the people leading it is Peter Beckman, a top computer scientist at the U.S. Department of Energy's Argonne National Laboratory.

The DOE has been working on exascale computing planning for two years, said Beckman, but the funding to actually build such powerful systems has not been approved. And unless the U.S. makes a push for exascale computing, he said, it's not going to happen. The estimated cost of an exascale project will be in the billions of dollars; an exact cost has not been announced by the department.

The most powerful systems today are measured in petaflops, meaning they're capable of quadrillions of operations per second. The fastest system, according to the latest Top500 supercomputing list, released this month, is China's 2.5 petaflop Tianhe-1A. An exascale system is measured in exaflops; an exaflop is 1 quintillion (or 1 million trillion) floating point operations per second. China, Europe and Japan are all working on exascale computing platforms.
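To put those units side by side, the arithmetic can be sketched in a few lines of Python. This is only an illustration of the figures quoted above; the constant and function names are made up for the example, and the 2.5 petaflop value is the Top500 figure cited in the article, not a new measurement.

```python
# Unit arithmetic for the figures quoted in the article.
# All names here are illustrative, not taken from any benchmark or library.
PETA = 10**15  # 1 petaflop: 1 quadrillion floating point operations per second
EXA = 10**18   # 1 exaflop: 1 quintillion floating point operations per second

# Tianhe-1A performance as cited from the Top500 list (2.5 petaflops).
tianhe_1a_flops = 2.5 * PETA


def speedup_needed(target_flops, current_flops):
    """Return how many times faster the target is than the current system."""
    return target_flops / current_flops


print(EXA // PETA)                           # petaflops in one exaflop
print(speedup_needed(EXA, tianhe_1a_flops))  # gap between Tianhe-1A and 1 exaflop
```

By this arithmetic, one exaflop is 1,000 petaflops, so an exascale system would be roughly 400 times faster than the 2.5 petaflop machine at the top of the current list.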

Beckman, recently named director of the newly created Exascale Technology and Computing Institute and the Leadership Computing Facility at Argonne, spoke to Computerworld about some of the challenges ahead.

What is the exascale effort at this point? It is the realization or the understanding that we need to move the hardware, software and the applications to a new model. The DOE and others are looking to fund this but have only started with initial planning funding at this point.

The software effort that I'm leading with Jack Dongarra [a professor of computer science at the University of Tennessee and a distinguished research staff member at Oak Ridge National Laboratory] and some of the co-design pieces have planning money to get started, but the next step is for the government to come forward with a real ambitious plan and a real funded plan to do this.

What's happening, and I'm sure your readers and others know, is that power constraints, budgets, architecture and clock speeds have transformed what happens at every level of computing. Where in the past you had one CPU, maybe two, you are now looking at laptops with four cores, eight cores, and we just see this ramp happening where parallelism is going to explode. We have to adjust the algorithms and applications to use that parallelism.

At the same time, from a hardware and systems software perspective, there's a tremendous shift with power management and data center issues -- everything that's happening in the standard Web server space is happening in high-performance computing. But in high-performance computing, we are looking forward three to five years.

Think of it as a time machine. What happens in high-performance computing then happens in high-performance technical servers, and finally your laptop.

We're looking at that big change and saying what we need is a real organized effort on the hardware, software and applications to tackle this. It can't just be one of those. In the past, the vendors have designed a new system and then in some sense it comes out, and users look at it and ask: "How do I port my code to this?" What we're looking at is improving that model to "co-design" -- a notion that comes from the embedded computing space, where the users of the system, the hardware architects and the software people all get together and make trade-offs about what the best optimized supercomputer will look like to answer science questions.

In the end, it's about answering fundamental science questions, designing more fuel-efficient cars, designing better lithium batteries, understanding our climate, new drugs, all of that.

How far along is this? What stage are you at? We've been doing it for the better part of a decade in less formal venues. IBM is a partner with Argonne and Lawrence Livermore Lab, and together we designed Blue Gene/P and Blue Gene/Q. In that partnership, we paid money to IBM to design the prototype for Blue Gene/P and Q and then all of our scientists did constant evaluation and discussion about trade-offs. For example, would we rather have a memory management unit than another core? But it was sort of, what I would say, in the small. We didn't take it out to the broader community.

In the exascale thrust, the DOE has said we're going to launch a series of co-design centers that will cover several application areas -- fusion, materials, chemistry, climate, etc. -- and those communities will then have a voice in speaking with the companies designing the platforms.

Is this a national or international effort? The DOE piece is a national effort, but Jack Dongarra and I also lead the International Exascale Software Project (IESP). In it, we bring together representatives from Asia, Europe and the U.S. to focus on software. That's something that transcends national boundaries at this point. People work on codes from open source.

Because software is ubiquitous in that way and is really shared and improved inter-globally, the IESP has organized a road map for what the software for exascale needs. We have spent the last year and a half developing that road map and have now turned our attention to co-design. That's mostly a collaborative effort.

The DOE has funded a very specific program to start the planning for exascale. They have been given planning funding. But until there is a congressional budget that funds it, it is still just in planning mode.

Is there concern about getting funding for exascale development? There is. Budgets are tight, and the change in politics, in representation in Washington, means that things that were sort of in the plan now have to be looked at a second time. There is a concern that this initiative has to be pushed forward and has to get funded or we're going to lose our leadership position. The DOE has been planning this for the last couple of years, so this is not a new thing.

Is exascale development as predictable as people believe? Will exascale systems arrive in the 2018 time frame? In some sense, we've become so predictable, but that's only because we invested in a particular goal. If we don't have an exascale push in the country, it's not going to happen.

Is there any comparison between what's involved in reaching petascale with what's involved in reaching exascale? There was a period of time of about 15 years where the maximum level of parallelism in the biggest systems in the world really didn't change much. The biggest systems had tens of thousands of processors. We are now on an exponential ... like this [he points up], where Blue Gene has 200,000 [or] 300,000 cores now; the next version is going to have a million cores as we go up. The application codes need to be radically improved in order to take advantage of all this parallelism.

Are the programming languages developed for it? That is a big issue. If you were to go to 10 different big application folks and ask, "What's your programming model for the future?" you will see a lot of concern in their expression and maybe not a lot of certainty in their answer. The path to harvest all this parallelism and put it to use is not clear yet.

What can exascale systems accomplish? The key thing that people are looking at is moving from simulating and sort of understanding basic behavior to predictive simulation. What we want to be able to do is not just characterize a jet engine and understand how its combustion works but move aggressively to be able to predict the design of an engine that would get 20% better fuel efficiency and reduce our carbon emissions.

As we look to electric vehicles, all the technology hinges on the battery. If we can move from a basic manipulating of chemistry to predicting the optimal new battery designs, we could change to an electric vehicle economy. The single biggest impact on how we can change our everyday life is if we could move to eliminate the need for burning fossil fuels.

What can this do for national economic development? We are a country that loves to invent its way out of problems. When we see a problem, we like to find a solution that's inventive, that's creative, that's new. When I look at healthcare, transportation, generating power, basic materials, chemistry -- we want to be the country that invents solutions for these problems.

All those things require government funding because they involve basic sciences, more so than say 100 years ago, correct? This is something that many people don't understand. [At an earlier time] one guy could actually invent and do a bunch of stuff. Nowadays, it still can be one guy but he's on a pyramid that has millions of community-developed components and other pieces of technology that he is relying on.

To get really far advanced, to really push that state of the art, you are on top of a collaborative community of scientists. Science is done more and more in partnership with other people at universities, laboratories and industries and in other countries, and it really requires the government to keep invested.

The education piece is also key. Finding the postdocs and finding the students that come to work in our laboratories is becoming increasingly difficult.

Why is that? We are not producing enough high-quality science and technology Ph.D. students. When we open up a postdoc position for an expert in this particular computer science field, we have to look hard to find people. There is not an overflow of these people; it's a thin group.

I was at a workshop in Tennessee recently and a Ph.D. student gave a talk, and afterward several of us went up and asked, "Have you decided where you are going to go?" At lunch, he had three people courting him, trying to get him to come work for them.

How many are there like you in the U.S. devoting quality time to exascale development? In terms of people working full time on the exascale problem, I would say there are handfuls at best.

What do you want to see happen in the next year to give you confidence that this is moving ahead? The budgets in Washington have to get straightened out to actually fund this exascale initiative. And then we have to work very quickly to find the hardware partners that are capable of responding and doing this in partnership with the co-design centers and the software.

Is there competition with Europe, Asia? If you look at what's happening in China, there are countries that are realizing that building and educating in science and technology and engineering is what will make the difference with competitiveness 10 years from now.

The countries who win that effort to build and educate in science and technology will dominate the competitive landscape in the future. If you look at what's happening in China, they are making their investments appropriately with that strategic goal in mind.

If you look at Europe and what they are doing with their supercomputer centers, in some sense, they've already put money where their mouth is with their plans for exascale. If you look at the top 10 supercomputers right now, half are in foreign countries. This is a new thing for us -- half of the top 10 machines are in other places.

With this competition, is the timing good to get money for exascale? It may be good. However, we seem to always be a reactive country, and it would have been better to not be in this situation than to have to react to it. But I'll take reacting.

What we really need is to build and continue to maintain the expertise of designing and educating and bringing that whole package together. The reason, in some sense, I'm not too focused on the purchase part is that any country can buy [a supercomputer].

The question is, who designed the technology, software and the applications? Because that's the place where it matters to science and technology engineering in the U.S. Right now, we're still in the lead, but our competitors are working hard.

Patrick Thibodeau covers SaaS and enterprise applications, outsourcing, government IT policies, data centers and IT workforce issues for Computerworld. Follow Patrick on Twitter at @DCgov, or subscribe to Patrick's RSS feed. His e-mail address is pthibodeau@computerworld.com.
