Chaos That Brings Order
Large computing systems, including supercomputers, cloud clusters and server farms, are built
using thousands of hybrid nodes, each with multicore central processing units (CPUs) and possibly
graphics processing unit (GPU) accelerators. Currently, they’re experiencing frequent hardware
component faults and, somewhat unexpectedly, it has become a major challenge to ensure accurate,
error-free computations on them, particularly as progress is made toward exascale systems.
Oak Ridge National Laboratory’s DUCCS is ultra-efficient software that
utilizes highly parallel chaotic map computations to quickly (in a few minutes)
and efficiently detect component faults in computing units, memory elements
and interconnects of hybrid CPU-GPU computing systems. This transportable
software is based on an original, creative design that combines the chaotic map
theory from physics and mathematics with the advanced programming software of CPU-GPU systems.
Information about detected faults can be used to work around or replace faulty parts, and to make
applications resilient by supporting checkpoint recovery and migration to fault-free zones.
◗ Oak Ridge National Laboratory, www.ornl.gov
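The reason chaotic maps suit fault detection can be illustrated with a small sketch. This is not the DUCCS implementation, whose internals are not described here; it is a minimal illustration that assumes the logistic map as the chaotic map, simulates the "units" as repeated function calls on one machine, and mimics a fault with a hypothetical one-time relative error of 1e-12. The function names and parameters are invented for this example. Because a chaotic map amplifies tiny differences exponentially, even a single small error leaves the faulty unit's final value visibly different from the others.

```python
from collections import Counter


def logistic_trajectory(x0, steps, r=4.0, fault_at=None, fault_scale=1e-12):
    """Iterate the logistic map x -> r*x*(1-x) and return the final state.

    If fault_at is given, the state is perturbed once at that step by a
    tiny relative error, standing in for a transient hardware fault
    (an assumption made purely for illustration).
    """
    x = x0
    for step in range(steps):
        x = r * x * (1.0 - x)
        if step == fault_at:
            x *= 1.0 - fault_scale  # hypothetical one-time error
    return x


def find_faulty_units(results):
    """Return indices of units whose result differs from the majority value."""
    majority_value, _ = Counter(results).most_common(1)[0]
    return [i for i, value in enumerate(results) if value != majority_value]


if __name__ == "__main__":
    # Four simulated units run the same computation; unit 2 suffers one fault.
    results = [logistic_trajectory(0.3, 200) for _ in range(4)]
    results[2] = logistic_trajectory(0.3, 200, fault_at=50)
    print("Units flagged as faulty:", find_faulty_units(results))
```

Healthy units that run identical deterministic arithmetic produce bit-identical results, so an exact comparison against the majority value is enough in this sketch. A real system would run the trajectories on separate cores, GPUs, memory regions and interconnect paths rather than in one process.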
Supercomputing provides the foundation for numerical modeling and simulations, which permit scientists to gain new insights into a range of topics.
However, as high-performance computing (HPC) systems scale up by orders
of magnitude, constraints on energy consumption and heat dissipation impose
limitations on HPC systems and the facilities in which they’re housed.
Hewlett-Packard and National Renewable Energy Laboratory’s HP Apollo
supercomputing platform approaches HPC from an entirely new perspective as the
system is cooled directly with warm water. This is done through a “dry-disconnect”
cooling concept that has been implemented with the simple but efficient use of
heat pipes. Unlike cooling fans, which are designed for maximum load, the heat
pipes can be optimized by administrators. Relative to a strictly air-cooled system, the approach
allows significantly greater performance density, cuts energy consumption in half and creates
synergies with other building energy systems. The warm-water cooling eliminates the need for
expensive datacenter chillers and heats the water to 113 °F, allowing it to help meet building
heating loads.
In the platform’s first installation, at NREL’s Energy Systems Integration Facility (R&D
Magazine’s 2014 Laboratory of the Year Award winner), HP Apollo servers are used to heat office
space. The facility has achieved a power usage effectiveness (PUE) rating of 1.06.
◗ National Renewable Energy Laboratory, www.nrel.gov
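To put the figures above in context, the short sketch below works through the arithmetic. It assumes the standard definition of PUE as total facility energy divided by the energy delivered to IT equipment, so a PUE of 1.06 implies roughly 0.06 units of overhead (cooling, power delivery and so on) for each unit of IT energy. It also converts the quoted 113 °F water temperature to Celsius. The function names are illustrative only.

```python
def pue_overhead_fraction(pue):
    """Return non-IT energy per unit of IT energy, given a PUE value.

    Assumes PUE = total facility energy / IT equipment energy, so the
    overhead per unit of IT energy is PUE - 1.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0 under this definition")
    return pue - 1.0


def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


if __name__ == "__main__":
    overhead = pue_overhead_fraction(1.06)
    print(f"Overhead per unit of IT energy: {overhead:.2f}")
    print(f"113 F in Celsius: {fahrenheit_to_celsius(113.0):.1f}")
```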