of handling the growing volume of scientific
data, research institutions must find ways to
blend different storage technologies.
High-performance storage, like flash or
high-speed disk, is needed to meet high-performance computing requirements. But at
any given time, only a subset of data is active
and needs to reside on high-performance
media. Storing inactive files on the same
media is unnecessary and expensive.
A better approach is to implement
multiple tiers of storage. In a multi-tier
environment, total storage capacity is broken
into different forms of media. There is high-performance disk or flash storage for active
files—those files that are part of an active
project or are undergoing computational
analysis. The remainder of the capacity
consists of tape or cloud storage.
Some research institutions have
successfully realized this approach. GWDG,
for example, uses a multi-tier storage
infrastructure. Of the 7 petabytes of data
managed by the organization, only 2.5
petabytes reside on disk. The remaining
4.5 petabytes are stored on tape. Since tape
storage is more economical than disk, this
approach allows GWDG to deliver the
performance and capacity needed at a lower
cost to the organization.
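To make the economics concrete, here is a rough back-of-the-envelope comparison in Python. The article gives only the 2.5 PB disk / 4.5 PB tape split; the per-terabyte prices below are hypothetical assumptions for illustration, not figures from GWDG or Quantum.

```python
# Illustrative cost comparison for a tiered layout like GWDG's
# (7 PB total: 2.5 PB on disk, 4.5 PB on tape).
# ASSUMED prices -- placeholders, not real market or vendor figures.
DISK_COST_PER_TB = 30.0   # assumed $/TB for high-performance disk
TAPE_COST_PER_TB = 5.0    # assumed $/TB for tape

TOTAL_PB = 7.0
DISK_PB = 2.5
TAPE_PB = TOTAL_PB - DISK_PB  # 4.5 PB

def cost(pb_on_disk: float, pb_on_tape: float) -> float:
    """Total media cost in dollars; 1 PB = 1000 TB."""
    return (pb_on_disk * 1000 * DISK_COST_PER_TB
            + pb_on_tape * 1000 * TAPE_COST_PER_TB)

tiered = cost(DISK_PB, TAPE_PB)      # 97,500
all_disk = cost(TOTAL_PB, 0.0)       # 210,000
print(f"tiered: ${tiered:,.0f}  all-disk: ${all_disk:,.0f}  "
      f"savings: {1 - tiered / all_disk:.0%}")
```

Even with these rough assumed prices, moving the inactive majority of data to tape cuts the media cost by more than half, which is the economic argument behind tiering.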
Data management in a multi-tier environment
Data management is the key to getting the most benefit from a multi-tier storage environment. As previously
mentioned, data has a lifecycle. On average,
about 70-80 percent of data files stored are
not actively used. As files age or become
inactive, they should be moved off higher-priced storage and archived on a lower-cost tier.
With a complex storage environment,
data management can be cumbersome.
Fortunately, data management processes can
be automated. Policies can be established at
the file level, and the movement of files into
archive can be done without the researcher
even being aware of it. Managed this way,
data files look the same from the researcher’s
perspective regardless of where they are
stored. As a result, files remain visible and
accessible when they are needed.
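As a rough illustration of such a policy, the Python sketch below moves files that have not been accessed for a set number of days to an archive path and leaves a symlink behind, so the original path still resolves from the researcher's perspective. Real tiered-storage systems do this transparently inside the file system; the function name, paths, and 90-day threshold here are illustrative assumptions, not any particular product's behavior.

```python
"""Minimal sketch of an age-based archiving policy (assumptions, not a
real HSM implementation)."""
import shutil
import time
from pathlib import Path

def archive_inactive(active: Path, archive: Path,
                     age_limit_days: int = 90, now=None):
    """Move files untouched for age_limit_days from the active tier to
    the archive tier, leaving a symlink so the original path still works."""
    now = time.time() if now is None else now
    moved = []
    # Materialize the listing first, since we modify the tree while walking it.
    for f in list(active.rglob("*")):
        if not f.is_file() or f.is_symlink():
            continue
        age_days = (now - f.stat().st_atime) / 86400
        if age_days > age_limit_days:
            dest = archive / f.relative_to(active)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(dest))
            f.symlink_to(dest)  # original path keeps resolving
            moved.append(f)
    return moved
```

The symlink stands in for the "stub" that commercial tiering software leaves behind: the file's location changes, but its name and path do not, so it stays visible and accessible when needed.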
Data management in a multi-tier storage
environment also helps ensure data is
protected. Leveraging multiple tiers, policies
can be established so that critical data
sets are copied to another disk array or to
another form of media like tape or cloud.
That way data is protected and preserved,
so it can be restored quickly in case of a
hardware failure and the research process is not interrupted.
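A protection policy of this kind could be sketched as follows. The `protect` function, the criticality flag, and the secondary targets are illustrative assumptions, not a real product API: critical data sets get an extra copy on each secondary target (such as a tape or cloud mount point), while non-critical data stays single-copy on its primary tier.

```python
"""Sketch of a policy that keeps extra copies of critical data sets on
other tiers (hypothetical names and paths, for illustration only)."""
import shutil
from pathlib import Path

def protect(dataset: Path, copy_targets: list, critical: bool):
    """Copy a critical dataset directory to each secondary target;
    non-critical data is left with its single primary copy."""
    if not critical:
        return []
    copies = []
    for target in copy_targets:
        dest = Path(target) / dataset.name
        shutil.copytree(dataset, dest, dirs_exist_ok=True)
        copies.append(dest)
    return copies
```

Because the copies land on independent media, a failure of the primary disk array does not take the only copy of a critical data set with it.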
Keeping data usable in a high-growth environment
More scientific data is helping researchers
to uncover new discoveries. But as more data
is generated and data storage environments
become larger, research institutions must pay
attention to how they manage the growth
of their storage infrastructure in order to
deliver the best performance possible in the
most economical way.
Mark Pastor is director of data intelligence
solutions at Quantum. He is responsible
for driving Quantum’s data intelligence
and storage solutions for high performance
computing, AI, research and other large
unstructured data environments.