By W. Pitt Turner IV, P.E.,
John H. (Hank) Seader, P.E. and Kenneth G. Brill
One of the most common sources of confusion
in the field of uninterruptible uptime is what
constitutes a reliable data center. All too
often, reliability is in the eye of the
beholder: what is acceptable to one person or
company is inadequate to the next. Competing
companies with data centers of radically
different infrastructure capabilities are all
claiming to deliver high availability.
With the continuously increasing pressure on
high availability and the explosive growth of
the Internet comes an increased demand for
computer hardware reliability. Information
technology customers expect availability of
“Five Nines” or 99.999%. Unfortunately, the
substantial investment a business frequently
makes to achieve Five Nines, in its computer
hardware and software platforms, is likely to be
insufficient unless matched with a complementary
site infrastructure (power, cooling, and other
environmental support systems) that can support
its availability goals.
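One way to see why the site matters is to treat
the site infrastructure and the IT platform as
two elements in series, so that the combined
availability is roughly the product of the two.
The sketch below assumes independent failures
and uses illustrative figures; the function name
is hypothetical and is not taken from The
Institute.

```python
def combined_availability(*availabilities: float) -> float:
    """Availability of elements in series, assuming independent failures."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

# Illustrative figures only: a Five Nines IT platform on a Four Nines site.
it_platform = 0.99999
site = 0.9999
overall = combined_availability(it_platform, site)
print(f"{overall:.4%}")
```

Under these assumptions the combined figure can
never exceed the weaker of the two inputs, so
the site sets the ceiling.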
The Uptime Institute, Inc.® (The Institute)
developed a tiered classification approach to
site infrastructure functionality that addresses
the need for a common benchmarking standard. The
Institute’s system has been in practice for 10
years. It includes actual measured availability
figures for site availability ranging from
99.67% to more than 99.99%. It is important to
note that this range of availability is
substantially less than the current Information
Technology (IT) expectations for Five Nines,
which leads to the conclusion that site
availability gates overall IT availability.
Over the last 40 years, data center infrastructure
designs have evolved through at least four
distinct stages, which are captured in The
Institute’s classification system. Tier I first
appeared in the early 1960s, Tier II in the
1970s, Tier III in the late 1980s and early
’90s, and Tier IV in 1994 with the United Parcel
Service Windward project, which was the first
site to assume the availability of dual-powered
computer equipment. The Institute participated
in the development of Tier III concepts and
pioneered the creation of Tier IV.
Invention of Tier IV was made possible by Ken
Brill, Executive Director of The Institute, who,
in 1991, envisioned a future when all computer
hardware would come with dual power inputs (US
Patent 6,150,736). During construction of the
$50 million Windward project, United Parcel
Service worked with IBM and other computer
hardware manufacturers to provide dual-powered
computer hardware. The significance of Mr.
Brill’s insight has subsequently been confirmed
by billions of dollars in site infrastructure
investment.
Dual-power technology requires at least two
completely independent electrical systems. These
dual systems supply power via diverse power
paths to the computer load, by effectively
moving the last point of electrical redundancy
from the Uninterruptible Power Supply (UPS)
system downstream to a point inside the computer
hardware itself. Mr. Brill’s intuitive
conclusion has since been confirmed by The
Institute’s research that has determined that
98% of all site infrastructure failures occur
between the UPS and the computer load. Since
completion of the Windward project in 1994,
System plus System℠ (S+S) Tier IV electrical
designs have become common and the number of
computer hardware projects with dual inputs has
grown.
The advent of dual-powered computer hardware
in tandem with Tier IV electrical infrastructure
is an example of site infrastructure design and
computer hardware design simultaneously
achieving higher availability. Even with the
significant improvements in computer hardware
design made over the past 10 years, many data
centers constructed in the last five years are
falling behind in their capability to match the
availability required by the information
technology they support. Some of them claim Tier
IV functionality even today, but actually deliver
only Tier I, II, or III. The purpose of this
paper is to outline what functionality and
attributes are required for the different tier
levels.
Defining the Tiers
The tier classification system involves several
definitions. A site that can sustain at least
one unplanned, worst-case infrastructure failure
with no critical load impact is considered fault
tolerant. A site that is able to perform planned
site infrastructure activity without shutting
down critical load is considered concurrently
maintainable (fault tolerance level may be
reduced during concurrent maintenance). It is
important to remember that a typical data center
site is composed of at least 20 major
mechanical, electrical, fire protection,
security and other systems, each of which has
additional subsystems and components. All of
these must be concurrently maintainable and/or
fault tolerant for the site as a whole to be
considered concurrently maintainable and/or
fault tolerant.
Some sites built with fault-tolerant S+S
electrical concepts failed to incorporate the
mechanical analogy, which involves dual
mechanical systems. Such sites are classified
Tier IV electrically, but only achieve a Tier
III mechanically. Another common mistake is only
looking at first level failures and not the
subsequent failures that will sometimes be
triggered by the first failure.
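The weakest-link idea in the two paragraphs
above can be sketched in a few lines. This is a
minimal illustration, assuming each subsystem has
already been given a tier rating and that the
site can claim no more than its lowest-rated
subsystem; the function name and the ratings are
hypothetical.

```python
def site_tier(subsystem_tiers: dict[str, int]) -> int:
    """Overall rating is limited by the lowest-rated subsystem (assumed rule)."""
    if not subsystem_tiers:
        raise ValueError("at least one subsystem rating is required")
    return min(subsystem_tiers.values())

# Hypothetical site: S+S electrical design, single mechanical system.
ratings = {"electrical": 4, "mechanical": 3}
print(site_tier(ratings))  # prints 3
```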
The following list summarizes the high level
characteristics of each tier. The availability
figures shown are actual measured values from
many sites that combine the tier requirements
with the associated tier attributes.
- Tier I
Tier I is composed of a single path for
power and cooling distribution, without
redundant components, providing 99.671%
availability.
- Tier II
Tier II is composed of a single path for
power and cooling distribution, with
redundant components, providing 99.741%
availability.
- Tier III
Tier III is composed of multiple power and
cooling distribution paths, of which only one
is active, has redundant components, and is
concurrently maintainable, providing 99.982%
availability.
- Tier IV
Tier IV is composed of multiple active power
and cooling distribution paths, has
redundant components, and is fault tolerant,
providing 99.995% availability.
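To make the percentages above more tangible,
they can be converted into hours of downtime per
year. The sketch below is plain arithmetic on the
four figures in the list, assuming an 8,760-hour
year; the names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; leap years ignored for simplicity

# Availability percentages as quoted in the tier list above.
TIER_AVAILABILITY = {
    "Tier I": 99.671,
    "Tier II": 99.741,
    "Tier III": 99.982,
    "Tier IV": 99.995,
}

def annual_downtime_hours(availability_percent: float) -> float:
    """Hours of downtime per year implied by an availability percentage."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR

for tier, pct in TIER_AVAILABILITY.items():
    print(f"{tier}: {annual_downtime_hours(pct):.1f} hours/year")
```

This works out to roughly 28.8, 22.7, 1.6, and
0.4 hours per year for Tiers I through IV.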
This chart illustrates tier requirements:

This chart illustrates the tier attributes
of the sites from which the actual availability
numbers were drawn:

Tier I Data Center Infrastructure
Basic Data Center
A Tier I data center is susceptible to
disruption from both planned and unplanned
activity. It has computer power distribution and
cooling, but it may or may not have a raised
floor, a UPS, or an engine generator. The
critical load on these systems is up to 100% of
N. If it does have UPS or generators, they are
single-module systems and have many single
points-of-failure. The infrastructure should be
completely shut down on an annual basis to
perform preventive maintenance and repair work.
Urgent situations may require more frequent
shutdowns. Operation errors or spontaneous
failures of site infrastructure components will
cause a data center disruption.
Tier II Data Center Infrastructure
Redundant Components
Tier II facilities with redundant components are
slightly less susceptible to disruptions from
both planned and unplanned activity than a basic
data center. They have a raised floor, UPS, and
engine generators, but their capacity design is
N+1 with a single-wired distribution path
throughout. Critical load is up to 100% of N.
Maintenance of the critical power path and other
parts of the site infrastructure will require a
processing shutdown.
Tier III Data Center Infrastructure
Concurrently Maintainable
Tier III level capability allows for any planned
site infrastructure activity without disrupting
the computer hardware operation. Planned
activities include preventive and programmable
maintenance, repair and replacement of
components, addition or removal of capacity
components, testing of components and systems,
and more. For large sites using chilled water,
this means two independent sets of pipes.
Sufficient capacity and distribution must be
available to simultaneously carry the load on
one path while performing maintenance or testing
on the other path. Unplanned activities such as
errors in operation or spontaneous failures of
facility infrastructure components will still
cause a data center disruption. The critical
load on a system does not exceed 90% of N. Many
Tier III sites are designed with planned
upgrades to Tier IV when the client’s business
case justifies the cost of additional
protection. The acid test for a concurrently
maintainable data center is the ability to
accommodate any planned work activity without
disruption to computer room processing.
Tier IV Data Center Infrastructure
Fault Tolerant
Tier IV provides site infrastructure capacity
and capability to permit any planned activity
without disruption to the critical load.
Fault-tolerant functionality also provides the
ability of the site infrastructure to sustain at
least one worst-case, unplanned failure or event
with no critical load impact. This requires
simultaneously active distribution paths,
typically in S+S configuration. Electrically,
this means two separate UPS systems in which
each system has N+1 redundancy. The combined
critical load on a system does not exceed 90% of
N. Because of fire and electrical safety codes,
there will still be downtime exposure due to
fire alarms or persons initiating an Emergency
Power Off (EPO). Tier IV requires that all
computer hardware have dual power inputs as
defined by
The Institute’s Fault Tolerant Power Compliance
Specifications Version 2.0, which can be found
at www.uptimeinstitute.org. The acid test for a
fault tolerant data center is the ability to
sustain an unplanned failure or operations error
without disrupting computer room processing. In
consideration of this acid test,
compartmentalization requirements must be
addressed.
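The 90% loading rule can be read as a simple
capacity check. The sketch below assumes an S+S
arrangement in which either system must be able
to carry the combined critical load alone, so the
sum of both sides' loads is compared against 90%
of a single system's capacity N. The function and
parameter names are hypothetical, and this is one
reading of the rule rather than The Institute's
published method.

```python
def within_tier_iv_loading(load_a_kw: float, load_b_kw: float,
                           capacity_n_kw: float, limit: float = 0.90) -> bool:
    """True if the combined load fits within `limit` of one system's capacity N."""
    if capacity_n_kw <= 0:
        raise ValueError("capacity must be positive")
    return (load_a_kw + load_b_kw) <= limit * capacity_n_kw

# Hypothetical loads against a 500 kW system (90% limit is 450 kW).
print(within_tier_iv_loading(200, 220, 500))  # 420 kW combined: True
print(within_tier_iv_loading(250, 230, 500))  # 480 kW combined: False
```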
This chart illustrates how these ideas are
mapped over the architecture of site
infrastructure:

Solving Incompatible “Five Nines” Expectations
Even a fault tolerant and concurrently
maintainable Tier IV site will not satisfy an IT
requirement of Five Nines (99.999%) uptime. The
best a Tier IV site can deliver over time is
99.995%. This assumes a site outage occurs only
as a result of a fire alarm or EPO and that such
an event occurs not more than once every five
years. Only the top 10 percent of Tier IV sites
will achieve this level of performance. Unless
human activity issues are continually and
rigorously addressed, at least one additional
failure is likely over five years. While the
site outage is assumed to be instantaneously
restored (which requires “24 by forever”
staffing), it can still require up to four hours
for IT to recover information availability.
Tier IV’s 99.995% uptime is an average
calculated over five years. An alternative
calculation using the same underlying data is
100% uptime for four years and 99.954% for the
year in which the downtime event occurs.
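The single-year figure can be checked with a
little arithmetic. The sketch below assumes the
event costs the four hours of IT recovery noted
above and that a year has 8,760 hours; the
function name is illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; leap years ignored for simplicity

def availability_percent(downtime_hours: float, period_hours: float) -> float:
    """Availability over a period, expressed as a percentage."""
    if period_hours <= 0:
        raise ValueError("period must be positive")
    return (1 - downtime_hours / period_hours) * 100

# Assumption: one event costing four hours in the affected year.
event_year = availability_percent(4, HOURS_PER_YEAR)
print(f"{event_year:.3f}%")  # prints 99.954%
```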
Higher levels of site uptime can be achieved
by protecting against accidental activation of,
or the real need for, fire protection and EPOs.
Preventatives include high sensitivity smoke
detection, limiting fire load, signage,
extensive training, staff certification, limited
number of non-staff in critical spaces, and
treating employees and contracted staff well to
increase pride in their work. All of these
measures, if taken, can reduce the risk of
failures.
Other solutions include placing the redundant
parts of the IT computing infrastructure in
different site infrastructure compartments so
that a site infrastructure event cannot
simultaneously affect all IT systems. Another
alternative is focusing special effort on
business-critical and mission-critical
applications so they do not require four hours
to restore. These operational issues can improve
the availability offered by any data center, and
are particularly important in a Four Nines Tier
IV data center housing IT equipment that
requires Five Nines availability.
Authorship
Pitt Turner is a professional engineer, a
distinguished fellow of The Institute, and a
Principal in ComputerSite Engineering, Inc.® He
has guided more than $1.6 billion in site
infrastructure investment, primarily for Fortune
50 clients.
Hank Seader developed the original
idea for the Tier concept. At the time, he was a
facility manager for a major data center and
wanted a simple way to convey complex
reliability concepts to his senior management.
Currently, Hank is a member of the ComputerSite
Engineering team.
Ken Brill is Executive Director of The
Institute, and a Principal in ComputerSite
Engineering. He is the founder of the Site
Uptime Network® and the inventor of dual power
distribution technology for high availability
data centers.
This article was originally posted on The
Institute (www.uptime.com).