TMC's Advisor

The Advisor is published by TMC

Soft-Fail Designs

Are your systems vulnerable to failure modes you have not considered? Will your datacentre cooling protect your servers during a heatwave? Can your phone system or data network cope with a traffic overload? In short – do you design for “soft failure” or do you cross your fingers and hope that you’ll never face an “unexpected” disaster? Here are some examples of how to design in soft failure modes.

By Kristin Kiewitz

Kristin Kiewitz is a business analyst and researcher.

Failures Can Escalate

In a related article, Peter looked at how everyday failures can escalate into disasters and how this can be avoided by designing in “soft failure” modes.

Data Centre Cooling

The cooling system in a data centre is built to handle a specific heat load – say 100 kW. As long as the heat generated by the equipment, added to the heat entering the centre from the surrounding building, stays within the design assumptions, there is no problem. However, when a heatwave strikes – something that is becoming uncomfortably common – the HVAC system may not be able to keep up, and the data centre temperature will rise.
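To make the arithmetic concrete, here is a minimal sketch of a cooling headroom check. The 100 kW capacity comes from the example above; the equipment load and building heat gain figures are assumptions chosen purely for illustration, and the function name is hypothetical.

```python
def cooling_headroom_kw(capacity_kw, equipment_kw, building_gain_kw):
    """Return spare cooling capacity in kW.

    A negative result means the total heat load exceeds what the
    cooling system was designed for, so the room will warm up.
    """
    return capacity_kw - (equipment_kw + building_gain_kw)


# Illustrative figures only (assumed, not measured):
# 80 kW of equipment heat, with 10 kW of building heat gain on a
# normal day and 30 kW during a heatwave.
normal_day = cooling_headroom_kw(100.0, 80.0, 10.0)
heatwave = cooling_headroom_kw(100.0, 80.0, 30.0)

print(f"Normal day headroom: {normal_day} kW")
print(f"Heatwave headroom:   {heatwave} kW")
```

With these assumed numbers the system has 10 kW to spare on a normal day but is 10 kW short in a heatwave, which is the point at which a Plan B is needed.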

If you ignore the risk and the temperature rises too high, your equipment will fail. If you make a soft-fail plan, a Plan B, you might choose from various options:

Indoor Flooding

Water may be your friend when you want to put out a fire, but water and electrics don’t mix – nor do paper and water. Flooded basements in Calgary during the last Bow River floods showed that it may not be wise to store paper archives below ground where there is a flood risk. Paper files should be stored on racks above ground level so small amounts of water on the floor do not cause problems. You might also consider putting more valuable items into waterproof containers rather than just cardboard boxes.

Remember that floods are only one way that water can be a problem. Burst pipes, leaking roofs and backed-up drains can have the same effect. Cooling systems condense water from the air as it cools – and that condensate has to go somewhere. Often, water ingress is slow, and moisture detectors are a good line of defence. Sprinkler systems in archive rooms and data centres should be of the dry, pressurized type so that a slow leak is detected before it becomes an issue. Good drainage is also important, so that water has somewhere to go rather than pooling.

Backup Power

Is your backup power plan up to the job? If the generator fails to auto-start, is there sufficient UPS cover and a manual start procedure in place? Are you able to couple in a temporary generator quickly (and do you know where to get one?), to shed load to extend UPS cover, or, ultimately, to shut down your systems in a controlled manner?
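As a rough illustration of how shedding load extends UPS cover, the sketch below drops the least critical loads until a target runtime is reached. It assumes runtime scales linearly with load, which real batteries only approximate, and every load name, figure and priority is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    kw: float
    priority: int  # 1 = most critical; higher numbers are shed first


def runtime_minutes(usable_kwh, loads):
    """Estimate UPS runtime, assuming a simple linear energy model."""
    total_kw = sum(load.kw for load in loads)
    if total_kw == 0:
        return float("inf")
    return usable_kwh / total_kw * 60


def shed_until(usable_kwh, loads, target_minutes):
    """Shed the least critical loads until the target runtime is met."""
    kept = sorted(loads, key=lambda load: load.priority)
    shed = []
    while kept and runtime_minutes(usable_kwh, kept) < target_minutes:
        shed.append(kept.pop())  # the last item is the least critical
    return kept, shed


# Hypothetical loads and battery capacity, for illustration only.
loads = [
    Load("core servers", 10.0, 1),
    Load("storage", 5.0, 2),
    Load("dev/test racks", 5.0, 3),
]
before = runtime_minutes(15.0, loads)
kept, shed = shed_until(15.0, loads, target_minutes=60)
after = runtime_minutes(15.0, kept)
print(f"{before:.0f} min before, {after:.0f} min after shedding "
      f"{[load.name for load in shed]}")
```

In this example, shedding the dev/test racks stretches the estimated runtime from 45 to 60 minutes. If even the most critical loads cannot meet the target, the remaining option is a controlled shutdown.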

Towers

You may have radio transmission or cell site structures on your property. Some towers use guy systems for support, and ice build-up can develop on them, even at ground level, with the potential to cause injury or property damage. A good example of unanticipated icing is the first snowfall after the opening of the Port Mann Bridge in Vancouver.

The simplest solution is to make sure that people cannot get access to areas where falling ice might be an issue, and to post warning signs. If you can’t do that, then you may have to work out how to prevent ice build-up, or implement de-icing procedures, as is done for aircraft before takeoff in winter conditions.

Look Around

When you look around and imagine worst-case failure possibilities, you can anticipate potential disasters in various areas of your infrastructure and facilities. You may choose to do an element of redesign to circumvent the problem, like the vortex dampers on tall radio masts, or you may decide that it’s better to manage potential failures, like building capacity throttling into your upstream networks.
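One common way to throttle capacity is a token bucket, sketched below. This is an illustrative technique rather than one the article prescribes: the class name, rate and burst size are assumptions, and the clock is passed in as a plain number of seconds so that the example is easy to follow.

```python
class TokenBucket:
    """Allow short bursts but cap the sustained request rate."""

    def __init__(self, rate_per_s, burst):
        self.rate = rate_per_s    # tokens added per second
        self.capacity = burst     # maximum tokens that can be stored
        self.tokens = float(burst)
        self.last = 0.0           # time of the previous request

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over capacity: reject or queue instead of failing


bucket = TokenBucket(rate_per_s=2, burst=3)
first_burst = [bucket.allow(0.0) for _ in range(5)]
one_second_later = [bucket.allow(1.0) for _ in range(3)]
print(first_burst)
print(one_second_later)
```

Excess requests are turned away at the edge rather than being allowed to overload everything downstream, which is the soft-fail behaviour you want. In a live system the `now` value would typically come from a monotonic clock.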

If you’d like to discuss potential failure modes of your systems, or to comment on this article, please email me at .

This article was published in the July 2024 edition of The TMC Advisor
- ISSN 2369-663X, Volume 11, Issue 5

©2024 TMC Consulting