Heathrow Blackout
Just before midnight on Thursday, March 20th, 2025, a major fire at the North Hyde substation precipitated a power shutdown that closed London’s Heathrow Airport. Power was restored on Friday and flights resumed that evening. Some 200,000 passengers had their travel disrupted across more than 1,300 flights, and 120 aircraft that were in the air at the time of the shutdown were diverted or returned to their departure points. How does this sort of chaos happen, and can it be prevented? Airports around the world need to review their risk analysis plans to find out.
Why no backup?
It is easy to lay blame and assume that major infrastructure, like airports, is fully protected by backup plans. So, what went wrong?
In truth, not a lot did go wrong as far as the airport’s backup plans are concerned. An airport the size of Heathrow (or Vancouver, or Toronto for that matter) is like a medium-sized city, with an electrical load running to many megawatts (MW). Generating power at that capacity does not come cheaply – a typical 10 MW diesel generator costs well over US$5m. Fully powering an airport would require at least one such unit, and power generated that way is much more expensive than utility power from the local grid, making it impossible to cost-justify even having such a system, let alone running it regularly.
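As a rough illustration of that economics argument, the sketch below compares the fuel cost of diesel generation with a typical grid tariff for a 10 MW load. The fuel price, consumption rate and tariff are assumed figures chosen for illustration only, not Heathrow data.

```python
# Rough comparison of diesel-generated power against grid power for a 10 MW load.
# All figures are illustrative assumptions, not Heathrow data.

DIESEL_PRICE_USD_PER_LITRE = 1.50   # assumed fuel price
DIESEL_LITRES_PER_KWH = 0.27        # assumed consumption of a large diesel genset
GRID_PRICE_USD_PER_KWH = 0.15       # assumed commercial grid tariff
LOAD_KW = 10_000                    # 10 MW load

diesel_cost_per_kwh = DIESEL_PRICE_USD_PER_LITRE * DIESEL_LITRES_PER_KWH
daily_diesel_cost = LOAD_KW * 24 * diesel_cost_per_kwh      # fuel only, ignoring capex and maintenance
daily_grid_cost = LOAD_KW * 24 * GRID_PRICE_USD_PER_KWH

print(f"Diesel: ~${diesel_cost_per_kwh:.2f}/kWh, grid: ~${GRID_PRICE_USD_PER_KWH:.2f}/kWh")
print(f"One day at 10 MW: diesel ~${daily_diesel_cost:,.0f} vs grid ~${daily_grid_cost:,.0f}")
```

On those assumed figures, diesel power costs roughly two to three times the grid price before the capital cost of the generator is even considered.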
Critical infrastructure like Heathrow is provided with several UPS (Uninterruptible Power Supply) systems, in which batteries keep priority loads powered for typically 20 minutes or so, allowing standby generators to start and take over the load. These generators have a capacity of less than 1 MW, most being around 500 kW. Priority loads include emergency lighting (for safety) and computer systems, plus some essential communications and, possibly, radar and runway lighting. Beyond that, other systems – baggage handling, main lighting, HVAC, shops and so on – all go dark and stop.
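A back-of-envelope way to see why the UPS only bridges the gap: its ride-through time is simply the usable battery energy divided by the protected load. The battery capacity, load and generator start time below are assumptions chosen to match the “roughly 20 minutes” figure, not actual equipment data.

```python
# Minimal sketch: how long a UPS battery carries a priority load before the
# standby generator must be online. All figures are illustrative assumptions.

battery_energy_kwh = 200        # assumed usable battery energy behind one UPS
priority_load_kw = 500          # assumed priority load (matches a ~500 kW generator)
generator_start_minutes = 2     # assumed time for the generator to start and accept load

ride_through_minutes = battery_energy_kwh / priority_load_kw * 60
margin_minutes = ride_through_minutes - generator_start_minutes

print(f"UPS ride-through: ~{ride_through_minutes:.0f} minutes")
print(f"Margin after generator start: ~{margin_minutes:.0f} minutes")
```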
Rather than relying on backup, most of the focus goes into resilience – making supplies ‘fault tolerant’. This means having more than one grid feed (often three), with physical separation between them. Failure of one feed simply trips out that component and the rest of the supply network takes up the load; apart from a loud bang and a brief voltage dip, users are none the wiser. Heathrow does indeed have multiple feeds – however, it seems that, for reasons yet to be explained, the diversity (or independence) was not what was expected. It has been suggested that the backup substation was temporarily being fed from North Hyde, so that the failure there took out both main and backup feeds.
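The value of genuine diversity can be put in rough numbers. If feeds are truly independent, the chance of losing them all at once is the product of the individual chances; if the ‘backup’ actually runs through the same substation as the main feed, that substation becomes a single point of failure. The probabilities in the sketch below are assumptions for illustration only.

```python
# Sketch: why genuine feed diversity matters. Probabilities are illustrative assumptions.

p_feed = 0.01        # assumed annual chance of losing any one independent feed
p_substation = 0.01  # assumed annual chance of losing a given substation

# Main and backup feeds genuinely independent: both must be lost together.
p_outage_independent = p_feed * p_feed      # ~1 in 10,000 per year

# Backup temporarily routed through the same substation as the main feed:
# losing that one substation takes out both feeds at once.
p_outage_shared = p_substation              # ~1 in 100 per year

print(f"Independent feeds: ~{p_outage_independent:.4f} per year")
print(f"Shared substation: ~{p_outage_shared:.2f} per year")
print(f"Loss of diversity raises the risk roughly {p_outage_shared / p_outage_independent:.0f}-fold")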
Why down so long?
Whilst it is easy to turn power off, the process of turning it back on is far from simple. Not all computer systems are UPS-backed, and there is a restart sequence that must be followed (hopefully one that is documented). Servers and network switches must all be running before the devices connected to them – including Internet of Things (IoT) devices – will work, and these days nearly everything is IoT or computer-controlled. Even starting cooling systems is not as simple as plugging in the fridge at home: after power is restored, the control systems need to confirm that condensed liquid has drained from the gas side of the chillers before the compressors can be safely started. Baggage in the vast conveyor system that is the lifeblood of the airport has to be collected manually and returned to the start so that the rebooted system ‘knows’ where everything is.
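That restart sequence is essentially a dependency-ordering problem: each system can only start once everything it depends on is already running. A minimal sketch of such an ordered restart is shown below; the system names and dependencies are invented for illustration.

```python
# Minimal sketch of a dependency-ordered restart. System names and
# dependencies are invented for illustration.
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each system maps to the systems that must already be running before it starts.
depends_on = {
    "core_servers": [],
    "network_switches": ["core_servers"],
    "chiller_controls": ["network_switches"],            # checks chillers have drained before compressors start
    "baggage_scanners": ["core_servers", "network_switches"],
    "baggage_conveyors": ["baggage_scanners"],            # motors only restart once tracking is back
    "departure_boards": ["network_switches"],
}

for system in TopologicalSorter(depends_on).static_order():
    print(f"start {system}")
```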
Power design at major sites tends to be built around classifying loads as either essential or not. Only essential loads are generator-backed, and only critical ones have UPS support as well. Often, the truth is that parts of most systems are essential – and a case could be made that those parts should be separately powered in order to speed up rebooting. If the baggage system’s computers and scanners were kept powered, the high-powered conveyor motors could stop and later restart with the system still knowing where all the bags were. The downside, and there is always one, is that this makes the systems and their power infrastructure more expensive.
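One way to picture that design choice is as a tiering exercise, with each subsystem assigned to UPS-plus-generator, generator-only or grid-only power. The classification below is hypothetical, purely to illustrate the point that the baggage system’s control computers could sit in a higher tier than its conveyor motors.

```python
# Hypothetical power-tier classification, for illustration only.
# Tier 1: UPS + generator, Tier 2: generator only, Tier 3: grid only (goes dark in an outage).

power_tiers = {
    "emergency_lighting":           "tier 1: UPS + generator",
    "core_it_and_communications":   "tier 1: UPS + generator",
    "baggage_control_and_scanners": "tier 1: UPS + generator",  # keeps track of every bag through the outage
    "runway_lighting":              "tier 2: generator only",
    "baggage_conveyor_motors":      "tier 3: grid only",        # high power; stops, then restarts in a known state
    "main_lighting_hvac_shops":     "tier 3: grid only",
}

for load, tier in power_tiers.items():
    print(f"{load}: {tier}")
```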
What should be done?
In the end it becomes a cost-benefit exercise.
It is not possible to design infrastructure that never fails. We can design to reduce the likelihood of failure by any factor we choose, but the cost of doing so increases as that factor increases. Yes, the loss of Heathrow for a day cost millions – but would users pay increased fees to reduce the already low risk of failure still further? In the end, how low should we push that risk? Heathrow is approaching 80 years old, and this is the first substation fire that has closed operations for a day.
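The trade-off can be sketched as an expected-value calculation: the annual probability of a closure multiplied by its cost, set against the annual cost of pushing that probability lower. Every figure below is an assumption made purely for illustration.

```python
# Expected-value sketch of the cost-benefit trade. Every figure is an assumption.

cost_of_closure = 60_000_000     # assumed cost of one day-long closure, USD
p_closure_now = 1 / 80           # roughly one closing event in ~80 years of operation
p_closure_improved = 1 / 400     # assumed risk after extra resilience investment
extra_annual_cost = 2_000_000    # assumed annual cost of that extra resilience

expected_loss_now = p_closure_now * cost_of_closure
expected_loss_improved = p_closure_improved * cost_of_closure
annual_benefit = expected_loss_now - expected_loss_improved

print(f"Expected annual loss today:     ~${expected_loss_now:,.0f}")
print(f"Expected annual loss after fix: ~${expected_loss_improved:,.0f}")
print(f"Annual benefit ~${annual_benefit:,.0f} vs annual cost ~${extra_annual_cost:,.0f}")
```

On those assumed numbers, the extra resilience costs more each year than the expected loss it avoids, which is precisely the kind of judgement a risk analysis has to make explicit.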
Certainly, there are lessons to be learned, and the UK government is keen to learn and implement any needed changes. Never failing is not a commercial reality – but getting back up and running faster may well be a useful target. Telling approaching aircraft to hold for 30 minutes while the airport reboots might be acceptable; they cannot hold for a day.
Not taking assumed diversity on trust is probably the most important lesson to be learned – and it applies to telecommunications, including the internet, as well as to power. A backup is only of use if it works when the primary fails.
It is not thought that the fire at North Hyde was a terrorist act – but it is certain that terrorists and other ‘bad actors’ will have been impressed by the consequences of a ‘simple’ fire. Next time we may not be so lucky.
If you’d like to discuss how TMC can help you independently assess the risk of failure that you face, or to comment on this article, please email me at ellen.
This article was published in the May 2025 edition of The TMC Advisor – ISSN 2369-663X, Volume 12, Issue 2.
©2025 TMC Consulting