Midnight in Chernobyl or how small mistakes can lead to great disasters

Once in a while I pick up a book about something outside of my expertise. As a kid I lived ca. 200 km from Chernobyl, where the biggest nuclear disaster happened in April 1986 (I actually remember that day and the few days after). The book caught my interest because of its subtitle – the untold story of the nuclear disaster. Admittedly, I wanted to know what the disaster looked like from the side of the operators.

No one really knows what the long-term effects of the disaster are (after all, 30+ years is not such a long time), but it’s interesting to see how the disaster happened and what we can learn from it in software engineering.

So, in short, the disaster happened because of a combination of factors.

First, the design of the reactor was flawed. The mix of substances used in the reactor has certain properties that raise the reactor's power when they should lower it, or raise it when the reactor is not monitored constantly.

Second, the implementation of the design, that is, the construction of the power plant, was not great either. Materials of lower specs were used due to shortages in the USSR. The workers did not care much about state property, and the 5-year plans trumped safety, security measures and even common sense.

Third, and no less important, the plant was not operated according to the instructions. The operators did not follow the instructions for the test that they were about to commence. They reduced the power below the limit and then executed the test. Instead, they should have stopped the reactor and run the test during the next available window.

So, what does this have to do with software engineering? There was no software malfunction, only a set of human errors.

IMHO, this accident teaches us about the importance of safety mechanisms in software. I believe that many of us who design software do not think much about the potential implications of what we do. We get a set of requirements, which we implement. However, what we should do is look more broadly at how users can use our system and at how we can prevent any potential disaster.
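One way to picture such a safety mechanism is a guard that refuses to start a risky operation when its preconditions do not hold, instead of trusting the user to check them. The sketch below is only an illustration: the names `Process`, `run_test`, `InterlockError` and the threshold value are made up for this post and do not describe any real control system.

```python
from dataclasses import dataclass


class InterlockError(Exception):
    """Raised when an operation is requested outside its safe envelope."""


@dataclass
class Process:
    # Hypothetical state: current load as a percentage of nominal.
    load_percent: float


# Assumed limit, chosen for illustration only.
MIN_SAFE_LOAD_PERCENT = 30.0


def run_test(process: Process) -> str:
    """Run the test only if the precondition holds; otherwise refuse."""
    if process.load_percent < MIN_SAFE_LOAD_PERCENT:
        # The software refuses; it does not rely on the operator's judgment.
        raise InterlockError(
            f"load {process.load_percent}% is below the limit of "
            f"{MIN_SAFE_LOAD_PERCENT}%; postpone the test"
        )
    return "test executed"
```

The design choice here is that the unsafe path fails loudly and early, so skipping the instructions requires a deliberate change to the system rather than a moment of haste.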

For example, consider implementing an app for a game. Should we allow people to play the game as much as they want? Should we show them all kinds of commercials? Or should we help them by saying that they have played long enough and could consider a break? Or maybe we should filter the commercials if we know that the game is played by a child?
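To make these questions a bit more concrete, here is a minimal sketch of what such safeguards could look like in code. Everything in it is an assumption for the sake of the example: the `Session` fields, the 60-minute threshold and the `child_safe` flag on a commercial are hypothetical and not taken from any real game or ad platform.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed threshold, chosen for illustration only.
BREAK_AFTER_MINUTES = 60


@dataclass
class Session:
    # Hypothetical session state tracked by the game.
    minutes_played: int
    player_is_child: bool


def should_suggest_break(session: Session) -> bool:
    """Suggest a break once the player has played long enough."""
    return session.minutes_played >= BREAK_AFTER_MINUTES


def allowed_ads(session: Session, ads: list[dict]) -> list[dict]:
    """Return only the commercials suitable for the current player."""
    if not session.player_is_child:
        return ads
    # For a child, keep only ads explicitly marked as child-safe;
    # an ad without the flag is treated as unsafe by default.
    return [ad for ad in ads if ad.get("child_safe", False)]
```

Note that a commercial with no flag is filtered out for a child. Defaulting to the safe side is the same idea as stopping the reactor when in doubt.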

I think that this is something we need to consider a bit more. We should even discuss this when we design our curricula and decide how to implement them.