The man who saved the world

On September 26, 1983, a lowly lieutenant colonel in the Soviet Air Defence Forces single-handedly prevented the accidental launch of World War III. Stanislav Petrov’s story first emerged in the 1990s after the fall of the Soviet Union, but it’s only recently received any real attention. (Wired wrote an article on him last fall.) Petrov quite literally saved the world.

I won’t get into the whole story. (Wikipedia does it justice.) In a nutshell, Petrov was on duty when an alert came in from the Soviet satellite-based launch detection system indicating that a US first strike had been launched. Petrov ignored protocol and decided the alert had to be in error, even though the system reported high confidence in it. He chose not to report the alert up the chain of command to the Soviet leadership, who at that time were in such a state of paranoia that they expected a first strike and were likely to launch on warning. Many feel this is about the closest we ever came to nuclear holocaust.

This is interesting from a human performance perspective because Petrov’s decision was the result of knowledge-based performance when he should have been in a rule-based mode. Instead of simply picking up the phone as he was trained and obligated to do, he stopped, thought, acted … or in this case didn’t … and reviewed, in effect following STAR (Stop, Think, Act, Review). Petrov put aside protocol and applied a ‘questioning attitude’. He analyzed the situation and determined he was ‘out of procedure’ (OOP) based on the premise that a first strike was unlikely to take the form of a single missile. This initiative — and for a Soviet missile command officer ignoring nuclear combat protocol, this courage — allowed the time needed to review the situation and determine that the alert was in error. (In fact, it had been caused by sunlight reflecting off clouds.)
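To make the contrast concrete, here’s a minimal sketch (Python, with entirely hypothetical names and an arbitrary threshold — not any real system) of the difference between the rule-based response Petrov was trained to give and the knowledge-based check he actually applied:

```python
from dataclasses import dataclass

# Arbitrary illustrative threshold: Petrov's actual reasoning was that a real
# first strike would be a massive salvo, not a single missile.
SINGLE_SALVO_THRESHOLD = 2

@dataclass
class LaunchAlert:
    missile_count: int      # inbound missiles reported by the satellite system
    high_confidence: bool   # the system's own confidence rating

def rule_based_response(alert: LaunchAlert) -> str:
    """The trained, rule-based mode: any alert is reported up the chain."""
    return "report to leadership"

def knowledge_based_response(alert: LaunchAlert) -> str:
    """The questioning, knowledge-based check applied instead."""
    if alert.missile_count < SINGLE_SALVO_THRESHOLD:
        # A lone detection looks more like a sensor error than a first strike,
        # regardless of how confident the system claims to be.
        return "treat as probable false alarm and verify"
    return "report to leadership"

alert = LaunchAlert(missile_count=1, high_confidence=True)
print(rule_based_response(alert))       # -> report to leadership
print(knowledge_based_response(alert))  # -> treat as probable false alarm and verify
```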

The issue of personnel operating in a knowledge-based mode when they should be in a rule-based mode is a complex one. This example clearly demonstrates the value of allowing a task performer to use knowledge-based decision-making to recognize when a rule-based mode is no longer appropriate. However, operating in a knowledge-based mode can also introduce its own risks.

The knowledge requirements for a worker operating in knowledge-based mode are much higher than those needed to operate in rule-based mode. If a worker without that knowledge is allowed to step out of rule-based mode (that is, to improvise), the results can be unexpected. This is a problem if the organizational system and processes have been designed to allow less knowledgeable workers to perform tasks (the monkey grinder approach).

Different industries have tackled this issue in different ways.

In the traditional telecoms industry in most of the western world (the southern US aside because of its literacy challenges), craftspeople working on systems in rule- or skill-based mode were almost universally overqualified. A craftsperson was expected to recognize when they were OOP and further empowered to switch to a knowledge-based troubleshooting or problem-solving mode as soon as it became necessary.

This was possible in part because of the incredible redundancy built into modern switching systems and in part because of the availability of a highly trained, skilled and disciplined workforce.

This approach was necessary because of the time-sensitivity of telecommunications problems — you can’t just turn off a switch and walk away, because people still keep trying to make calls — and because of the level of complexity in modern telecommunications systems. (By the early 1990s, a DMS switching station had more than ten million lines of code and an enormous range of failure modes.) There’s no such thing as rendering a phone system safe. The people pulling line cards and responding to switch alarms had to be able to quickly characterize and respond to complex problems, both within procedures in rule-based mode, and outside procedures in knowledge-based mode.

In contrast, in the nuclear power industry workers are very strongly discouraged from moving to a knowledge-based performance mode without substantial oversight and process overheads. In keeping with this culture, many performers doing rule-based work on nuclear systems lack the knowledge needed to work in a knowledge-based mode on those same systems.

In other words, the worker may know what switches to throw and what warning lights to look for, but may have little or no understanding of the underlying system they are manipulating.

The worker relies on the rigor and conservative design of the procedure to identify when the work is OOP. Once OOP, work stops altogether and another worker (typically a system specialist with extensive systems knowledge) is brought in to continue in a knowledge-based mode.
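A rough sketch of that stop-and-escalate pattern (again Python, with hypothetical names and step formats — not any real procedure system) might look like this: the rule-based worker compares observations against what the procedure expects, and the moment anything falls outside those bounds, work stops and the task is handed to a specialist rather than improvised.

```python
class OutOfProcedure(Exception):
    """Raised when an observation falls outside what the procedure anticipates."""

def execute_step(step: dict, observation: str) -> None:
    # Rule-based work: check the observation against the procedure's expected
    # values. The worker isn't asked to interpret the underlying system.
    if observation not in step["expected"]:
        raise OutOfProcedure(f"step {step['id']}: unexpected observation {observation!r}")

def run_procedure(steps, observations, escalate) -> str:
    for step, observation in zip(steps, observations):
        try:
            execute_step(step, observation)
        except OutOfProcedure as oop:
            # Work stops altogether; a system specialist takes over in
            # knowledge-based mode. The rule-based worker doesn't improvise.
            escalate(oop)
            return "stopped"
    return "complete"

steps = [{"id": 1, "expected": {"valve closed"}},
         {"id": 2, "expected": {"pressure nominal"}}]
print(run_procedure(steps, ["valve closed", "pressure high"], escalate=print))
# -> step 2: unexpected observation 'pressure high'
# -> stopped
```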

This approach works because the systems are robust and workers typically stay within procedure. It works because the system is designed to fail safe and be easily rendered safe (removing the urgency in moving to a knowledge-based mode) and because the technology itself is relatively primitive. Workers are unlikely to encounter complex or ambiguous failure modes that cannot be easily recognized as OOP. (However, when they do, the results can be profound — as happened at Three Mile Island.)

This approach is necessary because of the potential for extreme outcomes if workers move into a knowledge-based mode without the necessary knowledge. As importantly, it is necessary because of the required regulatory oversight within the nuclear industry. It’s essential that workers stay within procedure because procedures are binding not only within the job function or organization, but also with the regulator. Procedures are crafted with enormous care, requiring the approval of system specialists, management and in many cases regulator representatives. Going out of procedure is thus exceptional. Within the nuclear power industry, procedural adherence is paramount.

Coming back to the example of Petrov, a duty officer at a nuclear command centre — at least, within the Soviet command system — didn’t need to have an understanding of nuclear combat strategies because they weren’t expected to make those types of evaluations. Petrov’s job was to respond to launch alerts from the automated monitoring systems and report those alerts to the Soviet leadership. Hear a bell, read data off a screen, and pick up a phone.

More than that, the requirements regarding procedural adherence in a nuclear combat environment would have been far more extreme than even those of the nuclear power industry. The expectation to stay within procedure — to follow orders — is absolute. The potential outcomes are almost unimaginable. Both the demand for compliance and the need to avoid an error, once Petrov recognized how likely that error was, were truly exceptional. Petrov had reason to fear the repercussions of ignoring that protocol — especially that protocol — in Soviet Russia. He could also have been wrong and facilitated a US first strike on his country.

In this case, it isn’t obvious whether Petrov’s understanding of the situation was simply fortuitous, or whether there was some unrecognized defence-in-depth within the system (because, perhaps, duty officers performing that ‘monkey grinder’ role were intentionally overqualified).

The lesson in terms of human performance is that there can be enormous value in making sure that even workers expected to perform only in rule-based mode have the knowledge and understanding needed to recognize and react to OOP conditions. Systems and procedures aren’t always rigorous enough, and there are times, despite our best efforts, when the only thing preventing catastrophe is knowledge, understanding and confidence.

I’ll leave you with this last thought … where were you on September 26, 1983?
