System accident

A system accident is an "unanticipated interaction of multiple failures" in a complex system. This complexity can be either technological or organizational, and is frequently both.[1] A system accident can be very easy to see in hindsight, but difficult to see in foresight because there are too many different action pathways to seriously consider all of them.

These accidents often resemble Rube Goldberg devices in the way that small errors of judgment, flaws in technology, and seemingly insignificant damage combine to form an emergent disaster. System accidents were described in 1984 by Charles Perrow, who termed them "normal accidents", as having such characteristics as interactive complexity, tight coupling, cascading failures, and opaqueness. James T. Reason extended this approach with human reliability[2] and the Swiss cheese model, now widely accepted in aviation safety and healthcare.
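
The layered-defense idea behind the Swiss cheese model can be pictured as a simple probability sketch. The sketch below is illustrative only; the number of layers and the failure probabilities are assumed values, not figures from Reason or Perrow. It shows how several defenses that each fail rarely can still, very occasionally, all fail at once.

```python
# Illustrative sketch of the Swiss cheese model.
# The layer count and failure probabilities are assumptions, not sourced figures.
import random

def trial(layer_failure_probs):
    """Return True if every defensive layer fails on this operation."""
    return all(random.random() < p for p in layer_failure_probs)

def simulate(num_operations, layer_failure_probs):
    """Count operations in which all layers fail simultaneously."""
    return sum(trial(layer_failure_probs) for _ in range(num_operations))

if __name__ == "__main__":
    random.seed(0)
    # Assumed numbers: three defenses failing 1%, 2% and 5% of the time.
    probs = (0.01, 0.02, 0.05)
    accidents = simulate(1_000_000, probs)
    # Expected rate is the product of the layer probabilities:
    # 0.01 * 0.02 * 0.05 = 1e-5, i.e. about 10 per million operations.
    print(f"Simultaneous failures in 1,000,000 operations: {accidents}")
```

With these assumed numbers the expected rate is simply the product of the three probabilities, on the order of one event per 100,000 operations, which is why such alignments are rare but not impossible.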

Once an enterprise passes a certain point in size, with many employees, specialization, backup systems, double-checking, detailed manuals, and formal communication, employees can all too easily resort to protocol, habit, and "being right." Rather like trying to follow a complicated movie in an unfamiliar language, the narrative thread of what is going on can be lost. Other phenomena, such as groupthink, can be occurring at the same time, for real-world accidents almost always have multiple causes. In particular, it is a mark of a dysfunctional organization to simply blame the last person who touched something.

Charles Perrow also called system accidents normal accidents because, given the current level of technology, such accidents are highly likely to occur occasionally over the long term. In 2012 he wrote, "A normal accident is where everyone tries very hard to play safe, but unexpected interaction of two or more failures (because of interactive complexity), causes a cascade of failures (because of tight coupling)."[3]
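
Tight coupling can likewise be pictured with a toy cascade model. The components, links, and coupling probabilities below are hypothetical and are not taken from Perrow's work; the sketch only illustrates how the same pair of small, unrelated failures spreads much further when components are tightly coupled than when they are loosely coupled.

```python
# Toy cascade model (illustrative assumptions, not Perrow's own formalism).
# A failed component knocks out each linked neighbour with probability `coupling`.
import random
from collections import deque

def cascade(adjacency, initial_failures, coupling):
    """Spread failures breadth-first through coupled components."""
    failed = set(initial_failures)
    queue = deque(initial_failures)
    while queue:
        component = queue.popleft()
        for neighbour in adjacency.get(component, []):
            if neighbour not in failed and random.random() < coupling:
                failed.add(neighbour)
                queue.append(neighbour)
    return failed

if __name__ == "__main__":
    random.seed(1)
    # Hypothetical plant layout: names and links are invented for illustration.
    adjacency = {
        "pump A": ["valve 1", "controller"],
        "valve 1": ["pump A", "sensor 1"],
        "sensor 1": ["controller"],
        "controller": ["pump A", "pump B", "backup"],
        "pump B": ["valve 2", "controller"],
        "valve 2": ["pump B"],
        "backup": ["controller"],
    }
    # The same two small, unrelated failures under tight vs. loose coupling.
    for coupling in (0.9, 0.2):
        failed = cascade(adjacency, ["sensor 1", "valve 2"], coupling)
        print(f"coupling={coupling}: {len(failed)} of {len(adjacency)} components failed")
```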

There is an aspect of an animal devouring its own tail, in that more formality and effort to get things exactly right can actually make the situation worse.[4] For example, the more organizational rigmarole involved in adjusting to changing conditions, the more employees will delay reporting those conditions. The greater the emphasis on formality, the less likely employees and managers are to engage in real communication. And new rules can make the situation worse, both by adding another layer of complexity and by telling employees, yet again, that they are not to think but merely to follow the rules.

In a 1999 article primarily focusing on health care, J. Daniel Beckham wrote, "It is ironic how often tightly coupled devices designed to provide safety are themselves the causes of disasters. Studies of the early warning systems set up to signal missile attacks on North America found that the failure of the safety devices themselves caused the most serious danger: false indicators of an attack that could have easily triggered a retaliation. Accidents at both Chernobyl and Three Mile Island were set off by failed safety systems."[5]

Scott Sagan

Scott Sagan has multiple publications discussing the reliability of complex systems, especially regarding nuclear weapons. The Limits of Safety (1993) provided an extensive review of close calls during the Cold War that could have resulted in a nuclear war by accident.[6]

Possible system accidents

Apollo 13 space flight, 1970

Apollo 13 Review Board:

Introduction
. . . It was found that the accident was not the result of a chance malfunction in a statistical sense, but rather resulted from an unusual combination of mistakes, coupled with a somewhat deficient and unforgiving design [Emphasis added]. . .

e. Although Beech did not encounter any problem in detanking during acceptance tests, it was not possible to detank oxygen tank no. 2 using normal procedures at KSC. Tests and analyses indicate that this was due to gas leakage through the displaced fill tube assembly [emphasis added].

f. The special detanking procedures at KSC subjected the tank to an extended period of heater operation and pressure cycling. These procedures had not been used before [emphasis added], and the tank had not been qualified by test for the conditions experienced. However, the procedures did not violate the specifications which governed the operation of the heaters at KSC.

g. In reviewing these procedures before the flight, officials of NASA, NR, and Beech did not recognize the possibility of damage due to overheating. Many of these officials were not aware of the extended heater operation. In any event, adequate thermostatic switches might have been expected to protect the tank [emphasis added].

h. "A number of factors contributed to the presence of inadequate thermostatic switches in the heater assembly. The original 1962 specifications from NR to Beech Aircraft Corporation for the tank and heater assembly specified the use of 28 V dc power, which is used in the spacecraft. In 1965, NR issued a revised specification which stated that the heaters should use a 65 V dc power supply for tank pressurization; this was the power supply used at KSC to reduce pressurization time. Beech ordered switches for the Block II tanks but did not change the switch specifications to be compatible with 65 V dc."[7]

Three Mile Island, 1979

Charles Perrow:

"It resembled other accidents in nuclear plants and in other high risk, complex and highly interdependent operator-machine systems; none of the accidents were caused by management or operator ineptness or by poor government regulation, though these characteristics existed and should have been expected. I maintained that the accident was normal, because in complex systems there are bound to be multiple faults that cannot be avoided by planning and that operators cannot immediately comprehend."[8]

ValuJet (AirTran) 592, Everglades, 1996

William Langewiesche:

"By diligently pursuing his options, the mechanic could have found his way to a different part of the manual and learned that 'all serviceable and unserviceable (unexpended) oxygen generators (canisters) are to be stored in an area that ensures that each unit is not exposed to high temperatures or possible damage.'"[4]

That is, written manuals of this kind tend to be formalistic rather than genuinely helpful or informative.


Brian Stimpson:

Step 2. The unmarked cardboard boxes, stored for weeks on a parts rack, were taken over to SabreTech's shipping and receiving department and left on the floor in an area assigned to ValuJet property.

Step 3. Continental Airlines, a potential SabreTech customer, was planning an inspection of the facility, so a SabreTech shipping clerk was instructed to clean up the work place. He decided to send the oxygen generators to ValuJet's headquarters in Atlanta and labelled the boxes "aircraft parts". He had shipped ValuJet material to Atlanta before without formal approval. Furthermore, he misunderstood the green tags to indicate "unserviceable" or "out of service" and jumped to the conclusion that the generators were empty.

Step 4. The shipping clerk made up a load for the forward cargo hold of the five boxes plus two large main tires and a smaller nose tire. He instructed a co-worker to prepare a shipping ticket stating "oxygen canisters - empty". The co-worker wrote, "Oxy Canisters" followed by "Empty" in quotation marks. The tires were also listed.

Step 5. A day or two later the boxes were delivered to the ValuJet ramp agent for acceptance on Flight 592. The shipping ticket listing tires and oxygen canisters should have caught his attention but didn't. The canisters were then loaded against federal regulations, as ValuJet was not registered to transport hazardous materials. It is possible that, in the ramp agent's mind, the possibility of SabreTech workers sending him hazardous cargo was inconceivable.[9]

Possible future applications of concept

Five-fold increase in airplane safety since the 1980s, but flight systems sometimes switch to unexpected "modes" on their own

In an article entitled "The Human Factor", William Langewiesche discusses the 2009 crash of Air France Flight 447 over the mid-Atlantic. He points out that, since the 1980s, when the transition to automated cockpit systems began, safety has improved fivefold. Langewiesche writes, "In the privacy of the cockpit and beyond public view, pilots have been relegated to mundane roles as system managers." He quotes engineer Earl Wiener, who takes the humorous statement attributed to the Duchess of Windsor that one can never be too rich or too thin and adds "or too careful about what you put into a digital flight-guidance system." Wiener says that the effect of automation is typically to reduce the workload when it is light, but to increase it when it is heavy.

Boeing engineer Delmar Fadden said that once capacities are added to flight management systems, they become impossibly expensive to remove because of certification requirements. But if unused, they may in a sense lurk in the depths unseen.[10]

Langewiesche cites industrial engineer Nadine Sarter who writes about "automation surprises," often related to system modes the pilot does not fully understand or that the system switches to on its own. In fact, one of the more common questions asked in cockpits today is, "What’s it doing now?" In response to this, Langewiesche again points to the fivefold increase in safety and writes, "No one can rationally advocate a return to the glamour of the past."[10]

Healthier interplay between theory and practice in which safety rules are sometimes changed?

From the article "A New Accident Model for Engineering Safer Systems," by Nancy Leveson, in Safety Science, April 2004:
"However, instructions and written procedures are almost never followed exactly as operators strive to become more efficient and productive and to deal with time pressures. . . . . even in such highly constrained and high-risk environments as nuclear power plants, modification of instructions is repeatedly found and the violation of rules appears to be quite rational, given the actual workload and timing constraints under which the operators must do their job. In these situations, a basic conflict exists between error as seen as a deviation from the normative procedure and error as seen as a deviation from the rational and normally used effective procedure (Rasmussen and Pejtersen, 1994)."[11]

Perhaps, in the future, intelligently and gradually modifying safety rules on the basis of actual experience will become more common.

References

Notes

  1. Perrow, Charles (1999) [1984]. Normal Accidents: Living with High-Risk Technologies, with a New Afterword and a Postscript on the Y2K Problem. Princeton, New Jersey: Princeton University Press. ISBN 0-691-00412-9. First published by Basic Books, 1984.
  2. Reason, James (1990-10-26). Human Error. Cambridge University Press. ISBN 0-521-31419-4.
  3. Perrow, Charles (December 2012). "Getting to Catastrophe: Concentrations, Complexity and Coupling". The Montréal Review.
  4. Langewiesche, William (March 1998). "The Lessons of ValuJet 592". The Atlantic. See especially the last three paragraphs of this long article: “ . . . Understanding why might keep us from making the system even more complex, and therefore perhaps more dangerous, too.”
  5. Beckham, J. Daniel (January 1999). "The Crash of ValuJet 592: Implications for Health Care" (DOC file): http://www.beckhamco.com/41articlescategory/054_crashofvalujet592.doc Beckham runs a health care consulting company, and this article is included on the company website.
  6. Sagan, Scott D. (1993). The Limits of Safety: Organizations, Accidents, and Nuclear Weapons. Princeton University Press. ISBN 0-691-02101-5.
  7. Report of Apollo 13 Review Board ("Cortright Report"), chaired by Edgar M. Cortright, Chapter 5: Findings, Determinations, and Recommendations.
  8. Perrow, C. (1982). Abstract for the chapter "The President's Commission and the Normal Accident", in Sills, D., Wolf, C. and Shelanski, V. (eds.), Accident at Three Mile Island: The Human Dimensions. Boulder, Colorado: Westview Press, pp. 173–184. https://inis.iaea.org/search/search.aspx?orig_q=RN:13677929
  9. Stimpson, Brian (October 1998). "Operating Highly Complex and Hazardous Technological Systems Without Mistakes: The Wrong Lessons from ValuJet 592". Manitoba Professional Engineer. This article reviews William Langewiesche's approach to ValuJet 592 and provides counter-examples of complex organizations with good safety records.
  10. Langewiesche, William (September 17, 2014). "The Human Factor". Vanity Fair. " . . . pilots have been relegated to mundane roles as system managers, . . . Since the 1980s, when the shift began, the safety record has improved fivefold, to the current one fatal accident for every five million departures. No one can rationally advocate a return to the glamour of the past."
  11. Leveson, Nancy (April 2004). "A New Accident Model for Engineering Safer Systems". Safety Science, Vol. 42, No. 4. Paper based on research partially supported by the National Science Foundation and NASA. " . . In fact, a common way for workers to apply pressure to management without actually going out on strike is to 'work to rule,' which can lead to a breakdown in productivity and even chaos. . "

Further reading

  • Cooper, Alan (2004-03-05). The Inmates Are Running The Asylum: Why High Tech Products Drive Us Crazy and How To Restore The Sanity. Indianapolis: Sams - Pearson Education. ISBN 0-672-31649-8.
  • Gross, Michael Joseph (May 29, 2015). "Life and Death at Cirque du Soleil". Vanity Fair. The article states: " . . . A system accident is one that requires many things to go wrong in a cascade. Change any element of the cascade and the accident may well not occur, but every element shares the blame. . . "
  • Helmreich, Robert L. (1994). "Anatomy of a system accident: The crash of Avianca Flight 052". International Journal of Aviation Psychology. 4 (3): 265–284. doi:10.1207/s15327108ijap0403_4. PMID 11539174.
  • Hopkins, Andrew (June 2001). "Was Three Mile Island A Normal Accident?" (PDF). Journal of Contingencies and Crisis Management. 9 (2): 65–72. doi:10.1111/1468-5973.00155. Archived from the original (PDF) on August 29, 2007. Retrieved 2008-03-06.
  • Pidgeon, Nick (September 22, 2011). "In retrospect: Normal accidents". Nature.
  • Perrow, Charles (May 29, 2000). "Organizationally Induced Catastrophes" (PDF). Institute for the Study of Society and Environment. University Corporation for Atmospheric Research. Retrieved February 6, 2009.
  • Roush, Wade Edmund (1994). Catastrophe and Control: How Technological Disasters Enhance Democracy. Ph.D. dissertation, Massachusetts Institute of Technology, p. 15. ' . . Normal Accidents is essential reading today for industrial managers, organizational sociologists, historians of technology, and interested lay people alike, because it shows that a major strategy engineers have used in this century to keep hazardous technologies under control -- multiple layers of "fail-safe" backup devices -- often adds a dangerous level of unpredictability to the system as a whole. . '
  • "Test shows oxygen canisters sparking intense fire". CNN.com. 1996-11-19. Retrieved 2008-03-06.
  • Wallace, Brendan (2009-03-05). Beyond Human Error. Florida: CRC Press. ISBN 978-0-8493-2718-6.