TMI operators did what they were trained to do

Note by Rod Adams: This post has a deep background story. The author, Mike Derivan, was the shift supervisor at the Davis Besse nuclear power plant (DBNPP) on September 24, 1977, when it experienced an event that started out almost exactly like the event at Three Mile Island on March 28, 1979.

The event began with a loss of feedwater to the steam generator. The rapid halt of heat removal caused the primary loop temperature to rise, the primary coolant to expand, and primary system pressure to exceed the set point for a pilot operated relief valve in the steam space of the pressurizer. As at TMI, that relief valve stayed open after system pressure was lowered, resulting in a continuing loss of coolant. For the first 20 minutes, the plant and operator response at Davis Besse were virtually identical to those at TMI.

After that initial similarity, Derivan had an “Ah-ha” moment and took actions that made the event at Davis Besse turn into a historical footnote instead of a multi-billion dollar accident.

When Three Mile Island happened and details of the event emerged from the fog of initial coverage, Mike was more personally struck than almost anyone else. He has spent a good deal of time during the past 35 years trying to answer questions about the event, some that nagged and others that burned more intensely.

In order to more fully understand the narrative below, please review Derivan’s presentation describing the events at Davis Besse, complete with annotated system drawings to show how the event progressed.

This story is a little longer and more technical than most of the posts on ANS Nuclear Cafe or Atomic Insights (where this post originally appeared). It is intended to be a significant contribution to historical understanding of an important event from a man with a unique perspective on that event. If you are intensely curious about nuclear energy and its history, this story is worth the effort it requires.

The rest of this post is Mike’s story and his analysis, told in his own words.

______________________

By Mike Derivan

My first real introduction to the Three Mile Island-2 (TMI) accident happened on Saturday, March 31, 1979, a few days after the accident. TMI-2 was a Babcock and Wilcox (B&W) pressurized water reactor plant.

At the Davis Besse nuclear power plant (DBNPP) in Ohio where I worked, we initially heard something serious had happened at TMI-2 as early as the day of the event, March 28, and interest was high because TMI was our sister plant. DBNPP also is a B&W PWR plant.

Actual details were sketchy for the next couple of days, and it was mainly by watching the nightly TV news that it became clear to me that something serious was going on. It was also clear that conflicting information was being reported: some reports indicated there had been radiation releases, while the plant owner reported no radiation releases.

I even remember hearing the words “core damage” mentioned for the first time. It was on a Saturday TV news report that I saw the first explanation of the suspected sequence of events using pictures of the system, and it became clear to me that the pilot operated relief valve had stuck open.

My reaction was gut-wrenching, and I was also in disbelief that TMI did not know what had happened at Davis Besse. That evening I watched the Walter Cronkite news report. I sat there in total disbelief as he discussed a potential core meltdown. Disbelief, because if you were a trained reactor operator in those days it was pretty much embedded in your head that a core meltdown was not even possible; and here that possibility was staring me right in the face.

Cronkite’s report was also my first exposure to the infamous hydrogen bubble story. I had enough loss of coolant accident (LOCA) training to understand that some hydrogen could be generated during LOCAs; after all we had containment vessel hydrogen concentration monitoring and control systems installed at our plant. But the actual described scenario at TMI seemed incredible, except that it had apparently happened.

I would expect that my reaction was the same as many nuclear plant operators at that time. The exception was that the apparent initiating scenario had actually happened to me 18 months earlier at Davis Besse and I just couldn’t get the question out of my mind: “Why didn’t they know?”

The real root cause of the TMI accident

Since the time of the TMI accident virtually hundreds of people have stuck their nose into the root cause of the TMI accident. Both the Kemeny and Rogovin investigations identified a lot of programmatic “stuff” that needed to be fixed, and I agree with most of it.

I feel, however, that both of them skirted one important issue by using different flavors of “weasel words” in the discussion of operator error. The two reports handled that specific topic a bit differently, but the discussions got couched with side topics of contributing factors. The general consensus of all the current discussion summaries I read is that TMI was caused by operator error.

The TMI operators did make some operator errors, and I am not denying that. But my contention is that all the errors they made came after they had gotten outside of the design-basis understanding of PWRs at that time. It is no surprise to anyone that when a machine this complicated gets outside of its design basis, anything might happen. You basically hope for the best, but you are going to have to take what you get.

Fukushima proves that, and everyone knows why/how Fukushima got outside of its design basis. The how/why that TMI operators got outside of their design basis is going to be the focus of my discussion. I will also discuss the fact that I think this was understood at the time of the investigations, but it was consciously decided not to pursue it.

My whole point of contention is that turning off the high pressure injection flow early in the event, in response to the increasing pressurizer level, is the crux of the whole operator error argument. All discussions say that if the operators hadn’t done that, the TMI event would have been a no-never-mind. And I agree.

But nobody really wants to believe that they were told to do that for the symptoms they saw.

In other words, they were told to do that by their training, compounded by bad, tunnel-vision procedure guidance. I have believed this since the day I understood what happened at TMI. Furthermore, the TMI operators were trying to defend their actions from a position of weakness: their core had melted, and nobody wanted to believe them.

I am not in a position of weakness on this issue; my event came out okay at DBNPP, so I have no reason not to be totally honest or objective on this issue. During the precursor event at DBNPP, we also turned off high pressure injection early in the event in response to the symptoms that we saw, and for the same reason the TMI operators did it 18 months later: we were told to do it that way.

This fact is apparently a hard pill to swallow. But if it is hard for you to accept, just imagine how I felt watching TMI unfold in real-time.

And right there is the crux of the issue. Once those high pressure injection pumps were off, both plants were then outside the design-basis understanding for that particular small break LOCA.

So you hope for the best, but take what you get. But still, obviously an error has been made if not taking that action would have made the event a no-never-mind.

So who exactly made the error? Both the Kemeny and Rogovin reports discuss the problems with the B&W simulator training for the operators. The important point that they both apparently missed (or didn’t want to deal with, which I prefer as the explanation) is that this is really an independent two-part problem.

I will refer to controlling high pressure injection during a small break LOCA as part A of the problem, and to the actual physical PWR plant response to a small break LOCA during a leak in the pressurizer steam space as part B of the problem.

It really is that simple. B&W was training correctly for high pressure injection control (part A) for small break LOCAs in the water space of their PWR. But neither they nor Westinghouse understood the correct plant response for a small break LOCA in the pressurizer steam space.

By omission, they were not training correctly for a small break LOCA in the pressurizer steam space (part B). To make matters worse, B&W was overemphasizing the importance of the part A “rules” in training, to the extent that an operator would fail a B&W-administered operator certification exam for failing to correctly implement the part A rules.

Thus, when fate would have it and the two occurrences (part A and part B) combined in the real world, where the plant responds per the rules of Mother Nature, the B&W training and procedures ended up leading the operators to actions that put them outside the actual design basis, not the falsely perceived (and trained upon) design basis.

Up until very recently my argument has been one using just simple logic and sheer numbers of operators involved. In Davis Besse’s September 1977 event, there were five licensed operators involved in that decision, either by direct action or complacent compliance. In other words, all five agreed that it was the right thing to do. Of course, it wasn’t the right thing to do, but nobody objected because it was the correct part A thing to do and nobody understood the part B of the problem.

Eighteen months later at TMI, in March 1979, an additional number of operators (just how many depends on the time line) repeated the same initial wrong actions. So we have about a dozen operators, at two separate plants 18 months apart, all doing the same thing and all convinced that they were doing the right thing.

Is it even conceivable to think that they did not all believe they did the right thing according to part A? I just don’t believe so; of course, we are all arguing from a position of weakness. It is the wrong thing to do for part A and part B combined, so nobody really wants to believe that we were trained to do it.

But as I explained, it is really the two-part problem that created the issue. My point can be further emphasized by the fact that the Nuclear Regulatory Commission’s Region III had heartburn over the report that DBNPP submitted for its event. The NRC did not like the fact that the report did not say that the operators made an error turning off high pressure injection.

I know why that happened. The person most responsible for writing the report narrative was actually in the control room during the event. He did not believe the action was wrong based on his same training relative to part A of the problem. So why would he put that statement in the report? He was so convinced that his own (complacent) agreement was correct that saying otherwise would be a false statement.

Just recently new information came to my attention that absolutely confirms my belief that B&W was in fact totally emphasizing high pressure injection control in their training based solely on their understanding of the part A problem, with no understanding on B&W’s part of the part B problem or its effect when combined with the part A problem.

My understanding comes directly from seeing the complete text of the infamous Walters response memo of November 10, 1977, to the original Kelly memo of November 1, 1977. It is absolutely remarkable to me that 35+ years after the DBNPP event, and almost the same amount of time after TMI, a totally unrelated Google search turned up a complete version of the Walters memo.

After half a lifetime of studying all the TMI reports, I had only seen one “cherry picked” excerpt from the Walters memo, basically saying that he agreed with the operators’ response at DBNPP. The whole memo in context basically confirms that the operator claims of “we were trained to do it” are correct.

The original Kelly memo also confirms that Kelly still didn’t grasp the significance of the part B problem as related to the DBNPP event; or if he did, he didn’t relate it thoroughly and clearly in his memo. Both memos are presented and discussed below; draw your own conclusions. (The source document is here.)

The Kelly memo

Kelly Memo

The referenced source document is basically a critique of these memos by textual communications experts. Here’s a summary: First, Kelly is talking “uphill” in the organization, so he couches his memo with that in mind. He asks no one for a decision, but basically asks for “thoughts.” And he makes a non-emphatic recommendation for “guidelines.”

My personal additional notations are that he dilutes the importance of and possibly adds confusion to the recommendation by adding “LPI” to the discussion, but most importantly he totally misses any part B problem discussion. He does say “the operator stopped High Pressure Injection when Pressurizer level began to recover, without regard to primary pressure.”

But there is no mention of the fact that the system response was not as expected, i.e., the pressurizer level went up drastically in response to the reactor coolant system boiling. He never articulates that the operators’ reluctance to re-initiate high pressure injection, even after we understood the cause of the off-scale pressurizer level indication, was based solely on that indicated pressurizer level and our training. Thus, the memo totally misses addressing the part B problem point that the system response was not as expected by anybody, which was crucial to getting the guidance fixed.

The other thing I notice is that the memo is not addressed to Walters. I’ve also “been there, done that” in a large organization. I can easily understand how the recipient (Walters’ boss), upon receiving this memo with no specific articulation of a new problem (part B), would pass it to Walters with a “handle it, handle it… make it go away.” I also note that N.S. Elliott is on the distribution. He was the B&W Training Department manager, so B&W training was directly in the loop on this issue as well.

The Walters response memo

Note that the original Walters response memo to Kelly was handwritten, so it was apparently typed up at some point along the line. This is how it appears in the reference source, typos and all.

Walters Memo

I’m omitting the communications expert’s comments, because they are in the reference. Here are my comments: In simple operator lingo, this response is a “smart ass slap down” to Kelly, including all the accompanying sarcasm. But there are some very important admissions revealed here. First, an admission, including Walters’ discussion with the B&W Training Department, that we responded in the correct manner considering how we were trained, and also including the bases behind our training.

This is what we operators had been claiming all along, but nobody wanted to believe it. Second, Walters clearly states, both as his personal assumption and as the B&W Training Department assumption, that reactor coolant pressure and pressurizer level will trend in the same direction during a LOCA. Bingo. He has just admitted that they still don’t get the specific part B contribution to the problem.

So they are in fact training wrong for this event because they don’t understand part B. Further, this discussion is happening after the DBNPP event, as a result of the Kelly concerns, and well before TMI. Third, the tone of Walters’ sarcastic comments about a “hydro” (hydrostatic pressure test) of the reactor coolant system every time high pressure injection is initiated shows the disproportionate emphasis that B&W training was placing on “never let High Pressure Injection pump you solid.” Again, this is something the operators were claiming that nobody wanted to believe.

My conclusion, and it hasn’t changed in 35 years, is that the root cause of the TMI accident was that the B&W simulator training and inadequate procedures put the TMI operators in a box, outside of their design-basis understanding for that specific small break loss of coolant accident. A contributing cause is that B&W itself didn’t understand the actual plant response to that steam space loss of coolant event, because it was never analyzed correctly. Then they also missed the warning that the Davis Besse event provided.

For a long time I wondered why both the Kemeny and Rogovin investigations didn’t reach the same specific conclusion as I have. After all, both investigations had some very smart people involved in both processes, and they both looked at the same evidence. My thinking today is that they did reach that same conclusion. But I don’t actually know what they may have seen as the bottom line purpose for their investigations either.

If you consider that no investigation report was going to change the condition of TMI, it may have been as simple as this: there is enough wrong that needs fundamental changing, so let’s just get those changes done and move forward. So neither group saw a need to identify the actual bottom line root cause; rather, they just gave recommendations for preventing another TMI-type accident.

Further, by the time those two reports were published, it was well understood that there was going to be a lawsuit between GPU and B&W. If one of those reports had specifically identified B&W with partial liability for the root cause, that conclusion along with the report that made it, would be inherently dragged into the lawsuit.

I have no doubt that this was actually discussed at the time. And I will further speculate that it was actually decided that there was no reason to identify the actual true single root cause in the reports, because the lawsuit itself would decide that liability issue independently of the reports. My problem with that is that the lawsuit, which started in 1982, never really settled the liability issue, because it was mutually “settled” in 1983 before a conclusion was reached.

Another thing that I think was actually discussed at that time was the fact that if the reports stated that the root cause was because the B&W training put the operators outside of the design-basis understanding for that event (because the event wasn’t understood by B&W), it would open Pandora’s Box. They didn’t want to deal with “What else do you have wrong?” and there was well over a $100 billion worth of these nuclear power plants still operating.

This conclusion is strongly reinforced for me by the Kemeny Report section “Causes of the Accident”. This section of the report lists a “fundamental cause” as operator error, and specifically lists turning off high pressure injection early in the event. And then the report lists several “Contributing Factors” including B&W missing the warning provided by the Davis Besse event.

If you read the contributing factors listed, there is a screaming omission; it is never stated that B&W (actually the whole PWR industry if you consider the precursors) did not understand the actual plant response to a leak in the pressurizer steam space (what I refer here as part B of the problem). And that is why B&W and the NRC both missed the DBNPP warning. Virtually nothing will ever convince me that all those smart people did not put that truth together.

Thus, it was both their fear of opening Pandora’s Box and a conscious decision that there was no need to implicate B&W with any partial liability that ruled the process. By doing that, they collectively decided to throw the TMI operators under the bus as the default position.

My conclusion for the missing Contributing Factor problem is an Occam’s razor solution: it is not “missing” at all in the sense that they didn’t “Get It”; it was a decision not to include it. After all, if that Contributing Factor had been included, who on earth would believe it was operator error when the operators simply did what they were told to do in that situation? So they just did not want to deal with the real issue: who made the error?

A simple analogy

For years I struggled to find a simple analogy to explain the position the TMI operators were placed in by their training, one that could be understood through common, everyday knowledge (and not the technical detail that requires understanding the complications of nuke plant operations). One of the reasons it was difficult was that it required a “phenomenon” that is commonly understood today but was not understood at all at the time of the training. This is the best that I can come up with.

Suppose in learning to drive a car you are being trained to respond to the car veering to the left. It’s simple enough, simply turn the steering wheel to the right to recover. It is also what your basic instinct would lead you to do, so there is no mental conflict in believing it.

It is also actually reinforced and practiced during actual driver training on a curvy road. That response is soon embedded as the right thing to do. Now suppose your driver training also includes training on a car simulator. It is where you learn and practice emergency situation driving. After all, nobody is going to do those emergency things in an actual car on the road.

Here’s where it gets complicated. Assume virtually no one yet understands that when the car skids to the left on ice (because of loss of front wheel steering traction), the correct response is to turn the steering wheel into the skid direction, or to the left. This is just the opposite of the non-ice response. And to make matters worse, because no one understands it yet, including the guy who built the car simulator, the car simulator has been programmed to make this wrong response work correctly on the simulator.

So in your emergency driver training you practice it this way: the simulator responds incorrectly to the actual phenomenon, but it shows a successful result and you recover control. Since this probably also agrees with your instinct, and you see success on the simulator, this action is also embedded as the right thing to do. One additional point: if you don’t do this wrong action, you will flunk your simulator driver training test.

You know where this is going. Now you are out driving on an icy road for the first time and the car skids to the left. You respond exactly as you were instructed and exactly as the simulator showed was successful, and you have an accident because the car responds to the real-world rules of Mother Nature.

An investigation is obviously necessary because, I forgot to tell you, the car cost $4 billion and you don’t own it. During the subsequent investigation everything is uncovered; the unknown phenomenon is finally correctly understood, the simulator incorrect programming is discovered, it is uncovered that the previously unknown phenomenon had been discovered before your accident, and your accident was even predicted as possible.

But the investigation results are published and the finding is that the accident was caused by your error of turning the steering wheel the wrong way on the ice. Nobody else is found to have made an error in the stated conclusions but you; it is simply a case of driver error. Do you feel you have been wronged? This is what happened to the TMI operators.

For everybody out there who doesn’t like my conclusions, I’ll just say that many of the principals of the investigations are still alive, but choose not to talk. So, simply ask them, especially the principals in the GPU vs. B&W lawsuit that should have determined any liability issues. Ask them why it didn’t happen. My idea of justice involves getting the truth, the whole truth, and nothing but the truth exposed. That process is still unfinished.


8 Responses to TMI operators did what they were trained to do

  1. Sidney Bernsen

    It seems to me that the operator error conclusion is only partially right. There is a basic flaw in the design of the primary system that produces a loop seal between the reactor vessel and the pressurizer and provides no direct measurement of the water level in the vessel. Therefore, the operator assumes that the system is full if the pressurizer level indicates a water level. The other system design error is the requirement to shut off automatic safety related systems when the turbine throttle closes.
    The only operator error that one should consider a contributor is the failure to understand why the temperature downstream of the pressurizer relief valve was so high and recognize that there was steam flow through it. But this would require a basic understanding that throttled steam is at a lower temperature than at its source.

  2. Sidney, I wrote this article. Your loop seal discussion, though likely correct, is so far into the event sequence as to be N/A to the “set up” sequence that got them in trouble, which is my intended focus. I miss your point on the turbine throttles closing. At DBNPP we started with the turbine tripped for maintenance while we held at 9% power, yet we encountered the confusing situation post reactor trip. So I don’t get your point; it does not seem applicable to the early event sequence. Finally, rest assured I understand throttling losses across a throttled steam valve as well today as I did in 1977. So you must mean I missed looking at the Mollier Meter; were you on Rogovin? I specifically address those conclusions clearly here: http://www.nukeknews.com/Cognitive%20Dissonance%20-%20Operator%20Error.html
    I suggest you read them; I really don’t care to repeat them here, have a nice day, mjd.

  3. Dave Heckman

    Any time you look into history, it’s important to look through the correct lens. Design flaws aside, it’s important that we learn from our human errors and never repeat them. I entered the Operations Dept at TMI in early ’82 and here are my observations. Pre-1982 nuclear power was a different animal than it is today. Procedures were practically non-existent. In fact, criminal charges were forgone against the CRS because the company had provided no written guidance or documented training for the accident scenario, so there was no way he could be held accountable for violating anything. In ’82, a procedure was typically five to ten pages long with 80 pages of addendums stapled to the back of it. People would add more information as they recognized the need. Format and structure were not terms that even applied to procedures at that time. Keep in mind, we were still using typewriters and quills (kidding, we had ballpoints). Procedure changes were via “cut-n-paste” technology. For those of you who weren’t born before computers, this was a process by which you literally would use scissors and tape. Think montage. When it was all done, you’d get someone to retype it, or you could run it through the Xerox machine to make a good final copy that didn’t have paper swatches taped on it. The process had not yet caught up with the recognized need for more robust and comprehensive instruction. This was typical across the industry, and the old timers at that time would tell you that they didn’t even use procedures at “useta plant.” Plants relied on the knowledge and actions of the operators. And there was a problem with that. Prior to that time the NRC didn’t proctor licensing exams, so there was little motivation to ensure that cheating didn’t occur. In fact, as I recall, all or nearly all (I think it was 24 out of 24, or 24 out of 26) of the license class failed the first proctored exam at TMI prior to the restart of Unit 1 in ’85.
    Also, the idea of simulator training had only recently been accepted across the industry and was very rudimentary, and the fidelity of simulation was poor. As far as classroom training went, well, new operators were in a “self-paced, self-taught” learning program. While this obviously had something to do with the fact that the units were shut down at the time, it seemed a poor way to do business. One of the few classroom experiences I had there was rather interesting. After completing a week of training, it came time to take the Friday afternoon exam. As soon as the instructor started passing out tests, one of the plant electricians blurted out that the shop steward had not approved the exam, and everyone stood up and filed out of the room. Following TMI, there were a lot of design changes that were back-fitted into plants, mostly in the form of remote shutdown panels and MOVs. I worked for a company for a while that did them. However, the most significant changes happened in training and written guidance. I’ve read pretty much everything that’s been written about TMI; in fact, it has been required training at different points throughout my career. However, it has always been clear to me that the industry was negligent in providing written guidance and proper training. Sadly, TMI was an accident that was waiting to happen. But, so as not to strike fear into the hearts of anyone reading what I’m writing here, I am still in the industry and can attest to the fact that both procedural guidance and training are light-years beyond what anyone in 1982 could have imagined. Operators are now incredibly well-trained and proficient, and it is common knowledge that you can’t blow your nose at a nuclear power plant without having to follow a procedure.

  4. Probably should see Mahaffey’s new book: “Atomic Accidents” for comparison.

  5. Alex:

    Could you be a little more specific about who should see Mahaffey’s new book and what they should be looking for?

  6. William B. Cheney, III

    In 1976, as a Savannah River Plant Reactor Physicist and former Reactor Supervisor, I interviewed at TMI for the position of Lead Reactor Engineer for TMI II. Coming from the SRP reactors, with their graphic panels for nuclear and hydraulics at separate ends of a 20 x 40 control room, I was astonished by the TMI II control room layout. I exclaimed at the time, and I am sure it cost me the job: “I don’t see how you will ever train anybody to operate this thing!”
    Books have been written about the dysfunctional layout of that control room and the part it played in the early part of the incident. But I maintain, as does Rod, that the operators were led down the “primrose path” to making the wrong decisions even though the procedures said they were right.
    We must always ensure that our decisions are based on actual information, not on the assumptions that management thinks are right.
    On a personal note: I was in NYC when the TMI II incident started. The aircraft that I took the next day was diverted from overflying the site out of fear of the “explosion” that was sure to come. With 18 years in the industry, I knew from the first that the “Official” reports were a whitewash.

  7. I wish to remain anonymous

    Mr. Derivan, if you experienced the same event today as you did in 1978, and the procedures and training were the same, and your actions were the same and you prevented an accident, you’d be fired for violating the procedure, or at least be removed from the control room and demoted. The NRC would likely hand out a violation for not following procedures, and INPO would rate your plant as an INPO 3 because the operators don’t follow procedures.
    Your “aha moment” would be a career limiting event if you acted on it. INPO might ding your maintenance and engineering departments for the failed PORV, too. They would also probably put your training program on probation because you didn’t follow your training.
    You would have the warm fuzzy feeling that comes from preventing a core altering event.

  8. Anonymous, I can sympathize with your predicament and add that your perception seems uncomfortably common especially among operators. What concerns me more than your example is the opposite case, of compliance with a procedure known to be deficient for the event at hand, solely out of fear of retribution. It represents an unhealthy working environment at the very least if not a slippery slope to undesirable consequences. It is certainly an issue that merits an avenue for further open dialog by all concerned.