


The great IT risk measurement debate, part 2

Mar 02, 2011 · 14 mins
Data and Information Security | ROI and Metrics

IT risk—can it be measured, modeled, mitigated? Part two of Alex Hutton and Douglas Hubbard's discussion covers likelihood statements, the placebo effect on risk perception, and much more.

For the beginning of this discussion, see part one. We left off with Alex Hutton pointing out the problem with creating rigid risk management requirements when the measurement of risk is flawed to begin with.

The players:

Alex Hutton is research and intelligence principal at Verizon Business and was previously CEO of Risk Management Insight.

Douglas Hubbard is the founder of Hubbard Decision Research and the author of How to Measure Anything and The Failure of Risk Management.

Alex Hutton: We need something that’s much more flexible, where we have dozens of models for dozens of uses in the management of risk that may be informative independent of [identifying] likelihood and impact. One of the greatest things that happened to me moving from Risk Management Insight over to Verizon was that Verizon had a completely different, epidemiologically based view of risk—”Let’s go to the source of things that were incidents, and let’s do a study.”

Dr. [Peter] Tippett [vice president of industry solutions and security practices at Verizon Business] is a medical doctor and a Ph.D., with years of epidemiological experience, so his approach was, “Let’s start gathering data. Let’s build the framework by what data we need to extract out of that, what’s meaningful and so forth.” And this just blew my mind—a completely different view of risk and what creates it. And so I would rather, as I said, that people focus on gathering data and then looking for interesting correlations and using those to be actionable, rather than worrying about whether or not their likelihood statements are really reflective of reality.


At some point I think we can get to that, and probably much sooner rather than later, but right now, no standard is helping anyone do anything that would stand up to any scrutiny in any of your books. [There’s no standard] where, in my eyes or in the eyes of any professional, really, you would say, “Aha! You have a decent model.”

Douglas Hubbard: There’s a couple of ways to compare these sorts of things [such as security incidents]. One is, if you look at a whole system, can we take a whole bunch of examples of different organizations using some methodology, and can we have a large enough number of trials over a longitudinal study that would show that you’re better off with one method than with another, by looking at the shareholder value of the firms and things like that?


Since there hasn’t been anything like that for infosec yet that I’m aware of, the other approach is component testing: We can at least be sure whether or not the gears of the clock work.

Hutton: That’s where I was going to go. That’s one of the reasons why I think everybody should read your book rather than worry about getting an industry certification if they’re really interested [in risk management].

Hubbard: Look at likelihood statements. An individual analyst makes a very large number of likelihood statements. If you start capturing all of them, within a year or two an organization will have a very large number of likelihood statements across a large number of analysts. And if you looked at all those likelihood statements and someone was saying that “these things are 10 percent likely per year,” well, about 10 percent of those events should have happened in the course of the last 12 months or so.

Even though that’s not a measure of the performance of the overall system quite yet, it is a measure of something we do very frequently. So likelihood statements, even subjective ones, have an objectively measurable performance, because if I say that a bunch of things are 80 percent likely, if I’ve tracked them over time, about 80 percent of them should have occurred. And if I say that a bunch of other things are 2 percent likely, I would have to look at a large number of them and see that only about 2 percent of them occurred, so we can empirically verify subjective statements.
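Hubbard’s check is easy to mechanize. As a minimal sketch (the prediction log below is invented for illustration; real entries would come from an organization’s tracked forecasts), you can bucket an analyst’s statements by the stated probability and compare each bucket’s observed hit rate against it:

```python
from collections import defaultdict

# Hypothetical log of (stated probability, did the event occur?) pairs.
predictions = (
    [(0.10, True)] * 11 + [(0.10, False)] * 89 +   # 100 "10% likely" calls
    [(0.80, True)] * 62 + [(0.80, False)] * 18     # 80 "80% likely" calls
)

# Group outcomes by the probability the analyst stated.
buckets = defaultdict(list)
for stated, occurred in predictions:
    buckets[stated].append(occurred)

# Compare stated probability with the observed frequency in each bucket.
for stated, outcomes in sorted(buckets.items()):
    observed = sum(outcomes) / len(outcomes)
    print(f"stated {stated:.0%} -> observed {observed:.1%} "
          f"across {len(outcomes)} statements")
```

For a well-calibrated analyst the observed rates land close to the stated ones (here 11.0% versus 10% and 77.5% versus 80%), within sampling error.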

Part of the problem, though, is that there are other sources of error in the NIST 800-30 and the ISO standards, and [with] Cobit and Risk IT; there are other errors introduced by those methods. The methodologies are developed in isolation from what we know about how people respond to ordinal scales and the quirky things we do that affect our responses to these sorts of questions. They’re not even based on what we already knew about risk assessment from actuarial science and so forth. It’s almost as if someone said, “This is such a unique field; I have to invent this from whole cloth. Nothing else can teach me anything about risk management, because risk management here is so unique that I have to just make it up.”

Hutton: They were just influenced by management consultants, as you’ve noted.


Hubbard: I think you’re absolutely right, and I’m speaking as a reformed software management consultant, because I did that at [Coopers & Lybrand] for a little while. Most of my projects were actually quantitative, but some of what we got involved in was the fluffier stuff. It allowed me to see how commonplace that was in the analysis of management problems. So one of the webinars I do now is called “management placebos,” and I don’t pick on infosec necessarily, but it’s one of several different areas I talk about where there appears to be a real analysis placebo effect. It’s a measurable effect. There is a real placebo effect that has been measured in controlled experiments for various phenomena: We can observe that an analysis behavior improves confidence even though it’s actually making decisions and forecasts worse. So if you ask people, “Do you feel better about your decisions after doing this analysis?” almost everybody will say yes. Well, it appears that the placebo effect is very strong there.

Now, in the pharmaceutical industry, of course, they have to assume the placebo effect in every drug they test, because if they know that it does exist in some cases, they have to assume it exists in every case, and they have to prove it isn’t a placebo effect. That’s the standard that we should want to raise all risk analysis to—including risk analysis in infosec. We should prove that any perceived benefit from risk analysis is not purely a placebo effect.

So in some of my new material, and, actually, in the second edition of my first book, I listed some more recent research about how, say, additional data-gathering and collaboration at some point actually started to make decisions worse because they were just getting more data; even though their confidence was going up, their decisions were getting worse. There were instances of training where people felt better about their performance after training, but the training actually made their performance slightly worse.

One example is lie detection. Law enforcement officers going to lie-detector training came back more confident in their ability to detect lies, but they actually did worse than people who had no training at all. So since we know that this placebo effect exists, that means we have to start with the assumption that any of this could have just a placebo effect and that we have to prove that it isn’t just a placebo effect. But on top of that, there are known problems that existing methods don’t adjust for, like overconfidence. We know how to adjust for that, right?

There are other problems they introduce. In other words, it was a problem that didn’t exist in the unaided intuition of the expert, but this methodology creates the problem. For example, I like the way you said it: You’re breaking the laws of the universe by multiplying the ordinal scales. So you’re taking ordinal scales and you’re multiplying them together, and we’re thinking that the result is some meaningful number. Actually, when people have analyzed those things, they’re adding error to it, in some cases making decisions worse than they were before.

Hutton: In my talks I say, “What you’ve just done is you’ve multiplied jet engine by translucent and you’re saying your result is slightly faster than slow. Congratulations.”
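Hutton’s joke has a precise version: ordinal codes are rank labels, not quantities, so any order-preserving relabeling carries the same ordinal information, yet it can reorder the multiplied “risk scores.” A minimal sketch (the 1–5 coding and the two findings below are assumptions for illustration, not taken from any standard):

```python
def risk_score(likelihood, impact):
    # The "multiply the ordinal scales" recipe under critique.
    return likelihood * impact

# Two hypothetical findings scored on 1-5 ordinal scales (likelihood, impact).
a = (1, 5)   # rare but severe
b = (3, 2)   # more likely, milder

# A monotone relabeling of the impact codes: the rank order 1<2<3<4<5 is
# preserved, so ordinally this coding says exactly the same thing.
relabel = {1: 1, 2: 2, 3: 3, 4: 4, 5: 100}

before = risk_score(*a) > risk_score(*b)            # 5 > 6  -> False
after = (risk_score(a[0], relabel[a[1]])
         > risk_score(b[0], relabel[b[1]]))         # 100 > 6 -> True

print(before, after)  # the ranking of a vs. b flips
```

The product ranks b above a under one valid coding and a above b under another, so the "score" reflects the arbitrary choice of labels rather than anything about the risks.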


Hubbard: Right. It doesn’t mean anything. So the idea that I was trying to turn risk management back around to is that risk management and risk analysis themselves are processes that have measurable performance. So when I ask big rooms full of people, “What’s your single biggest risk?” and they all come up with different things, I say, “No, you all have the same biggest risk. The biggest risk is that none of you know that your risk management actually works.”

Some of them will argue that they do, and I’ll say, “OK, so you’ve measured the performance of your risk management methods or, if you haven’t done that, you’ve at least measured the performance of pieces of it, like the components with component testing?” None of them ever have.

Hutton: Well, in terms of infosec, there’s an even bigger problem, which is that you can make likelihood statements all you want, but you cannot really test them until you know the models you’re using give you good results. Once every year, once every two years—those statements are only applicable to those events, and we don’t even know if the model is applicable to larger events. The problem, of course, is that in 36 months you may have said it’s a one- or two-year event, right? Let’s wait and sit there, and 10 years from now we’ll check and see if it really happened 23 times every 10 years. Well, two years later Microsoft completely revamps its operating system, or you change from BlackBerrys to Apple iPhones, and now it’s beside the point. You can’t measure that.

The landscape has changed, and there are ways that we can try to address that, but it is difficult for people who are still thinking that they can multiply jet engine by peanut butter and get something meaningful out of it.

Hubbard: A few months back I was speaking to a roomful of 500 nuclear engineers, and they run simulations to work out the likelihood of things like one-in-5,000-years events. Now, it probably doesn’t surprise you to know that they don’t actually wait around for the empirical data on that, so they deal with it in two ways.

One is they have more data than, say, a single reactor. Nuclear power hasn’t been around for thousands of years, but there are lots of reactors around the world that have been running for decades, and so there really are thousands of reactor-years of data.

Secondly, there’s tens of thousands or hundreds of thousands of component-years of specific components, because a particular plant will not just have one particular kind of pump. It will have multiple instances of that pump, and lots of other plants will have that same pump, so there’s actually hundreds of thousands of pump-years of data—or maybe millions, depending on how common diverse components are. So in a lot of those cases, they really do have quite a lot of data.
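Pooled component-years support rate estimates without waiting millennia. As a hedged sketch (the failure count and pump-years below are invented, and a constant-rate Poisson process is assumed), the arithmetic looks like this:

```python
import math

failures = 3            # hypothetical failures observed across all plants
pump_years = 400_000    # hypothetical pooled operating experience

# Point estimate of the failure rate, in failures per pump-year.
rate = failures / pump_years

# Probability of at least one failure over a 5,000-year horizon, under the
# constant-rate (exponential waiting time) assumption.
p_any = 1 - math.exp(-rate * 5_000)

print(f"rate ~ {rate:.2e}/yr, P(>=1 failure in 5,000 yrs) ~ {p_any:.1%}")
```

The point is Hubbard’s: a single reactor never accumulates 5,000 years of history, but the pooled fleet of identical pumps does, and that pooled experience is what makes a one-in-5,000-years statement estimable at all.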

In my second book, one of the things I talk about is something called the Mount St. Helens fallacy: Just prior to the eruption of Mount St. Helens in 1980, a group of volcano experts and geologists got together and said, “We know that the north face is bulging, and sooner or later there has to be a landslide and it will uncork some pressure underneath it, but there is no historical evidence of a lateral explosion on Mount St. Helens.” So even in the face of the obvious visual evidence that there was a big bulge growing, they said, “It looks like it’s going to blow out the north face, but it’s never blown out the north face before, and”—volcanologists have actually said this, and I quoted one volcanologist saying it twice—“you can literally learn nothing about a volcano by studying other volcanoes, because each volcano is so unique.” In which case, my challenge would be: What do volcanologists know, then? What did they get their Ph.D. on?

Hutton: Individual mountains.

Hubbard: Yeah, did you get your Ph.D. just on Mount St. Helens, or…? So I actually say that’s a fallacy [that every volcano is incalculably unique]. My insurance company doesn’t have to look at a 48-year-old non-smoking male with my body mass index living in the lesser suburbs of Chicago in order to work out my risk. They look at 45-year-old males, 55-year-old males, they look at 48-year-old males who live in St. Louis, and so on, right? They built these big regression models.

I call it the fallacy of close analogy: thinking that you have to have nearly identical situations to compare against, and that since each situation is unique, we can’t learn anything by looking at historical data. What I tell people is, no, let’s do the math and see whether or not, just using the data that you have, the math outperforms the intuition of individuals.

If there’s a 10 percent chance per year of some event, we don’t have to wait around for 10 years. We look at all the times that this person said something was 10 percent likely—they might have made 200 different predictions at that level. Out of those 200, about 20, plus or minus some statistically allowable error, should have come to fruition. We’re not limited to the single data point behind each prediction; we look at all of that individual’s predictions. We’re asking the question, “Is that person calibrated?” We’re measuring that person’s skill at applying subjective probability assessments. That’s what we’re really measuring.
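The “statistically allowable error” here is just the binomial spread. Using Hubbard’s own numbers (200 predictions, each stated as 10 percent likely), a normal approximation gives the range of hit counts consistent with good calibration:

```python
import math

n, p = 200, 0.10                  # 200 predictions, each stated "10% likely"
mean = n * p                      # expected number of hits: 20
sd = math.sqrt(n * p * (1 - p))   # binomial standard deviation
# Approximate 95% interval for the hit count of a calibrated forecaster.
low, high = mean - 1.96 * sd, mean + 1.96 * sd

print(f"expect {mean:.0f} hits, 95% range roughly {low:.0f} to {high:.0f}")
```

So a calibrated forecaster should see somewhere around 12 to 28 of those 200 events occur; a count far outside that band is evidence of miscalibration rather than bad luck.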

Hutton: Not to be self-serving, but I did want to circle back to two concepts you mentioned in solving that. The first was the size of the data—are you unique, and so forth—and one of the problems in our industry is data sharing. And then you mentioned breaking the larger system down into components and using evidence based on the components to suggest a more accurate outcome [about the] totality. This is one of the great things I found when I joined Verizon a couple of years ago—this was their direction, the culmination of which has been our community effort and the data-breach report. Dr. Tippett and Wade Baker [director of risk intelligence at Verizon Business] and my group were trying to foster data sharing and give people comparative analytics while respecting privacy—in some cases, even anonymity, although that adds a lot of uncertainty to the data. The outcomes are very component-based. It’s not very different from FAIR (the Factor Analysis of Information Risk model), and not very different from ISO 27005. Anyway, just a little plug for what we’re trying to do, and that’s on the New School blog.


Other folks, like Trustwave, have databases that are a great source of information. [Editor’s note: See CSO’s security data and survey directory.] A few things do exist, but again, it is up to our modelers to understand and create context. Unfortunately, these people are being told, “Go multiply these ordinal scales together, do that 40,000 times, and you’ve done your enterprise risk assessment for the year. Thanks very much—give us our $750,000 [fee].”

Hubbard: Plus, the other thing is, I think modeling in the IT sense means something different from modeling in the empirical-sciences sense, and I think what you want is to use the term “modeling” as in the empirical sciences.