What a rocket scientist taught me about how far you get on data alone
Back when I was studying for a PhD, proving why control software was guaranteed to do the right thing, or (very hard) not do the wrong thing, I had a secret. I was a fraud. I wasn't really a mathematician. I had dabbled in law, and studied literature, and was somehow hoping that a good grasp of argument and narrative would carry me through the formalisms and the proofs. After all, it's all just language and semantics, isn't it? But I wasn't convinced, and nor was my supervisor. Despite my reading as much as I could, paying attention at the departmental seminars, and each week correcting the red ink that he spread across my contributions, my supervisor took the drastic step of sending me to the most prestigious of conferences on safety-critical systems, with speakers from places like MITRE and NASA - real rocket scientists! The message was clear. This was it. I couldn't possibly say that I wasn't being given every opportunity to learn, to become a proper Computer Scientist. Screw up after this, and you're out.
The talks were great. Researchers, practitioners, case studies, real examples. Two days of debate and discussions. Meeting people who had written things that had previously seemed impenetrable to me, and realising what they meant. Suddenly I saw that I wasn't as far behind as I thought. It all made sense. I could see how what I was doing was relevant, interesting even, to others in the community. It was all going to be alright.
One of the last talks I went to was all about a spaceship - a popular session, given the on-going controversy around the Space Shuttle. The researcher was talking about how they had carried out the most extensive modelling exercise on code - transforming it into pure logic, proving out properties. This had helped the rest of the team find critical problems that, had they not been discovered, would have had disastrous consequences. The kind of talk that makes you feel good about being part of this community of people doing such great and important work.
And then something happened that, I've since discovered, is really not a normal occurrence.
"Are there any questions?"
A man stood up - he was very possibly a rocket scientist - and took the microphone, and slowly, clearly said, "That is not how it happened." And I, and everyone else in that audience, sat up just a little.
He continued. "Let me be clear. The only reason you found those problems is that I told you they were there, and I told you where to look." Silence in the audience. You have never heard an auditorium so quiet. It took a while for the speaker to respond, and I don't really remember what he said. I was watching the rocket scientist, slowly shaking his head in quiet refutation of whatever it was the speaker was saying.
After the talk, I didn't get to find the rocket scientist, or to ask the questions that were only just taking shape. Over the next couple of talks, the event somehow, nervously, got back into its rhythm in time for the close.
By the time everyone was leaving - discussing their plans, and the next events they were planning to attend - you could have imagined that it never happened.
But it did.
I've often thought about what went on in that project - what questions I would have asked, if I had had the chance. What was missing from the work that meant that it required human intervention to guide it to the right answer? What context around the code did the rocket scientist understand that somehow wasn't in the model that got created? If that was what was going on, why wasn't this acknowledged, or made part of the process? Why did the 'technical answer' seem to be fighting with the 'human insight', rather than leveraging it? And why the cover-up, if that doesn't seem too dramatic a way of putting it?
Over the years, I've come to recognise the same dynamic playing out in the work that I now do, around modelling, data and analytics. Although I went along with the formalisms and the modelling (and didn't get thrown out), I've always been very attuned to the risk that data, or a model, might actually be missing something that is key to being able to pinpoint the issue, or get to the resolution that people need.
One obvious example is that in much of so-called management information, the systems that are gathering data are doing it because they were designed to process transactions, not to monitor the state of the business. So long as we're interested in analysing transactions, that's fine. But don't expect definitive guidance on how the business is really operating. In this case, the question we're asking has a wider coverage than the data. That doesn't mean the data isn't helpful - it's just not necessarily giving us everything we need - there's work still left to do.
Another example - digitisation is introducing great opportunities for businesses that can exploit the processing and analysis of data at greater scale and speed, to achieve and evolve new business models - such as a faster analysis of customer preferences leading to a new way of selling a product, or delivering a service. But the questions we now need to ask are also getting more sophisticated - are the changes that we are making to the operating model likely to get us more quickly to the outcomes we need? Is that actually happening in practice? Have we got the balance right across costs, innovation, continuity? These are all questions that require 'more work' on top of the data, because the data doesn't have a wide enough coverage, and in some cases never could - we need estimates and predictions regarding cause and effect, and need to account for processes not under our control as well as our new digital innovations.
I've also come to realise that these issues can be hard to spot.
In the best case, people are very good at compensating for what is missing, and are usually quite generous to the experts that are serving them. They put in the effort to communicate and elaborate around the data, the results and the reports, piecing together their collective intelligence, carrying out the mental transformation of data into a model of the business, until confidence levels get to an acceptable level.
But in the worst case, no one notices that there's a problem at all.
By way of closure - twenty years later, and I'm at an event on data visualisation, and one of the talks is about how all sorts of analyses - whether visual, statistical or learning - can be much more effective if, rather than being based directly on a dataset, they're based on a model: a digital representation of a business that many, many people have had a hand in shaping and evolving, and that they can interpret, challenge, and contribute to. The rocket scientist would have been proud - explicitly putting people, and their understanding, at the centre of the process, rather than covering up their involvement.
"Are there any questions?".
Someone takes the microphone, and says, "Look, all we need to worry about is you get your data, put it into a single database, and that's it. What else is there? What is it that isn't in the data?"
Silence in the audience. He's got a point, hasn't he?
This time I do remember what the speaker said - because it was me.
I said, "We were all listening to a speaker earlier, who had some great visualisations of data."
It really was great work - visualising the occurrence of take-off and landing accidents, and identifying correlations.
"But we didn't see any answers, because the question we were asking - what causes the accidents - had a bigger coverage than the data. What we did was spend 20 minutes of that talk having it explained to us what we were looking at - whether it was on the perimeter of the airport or not, what sort of pilots they were, and is that profile likely to change, what kinds of things are likely to contribute to accidents, what was different about those airports, until gradually, all the context that was in the head of the person who created the chart was out, shared and explicit, and we could start seeing where the data fitted in all that, and properly discussing the implications. That 20 minutes' worth of context - that's the 'what else'. That's what you don't get in the data. And without that, what could we really have achieved?"
And he said, "Well, I'm running a data integration project, and so long as the data's good quality, that's what matters."
I looked around the audience, to see if there might be a rocket scientist somewhere who could back me up on this, but no such luck. I had to make do with the sense that he was there with me, up on stage, slowly shaking his head.
Written by Simon Smith.
-

Comments

Data for the sake of data, EA by EAs for EAs, and even Financial Management by and for Accountants... All of these are missing the context, maybe even redacting the context in support of their continued relevance? I cannot say for sure... as I do not have the context that would cover all examples... but I do lean in that direction.

Great read, thanks!