Why Michael Stonebraker may not be right (at least not all of the time) when it comes to data
Michael Stonebraker, ACM Turing Award winner, has been consistently right about many things to do with data for over half a century now*. He takes issue, for example, with data warehousing, and the expense and limits of scaling this approach for anything outside of its original 1990s remit of ‘unifying customer-facing sales data’. He calls out the futility (‘trainwreck’) of trying to create a single, perfect data schema, or ‘map of what this data means’, when the insight we’re striving for is less about the contents of a transactional system, and more about the context of the volatile, changing business that the system is running.
And he’s also right, as always, to caveat his complaints, pointing out that many of these data technologies and methods are not bad per se – it just depends on what you are trying to achieve, and how deep your pockets are.
I’ll apply the same caveating trick to my view of Stonebraker’s own suggestion of ‘schema last’ technologies. The idea is that, instead of creating a data schema up-front, you combine and enrich likely-looking data, using machine learning, and let the schema emerge. This is not a bad thing either, per se. In particular, it hits the big themes of decentralisation of control, iteration and experimentation. However, depending on what you are trying to achieve, it could be very wrong.
The reason? Across many industries, there is an emerging imperative for people to take control of how their part of the business is operating, and collaborate quickly and effectively with other people on how best to achieve the outcomes they can’t achieve on their own. IT functions are trying to act as business partners, and see how services improve line of business outcomes. The line of business is trying to understand how they serve their customers with digitised offerings. Finance departments are being encouraged to go beyond financial planning, and show how they can contribute to how the business serves its customers. At times it looks like every business (even behemoths like the UK’s Ministry of Defence) is striving to be a platform business, with partnering going on at all levels through the organisation to rapidly achieve a market advantage – or even just better margins.
To work this out, and see how best to change operations to better achieve outcomes together, you need a reference point that is separate from the data coming out of the systems that are involved. The insight that people are trying to create and share is into the parts of the business that they see around them, and most people just hold this in their head as ‘this is how I think this business works’. We could rely on people talking and explaining to each other to share their understanding. But all the evidence is that this doesn’t scale: it is too slow, too error-prone and too costly.
For this, neither schema first nor schema last is the answer. Rather, it’s the focus on data schema itself that misses the point. The answer is ‘business model first’, where that business model is materialised out of data, and shaped from the shared understanding of people who are being explicit about their insights, visualising, analysing and sharing the understanding of what drives what across operations. That business model then drives the development and machine learning of data connections and transformation.
The battleground has shifted from data – now a commodity – to what the business is actually trying to achieve. And this is the area in which we should be redoubling our efforts in collaboration, machine learning and automation. We need data systems that can respond to users who say, “This is what I’m trying to achieve – now tell me something I don’t know about how my business really operates, to let me be a better partner with all those people and organisations I need to work for or with”. That’s the right approach, if what you’re trying to do is actually be a better business. And who isn’t trying to achieve that?
* Too many references to cite, but see for example this from his latest venture: "Practical strategies for data unification, with Dr. Michael Stonebraker"