We have an unimaginably large supply of data, but the demand is lagging behind. Stefaan Verhulst, Co-founder and Principal Scientific Advisor of The Data Tank, explains why collaboration will be vital to allow data to benefit society as a whole.
© IE Insights.
Digitalization combined with computation has increased the availability and supply of data, in what we quite often call datafication. Every transaction now generates data. We not only have more data; the data we have is also more varied. We have telephone data, we have grocery data, and we have data that we collect through traditional means. And digital data can be reused without loss of quality and without much cost.
Data collected for grocery purposes, for example, can be used to understand food supply and food habits. Yet despite this supply of data, and despite the potential of reusing it for public benefit, the supply is not distributed in a way that benefits society meaningfully. So we have a massive supply of data that somehow is not matched with the demand for it.
The real objective, then, is to build new partnerships and new collaborations that match demand and supply in ways that benefit society.
While we have seen the creation of data collaboratives, we still need to do a lot more work to make them systematic, sustainable, and responsible. More systematic, by which I mean that we have to go beyond the pilot: we have seen enough pilots that never scale. We have to make them sustainable because, quite often in the space of data collaboration, there is an assumption that establishing those data collaboratives involves no cost.
And, more importantly, we also need to make data collaboratives responsible. There are five important ways to make data collaboratives more systematic, sustainable, and responsible. The first is that we need to strengthen the demand for data and for data collaboratives. Too often, people knock on the door of, for instance, private-sector actors requesting data without formulating a question that matters or defining the problem.
One initiative here is the 100 Questions Initiative, which asks: what are the 100 questions that matter to society for which we can reuse data that already exists, so that we become smarter about those communities? The second element is that we also need to strengthen the supply. By this I mean that we need new professions that really understand how to set up those data collaboratives.
And so what I’ve been calling for is the creation of Chief Data Stewards who understand how to steward, in the public interest, data that was collected for other purposes, and also how to formulate questions. The third element is that we have to start thinking about the incentives for corporations to provide access to data and to collaborate.
We call these the 9 Rs. They range from reciprocity, by which I mean that you share something and you get something back; that something could be other datasets, but it could also be access to expertise that you don’t have in-house. Other reasons could be, for instance, reputation: providing access to data can become an important ESG or corporate social responsibility objective.
The fourth element follows from the fact that we are reusing data collected for one purpose for a different one, especially when personal data is involved. For that, we need what I call a social license that indicates the preferences of the data subjects: what they want the data to be used for, and under what conditions.
The last point is that we need more data about data. Quite often we don’t measure what works, so we have the opportunity to become more data-driven. We can make a difference in society by matching the demand for data with the supply of data that has massively increased. We would also prevent the missed uses of data that occur when we cannot figure out how to provide access to data for reuse.