1  What Does it Mean to Be Data-Driven?

In today’s tech-driven world, data is king. Every click, swipe, and search generates a breadcrumb of information. Alas, although most decision-makers want to be data-driven, data does not speak for itself.

So, what does it mean to be data-driven? At its core, it’s about using data to inform decisions, not just describe them. It’s about moving beyond correlation, the “what goes with what”, and understanding causation, the “why” behind the patterns we see. To be truly data-driven, there must be some evidence the data could provide that would lead you to choose a different path from the one you would otherwise have taken.

This is where causal inference steps in. Causal inference is the science of drawing cause-and-effect conclusions from data. It allows us to answer questions like:

Causal inference is the missing piece of the data-driven puzzle. It lets us move beyond correlation and identify the true drivers of business outcomes. Impact evaluation builds on this, putting numbers to the effects of a program, policy, or intervention. Think of it as measuring the impact of a specific business decision.

Data can also be used to improve the ongoing operations and effectiveness of a program, a process known as program improvement. This involves continuously collecting data on how the program is running, identifying any bottlenecks or areas for enhancement, and making adjustments as needed. Think of program improvement as an ongoing feedback loop, constantly refining and optimizing the program based on real-world data.

Now, let’s delve a bit deeper. Imagine you’re a decision-maker at a social media company pondering a new feature. You have data showing that users who engage with the feature spend more time on the platform. This is a correlation, but it doesn’t tell the whole story. What if those users were already naturally the most engaged?
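The scenario above can be illustrated with a small simulation. This is only a sketch under invented assumptions: the 5-minute effect, the baseline of 30 minutes, and the way engagement drives adoption are all hypothetical numbers chosen for illustration, not estimates from any real platform. Because the data are simulated, the true effect is known, so we can see how far the naive comparison drifts from it when engaged users self-select into the feature.

```python
import math
import random
from statistics import mean

TRUE_EFFECT = 5.0  # assumed extra minutes per day caused by the feature


def simulate_users(n_users=20_000, seed=0):
    """Return (used_feature, minutes_on_platform) pairs for simulated users."""
    rng = random.Random(seed)
    users = []
    for _ in range(n_users):
        # Latent baseline engagement; it is never observed by the analyst.
        engagement = rng.gauss(0, 1)
        # Self-selection: more engaged users are more likely to try the feature.
        p_use = 1 / (1 + math.exp(-2 * engagement))
        used = rng.random() < p_use
        # Time on platform depends on engagement AND on the feature itself.
        minutes = 30 + 10 * engagement + TRUE_EFFECT * used + rng.gauss(0, 5)
        users.append((used, minutes))
    return users


def naive_difference(users):
    """Average minutes of feature users minus average minutes of non-users."""
    adopters = [minutes for used, minutes in users if used]
    others = [minutes for used, minutes in users if not used]
    return mean(adopters) - mean(others)


users = simulate_users()
print(f"true effect:      {TRUE_EFFECT:.1f} minutes")
print(f"naive difference: {naive_difference(users):.1f} minutes")
```

In this toy setup the naive difference comes out well above the true effect, because it mixes the effect of the feature with the pre-existing gap between highly engaged and less engaged users.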

This is where the concept of the counterfactual becomes crucial. The counterfactual is what would have happened if we hadn’t implemented the new feature: the potential outcome had we not made the change. While Jerzy Neyman hinted at this idea in 1923 (see Neyman 1923), Donald Rubin fully developed the concept in the 1970s (see Rubin 1974; also Rubin 1978). Because we can observe only one potential outcome for each unit, the counterfactual is inherently missing data. Hence, causal inference can be viewed as a missing data problem. For a review of a variety of causal inference methods from this perspective, see Ding and Li (2018).
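The idea can be written compactly in potential-outcomes notation. The symbols below are a conventional choice introduced here for illustration, not notation taken from this chapter: $D_i$ indicates whether unit $i$ received the change, $Y_i(1)$ and $Y_i(0)$ are its potential outcomes with and without the change, and $Y_i$ is the outcome we actually observe.

```latex
\begin{align*}
Y_i &= D_i \, Y_i(1) + (1 - D_i) \, Y_i(0)
  && \text{(only one potential outcome is observed)} \\
\tau_i &= Y_i(1) - Y_i(0)
  && \text{(unit-level effect, never observed directly)} \\
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
  &= \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{average effect on the treated}}
   + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}
\end{align*}
```

The last line shows why a raw comparison of groups can mislead: the second term is zero only if, absent the change, the two groups would have had the same average outcome.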

Choosing the right counterfactual is critical for drawing valid causal conclusions. The wrong counterfactual can lead to misleading results and potentially disastrous business decisions. We’ll explore these challenges and different approaches to constructing counterfactuals in the coming chapters.

By understanding causal inference and the importance of counterfactuals, you’ll be well on your way to leveraging the true power of data to make informed decisions for your business. However, choosing the wrong counterfactual can have serious consequences. Here are some classic examples:

Remember, in order to design a good study to inform decisions, we need to know which decisions we are trying to inform. This clarity about the decision at hand allows us to choose the right counterfactual scenario for comparison. By carefully considering potential outcomes and constructing strong counterfactuals, we can leverage the power of data to make informed choices and drive better business results.

Learn more

Li, Ding, and Mealli (2023). Bayesian causal inference: a critical review.