As data scientists, we face a variety of problem types. One of our critical challenges is identifying the proper methodological approach to solve each problem. By doing this, we avoid force-fitting the wrong tool to solve a problem, and we avoid having to reformulate a question to fit a specific method.
One way to characterize problems is by the type of results that answer the question at hand. For many data science questions, we can use predictive frameworks. Tasks such as image classification, churn prediction, or product matching are can be tackled by models that make predictions. These are the types of problems where traditional machine learning shines, as it can capture the complex relationships between a set of features and an outcome. Prediction models are best suited for problems where the end user is not as interested in why something is happening or likely to happen.
An alternative to prediction frameworks is causal frameworks. Problems fit into causal frameworks when the end user wants to know what will happen if they make a particular decision. In other words, can we provide insight on what will happen to outcome Y if our business partner decides to change input X? These scenarios are problems that require statistics and econometrics. Outside of some nascent research, causality is where traditional machine learning methods tend to come up short.
When we are building solutions, one of the first topics we discuss is the appropriate modeling framework. We do not want to use a causal model to make predictions if we are not concerned about the “why”, because these models cannot capture the complex relationships that may exist in the data. Conversely, we do not want to use standard machine learning models to make predictions when we are most interested in how our decision to change X will influence Y.
Digital Advertising: An Example of Causal Modeling
Like most e-commerce companies, Wayfair chooses to advertise to our customers through digital display ads. When we purchase advertising, we like to think in terms of the causal impact. Ultimately, we want to know how a specific ad positively impacts a customer's impression of Wayfair, as well as their browsing behavior.
When buying ads, we sometimes have the opportunity to purchase many ads for the same customer in a given day. As we strive to deliver a superior customer experience, we realize that part of the experience is off-site interactions with the brand. Exposing a single user to too many advertisements could be a bad experience and lead to unfavorable impressions of the company. Negatively affecting a customer's perception of Wayfair is the exact opposite of our goal with advertising!
The video below walks through how we combined a novel experimental design with an econometric technique called “instrumental variable regression” to quantify the marginal causal impact of digital advertising. By using this technique, we capture the exact point at which the number of ads we are serving to a person becomes too many. This has allowed us to increase the efficiency of our marketing spend and deliver a more delightful customer experience.
Check out the below video from last year's Open Data Science Conference for more!