Markov and Shapley: Marketing attribution modelling at interaction level or path level
17 January 2021
Markov chain and Shapley value: In marketing, first-click, last-click and multi-click models (linear or bathtub) are frequently used as attribution models to measure the success, or profitability, of individual marketing channels (SEA, SEO, e-mail, etc.). What all these models have in common is that they divide the value of a conversion (a purchase) among the steps of a customer's interaction path in a predefined way. For example, if one uses a last-click model and the customer purchased a product after clicking on a Facebook ad, the value of this conversion is attributed to the "Facebook" marketing channel, regardless of any previous interactions by the customer. In multi-click models, the value of a conversion is allocated to all interactions associated with the purchase according to predefined weights. However, the weightings in all these models are ad hoc: they are not derived from data analysis but are fixed manually in advance.

This is where dynamic attribution models come in. Instead of assigning arbitrary weights to the interactions in advance, dynamic attribution models derive these weights from the data provided. The Markov model and the Shapley value model are considered state-of-the-art; both are discussed in more detail below. Both are designed to produce results at the channel level. Obtaining results at the interaction level or path level requires some extra work. This is not within the scope of this article, but we will be happy to help you with it. Feel free to contact us for rates; we can implement the Markov and Shapley models for your business at short notice.
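To make the contrast concrete, the predefined weightings of the heuristic models can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `heuristic_weights` is hypothetical, and the 40/20/40 split used for the bathtub model is an assumed, commonly quoted choice that the text does not specify.

```python
def heuristic_weights(path, model):
    """Return (channel, share of conversion value) for each step of a non-empty path.

    Assumption: the bathtub model gives 40% to the first and 40% to the last
    step and spreads the remaining 20% evenly over the steps in between.
    """
    n = len(path)
    if model == "first_click":
        weights = [1.0] + [0.0] * (n - 1)
    elif model == "last_click":
        weights = [0.0] * (n - 1) + [1.0]
    elif model == "linear":
        weights = [1.0 / n] * n
    elif model == "bathtub":
        if n <= 2:
            weights = [1.0 / n] * n  # no middle steps: split evenly
        else:
            weights = [0.4] + [0.2 / (n - 2)] * (n - 2) + [0.4]
    else:
        raise ValueError(f"unknown model: {model}")
    return list(zip(path, weights))


path = ["SEO", "E-Mail", "Facebook"]  # hypothetical path ending with a Facebook ad click
for model in ["first_click", "last_click", "linear", "bathtub"]:
    print(model, heuristic_weights(path, model))
```

Whatever the data say, the weights are fixed by the choice of model, which is exactly the limitation the dynamic models address.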
Markov model
The Markov graph is the fundamental object on which the entire analysis in the Markov model is built. For a given data set containing the interaction paths of different customers (e.g., Start -> Facebook -> Remarketing -> Conversion/Purchase), it can be constructed as follows:
- All marketing channels in the data set, as well as the terms "start", "conversion" and "null" (no conversion), are each written on a separate index card, and the cards are spread out on a flat surface.
- Now go through the interaction paths of the individual customers and represent each of them by connecting the corresponding index cards with arrows.
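The card-and-arrow procedure amounts to counting how often each step from one node to the next occurs in the data. The sketch below does this with `collections.Counter`. The four paths are hypothetical: they are one possible data set consistent with the transition probabilities quoted in this example, since the original graphic is not reproduced here.

```python
from collections import Counter

# Hypothetical interaction paths (one data set consistent with the example's numbers).
paths = [
    ["Start", "Facebook", "Remarketing", "Conversion"],
    ["Start", "Facebook", "Google", "Remarketing", "Conversion"],
    ["Start", "Facebook", "Google", "Null"],
    ["Start", "Google", "Remarketing", "Null"],
]


def count_arrows(paths):
    """Count how often each arrow (from_node -> to_node) occurs across all paths."""
    arrows = Counter()
    for path in paths:
        # zip pairs every step with the step that follows it
        for src, dst in zip(path, path[1:]):
            arrows[(src, dst)] += 1
    return arrows


arrows = count_arrows(paths)
for (src, dst), n in sorted(arrows.items()):
    print(f"{src} -> {dst}: {n}")
```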
For an example with three marketing channels and four different interaction paths, the resulting graph could look like this:
Then, for each node, count the number of arrows that lead from it to a given other node and divide this number by the total number of arrows starting from that node. In the example above, this gives:
Since, of the three arrows starting from the Facebook channel, two point to Google and one to Remarketing, you get the numbers 2/3 and 1/3. These numbers are the so-called transition probabilities: for a customer who has just clicked on a Facebook ad, the probability that the next stop in their interaction path is Google is 2/3, while the probability that the next stop is Remarketing is 1/3.
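The normalisation step can be sketched with exact fractions. The function name `transition_probabilities` is hypothetical, and the only counts used are the ones for the Facebook node given in the text.

```python
from fractions import Fraction


def transition_probabilities(arrow_counts):
    """Turn {node: {next_node: count}} into {node: {next_node: probability}}.

    Each count is divided by the total number of arrows leaving that node.
    """
    probs = {}
    for node, targets in arrow_counts.items():
        total = sum(targets.values())
        probs[node] = {nxt: Fraction(n, total) for nxt, n in targets.items()}
    return probs


# Arrows leaving the Facebook node, as described in the text.
counts = {"Facebook": {"Google": 2, "Remarketing": 1}}
probs = transition_probabilities(counts)
print(probs["Facebook"])  # {'Google': Fraction(2, 3), 'Remarketing': Fraction(1, 3)}
```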
The last step is to calculate the overall probability of a conversion.
In the above example, there are three Markov paths that lead from the starting point to a conversion.
For the path Start -> Facebook -> Remarketing -> Conversion, for example, multiplying the transition probabilities along the path gives 3/4 * 1/3 * 2/3 = 1/6.
Correspondingly, the other two paths have probabilities 1/9 and 2/9, so that the total probability of a conversion is 1/6 + 1/9 + 2/9 = 1/2.
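The path enumeration can be sketched as a depth-first search that multiplies transition probabilities along the way. The transition table is an assumption inferred from the numbers in the text: Start -> Facebook (3/4), Facebook -> Google (2/3), Facebook -> Remarketing (1/3) and Remarketing -> Conversion (2/3) are quoted directly, while Start -> Google (1/4) and Google -> Remarketing (2/3) are deduced so that the three paths come out at 1/6, 1/9 and 2/9. The remaining probability mass is assumed to go to "Null". The function name `converting_paths` is hypothetical.

```python
from fractions import Fraction as F

# Transition probabilities inferred from the example (assumption, see above).
TRANSITIONS = {
    "Start": {"Facebook": F(3, 4), "Google": F(1, 4)},
    "Facebook": {"Google": F(2, 3), "Remarketing": F(1, 3)},
    "Google": {"Remarketing": F(2, 3), "Null": F(1, 3)},
    "Remarketing": {"Conversion": F(2, 3), "Null": F(1, 3)},
}


def converting_paths(transitions, node="Start", prob=F(1), path=None):
    """Yield (path, probability) for every route from `node` to "Conversion".

    Assumes the graph contains no loops, as in the example.
    """
    path = (node,) if path is None else path
    if node == "Conversion":
        yield path, prob
        return
    for nxt, p in transitions.get(node, {}).items():  # "Null" has no outgoing arrows
        yield from converting_paths(transitions, nxt, prob * p, path + (nxt,))


paths = list(converting_paths(TRANSITIONS))
for path, p in paths:
    print(" -> ".join(path), p)
total = sum(p for _, p in paths)
print("P(conversion) =", total)  # P(conversion) = 1/2
```

For graphs with loops (a customer returning to an earlier channel), plain enumeration no longer terminates, and the conversion probability is usually obtained by solving for absorption probabilities of the Markov chain instead.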
To quantify the success of the individual marketing channels, each of the three channels is now removed from the Markov graph in turn and the calculation is repeated.
The success of each channel is then calculated using the formula:
Even without following the mathematical details, it is intuitively clear that the Markov graph above can be used to rank the marketing channels by importance. If, for example, the remarketing channel is removed, none of the three converting Markov paths remains. This channel is therefore essential in this example. If, on the other hand, Facebook or Google is removed, one converting Markov path remains in each case. These two channels are therefore somewhat less important. In this way, the success of individual marketing channels can be determined from customer data without having to fix any weights in advance.
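The removal procedure can be sketched as follows. This sketch uses the widely used removal effect, RE(c) = 1 - P(conversion without c) / P(conversion), normalised so that the channel shares sum to one; the article's own formula may differ in detail. A removed channel is treated like "Null", i.e. paths through it never convert. The transition table is the same assumption as in the path-enumeration example (inferred from the numbers in the text), and `conversion_probability` is a hypothetical helper name.

```python
from fractions import Fraction as F

# Transition probabilities inferred from the example (assumption).
TRANSITIONS = {
    "Start": {"Facebook": F(3, 4), "Google": F(1, 4)},
    "Facebook": {"Google": F(2, 3), "Remarketing": F(1, 3)},
    "Google": {"Remarketing": F(2, 3), "Null": F(1, 3)},
    "Remarketing": {"Conversion": F(2, 3), "Null": F(1, 3)},
}


def conversion_probability(transitions, node="Start", removed=None):
    """Probability of reaching "Conversion" from `node` in a loop-free graph."""
    if node == "Conversion":
        return F(1)
    if node == removed:
        return F(0)  # the removed channel acts like "Null"
    return sum((p * conversion_probability(transitions, nxt, removed)
                for nxt, p in transitions.get(node, {}).items()), F(0))


base = conversion_probability(TRANSITIONS)
channels = ["Facebook", "Google", "Remarketing"]
# Removal effect: relative drop in conversion probability without the channel.
removal = {c: 1 - conversion_probability(TRANSITIONS, removed=c) / base for c in channels}
total = sum(removal.values())
shares = {c: removal[c] / total for c in channels}
for c in channels:
    print(f"{c}: removal effect {removal[c]}, share {shares[c]}")
```

Under these assumptions, Remarketing gets the largest share (9/22), followed by Facebook (7/22) and Google (6/22), matching the intuitive ranking described above.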
Shapley value model
An alternative to the Markov model is the Shapley value model.
The Shapley value is a concept that belongs to the mathematical subfield of game theory.
The basic problem solved by the Shapley value is to distribute profits fairly within a game in which players can form coalitions.
The concept can also be applied to the attribution problem.
Here, the marketing channels are regarded as players who jointly appear in a customer's interaction path, i.e., form a coalition.
The conversion rate, for example, can be used as the profit to be distributed.
This is determined from the customer data by counting the number of conversions (and non-conversions) for each theoretically possible interaction path.
The number of conversions is then divided by the total number of appearances of this interaction path, i.e. the quotient (number of conversions) / (number of conversions + number of non-conversions) is computed.
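Counting conversions and appearances per interaction path can be sketched like this. Two assumptions are made: the records are hypothetical (path, converted) pairs, and, as is common in Shapley attribution, a path is reduced to the set of channels it contains, so the order of the channels is ignored. `conversion_rates` is a hypothetical name.

```python
from collections import defaultdict


def conversion_rates(records):
    """Conversions / (conversions + non-conversions) per set of channels.

    `records` is an iterable of (path, converted) pairs, where `path` is a
    sequence of channel names and `converted` is True or False.
    """
    conversions = defaultdict(int)
    appearances = defaultdict(int)
    for path, converted in records:
        key = frozenset(path)  # assumption: channel order within a path is ignored
        appearances[key] += 1
        conversions[key] += int(converted)
    return {key: conversions[key] / appearances[key] for key in appearances}


# Hypothetical customer data.
records = [
    (("SEA", "E-Mail"), True),
    (("E-Mail", "SEA"), False),
    (("SEA",), True),
    (("SEA",), False),
    (("SEA",), False),
    (("SEA",), False),
]
rates = conversion_rates(records)
for channels, rate in rates.items():
    print(sorted(channels), f"{rate:.0%}")
```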
Put simply, the Shapley value model quantifies the impact of a particular marketing channel by comparing interaction paths that differ only in the presence or absence of the channel under consideration.
To illustrate the point, consider the following examples:
The two interaction paths differ only in the presence of the "Video Branding" channel. However, while the upper interaction path has a conversion rate of 10%, the lower path only has a conversion rate of 6%. The "Video Branding" channel thus increases the conversion rate by about 67% in relative terms (4 percentage points in absolute terms), and it receives a high Shapley value as a result, since it generates substantial added value. In the second example, the situation is exactly the opposite: in relative terms, the "Display" channel only increases the conversion rate by around 2% (0.2 percentage points in absolute terms). The Shapley value of the "Display" channel is therefore considerably lower than that of the "Video Branding" channel, as the added value generated by this channel must be regarded as marginal.
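The arithmetic behind the first example can be checked with a small helper; `uplift` is a hypothetical name, and the rates are the 10% and 6% from the text, expressed in percent.

```python
def uplift(rate_with, rate_without):
    """Absolute (percentage points) and relative uplift from adding one channel."""
    absolute = rate_with - rate_without
    relative = absolute / rate_without
    return absolute, relative


# Video Branding example: 10% with the channel, 6% without it.
abs_pp, rel = uplift(10.0, 6.0)
print(f"absolute: {abs_pp:.1f} pp, relative: {rel:.0%}")  # absolute: 4.0 pp, relative: 67%
```

The absolute difference is what enters the Shapley value as the marginal contribution; the relative figure is only a convenient way of reading it.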
The exact definition of the Shapley value is discussed below.
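In mathematical terms, the Shapley value $\phi_i$ of marketing channel i is determined using the following formula, where N denotes the set of all marketing channels and |S| the number of channels in a coalition S:
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}\,\bigl(G(S \cup \{i\}) - G(S)\bigr)$$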
Here, the marginal contribution of channel i to a coalition S is defined as the difference between the profit that S generates together with channel i and the profit it generates without channel i, i.e. $G(S \cup \{i\}) - G(S)$.
Here, G is the profit function and S is a combination of marketing channels that does not include channel i. In the examples above, the marginal contribution is therefore exactly the difference between the conversion rates, i.e. 4 percentage points in the first example and 0.2 percentage points in the second. However, a single interaction path is not enough to calculate the actual Shapley value of a channel. Rather, the Shapley value is obtained by calculating the marginal contribution of the channel under consideration for all possible coalitions and then taking the average. The remaining factors in the formula are weights that turn the sum into a proper average: coalitions of the same size are weighted equally, and so are the different coalition sizes. In other words, the Shapley value reflects the average added value generated by a marketing channel.
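The definition can be sketched directly in Python by looping over all coalitions that exclude a channel. The channel "Search" and all conversion rates below are hypothetical, loosely modelled on the Video Branding example; `shapley_values` is likewise a hypothetical name.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_values(channels, G):
    """Exact Shapley value of each channel for a profit function G.

    G maps a frozenset of channels (a coalition) to its profit.
    """
    n = len(channels)
    values = {}
    for i in channels:
        others = [c for c in channels if c != i]
        phi = Fraction(0)
        for k in range(len(others) + 1):
            # Weight |S|! (|N| - |S| - 1)! / |N|! for coalitions of size k
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for combo in combinations(others, k):
                coalition = frozenset(combo)
                phi += weight * (G(coalition | {i}) - G(coalition))  # marginal contribution
        values[i] = phi
    return values


# Hypothetical conversion rates (in percent) per coalition.
RATES = {
    frozenset(): 0,
    frozenset({"Search"}): 6,
    frozenset({"Video Branding"}): 2,
    frozenset({"Search", "Video Branding"}): 10,
}
channels = ["Video Branding", "Search"]
values = shapley_values(channels, lambda S: RATES[S])
for c, v in values.items():
    print(c, v)
```

The Shapley values add up to G(N) - G(empty set), here 3 + 7 = 10. The number of coalitions grows as 2^(n-1) per channel, so the exact computation is practical only for a moderate number of channels.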
One subtlety in the Shapley value model concerns the profit function G. It is not fixed by the definition of the Shapley value but can be chosen freely. In the examples given, the conversion rate was used as the profit in the sense of the Shapley value. Alternatively, the net revenue generated by the conversions can be used as the Shapley profit, and this is often a suitable choice. If the primary focus is on determining the cost-to-sales ratio, for example, the net revenue is in fact the quantity needed. The profit function can be changed at any time, and comparing the results for different profit functions can yield further valuable insights.
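Because G is a free choice, it helps to build it from the raw data in a way that makes the metric interchangeable. The sketch below is one possible approach under stated assumptions: records are hypothetical (channels, converted, net revenue) triples, channel order is ignored, and "net revenue" is taken as the total revenue of all paths with a given channel set (revenue per path would be another reasonable choice). `make_profit_function` is a hypothetical name; its result can be passed to any Shapley routine that expects G.

```python
from collections import defaultdict


def make_profit_function(records, metric="conversion_rate"):
    """Build G(coalition) from (channels, converted, net_revenue) records.

    metric="conversion_rate": share of paths with this channel set that converted.
    metric="net_revenue": total net revenue of paths with this channel set.
    """
    if metric not in ("conversion_rate", "net_revenue"):
        raise ValueError(f"unknown metric: {metric}")
    stats = defaultdict(lambda: [0, 0, 0.0])  # appearances, conversions, revenue
    for channels, converted, revenue in records:
        s = stats[frozenset(channels)]
        s[0] += 1
        s[1] += int(converted)
        s[2] += revenue

    def G(coalition):
        appearances, conversions, revenue = stats.get(frozenset(coalition), (0, 0, 0.0))
        if metric == "conversion_rate":
            return conversions / appearances if appearances else 0.0
        return revenue

    return G


# Hypothetical customer data: (channels, converted, net revenue).
records = [
    (["SEA", "E-Mail"], True, 80.0),
    (["SEA", "E-Mail"], False, 0.0),
    (["SEA"], True, 50.0),
]
G_rate = make_profit_function(records, "conversion_rate")
G_rev = make_profit_function(records, "net_revenue")
print(G_rate({"SEA", "E-Mail"}), G_rev({"SEA", "E-Mail"}))  # 0.5 80.0
```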
The primary advantage of dynamic attribution models is that they produce more accurate results than heuristic models such as the bathtub model. The reason is that the Markov and Shapley value models do not rely on arbitrarily chosen weights but derive the weights of the steps in the interaction paths from the data. Consequently, they remove subjective weighting decisions from the analysis and are therefore clearly superior to heuristic models. In particular, when it comes to calculating the cost-to-sales ratio for very expensive marketing channels such as Search Engine Advertising (SEA), using the most precise models available seems almost imperative, as any misjudgement can lead to unnecessarily high follow-up costs.