INTRODUCTION

The purpose of marketing attribution is to quantify the influence each advertising touch point has on a consumer’s decision to engage with a brand, make a purchase or perform any conversion event relevant to the marketer.

Attribution allows marketers to optimize media spend toward conversions, compare the value of all advertising channels across the entire path to conversion, and eliminate the inaccuracy that comes from analyzing siloed channel data or relying on subjective measurement.

The most common attribution modeling technique in digital advertising is the Last Touch Attribution Model, where 100% of the credit for a conversion is assigned to the most recent ad impression or click. Although simple to understand and compute, Last Touch ignores all preceding events that may deserve partial credit.
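The Last Touch rule can be illustrated with a minimal Python sketch. The path layout (a list of timestamp and channel pairs), the function name and the sample channels are assumptions made for illustration only.

```python
from datetime import datetime

def last_touch_credit(path):
    """Give 100% of the conversion credit to the most recent touchpoint.

    `path` is a list of (timestamp, channel) tuples; this layout is an
    assumption for the example, not a format defined in the text.
    """
    if not path:
        return {}
    credit = {channel: 0.0 for _, channel in path}
    # The touchpoint with the latest timestamp takes all of the credit.
    _, last_channel = max(path, key=lambda tp: tp[0])
    credit[last_channel] = 1.0
    return credit

# Hypothetical three-touch path.
example_path = [
    (datetime(2024, 1, 1, 9, 0), "Display"),
    (datetime(2024, 1, 12, 20, 30), "TV"),
    (datetime(2024, 1, 22, 18, 5), "Brand Search"),
]
credit = last_touch_credit(example_path)
print(credit)
```

Every touchpoint before the last one receives zero credit, which is the limitation described above.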

Advertisers seeking deeper analysis have relied upon statistical analysis of raw data, an approach that has been problematic for both advertisers and vendors. Because it ignores viewability, it inadequately values certain digital inventory; because it does not connect a user’s different devices, it unintentionally treats a large majority of data as ‘non-converting’; because it excludes offline media, it completely disregards a large dataset; and because the model is refreshed too infrequently, the resulting analysis will not work in today’s dynamic media environment.

Accordingly, there is a need for more sophisticated models which not only have a high degree of accuracy in correctly classifying a user as positive (with a conversion action) or negative (without a conversion action) but also accurately determine current influence and predict the impact that advertising touch points have on a consumer.

METHODOLOGY
C3 Metrics utilizes a proprietary tagging infrastructure that collects customer journey data on all converting and non-converting users. The journey takes the form of the traditional A-I-D-A “purchase funnel” of Awareness, Interest, Desire and Action, represented within the C3 Metrics algorithm as O-R-A-C: Originator, Roster, Assist and Converter.

The significance of this journey is that different marketing touchpoints combined with specific dimensions (tactic, messaging, campaign, etc.) influence customers to move down the funnel, where some users will convert and others will not. For example, television with a specific creative might be the most influential driver in the Roster stage, yet with a different creative may be better at building awareness as the Originator, while another creative may lead to users not converting.

Organizing this data in preparation for modeling begins with the collection of all paid, owned, and earned media at a granular level, yielding a date- and time-stamped path for each user.

Digital data is added via the C3 Metrics proprietary tags. Television and Radio are added via Inscape feed or post-log data, with the algorithm correlating spot times to jumps in navigational search traffic (direct navigation, brand search or SEO) relative to a rolling 4-week average for the same day and time. Direct Mail is added via CRM connection and Print via proxy URLs.
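The comparison against a rolling same day/time 4-week average can be sketched as follows. This is an illustration under stated assumptions, not the C3 Metrics implementation: the hourly granularity, the function names and the sample visit counts are all hypothetical.

```python
from datetime import datetime, timedelta

def rolling_baseline(visits_by_hour, when, weeks=4):
    """Average visits at the same weekday and hour over the previous `weeks` weeks."""
    samples = [
        visits_by_hour[when - timedelta(weeks=k)]
        for k in range(1, weeks + 1)
        if when - timedelta(weeks=k) in visits_by_hour
    ]
    return sum(samples) / len(samples) if samples else None

def search_lift(visits_by_hour, spot_time, weeks=4):
    """Visits in the spot's hour minus the rolling baseline (None if no history)."""
    # Hourly buckets are an assumption; the text does not state the granularity.
    hour = spot_time.replace(minute=0, second=0, microsecond=0)
    baseline = rolling_baseline(visits_by_hour, hour, weeks)
    if baseline is None or hour not in visits_by_hour:
        return None
    return visits_by_hour[hour] - baseline

# Hypothetical navigational search visits for Mondays at 20:00.
visits_by_hour = {
    datetime(2024, 1, 1, 20): 100,
    datetime(2024, 1, 8, 20): 110,
    datetime(2024, 1, 15, 20): 90,
    datetime(2024, 1, 22, 20): 100,
    datetime(2024, 1, 29, 20): 160,  # hour in which the TV spot aired
}
lift = search_lift(visits_by_hour, datetime(2024, 1, 29, 20, 15))
print(lift)
```

A positive lift suggests navigational traffic above the usual level for that day and time, which could then be associated with the spot.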

An example path may appear as follows:

Standard statistical modeling approaches and other attribution vendors perform a similar data organization process and then incorporate data validation routines. These routines are limited by the vendor’s ability to draw on a single source of data and by its investment in sophisticated data validation tools.

Modeling with this ‘organized, but not fully-validated data’ is typically done via regression or other methods (game theory, vector autoregression, multi-dimensional regression, etc.) and includes non-converting paths. It yields questionable results for attribution purposes because, without deterministic data validation, the data set contains multiple errors. Newer methods that compare multiple models (also known as ensemble methods) likewise yield errors without deterministic data validation.

Data validation is the process of ensuring that a program operates on clean, correct and useful data and is intended to provide well-defined guarantees for fitness, accuracy, and consistency of the data.

In addition to pulling data from a deterministic single source via the C3 Metrics tagging infrastructure, the C3 Metrics algorithm incorporates the following additional deterministic data validation routines in real-time: a) Different Devices; b) Viewability; c) Navigational Search at the bottom of the conversion (‘Converter’); and d) Touchpoints appearing in the funnel which based on time cannot have any causal impact.


Different Devices
According to research, 90% of consumers move between devices to accomplish a conversion goal, with 67% of people using multiple devices sequentially to shop online.

Without connecting users to their different devices, the other devices would be considered ‘non-converting’ and adversely impact the model for those touchpoints leading to non-detectable errors as high as 90%. For example, if the converting device contained the following path:

And Device #2 was the following:

 

 

And Device #3 was the following:

 

 

Both the 2nd and 3rd device would be part of the non-converting or ‘negative’ data set for purposes of modeling.

The C3 Metrics algorithm incorporates cross device connection as part of data validation. The user’s converting device path would be automatically incorporated with the 2nd and 3rd path for a final path showing the following:
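As a rough illustration of this merge, the Python sketch below combines per-device paths into one time-ordered path per user. The device-to-user mapping, the function name and the sample data are hypothetical; how devices are actually linked is not described in the text.

```python
from datetime import datetime

def merge_device_paths(device_paths, device_to_user):
    """Combine per-device paths into one time-ordered path per user.

    `device_paths` maps a device id to a list of (timestamp, channel) tuples.
    `device_to_user` maps a device id to a user id (assumed to be known).
    """
    merged = {}
    for device_id, path in device_paths.items():
        # Devices with no known user stay separate, keyed by their own id.
        user_id = device_to_user.get(device_id, device_id)
        merged.setdefault(user_id, []).extend(path)
    return {user: sorted(path, key=lambda tp: tp[0]) for user, path in merged.items()}

# Hypothetical data: one converting device and two other devices.
device_paths = {
    "phone": [
        (datetime(2024, 1, 1, 9, 0), "Display"),
        (datetime(2024, 1, 22, 18, 5), "Facebook"),
    ],
    "tablet": [(datetime(2024, 1, 12, 21, 0), "Brand Search")],
    "smart-tv": [(datetime(2024, 1, 22, 17, 30), "TV")],
}
device_to_user = {"phone": "user-1", "tablet": "user-1", "smart-tv": "user-1"}

merged = merge_device_paths(device_paths, device_to_user)
channels = [channel for _, channel in merged["user-1"]]
print(channels)
```

Once merged, the touchpoints from the second and third devices belong to a converting path instead of the negative data set.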



Viewability

According to research, more than 50% of digital impressions are never seen, with 69% not in view for ad networks and programmatic vendors. Upwards of 90% of the data utilized for digital modeling is impression data. Without determining viewability, the model results will have a large margin of error. For example, if the final conversion path was the following:

 

 

 

 

The ‘Originator’ on Day 1 is shown to be responsible for originating or building awareness.

In digital advertising, it is common practice for a vendor to purposely place advertisements at the bottom of pages, which yields a higher profit on the ads purchased yet results in ads that will never be seen. In this instance, there is a high likelihood that those unseen impressions could ‘cookie’ a large number of converting paths and receive undue credit.

The C3 Metrics algorithm incorporates deterministic, real-time, MRC-accredited viewability as part of data validation. Ads that have not been seen are automatically removed from the user’s final path. The final path would show the following:
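A minimal sketch of this filtering step in Python is given here, assuming each touchpoint is a record with a hypothetical "viewable" flag supplied by a viewability measurement. The field names and sample path are illustrative and are not taken from the C3 Metrics data model.

```python
def drop_unviewed(path):
    """Remove impressions that were not measured as viewable.

    Each touchpoint is a dict with "channel" and "type" keys plus an optional
    "viewable" flag. Treating an unmeasured impression as unseen is an
    assumption made for this sketch.
    """
    return [
        tp for tp in path
        if not (tp.get("type") == "impression" and not tp.get("viewable", False))
    ]

# Hypothetical converting path; the Day 1 display impression was never in view.
path = [
    {"day": 1, "channel": "Display", "type": "impression", "viewable": False},
    {"day": 12, "channel": "Brand Search", "type": "click"},
    {"day": 22, "channel": "TV", "type": "spot"},
    {"day": 22, "channel": "Facebook", "type": "click"},
]
viewed_path = drop_unviewed(path)
print([tp["channel"] for tp in viewed_path])
```

Only impressions are filtered; clicks and offline spots pass through because the viewability flag does not apply to them in this sketch.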

 

 

 


Non-Causal Touchpoints
With greater than 90% of advertisers currently utilizing a Last Touch Attribution method, it is common to see many touchpoints appearing in the funnel within the last couple of minutes or seconds prior to conversion.

In the above example, if the conversion event takes an average of 5 minutes to complete and advertiser data shows that the minimum time to completion is 3 minutes, the Converter (‘Facebook’) which appeared in the path only 1 minute prior to the conversion could not have had any influence and was likely the result of the user clicking over to Facebook upon receiving a Facebook notification.

The C3 Metrics algorithm incorporates a time-decay component that removes such touchpoints as part of data validation.
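The Facebook example above can be sketched in Python. Note that this sketch uses a hard cutoff at the minimum completion time, not a decay curve; the exact rule used by the C3 Metrics algorithm is not specified here, so the function, its parameters and the sample timestamps are assumptions.

```python
from datetime import datetime, timedelta

def drop_non_causal(path, conversion_time, min_completion):
    """Remove touchpoints that occur less than `min_completion` before conversion.

    A touchpoint that appears after the user must already have started the
    conversion flow cannot have influenced the decision to convert.
    """
    cutoff = conversion_time - min_completion
    return [(ts, channel) for ts, channel in path if ts <= cutoff]

# Hypothetical timings: Facebook appears 1 minute before the conversion,
# while the minimum time to complete the conversion is 3 minutes.
conversion_time = datetime(2024, 1, 22, 18, 6)
path = [
    (datetime(2024, 1, 22, 17, 30), "TV"),
    (datetime(2024, 1, 22, 18, 5), "Facebook"),
]
kept = drop_non_causal(path, conversion_time, timedelta(minutes=3))
print([channel for _, channel in kept])
```

The minimum completion time would come from advertiser data, as in the 3-minute figure used in the example.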

The user’s final path would automatically remove touchpoints where it would be impossible to have any causal impact based on the time difference between the touchpoint and conversion. In this example, incorporating time decay, the final path would show the following:

 

 

 

 


Navigational Search
According to research, nearly half of all digital conversions end in either Brand Search or Direct Navigation. Without correcting for this, Brand Search may receive an undue percentage of credit.

 

 

 

 

 

The C3 Metrics algorithm incorporates a navigational search component in the algorithm to determine if brand search or direct navigation is purely navigation and not causal.

In the case above, the Brand Search is the result of the user viewing the TV spot, and Brand Search acts as navigation only. In this example, incorporating navigational search, the final path would show the following:
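One simple way to express this idea in Python is sketched below. The rule used here (drop Brand Search or Direct Navigation at the end of a path whenever an earlier non-navigational touchpoint exists) is an assumption chosen for illustration; the text does not state how the C3 Metrics algorithm decides that a search is purely navigational.

```python
# Channels treated as navigational in this sketch (an assumed list).
NAVIGATIONAL = {"Brand Search", "Direct Navigation"}

def strip_trailing_navigation(channels):
    """Drop navigational touchpoints at the end of a time-ordered path.

    If the whole path is navigational, it is returned unchanged, since no
    earlier touchpoint exists to explain the visit.
    """
    end = len(channels)
    while end > 0 and channels[end - 1] in NAVIGATIONAL:
        end -= 1
    return list(channels[:end]) if end > 0 else list(channels)

# Hypothetical path in which the user searches for the brand after a TV spot.
cleaned = strip_trailing_navigation(["Display", "TV", "Brand Search"])
print(cleaned)
```

Under this rule the credit that Brand Search would otherwise receive as the Converter stays with the touchpoints that preceded it.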

 

 

 


Probabilistic Data Validation
Without incorporating deterministic data validation, the following data would be presented to a model:

In this instance, the Display ad on Day 1 and the Facebook touchpoint on Day 22 would be seen as valid touchpoints within models provided by vendors not utilizing deterministic data validation. In addition, the TV ad on Day 22 and the Brand Search on Day 12 would be treated as negative or non-converting.


Deterministic Data Validation
With deterministic data validation, the final converting path combines all of the user’s devices and applies MRC-accredited viewability, removal of non-causal touchpoints and navigational search correction, as follows:

 

 

 

 

The final converting path appears as follows:

 

 

 

Both the Display ad on Day 1 and Facebook on Day 22 are excluded because they are not causal to the conversion, while the TV ad on Day 22 and Brand Search on Day 12 are included.


CONCLUSION
Data validation is a crucial tool for every business, as it ensures your team can completely trust the data they use to be accurate, clean and helpful at all times. Making sure the data you use is correct is a proactive way to safeguard one of your most valuable, demand-generating assets.

Regardless of the modeling methodology, utilizing probabilistic data validation routines leads to channels and tactics that actually provide value (such as TV in the example provided) receiving little to no credit, while channels and tactics that provide no value (such as display ads that were never seen) receive significant credit.

Industry data indicates that 50% of ads are not in view and that 90% of the data analyzed for multi-touch attribution is impression-based. Taken together (0.50 × 0.90 = 0.45), utilizing probabilistic data validation routines can lead to a 45% error rate in attribution outcomes.