Over the past two years, we have been building a business intelligence, scenario planning, and optimization SaaS platform for marketers: the Marketing Effectiveness Platform (MEP). MEP provides decision-makers with a single place to understand multi-channel performance and perform “what-if” analysis of spend by channel.
A tremendous amount of care has been put into building MEP to be more than just a shiny app (no pun intended). Each company (or business unit) has its own unique marketing mix and architecture, and each element of that architecture—channels, time granularity, cross-section (segmentation), upper- versus lower-funnel—has been parameterized in model objects. Model objects must validate against a JSON metadata file before they are displayed in the platform—and this is what provides real scalability.
For the first two years of its development, we were focused on critical infrastructure. We made steady progress below the water line, but there wasn’t a ton to show for it. Over that time, the team focused on integrating the front end with our big data back end (Databricks); building user roles and permissions to ensure that each user’s and client’s data were secure and private; building out our JSON metadata parameterization; adding support for all of the models and curve functions we use (Bayesian, GAM, etc.); and building out the critical tables and charts to understand marketing effectiveness.
Over the past three months, the ship has started to take shape above the water line, and it’s really impressive (it’s even more impressive knowing how robust the hull is—OK, I’ll stop torturing that analogy).
Scenario Planning
We thought a lot about how to let managers visually plan different levels of marketing spend, and show what the results of these decisions would be. At first, we deployed a simple spreadsheet download/upload function. We thought this would be the most flexible option, but our users thought it was clunky (it was). So, we went back to the drawing board and came up with three different on-platform scenario planning options: Manual, Strategic, and Optimized.
Manual provides the user with ultimate power. In this approach, users interact directly with model dataframes in Databricks and then recalculate the scenario. This is particularly useful for our analysts, who are routinely running scenario after scenario with tiny changes in spend and mix in preparation for client deliverables.
Strategic is for business users who want to quickly get to “what if” answers. In the strategic pane, users can choose any input variable—spend, impressions, or controls—and change it, up or down, either by a percentage or by a fixed amount, for any time period. There is no upper limit on the number of these changes, and if you make a mistake, you can delete it. Once you’re happy with a scenario, you save it, give it a name, and then send it back to the Databricks cluster to run.
Optimized is just what it sounds like: a user can optimize for, say, total sales in a given period, and then add a series of constraints. Once they are satisfied, the scenario is sent back to Databricks for computation. This can take a while; these models aren’t simple linear regressions, so we can’t use matrix algebra to solve for an optimum. Instead, our awesome team (led by Sam Arrington) built a two-stage program that searches for a macro solution, and then homes in on a local minimum or maximum. When the optimization is done, the user gets an email and can see the answer.
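To make the two-stage idea concrete, here is a minimal Python sketch (not the actual MEP or mbmmm implementation): a coarse grid search finds a promising region, and a constrained local optimizer refines it. The response curves, channel names, and budget are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical diminishing-returns response curves for two channels
# (illustrative only; the real models are fit in Databricks).
def total_sales(spend):
    search, social = spend
    return 120 * np.log1p(search / 10_000) + 80 * np.log1p(social / 10_000)

budget = 500_000  # total spend constraint (assumed)

# Stage 1: coarse grid search for a promising region (the "macro solution").
grid = np.linspace(0, budget, 51)
best = max(((s, budget - s) for s in grid), key=total_sales)

# Stage 2: local refinement around the coarse solution.
result = minimize(
    lambda x: -total_sales(x),           # maximize sales = minimize the negative
    x0=np.array(best),
    bounds=[(0, budget), (0, budget)],
    constraints=[{"type": "ineq", "fun": lambda x: budget - x.sum()}],
    method="SLSQP",
)
optimal_spend = result.x
```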
When doing this work, we realized that the days of simple “linear program” (think Excel Solver) optimization for marketing are over. We’ve entered a new phase, where advanced machine learning techniques are required, not optional. I don’t like using “AI” flippantly, but we have some of that in here, and it’s the only way this works as fast as it does. More to come on that in coming quarters.
Model Comparison
When we started down the path of scenario creation, we knew we needed an easy way to compare two models or outcomes. We went further than just allowing a user to compare two scenarios, however. We built a more robust method that allows a user to compare two of anything. The comparison looks at both overlapping channels and those that are only present in one of the objects—a full outer join, if you will. This allows a lot of flexibility—if you want to know how two different models compare, you can do that, too. It’s basically a Swiss Army knife for marketing data comparison, and it will support many future use cases for MTA, testing, and basic reporting.
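As a rough illustration of the comparison logic, the sketch below performs a full outer join of channel-level results from two hypothetical scenarios using pandas; the column names and figures are made up, and the real platform operates on its own model objects.

```python
import pandas as pd

# Illustrative channel-level outputs from two scenarios (column names assumed).
scenario_a = pd.DataFrame({
    "channel": ["paid_search", "social", "tv"],
    "contribution_a": [1.2e6, 0.8e6, 2.1e6],
})
scenario_b = pd.DataFrame({
    "channel": ["paid_search", "social", "ooh"],
    "contribution_b": [1.4e6, 0.7e6, 0.5e6],
})

# Full outer join: keeps overlapping channels and those present in only one object.
comparison = scenario_a.merge(scenario_b, on="channel", how="outer", indicator=True)
comparison["delta"] = comparison["contribution_b"] - comparison["contribution_a"]
print(comparison)
```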
Multi-Stage Modeling
We spend a lot of time at Marketbridge making sure that upper-funnel tactics—like display, OOH, digital video, and social—get proper credit for their contributions to sales. To do this, we build multi-stage models, in which upper-funnel tactics are regressors both for end sales and for so-called capture channels—typically branded paid search and affiliate.
To make this happen, models must be “aware” of other models—concretely, a dependent variable of one model is also an input (independent variable) of another model. Behind the scenes, this means that model objects have been built with metadata that attaches them to one another via variables.
At the same time, users should be able to visualize and link models together. In MEP, a user can point a model’s output to another model’s input—potentially in an endless chain. We’ve added a neat visualization to make this happen.
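Conceptually, the metadata linkage might look something like the following sketch, in which one model’s dependent variable appears among another model’s inputs; the structure and names are hypothetical, not MEP’s actual schema.

```python
# Hypothetical metadata linking an upper-funnel model to a capture-channel model:
# the dependent variable of one model appears as an independent variable of the next.
models = {
    "upper_funnel": {"y": "branded_search_visits", "x": ["tv", "ooh", "online_video"]},
    "capture":      {"y": "sales", "x": ["branded_search_visits", "affiliate"]},
}

def downstream_of(model_name):
    """Return models whose inputs include this model's output."""
    output = models[model_name]["y"]
    return [name for name, spec in models.items() if output in spec["x"]]

print(downstream_of("upper_funnel"))  # -> ['capture']
```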
Up Next: AI, APIs, MTA and Testing Integration, and Benchmarking
Our roadmap is really exciting for the rest of 2024 and into 2025. We’re working on a more integrated marketing measurement approach called ITMA (integrated testing, mix, and attribution) that takes the best elements of test-and-learn processes, econometric inference, and multi-touch attribution and integrates them into a single approach.
We are spending a lot of time building data connectors to the big publishers and platforms to get data into longitudinal human records (LHRs) and econometric data frames. Traditionally, the time to get data into a model has been the limiting factor for multi-channel attribution; our goal is to get this time down from months to hours. It’s a big job, with a lot of edge cases, but expect announcements on this in Q1.
AI is a big topic in effectiveness and attribution. Today, we use generative AI mainly in the code-building and model-construction phase. We have cut the time to write a function or method by around 80% using various AI copilots. The next big step will be integrating AI as “agentic search agents” looking for optimal fits and unexpected relationships.
Finally, benchmarking is a big ask from clients. What’s the typical CPA of branded paid search in healthcare? Is a ROAS of 120 good for a direct-to-consumer electronics company? What percentage of business should marketing be driving? Today, these answers are qualitative; we’ve done a lot of projects and “know” the answers, but we don’t have a quantitative database. The key to getting to this database is metadata and taxonomy. As I mentioned above, we’ve put a huge amount of effort into parameterization, so we will be starting a benchmarking service later in 2025 leveraging all of these data, at a channel and industry level.
That’s all for now on MEP. We’d love to talk to you about your marketing measurement and effectiveness challenges. Complete the form below to schedule a meeting!
Today’s marketing leaders are looking for instant, accurate, and complete insights from their analytics stack. Unfortunately, no single tool—whether Testing, MMM, or MTA—can be that silver bullet on its own. The solution is to combine all three approaches into one unified system. We call this ITMA, or Integrated Testing-Mix-Attribution.
In ITMA, we use each inferential method for what it is good for, in a partially automated, integrated data science environment:
(T)esting is good for precisely understanding incrementality
M(M)M is good for understanding long-run, non-marketing, and inter-channel effects
MT(A) is good for fast reads with unlimited granularity
This approach provides significant benefits to the marketing leader:
Immediate Results: Because results are built at a record level for each new sale, marketing leaders can understand channel, campaign, and audience attribution in real time via business intelligence dashboards.
Consistent Answers: Because stimulus, response, control, and audience data all sit in one data lake, consistency is baked in.
Confidence Estimates: Mean estimates are always shipped with upper and lower bounds, at any percentile. There is no limit to channel granularity; more channels mean wider confidence intervals, but these re-narrow with more time or testing.
Total View of Causality: Integration of upper-funnel brand-focused marketing—and its impact on attitudes—is built in. Every channel comes with its immediate (within 90 days) and long-term impact, forming a complete picture of return.
Marketing Data Lake: ITMA is built on a Spark-based Delta Lake (e.g., Databricks) that can serve multiple use cases, including reporting, ad hoc analytics, and activation. Because all of the data are pristine, marketers can often replace multiple existing systems with one unified ledger—a marketing income statement for the CMO.
The nitty-gritty: how does it work?
Marketbridge’s ITMA is built in Databricks, hosted at the cloud provider of your choice. This is not SaaS. Rather, it is a purpose-built, evolving service infrastructure that can be insourced as required.
Components include:
Databricks tables with common taxonomy and metadata
Data connectors to publishers, platforms, and marketing technologies
Reproducible data engineering workbooks
Version control and documentation in GitHub
The R-Shiny front-end MEP, which provides reporting, scenario analysis, and optimization
The R modeling library mbmmm, which provides econometric, longitudinal, and testing inference, optimization, and taxonomy standardization
Marketing Data Lake
The ITMA rests on a marketing data lake: A complete view of marketing stimulus and response, along with associated audience and customer information. This data lake provides significant ancillary benefits beyond attribution and optimization; because it must undergo ongoing quality assurance (QA) testing and remediation, it can function as a marketing general ledger—a sorely missing component of many organizations.
Download our whitepaper, “The superpowered CDP: Building a go-to-market data lake”
For a comprehensive exploration of the technical and use case review of a marketing data lake, download our paper.
The basic table structure starts with a longitudinal human record (LHR): a record of each individual’s interactions both “in the wild” (third party) and on domain (first party). Where identity resolution is not available, a probability is attached to the record to provide a complete view of potential stimulus. The LHR is then enriched with aggregated data (for example, upper-funnel advertising, brand tracking, or economic data). When customers convert, first-party demographic data can be cross-walked, and third-party demographics can be appended via an identity resolution service of the client’s choosing (for example, Experian, Equifax, or LiveRamp).
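As a simplified illustration of how probabilistic identity can sit alongside deterministic first-party touches in an LHR-style table, consider the following sketch; the fields and probabilities are invented for illustration.

```python
import pandas as pd

# Illustrative LHR rows: deterministic first-party touches carry probability 1.0;
# unresolved third-party impressions carry a modeled match probability instead.
lhr = pd.DataFrame([
    {"person_id": "p001", "touch": "email_open",         "source": "first_party", "match_prob": 1.0},
    {"person_id": "p001", "touch": "display_impression", "source": "third_party", "match_prob": 0.35},
    {"person_id": "p002", "touch": "paid_search_click",  "source": "first_party", "match_prob": 1.0},
])

# Expected (probability-weighted) touch counts per person, rather than hard counts.
expected_touches = lhr.groupby("person_id")["match_prob"].sum()
```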
Because Databricks uses distributed storage and compute, query times are shortened from hours to seconds. When compute is not being used, clusters can be shut off, keeping costs reasonable.
Rapid Data Connectors
Because speed to insights is a primary objective of ITMA, shortening the time between marketing execution and ingestion into the data lake is critical. To accomplish this, APIs and direct linkages to data providers via cloud providers are the preferred methods of data transfer. This is most feasible for programmatic and digital marketing.
Reproducible Data Engineering
Marketing effectiveness measurement most often fails in the data transformation phase, before any analysis takes place. The “garbage in, garbage out” mantra is operative—small errors in grouping, summing, counting, and joining multiply and drive large errors downstream.
No black-box code or spreadsheet math is used to drive results. All code—whether custom for a given installation, Marketbridge libraries, or open-source libraries and packages—is available to inspect. Changes to code are preserved in perpetuity, ensuring auditability.
Download our whitepaper, “A roadmap for modern marketing analytics”
Download our whitepaper to learn more about reproducible data engineering in the context of marketing analytics.
The Marketbridge Marketing Effectiveness Platform (MEP) is a web-based decision support interface that allows marketers to understand channel-by-channel return, run hypothetical scenarios, and optimize their marketing mix for different objectives. It runs on the same open-code framework, using the same data, all the way back to the longitudinal human record.
mbmmm
mbmmm comprises a set of libraries and packages that power statistical inference, model validation, metadata, and data structures. It is flexible and extensible, with no tight couplings to limit future development.
Case Study
A health insurance carrier was juggling multiple marketing measurement and effectiveness methods, tools, and data structures. Each provided different answers—sometimes dramatically different from system to system. This resulted in low trust in analytics and slow, unconfident marketing decision-making.
Marketbridge worked with the marketing analytics team to replace a black-box MMM, a software-based MTA, and a fragmented testing approach with a single measurement and optimization process: ITMA. Over the course of nine months, technology, analytics, and change management workstreams were launched and ultimately integrated to provide marketing executives with a unified multi-channel performance system.
The core of the system was the Marketing Data Lake, built around each newly acquired customer. A complete graph of known and inferred touches prior to conversion allowed attribution, while crosswalks to first- and third-party data allowed almost unlimited audience profiling—critical in understanding how different kinds of customers made the journey from awareness to learning to shopping to submitting applications.
The data lake fed three core systems. First, an econometric model forecasting total applications and sales by day and by region was built. This model used the data lake as its main input, grouping and summing both stimulus and response to create a cross-sectional time-series data asset, updated daily. This econometric model—essentially an MMM (media mix model)—also estimated revenue, leads, and other KPIs, and included non-marketing variables like plan strength, the macroeconomy, seasonality, and pricing. Second, a testing “factory” was built and kicked off. Tests were planned in a systematic way, using a kanban board. Each test was appropriately scoped (with one learning objective); statistically powered for low-risk readouts; and scheduled and integrated with marketing execution teams.
Testing was championed at the highest level of leadership (CMO and Chief Commercial Officer) as an ongoing innovation driver; because of this, most short-run concerns about lost performance were overcome. Once tests concluded, standard readout templates allowed learning to be effectively catalogued and put into action. Finally, test results were fed back into the econometric model and the MTA as Bayesian priors.
Download our whitepaper, “Accelerate growth through test-and-learn marketing culture”
To learn more about the Marketbridge approach to test-and-learn marketing, download our whitepaper.
Finally, a multi-touch attribution (MTA) system used Markov chain modeling to estimate how each upstream touch or interaction—whether known or inferred—contributed to the ultimate outcome. Priors from the econometric model (MMM) and testing were also fed into the multi-touch model to provide better estimates for long-run and latent effects. This system powered a daily dashboard showing attribution for each channel, down to a “micro-channel” level (e.g., branded paid search, specific affiliate partners, or Meta social reels). This dashboard was used by executives to tune campaigns quickly. As priors from MMM and testing were updated, inferences were likewise updated.
The system replaced a six-figure black-box MMM solution and several complex identity graph-based attribution technologies, saving around $2.5 million per year, while adding near-real-time attribution and reducing confusion from conflicting ROAS and CPA estimates. The marketing data lake quickly drove additional use cases, including audience profiling, customer experience, and media auditing. Within one year, overall marketing-touched applications grew faster than fair share, and early indications are that reinvestment in upper-funnel brand marketing is paying off in higher yield rates in previously weak markets.
What to expect
Embedded: We embed our world-class marketing data science team inside your domain to build your system. No clean rooms, software licenses, or restrictive contracts. Because we act as our clients’ direct agents, there are no pass-through markups or side arrangements with other vendors or software providers.
Nine Months to Answers: The Marketbridge team sets up the ITMA system inside your domain in six months, and then spends three months in pilot mode, making sure everything works. Because we are consultants at heart, you get weekly updates from the start, where we work with your team to hook up data sources, instantiate tables, run models, and set up dashboards.
Don’t Lose Your Marketing Brain: Because the infrastructure we build is open source, you don’t run the risk of losing what’s been built. While the mbmmm and MEP packages are Marketbridge IP, your team can keep using and extending them whether we remain your provider or not, subject to a license requiring that they stay inside your walls. This de-risks marketing measurement, future-proofing your team against unforeseen technologies and marketing approaches.
We Stick Around to Keep the Innovation Going: Once ITMA is moved into production mode, the team shifts into “run” mode, providing weekly updates on marketing performance, making enhancements, and helping you move from good marketing performance to world-class.
New innovations are tackled using an agile approach. A backlog of tests, analytics features, and new data sources is maintained in a kanban board. We work with the client collaboratively to prioritize what to work on next. All new work is done using the same reproducible, white-box methods.
Learn more and get started
We would love to meet with you to understand the current state of marketing measurement and optimization at your company, and to plan an ITMA journey that will get you to better effectiveness in less than a year.
Complete the form below to schedule a meeting with our Chief Analytics Officer.
1 These more comprehensive econometric models are sometimes called “Commercial Mix Models” due to their larger scope. As the scope of explanatory statistical models increases, they become useful to other parts of the organization, like finance and sales.
Pricing your product or service just right can feel like solving a puzzle without the picture on the box to reference. Although it can be challenging, we find pricing is one of the biggest short-term levers to drive sales performance. In fact, McKinsey reports “pricing right is the fastest and most effective way for managers to increase profits…a price rise of 1 percent, if volumes remained stable, would generate an 8 percent increase in operating profits.”
So how do you find the sweet spot for your new or existing product or service – the price point at which you can generate revenue without scaring off potential customers? That’s where survey pricing methodologies come into play. Two of the most popular direct survey-based pricing methodologies are the Gabor-Granger Model and Van Westendorp Price Sensitivity Meter (PSM), although others exist. These two methods are most helpful when looking for simple and straightforward answers about pricing. So how do these two methods work, and when should you use each one? Let’s break it down.
Gabor-Granger: Finding the Price Ceiling
Think of the Gabor-Granger Model as a straightforward way to find the maximum price customers are willing to pay.
Here’s how the Gabor-Granger Model works:
After giving respondents the product or service description, show them a series of prices and ask how likely they are to purchase the product at each price point. If they are willing to pay that price, they are offered a higher (randomly chosen) price. If they are not willing to pay that price, they are offered a lower (randomly chosen) price. The algorithm repeats until we find the highest price each respondent is willing to pay.
Based on their answers, you can pinpoint the price point that maximizes revenue while keeping purchase likelihood acceptably high.
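A small simulation can make the mechanics clearer. The sketch below (with invented willingness-to-pay values and price points) walks each simulated respondent up or down the randomly chosen price ladder, then tallies the revenue-maximizing price across respondents.

```python
import numpy as np

rng = np.random.default_rng(0)

def gabor_granger_ladder(willingness_to_pay, price_points, rng):
    """Walk one respondent up/down randomly chosen untested prices until
    the highest acceptable price is found (illustrative simulation)."""
    untested = set(price_points)
    price = rng.choice(sorted(untested))
    highest_accepted = None
    while untested:
        untested.discard(price)
        if price <= willingness_to_pay:          # respondent would buy at this price
            highest_accepted = max(highest_accepted or 0, price)
            higher = [p for p in untested if p > price]
            if not higher:
                break
            price = rng.choice(higher)           # offer a higher, randomly chosen price
        else:                                    # respondent declines
            lower = [p for p in untested if p < price]
            if not lower:
                break
            price = rng.choice(lower)            # offer a lower, randomly chosen price
    return highest_accepted

# Simulated respondents with hypothetical willingness-to-pay values.
price_points = [10, 15, 20, 25, 30]
wtp = rng.normal(20, 5, size=500)
max_prices = [gabor_granger_ladder(w, price_points, rng) for w in wtp]

# Revenue-optimizing price: demand at each price times the price itself.
revenue = {p: p * sum(m is not None and m >= p for m in max_prices) for p in price_points}
best_price = max(revenue, key=revenue.get)
```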
One of the helpful features of the Gabor-Granger model is that it helps you measure price elasticity—essentially, how sensitive customers are to price changes. For example, if you lower the price, will you see a surge in demand? Or, if you raise the price a little, will you only lose a small percentage of buyers? This method helps you predict those scenarios with confidence.
When should you use it?
The Gabor-Granger model is great when you’re trying to:
Find revenue-optimizing price points.
Get a clear sense of what your customers are willing to pay, especially for established products.
Focus on just one product or service without considering competition.
What’s the catch?
While Gabor-Granger gives you clear pricing estimates, it has a couple of disadvantages:
Since you’re suggesting the price points, it doesn’t give you insight into what consumers naturally think is a fair price.
Because people know they’re being asked about pricing, they might understate their willingness to pay to try to get a better deal (“gaming the system”).
The model only considers your brand or product (without factoring in the competition).
Van Westendorp: Letting Consumers Set the Range
Now, let’s talk about the Van Westendorp model. Unlike Gabor-Granger, which asks respondents to react to predefined prices, Van Westendorp flips the script and lets respondents tell you what prices they think are too low, too high, and just right.
Here’s how the Van Westendorp model works:
Ask a series of questions that gauge perceptions of price. There are typically four questions, set in the context of “At what price would you consider the product to be…”:
“Priced so low that you would feel the quality couldn’t be very good?” – to determine the “too cheap” price.
“A bargain—a great buy for the money?” – to determine the “cheap” price.
“Starting to get expensive, so that it is not out of the question, but you would have to give some thought to buying it?” – to determine the “expensive” price.
“So expensive that you would not consider buying it?” – to determine the “too expensive” price.
From there, you can build a price sensitivity meter that shows the range of acceptable prices from “too cheap” to “too expensive.”
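One common way to turn the four answers into the classic crossing points is sketched below with synthetic responses; note that practitioners differ slightly on which curves define each point, so treat this as one illustrative formulation rather than the canonical one.

```python
import numpy as np
import pandas as pd

# Hypothetical survey responses (prices in dollars) to the four PSM questions.
rng = np.random.default_rng(1)
n = 400
responses = pd.DataFrame({
    "too_cheap":     rng.normal(12, 3, n),
    "bargain":       rng.normal(18, 3, n),
    "expensive":     rng.normal(26, 4, n),
    "too_expensive": rng.normal(34, 5, n),
})

grid = np.linspace(5, 50, 451)

# Cumulative curves: the "cheap" curves descend with price, the "expensive" curves ascend.
too_cheap = [(responses["too_cheap"] >= p).mean() for p in grid]
bargain = [(responses["bargain"] >= p).mean() for p in grid]
expensive = [(responses["expensive"] <= p).mean() for p in grid]
too_expensive = [(responses["too_expensive"] <= p).mean() for p in grid]

def crossing(curve_a, curve_b):
    """Grid price where two cumulative curves intersect (nearest grid point)."""
    return grid[np.argmin(np.abs(np.array(curve_a) - np.array(curve_b)))]

optimal_price_point = crossing(too_cheap, too_expensive)     # OPP
indifference_price = crossing(bargain, expensive)            # IPP
acceptable_range = (crossing(too_cheap, expensive),           # point of marginal cheapness
                    crossing(bargain, too_expensive))         # point of marginal expensiveness
```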
When should you use it?
The Van Westendorp method is ideal for:
Getting exploratory results on an acceptable range of prices.
Products or services that don’t fit neatly into an existing category.
Gaining a deeper understanding of your target demographic and what attitudes they have about price points (and where those attitudes may impact your strategy).
The biggest advantage of Van Westendorp is that you’re getting consumer-driven pricing insights. You’re not dictating the prices (like Gabor-Granger) — consumers are, which can help you understand not only what they might pay but what they want to pay. It’s especially useful when you’re not confident in how the market will react to a specific price point.
Any drawbacks?
While Van Westendorp gives you great insights into price perception, it has a few limitations:
It doesn’t offer a clear picture of potential revenue. Knowing that consumers think a price is fair is helpful, but it doesn’t tell you how much money you’re likely to make at that price.
Since the pricing preferences are broad, the data can be tricky to interpret if you need concrete numbers for financial forecasting.
So, Which Method Is Best for You?
Both models have their strengths, and in some instances, they can even complement each other.
Use the Gabor-Granger model when you’re trying to optimize revenue and you need a clear, calculated price point. This method helps when you’re dealing with products or services that already have a place in the market.
Use Van Westendorp when you’re not sure what price the market will accept. It’s a good method for new products or niche items that don’t easily fit into existing categories.
In some cases, we recommend clients use Van Westendorp to gather a baseline of acceptable price ranges and then fine-tune those price points with the Gabor-Granger model to optimize revenue.
Final Thoughts on Survey Pricing
Pricing can make or break your business, but with the right tools, you can turn guesswork into strategy. The Gabor-Granger and Van Westendorp models give you different but equally valuable insights into how much your customers are willing to pay. Whether you’re launching something new or refining an existing product, these survey pricing methodologies can help you strike the perfect balance between affordability and profitability.
Marketing Mix Modeling (MMM) is a popular measurement technique to understand how different marketing channels and campaigns—as well as non-marketing factors like pricing, product assortment, distribution channel mix, competitive actions, and the macroeconomic environment—affect business outcomes. While there are many technical resources available online describing the statistical models used in MMMs and the pros and cons of each, a straightforward linear Marketing Mix Modeling example—focusing on the data required and the visual and data outputs produced—is lacking. In this article, we will go through a complete Marketing Mix Modeling example from start to finish.
Use Cases for Marketing Mix Modeling
Marketing mix modeling (MMM) is a specific type of econometric modeling. Econometric modeling is the analysis of data over time and across categories to understand causality and forecast future results. MMMs, at their simplest, explain how marketing activity drives sales. They have many use cases, including estimating the long-run impact of advertising, optimizing marketing spend between channels, understanding audience responsiveness, evaluating campaign effectiveness, forecasting sales, conducting scenario analysis, and measuring overall performance—usually reported as return on advertising spend (ROAS).
Marketing mix modeling is used across many industries. The most prevalent marketing mix modeling example is in the consumer packaged goods (CPG) industry. This industry sells mainly through retail distribution, making end customer-level data hard to come by—either for measurement or activation. This means that most marketing is “upper funnel”—video, print, or interactive without a specific call to action. This kind of marketing is ideal for modeling with MMM, as direct attribution is usually impossible.
Soup to Nuts Marketing Mix Modeling Example
Sourcing Data
Marketing mix modeling data can be divided into three basic categories. The vast majority of data are “x” or independent variables. These variables are hypothesized to drive dependent variables. Independent variables can be further sub-divided into control variables and stimulus variables.
Control variables cannot be influenced by the advertiser, but they still have the potential to explain outcomes. For example, the 10-year Treasury rate is commonly used as a proxy for credit availability, which can impact consumer demand for more non-essential items. When the rate goes down, credit tends to be cheaper and looser, causing consumers to open their wallets. Conversely, the S&P 500 index is commonly used as a proxy for how wealthy consumers feel; if they have more money in their 401(k)s—even if it will not be available to them for decades—they tend to open their wallets.
Stimulus variables are at least partially controllable by the advertiser. Paid media—think television, digital display and social, paid search, and affiliate marketing—is completely under the control of marketing decision-makers. Earned media is partially controlled; it takes time for PR and influencer marketing efforts to drive impressions, but companies can still make decisions to increase or decrease focus. Price is also partially controllable; for companies that sell through third-party distribution channels, setting price is more suggestive and takes longer to take hold, but it is still a lever. Likewise, overall distribution channel mix is a longer-term decision, but it still has important impacts on marketing performance.
Response variables represent behavior that is affected by marketing. The most common response variable is sales, which can be measured both in dollars and units; either can be used in modeling. More advanced metrics like customer lifetime value (CLV) can also be used in lieu of gross sales.
Figure 1: Building the panel using both record-level and aggregated data.
Intermediate response variables that point to constructs like brand equity can also be collected. Both survey-based metrics like brand awareness, comprehension, affinity, or net promoter score and behavioral data like share-of-search, Google trends, and pre-paywall pageviews can be used as intermediate proxies.
Cross-sectional (sometimes called panel) variables organize independent and dependent variables. Cross-sections can include two types of components: a time component (week or day) and optional category components (geographies, audiences, cohorts, etc.). For the econometric time series, each model needs between one and three years’ worth of data; a longer history increases robustness by uncovering seasonal trends and the impact of outside factors. The goal is to ensure the data spans a period with consistent marketing affecting consumer purchase decisions. Category components can include geographic areas (e.g., county or DMA), audiences, cohorts (e.g., customer birth dates), or any other relevant grouping criteria. While not necessary, including category components increases the degrees of freedom available and thus the precision and granularity of our estimates.
Once identified, all data sources are merged into a clean “panel” format. Each data set is owned by a part of the organization (Finance, Sales, Consumer Insights, Digital, IT, Marketing), by a vendor that supports a business function (CRM, media activation), or comes from open sources (credit rates, the Consumer Price Index). Communication and alignment between these distinct groups are necessary to set data requirements and ensure consistency. This process—sometimes called extraction, transformation, and loading (ETL)—is typically the most time-consuming and error-prone step in the process.
Typically, this “data munging” process is first done in batch format. Input files are arranged in a staging area—sometimes something as simple as a hard disk, but increasingly a cloud-based storage and compute environment like Databricks. It is best practice to write the data transformation steps in a text-based language like SQL or Python, and to store the steps in a version control system like GitHub. This ETL process can then evolve over time as files change, more APIs are used in place of flat files, and additional data sources are added.
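A minimal example of such a text-based, version-controllable transformation step might look like the following; the file paths, column names, and weekly-by-DMA grain are assumptions for illustration.

```python
import pandas as pd

# Minimal sketch of one versioned transformation step: raw, campaign-level spend
# and daily sales files are rolled up to the week x DMA grain of the modeling panel.
spend = pd.read_csv("raw/paid_media_spend.csv", parse_dates=["date"])
sales = pd.read_csv("raw/daily_sales.csv", parse_dates=["date"])

weekly_spend = (
    spend
    .assign(week=spend["date"].dt.to_period("W").dt.start_time)
    .groupby(["week", "dma", "channel"], as_index=False)["spend"].sum()
    .pivot(index=["week", "dma"], columns="channel", values="spend")
    .fillna(0.0)
)

weekly_sales = (
    sales
    .assign(week=sales["date"].dt.to_period("W").dt.start_time)
    .groupby(["week", "dma"], as_index=False)["revenue"].sum()
    .set_index(["week", "dma"])
)

panel = weekly_spend.join(weekly_sales, how="left").reset_index()
panel.to_parquet("staging/mmm_panel.parquet", index=False)
```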
Exploratory Data Analysis and Quality Assurance
One of the most common mistakes beginner MMM modelers make is to jump immediately to regression modeling. However, most “bad” models turn out that way not because of the inferential modeling, but because of quality issues with the underlying data or an incomplete understanding of what the data represents.
Exploratory data analysis (EDA) is also too often a throwaway step that is rushed through to get to the “real” work. In fact, EDA should be approached with a similar level of rigor to regression modeling. To achieve this, the analysts involved need a clear plan of attack, best documented as a series of archetype tables with expected “golden number” results.
The goal of any data validation is to make sure the “golden number” tracks through the data transformation step. The challenge is that executive leadership views the data at a total budget or revenue level, while the raw data used to create the panel is at a more granular level (individual sales or marketing campaign activations). Inconsistencies in how the data is labeled, and any differences in timescale (when a marketing campaign is in flight versus when it is paid for), will affect the executive view. The goal of a proper EDA is to make sure that the golden numbers match between the analytics data set and the financial “common knowledge” that executives will use to judge whether the model is to be trusted.
For example, say one desired output table was total end customer revenue by geography by month for the year 2024. The golden number would be sourced from an agreed-upon executive dashboard or report. Using the econometric time series data set, the analyst would then group by the required dimensions and sum (in this case, something like summing revenue by county by month where the year equals 2024).
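In code, that golden-number check could be as simple as the sketch below, assuming a panel with week, county, and revenue columns and a hypothetical executive figure to reconcile against.

```python
import pandas as pd

# Recreate the executive "golden number": 2024 revenue by county and month,
# summed from the granular panel (path and column names are illustrative).
panel = pd.read_parquet("staging/econometric_panel.parquet")

check = (
    panel.loc[lambda d: d["week"].dt.year == 2024]
    .assign(month=lambda d: d["week"].dt.to_period("M"))
    .groupby(["county", "month"], as_index=False)["revenue"].sum()
)

# Compare against the agreed-upon executive report before any modeling begins.
golden_total_2024 = 182_500_000  # hypothetical figure from the executive dashboard
assert abs(check["revenue"].sum() - golden_total_2024) / golden_total_2024 < 0.005
```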
Beyond validation, exploratory data analysis can also be helpful when forming hypotheses. Data visualizations like time series plots, bar charts showing counts or sums of independent and dependent variables by year, month or quarter, and scatterplots showing relationships between two variables are some of the most common visualizations used. Having a library of common visualizations in a single workbook (for example, in an Rmarkdown or Jupyter notebook file) is a best practice to rapidly create visualizations from a time series data set. Beyond validation and hypothesis generation, initial learnings from the EDA often help lend credibility to MMM results later during delivery.
Modeling
Before starting the modeling process, it is important to select an appropriate model type, which depends on the nature of the underlying data and the sophistication of the analysis. In determining model type, there are three primary considerations:
Response shape construction
Multilevel modeling
Frequentist vs. Bayesian
Multilevel models can include any of the response shapes discussed and any model can be estimated as either a Frequentist or Bayesian model.
Response Shapes
Response shapes can be as simple or as complex as needed for the task. As complexity increases, we trade off ease of training and interpretability for accuracy and flexibility. From simplest to most complex, the three common response shapes are linear, non-linear, and splines.
Figure 2: Diminishing returns curves in action.
Linear regression models are linear with respect to the regression parameters (i.e., the betas). This means we can still apply any transformation that does not require additional parameters beyond the classic beta parameters and thereby account for concepts such as diminishing returns. The most common are log, square root, and inverse transformations. While useful, these transformations often produce nonsensical results at very low or very high spend due to their lack of flexibility. For example, log transforming both the stimulus and response variables implies constant elasticity—meaning that at any point on the curve (i.e., any amount of marketing spend), the percent change in the response variable divided by the percent change in spend will be the same. In other words, increasing the spend on a marketing channel from $1 to $1.01 results in the same percent increase in sales as going from $1M to $1.01M, which clearly does not represent reality. Nonetheless, linear models are straightforward and easy to interpret, making them suitable for scenarios where we do not expect a complex relationship between stimulus and response variables. They can also be helpful as “back of the envelope” starter models to understand whether a given set of independent variables impacts an outcome.
Non-linear models extend linear models by allowing parameters that are not linear with respect to the response. This opens up flexible functional forms that can estimate more realistic diminishing return curves. More advanced approaches of this kind are used to model the “S”-curves typically seen with upper-funnel advertising. An S-curve shape acknowledges that there is a minimum level of spend below which marketing is ineffective, then the curve rapidly steepens, and then it eventually plateaus. While clearly valuable, non-linear models are harder to estimate and thus require more data and training time. This estimation difficulty also typically results in larger parameter confidence intervals.
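The difference between these first two response shapes is easy to see in a quick sketch: a log transform produces steadily diminishing returns, while a Hill-type function produces the S-curve described above. The parameter values here are illustrative, not fitted estimates.

```python
import numpy as np
import matplotlib.pyplot as plt

spend = np.linspace(0, 200_000, 400)

# Diminishing returns via a log transform (linear in the beta parameter).
log_response = 50_000 * np.log1p(spend / 20_000)

# S-curve via a Hill function (non-linear parameters: half-saturation and slope).
half_sat, slope, ceiling = 80_000, 2.5, 150_000
hill_response = ceiling * spend**slope / (half_sat**slope + spend**slope)

plt.plot(spend, log_response, label="log (diminishing returns)")
plt.plot(spend, hill_response, label="Hill (S-curve)")
plt.xlabel("channel spend")
plt.ylabel("incremental sales")
plt.legend()
plt.show()
```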
Generalized additive models further extend modeling capabilities with splines. Splines offer even more flexibility by allowing different polynomial functions (or any other basis function) to be fitted to different segments of the data, ensuring that the model can adapt to varying rates of change in the response variable across different ranges of marketing spend. With this construction, they can theoretically model any smooth response shape including both S-curves and diminishing returns. However, as always there are downsides; without taking care in construction (e.g. applying sufficient regularization) splines often result in nonsensical response shapes (e.g. a response shape that looks like a sine wave) and their nonparametric nature reduces interpretability.
Multilevel Modeling
Mixed models, multilevel models, and panel models all make use of cross-sectional variables, and the terms are used interchangeably depending on the domain. Here, we will use the term multilevel models. The cross-sections are often geographies, which we use below as a concrete marketing mix modeling example; however, the same statements can be made about any level (e.g., audience or cohort).
Geographical cross-sections provide two dimensions (time and geo), which means that for every point in time we have many samples (one for each geo), and for every geo we have many samples (one for each point in time).
This substantially increases the number of samples in our dataset, increasing the precision of our estimates. It also opens the door to new modeling techniques, the most common of which are:
Geo-level response curves: geo-level variation is leveraged to estimate individual response curves by geography, either through no pooling (response curves for a geo are estimated using only data from that geo) or partial pooling (response curves for a geo are estimated by essentially taking a weighted average of the no-pooling estimate and the average estimate across all geos); a minimal partial-pooling sketch follows after this list
Controlling for unobserved time varying effects: dummies for each point in time can be added to the model as there are multiple samples per point in time (often called time fixed effects)
As always, there is no free lunch:
The size of the data inherently makes constructing the dataset more difficult and increases training time substantially
Data is not always available at the geographic level for all channels, requiring assumptions around how that spend should be distributed in creation of the panel
Geographic labels do not always align between different marketing platforms, making joining arduous and error prone
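Here is the minimal partial-pooling sketch referenced above, using a random intercept and slope by geo in statsmodels; the synthetic panel and variable names are stand-ins for a real geo-by-week data set.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative week x geo panel: each geo has its own true TV response
# (in practice this comes from the panel built earlier, with transformed spend).
rng = np.random.default_rng(11)
geos, weeks = 20, 104
frames = []
for g in range(geos):
    tv = rng.uniform(0, 100, weeks)
    beta_g = 2.0 + rng.normal(0, 0.5)          # geo-specific response
    sales = 500 + beta_g * tv + rng.normal(0, 40, weeks)
    frames.append(pd.DataFrame({"geo": f"geo_{g}", "tv_spend": tv, "sales": sales}))
panel = pd.concat(frames, ignore_index=True)

# Partial pooling: each geo's TV slope is shrunk toward the population estimate
# (random intercept and random slope by geo).
fit = smf.mixedlm("sales ~ tv_spend", panel, groups=panel["geo"],
                  re_formula="~tv_spend").fit()
print(fit.summary())
```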
Frequentist vs. Bayesian
Frequentist regression (i.e., the statistical paradigm most taught and used historically) assumes nothing is known of the relationships of interest prior to collecting and analyzing data. Frequentist regression is solely driven by the data collected.
Bayesian regression, on the other hand, combines prior beliefs (i.e., the prior) with the data (i.e., the likelihood) to compute estimates (i.e., posterior estimates). Priors can conceptually be divided into three types:
Utility priors: Priors that regularize, helping with multicollinearity and estimation of complex non-linear response curves. As the complexity or granularity of the model increases, utility priors become more necessary.
Previous knowledge priors: Priors resulting from domain knowledge, benchmarks, or previous modeling exercises
Experimentation priors: Priors resulting from experimental results, most commonly geo-tests. These are particularly useful for channels with questions around causality (e.g., branded paid search).
Historically, frequentist regression became standard primarily for computational reasons. Parameter estimates and p-values can theoretically be computed by hand, or very efficiently with a computer, whereas Bayesian estimates are usually impossible to compute by hand and time-intensive on a computer. Increasing computational power has closed the gap; however, Bayesian regression models still take much longer to train and iterate.
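As a simple illustration of a “previous knowledge” or experimentation prior, the sketch below centers a channel coefficient on a hypothetical geo-test result in PyMC; the data and prior values are invented.

```python
import numpy as np
import pymc as pm

# Hypothetical weekly panel: paid search spend and sales (illustrative data).
rng = np.random.default_rng(42)
spend = rng.uniform(10_000, 50_000, size=104)
sales = 5_000 + 0.8 * spend + rng.normal(0, 4_000, size=104)

with pm.Model() as mmm:
    # "Previous knowledge" prior: a geo test suggested roughly 0.7 incremental
    # sales per dollar, with meaningful uncertainty (assumed values).
    beta = pm.Normal("beta_search", mu=0.7, sigma=0.2)
    alpha = pm.Normal("alpha", mu=0.0, sigma=10_000)
    sigma = pm.HalfNormal("sigma", sigma=5_000)
    pm.Normal("obs", mu=alpha + beta * spend, sigma=sigma, observed=sales)
    trace = pm.sample(1_000, tune=1_000, chains=2, random_seed=42)
```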
Variable Selection
Almost any combination of independent variables could be added to a model, but getting to the right structure is an iterative process that demands careful consideration. The wrong combination of variables, omitted variables, or an imbalance in the number of variables can lead to problematic model structures. Including too many variables can make the model overly complex and difficult to interpret, while too few variables might result in an oversimplified model that lacks actionability.
Feature selection requires a mix of necessary business variables and statistical methods for assessing variable importance. Any factor that has an indirect impact on sales could be of value from the business perspective. Examples might include the effect of a new campaign for a specific product or measuring the impact of a change in sales channel.
How potential customers interact with marketing stimuli before they make a purchase decision should also be considered. Customers engage with several types of marketing stimuli while in the marketing funnel. The marketing funnel consists of four stages (awareness, interest, engagement, and conversion). For example, TV would be considered a channel driving awareness while paid search would be a conversion channel. How variables interact must be considered before a correlation matrix is used for potential variable selection. A correlation matrix is typically the first step in identifying candidate significant variables. This matrix displays the correlations between all pairs of variables and can be enhanced with color shading to indicate high positive or high negative correlations, making it easier to spot potentially powerful independent variables. High correlations between variables suggest multicollinearity, a situation where two or more variables are highly correlated and provide redundant information. Multicollinearity can inflate the variance of coefficient estimates and make the model unstable. Therefore, identifying and addressing multicollinearity is crucial in the early stages of model building.
Variable reduction is often necessary, especially when using non-Bayesian modeling approaches with many different stimulus variables that are co-linear. Co-linearity occurs when changes in one advertising lever are accompanied by changes in others. This can lead to counterintuitive results, such as variables that should have positive coefficients turning negative. To mitigate this issue, analysts employ techniques like correlation analysis, variance inflation factors (VIF), and principal component analysis (PCA) to reduce the number of variables while retaining the essential information.
Stepwise regression is a systematic technique for variable selection that can help in building a parsimonious model. This method involves adding and removing variables iteratively based on their statistical significance in explaining the response variable. Forward selection starts with no variables in the model, adding one variable at a time, while backward elimination starts with all candidate variables and removes the least significant ones step by step. Stepwise regression balances between these two approaches, adding and removing variables as needed to optimize the model’s performance.
Regularization techniques like Lasso (Least Absolute Shrinkage and Selection Operator) and Ridge Regression are essential for reducing overfitting. Overfitting occurs when a model is too complex and captures the noise in the data rather than the underlying relationship. Lasso adds a penalty equal to the absolute value of the magnitude of coefficients, effectively reducing some coefficients to zero and thus performing variable selection. Ridge Regression adds a penalty equal to the square of the magnitude of coefficients, shrinking all coefficients towards zero but never setting them exactly to zero. Elastic Net combines both Lasso and Ridge penalties, providing a balance that can be particularly useful in situations with highly correlated predictors. Each of these methodologies helps refine the model structure, ensuring it is both robust and interpretable. By iterating through these steps, analysts can develop a model that accurately captures the relationships between marketing activities and business outcomes, providing actionable insights for optimizing marketing strategies.
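A compact sketch of these regularization approaches, using scikit-learn on synthetic data, is shown below; in practice the inputs would be the transformed stimulus and control variables from the panel.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV, ElasticNetCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: matrix of (transformed) stimulus and control variables; y: sales.
# Illustrative random data stands in for the real panel.
rng = np.random.default_rng(7)
X = rng.normal(size=(156, 12))          # three years of weekly data, 12 candidate variables
y = X[:, 0] * 3 + X[:, 1] * 1.5 + rng.normal(scale=2, size=156)

lasso = make_pipeline(StandardScaler(), LassoCV(cv=5)).fit(X, y)
ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 25))).fit(X, y)
enet = make_pipeline(StandardScaler(), ElasticNetCV(cv=5, l1_ratio=[0.2, 0.5, 0.8])).fit(X, y)

# Lasso drives some coefficients exactly to zero, performing variable selection.
print(lasso.named_steps["lassocv"].coef_)
```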
Evaluating the Model
Evaluating a Marketing Mix Model’s (MMM) performance involves two main components: model fit and prediction. Four key traits should always be considered:
Accurately represents reality
Predicts accurately for out-of-sample periods
Measures the relationships between marketing variables and external factors
Provides meaningful decision-making insights
These traits ensure that the model is both statistically sound and practically useful.
The first validation step for any model is to review any diagnostic checks such as residual analysis. Residual analysis involves examining the residuals (the differences between observed and predicted values) to check for homoscedasticity (constant variance), autocorrelation (residuals are not correlated over time), and normality (residuals follow a normal distribution). Evaluating residuals over time helps identify unobserved effects which may significantly bias results. These checks help ensure that the model assumptions hold true and that the model provides a reliable representation of reality.
To validate whether a model has accurately inferred an underlying pattern from actual data (model fit), we can use the R-squared metric. The R-squared value measures the proportion of variance in the dependent variable predictable from the independent variables. R-squared can be misleading in complex models, so it makes more sense to use adjusted R-squared, which accounts for the number of predictors in the model.
When trying to compare different models, one can use the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC) with a lower value indicating a better model. These criteria penalize models with more parameters, thus discouraging overfitting.
For example, a model that produces better fit metrics such as R-squared and BIC but fails to meet model assumptions is worse for inference than one with a lesser fit that meets model assumptions.
The gold standard for measuring a model’s ability to accurately predict unseen data is the out-of-sample mean absolute percentage error (MAPE). This metric assesses the prediction accuracy by comparing the predicted values to the actual values in out-of-sample data. Out-of-sample testing involves splitting the data into a training set and a hold-out set. The model is trained on the training set and then tested on the hold-out set to evaluate its generalizability. Cross-validation techniques, such as k-fold cross-validation, extend out-of-sample testing. In k-fold cross-validation, the data is divided into k subsets, and the model is trained on k-1 subsets while the remaining subset is used for testing. This process is repeated k times, with each subset used exactly once for testing. With time series data, these folds are often split such that training folds precede testing folds temporally. These methods help ensure that the model is not overfitted to the training data and can generalize well to new data. Care still needs to be taken in optimizing purely against out-of-sample performance. A model that predicts well is not useful for inference if it does not properly uncover the relationships between stimulus and response.
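The sketch below shows one way to compute out-of-sample MAPE with time-ordered folds using scikit-learn; the data are synthetic and a plain linear regression stands in for the actual MMM.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import TimeSeriesSplit

# X, y: weekly panel features and sales, ordered by time (illustrative data).
rng = np.random.default_rng(3)
X = rng.uniform(size=(156, 5))
y = 10 + X @ np.array([3.0, 1.5, 0.0, 2.0, 0.5]) + rng.normal(scale=0.5, size=156)

# Time-ordered folds: training folds always precede the test fold.
mapes = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    mapes.append(mean_absolute_percentage_error(y[test_idx], preds))

print(f"mean out-of-sample MAPE: {np.mean(mapes):.2%}")
```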
Perhaps the most important validation is common sense, known as “face validity.” This involves ensuring that the model makes sense from a business perspective and aligns with known market behaviors and historical insights. For example, if the model suggests that a particular marketing channel has a huge impact on sales, but this contradicts known business practices or historical performance, then the model may need to be re-evaluated. Business validation involves discussing the model results with stakeholders who have domain expertise to confirm that the results are reasonable and actionable. This step is crucial because a model that is statistically sound but lacks practical relevance is of little use. Face validity checks ensure that the model’s insights are grounded in reality and can be used to inform strategic decision-making.
Presenting and Using the Results
Once a model has been built and validated, the last step is using it to inform, predict, and make decisions. This involves communicating the insights effectively, integrating the model into business intelligence tools, and potentially leveraging the model for optimization.
PowerPoint
Even though data scientists hate to admit it, PowerPoint is still the lingua franca of business. Almost every MMM project ends with at least one PowerPoint. Ideally, two decks will be created: a “walking” deck meant to be read, and a “presentation” deck meant to be discussed. Too often, this isn’t the case, and the dense walking deck is read out to executives; this is probably the number one pitfall when communicating MMM results.
Either deck will still have the same basic sections:
Background and Objectives: Data scientists and statisticians often overlook explaining what the goal of the entire MMM process is, and generally how econometric modeling works. While this section can be skipped over in later “runs” of the model, it is important to set the stage for executives, outlining how regression works, what it is good for (understanding holistic relationships between marketing channels and long-run strategy) and not good for (immediate decision-making and quick campaign reads), and how it will be used as it evolves.
Model Overview: This section explains the model type, the variables included, and the rationale behind their selection. While these slides can be very technical, it is typically best to move most of the technical background to an appendix that can be referenced if needed, and to instead focus on the 30,000-foot view in the main executive summary. Structural equation-type diagrams can be used to illustrate the model structure at a high level and the relationships between variables.
Data Insights: Exploratory data analysis, while not the main topic of an MMM, is typically used to validate and tie to “golden numbers,” ensuring executive buy-in that the data used to build the model itself are correct. In addition to validating golden numbers, interesting trends and insights that fall outside of the scope of the model itself can be explored in this section.
Model Outputs: This section is the “meat of the sandwich,” in which model outputs are concisely communicated. This section should communicate total marketing contribution; overall ROAS (return on advertising spend) and cost-per-acquisition (CPA); channel-by-channel contributions, again outputting both ROAS and CPA; marginal CPA and ROAS; channel-specific adstocks (how long a channel’s influence is felt in-market); and response curves by channel.
Predictions and Scenarios: This section helps stakeholders understand the “so what” of the analysis—the normative “what we should do.” Typically, forecasts and scenario analyses based on the model are created. There is an effectively infinite number of possible scenarios, so choosing which to highlight requires coordination between the team doing the work and the marketers making the big future decisions. Regardless of the specific scenarios picked, the presentation should highlight how different levels of marketing spend or other variables impact outcomes.
Business Intelligence (BI) Tools
Integrating MMM results into business intelligence (BI) tools allows for continuous monitoring and analysis. BI tools such as Tableau, Power BI, or QlikView can be used to create interactive dashboards that update in real time as new data becomes available. This integration is definitely a later step for most companies, as it requires data engineering and technical steps beyond just displaying outputs in PowerPoint.
To make marketing mix models play nicely with BI tools, there need to be consistent output data structures. Typically, this means an output data structure that includes standard time and cross-sectional dimensions—the same as those used in the time series panel—along with key “facts”: contributions, cost-pers, and ROAS.
Decision Support
Because of their underlying architecture, MMMs are natural candidates for use in decision support and optimization. Marketing mix models include non-linear, diminishing returns curves by their nature; optimization is a matter of finding the ideal mix of curves that maximizes a certain objective function—for example, maximizing revenue—subject to a set of constraints.
Figure 3: Optimization is about re-mixing marketing channels to achieve the best efficiency at a given level of spend
The simplest way to support this optimization exercise is to use a built-in linear programming algorithm such as Microsoft Excel’s Solver. In this approach, each curve can be extracted as an equation, and then aggregate outputs can be assigned to certain cells. One cell can be made the objective, and others can be assigned as constraints—for example, TV spend must be less than 30% of mix, the total investment must be less than $100M, and so forth. More advanced approaches can use machine learning algorithms in R and Python to optimize mixes.
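Outside of Excel, the same constrained exercise can be expressed in a few lines of Python with scipy; the response-curve parameters below are assumptions, and the constraints mirror the examples in the text (TV at no more than 30% of mix, total investment capped at $100M).

```python
import numpy as np
from scipy.optimize import minimize

channels = ["tv", "search", "social", "display"]

# Illustrative diminishing-returns curves extracted from a fitted model
# (saturation and scale parameters are assumed, not real estimates).
scale = np.array([40.0, 25.0, 18.0, 10.0])   # $M of sales at saturation
sat = np.array([30.0, 15.0, 12.0, 8.0])      # $M of spend at half-saturation

def revenue(spend):
    return float(np.sum(scale * spend / (sat + spend)))

budget = 100.0  # $M total investment cap
constraints = [
    {"type": "ineq", "fun": lambda x: budget - x.sum()},        # total <= $100M
    {"type": "ineq", "fun": lambda x: 0.30 * x.sum() - x[0]},   # TV <= 30% of mix
]

result = minimize(
    lambda x: -revenue(x),
    x0=np.full(len(channels), budget / len(channels)),
    bounds=[(0.0, budget)] * len(channels),
    constraints=constraints,
    method="SLSQP",
)
optimal_mix = dict(zip(channels, result.x.round(1)))
```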
Conclusion: Marketing Mix Modeling Example
Marketing mix modeling is sometimes seen as an intimidating and mythical black box by marketers and non-technical business professionals, which can lead to low trust in results. This need not be the case. Even non-technical managers should understand the steps taken to build and output an MMM. By presenting a clear Marketing Mix Modeling example, I illustrated how various marketing channels and external factors intertwine to affect business outcomes. This marketing mix modeling example serves as a valuable reference for understanding how data-driven decisions can enhance marketing effectiveness and drive success.
Multi-Touch Attribution, or MTA, seeks to better understand what truly drove or “caused” an action to happen. How does multi-touch attribution work? Usually, these actions are online and commercial in nature. The most typical use case is understanding what drove an order or a transaction. These “part-worths” can then be summed up across many orders or transactions to form a complete picture of how different marketing channels contributed to the sales accrued in a given period.
For example, say a user purchased a subscription to a monthly ride-sharing service. It is easy to know that the user entered from a paid search ad; each ad has a unique referring URL that can be tracked. However, giving all credit to the paid search ad is most likely unfair. The user might have seen a social media campaign days before, or an ad during a baseball game weeks before. Those touches should also receive credit. The difference between a last-touch and a multi-touch approach is usually substantial. Lower-funnel channels usually look less attractive in a multi-touch approach (see Figure 1 below).
Figure 1: A last-touch versus a multi-touch approach can yield very different results. The “MTA effect” shows less attractive results for paid search and affiliate, and better results for DM, online video, and social in this example.
To accurately apportion credit to all contributing channels, it is necessary to estimate the effect that each touch had on the customer who ultimately ordered. This requires, at the most basic level, knowing that a previous interaction occurred. This is the number one problem with multi-touch attribution: gathering the “touches”.
Historical Context of Tracking
In the first decade of the 2000s, tracking users across digital properties was easier. Customers’ browsers held records of where they had been and what they had done. These cookies functioned as digital breadcrumbs, alerting an advertiser that a consumer had seen an ad weeks before. It was possible for advertisers to piece together customer journeys deterministically—that is, knowing concretely when and what someone saw before they made an order. With this near-census data, early MTAs could estimate each touch’s effect simply—typically applying even credit to each touch, with linear decays of impact over time—or using more complex methods.
Privacy Measures Impacting MTA
However, over the past several years, it has become more difficult to build truly deterministic MTAs. Privacy measures—whether cookie deprecation, walling off app data interchanges, or obfuscation of PII in third-party data sources—have made it harder and harder to tie a pageview or click to an individual customer with any degree of confidence. The tradeoff between privacy and personalization will likely be a forever war (see Figure 2)—suggesting an owned, decoupled approach to attribution that doesn’t rely on fads or black boxes.
Figure 2: The push and pull between privacy and personalization will likely be a forever war.
First-Party Data: The Advertiser’s Advantage
Some data sources are still totally knowable. Anything directly controlled by the advertiser—sometimes called first-party data—can be tracked in principle. Examples of these first-party interactions include email, direct mail, landing pages (or any web pages on the company website), owned or cooperative advertising networks, and call centers—whether customer support or acquisition-focused. In these cases, previous touches and interactions with an individual can be reconstructed as an identity graph, tied together with some unique ID.
Dealing with Unknown Identities
In most cases, however, interactions with customers are “underwater”—unknown to everyone but the individual and the publisher they interacted with. In these cases, all hope is not lost. There are two basic approaches to resolving unknown identities.
Identity Resolution Technology
The first is using an identity resolution technology provider. These data brokers provide crosswalks between different walled gardens, either by inking proprietary sharing agreements or by using algorithms that match individuals on various pieces of their identity to build “fingerprints.” The fidelity of matching can vary from very low to approaching 50%—which is quite good. The trick is, most of these providers do not want to give up their raw data. Instead, they prefer to keep identities in their domain, for obvious reasons. In fact, most identity resolution providers’ first business is activation, not measurement. They provide value by essentially extending retargeting—the continued advertisement to one user over time—to broader parts of the digital domain.
Even so, it’s worth following up with a few vendors to see what they are willing to provide. In some cases, APIs may be provided that return a proprietary ID (or not) to a given third-party interaction. At scale—over millions of interactions—this approach can help build a deterministic identity graph.
Probabilistic Assignment of Interactions
Another approach is probabilistic assignment of interactions. This approach—while less precise than deterministic assignment of interactions—is also not vulnerable to future changes to platform privacy policies. In this framework, each identity (user) is assigned a probability of having seen an advertisement based on time and location.
For example, say we know that in the Memphis DMA on May 1st, there were 1,045 paid social ads served (these types of data are usually available from platforms, or via the digital agency executing the marketing.) The only other pieces of data we would need would be the targeting criteria for the campaign (say, 18-34 year olds), and the number of those people in the DMA. From the American Community Survey (ACS), we know that the 2022 1-year estimate for 18-34-year-olds is 367,709 in the Memphis MSA (Metropolitan Statistical Area)—not an exact match for the DMA, but close enough. In that case, we can say that on May 1st, an 18-34-year-old individual in that area would have a 1,045 / 367,709 (0.28%) chance of having seen that ad.
This approach can work for virtually any media channel, from targeted to broad reach. It is obviously less accurate and contains significant potential error, but in aggregate, it is a powerful tool for creating all-encompassing media attribution.
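A minimal sketch of that calculation, using the Memphis figures above:

```python
# Probabilistic exposure: the chance that a person in the targeted audience
# saw an ad, given aggregate delivery data for a geography and day.
def exposure_probability(impressions_served: int, targeted_population: int) -> float:
    """Per-person probability of exposure for a geo/day/campaign."""
    if targeted_population <= 0:
        return 0.0
    # Cap at 1.0 in case impressions exceed the audience size (frequency > 1).
    return min(1.0, impressions_served / targeted_population)

# Memphis example: 1,045 paid social impressions against an 18-34
# population of 367,709 (ACS 2022 1-year estimate).
p = exposure_probability(1_045, 367_709)
print(f"{p:.4%}")  # roughly 0.28%
```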
Building a Longitudinal Human Record (LHR)
In reality, a mixture of discrete (identity resolved) and aggregated (estimated) impressions will be used in a comprehensive MTA (see Figure 3). Whatever the combination, the aim is to end up with a longitudinal human record (LHR)—the data structure also used for CDPs (customer data platforms.) This data structure is “tall and skinny”—in other words, it will have many rows and few columns. A typical LHR used for multi-touch attribution will have hundreds of millions or billions of rows, and something like 10-20 columns.
Figure 3: A mixture of record-level (deterministic) and aggregated (probabilistic) data sources to build the longitudinal human record (LHR)
The most important columns in the LHR are the date-time of the interaction in question and the ID of the “individual”. The records should be sorted first by ID, and then by date-time. In other words, all interactions associated with an ID x will be grouped together, and then sorted by date-time descending, with the most recent interaction at the top. In the case of an order—the thing you are trying to attribute in MTA—the order will generally sit at the top of a long list of previous interactions.
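A minimal pandas sketch of that structure and sort order; all column names and values here are hypothetical:

```python
import pandas as pd

# A toy slice of a longitudinal human record (LHR): one row per interaction.
lhr = pd.DataFrame(
    {
        "person_id": ["A123", "A123", "A123", "B456"],
        "event_ts": pd.to_datetime(
            ["2024-05-01 09:00", "2024-04-20 18:30", "2024-05-03 11:15", "2024-05-02 14:00"]
        ),
        "event_type": ["paid_social_impression", "email_click", "order", "ott_impression"],
        "exposure_probability": [0.0028, 1.00, 1.00, 0.0105],
    }
)

# Group all interactions for an ID together, most recent first, so that an
# order sits at the top of the interactions that preceded it.
lhr_sorted = lhr.sort_values(
    ["person_id", "event_ts"], ascending=[True, False]
).reset_index(drop=True)

print(lhr_sorted)
```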
Apportioning Value to Interactions
Once the data are constructed, a method is required to apportion value to each of the previous interactions. The simplest method is equal apportionment—and it’s not necessarily a bad place to start. Say an ID associated with an order has 40 logged interactions in the prior 90 days. In this case, each of the 40 interactions would receive 1/40th of the credit for the order.
This simple method can be made more accurate by discounting “probabilistic” interactions by their probabilities. For example, in the case of the 0.28% chance someone in Memphis saw the social ad on May 1st above, that record would be a 0.0028 in both the numerator and the denominator of the attribution equation. It follows that deterministically known records—for example, an email that was clicked—will get a 1.00, which will swamp the probabilistically estimated interaction (or impression.)
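As an illustration, a minimal sketch of that weighting logic, assuming each touch in a journey carries an exposure probability (1.00 for deterministically known touches):

```python
# Probability-weighted apportionment: each interaction's credit is its
# exposure probability divided by the sum of probabilities in the journey.
def apportion_credit(exposure_probs: list[float]) -> list[float]:
    total = sum(exposure_probs)
    if total == 0:
        return [0.0] * len(exposure_probs)
    return [p / total for p in exposure_probs]

# One deterministic email click (1.00) and one probabilistic social
# impression (0.0028): the known touch swamps the estimated one.
print(apportion_credit([1.00, 0.0028]))
```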
Advanced Methods of Parsing Credit
More advanced methods of parsing credit are certainly possible. There are two primary methods: logistic regression at scale and Markov Chain estimation.
Logistic Regression
Logistic regression at scale attempts to estimate each prior interaction’s relative impact on an event (usually a sale) by modeling the log of the odds of that event (the transaction). The advantage of this approach over more traditional linear regression is that the predicted outcome is bounded between 0 and 1 (or 0% and 100%), so it can be read directly as a probability. It is out of the scope of this paper to get into the specifics of how logistic regression (or logit) models work, but suffice it to say that the output will be a coefficient for each potential input, expressed on the log-odds scale, which can be translated to probability with the equation:
p = 1 / (1 + e^(-t))
Where t is the linear combination:
t = β0 + β1x1 + β2x2 + … + βkxk
The trouble with using logistic regression for MTA is that it requires ones (successful sales) and zeros (unsuccessful sales—or potential customers who never became customers.) Unsuccessful sales can be hard to come by. When an entire customer journey is known, for example in the case of a health insurance carrier attempting to get members to take a discrete, known action, this is a very feasible approach. However, when prospective customers don’t have a well-defined endpoint, more creative approaches to defining zeros are needed.
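For illustration only, here is a minimal scikit-learn sketch of the general idea, with a hypothetical feature matrix of touch counts per journey and a 1/0 conversion outcome; it is a toy example, not the production-scale approach described above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row is a journey; columns are counts of touches by channel.
# y = 1 for journeys that ended in a sale, 0 for journeys that never converted.
X = np.array(
    [
        [2, 1, 0, 1],  # email, social, affiliate, ott
        [0, 1, 1, 0],
        [1, 0, 0, 2],
        [0, 0, 1, 0],
        [3, 2, 0, 1],
        [0, 1, 0, 0],
    ]
)
y = np.array([1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(X, y)

# Coefficients are on the log-odds scale; exponentiating gives the odds
# multiplier per additional touch, which can be used to weight attribution.
for channel, coef in zip(["email", "social", "affiliate", "ott"], model.coef_[0]):
    print(f"{channel}: odds multiplier per touch = {np.exp(coef):.2f}")
```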
Markov Chain Estimation
The second, more advanced approach to parsing value is Markov Chain estimation. This flips logistic regression on its head, and focuses on the upstream tactic—say, clicking an email—and then estimates each possible downstream outcome from that point. A good visualization of this approach is the old “The Price is Right” game “Plinko.” In this game, the player places a disc at the top of a board with a set of pegs. Each peg represents a random coin flip—so the path that the disc takes is a series of 0.50-probability decisions resulting in an outcome. In Markov Chains, however, each probability is not 50%, and the disc can end up in more than two possible states at the next step.
Markov Chains have the fundamental property of “memorylessness”—their behavior at a given state depends only on where they are, not where they have been. This simplifies MTA analysis, but by implication, it does not take into account the “saturation” effect of marketing. In other words, we only care that you received a piece of mail at time T, not how many pieces you received prior to time T. The task of the analyst is to build a matrix of the probabilities of each possible next interaction from a given interaction point, sometimes called a transition probability matrix. For example, this matrix might look like this for an advertiser deploying mainly digital tactics:
(Rows are the current state; columns are the next state.)

              Initial   Email   Social   Affiliate   OTT    Sale   No Sale
Initial          0       0.40    0.20      0.30      0.10    0       0
Email            0       0.10    0.05      0.05      0       0.01    0.78
Social           0       0.02    0.01      0.02      0.02    0.02    0.91
Affiliate        0       0.04    0.01      0.03      0.01    0.03    0.88
OTT              0       0.04    0.01      0.01      0.02    0.02    0.90
Sale             0       0       0         0         0       1       0
No Sale          0       0       0         0         0       0       1
The above matrix must meet two conditions:
Pij, the probability of moving from state i (row, current state) to state j (column, next state), must be bounded between 0 and 1 (inclusive)
Each row (i) must sum to 1; a sale moving to a sale is called “absorption” (yes, this is a bit confusing; you can think of it as the journey ending)
From this matrix, it’s possible to calculate the absorption probabilities of each cell—in other words, the probability that at that point, an individual will eventually turn into a sale. To do this, split the transition matrix into two sub-matrices: Q, the matrix of transitions between non-absorbing states (touchpoints), and R, the matrix of transitions from non-absorbing states to absorbing states (conversion or non-conversion).
Matrix Q:

              Initial   Email   Social   Affiliate   OTT
Initial          0       0.40    0.20      0.30      0.10
Email            0       0.10    0.05      0.05      0
Social           0       0.02    0.01      0.02      0.02
Affiliate        0       0.04    0.01      0.03      0.01
OTT              0       0.04    0.01      0.01      0.02
Matrix R:

              Sale    No Sale
Initial       0       0
Email         0.01    0.78
Social        0.02    0.91
Affiliate     0.03    0.88
OTT           0.02    0.90
With these two matrices, absorption probabilities are found by calculating the fundamental matrix, F = (I – Q)^-1, where I is the identity matrix and (I – Q)^-1 is the inverse of the matrix I – Q. Finally, multiplying F by R will produce the ultimate probability (absorption) of a sale happening (or not) at any point in the matrix. In the case of the given data, F for “sale” is:
(Rows are the initial state; columns are the next state.)

              Email      Social     Affiliate   OTT
Email         0.011151   0.000569   0.000587    0.000176
Social        0.000488   0.020235   0.000447    0.000418
Affiliate     0.001401   0.000387   0.031011    0.000324
OTT           0.000925   0.000256   0.000263    0.020416
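For readers who want to reproduce the mechanics, here is a minimal numpy sketch of the fundamental-matrix calculation using the Q and R matrices above; exactly how F is then combined with R to produce the per-transition values shown in the table can vary by implementation:

```python
import numpy as np

# Transitions between non-absorbing states (Initial, Email, Social, Affiliate, OTT).
Q = np.array(
    [
        [0, 0.40, 0.20, 0.30, 0.10],
        [0, 0.10, 0.05, 0.05, 0.00],
        [0, 0.02, 0.01, 0.02, 0.02],
        [0, 0.04, 0.01, 0.03, 0.01],
        [0, 0.04, 0.01, 0.01, 0.02],
    ]
)

# Transitions from non-absorbing states to absorbing states (Sale, No Sale).
R = np.array(
    [
        [0.00, 0.00],
        [0.01, 0.78],
        [0.02, 0.91],
        [0.03, 0.88],
        [0.02, 0.90],
    ]
)

# Fundamental matrix F = (I - Q)^-1, then absorption probabilities B = F @ R.
I = np.eye(Q.shape[0])
F = np.linalg.inv(I - Q)
B = F @ R

states = ["Initial", "Email", "Social", "Affiliate", "OTT"]
for state, (p_sale, p_no_sale) in zip(states, B):
    print(f"{state}: P(sale) = {p_sale:.4f}, P(no sale) = {p_no_sale:.4f}")
```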
These estimates can then be used to allocate value between touches in a chain. For example, say a user is observed going from email to email to social to OTT, and then finally closing. In this case, the values pulled from the above table would be 0.011151, 0.000569, and 0.000418, with the final step—OTT to absorption—pulled from our first matrix as 0.02. To apportion value, these numbers are summed, and each touch is credited with its share of the total:
0.011151 + 0.000569 + 0.000418 + 0.02 = 0.03214
To get to a final contribution vector for each unique journey:
Touch 1 (Email)   Touch 2 (Email)   Touch 3 (Social)   Touch 4 (OTT)
35%               2%                1%                 62%
Of course, this has to be scaled over thousands and millions of orders—and it’s a lot more complicated than the simple example shown above.
The final question is time. In the example above, say the first interaction (Email) happened 90 days ago, while the Social touch came just two days before the close. Is it fair to give that first Email touch 35% credit and the Social touch 1%? Almost certainly not.
There are many techniques for decaying a touch’s impact on ultimate conversion, but they all have the same basic concept: an interaction’s impact fades over time. In MMM (media mix models) we use the term adstock—the rate at which a touch’s “power” decays after it is shown. The concept is similar in MTAs, but is reversed, because we are focused on the sales, not the stimulus. In other words, we look back from the order, not ahead from the stimulus.
The simplest approach for MTA is straight-line decay with a look-back window. This is just as it sounds: A touch before the start of the look-back window gets 0 credit; one mid-way through the window gets ½ of the full credit; and a touch on the day of the order gets full (1) credit. More advanced logistic decay approaches are certainly possible, but yield limited additional benefits and add significant complexity.
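A minimal sketch of straight-line decay with a look-back window; the 90-day window below is just an illustrative choice:

```python
# Straight-line decay: full credit on the day of the order, zero credit at or
# beyond the look-back window, and a linear ramp in between.
def decay_weight(days_before_order: float, lookback_days: float = 90.0) -> float:
    if days_before_order >= lookback_days:
        return 0.0
    return 1.0 - (days_before_order / lookback_days)

print(decay_weight(0))    # 1.0 -> touch on the day of the order
print(decay_weight(45))   # 0.5 -> touch mid-way through the window
print(decay_weight(120))  # 0.0 -> touch outside the window
```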
Bringing It All Together: How Does Multi Touch Attribution Work?
Putting all of these elements together—the longitudinal human record (LHR), probabilistic inference of non-deterministic touches, estimation of contribution via either logistic regression or Markov Chain analysis, and decay—will give a marketing analytics team the basic elements of multi-touch attribution.
One final, important note: black box MTAs provided by vendors claiming a secret sauce for identity resolution are compelling because they seem simple, but buyer beware. Proprietary identity graphs are only as good as the federations the company belongs to and the underlying cross-walk code, and there is really no way to validate them other than by seeing if they “make sense.” For this reason, they tend to struggle over the long run as accuracy—and hence utility—are questioned by financial decision-makers. A purpose-built, open-code, white-box approach to MTA, which can also be used to power downstream applications like dashboards and econometric MMM panels, should be the preferred approach for marketing analytics teams.
In the ever-evolving landscape of marketing attribution, businesses are engaged in an arms race to better measure and maximize ROI and growth. Beyond simple last-touch attribution, the two prominent methodologies often compared are Media Mix Modeling (MMM) and Multi-Touch Attribution (MTA). This article delves into the nuances of MMM vs MTA, exploring their differences, advantages, and how they can be effectively utilized to drive marketing success. The good news is that this isn’t necessarily an either/or decision; MTA can be used upstream and in concert with MMM to combine quick-read and long-run views of marketing’s effectiveness.
Before diving into MMM vs MTA, it’s important to note that the bread and butter of marketing reporting—last touch attribution—is still a powerful and simple tool, and should not be cast aside in favor of more advanced techniques. Last touch attribution credits the final interaction before conversion, generally using a direct database linkage via some kind of unique ID—a telephone number, URL string, or webpage tag. Last touch attribution will always be simpler and faster than multi-channel techniques. MMM and MTA outputs should always be looked at side-by-side with last touch reporting. It is often these comparisons that yield the most interesting insights.
Understanding Media Mix Modeling (MMM)
Media Mix Modeling (MMM) is a statistical technique used to estimate the impact of various marketing channels on sales performance. It aggregates historical data in a time series, usually across a geographic or other cross-sectional key, to measure the effectiveness of marketing efforts and to allocate budgets more efficiently. MMM can interpret any kind of stimulus, whether paid or earned, upper- or lower-funnel, or offline vs. online. It can also be used to understand the impacts of non-promotional factors—including price changes, competitive actions, product launches, and distribution channel strategy.
Figure 1: The basic idea behind MMM; there is an efficient frontier for marketing achieved by optimally mixing channels. This mix is different at different spend levels, but generally the macro curve exhibits diminishing returns to scale (its slope decreases as a company spends more).
Key Benefits of MMM
Comprehensive View: MMMs provide a broad and complete overview of how different marketing channels interact and contribute to overall sales. This comprehensiveness is helpful for understanding the combined effects of multiple marketing efforts and avoids over-crediting.
Long-Term Abilities: One of the strengths of MMM is its ability to account for the longer-term impacts of marketing—whether over weeks and months, or years, in the case of marketing’s impact on brand equity. This is particularly helpful when trying to gain an honest accounting of the effectiveness of upper-funnel channels like TV and print, where effects are usually not immediate. This long-term focus also makes MMM less capable of reading “what just happened”—although techniques like rolling analysis windows can help with trending.
Requires “Smaller” Data: MMM data frames are only a few megabytes, usually a few thousand rows and a few hundred columns. This generally makes it possible to store the data on a traditional device—a hard drive or simple cloud-based storage. MMMs can utilize either spend or impressions as their “x” or independent variables, and can use either sales or revenue as the ultimate dependent variable. Even so, MMMs need extensive historical data, usually a minimum of two years, making them more suitable for established brands with significant data accumulation. Brands without much historical data can still be bootstrapped with Bayesian priors.
Understanding Multi-Touch Attribution (MTA)
Multi-Touch Attribution (MTA) is a more granular approach that focuses on assignable channels—usually digital, but also including direct mail and interactions with known customers. It attributes conversions to the multiple touchpoints that a consumer interacts with throughout their journey. This method provides insights into the effectiveness of each touchpoint in the conversion path.
MTA was very popular in the early days of digital marketing, before privacy concerns and platform data hoarding made it harder to resolve identities across channels. In recent years, it has come under fire as first generation deterministic ID resolution approaches failed, but advances in data clean rooms and probabilistic exposure inference are making “second generation” MTA models a very attractive option for inference.
Key Benefits of MTA
Close to Real-Time Insights: MTA models are capable of using real-time data, allowing marketers to make quick adjustments to their strategies. This is particularly advantageous in fast-paced digital environments. The devil is in the details, however—real-time results are only possible with very robust data pipelines and fast compute environments.
Potentially Unlimited Granularity: Because MTA models are built at the log level, there is potentially unlimited detail available about each individual touchpoint, helping marketers understand the specific role each interaction plays in driving conversions. Keep in mind that this detail is dependent upon robust lookup tables and cross-walks, as well as well-thought-out marketing taxonomies.
Consumer Journey Mapping: A side benefit of building the longitudinal human record (LHR) required for MTA analysis is a 360-degree view of the journey. Exploratory data analysis (EDA) of this data artifact using big data tools can identify influential touchpoints, find breakdowns in e-commerce pipelines, and discover high-value audience segments. Even so, MTA data frames store much more data, at the record level. Log files, sometimes billions or tens of billions of rows, must be processed, demanding a big data compute environment.
Combining MMM vs MTA for a Holistic Approach
MMM vs MTA is how these methodologies are often perceived, but they can and should be used together. Integrating the macro-level insights from MMM with the micro-level details from MTA can provide a comprehensive understanding of marketing effectiveness. This integrated approach allows businesses to leverage the strengths of both models.
The record-level data required for MTA analysis can be used upstream of the MMM econometric panel structure, directly feeding it. In this way, the same “single source of truth” can be used for both analyses. Data that is not used in MTA—for example, survey data—can be joined after the raw data has been grouped and aggregated. Our recent white paper on the Go-to-Market Data Lake architecture details this approach. There are three main steps:
Creation of Longitudinal Human Record (LHR): Tying customers’ journeys together in longitudinal chains can help locate points of friction, profile audiences, and conduct multi-touch attribution (MTA).
Creation of Econometric Panel: The LHR then serves as the base query to create an econometric panel for MMM. This panel is a summation of stimulus (x-variables) and response (y-variables) by day or week, across one or more cross-sectional dimensions (a minimal sketch of this aggregation step follows this list).
Data Aggregation and Supplementation: The panel is then supplemented with aggregated data, such as linear television or unresolved digital marketing data, to fill in gaps and ensure a complete dataset.
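As a minimal illustration of that aggregation step, the sketch below rolls a toy LHR up to a weekly, geo-level panel using pandas; all column names and values are hypothetical:

```python
import pandas as pd

# Toy LHR slice: one row per interaction or outcome event.
lhr = pd.DataFrame(
    {
        "event_ts": pd.to_datetime(
            ["2024-05-01", "2024-05-02", "2024-05-03", "2024-05-08", "2024-05-09"]
        ),
        "geo": ["Memphis", "Memphis", "Memphis", "Nashville", "Nashville"],
        "channel": ["paid_social", "email", "order", "ott", "order"],
        "spend": [1200.0, 150.0, 0.0, 900.0, 0.0],
        "orders": [0, 0, 1, 0, 1],
    }
)

# Aggregate to week x geo, with stimulus (spend by channel) as x-variables
# and response (orders) as the y-variable.
lhr["week"] = lhr["event_ts"].dt.to_period("W").dt.start_time

stimulus = (
    lhr.pivot_table(index=["week", "geo"], columns="channel", values="spend", aggfunc="sum")
    .drop(columns=["order"], errors="ignore")
    .add_prefix("spend_")
)
response = lhr.groupby(["week", "geo"])["orders"].sum().rename("orders")

panel = stimulus.join(response).fillna(0.0).reset_index()
print(panel)
```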
Figure 2: Start with record-level data to build the LHR, and feed the ultimate econometric panel to enable MMM.
Use Cases for MMM and MTA
It is best to think about the usage of MMM and MTA in the context of planning cycles, of which there are three main types: strategic, tactical, and reactive.
Strategic Planning
Strategic planning typically happens annually, with more ambitious strategy resets looking out three or even five years. This type of planning typically looks at the total marketing investment envelope (e.g., $50M or $75M per year); the rough mix by funnel position; and any new channels or types of marketing to be tested at scale. MMM—particularly more advanced modeling that accounts for advertising’s impact on brand equity—is the right tool for this exercise.
MMMs can be extended from measurement to optimization by extrapolating the curves outputted from statistical inference into “what if” scenarios, and then using machine learning to re-mix marketing until an optimal solution is reached. This optimization step can be a helpful input into the strategic planning process. It is important to note, however, that optimizations based on past results are not accurate when predicting huge budget swings. A good rule of thumb is that beyond a 20% increase or decrease in budget, curves become unreliable.
Tactical Planning
Tactical planning typically happens quarterly or annually, and looks at the specific channel and audience mix that will drive maximum ROI—sometimes called ROAS in marketing circles (return on advertising spend). Both MMM and MTA can be useful in the tactical planning phase. MMM is good to understand marginal customer acquisition cost (CAC), allowing campaign planners to re-mix channels to maximize effectiveness given a certain budget. MTA can then be mixed in to identify recent changes in channel effectiveness, and to get to granular detail on specific creative types, landing pages, offers, and cadence.
Reactive Adjustment
Reactive planning (or adjustment) happens constantly. Marketing dashboards typically start with last-touch results. They become far more powerful when positioned side-by-side with MTA results. Channel managers who are used to seeing last-touch CPAs will now also see a mutually exclusive, collectively exhaustive view of CPAs that credits channels with “halo” when they drive influence further up the funnel. MTA is ideally suited for reactive adjustment because it can be built to update in near real time.
Figure 3: A last touch vs. MTA version of channel contribution. The “MTA effect” is the impact of multi-touch attribution on last touch CPAs. Some channels do better, and others look more expensive.
Final Thoughts on the MMM vs MTA Debate
In the debate of MMM vs MTA, the answer really is “both”. Both methodologies offer unique benefits and address different aspects of marketing effectiveness. The good news is that they can be built together, using one data pipeline. By understanding their strengths and limitations, marketing leaders can leverage MMM for strategic planning and MTA for tactical optimization and reactive adjustments. Combining both approaches—without throwing away last touch attribution—provides a holistic view of marketing performance, ensuring that every marketing dollar is spent effectively.
Have you ever encountered survey results showing that respondents think everything is important or, worse, nothing is important? This outcome often arises in survey questions where respondents are asked to rate items on a scale, resulting in a chart like this:
Here, every option seems equally important, making it challenging to pinpoint what truly matters to respondents.
Now, imagine you are responsible for selecting the next best feature for your product, a decision that will trigger a multi-million-dollar investment to bring this feature to life. Or you are a CMO trying to decide what features will resonate with consumers as you break into a new market. Would you feel comfortable making a decision based on the above chart?
Probably not.
Unfortunately, this “everything is important, so nothing is” syndrome is a common quantitative research issue, especially with tools like Net Promoter Score (NPS) or Likert Scales, which struggle to measure incremental changes and extract meaningful differentiation between options.
Enter MaxDiff, a methodology designed to solve these exact problems.
What is Maximum Difference Scaling?
Maximum Difference Scaling, also known as MaxDiff, is an advanced choice-based tradeoff methodology used to help strategic leaders uncover better decisions about their GTM approach, products, and more. Instead of asking respondents to rate the importance of items independently, MaxDiff presents them with a set of options and forces them to make tradeoffs, indicating their most and least preferred items. The output offers crisp insights into what truly matters to respondents.
Example of what a respondent would see in a MaxDiff battery
Why Use MaxDiff?
MaxDiff can be advantageous over other research techniques for a few reasons:
Forces Choices and Tradeoffs: By requiring respondents to choose between options, MaxDiff helps highlight preferences more sharply.
Generates Clearer Insights: This methodology reveals gaps in importance and preferences, making it easier to identify key priorities.
Avoids Yeah-Sayers: Respondents can’t simply agree that everything is important or not important; they must make definitive choices.
Creates Intuitive Experience for Respondents: The format is straightforward, making it easier for participants to engage meaningfully with your survey.
Enables More Efficient and Effective Survey Design: MaxDiff efficiently captures nuanced data without overwhelming respondents with lengthy rating scales.
When to Use MaxDiff
MaxDiff is a versatile tool that can be applied to various use cases to help marketers make more informed decisions:
Prioritization of Features: Determine which features are most important to your customers to drive product evolutions.
Needs-Based Customer Segmentation: Understand and identify different customer segments based on their preferences (see below example).
Message and Branding Testing: Find out which messages and branding efforts resonate most with your audience.
Product Feature Testing: Identify the most and least desirable product features.
TURF Analysis: Extend MaxDiff’s output of your product’s most important features or criteria to inform the top combination of features for your offering via a TURF (Total Unduplicated Reach and Frequency) analysis (see below example).
How to Use MaxDiff
To get the most out of MaxDiff, here’s the structured approach we follow:
Qualitative Discovery: Begin with qualitative research to understand the broad range of needs, criteria, and options. This step ensures that your list of options is comprehensive. Note: MaxDiff won’t be able to tell you if your list of options is “good,” “bad,” or all-encompassing, so it’s crucial to identify the options in qualitative discovery to inform your next step.
Quantitative Validation: Once you have a robust list, use the MaxDiff tool in your survey platform to validate (or disprove) each option’s importance and satisfaction levels.
Outcome Analysis: The result is a prioritized list of needs, highlighting the most important and least satisfied criteria to address.
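Full MaxDiff estimation is typically done with hierarchical Bayes or multinomial logit models inside the survey platform, but a simple count-based score is enough to illustrate the idea; the data below are hypothetical:

```python
import pandas as pd

# One row per respondent x screen x item shown; best/worst flag which item was
# picked as most and least important on that screen.
responses = pd.DataFrame(
    {
        "item": ["Feature A", "Feature B", "Feature C", "Feature A", "Feature B", "Feature D"],
        "best": [1, 0, 0, 0, 1, 0],
        "worst": [0, 0, 1, 0, 0, 1],
    }
)

# Simple best-worst score: (times chosen best - times chosen worst) / times shown.
scores = (
    responses.groupby("item")
    .agg(shown=("best", "size"), best=("best", "sum"), worst=("worst", "sum"))
    .assign(bw_score=lambda d: (d["best"] - d["worst"]) / d["shown"])
    .sort_values("bw_score", ascending=False)
)
print(scores)
```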
Example output using MaxDiff offers more differentiation between options.
MaxDiff is a powerful tool that brings clarity to the often opaque world of market research. By forcing respondents to make tradeoffs, it provides actionable insights that can help to inform product evolutions, better understand customer needs, or optimize messaging and branding. If you’re struggling with the “everything is important, so nothing is” syndrome in your surveys, it’s time to consider MaxDiff for more precise, more meaningful results.
For most B2B go-to-market (GTM) leaders, 2023 accelerated a trend that had, until recently, only been lurking in the background—the diminishing effectiveness of direct sales and marketing teams. The traditional reliance on tele- and email-based account management, free trials, and automated motions has the unintended consequence of decreasing the openness or receptivity of the buyers they target. Today, buyers are less receptive than ever to answering a call, much less meeting with a person or organization with whom they do not have a pre-existing relationship.
To counter this dynamic, GTM leaders have predictably turned to their channel partners and the teams that support them to provide essential pathways to expand market reach and drive the top-line growth required to achieve targets.
A Comprehensive Guide to Channel Partner Enablement
Effective channel partner enablement is crucial. It enhances relationships and drives strategic outcomes, such as expanding account relationships and increasing the leverage of commercial organizations.
In this guide, we explore the seven steps of channel partner enablement that ensure these relationships are as productive and profitable as possible. From customizing partner programs to leveraging advanced data analytics, these imperatives can transform your partner strategy into a powerful component of success.
1. Customize the Partner Program Structure
A well-structured partner program is essential for building new and nurturing existing relationships. One sure sign of an effective program is its ability to be tailored to meet each partner’s diverse needs and expectations. While this was once a fairly straightforward exercise, the proliferation of partner roles and capabilities has blurred the lines between historical partner classifications such as ISV, VAR, GSI, MSSP, etc. In the new age of partnerships, agility, supported by clearly defined and simple program structures, best drives alignment and motivates partners to execute their roles effectively.
Best Practices:
Understand Partner Profiles: Segment your partners based on their market position, size, and typical customer base. This segmentation will allow you to tailor programs that align with the most prominent business models and partner capabilities.
Design Tiered Incentives: Implement a tiered program structure that provides partners with the requisite financial rewards and non-financial benefits for meeting different performance thresholds. This promotes healthy competition and ensures partners of all sizes can benefit and grow.
Perform a Program Health Check: Conduct a comprehensive evaluation of your partner program every two to three years to ensure it’s working as designed and effectively driving the desired partner behaviors and outcomes. This health check helps identify areas of improvement and ensures the program remains aligned with market conditions and strategic objectives.
2. Craft and Quantify Compelling ‘To-Partner’ Value Propositions
Creating and clearly defining compelling value propositions for partners that resonate at the organizational and individual levels is crucial for the success of any channel partnership. This means articulating the higher-level strategic benefits that partners gain from the program (e.g., increased revenue, enhanced product offerings) and breaking down what the partnership might mean for individual contributors (e.g., sales incentives, exclusive training), all while providing the quantifiable data to back up the claims and benefits. This approach helps partners see the tangible gains from the partnership, driving them to engage more fully and align their efforts with the shared goals of the collaboration.
Best Practices:
Build ‘Partner-Centric’ Value Propositions: Put yourself in the shoes of the partners you serve to refine plans for helping them achieve their business goals. These goals are often nuanced and vary by stakeholder, and there is seldom a ‘one size fits all’ solution. Partner research surveys are often helpful in developing ‘Partner-Centric’ messaging frameworks.
Think Organizationally and Individually: In addition to outlining organizational benefits, emphasize how the individuals within the partner company can achieve success from the program. This might include access to exclusive training, tools that make their jobs easier, or performance-based financial rewards.
Celebrate and Highlight Success: Use case studies and testimonials from existing partners to demonstrate the real-world benefits and ROI of the partnership. Seeing peers’ success is often a powerful motivator for existing partners and has the added benefit of attracting new partners.
3. Personalize Content Around Partner-Oriented Use Cases
Personalization seems simple but can be challenging to execute. When referring to partner enablement, personalization should provide the right type of information and access to the right resources tailored to the partner’s specific needs. When implementing a personalization strategy into your program, focus on partner-oriented use cases that deliver highly relevant assets that directly support the partners’ sales and marketing motions and strategies.
Best Practices:
Start with Identification: Kick off the personalization process by conducting thorough analyses to identify the most relevant and widespread use cases for your partners. This often overlooked but critical step is fundamental for prioritizing where to invest precious time and resources for driving personalization at scale. Regularly revisit and refine these use cases based on evolving product, buyer, and partner dynamics to ensure they align with actual partner needs.
Support Content Utilization: Creating content that aligns with the most prevalent use cases and supports partner needs based on scenarios (i.e., sales scripts, marketing assets, training materials, battle cards, etc.) is only half the battle. Once content is built, systems must be in place that cater to each use case. This might involve specialized training sessions, dedicated support personnel for complex issues, or even white-glove marketing support for certain high-priority partners and use cases.
4. Leverage Data as a Collaborative Asset and Differentiator
Over the last decade, the prevalence of data, technology, and supporting analytics has exploded; it seems as if almost everything related to the marketing and sales process is counted, stored, and analyzed. Channel partnerships, therefore, should be more valuable than ever, with mutual clarity into account bases, pipelines, and near-instantaneous sharing of insights—yet most partnerships do not work this way. A lack of mutual trust, underscored by concerns over data security and the perceived risk of losing proprietary insights and information, often creates strategic deadlock between partner and provider, where each party shares only as much information as necessary to receive the requisite compensation—in other words, transactional partnerships.
Integrating mutual data into the partner enablement program is crucial for transforming partnerships from “reactionary” or “transactional” to “proactive.” In doing so, organizations can improve their operational efficiency while enhancing the partnership’s overall value and encouraging long-term collaboration and shared success.
Best Practices:
Establish Clear Agreements: Draft formal agreements that clearly outline what data will be shared, how it will be used, and what safeguards will be in place to protect it. This should include defining the purpose and acceptable use of data, the methods and frequency of data exchange, and the duration for which data will be shared.
Use a Phased or Tiered Approach: Begin with sharing non-sensitive, less critical information (e.g., aggregate/summary data, market trends, or product performance metrics) to build confidence. Only once a foundation of trust has been established should you move towards sharing more sensitive customer data. This should be done slowly and deliberately—often starting with a single, trusted partner (and perhaps further segmented by product or region)—before a programmatic roll-out.
Leverage an Objective Third Party: In most cases, involving a neutral third party can help ease concerns by managing the upkeep, matching, anonymization, and exchange. This could be a trusted consultant or a data-sharing platform that ensures compliance with all agreed terms and maintains the confidentiality and integrity of the data.
Enhance Collaborative Technology: Use technology that enhances collaboration between you and your partners. Shared workspaces, real-time communication platforms, and integrated supply chain systems can help streamline operations and improve transparency.
5. Optimize Channel Partner Coverage
Optimizing channel partner coverage ensures that partnerships across the ecosystem are effectively engaged and supported throughout the partner lifecycle and customer sales processes. Though concepts of coverage, sizing and deployment, and territory assignments or compensation alignment are critical levers of success for leaders of direct selling units within an organization, they tend to be less prevalent for channel organizations (this is especially true when related to enablement). This dynamic is mainly due to organizations being more accustomed to the immediacy and attribution of direct results when managing their own sales teams, where outcomes are easily measured, transparent, and more responsive to change than their indirect counterparts. On the other hand, channel partners inherently add incremental layers of complexity, making it harder to prioritize and optimize partner coverage effectively. As a result, many channel organizations fail to incorporate this powerful lever in their overall enablement strategy.
Ensuring that partner account managers and commercial teams are executing consistent processes is crucial for the scalability and effectiveness of partner enablement programs. Regular training and consistent processes are essential, especially in maintaining alignment across various teams.
Coverage optimization starts with confirming the right types of job roles and commercial functions are in place (e.g., partner account managers, partner marketing managers, channel managers, etc.) and subsequently equipping those personnel to execute consistent processes and maintain a regular cadence of enablement activities. By refining coverage strategies and securing buy-in from sales leadership, organizations can often enhance the overall effectiveness of their channel partnerships, leading to increased sales and a stronger market presence.
Best Practices:
Sales Leadership Buy-In: Secure buy-in from sales leadership to ensure enablement efforts are supported with the necessary resources and attention.
Utilize Partner Segmentation to Inform Coverage: Segment partners based on key criteria such as tier, relationship strength, capability, market potential, strategic value, etc. Use this segmentation to guide the role requirements and ensure resources are effectively deployed.
Consistent Globally Extensible Execution: Develop operating procedures, milestones, and engagement standards for partner managers that drive valuable partner interactions. This consistency helps build trust and strategic alignment and has the added benefit of keeping management up-to-date and well-informed.
Flexible Engagement Models: Recognize that different partners may prefer different levels of engagement. To accommodate these preferences, offer various interaction models, from self-service portals to personal account management.
6. Analyze Partner Performance, Engagement, and Activity
Regardless of the type of partnership (e.g., tech, channel, strategic, etc.), a clear understanding of partner performance, engagement, and activity is essential to understanding the productivity of partnerships and the effectiveness of enablement efforts. As the partner technology stack has continued to expand, partner organizations increasingly leverage advanced technology and tools to gain better insight into how their partners behave.
These tools include sophisticated CRM/PRM systems, comprehensive Business Intelligence (BI) applications, specialized engagement tracking software, and cutting-edge predictive analytics platforms that curate detailed insights into how partners interact with online portals and individual pieces of content, and predict future performance trends that allow for improved enablement strategies and tactics. By integrating these powerful analytics tools into their operational framework, partner programs move beyond simple observation, enhancing partner relations and improving future interactions.
Best Practices:
Tools Must Support Strategy: Partners expect modern, easy-to-use, and easy-to-navigate tools—these are now largely table stakes in partner enablement (typically inclusive of portals, PRM/CRM systems, data analysis tools, and marketing automation platforms). However, these tools should be thought of as additive to the overall enablement strategy rather than the answer to all challenges.
Assign Ownership to Measuring Engagement: Using analytics tools to monitor how partners interact with provided content and resources often sounds too good to be true, and many organizations struggle to find tangible value despite a plethora of data. Establishing clear ownership can help alleviate this concern and ensure that the data collected does not go unattended.
Customize Support Based on Data: Utilize the insights gathered from analytics to provide customized support to partners, helping them overcome specific challenges and capitalize on opportunities. Consider engagement intelligence when developing coverage or aligning incremental resources with partners.
7. Develop Ongoing Communication and Feedback Loops
Open lines of communication and robust feedback mechanisms are vital for building trust and maintaining long-term relationships with partners. These systems help organizations adjust their strategies and operations in response to direct input from their partners.
Best Practices:
Voice of Partner Surveys: Implement well-structured “Voice of Partner” surveys that regularly collect comprehensive feedback from partners on various aspects of the program, products, services, and overall experience.
Only Track What You Intend to Act On: It is crucial to collect feedback and act on it. Show partners that their input is valued by making visible changes based on their suggestions and concerns.
Embrace a Holistic Approach to Channel Partner Enablement
In our experience, partner organizations often become hyper-focused on one or two dimensions of enablement while overlooking how these elements work together holistically. In such instances, it’s crucial to remember that effective enablement transcends individual initiatives; it’s about developing a well-defined, cohesive ecosystem of collaboration and mutual understanding. For organizations aiming to fully leverage their partner networks, embracing a comprehensive approach to enablement is a strategic necessity and a transformative opportunity. This journey towards refined enablement empowers partners, enhances collaborative efforts, and drives substantial business growth and success.
Download our framework, “Designing a Best of Breed Partner Program”
A well-designed partner program can set the stage for success. Download our framework to learn about the eight components of a best practice partner program and three quick-start areas to accelerate channel revenue growth.
Marketing campaigns typically target one of two main audience pools: 1) Prospects, i.e., those a business aims to acquire; or 2) Members, i.e., those a business seeks to retain or engage. To measure marketing effectiveness, each group presents unique challenges.
In the case of acquisition marketing, information about prospects is often limited. Marketers must allocate spending across various channels to cast a broader net to acquire new customers. They can target addressable markets that meet their specific criteria (e.g., individuals aged 65+ for insurance companies that provide Medicare plans, or those who live within specified locations for retail organizations). However, complete information about the prospect pool is often unknowable.
By contrast, member marketing benefits from having extensive data on current members. Much of the person-level data utilized is generally stored within an organization’s first-party data. This allows marketers to utilize personalized channels (such as digital outreach and content-specific mail) for more precise targeting and measurement.
In this blog, we will examine and compare these two types of marketing. For each type, we’ll touch on the broader objectives and measurement techniques, the data required for measurement, the typical challenges marketers face when attempting to gain insights from data, and the ways that analysts can offset these challenges to measure marketing effectiveness.
Why It’s Crucial to Measure Marketing Effectiveness
Measuring marketing effectiveness establishes the incrementality of our marketing, providing clarity into what we gained from a campaign that would not have happened otherwise. This is essential for discussing budget, ROI, and the net benefits marketing provides to the business. Effective measurement leads to educated, data-driven, strategic marketing decisions that account for unknowns and clarify the ideal focus areas.
Whether the marketing is directed at prospects (acquisition) or members, measurement allows marketers to answer big questions such as:
“What is the value that marketing is providing?”
“How can we increase our outcomes given budget constraints?”
“What marketing tactics are the most effective at driving positive outcomes?”
These are complex questions that drive strategic decisions in organizations. Insights that provide better answers to these questions, and enable more informed decisions, are invaluable.
Acquisition Marketing Campaign Measurement
Acquisition marketing is complicated. Marketers have a wide range of channels to use to engage new prospects. These include traditional channels, such as Direct Mail and TV (both branded and direct response); the wide array of digital channels, such as Display, Paid Search, Organic Search, Online Video, and Social Media, amongst many others; and more industry-specific channels, such as Events, Affiliates, or Out-of-Home (OOH, e.g. billboards or transit ads).
Measurement Techniques
With so many channel choices, finding the right mix of budget resources can be challenging. Marketing Mix Modeling (MMM) is an excellent method for answering mix-related questions. An MMM is an econometric model that helps measure the effectiveness of marketing activities and their impact on sales or other KPIs. Within MMM model development, audience granularity is not required. A set of aggregated data at various segment levels (e.g., time periods, geographies, product categories, and other types of well-defined metadata) can be used to build a valuable model.
While MMM helps provide a broad understanding of how to allocate spend across channels, it is limited in its ability to provide insights at the more granular customer level. Identifying how to attribute “credit” for each sale (or other output KPI) across marketing channels requires an attribution model.
The default, simpler methodology is last-touch attribution (LTA). This methodology attributes 100% of the sale to the last channel a customer engaged with before converting. This often leads to over-attribution of bottom-of-funnel channels in reporting, causing overinvestment in down-funnel channels that are more straightforward to measure, and atrophy within less measurable, upper-funnel channels that play an essential role in the acquisition process through their impact on brand health and audience attitudes.
The far more challenging approach is multi-touch attribution (MTA). MTA aims to solve the LTA problem by providing a more comprehensive view of the customer journey. The model apportions the credit of an application across the various channels that prospects engaged with throughout the buying process. A good MTA model can help marketers better understand the value of each touchpoint. With the complicated ecosystem of marketing and sales efforts, better information can significantly improve strategies.
Data Requirements
One key challenge in measuring acquisition marketing is finding signal within the sheer volume of data available. With a huge market and many potential customers, marketers have access to a wealth of information that can inform their strategies. However, the quality and granularity of the data across channels can vary significantly.
For example, in the case of Direct Mail as an acquisition channel, marketers have complete visibility into the “1s” and “0s” – that is, they know exactly who received a piece of mail and who did not. This level of granularity allows for more precise measurement and analysis.
Alternatively, poor ID resolution for upper- to mid-funnel channels, such as Digital channels and TV, is a crucial hurdle to overcome when building an acquisition MTA. Unlike Direct Mail, which allows for precise tracking, these channels often rely on cookies, device IDs, or other imperfect methods of identification. As a result, these channels usually only yield aggregated impression and response counts. As such, it can be difficult to connect individual users across different touchpoints on these platforms if they don’t use them to convert.
Finally, multiple application capture channels generate different levels of data, which can be challenging to reconcile with each other and the funnel activity. For example, a lead captured through a website form may provide different information than one captured through a call center. Integrating these disparate data sources into a cohesive MTA model can be complex and time-consuming.
Strategies for Overcoming Measurement Obstacles
Marketers can implement several strategies to overcome the challenges associated with poor data granularity within marketing channels. One strategy is an aggregated MTA approach, as opposed to the more traditional discrete MTA detailed above.
An aggregated MTA measures the effect of marketing investments on the last-touch attributed marketing channels or known segments over a given time period. This methodology estimates channel- or segment-specific dependent variables using multiple stimulus variables. In many ways, an aggregated MTA is more similar to an MMM in technical modeling methodology than a discrete MTA. In this instance, it measures the multi-channel impacts on siloed/coded response measures.
Model propensities offer another approach to overcoming the measurement challenges of channels without person-level impression data. This is applicable to channels, such as DRTV or digital, where the individuals who convert (the “1s”) are known, but those who do not convert (the “0s”) are not. The aggregate number of impressions for these channels is known for a specific geography and time period. A model can be built to impute impressions within the broader audience pool. Individuals with higher propensities can then be treated as a “high likelihood” impression within a traditional MTA model.
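A minimal sketch of that imputation logic, assuming propensity scores from a separately trained model and reported aggregate impressions by geography and week; all names and values are hypothetical:

```python
import pandas as pd

# Hypothetical inputs: person-level propensity scores (from a separately
# trained model) and aggregate impression counts by geo and week.
people = pd.DataFrame(
    {
        "person_id": ["p1", "p2", "p3", "p4", "p5", "p6"],
        "geo": ["Memphis"] * 3 + ["Nashville"] * 3,
        "week": ["2024-05-06"] * 6,
        "propensity": [0.82, 0.41, 0.67, 0.15, 0.90, 0.55],
    }
)
reported = pd.DataFrame(
    {"geo": ["Memphis", "Nashville"], "week": ["2024-05-06"] * 2, "impressions": [2, 1]}
)

# Within each geo/week, treat the k highest-propensity people as "high
# likelihood" impressions, where k is the reported aggregate count.
def flag_likely_impressions(group: pd.DataFrame) -> pd.DataFrame:
    k = int(group["impressions"].iloc[0])
    group = group.sort_values("propensity", ascending=False)
    n = len(group)
    group["likely_impression"] = [True] * min(k, n) + [False] * max(n - k, 0)
    return group

merged = people.merge(reported, on=["geo", "week"])
flagged = merged.groupby(["geo", "week"], group_keys=False).apply(flag_likely_impressions)
print(flagged[["person_id", "geo", "propensity", "likely_impression"]])
```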
Member Marketing Campaign Measurement
Member marketing can have many objectives, such as ensuring existing customers remain satisfied with their experience or engaging the individual to take certain actions to add value to the organization (e.g., extending their subscription, buying an add-on, etc.). While having a “known” population makes member marketing campaigns easier from a targeting perspective, these campaigns can face challenges in determining the prioritization of the testing agenda.
Measurement Techniques
A Randomized Control Trial (RCT) is the classic approach for testing incrementality. In this methodology, the population is randomly split into “test” and “control” groups, where the former receives the novel treatment to be evaluated, and the latter gets a comparative baseline. This baseline typically does not include marketing and is considered a universal holdout for establishing marketing’s incrementality in general. This approach allows the team to determine what conversions would have been lost if the campaign had not been run.
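As an illustration, here is a minimal sketch of the incrementality read from such a test, using statsmodels and hypothetical counts:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results: conversions and group sizes for test (marketed)
# and control (held out) members.
conversions = [1_150, 980]     # test, control
group_sizes = [50_000, 50_000]

rate_test = conversions[0] / group_sizes[0]
rate_control = conversions[1] / group_sizes[1]
incremental = (rate_test - rate_control) * group_sizes[0]

z_stat, p_value = proportions_ztest(count=conversions, nobs=group_sizes)
print(f"Test rate: {rate_test:.2%}, control rate: {rate_control:.2%}")
print(f"Estimated incremental conversions: {incremental:.0f} (p = {p_value:.4f})")
```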
While RCT is the most straightforward approach conceptually, there are some opportunity costs to consider. In member marketing campaigns, the impact of traditional RCT universal holdouts tends to be “felt” more by the business because the members being withheld from marketing are known. Additionally, in its purest form, an RCT should ask a simple, one-objective question per test, which, depending on the outcome’s response timeline, can frustrate marketers who want to test and learn on big questions at a faster cadence.
Given the desire to preserve the whole addressable population in most member campaigns, MTA is a measurement route that can provide reads on incrementality while allowing the total population to be messaged. As discussed in our acquisition example, member-level data enables us to apply this modeling concept at a per-member, per-journey level (a discrete MTA). The outcome is an understanding of incrementality per conversion while enabling aggregations to multiple different audience cuts, given the discrete data origins.
In the member marketing world, a discrete MTA is an attractive alternative to RCT, given that it mitigates some of the obstacles noted above. When using this approach, a universal holdout is not necessary to measure incrementality. However, embedding RCT outcomes into the model itself and comparing the model output against RCT tests is a way to make your discrete MTA more mature.
Since discrete MTA models are built at an individual level and member populations can quickly get into the tens of millions, an advanced data science platform with scalable compute is necessary to enable this type of work (e.g., Databricks or Snowflake). Additionally, temporal constraints on the underlying data feeding these models can sometimes create a barrier to scoring and results. For example, if our campaign’s objective is first to have a member schedule an appointment with their doctor and, after that, schedule another appointment to get a specific test, such as a colonoscopy, the lifecycle of that final response is several months. Additional data complications and refresh cadences can delay this process even further. Given that MTAs look backward on conversions to see what drove the action, the campaigns receiving credit in this scenario would not get proper attribution for several months.
Data Requirements
In a member marketing scenario, we have distinct, discrete data on each person we are marketing to. There is also a clear line of sight into each individual’s outcome, i.e., whether they did or did not act. This means we can cleanly get to the “1s” and “0s,” reflecting the true outcome for a member. This foundation serves RCT and discrete MTA methods, allowing for tailored sampling and individual-level modeling.
Additionally, the marketing measured in member campaigns is not usually awareness-based outreach (e.g., TV ads); it consists mainly of direct response marketing. Channels such as email, outbound calling, SMS, and direct mail can all route members directly to the intended outcome. This enables member-level tracking of each marketing piece an individual received, plus intermediary metrics (opens, clicks) for the applicable channels.
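A minimal sketch of assembling that foundation follows: touch-level send/open/click events are pivoted to one row per member and joined to a clean 1/0 outcome flag, producing the kind of table that feeds either an RCT read or a discrete MTA. File and column names are hypothetical.

```python
# Minimal sketch: build a member-level modeling table from touch events
# and outcome flags. All names are hypothetical.
import pandas as pd

touches = pd.read_csv("touch_events.csv")    # member_id, channel, event (send/open/click), ts
outcomes = pd.read_csv("outcomes.csv")       # member_id, converted (1/0)

# Count each event type per member and channel, e.g., email_send, email_open, ...
features = touches.pivot_table(
    index="member_id",
    columns=["channel", "event"],
    values="ts",
    aggfunc="count",
    fill_value=0,
)
features.columns = [f"{ch}_{ev}" for ch, ev in features.columns]

modeling_table = (
    outcomes.set_index("member_id")
    .join(features, how="left")
    .fillna(0)
)
# `modeling_table` now holds explicit "1s" and "0s" plus per-channel touch and
# intermediary metrics (opens, clicks).
```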
Strategies for Overcoming Measurement Obstacles
Given the tradeoffs between the two outlined approaches, the marketing team should align with stakeholders on a clear learning agenda. If faster results are the priority, RCT would be the best solution despite its tradeoffs. To overcome hesitancy about withholding marketing from a universal-holdout control group, the analytics team can present options for how sensitive (what level of confidence) and specific (how small a difference can be detected) the test read will be; the more sensitive and specific the read, the larger the holdout required. Often, a middle ground can be found that produces definitive reads at a reasonable holdout size.
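To make that tradeoff concrete, the sketch below uses a standard statsmodels power calculation to size the holdout for a given confidence level and minimum detectable lift; the baseline conversion rate and detectable lift are placeholder assumptions.

```python
# Minimal holdout-sizing sketch: confidence level and minimum detectable
# difference drive the required control-group size.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.040          # assumed conversion rate without marketing
min_detectable_lift = 0.005    # smallest absolute lift we need to detect
alpha = 0.05                   # 95% confidence
power = 0.80                   # 80% chance of detecting a true lift

effect = proportion_effectsize(baseline_rate + min_detectable_lift, baseline_rate)
n_per_group = NormalIndPower().solve_power(
    effect_size=effect, alpha=alpha, power=power, alternative="larger"
)
print(f"holdout size needed: ~{int(round(n_per_group)):,} members")
# Tightening alpha or shrinking min_detectable_lift increases the holdout,
# which is exactly the tradeoff to negotiate with stakeholders.
```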
If there is no desire to withhold individuals from marketing and the conversion metric being tracked has a reasonable lifecycle, a discrete MTA would be the favored method. Once built, and given consistent streaming inputs and a relatively swift conversion metric, a discrete MTA can migrate into near-real-time reporting of conversions attributed to date.
Techniques & Strategies for Effective Campaign Measurement
It’s crucial to measure the marketing effectiveness of acquisition and member marketing campaigns to make informed, data-driven decisions that optimize budget allocation and drive positive outcomes. While each marketing method presents challenges, various measurement techniques and strategies can be employed to overcome these obstacles.
For acquisition marketing, a combination of MMM and MTA can provide a comprehensive understanding of how to allocate spend across channels and attribute credit for conversions. In the case of member marketing, RCTs and discrete MTAs can help establish the incrementality of campaigns and identify the most effective tactics for engaging existing customers. By leveraging the appropriate measurement techniques, utilizing the available data, and aligning on clear goals, marketers can make strategic decisions that drive business growth and demonstrate the value of their efforts.
In the fast-paced B2B marketing and sales industry, analytics have become indispensable. Successful go-to-market strategies depend on the five essential jobs that analytics teams perform. This blog delves into each job, providing insights and best practices for optimal results.
What are the Five Essential Jobs?
1) The What: Reporting
Definition: The What job is about reporting the facts. How many leads are we generating, how quickly are leads moving through the funnel, how much are we spending on marketing, and how is each rep doing versus their goals? This is the most basic, and also the most important, “table stakes” job. Fortunately, it can be mostly automated via business intelligence tools.
Reporting is the lifeblood of sales and marketing executives, ideally offering near-real-time performance insights. Different types of reports provide insights into various parts of the pipeline.
Types of Reports:
Activity-Based Reports provide insights into day-to-day actions and engagements that drive a B2B marketing and sales process.
Performance-Based Reports look beyond activities and assess the results and outcomes of sales and marketing efforts.
Forecasting Reports anticipate future sales movements and market trends.
Customer Reports offer insights into customer acquisition cost (CAC), customer value over time, and the results of targeted marketing efforts (a minimal CAC calculation follows this list).
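As a quick, illustrative example of a customer report metric, the sketch below computes monthly CAC from spend and new-customer counts; all figures and column names are made up for illustration.

```python
# Minimal customer-report sketch: CAC by month from spend and acquisitions.
import pandas as pd

spend = pd.DataFrame({
    "month": ["2024-01", "2024-02"],
    "marketing_and_sales_spend": [120_000, 150_000],
})
new_customers = pd.DataFrame({
    "month": ["2024-01", "2024-02"],
    "customers_acquired": [80, 110],
})

report = spend.merge(new_customers, on="month")
report["cac"] = report["marketing_and_sales_spend"] / report["customers_acquired"]
print(report)
```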
Reporting Best Practices:
Ensure data quality for trusted reports.
Start with a simple report and gradually expand.
Establish a common data language (taxonomy).
Address duplicate data issues for accurate reporting.
Employ centralized data storage for effective reporting.
2) The Why: True Analysis
Definition: The Why job is most akin to the Greek root words of analytics—literally “untying a knot.” Analysts answer never-ending ad hoc questions from executives on almost any topic imaginable. This process cannot be automated; analysts need fast access to clean data and sound data science tools to get results.
Marketing and sales analysts are crucial to the success of a commercial team; they perform the vital task of true analysis, combining business and data science skills to unravel insights and drive informed decision-making.
Key Requirements for Analysts:
Fast compute and affordable, limitless storage.
Reproducible analysis through text-based data science languages.
Timely access to relevant data for analysis.
Agile Project Management for Analysis:
Use a Kanban approach for task listing and prioritization.
Maintain a clear list of analysis tasks, stakeholders, and due dates.
3) The Who: Targeting and Segmentation
Definition: The Who job deals with accounts and customers—segmenting them, targeting them, and serving them the right content. This job provides the strategic input for Account-Based Marketing (ABM).
Effective B2B targeting involves understanding market dynamics, including segment targeting, within-segment targeting, and within-account segmentation. The analytics team must integrate various data sources to create target lists and buyer archetypes.
Data Integration for Targeting:
Segment and Within-Segment Targeting: Industry trends, firmographic data, behavioral data.
Account Targeting: Master hierarchical table structure for accounts.
Within Account Targeting: Categorize key players, map decision hierarchies, and use qualitative and quantitative research.
Segmentation Best Practices:
Define actionability requirements before segmentation.
Assign segments back to leads and contacts with reasonable accuracy.
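As a minimal sketch of these steps (not a prescribed methodology), the example below clusters accounts on a few hypothetical firmographic features and then assigns the resulting segments back to leads through the account hierarchy. The feature set and the choice of four clusters are assumptions to be validated against the actionability requirements above.

```python
# Minimal segmentation sketch: cluster accounts, then assign segments to leads.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

accounts = pd.read_csv("accounts.csv")    # account_id, employees, annual_revenue, growth_rate
leads = pd.read_csv("leads.csv")          # lead_id, account_id, ...

FEATURES = ["employees", "annual_revenue", "growth_rate"]   # hypothetical firmographics
X = StandardScaler().fit_transform(accounts[FEATURES])

# Four segments is an arbitrary starting point; check that sales and marketing
# can actually treat these segments differently before locking them in.
accounts["segment"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Assign segments back to leads and contacts via the account hierarchy.
leads = leads.merge(accounts[["account_id", "segment"]], on="account_id", how="left")
```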
4) The How: Measurement
Definition: The How job is about measuring marketing and sales: How did we get this lead? And what can we do to get more like it? In B2C companies, media mix modeling (MMM) is commonly used to get these answers. This is trickier for B2B companies but just as critical.
B2B marketing measurement poses unique challenges, requiring a nuanced approach. Small-n deals, multiple objectives, chunkier tactics, long time scales, audience complexity, and sales integration all demand a hybrid measurement approach.
Measurement Challenges and Approaches:
Separate measurement for different objectives (awareness, demand generation, sales enablement).
Hybrid approaches combining econometric, deterministic, test-based, and heuristic-based methods.
Leverage management insights to complement data-driven models.
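To show what the econometric piece of such a hybrid might look like, here is a minimal sketch: an adstock transform on weekly channel spend followed by a simple regression against pipeline. The channels, decay rates, and data are assumptions; a real B2B setup would fold in deterministic, test-based, and heuristic reads alongside this.

```python
# Minimal econometric sketch: adstocked spend regressed against pipeline.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def adstock(spend: np.ndarray, decay: float) -> np.ndarray:
    """Carry over a fraction of prior weeks' spend to reflect lagged effects."""
    out = np.zeros_like(spend, dtype=float)
    carry = 0.0
    for i, s in enumerate(spend):
        carry = s + decay * carry
        out[i] = carry
    return out

weekly = pd.read_csv("weekly_marketing.csv")   # week, events_spend, content_spend, paid_spend, pipeline
CHANNELS = {"events_spend": 0.6, "content_spend": 0.4, "paid_spend": 0.2}  # assumed decay rates

X = np.column_stack([adstock(weekly[ch].to_numpy(), d) for ch, d in CHANNELS.items()])
model = LinearRegression().fit(X, weekly["pipeline"])

for (ch, _), coef in zip(CHANNELS.items(), model.coef_):
    print(f"{ch}: ~{coef:.2f} pipeline per adstocked spend unit")
```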
5) What’s Upcoming: Prediction
Definition: The What’s Upcoming job predicts what customers will do or respond to. It is the “action” side of account-based marketing and depends on machine learning techniques (predicting and classifying based on signals).
Machine learning plays a pivotal role in predicting outcomes in B2B go-to-market analytics. However, B2B predictive modeling faces challenges such as sparser signals, fewer observations, and a need for a deeper understanding of individual features.
Challenges and Considerations:
Signals are sparser in B2B predictive modeling.
Fewer observations make the modeling problem more challenging.
Consider outsourcing to ABM platforms but retain a team of predictive data scientists for deeper insights.
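A minimal sketch of how a team might approach this, given sparse signals and few observations, is a regularized classifier evaluated with cross-validation rather than a single train/test split. The features and file are hypothetical, and this is one reasonable starting point rather than a recommended architecture.

```python
# Minimal B2B propensity-to-convert sketch for small-n, sparse-signal settings.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

accounts = pd.read_csv("scored_accounts.csv")   # hypothetical intent/engagement signals
FEATURES = ["web_visits_90d", "content_downloads_90d", "intent_score", "employees"]

pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l2", C=0.5, max_iter=1000, class_weight="balanced"),
)

# With few observations, cross-validation gives a more honest read than one split.
scores = cross_val_score(pipe, accounts[FEATURES], accounts["converted"], cv=5, scoring="roc_auc")
print(f"AUC: {scores.mean():.2f} +/- {scores.std():.2f}")
```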
Unlocking Success in B2B Go-to-Market Analytics
Mastering the five essential jobs in B2B go-to-market analytics—reporting, true analysis, targeting and segmentation, measurement, and prediction—is key to unlocking success. Staying ahead with innovative approaches and a robust analytics strategy will pave the way for sustained growth as the landscape evolves.
Download our whitepaper, “A Roadmap for Modern B2B Go-to-Market: Part 2 – Operations and Analytics”
Download this whitepaper to learn more about the processes, technology, and analytics needed to meet revenue goals.