How to Prepare Your Data for Marketing Mix Modeling

March 26, 2026 · 9 min read

You've decided to run a marketing mix model. You've read about what MMM is and how much data you need. Now you're staring at a pile of exports from Shopify, Meta Ads Manager, Google Ads, and maybe a few other platforms, and you're trying to figure out how to turn all of that into a single CSV that a model can actually work with.

This is the step where most people get stuck — not because it's technically hard, but because nobody tells you the specific formatting decisions that matter. This guide covers exactly how to structure your data, the mistakes that silently break your model, and a template you can copy. (If you're still deciding whether MMM is right for you, start with our complete guide to marketing mix modeling.)

The target format

Every MMM tool — whether it's CheapMMM, Robyn, Meridian, or anything else — expects basically the same input: a table where each row is a time period and each column is a variable. The first column is the date. The second is your outcome metric (usually sales or revenue). Every column after that is a marketing channel's spend for that period.

That's it. The goal of data preparation is getting to this shape cleanly.

| Date | Sales | Meta_Spend | Google_Spend | TikTok_Spend | Email_Sends |
|------|-------|------------|--------------|--------------|-------------|
| 2025-01-06 | 87,400 | 12,300 | 8,200 | 3,100 | 45,000 |
| 2025-01-13 | 92,100 | 14,500 | 7,800 | 4,200 | 52,000 |
| 2025-01-20 | 78,600 | 11,200 | 9,100 | 2,800 | 38,000 |
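If you're assembling this table programmatically, the core move is a series of left joins onto your sales table, keyed on the week-start date. A minimal sketch in pandas (the exports and column names here are hypothetical stand-ins for your own):

```python
import pandas as pd

# Hypothetical weekly exports, already aggregated to Monday-start weeks.
sales = pd.DataFrame({
    "Date": ["2025-01-06", "2025-01-13", "2025-01-20"],
    "Sales": [87400, 92100, 78600],
})
meta = pd.DataFrame({
    "Date": ["2025-01-06", "2025-01-13", "2025-01-20"],
    "Meta_Spend": [12300, 14500, 11200],
})
google = pd.DataFrame({
    "Date": ["2025-01-06", "2025-01-13", "2025-01-20"],
    "Google_Spend": [8200, 7800, 9100],
})

# Left-join each channel onto the sales table so every week is kept,
# even weeks where a channel reported nothing.
mmm = sales.merge(meta, on="Date", how="left").merge(google, on="Date", how="left")
mmm.to_csv("mmm_input.csv", index=False)
print(mmm.columns.tolist())
```

Left-joining onto sales (rather than inner-joining the exports together) guarantees you never silently drop a week just because one platform's export was missing it.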

Weekly vs. monthly: pick weekly

You'll see guides that say weekly or monthly data both work. Technically true. Practically, weekly is almost always the better choice.

Monthly data gives you 12 rows per year. That's not enough for most models to produce stable attribution, and it smooths out the spend variation that the model needs to learn from. A big campaign push in the first week of March followed by three quiet weeks gets averaged into a single number that hides the pattern.

Weekly data gives you 52 rows per year. That's enough for a model to find meaningful patterns even with 6-9 months of history. It preserves the spend spikes and dips that are the most informative signal in your data.

Use Monday-to-Sunday weeks if possible, since most ad platforms default to this. Be consistent — if your first row starts on a Monday, every row should start on a Monday.
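If your exports are daily rather than weekly, you'll need to roll them up to Monday-start weeks yourself. One way to do this in pandas is to floor each date to its Monday and then group (the frame below is illustrative):

```python
import pandas as pd

# Hypothetical daily spend export spanning two Monday-start weeks.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2025-01-06", "2025-01-07", "2025-01-12", "2025-01-13"]),
    "spend": [100.0, 150.0, 50.0, 200.0],
})

# weekday is 0 for Monday, so subtracting it floors each date to the
# Monday that starts its week. Then sum spend within each week.
daily["week_start"] = daily["date"] - pd.to_timedelta(daily["date"].dt.weekday, unit="D")
weekly = daily.groupby("week_start")["spend"].sum()
print(weekly)
```

Running the same aggregation over every source is the easiest way to guarantee all of them use an identical week definition.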

How to handle the date column

Use a consistent date format. YYYY-MM-DD (like 2025-01-06) is the safest because it's unambiguous and sorts correctly even as plain text. Avoid MM/DD/YY: the moment anyone else touches the file, it invites confusion with the European DD/MM/YY convention, and a silently transposed month and day will corrupt your dataset.

Each date should represent the start of the time period. If your row covers the week of January 6-12, the date value should be 2025-01-06.

Don't mix granularities. If most of your rows are weekly but you threw in a few monthly rows at the end because you didn't have weekly data yet, the model will either error out or produce garbage. Every row must represent the same time interval.
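These three rules (consistent format, constant interval, no gaps) are all mechanically checkable before upload. A small validation helper, sketched with pandas (the function name and return convention are my own):

```python
import pandas as pd

def check_dates(dates):
    """Validate that a date column is YYYY-MM-DD, weekly, gap-free,
    and Monday-start. Returns a list of problems (empty means clean)."""
    problems = []
    d = pd.to_datetime(pd.Series(dates), format="%Y-%m-%d", errors="coerce")
    if d.isna().any():
        problems.append("some dates are not in YYYY-MM-DD format")
        return problems
    diffs = d.diff().dropna()
    if not (diffs == pd.Timedelta(days=7)).all():
        problems.append("rows are not all 7 days apart (gap or mixed granularity)")
    if (d.dt.weekday != 0).any():
        problems.append("not every row starts on a Monday")
    return problems

# A missing week (2025-01-20) is flagged as a gap.
print(check_dates(["2025-01-06", "2025-01-13", "2025-01-27"]))
```

Catching a gap or a mixed-granularity row here takes seconds; debugging the garbage attribution it causes downstream takes much longer.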

Revenue: pick one definition and stick with it

Your sales column should be the same metric across every row. The most common choices are gross revenue (total sales before refunds and discounts), net revenue (after refunds, before shipping and taxes), or total orders (if you're modeling conversion volume instead of dollars).

Any of these can work. What breaks the model is switching definitions partway through. If you pulled January through March from one Shopify report and April through June from a different report that uses a different revenue calculation, you've introduced a step change that the model will try to explain with your marketing variables. That produces incorrect attribution.

One specific trap: if your platform shows revenue net of ad platform fees or payment processing fees for some periods but not others, those aren't consistent. Use the raw revenue number from your commerce platform, not a derived metric from a dashboard that might change its calculation.

Channel columns: naming and structure

Name your columns clearly — Meta_Spend, Google_Spend, TikTok_Spend, TV_Spend. Avoid spaces in column names (use underscores) and avoid special characters. Keep the names short but recognizable.

One column per channel you want to measure. If you spend on Meta and Google, that's two columns. If you also run TikTok, add a third. Simple.

The granularity decision. You can model Meta as a single column (total Meta spend) or split it into Meta_Prospecting and Meta_Retargeting. More granularity gives you more specific insights but requires more data for stable estimates. The rule of thumb: if a sub-channel accounts for less than 10% of your total spend, roll it into its parent channel. If you have fewer than 6 months of data, keep channels aggregated.

Non-spend channels. For channels where spend isn't the right metric — like email, SMS, or organic social — you can use volume metrics instead: email sends, SMS messages sent, or organic social posts published. The model will estimate the relationship between that activity metric and sales, which is still useful even though it's not denominated in dollars. Just know that the ROAS output for those channels will be harder to interpret since the "investment" isn't in dollars.

Handling zero-spend periods

This is one of the most common data preparation questions, and the answer is simple: use zero, not blank.

If you didn't spend anything on TikTok for three weeks in February, those cells should contain 0, not be empty. A blank cell might be read as missing data (NaN), which many models handle differently from zero spend. Zero spend is informative — it tells the model what your sales look like when a channel is turned off. Missing data is noise.

The same applies if you paused a channel entirely for a month. Keep the column, fill it with zeros for the paused period. Those zero-spend weeks are actually some of the most valuable data points in your set because they show the model what happens when a channel goes dark.
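Platform exports often leave these cells blank rather than writing zeros, so it's worth normalizing spend columns explicitly. A one-liner in pandas, assuming (hypothetically) that your spend columns all share a `_Spend` suffix:

```python
import pandas as pd

# Hypothetical frame where TikTok was paused and the export left blanks (NaN).
df = pd.DataFrame({
    "Date": ["2025-02-03", "2025-02-10", "2025-02-17"],
    "Sales": [80000, 76000, 81000],
    "TikTok_Spend": [None, None, 3100.0],
})

# Treat missing spend as zero spend: the channel was off, not unobserved.
spend_cols = [c for c in df.columns if c.endswith("_Spend")]
df[spend_cols] = df[spend_cols].fillna(0)
print(df["TikTok_Spend"].tolist())  # → [0.0, 0.0, 3100.0]
```

Note that this only applies to spend columns. A blank in the sales column is genuinely missing data and needs to be investigated, not zero-filled.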

Handling channel launches and pauses

If you launched TikTok ads in March and your data goes back to January, you have two months of zeros followed by spend data. That's fine — the model can handle it. The zeros before launch are treated as "this channel was off," which helps establish the baseline.

If you only have 4-6 weeks of data on a new channel, the model won't have enough history to produce reliable estimates for that channel specifically. You have two options: include it anyway and treat that channel's ROAS estimate with extra skepticism, or leave it out of this model run and add it in next quarter when you have more data.

If you permanently dropped a channel — say you stopped running TV ads 8 months ago — you can include the historical TV data up to the point it stopped. The model will use the period where TV was active (and the transition to inactive) to estimate TV's contribution. Or you can exclude the column entirely if the TV period was short relative to your total dataset.

Adding promotional flags

If you run promotions, sales events, or significant discount periods, adding a flag column dramatically improves model quality. Without it, the model will attribute promotional sales spikes to whatever ad channels were active that week, inflating their ROAS.

The simplest approach is a binary column: 1 for weeks with a promotion running, 0 for normal weeks. If you want more precision, use average discount percentage that week or a categorical indicator (0 = no promo, 1 = minor sale, 2 = major event like BFCM).

| Date | Sales | Meta_Spend | Google_Spend | Promo |
|------|-------|------------|--------------|-------|
| 2025-11-24 | 145,000 | 22,000 | 15,000 | 2 |
| 2025-12-01 | 98,000 | 18,000 | 12,000 | 1 |
| 2025-12-08 | 72,000 | 14,000 | 9,000 | 0 |
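In practice the promo flag usually comes from a hand-maintained list of event weeks rather than any export. Mapping that list onto the date column is a one-liner (the `promo_weeks` dict here is a hypothetical example):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2025-11-24", "2025-12-01", "2025-12-08"],
    "Sales": [145000, 98000, 72000],
})

# Hand-maintained promo intensity: 2 = major event (e.g. BFCM), 1 = minor sale.
promo_weeks = {"2025-11-24": 2, "2025-12-01": 1}

# Weeks not listed default to 0 (no promotion).
df["Promo"] = df["Date"].map(promo_weeks).fillna(0).astype(int)
print(df["Promo"].tolist())  # → [2, 1, 0]
```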

This lets the model say "sales were high in week 1 because of a major promo, not because Meta spend was high." Without the flag, Meta gets credit for the Black Friday bump. For more detail on how promos affect ecommerce models specifically, see our ecommerce and DTC MMM guide.

Common mistakes that silently break your model

Misaligned time periods. Your revenue data is Monday-Sunday weeks but your ad platform exports are Sunday-Saturday. A one-day offset means your spend data is slightly misaligned with your sales data, which weakens the correlations the model is trying to find. Check that all sources use the same week definition.

Currency mixing. If one channel reports in USD and another in EUR (common for brands running ads in multiple markets), the model treats those as comparable dollar values. Convert everything to the same currency before combining.
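The conversion itself is just a per-week multiplication, but doing it per week (rather than with one blended annual rate) keeps spend aligned with when it actually happened. A sketch, with illustrative rates — in practice use the rates your finance team or platform reports:

```python
import pandas as pd

# Hypothetical EUR-denominated channel export.
df = pd.DataFrame({
    "Date": ["2025-01-06", "2025-01-13"],
    "Google_Spend_EUR": [8000.0, 7500.0],
})

# Illustrative weekly EUR→USD rates keyed by week start.
eur_usd = {"2025-01-06": 1.03, "2025-01-13": 1.04}

df["Google_Spend"] = df["Google_Spend_EUR"] * df["Date"].map(eur_usd)
df = df.drop(columns=["Google_Spend_EUR"])
print(df["Google_Spend"].tolist())
```

Drop the original currency column after converting so the final CSV contains exactly one spend column per channel, all in one currency.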

Including returns in some weeks but not others. If your commerce platform nets out returns as they're processed (not when the original sale happened), your revenue data gets distorted. A high-return week looks like low sales even if actual new sales were strong. Where possible, use gross sales and handle returns separately, or at least make sure the treatment is consistent.

Duplicate spend from rolled-up reports. If you pull a "total campaigns" export from Meta that includes spending you've already broken out separately (like a branded campaign column plus a total Meta column), the model sees double the spend. Each dollar should appear in exactly one column.

Trailing whitespace or invisible characters in CSV headers. If your column is named "Meta_Spend " (with a trailing space), some parsers won't match it to "Meta_Spend". Open your CSV in a text editor and check the header row. This sounds trivial but it's one of the most common reasons uploads fail.
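If you load the CSV in pandas before uploading, stripping headers is a single line and worth making a habit:

```python
import io
import pandas as pd

# A CSV whose header row has a trailing space after "Meta_Spend".
raw = "Date,Sales,Meta_Spend \n2025-01-06,87400,12300\n"
df = pd.read_csv(io.StringIO(raw))

# Normalize headers before doing anything else with the frame.
df.columns = df.columns.str.strip()
print(df.columns.tolist())  # → ['Date', 'Sales', 'Meta_Spend']
```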

A pre-upload checklist

Before you upload your CSV, run through this list.

- Every row represents the same time interval (all weekly or all monthly, not mixed).
- The date column uses a consistent format with no gaps in the sequence.
- The sales column uses the same revenue definition across all rows.
- Every channel column contains numbers only, with zero for periods of no spend (not blank).
- Column names have no spaces, special characters, or trailing whitespace.
- All spend values are in the same currency.
- No column contains a rollup that duplicates spend from another column.
- If you run promotions, there's a flag column indicating promotional periods.
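Most of these checks can be automated. A preflight helper, sketched in pandas (the function name, and which checks it covers, are my own — the revenue-definition, currency, and duplicate-rollup checks need human judgment and aren't included):

```python
import pandas as pd

def preflight(df: pd.DataFrame, date_col: str = "Date") -> list[str]:
    """Run the mechanically-checkable pre-upload checks.
    Returns a list of problems; an empty list means the CSV passed."""
    problems = []
    # Clean headers: no leading/trailing whitespace, no interior spaces.
    for col in df.columns:
        if col != col.strip() or " " in col.strip():
            problems.append(f"bad column name: {col!r}")
    # Consistent weekly interval with no gaps.
    dates = pd.to_datetime(df[date_col], errors="coerce")
    if dates.isna().any():
        problems.append("unparseable dates")
    elif not (dates.diff().dropna() == pd.Timedelta(days=7)).all():
        problems.append("date gaps or mixed granularity")
    # Every non-date column: numeric, no blanks.
    for col in df.columns:
        if col == date_col:
            continue
        if df[col].isna().any():
            problems.append(f"blank cells in {col} (use 0 for no spend)")
        if not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"non-numeric values in {col}")
    return problems

good = pd.DataFrame({
    "Date": ["2025-01-06", "2025-01-13"],
    "Sales": [87400, 92100],
    "Meta_Spend": [12300.0, 14500.0],
})
print(preflight(good))  # → []
```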

If all of those check out, your data is ready.

What to do after your data is ready

Upload it. If you're using CheapMMM, you can drag and drop the CSV and have results in under a minute. No login, no setup.

Once you have output, the quality of your results is directly tied to the quality of the data you fed in. If something looks off — a channel with impossibly high ROAS, or a model that doesn't fit your sales curve well — the first thing to check is always the data. Look for the common mistakes above, fix them, and re-run.

For help reading the output once you have it, see our guide on how to interpret MMM results. For guidance on whether your dataset is large enough to produce reliable results, see how much data you need for MMM.

Try CheapMMM free — no login required