1. The Problem
Recent advances in digital advertising platform algorithms have given ads an extraordinary level of relevance. Optimization increasingly revolves around the ads themselves, which makes the decision about whether an ad is performing a critical one.
However, most paid media operations make these decisions using arbitrary criteria: “give it three days,” “wait for a thousand impressions,” “if it doesn’t have a good CTR in 48 hours, kill it.” Leaving these decisions entirely in the hands of platform algorithms isn’t a solution either — it frequently leads to suboptimal results.
These rules have no statistical foundation. They are heuristics that seem reasonable but generate two costly errors: killing ads that could have worked, because they weren’t given enough data volume to evaluate their real performance; and keeping ads that don’t work, because the arbitrary threshold wasn’t reached and the team keeps waiting without a clear criterion to decide.
The result is wasted budget in both directions.
The underlying problem is simple: paid media teams make decisions without knowing how much data they need for those decisions to be reliable. It’s not a tools or platform problem — it’s a methodology problem.
This article proposes an approach based on Bayesian inference to answer a question every paid media team should be able to answer: how many impressions do I need to know if my ad is working?
2. The Conceptual Framework
Every time an ad is shown to a user, a binary event occurs: the user clicks or doesn’t click. This turns each impression into an experiment with two possible outcomes, which in statistics is modeled as a Bernoulli trial.
The CTR (click-through rate) we see in platforms is simply the ratio of clicks to impressions. But that number is not the ad’s real CTR — it’s an estimate based on the data we have so far. With 50 impressions, that estimate is highly unstable. With 5,000, it’s much more reliable — but if the ad is bad, that reliability has proven very costly. The question is: at what point do we have enough data to trust what we’re seeing without having overpaid for the answer?
Traditional frequentist statistics would answer this question with hypothesis tests and p-values. But that approach has an important practical limitation: it requires defining fixed sample sizes before starting, and it doesn’t adapt well to the reality of paid media, where data arrives continuously and decisions can’t wait until the end of a formal experiment.
Bayesian inference offers a more natural alternative for this context. Instead of asking “can I reject the null hypothesis?”, it asks something far more useful: given what I’ve observed so far, what is the probability that this ad’s real CTR is below my benchmark?
This approach has three practical advantages for paid media operations: it updates with each new impression, it doesn’t require predefined sample sizes, and it produces a direct probability that is easy to interpret and convert into a decision rule.
The specific model we use is based on the Beta distribution, which is the natural tool for modeling proportions — like CTR — when the data is binary. The next section explains how it works.
3. The Method: Bayesian Inference with the Beta Distribution
The Beta distribution is a probability distribution defined between 0 and 1, making it ideal for modeling proportions like CTR. It is defined by two parameters: alpha (α) and beta (β), which in our context have a direct interpretation:
α = number of clicks + 1
β = number of non-clicks + 1
Before showing the ad a single time, we start with what is called a non-informative prior: Beta(1,1). This is equivalent to saying “I have no prior information about this ad’s CTR — any value between 0% and 100% is equally possible.” It is a position of total ignorance, and it is deliberate.
With each impression, the distribution updates automatically. If the ad receives 10 clicks in 200 impressions, the posterior distribution is Beta(11, 191). This distribution is no longer flat — it concentrates around 5.5% but with a range of uncertainty that reflects the limited volume of data.
The power of this approach is what we can calculate from that posterior distribution: the probability that the ad’s real CTR is below any threshold we define as a benchmark.
For example, if our benchmark is a 2% CTR, we can ask: what is the probability that this ad’s real CTR is less than 2%? If the posterior distribution tells us that probability is 95%, we have a solid statistical basis to discard the ad. If it’s 60%, we don’t have enough evidence yet — we need more impressions.
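This posterior calculation takes only a few lines of Python. The sketch below uses scipy's `beta.cdf`; the function name and the example numbers are ours, chosen to match the 10-clicks-in-200-impressions case above:

```python
# P(real CTR < benchmark) under a Beta(1, 1) non-informative prior.
from scipy.stats import beta

def prob_ctr_below(clicks, impressions, benchmark):
    """Posterior probability that the ad's real CTR is below the benchmark."""
    a = clicks + 1                       # alpha = clicks + 1
    b = (impressions - clicks) + 1       # beta = non-clicks + 1
    return beta.cdf(benchmark, a, b)     # posterior CDF at the benchmark

# 10 clicks in 200 impressions, 2% benchmark -> posterior Beta(11, 191)
print(round(prob_ctr_below(10, 200, 0.02), 3))
```

With an observed CTR of 5% against a 2% benchmark, this probability is very small: the evidence points to the ad beating the benchmark, not falling below it.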
Here is the complete mechanism:
1. Define a CTR benchmark based on the campaign context (industry, platform, format, objective).
2. Establish a minimum confidence level for making decisions (e.g., 90% or 95%).
3. With each new impression, update the Beta distribution and calculate the probability that the real CTR is below the benchmark.
4. When that probability exceeds the established confidence level, discard the ad. As long as it doesn’t, the ad remains active.
The result is a decision criterion that doesn’t depend on arbitrary rules but on accumulated evidence, and that becomes more precise with each new data point.
4. The Decision Table
To make this framework operational, we built a reference table that answers the central question: given a number of impressions and a confidence level, what is the maximum observed CTR that justifies discarding an ad?
The table reads as follows: if your benchmark is a 2% CTR, your ad has 500 impressions, and you want to make decisions with 90% confidence, look up the intersection of those values. If your ad’s observed CTR is below the number in that cell, you have statistical grounds to discard it.
Benchmark: 2% CTR

| Impressions | 80% Confidence | 90% Confidence | 95% Confidence |
|---|---|---|---|
| 100 | 0.00% | N/A | N/A |
| 200 | 0.50% | 0.50% | 0.00% |
| 500 | 1.20% | 1.00% | 0.80% |
| 1,000 | 1.50% | 1.30% | 1.20% |
| 2,000 | 1.70% | 1.55% | 1.45% |
| 5,000 | 1.82% | 1.72% | 1.66% |
“N/A” values indicate that no observed CTR allows discarding the ad at that confidence level with that volume of impressions. Making decisions in those ranges isn’t just imprecise — it’s statistically impossible.
Practical example: An ad has 500 impressions and a 0.8% CTR. Your benchmark is 2% and you operate at 90% confidence. The table value for that intersection is 1.00%. Since 0.8% is less than 1.00%, you can discard the ad with statistical backing.
Another scenario: the same ad has a 1.2% CTR at 500 impressions. The threshold at 90% is 1.00%. Since 1.2% is above it, you don’t have enough evidence to discard it — the ad remains active and is re-evaluated as more data comes in.
The table makes something important evident: with few impressions, you can only discard clearly bad ads. For finer decisions — distinguishing between a 1.5% and a 2% CTR — you need more volume. There are no statistical shortcuts for that.
This table can be adapted to any CTR benchmark and built for other binary metrics such as conversion rate or email open rate. The principle is the same.
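A table like this can be generated for any benchmark by inverting the posterior calculation: for each impression count, find the largest observed CTR at which the discard condition still holds. A sketch (rounded values may differ slightly from the table above):

```python
# Rebuild the decision table: the maximum observed CTR that still
# justifies discarding, per impression volume and confidence level.
from scipy.stats import beta

def max_discard_ctr(impressions, benchmark, confidence):
    """Largest observed CTR at which P(real CTR < benchmark) still
    reaches the confidence level; None corresponds to 'N/A'."""
    best = None
    for clicks in range(impressions + 1):
        p_below = beta.cdf(benchmark, clicks + 1, impressions - clicks + 1)
        if p_below >= confidence:
            best = clicks / impressions
        else:
            break  # p_below only decreases as clicks grow
    return best

for n in (100, 200, 500, 1000, 2000, 5000):
    row = [max_discard_ctr(n, 0.02, c) for c in (0.80, 0.90, 0.95)]
    print(n, ["N/A" if v is None else f"{v:.2%}" for v in row])
```

Swapping the benchmark argument adapts the table to other binary metrics such as conversion rate or open rate.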
5. Operational Implications
This framework has direct consequences on how digital advertising campaigns are planned and executed.
The minimum budget is not arbitrary — it’s calculable
If you need at least 500 impressions to make a decision with 90% confidence, and your average CPM is $20, then you need at least $10 per ad just to be able to evaluate it. If you’re testing 10 ad variants simultaneously, your minimum testing budget is $100 — before optimizing anything. Many teams launch tests with budgets that aren’t enough to generate the data they need to decide. The result is that they make decisions with insufficient data, which is worse than making no decision at all.
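The arithmetic is worth making explicit, since it is what turns the table into a budget requirement (the CPM and variant count below are the illustrative figures from the paragraph above):

```python
# Minimum test budget from impression requirements.
# CPM = cost per 1,000 impressions; all figures are illustrative.
cpm = 20.0                 # $20 CPM
min_impressions = 500      # needed per ad for a 90%-confidence decision
variants = 10              # ad variants being tested

cost_per_ad = cpm * min_impressions / 1000   # $10 per ad
min_test_budget = cost_per_ad * variants     # $100 before optimizing anything
print(cost_per_ad, min_test_budget)
```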
Killing ads early is more expensive than it seems
When you discard an ad at 100 impressions, you’re not saving budget — you’re throwing away the investment you made in those 100 impressions without having obtained usable information in return. Every impression that doesn’t contribute to a reliable decision is a wasted impression. The paradox is that the urgency to “not overspend” on a bad ad frequently leads to overspending on bad decisions.
The number of variants being tested must be proportional to the available budget
This is a common mistake: launching too many ad variants with a budget that isn’t enough to evaluate any of them reliably. It’s better to test 3 ads with enough impressions to decide than to test 10 without being able to evaluate any. The Bayesian framework makes this trade-off explicit and allows you to plan for it before launching the campaign.
Workflow integration is simple
The team defines the CTR benchmark and confidence level at the start of the campaign. From there, evaluating each ad comes down to a lookup against the table: is the observed CTR at this impression volume below the threshold? If yes, discard it. If not, it stays active. This eliminates subjective discussions about whether an ad “looks good” or “needs more time” and replaces them with a single, consistent criterion.
6. Limitations and Considerations
This framework is a powerful tool, but it’s not a complete solution. There are factors that must be considered when applying it.
CTR is not the only metric that matters
An ad can have an excellent CTR and generate clicks that don’t convert. This framework applies to any binary metric — conversion rate, open rate, engagement rate — but the chosen metric must be aligned with the campaign’s actual objective. Optimizing for CTR when the goal is conversion can lead to decisions that are technically correct but strategically wrong.
Audience fatigue distorts the data
An ad’s CTR is not static — it tends to degrade over time as the audience becomes saturated. An ad with a 2.5% CTR in its first 500 impressions may drop to 1.2% in the next 500. The Bayesian framework assumes the underlying CTR is constant, which is a simplification. In practice, it’s better to evaluate in time windows rather than just cumulative totals.
Performance varies by placement and platform
The same ad can have a 3% CTR in the Instagram feed and 0.5% in Audience Network. If data is aggregated without segmentation, the average CTR doesn’t represent any real context. The benchmark and evaluation should be applied at the segmentation level where operational decisions actually occur.
The non-informative prior is conservative
Starting with Beta(1,1) means we don’t incorporate prior knowledge about expected performance. For teams with robust historical data, it’s possible to use informed priors that accelerate decisions. This adds complexity but can significantly reduce the impression volume needed to decide.
This framework does not replace strategic judgment
Statistics tells you when you have enough evidence to discard an ad. It doesn’t tell you what ad to create, what message to test, or what audience to target. It’s an operational decision tool, not a creative strategy.