Finding Fraud With Analytics: Lessons From My Weekend Run

What if I told you I ran 12 miles this weekend? Most people wouldn’t even give it a second thought. But what if I were running for charity and donors were giving money based on how many miles I ran? They’d want to know, right? Of course. And of course, this is all hypothetical. But there is a lesson to be learned about analytics, and it applies to detecting ad fraud too. Let me explain.

One Rolled-Up Number – 12.3 miles

Did I actually run 12 miles, or did I just TELL you that I did? Or perhaps I took a cab or a horse-drawn carriage ride around Central Park twice with my fitness tracker turned on so the distance covered was 12.3 miles. Without any further details, you’d have no way to tell.

Similarly, most reports from programmatic media buying just tell you one number: how many impressions you bought or how much you spent. Some may even tell you that your pace of spending is lagging, so you should hurry up and spend faster. Some reports may show an overall click-through rate, averaged across the entire campaign. To me, these numbers are completely useless as analytics because there is no context with which to judge whether they are real, let alone accurate. The numbers could simply have been made up. In the case of fraudsters, would you expect them NOT to lie to you? Often they are outright lying to you (see “bad guys sell you traffic, but don’t even deliver it; they just trick your analytics to look like they did”). What you need are more supporting details so you can check the plausibility of the analytics.

Supporting Details – Map, Duration, Average Pace

If observers demanded more supporting details, they could see the map (header image) of the path around Central Park, and other details like the total duration, 2 hours 48 minutes, that would give them confidence that the 12.3 mile run was real. They could even calculate the average pace of 13:38 per mile, which shows that I didn’t take a ride in a cab or horse-drawn carriage around Central Park. But it would also be clear that I didn’t “run” as claimed; it was more like a “brisk walk.”
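The pace check is simple arithmetic that any observer can redo. Here is a minimal Python sketch of it; the function name `average_pace` is my own, and it uses the rounded distance and duration quoted above, so it lands at about 13:40 per mile, within a couple of seconds of the 13:38 the tracker reported.

```python
def average_pace(distance_miles: float, hours: int, minutes: int) -> tuple[int, int]:
    """Return the average pace as (minutes, seconds) per mile."""
    total_seconds = (hours * 60 + minutes) * 60
    seconds_per_mile = round(total_seconds / distance_miles)
    return divmod(seconds_per_mile, 60)


# Rounded figures from the article: 12.3 miles in 2 hours 48 minutes.
pace_min, pace_sec = average_pace(12.3, 2, 48)
speed_mph = 12.3 / (2 + 48 / 60)

print(f"Average pace: {pace_min}:{pace_sec:02d} per mile")
print(f"Average speed: {speed_mph:.1f} mph")
```

A speed of a little over 4 mph is far too slow for a cab and too slow for a run, which is exactly what the supporting details are supposed to reveal.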

Relating this to programmatic campaigns, the overall numbers are reported, but you really have to work hard to get more details yourself. For example, to go beyond the total impressions, total spend, and total clicks, you need to run your own custom reports. And those reports are limited too. The granularity is only down to the day, so you can only get details rolled up by day, not by hour. This level of detail is better than nothing, but it still won’t tell you enough to know if the campaign was working well or not. For example, where did your ads run, and when? Did they all run during the overnight hours when humans are asleep, or worse, get blown out in the first hour after midnight, so there were no impressions left to serve during actual waking hours?

In the chart above, you can see that most of the impressions were blown out by bots right after midnight, between midnight and 4 a.m. So there were no impressions left to be served to human audiences during the rest of the day. If you only have reports rolled up to the day, you won’t have enough detail to see the fraud happening right in front of you. So it is critical to insist on line-item details.
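If you can get impression-level timestamps, the overnight blowout is easy to measure yourself. The sketch below is only an illustration: the function name `overnight_share`, the ISO timestamp format, and the sample log are all assumptions of mine, not the output of any particular ad platform.

```python
from collections import Counter
from datetime import datetime


def overnight_share(timestamps, start_hour=0, end_hour=4):
    """Fraction of impressions served between start_hour and end_hour (local time)."""
    per_hour = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    total = sum(per_hour.values())
    if total == 0:
        return 0.0
    overnight = sum(n for hour, n in per_hour.items() if start_hour <= hour < end_hour)
    return overnight / total


# Made-up impression log: 8 impressions right after midnight, 2 in the daytime.
sample_log = [f"2024-01-01T00:{m:02d}:00" for m in range(8)] + [
    "2024-01-01T10:30:00",
    "2024-01-01T15:45:00",
]

share = overnight_share(sample_log)
print(f"Share of impressions between midnight and 4 a.m.: {share:.0%}")
```

A daily rollup of this log would show ten impressions and nothing else. The hourly view shows that most of them were gone before anyone was awake.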

Line-Item Details – Elevation, Laps, and Splits

From my “brisk walk,” RunKeeper tracks elevation changes and time splits, in addition to the GPS trace. And since I purportedly went around the Central Park big loop twice, the elevation changes of lap 1 should exactly match the elevation changes of lap 2, because they were the same course. And looking at the chart above, the lap 1 elevation changes match those of lap 2 exactly, so the data is internally consistent and thus provides corroboration that it is real: I actually went around Central Park twice. Furthermore, by looking at the split time for each mile, you can see the course was completed at a relatively even pace, mile after mile. It wasn’t something strange like taking a cab ride to complete the course in the first 5 minutes and leaving the timer on for another 2 hours and 43 minutes to manipulate the overall average.
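Both consistency checks can be written down in a few lines. This is a rough sketch under my own assumptions: the function names, the 10-foot tolerance, and the sample numbers are made up for illustration and are not RunKeeper data or a RunKeeper API.

```python
from statistics import mean, pstdev


def laps_match(lap1_elevations, lap2_elevations, tolerance_ft=10.0):
    """True if two laps sampled at the same points have matching elevation profiles."""
    if len(lap1_elevations) != len(lap2_elevations):
        return False
    return all(
        abs(a - b) <= tolerance_ft
        for a, b in zip(lap1_elevations, lap2_elevations)
    )


def split_variation(split_seconds):
    """Coefficient of variation of the per-mile splits (0 means perfectly even)."""
    return pstdev(split_seconds) / mean(split_seconds)


# Made-up elevation samples (feet) at the same four points on each lap.
lap1 = [50, 80, 120, 60]
lap2 = [52, 78, 121, 61]
print("Laps consistent:", laps_match(lap1, lap2))

# Made-up splits (seconds per mile): an even walk vs. a cab ride plus an idle timer.
even_walk = [815, 820, 825, 818]
cab_then_idle = [60, 60, 60, 3000]
print(f"Even walk variation: {split_variation(even_walk):.3f}")
print(f"Cab-then-idle variation: {split_variation(cab_then_idle):.3f}")
```

The two sets of splits could easily be arranged to produce the same overall average, but only one of them holds up when you look at each mile.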

Similarly, if you have line-item details, the bad guys who commit ad fraud would have to work a LOT harder to fake different metrics in order to make the analytics “look right.” And that is the point. With line-item details in your analytics, you can finally tell if there is fraud, or if anything is strange or doesn’t even make common sense. For example, you may notice that you’re getting the exact same number of clicks every hour of every day, or that the click-through rate is the exact same every hour, regardless of whether it was waking hours or overnight hours. Clearly something is not right: the clicks are probably automated, and not from humans who are actually interested in what you were advertising. Or in your line-item placement reports, you see websites and mobile apps that you’ve never heard of eating up tens of millions of your impressions per day, with win rates that hover close to 100%. Not all fraud is going to be this obvious, but if you have detailed reports, you can see what is happening and take appropriate action as needed. The laundry list of possibilities goes on.
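The two red flags just described can each be turned into a quick check on your line-item data. The sketch below is hypothetical throughout: the function names, the row layouts, and the thresholds (a 1% spread in click-through rate, 10 million impressions, a 95% win rate) are assumptions to tune against your own reports, not industry standards.

```python
def flat_click_through_rate(hourly_rows, tolerance=0.01):
    """True if the hourly CTR barely varies; hourly_rows is [(impressions, clicks), ...]."""
    ctrs = [clicks / imps for imps, clicks in hourly_rows if imps > 0]
    if len(ctrs) < 2:
        return False
    average = sum(ctrs) / len(ctrs)
    if average == 0:
        return False
    return (max(ctrs) - min(ctrs)) / average <= tolerance


def suspicious_placements(rows, min_impressions=10_000_000, min_win_rate=0.95):
    """Names of placements with huge daily volume and a win rate close to 100%."""
    flagged = []
    for row in rows:
        if row["bids"] == 0:
            continue
        win_rate = row["impressions"] / row["bids"]  # assumes impressions == won bids
        if row["impressions"] >= min_impressions and win_rate >= min_win_rate:
            flagged.append(row["placement"])
    return flagged


# Made-up data: identical clicks every hour, and one placement that wins nearly every bid.
robotic_day = [(1000, 10)] * 24
human_day = [(1000, 2), (1000, 15), (1000, 30)]
placements = [
    {"placement": "unknown-app-123", "bids": 20_500_000, "impressions": 20_000_000},
    {"placement": "known-news-site", "bids": 5_000_000, "impressions": 600_000},
]

print("Robotic day flagged:", flat_click_through_rate(robotic_day))
print("Human day flagged:", flat_click_through_rate(human_day))
print("Suspicious placements:", suspicious_placements(placements))
```

None of this proves fraud on its own. It just points you to the rows that deserve a closer look with your own eyes and common sense.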

As a scientist, I look at data and I look for context and line item details so I can judge for myself whether the data is real or accurate. I look for internal consistency in the data and I look for other details that can corroborate the insights. Following these simple rules and insisting on line item details in all your programmatic campaign reporting will allow you to pick out ad fraud using just the analytics and your common sense.

And yes, I DO think about ad fraud during my long walks, and pretty much every waking moment for that matter – and some not-awake moments too, like in my dreams.