Everything you need to know about data centers and ad fraud

Jonathan Marciano

Marketing | August 02, 2020

Data centers play a fundamental role in our society and digital economy. Virtually everything that happens online is housed in a data center. These buildings, full of servers and other digital equipment, store videos and other files, run important software, and act as distribution hubs where data is exchanged between different networks. The largest data center footprints belong to the largest cloud and internet companies, including Amazon, Microsoft, Google, and Facebook. For instance, Facebook has announced a new site, expected to create 100 jobs, that is set to open in 2022 in the US. The US is the leading location for data centers, followed by China. There are now more than 500 hyperscale data centers in the world.

However, the rise of data centers has also helped online fraud and ad fraud thrive. The weaponization of data center traffic plays a large part in the growth of ad fraud, which costs online marketers $23 billion a year.

Data from CHEQ, for instance, shows that nearly 50% of online ad fraud attacks involve data center bot traffic. In this post, we take a closer look at how bot operators use data center traffic to carry out all sorts of fraud, using bots to drain ad budgets.

What is a Data Center?

A data center is a facility housing many networked computers that work together to process, store, and share data. Most major tech companies rely heavily upon data centers as a central component in delivering online services. Interestingly, the amount of electricity used at a large data center rivals that of a small town.

Fraud and Data Centers: The Link

In its 2020 guidance, the online advertising watchdog MRC (Media Rating Council) requires that best practice for fraud detection include removing data center traffic from online advertising campaigns. In the words of the MRC, your provider should provide “filtration of IPs those of known hosting entities determined to be a consistent source of non-human traffic; not including routing artifacts of legitimate users or virtual machine legitimate browsing.”

Data centers remain popular among bot runners because they offer an easy and relatively inexpensive way to hide illegal activity. It works like this: a data center runs a network of servers, each with its own IP address. In general, dedicated organizations are in charge of allocating these IP addresses (in practice, ranges of IP addresses). In some cases, companies obtain a range of IP addresses and offer a proxy service on top of it to their users; some of these services sell the use of a few tens of thousands of IP addresses for a few dozen dollars. Fraudsters like such services because click fraud prevention tools, such as CHEQ (and many others), block users who make too many requests in a short period of time. By routing their bots through data centers (proxy services), fraudsters “change” their IP address in an attempt to avoid detection.
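To illustrate why rotating IP addresses undermines this kind of defense, here is a minimal Python sketch of a naive per-IP rate limiter. The class name, the limit of 5 requests per 60 seconds, and the IP addresses (taken from ranges reserved for documentation) are all hypothetical; this is an illustration of the general idea, not how CHEQ or any other vendor actually implements throttling.

```python
from collections import defaultdict, deque


class PerIPRateLimiter:
    """Hypothetical sliding-window limiter: at most `limit` requests per IP per `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        # Timestamps of recently accepted requests, keyed by client IP.
        self.history = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        timestamps = self.history[ip]
        # Forget requests that have fallen out of the sliding window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False  # this IP is over its budget: block the request
        timestamps.append(now)
        return True


def count_allowed(ips: list[str], limit: int = 5, window: float = 60.0) -> int:
    """Replay one request per second from the given IPs and count how many get through."""
    limiter = PerIPRateLimiter(limit, window)
    return sum(limiter.allow(ip, now=float(t)) for t, ip in enumerate(ips))


# 100 clicks from one IP vs. the same 100 clicks rotated through 100 proxy IPs.
single_ip = ["203.0.113.7"] * 100
rotated = [f"198.51.100.{i}" for i in range(100)]
print(count_allowed(single_ip))  # 10
print(count_allowed(rotated))    # 100
```

Because the limiter keys its state on the IP address alone, spreading the same clicks across many proxy IPs keeps every per-IP counter below the limit, so all of them get through. This is why IP-based throttling on its own is easy to evade with cheap data center proxies.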

Identifying Data Center Traffic

When an entity is allocated an IP range, the allocation is recorded in a regional internet registry (RIR) database, which is publicly available. IP ranges are also commonly classified by usage type (although these categories are neither official nor globally agreed upon). The usual categories are Commercial; Government; Military; University/College/School; Library; Content Delivery Network; Fixed Line ISP; Mobile ISP; Data Center/Web Hosting/Transit; Search Engine Spider; and Reserved.

Technically, each IP address can be mapped to an organization. However, it is not always clear what that organization actually does, and in some cases IP ranges have been registered under fake names.

In fraud detection, we tend to focus on three of these types: Data Center/Web Hosting/Transit (DCH), Fixed Line ISP, and Mobile ISP. Fixed Line ISP and Mobile ISP ranges represent regular domestic users browsing from their homes or mobile devices. In contrast, DCH means that the IP range is assigned to a company that is not an ISP, normally for its own use (running servers, for example). Servers are not supposed to browse websites the way regular users do, which is why we consider such traffic fraudulent.
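As a rough illustration, the lookup step can be sketched in Python with the standard `ipaddress` module. The usage-type table below is entirely hypothetical: it maps IPv4 documentation ranges to the categories listed above, whereas a real system would load its ranges from an IP intelligence database. The functions `usage_type` and `is_data_center` are illustrative names, not part of any vendor's API.

```python
import ipaddress

# Hypothetical usage-type table built from documentation-only ranges.
# A real deployment would load many ranges from an IP intelligence database.
USAGE_TYPES = {
    ipaddress.ip_network("192.0.2.0/24"): "Fixed Line ISP",
    ipaddress.ip_network("198.51.100.0/24"): "Data Center/Web Hosting/Transit",
    ipaddress.ip_network("203.0.113.0/24"): "Mobile ISP",
}


def usage_type(ip: str) -> str:
    """Return the usage type of the most specific known range containing `ip`."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in USAGE_TYPES if addr in net]
    if not matches:
        return "Unknown"
    # Prefer the longest prefix, i.e. the narrowest matching range.
    most_specific = max(matches, key=lambda net: net.prefixlen)
    return USAGE_TYPES[most_specific]


def is_data_center(ip: str) -> bool:
    """Flag IPs whose range is classified as DCH."""
    return usage_type(ip) == "Data Center/Web Hosting/Transit"


print(usage_type("198.51.100.42"))  # Data Center/Web Hosting/Transit
print(is_data_center("192.0.2.10"))  # False
print(usage_type("10.0.0.1"))        # Unknown
```

The linear scan keeps the sketch short; with millions of ranges, a production lookup would more likely use sorted ranges or a prefix tree. Note too that the result is only as good as the table: ranges registered under fake names, as mentioned above, will be misclassified.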

Examples of data center driven traffic

There are many instances of bad data center traffic. Google, for example, has identified publisher fraud in which publishers run software tools in data centers to intentionally mislead advertisers with fake impressions and clicks. In one case, involving a fake click program called Urlspirit, there were more than 6,500 data center installations of the software, each running in a separate virtual machine. In aggregate, these data center installations generated an average of 2,500 fraudulent ad requests per installation per day.

Overall, with most bad bot traffic emanating from data centers, the U.S. remains the “bad bot superpower”, with 45.9 percent of bad bot traffic originating in the United States.

Do not block all data center traffic

However, simply using data center “blocklists”, a widespread means of blocking unwelcome visitors, can end up blocking many real, legitimate users. This can happen with an organization’s IP addresses, or with an organizational VPN/proxy used by an office, which are also used by very human employees browsing websites. Because of Secure Web Gateways (SWGs), corporate data centers, web proxies, and VPNs, substantial numbers of web users now use data centers as launchpads to the internet, and their traffic is routed through IP addresses allocated to data centers. Blocking all data center traffic could therefore also block large numbers of genuine visitors who use data centers as internet gateways.
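One way to avoid blanket blocking is to treat a data center IP as one signal among several rather than as a verdict on its own. The Python sketch below is a toy scoring rule under stated assumptions: the allowlist of corporate gateway ranges, the `other_bot_signals` count, and the threshold of 2 are all hypothetical, and this is not CHEQ's detection logic.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical allowlist of known corporate egress ranges (secure web gateways,
# office VPNs). These are documentation ranges, not real customer data.
KNOWN_GATEWAYS = [ipaddress.ip_network("198.51.100.0/26")]


@dataclass
class Visit:
    ip: str
    from_data_center: bool  # e.g. the result of a usage-type lookup
    other_bot_signals: int  # hypothetical count of other checks this visit failed


def is_known_gateway(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in KNOWN_GATEWAYS)


def bot_score(visit: Visit) -> int:
    """Data center origin adds one point unless the IP is a known corporate gateway."""
    score = visit.other_bot_signals
    if visit.from_data_center and not is_known_gateway(visit.ip):
        score += 1
    return score


def should_block(visit: Visit, threshold: int = 2) -> bool:
    # With the default threshold, a data center IP alone (score 1) is not enough to block.
    return bot_score(visit) >= threshold


print(should_block(Visit("198.51.100.200", True, 0)))  # False: data center IP alone
print(should_block(Visit("198.51.100.200", True, 1)))  # True: data center IP plus another signal
print(should_block(Visit("198.51.100.10", True, 1)))   # False: known corporate gateway
print(should_block(Visit("192.0.2.5", False, 2)))      # True: not a data center, but other signals fire
```

The last case matters as much as the first: scoring every visit, rather than only data center traffic, also catches bots that arrive from non-data-center IPs.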

A far better approach to stopping bot traffic from killing your ad campaigns is to deploy a click fraud solution backed by cybersecurity-level detection and machine learning technologies. Such a solution can analyze every single impression against more than 1,000 cybersecurity tests to determine whether a click or impression comes from a human, whether or not it arrives from a data center.

Bots moving from data centers to residential IP addresses

Indeed, although many traditional attacks come from data centers running bot/automation tools, more sophisticated click fraud attacks require far more protection. More and more attacks run from legitimate environments, including residential networks. First, to appear even more human, today’s bad bots increasingly use residential IP addresses instead of data center IPs, even though residential IPs are more expensive. Second, bad bots also use mobile ISPs when the cheaper residential or data center options are not effective. Third, a significant fraction of bot attacks perpetrating ad fraud use smartphones and PCs infected with malware, or compromised browsers, browser add-ons/plugins, and apps.

In most cases, we find that bot runners who pay for expensive residential IPs or mobile ISP proxies are highly motivated. They often also lie about the user agent and modify their fingerprints, as well as the HTTP headers they send to detection systems. Many of today’s bots are almost indistinguishable from legitimate human traffic: they execute JavaScript and leverage real browsers such as Chrome, Firefox, and Internet Explorer, or browsers executed “headlessly”*. [*Headless browsers allow a full version of the latest browser to run while being controlled programmatically. They can be used on servers without dedicated graphics or a display, meaning the browser runs without its “head”, the graphical user interface (GUI). In headless mode, it’s possible to navigate from page to page without human intervention, confirm JavaScript functionality, and generate reports.]
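To show why header checks alone fall short against such bots, here is a minimal Python sketch of naive header heuristics. The function name and both rules are illustrative assumptions: headless Chrome has historically included `HeadlessChrome` in its default User-Agent, and the sketch assumes mainstream browsers normally send an `Accept-Language` header. A bot that rewrites its headers, as described above, passes these checks untouched.

```python
def naive_headless_signals(headers: dict[str, str]) -> list[str]:
    """Return simple, easily spoofed red flags found in HTTP request headers."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    signals = []
    user_agent = normalized.get("user-agent", "")
    if not user_agent:
        signals.append("missing User-Agent")
    elif "HeadlessChrome" in user_agent:
        # Headless Chrome has historically advertised itself this way by default.
        signals.append("HeadlessChrome in User-Agent")
    if "accept-language" not in normalized:
        # Assumption: mainstream browsers normally send Accept-Language.
        signals.append("missing Accept-Language")
    return signals


naive_bot = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36",
}
spoofed_bot = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}
print(naive_headless_signals(naive_bot))    # ['HeadlessChrome in User-Agent', 'missing Accept-Language']
print(naive_headless_signals(spoofed_bot))  # []
```

The spoofed request produces no signals at all, even if it comes from the same headless browser. That gap is why detection has to look beyond self-reported values such as headers and the user agent.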

Data centers are behind a large share of ad fraud attacks, but relying on industry blocklists is certainly not enough, even for tackling this part of the problem. With bots growing more sophisticated every day, we must stay ahead of attacks from data centers and beyond.

To learn more about how to prevent ad fraud, book a demo.