
Form Bots 101: Protecting Your Business from Spammy Leads and Traffic Burden

From collecting leads to taking orders and providing customer service, form fields play a crucial role in user experience. However, the use of automation and bots has made forms a prime target for spammers and hackers. 

A form bot is a type of automated software that fills out and submits web forms at a high rate, often bypassing basic security measures.

Here, we will explore the world of form bots to explain what they are, how they work, and most importantly, how to protect your business against their negative effects, including spammy leads, data breaches, false analytics, and more.

Form Bots: How they work and the problems they cause

Often called form-filling bots, form bots automate the process of filling out and submitting online forms. While harmless in theory, these bots are designed to fraudulently pose as human users. They are usually deployed for malicious purposes and are poised to cause a business many issues.

Form bots can be used to access gated content and to flood online forms with fake or stolen consumer data that contains no real or legitimate information. To a business, the submitted data appears valid and seems to come from a ‘real person’, and some form bots even mimic human behavior during the form-filling process. This opens the door to malicious activity, including phishing, spamming, and other fraudulent practices that can adversely affect your business.

Unreliable leads become costly

Many companies rely heavily on paid partnerships to promote their products or services, paying partners for the leads who fill out forms on their website. Fraudsters commonly pose as reliable partners and exploit these companies by using form bots to generate fake or bad leads, then claiming credit and collecting a large payment. In many cases, businesses do not realize these leads are unreliable until after a payment has been made.

What’s more, the more form bots fill out your website’s forms, the faster your budget depletes. Resources are wasted chasing down leads with no tangible outcome: budgeted email campaigns go unseen, costly methods like ad retargeting fail to produce results, and the revenue of brands relying on premium content or subscriptions is lost. Over time, real leads can be overlooked and become missed opportunities. Worse still, as these efforts continue, businesses can suffer substantial financial losses.

Click Hijacking attacks increased 125% in 2022. Learn more in our State of Fake Traffic 2023 report.

Form bots create a burden on website traffic

While form bots are not real people visiting your business’s website, they still register on the backend as site visitors. If your site receives a lot of traffic, real or fraudulent, it can become slow or unresponsive, turning away actual visitors frustrated by long loading times.

In 2018, Google reported that the probability of a mobile user bouncing increased by as much as 123% when page load time grew from one second to ten. For a visitor filling out forms across multiple competing websites, any added hassle can be the deciding factor in whether a form gets filled out, putting multiple legitimate leads at risk.

Your competition gets ahead

In highly competitive markets where businesses must be fast-paced and proactive, form bots are a serious threat, and time theft is one of their most destructive side effects. Consider it this way: while your business is chasing down fake leads from a bot, your competition could be closing deals with real leads that may have been receptive to your product or service. In an even more concerning scenario, unscrupulous competitors sometimes deploy form bots with malicious intent, throwing off potential leads and dragging down your website’s performance.

Protect your business from form bots: Strategies and best practices

With the dangers of form bots in mind, it is crucial that your business take proactive steps to protect itself against fraudulent leads. Here are a few tools and best practices to follow:

Set up reCAPTCHA

With Google reCAPTCHA, you can prevent automated software from engaging in abusive activities on your site through a risk analysis engine and adaptive challenges. Each visitor’s behavior is analyzed to determine whether it belongs to a bot or a human; visitors then receive either a simple box to check or a more involved puzzle that requires identifying images.

reCAPTCHA is free, easy to set up, and requires little maintenance once installed. However, advanced bots are sometimes able to bypass reCAPTCHA, so it’s important to combine it with other spam prevention methods.
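
To illustrate the server side of the integration, here is a minimal sketch of reCAPTCHA v2 verification in TypeScript (Node 18+). The siteverify endpoint is Google’s official verification URL; the RECAPTCHA_SECRET environment variable and the surrounding handler shape are illustrative assumptions.

// Minimal sketch: verify a submitted reCAPTCHA token on the server.
async function verifyRecaptcha(token: string, remoteIp?: string): Promise<boolean> {
  const params = new URLSearchParams({
    secret: process.env.RECAPTCHA_SECRET ?? "", // your secret key from the reCAPTCHA admin console
    response: token,                            // the g-recaptcha-response value posted with the form
  });
  if (remoteIp) params.set("remoteip", remoteIp);

  // siteverify is Google's official verification endpoint.
  const res = await fetch("https://www.google.com/recaptcha/api/siteverify", {
    method: "POST",
    body: params,
  });
  const data = (await res.json()) as { success: boolean; "error-codes"?: string[] };
  return data.success; // reject the form submission when this is false
}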

The biggest drawback to reCAPTCHA and similar solutions is the friction they add to the user experience. A frustrating or slow verification process can cause users to abandon a form, task, or checkout altogether in search of a smoother process elsewhere.

Require visitors to double opt-in

A double opt-in form requires users to confirm their email address before their submission is accepted, protecting against form bots. In this process, users fill out a form to provide their email address and then receive a confirmation email containing a link that they must click to confirm their email address is correct and that they wish to subscribe.

Although double opt-in forms provide a high level of protection against form bots, they also add an extra step to the form submission process, which will frustrate some users and often drastically increase form abandonment. For that reason, it’s best to save this technique for pages and forms that are a necessity for the user, such as forms for changing travel plans or updating account information, as opposed to using it on one-time use forms such as downloads or checkout procedures. 
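
As a rough illustration, here is a minimal double opt-in sketch using Express and TypeScript. The sendEmail mailer and the in-memory pending/confirmed sets are hypothetical stand-ins for your own email service and database; a production version would also expire tokens and use a timing-safe comparison.

// Minimal double opt-in sketch (Express + TypeScript).
import crypto from "node:crypto";
import express from "express";

const app = express();
app.use(express.urlencoded({ extended: false }));

const SECRET = process.env.OPT_IN_SECRET ?? "change-me";
const pending = new Set<string>();   // stand-in for a real datastore
const confirmed = new Set<string>();

// Derive a confirmation token from the email so the link can be verified later.
const sign = (email: string) =>
  crypto.createHmac("sha256", SECRET).update(email).digest("hex");

app.post("/subscribe", (req, res) => {
  const email = String(req.body.email ?? "");
  pending.add(email);
  const link = `https://example.com/confirm?email=${encodeURIComponent(email)}&token=${sign(email)}`;
  // sendEmail(email, `Confirm your subscription: ${link}`); // hypothetical mailer
  res.send("Check your inbox to confirm your subscription.");
});

app.get("/confirm", (req, res) => {
  const email = String(req.query.email ?? "");
  const token = String(req.query.token ?? "");
  // Only accept the address once the emailed token round-trips correctly.
  if (pending.has(email) && token === sign(email)) {
    pending.delete(email);
    confirmed.add(email);
    return res.send("Subscription confirmed.");
  }
  res.status(400).send("Invalid confirmation link.");
});

Because a bot would have to receive and open the confirmation email, automated submissions never make it past the pending stage.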

Add form bot traps

Form bot traps are a type of security measure designed to prevent automated bots from submitting forms on websites. Hidden fields or challenges can be built into a form that prevent a bot from completing it while letting legitimate human users pass through easily.

One common type of form bot trap is the honeypot. A honeypot trap is specifically designed to lure in a form bot by adding fields that are invisible to human users but visible to bots parsing the page’s code. Bots will automatically fill out every form field, including the hidden ones, while legitimate users never touch them. When a bot submits the form, the website owner can detect the filled honeypot and take appropriate action, such as blocking the submission or displaying a CAPTCHA challenge.
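
A minimal honeypot sketch, assuming an Express app with body parsing enabled (as in the double opt-in sketch above); the field name "website" and the CSS used to hide it are arbitrary illustrative choices.

// In the form template, add a field humans never see:
//   <input type="text" name="website" class="hp-field" tabindex="-1" autocomplete="off">
//   <style>.hp-field { position: absolute; left: -9999px; }</style>
app.post("/contact", (req, res) => {
  // Real users leave the hidden field empty; bots that fill every
  // field will populate it and give themselves away.
  if (req.body.website) {
    return res.status(400).send("Submission rejected.");
  }
  // ...process the legitimate submission...
  res.send("Thanks, we got your message.");
});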

Rate limiting

A rate-limiting technique restricts the number of requests or actions a service or application can handle within a given timeframe. To prevent bots from submitting forms, rate limiting can be used to limit the number of submissions made by a single IP address or user account.

One IP address might be allowed to submit 10 forms per hour, for example. Bots that submit large numbers of forms quickly will reach the rate limit and be prevented from submitting any more forms.

Various techniques can be used on the server side to implement rate limiting, for example, using a database to track how many form submissions are coming from each IP address and blocking or throttling those that exceed the limit. Rate limiting can also be handled using libraries such as express-rate-limit (for Node.js) or comparable rate-limiter packages for Python.

When setting up rate limiting, be sure to whitelist trusted IP addresses, such as those belonging to your own company, so that your employees are not affected by the rate-limit rules.
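
For example, here is a sketch of this approach using the express-rate-limit library for Node.js, allowing 10 submissions per IP per hour with a whitelist so submissions from your own offices are never throttled. The office IP addresses and the contactHandler route handler are placeholders.

// Rate-limiting sketch with express-rate-limit (TypeScript).
import rateLimit from "express-rate-limit";

const OFFICE_IPS = new Set(["203.0.113.10", "203.0.113.11"]); // example addresses

const formLimiter = rateLimit({
  windowMs: 60 * 60 * 1000,                    // one-hour window
  max: 10,                                     // 10 submissions per IP per window
  standardHeaders: true,                       // send RateLimit-* response headers
  skip: (req) => OFFICE_IPS.has(req.ip ?? ""), // exempt whitelisted IPs
  message: "Too many submissions; please try again later.",
});

app.post("/contact", formLimiter, contactHandler); // contactHandler is hypothetical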

It’s important to note that rate limiting alone is not sufficient to completely prevent form bots, so it’s often used with other approaches, such as CAPTCHA or hidden fields, to provide a more thorough defense against them.

Use IP and Geolocation Measures

IP and geolocation measures can also prevent form bots by identifying and blocking requests from locations or IP ranges known to be associated with bot activity.

Using this method, traffic coming from data centers or hosting providers, a common home for bots, can be blocked outright.

Geolocation data can be used to block bot traffic from specific countries or regions. Form submissions can be verified by comparing their IP addresses against a database of known IP ranges and their associated countries and regions. 
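
As a sketch, the open-source geoip-lite package can back this kind of check in a Node.js app. The blocked-country codes and datacenter IP prefix below are placeholders, and the simple prefix match stands in for a proper CIDR lookup.

// IP/geolocation screening middleware sketch (Express + TypeScript).
import geoip from "geoip-lite";
import type { Request, Response, NextFunction } from "express";

const BLOCKED_COUNTRIES = new Set(["XX", "YY"]); // ISO codes you choose to block
const DATACENTER_PREFIXES = ["198.51.100."];     // example hosting-provider range

function screenByOrigin(req: Request, res: Response, next: NextFunction) {
  const ip = req.ip ?? "";
  // Block known datacenter/hosting ranges outright.
  if (DATACENTER_PREFIXES.some((p) => ip.startsWith(p))) {
    return res.status(403).send("Automated traffic is not allowed.");
  }
  // Look up the IP's country and compare against the blocklist.
  const geo = geoip.lookup(ip); // returns null when the IP is unknown
  if (geo && BLOCKED_COUNTRIES.has(geo.country)) {
    return res.status(403).send("Submissions from this region are blocked.");
  }
  next();
}

app.post("/contact", screenByOrigin, contactHandler); // contactHandler is hypothetical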

Blocking IP addresses and geolocations can prevent bot activity, but it can also shut out legitimate users by mistake. You can mitigate this by providing a form through which users can request to have their IP address or location unblocked.

Leverage a Fraud and Bot Detection tool

While the techniques outlined above can give you some peace of mind and can be quite effective in certain circumstances, most of them also carry significant drawbacks, such as added friction in the user experience or heavy labor costs for your IT department. The best solution for stopping form bots is to leverage a fraud and bot detection tool. Not only do these tools detect and block invalid traffic, but they also save valuable time and keep friction for legitimate users to a minimum.

CHEQ leverages thousands of security challenges to evaluate site traffic in real-time, determine whether a visitor is legitimate, suspicious, or invalid, and take appropriate action in blocking or redirecting that user. For paid traffic, CHEQ automatically updates IP exclusion lists to reflect the constantly changing threat landscape, saving you valuable time and ad spend.

Frequently Asked Questions

What is a form bot, and how does it work?

Form bots are automated software programs that submit forms on websites. Using predetermined data, they simulate human actions while filling out forms and can submit them at a high rate, bypassing security measures designed to prevent automated submissions.

How do form bots harm businesses?

A form bot can cause businesses several problems, including data breaches, distortion of analytics, abuse of business resources, revenue loss, and email marketing issues.

How can I protect my business from form bots?

Several methods can be used to prevent form bots, including fraud detection software, CAPTCHAs, form bot traps, rate limiting, IP blocking, and double opt-in forms.

What are some signs that my business may be a target for form bots?

A high volume of form submissions from a single IP address, submissions containing invalid or fake data, and large numbers of submissions arriving in a short period of time may all indicate that a business is a target for form bots.

What should I do if I suspect that my business is being targeted by form bots?

You should take immediate action if you suspect your website or data is being attacked by form bots. Implementing security measures such as CAPTCHAs, hidden fields, rate limiting, IP blocking, and double opt-in forms can help. Additionally, you may want to consider using a service or tool that can help you detect and prevent bot activity, like CHEQ. Consult a security expert or contact law enforcement if the situation is severe.

Over the past 40 years, the internet has expanded into a massive highway of information–with billions of daily active users and trillions of daily engagements–driving innovation, growth, and connectivity on a global scale.

But as the internet has grown in scale and sophistication, the quality and authenticity of its traffic have decreased, as the web is increasingly flooded with automation tools, bots (good or bad), and users who, for one reason or another, aren’t genuine. In the marketing industry, this traffic is known as Invalid Traffic (IVT).

To help better understand this phenomenon and the impact it has on businesses, CHEQ has conducted the first annual State of Fake Traffic report.

By analyzing billions of data points from tens of thousands of anonymized campaigns, funnels, and websites protected by CHEQ, we were able to gain accurate insights into the scope of the fake traffic problem and how it affects different platforms, industries, and regions. In this blog post, we’ll examine the leading referral sources for fake traffic in an effort to show which platforms are leading sources of bot traffic. For more information on how specific industries, regions, and platforms are affected by fake traffic, download the full 35-page report here.

What is Invalid Traffic (IVT)?

Invalid traffic is web traffic that consists of bots, fake users, and otherwise invalid users who cannot turn into legitimate customers. This could mean harmless bots like search engine web scrapers or malicious traffic like ad fraud botnets.

Google defines invalid traffic as “any activity that doesn’t come from a real user with genuine interest. It can include accidental clicks caused by intrusive ad implementations, fraudulent clicking by competing advertisers, advertising botnets, and more.”

To Google, IVT is primarily a concern because it can be used to artificially inflate a publisher’s ad earnings, a practice that is against Google Ads’ terms of service. But invalid traffic isn’t limited to paid traffic; it also makes up a large portion of direct traffic and unique site visitors and has plenty of adverse effects downstream, from polluted marketing analytics to wasted remarketing efforts.

Invalid Traffic’s Impact on Marketing Organizations

Historically, IVT has been a major concern for information technology and security teams looking to guard organizations from bad actors disguising their online footprint to commit attacks. However, as today’s CMOs have realized, IVT is also a prevalent problem for marketers and go-to-market teams.

For marketers and businesses dependent on web traffic to drive sales, this creates a unique challenge: Because of the prevalence of IVT, nearly every marketing funnel, campaign, and operation is impacted to some degree, oftentimes in very harmful ways.

Where IVT is present, audiences, CDP segments, and CRMs become polluted, campaigns become optimized toward fake users, and revenue opportunities are missed. Analytics and BI systems are skewed by bad data, leading to poor insights and worse decisions made on bad information.  

Additionally, website and conversion funnels are disrupted by invalid leads and visitors. This is a challenge that must be dealt with sooner rather than later.

Examining Invalid Traffic by Source

Fake traffic is a persistent threat that affects all digital marketing channels. Left unaddressed, this fake traffic will waste advertising budgets and create negative downstream effects such as poorly optimized and ineffective campaigns, confused analytics, and inaccurate attribution.

Despite the best efforts of search engines, ad networks, and social media platforms to mitigate fraud and falsification through dedicated teams and built-in tools, there remains a significant ingress of fake traffic across all platforms.

Our analysis of billions of fake traffic referrals found a general level of parity in fake traffic across most platforms, with some notable exceptions. The general findings of our research are outlined in the chart below:

Social Fake Traffic Rates Climb as Professional Networks Attract Bad Actors

While the general fake traffic rates for social media platforms were lower than comparable search and display ads, one category of social media had, by far, the highest fake traffic rates of any platform studied. 

Professional networking platforms had an average invalid referral rate of 12.4%, with 9.7% of paid traffic and 15.3% of organic traffic determined to be invalid.

For hackers, these platforms make a convenient group of high-value targets. 

For those committing ad fraud, the incentive is even stronger. At an average of $5.58 per click in 2022, the cost per click on professional networks is typically up to five times that of other social and PPC channels. From an attacker’s point of view, that makes targeting a campaign on these sites five times more efficient.

Click Hijacking Attacks Drive Fake Traffic to Display Ads

Display ads are the oldest form of online advertising, and they’re still an extremely effective tool that allows businesses to reach a broad audience and raise brand awareness. However, because these ads are delivered to third-party websites, they are easily manipulated by malicious actors. Display ads are particularly vulnerable to click hijacking attacks, which grew by 125% across all platforms in 2022.

This deluge of attacks led to a fraud rate of 7.2% for display ads in 2022, 40% higher than the rate for search ads.

Click hijacking occurs when a valid user clicks on an asset, such as a link or advertisement, that appears legitimate but is actually a disguised malicious element, which may install malware or redirect the user. Last year, researchers discovered a set of Google Chrome extensions, installed over one million times, that hijacked searches and inserted affiliate links into web pages, disrupting the user experience and costing retailers thousands in affiliate fraud. In the case of a display ad, an attacker may use techniques such as adding hidden layers or modifying a webpage’s code to cause an ad to be clicked without the user’s knowledge, then collect payment for the fake click from the advertiser.

This type of attack can be difficult to detect and prevent because it occurs on the client side, where the user’s browser often cannot distinguish a legitimate click from a hijacked one.
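
Publishers do control one piece of the puzzle: preventing their own pages from being loaded inside a hidden frame, the overlay technique described above. Here is a minimal Express sketch that sets the relevant response headers; many apps set the same headers via the helmet middleware instead.

// Forbid framing of your pages so a hidden iframe can't overlay your ads or links.
app.use((req, res, next) => {
  res.setHeader("X-Frame-Options", "DENY");                       // legacy browsers
  res.setHeader("Content-Security-Policy", "frame-ancestors 'none'"); // modern equivalent
  next();
});

This stops your own pages and ads from being framed elsewhere, though it cannot protect against hijacking that happens entirely on an attacker’s site.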

Viewbots Inflate Streaming Numbers and Burn Advertising Dollars

Streaming platforms had unprecedented reach in 2022. The top streaming site reaches more people aged 18-49 than all TV networks combined, and it reaches them with more ads, which are statistically more likely to hold viewer attention and, ultimately, to convert.

But many of those ad viewers are not human. In 2022, streaming platforms generated the highest invalid rate for paid traffic of any category, at 11.1%. Based on the ad revenue figures of just one streaming platform, that could amount to over $3 billion in wasted ad spend.

So where is all of this traffic coming from? The answer is viewbots, a relatively new form of fake traffic in which automated software is used to view streaming videos or live streams, artificially boosting view counts and generating fake engagement, and fake ad views, for unscrupulous creators.

Most viewbots are simple scripts that open a video in a headless browser, but more sophisticated viewbots may also create fake accounts to mimic logged-in viewers and can even incorporate a chatbot capability that spams the stream’s chat or comments section with artificial banter to make audience numbers appear more legitimate. Some viewbots will even click through on ads to inflate the perceived click-through rate. And these bot networks are available for rent for as little as $10/month.

The impact of these fake viewers goes far beyond fake clicks: most established creators participate in partner programs, earning a commission for mentions or ad impressions.

If those impressions are generated by bots, not real people, then the ad budget used to create and place those ads has essentially been wasted.

If it costs $2,000 for 100,000 impressions and 15-20% of those impressions are fake, that’s $300-400 wasted. Considering most advertising campaigns on these platforms measure impressions in the millions, the cost of those fake impressions can add up fast. Furthermore, with key performance metrics skewed by fake traffic, decision-making becomes increasingly difficult.

Get All the Details in the State of Fake Traffic Report

Want to know more about the state of fake traffic in 2023? Download the full report here to get a full overview of fake traffic threat groups and types, how fake traffic breaks down across traffic sources, and more.

In this 35-page report, we offer new insights into invalid traffic trends and statistics as we:

  • Share invalid traffic rates and trends
  • Examine prominent and growing threat types
  • Compare invalid rates across 11 major industries
  • Compare invalid traffic by region of origination
  • Compare invalid rates for paid and organic traffic generated by leading ad platforms, search engines, and social media platforms

Download our free report today and learn about the latest trends and insights in the world of invalid traffic.

In this post, we break down the various forms of web scraping and go over ways you can mitigate and prevent damage to your business.

Contents:

What is Web Scraping?

‘Good’ and Bad Scrapers

Why do Bad Actors Scrape Web Content?

What Kind of Content do Scraping Bots Target?

How Does Web Content Scraping Hurt My Website?

How Can I Detect Web Content Scrapers?

How Can I Block Web Content Scrapers?

What is Web Scraping?

Web scraping is the process of extracting content and data from a website. Many digital businesses, like search engines, price comparison tools, and market research companies, use web scrapers for legitimate purposes, but they’re also used by bad actors for malicious purposes.

[Want to see how much fake traffic affects your ads? Get a free invalid traffic scan.]

Web scraping can be done manually, but it’s typically performed by automated bots that are programmed to recognize and extract specific data, either from the website’s underlying HTML or from connected databases and APIs. Some bots may also create fake accounts to gain deeper access to a site. Malicious web scraping can be used to build fraudulent websites, to steal competitive information, or simply to create duplicate sites for ad fraud purposes. 

‘Good’ and Bad Scrapers

As noted above, there are thousands of legitimate scrapers online. These bots are easy to distinguish from invalid traffic because they identify themselves in the HTTP header and follow the directions of your site’s robots.txt file, which tells a bot what it can and cannot do on your website. Malicious scrapers, however, will usually employ a false HTTP user agent and disregard your robots.txt file; they’re after anything they can get.

11.3% of inbound traffic is fake or fraudulent. Download our Free State of Fake Traffic 2023 report to learn more.

Why do Bad Actors Scrape Web Content?

People scrape web content for a variety of reasons. An attacker could copy a site’s content to usurp its search engine rankings or duplicate its look and branding for fraud purposes. Hackers could also use scraped content to create phishing sites and fake advertising campaigns that trick users into entering personal information.

Plagiarism

A common reason for malicious web scraping is that hackers or scammers need to fill a site with content. As soon as Google detects the copy, it labels it duplicate content, which can damage the organic search rankings of both the original and the copied site. For hackers, though, this is a moot point: the purpose of the site is likely to carry out ad fraud schemes or deliver malware to visitors.

Fake eCommerce Stores

Another reason hackers scrape websites is to create fake eCommerce sites and try to steal users’ payment information. They can use bots to scrape all the content from your site–product descriptions, prices, and blog content–and then re-create the store, either with your branding intact or with their own. Unsuspecting visitors to this bogus site may attempt to buy “your” product for a great price and, instead, will either receive a low-quality rip-off or nothing at all. Worse yet, the shopper’s payment info may be stolen and sold on the dark web.  

Fake Ads

Another way to use scraped content is ad fraud: a fraudulent publisher republishes popular content on their own site and then runs ads on it. The publisher then directs bots to click on those ads, generating bogus ad income for themselves.

Price Scraping

Sometimes a company will scrape all of a competitor’s pricing information in order to adjust its own pricing. It’s a tactic that helps companies stay competitive, and it’s probably the most benign on our list.

What Kind of Content Do Scraping Bots Target?

Bots can take all kinds of content off of your website, including text, images, HTML, CSS, product prices, and much more. In a worst-case scenario, web scrapers could even collect improperly stored consumer personally identifiable information (PII).

How Does Web Content Scraping Hurt My Website?

Web scraping attacks can do massive damage to a brand’s reputation, website performance, and security, and even to SEO results. 

SEO Rankings

If you’ve ever owned a site, you’ve probably seen spammy pages that copy entire blog posts of yours and even have the audacity to link back to your blog. That’s at the low end of the spectrum. An occasional copy of a post likely won’t hurt your site, but if someone copies your content on a large scale, it can really hurt your rankings. Google can label it as duplicate content, and it may even lead to a penalty. 

Site owners can mitigate these scams by disavowing links, using canonical tags, and contacting copycats directly to ask them to take the duplicate content down, but the best defense is to block illegitimate scraping in the first place. 

Reputational Damage

Content scraping can seriously damage your reputation, especially if you have an eCommerce site. If your users get routed to a fake store that looks just like yours and find out it was a scam, they can lose trust in your brand and never return.

Website Spoofing

Perhaps the worst damage from content scraping comes when a hacker re-creates your site and steals shoppers’ payment info. Attackers can build elaborate phishing schemes that disguise themselves as real businesses, and if their scams are misattributed to you, your business can be banned from the platforms and marketing channels you rely on (like affiliate platforms and eCommerce marketplaces).

How Can I Detect Web Content Scrapers?

There are several ways you can detect content scraping, both manually and automatically.

Do a manual search

The simplest way to find duplicate content is to do a manual search on Google for the title of a particular post. You may find some duplicates popping up in the search results. You should do that if you notice an unusual change in your traffic or engagement levels. 

The problem is that it could take hours to go through every page on your site, which is why you should use a tool that automates the process. Here are a few.

Pingbacks on Internal Links

Platforms like WordPress and Wix have a feature called pingbacks, which notify you every time a page links to your site. Pingbacks can help you detect illegitimate content scraping and deal with it quickly.

Google Alerts

Google Alerts are another great way to catch scraped content. These alerts notify you when new content matching your chosen terms, such as a post title, appears on the web, and they work regardless of the CMS you use for your site. What’s really convenient is that you can set the alerts to email you whenever a new match is found.

Keyword Tools

If you’re heavy on SEO and publish a lot of content, then you should already be using keyword tools like Ahrefs or Semrush. These tools show you every single site or page that links to your content. You can filter by domain authority to spot spam sites. And then, it’s easy to decide if you should disavow the link or contact the site owner. You could also set up automatic reports that get sent to your email weekly or monthly, which makes it easier to stay on top of things.

Webmaster tools

Another great way to detect content thieves is through your webmaster tools. Look up all the links your site gets and sort them by the linked-pages column. If you spot a site in an unrelated niche or category with an unusual number of pages linking back to your content, you may have found a culprit.

How Can I Block Web Content Scrapers?

The techniques outlined above can help you get a leg up on basic web scraper attacks and simple threat actors, but malicious scraper bots, even those commercially available to competitors, are increasingly sophisticated and able to subvert basic detection techniques and masquerade as legitimate traffic.

For businesses serious about protecting their online properties, a comprehensive go-to-market security platform will help automatically detect and block invalid traffic in real-time.

CHEQ leverages thousands of security challenges to evaluate site traffic in real-time, determine whether a visitor is legitimate, suspicious, or invalid, and take appropriate action in blocking or redirecting that user.

Book a demo today to see how CHEQ can help protect your business from web scrapers and other threats.

Black Friday is a big deal. The retail ‘holiday’ now encompasses nearly an entire week and generates revenue numbers equal to the GDP of a small country. For retailers, the holiday season is often a make-or-break quarter, and a good Black Friday, Small Business Saturday, or Cyber Monday can be key to success.

And while blockbuster deals can still get people out to brick-and-mortar stores, Black Friday is increasingly an online affair: American consumers spent $8.9 billion online during Black Friday 2021 and $10.7 billion on Cyber Monday, and they are expected to surpass those figures in 2022.

But where the money goes, cybercriminals typically follow, and cybercriminals and bad actors have found plenty of ways to take advantage of retailers’ investments in Black Friday through various forms of bots, web scrapers, and fraudulent traffic.

Last year, we discovered that bots and fake users made up 35.7% of all online shoppers on Black Friday. Among the forms of fake traffic we uncovered were malicious scrapers and crawlers, sophisticated botnets, fake accounts, click farms, proxy users, and illegitimate users committing eCommerce-related fraud.

As we approach the 2022 holiday shopping season, we analyzed how bots and fake users affected eCommerce sites on previous Black Fridays and combined that information with current fake traffic rates to uncover the potential financial and operational impacts retailers can expect this coming Black Friday. The results are presented in our new report, How Bots and Fake Users Impact Sales on Black Friday 2022.

To build our report, we analyzed data from 233 million eCommerce site visits originating from all source types (direct, organic, paid) across a 6-month span (January – June 2022) and studied the validity of each site visit. From there, we were able to pull inferences from typical site traffic numbers, consumer spending patterns, and media spending in the eCommerce space.

$368M Could be Lost to Fake Clicks on Retail Ads

Bots and fake users frequently click on the advertisements they encounter online, whether to commit ad fraud, to drain a competitor’s marketing budget, or simply in the course of scraping a website for competitive data. This happens on paid search platforms, on social media advertisements, and across other forms of display and text ads.

The eCommerce industry is certainly not immune to these actions. Based on the standard rates of fraud that are encountered across retailer websites from paid sources, analyzed alongside the volume and frequency of advertising clicks during the holiday season, CHEQ predicts that retailers will lose about $368 million to fraudulent clicks this Black Friday alone.

Get the Full Story in Our New Report

Invalid traffic is a year-round problem, but Black Friday and the holiday shopping season mark a period of increased activity among cybercriminals, and retailers should be prepared to deal with ad fraud, skewed metrics, and cart abandonment.

To learn more about fake traffic and how it can affect eCommerce websites this holiday season, download the full Black Friday report, available here.

Traffic is the lifeblood of digital businesses and, by extension, the lifeline of the marketing department. Traffic determines who sees your messaging, where your site ranks, and, ultimately, how many conversions you gain. But traffic isn’t always what it seems: according to recent research, over 20% of a site’s visitors, on average, are not genuine.

This invalid traffic threatens ad revenue, clouds marketing metrics, and muddies conversion rates. To many marketers, it has become a fact of life, an annoyance that cannot be effectively solved. But the problem is growing, and most fraudulent traffic today is highly sophisticated. Let’s look at the kinds of sophisticated invalid traffic affecting websites and ad publishers today and examine some of the ways marketers can fight back.

What is Invalid Traffic?

Invalid traffic is web traffic that cannot turn into a legitimate customer. This could mean harmless bots, like search engines’ web crawlers, or malicious traffic, like ad fraud botnets.

Google defines invalid traffic as “any activity that doesn’t come from a real user with genuine interest. It can include accidental clicks caused by intrusive ad implementations, fraudulent clicking by competing advertisers, advertising botnets, and more.”

11.3% of inbound traffic is fake or fraudulent. Download our Free State of Fake Traffic 2023 report to learn more.

To Google, IVT is primarily a concern because it can be used to artificially inflate a publisher’s ad earnings, a practice that is against Google Ads’ terms of service. But invalid traffic isn’t limited to paid traffic; it also makes up a large portion of direct traffic and unique site visitors and has plenty of adverse effects downstream, from polluted marketing analytics to wasted remarketing efforts.

There are generally two types of invalid traffic (IVT): general invalid traffic (GIVT) and sophisticated invalid traffic (SIVT). Let’s take a closer look at both below.

What is General Invalid Traffic (GIVT)?

General invalid traffic (GIVT) is the kind of nonhuman traffic you would expect to routinely access your websites: bots, spiders, search engine crawlers, and other traffic coming from data center IP addresses known to site owners and advertisers. GIVT will generally identify itself as such and is easy to filter out of campaigns and metrics.

What is Sophisticated Invalid Traffic (SIVT)?

Sophisticated Invalid Traffic (SIVT) is invalid traffic that does not identify itself as such. SIVT could be competitors clicking on ads, bots scraping your website for pricing information, botnets built for ad fraud, or malicious users concealing themselves via proxies and VPNs.

What are The Types of SIVT?

SIVT can take various forms, depending on its goal.  Common forms of SIVT include:

  • Bots: Scripts running in headless browsers mimic real humans and click on ads in order to generate fraudulent ad revenue.
  • Cookie Stuffing: Cookie stuffing is a form of affiliate fraud wherein third-party cookies are illegitimately attached to a user and then used to falsify conversions.
  • Adware: Adware can be used to generate traffic from infected users without their consent.
  • Web Scrapers: Web scrapers can be used to gather competitive information, such as pricing from your website.
  • Concealed ads: Hidden or misleading advertising can be used to attract unwitting users with no chance of converting.
  • Fraudulent proxy traffic: Proxies and VPNs can be used to mask fraudulent traffic.

How to Detect and Stop SIVT

So how can you detect and block SIVT? As a manual process, it can be a challenge, but there are some best practices that can be taken to achieve at least a baseline of protection. Let’s look at Google’s advice first.

Google’s Advice on Invalid Traffic

While Google does its best to prevent invalid and fraudulent traffic, they, unfortunately, do not offer much in the way of resources for publishers dealing with high volumes of SIVT.

In the eyes of Google, AdSense publishers are ultimately responsible for the traffic on their ads and are more or less left to their own devices when it comes to protecting site traffic and advertising budgets from fraud and waste.  However, Google does offer some advice to publishers to help mitigate SIVT.

Namely, Google urges publishers to avoid unscrupulous third parties, make sure AdSense implementations are rock solid, understand ad traffic and site visitors, and watch for unusual behavior.

Best Practices for SIVT Mitigation

There are a few other manual methods for mitigating SIVT, and IVT in general. Let’s dive into a few techniques you can use to clean up your traffic within AdSense and other tools you already have.

[Get a free Invalid Traffic Scan. Plug CHEQ in for free and see how many bots and fake users are in your funnel.]

Rethink Your Targeting

As marketers, it’s natural to want to put your messaging in front of the biggest audience possible, but overly broad targeting can easily leave your ad campaign open to invalid traffic and bad actors. To minimize both wasted ad spend and your exposure to invalid traffic, it’s important to be as granular as possible with your geotargeting. Excluding countries or regions known for high rates of IVT is an easy first step toward reducing bad traffic.

On the same note, limiting the runtime of your ads to your target time zone’s waking hours can also limit exposure.

Check Out Your Ad Placements

It’s easy to keep track of where your ads are being displayed, but many businesses simply set up their ads with Google and never think about them again. Meanwhile, there are thousands of websites out there that exist solely to generate fraudulent ad revenue: site owners set up a basic site with little or no genuine content, host ads, and then push fake traffic through the site and its ads.

If you’re getting a lot of referrals but no conversions, it’s worth taking a look at where those referrals are coming from. If you find a low-quality site, it’s probably worth reporting it to Google to stop the flow of bad traffic.

Monitor Your Site Traffic

Fraudulent traffic often comes in patterns, such as high volumes of clicks with low conversions or rapid clicks from a single IP address, that give it away under keen observation.

Legitimate site visitors may visit your site multiple times while making purchasing decisions, but if a single IP address begins arriving on your website from paid advertising in rapid succession, that’s a clear sign of abuse.

Patterns can vary widely by industry, geography, or method of IVT, so it’s best to keep a keen eye on your site traffic. Take a look every day, get a feel for your site’s typical traffic patterns, and try to identify the patterns of bad traffic.

Once you’ve found what you’re looking for, it’s time to investigate your server logs for suspicious IP addresses and add those addresses to your blocklists and the blocklists of your advertising partners.
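
As a starting point, here is a sketch that scans a combined-format access log and surfaces IP addresses with suspiciously high request counts; the log path and threshold are placeholders to tune for your own traffic.

// Flag IPs with unusually high request counts in an access log (TypeScript, Node).
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

async function findNoisyIps(logPath: string, threshold = 500): Promise<string[]> {
  const counts = new Map<string, number>();
  const rl = createInterface({ input: createReadStream(logPath) });
  for await (const line of rl) {
    const ip = line.split(" ")[0]; // first field of a combined-format log line
    if (ip) counts.set(ip, (counts.get(ip) ?? 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, n]) => n > threshold)
    .map(([ip]) => ip); // candidates for your blocklists
}

findNoisyIps("/var/log/nginx/access.log").then(console.log);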

Inspect Packet Headers for Suspicious Data

Packet headers contain a lot of information: browser type and version, operating system, and more. This information is extremely useful for uncovering disguised bad traffic.

For example, if you receive dozens of requests from the same IP address but the packet headers show different device details for each visit, it’s fair to deduce that the IP belongs to a proxy server. Malicious users also have their own calling cards: Linux is the typical OS, and outdated browsers, which are easier to exploit, are common, so a version of Chrome that is 10 or 15 updates behind is a dead giveaway.

In another example, if a user appears to be on a mobile device but the packet header shows browser extensions on that device, that is clear evidence of an attempt to disguise the source of the traffic and mislead you and your advertising partners, and it should be investigated in further detail.
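
Here is a sketch of what such a header-consistency check might look like as Express middleware; the threshold and in-memory storage are illustrative only, and a production version would expire old entries.

// Flag IPs that present many different User-Agent strings, a common proxy/bot tell.
const uaSeen = new Map<string, Set<string>>();

app.use((req, res, next) => {
  const ip = req.ip ?? "unknown";
  const ua = req.get("user-agent") ?? "none";
  const set = uaSeen.get(ip) ?? new Set<string>();
  set.add(ua);
  uaSeen.set(ip, set);
  if (set.size > 5) {
    // Log for review rather than block outright; shared NATs can
    // legitimately present several user agents from one address.
    console.warn(`Suspicious header churn from ${ip}: ${set.size} user agents`);
  }
  next();
});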

Set up IP Exclusion Lists

Through the above techniques, you’ll likely find a lot of IP addresses that need blocking. Google Ads offers an IP exclusion list, which lets you identify IP addresses you don’t want your ads served to. However, there is a limit of 500 IP addresses per campaign, which is easy to hit, since fraudsters and bad actors constantly change their IP addresses through proxies and other methods.

Manual IP address exclusion management is a best practice for a basic level of protection from SIVT, but it is a labor-intensive and tedious process, and Google’s 500-address cap greatly reduces its effectiveness.

Stop SIVT with Go-to-Market Security 

As outlined above, manual mitigation of invalid traffic is possible in a limited scope, but it’s labor-intensive and probably not the most cost-effective use of your team’s time. Not to mention, it can take weeks or months to set up properly, leaving your site exposed to bad traffic in the interim.

For businesses serious about protecting their pipeline, a comprehensive go-to-market security platform will help automatically detect and block invalid traffic in real-time, whether the source is paid, organic, or direct, and provide better insight into marketing analytics.

CHEQ leverages thousands of security challenges to evaluate site traffic in real-time, determine whether a visitor is legitimate, suspicious, or invalid, and take appropriate action in blocking or redirecting that user. For paid traffic, CHEQ automatically updates IP exclusion lists to reflect the constantly changing threat landscape, saving you valuable time and ad spend.

Book a demo today to see how CHEQ can help you lower your CPA and protect your go-to-market efforts against SIVT and other threats.