Scraping is the automated collection of large volumes of data from web pages and applications by bots. Targets include content, product descriptions, pricing, inventory data, and other public-facing information. Bots scrape that information to undercut deals, divert visitors, or steal clicks. Examples include:
- Classifieds sites, where competitors scrape and republish listings, cutting into traffic and revenue
- eCommerce sites, where bots harvest price information in real time for use as competitive intelligence
- Ticketing sites, which face denial of inventory, spinning and scalping, scraping of seat map inventory, and fraud
Scraping bots: Sneakers, Ticketmaster and U2
Every company with an online presence (that is, every company) is affected by scraping. Bot scraping attacks have hit fashion launches, where bots have been responsible for 473 million requests to purchase sneakers in a single day. Ticketmaster launched a lawsuit after bots scraped more than 1,000 pages of its site and made more than 800 reserve requests in a 24-hour period. This made potentially bad actors liable for damages of twenty-five cents ($0.25) for each page request or reserve request. In another scraping attack, 1,012 tickets to a U2 concert at Madison Square Garden were bought in a minute, nearly 17 tickets a second. Committing fraud by purchasing tickets or other items in bulk in order to resell them later at much higher prices is now subject to the BOTS Act of 2016, much as click fraud attacks are illegal. However, with bots making up 45% of the web, and bad bots playing a disproportionate role, their effect on any business is all-encompassing. As we shall see, this also has a direct negative impact on digital marketing ROI.
Scraping and ad fraud
Scraper bots can click on target ads in order to land on websites and landing pages. In these cases, advertisers unknowingly end up paying for bot-generated clicks. Beyond this direct cost, scrapers steal brand IP and make money by monetizing content that brands created.
In one case, CHEQ for PPC worked with a company whose marketing strategy involved driving users to its recipe sites. The recipes were being scraped and monetized through online advertising until we cut the scrapers off from the site.
Google has provided other examples of scraping and ad fraud, even if the fraud is described as “collateral damage” of the attacks. On Google's security blog, Vegard Johnsen, Product Manager, Google Ad Traffic Quality, describes a company they investigated which “uses bots to scrape the web to find out which ad creatives are being served on which websites and at what scale. The company’s scrapers also click ad creatives to analyze the landing page destinations. To provide its clients with the most accurate possible intelligence, this company’s scrapers operate at extraordinary scale and they also do so without including bot identifiers in their User-Agent strings.
“While the aim of this company is not to cause advertisers to pay for fake traffic, the company’s scrapers do waste advertiser spend. They not only generate non-human impressions; they also distort the metrics that advertisers use to evaluate campaign performance—in particular, click metrics.”
Publishers ruined by scraping of valuable content
Publishers face a daily challenge from the bot scrapers.
In one case, CNBC reporter Megan Graham investigated this practice after seeing her recent CNBC articles lifted in their entirety and monetized through ads on fake news sites.
This scraping and monetization strategy even affected Shuman Ghosemajumder, the former Google Click Fraud Czar. He said that within an hour of an article he penned for VentureBeat going live, it was posted on a website called “Best Gaming Pro”. He says the site “has automatically rewritten the article so that people Googling related but different keywords will find their article”. He adds: “Content farms like this drive traffic to their sites by deceptive means, but then monetize with legitimate ads. My team at Google used to deal with these all the time, but they have gotten more sophisticated over time.”
Sophistication of scrapers increases
As Ghosemajumder describes, bots have become more adept at mimicking actual users and disguising their purpose, making identification harder than ever. Deploying CHEQ for PPC to eliminate invalid clicks on paid platforms, including Google Ads, Microsoft Ads, Pinterest, LinkedIn, and Twitter, also banishes bad scraper activity. This prevents financial losses, missed ad revenue, and wasted marketing spend. The prevention of scraping bots is an added bonus.
How scraper bots are denied
Why marketers hate scrapers
Marketers who care about their spend, their ROI, or their brand hate scrapers. Scrapers divert messages, hurt your brand, and overload web infrastructure by sending millions of requests to a specific path, such as a login or checkout page, causing slowdowns. Malicious web-scraping bots can also steal your copyrighted content and images, damaging SEO rankings when search engines detect pages with duplicate content. Scraping hurts every part of the marketing mix.
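To make the "millions of requests to a specific path" pattern concrete, here is a minimal Python sketch of how a site owner might spot it in an access log. It is an illustration only, not how CHEQ for PPC works: the function name `flag_path_floods`, the log format (timestamp in seconds, client ID, request path), and the thresholds are all assumptions made for the example.

```python
from collections import defaultdict, deque


def flag_path_floods(log, path, window_seconds=60, max_requests=100):
    """Return the set of clients that hit `path` more than `max_requests`
    times within any sliding window of `window_seconds` seconds.

    `log` is an iterable of (timestamp_seconds, client_id, request_path)
    tuples, sorted by timestamp. This tuple format is hypothetical.
    """
    recent = defaultdict(deque)  # client_id -> timestamps still inside the window
    flagged = set()
    for timestamp, client, request_path in log:
        if request_path != path:
            continue
        window = recent[client]
        window.append(timestamp)
        # Discard timestamps that have aged out of the sliding window.
        while window and timestamp - window[0] >= window_seconds:
            window.popleft()
        if len(window) > max_requests:
            flagged.add(client)
    return flagged


# Hypothetical traffic: one client sends 50 checkout requests in 5 seconds,
# another sends 2 requests 30 seconds apart.
sample_log = sorted(
    [(i * 0.1, "client-a", "/checkout") for i in range(50)]
    + [(0, "client-b", "/checkout"), (30, "client-b", "/checkout")]
)
suspects = flag_path_floods(sample_log, "/checkout", window_seconds=10, max_requests=20)
```

A simple rate check like this only catches the crudest floods. As noted above, sophisticated scrapers mimic real users and spread requests across many clients, which is why per-client counting alone is not a complete defense.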
Protect your marketing from scraping and get your free CHEQ for PPC trial today.
Want to protect your sites and ads? Click here to Request A Demo.