What Are Bots? Types, How They Work, and Why They Matter
Jason Melman
Bot & AI Agent Trust Management
March 30, 2026

- Bots are any software acting on a network without a person directing each request, and the category now spans search crawlers, scrapers, and AI agents transacting on users’ behalf.
- CHEQ’s network data shows invalid traffic (automated, suspicious, or otherwise non-genuine) grew nearly 50% year-over-year, and the AI agent share is growing fast.
- Function matters more than a binary good/bad label. The same code can be a legitimate price tracker or an unauthorized scraper depending on context.
- AI agents broke the old model. They can be authorized, unauthorized, or spoofed, and the same request shape can mean very different things.
- Modern response is a spectrum, not a switch: allow, monitor, step-up, constrain, throttle, misdirect, or block, matched to entity and intent.
- The shift underway is from bot detection to AI agent governance, answering what the entity is, whether it can be trusted, on whose behalf it acts, and what response fits.
A bot is software that acts on a network without a person directing each request.
The category has always been broad, but it now includes a layer it did not have five years ago. AI agents research, compare, fill forms, and transact on behalf of real users, alongside the search crawlers, scrapers, and adversarial bots that have been on the open web for decades.
That mix breaks the framework most teams still use. A binary “is it a bot, block it” question made sense when traffic split cleanly into humans and adversaries. It does not work when an AI agent acting on a user’s behalf can look the same as an automated scraper while calling for a very different response.
This guide covers what bots are, the major categories (including AI agents), how much of the web is automated, and how organizations are moving from binary blocking toward governance that matches the response to the entity behind each request.
What Is a Bot?
A bot is software that performs automated tasks over a network. The word “bot” is short for “robot.” The terms “bot” and “internet bot” are used interchangeably, and both refer to the same broad category of automated software acting on the open web.
The range is wide. A bot can be a few lines of code that fetches a webpage on a schedule. It can be a search engine crawler that indexes billions of pages. It can be an autonomous AI agent that reasons about a page, plans an action, and carries it out across multiple sites. All of these fit inside the same umbrella term.
Bots are not new. They have existed since the early commercial web, when search engine crawlers such as the original Googlebot began indexing pages. What has changed is the variety. A definition that worked twenty years ago, “software that browses the web for you,” now covers everything from a price tracker to a generative AI assistant taking actions on a user’s behalf.
This breadth is useful as a starting point and limiting as an end point. The word “bot” is a category, not a description. Different bots have very different intents and effects, and the most useful way to talk about them is by function rather than as a single thing.
How Bots Work
Most bots make HTTP requests to web servers, in the same way a browser does, and then process the response programmatically. The mechanics fall into a few common patterns:
- Request. The bot sends an HTTP request to a URL. This can be a single fetch on a schedule or a continuous stream of requests across many pages.
- Render or call. Simple bots parse the raw response. More capable bots use headless browsers, which are browsers without a visible interface that render pages and execute JavaScript exactly as a normal browser would. Others bypass the browser entirely and call the site’s API.
- Process. The bot interprets the response. A scraper extracts content. A monitor checks for changes. A search crawler updates its index. An AI agent reasons about what it sees and decides what to do next.
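The request and process steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration of the simplest kind of bot, one that parses the raw response without rendering it. The names `fetch`, `extract_links`, and the `example-bot/0.1` User-Agent string are assumptions made for the example, not part of any real crawler.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags while the HTML is parsed."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url: str, user_agent: str = "example-bot/0.1") -> str:
    """Request step: one HTTP GET with a self-identifying User-Agent.

    The User-Agent value is a hypothetical placeholder.
    """
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(html: str) -> list[str]:
    """Process step: pull link targets out of the raw HTML, no rendering."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

Calling `extract_links(fetch("https://example.com/"))` would run both steps against a live page. A bot that needs JavaScript-rendered content would replace `fetch` with a headless browser, and one that calls the site’s API would skip HTML parsing altogether.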
Several automation frameworks frequently appear in this space. Puppeteer, Playwright, and Selenium are the common names. They are general-purpose tools used for legitimate testing and automation, and they are also the tools many adversarial operators reach for.
The meaningful technical shift in recent years is the move from rule-based scripts to LLM-driven agents.
Older bots followed a fixed script that broke when the page layout changed. Newer agents can interpret a page, decide what to do next, and adapt to changes in real time.
This distinction justifies treating AI agents as a separate category rather than a variation on older bot patterns.
The Major Categories of Bots
The most useful way to talk about bots is by function, not by a binary good-or-bad split. The same software can be helpful in one context and harmful in another. Function gives a clearer picture than moral category.
The six categories below cover most of what shows up in web traffic.
| Bot Category | What It Does | Typical Examples |
|---|---|---|
| Search and Indexing Bots | Crawls public pages so the content can be discovered and indexed for later retrieval. | Googlebot, Bingbot, GPTBot, ClaudeBot |
| AI Agents and Assistants | Reasons about a page and takes action on a user’s behalf, including research, comparison, and transactions. | ChatGPT with browsing, Operator, Perplexity, Claude |
| Monitoring and Operational Bots | Observes sites and services without acting on them, often run by the site owner or an authorized partner. | Uptime monitors, price trackers, status checkers |
| Scrapers and Data Harvesters | Extracts content, prices, or structured data from sites. Same technique is used for legitimate research and for unauthorized extraction. | Academic crawlers, archive projects, LLM training scrapers, data brokers |
| Automation Bots | Runs repetitive workflows against web services, typically authorized by both sides of the integration. | RPA tools, Zapier, n8n, enterprise workflow automation |
| Adversarial Bots | Performs actions that harm the operator of the site they hit, ranging from financial fraud to content theft. | Credential stuffing, payment card testing, inventory hoarding, click fraud, scraping for IP theft |
Here is a closer look at each category and where it fits in modern web traffic.
Search and Indexing Bots
Search and indexing bots crawl public pages so the content can be found later.
The classic examples are Googlebot and Bingbot for traditional search. AI-related crawlers are a newer addition. Some support AI search and retrieval, while others crawl public content for model training, assistant features, or downstream AI systems.
Major crawlers usually identify themselves through known user agents and may provide robots.txt controls or other documented ways for site owners to manage access. Most sites give these crawlers the same level of access as human visitors, because the visibility they provide is the entire point of a public web page.
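As a rough illustration of how robots.txt controls work, the sketch below uses Python’s standard `urllib.robotparser` module to evaluate a hypothetical robots.txt file. The file contents and the helper names `build_parser` and `is_allowed` are assumptions for the example. Note that robots.txt is advisory: it tells a well-behaved crawler what the site owner wants, and it does not enforce anything by itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keeps one AI crawler out of the whole site and
# keeps every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)
```

With these rules, a crawler identifying as `GPTBot` is disallowed everywhere, while other crawlers can reach public pages but not anything under `/private/`.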
AI Agents and Assistants
AI agents and assistants are software that reason about a page and take action on a user’s behalf.
Examples include ChatGPT, Gemini, Claude, and similar assistants. The agent may research a topic, compare products, fill out a form, or complete a purchase on the user’s instructions.
This category is new and does not fit cleanly into older bot taxonomies. An AI agent can be authorized (a user explicitly delegates a task) or unauthorized (an agent that crawls for training data without site permission). It can also be spoofed (by adversarial software that pretends to be a known assistant).
The same request shape can mean very different things depending on context, which is why this category is treated as a distinct group rather than a subset of search crawlers.
Monitoring and Operational Bots
Monitoring and operational bots watch sites and services without taking action on them. Uptime monitors, price trackers, status checkers, and internal security scanners all fit here. Most are run by an organization on its own infrastructure or by an authorized partner. Their function is observation, not interaction, and they rarely raise any commercial or security concerns when they belong to the site they are monitoring.
Scrapers and Data Harvesters
Scrapers and data harvesters extract content, prices, or structured data from sites. The same technique applies to academic research projects, public archive efforts, competitive intelligence within legal limits, and unauthorized content extraction for data brokers or LLM training sets.
The technical action is the same. The context determines whether the site operator considers it acceptable.
For organizations dealing with unauthorized scraping at scale, calibrated controls for automated scrapers are what the industry uses to separate legitimate research traffic from extraction that hurts the business.
Automation Bots
Automation bots run repetitive workflows against web services. Robotic process automation tools, workflow integrations like Zapier or n8n, and enterprise automation that connects internal systems to web-facing services all fit here. Most are operated by the user’s own team or by a trusted vendor, and most are authorized by both sides of the integration.
The defining trait is bilateral authorization: both sides of the integration know about and accept the automation. When that authorization is missing, the same technical action falls into the adversarial category instead.
Adversarial Bots
Adversarial bots exist to harm the operator of the site they hit. The patterns vary by target, but some common ones cluster around login pages, payment flows, registration forms, and ad networks:
- Credential stuffing: trying stolen username and password combinations against login forms.
- Fake account creation: registering synthetic identities to abuse onboarding incentives or seed downstream fraud.
- Lead generation fraud: flooding forms with fabricated submissions that pollute CRMs and burn sales-team time.
- Click fraud: driving up ad costs without genuine intent.
- Content scraping for IP theft: extracting proprietary content for resale or unauthorized training data.
The pattern matters more than the moral framing. Each one calls for a different response, and grouping them under a single “adversarial” label produces the worst of both worlds: false positives that disrupt legitimate users and false negatives that let the actual harm through.
How Much of the Internet Is Bots?
The honest answer is that the share is large and growing, and the exact figure depends on what you count.
CHEQ’s own network analyzes roughly 6 trillion signals daily across more than 300,000 customer-monitored sites. Within that scope, invalid traffic (traffic flagged as automated, suspicious, or otherwise non-genuine) grew nearly 50% year-over-year in the most recent measurement period, even as overall visit volume across customer sites rose only 3.5%. Outside research groups land in different ranges depending on methodology, but they consistently agree that the share is large and the trend is up.
The composition is shifting as well. The share of traffic from AI agents and LLM-driven crawlers has grown substantially since 2023 and continues to grow.
CHEQ’s own network data shows the same trend from a different angle: even as total invalid traffic has climbed, the share of that traffic classified as malicious has trended down from roughly 11% to under 6% over the past twelve months. The growth is increasingly coming from legitimate-but-undeclared automation, including search-style crawlers that train language models, assistants that browse on a user’s behalf, and a growing population of agents that transact rather than only read.
Search crawlers, monitoring tools, and authorized AI agents make up a real share of what gets counted as “bot” traffic, and most of that traffic is necessary for the modern web to function. Treating all automated traffic as a problem misreads the data.
For more detail on how bot traffic is composed and measured, see the breakdown of bot traffic types and how they are identified.
Are Bots Good or Bad? Why It’s Not That Simple
A bot is not automatically good or bad. A search crawler that indexes a site is helpful. The same crawler scraping prices for a competitor’s training set is not. The code can be identical. The intent and context are what change.
Here is what that looks like in concrete terms.
The same scraping framework can power an internal price tracker and a competitor’s unauthorized scraper, and the appropriate response to each is very different.
The technical signature can look identical. Intent, authorization, and business impact are what differ.
Not every case is this clean. The modern web includes humans, useful bots, adversarial bots, and a growing gray zone where authorization is not obvious. AI agents make this especially clear: an agent completing a task for a user may be acting with user permission, while still falling outside a site’s policy for automated access or transactions.
That is why the question is no longer only “is this a bot.” The more useful question is: what is this entity, what is it trying to do, and what response is appropriate?
How Organizations Identify and Respond to Bots
Organizations identify bots using a combination of signal families and then choose a response that matches the entity and its likely intent.
Identifying Bots
At a high level, the signals fall into four families:
- Behavioral signals. Mouse movement, scroll patterns, typing cadence, and timing across interactions. Human behavior has a recognizable shape. Sustained deviation from that shape is one of the strongest indicators of automated activity. Temporal distribution is another behavioral signal: human traffic shows a measurable sleep-cycle dip, while bot traffic tends to run uniformly across the 24-hour day. CHEQ research on entropy-based detection found the difference is large enough to flag automated traffic at scale with high corroboration against other signal families.
- Network signals. IP reputation, ASN, datacenter origin, and known proxy infrastructure. These signals are easier for adversaries to manipulate than they used to be, which is why they are usually one input among many rather than a primary verdict.
- Fingerprint signals. Browser characteristics, TLS handshake details, and headless browser indicators. Modern automation frameworks try to hide these signals, and modern detection tries to identify the hiding.
- Identity signals. Cross-session consistency, known credentials, and the presence or absence of an authenticated identity. These signals are most useful at higher-risk steps like login, checkout, and account creation.
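The temporal-distribution idea in the behavioral bullet can be made concrete with a small calculation. The sketch below computes the normalized Shannon entropy of request timestamps across the 24 hours of the day: traffic spread evenly around the clock scores close to 1.0, while traffic with quiet hours scores lower. This is an illustration of the general concept, not CHEQ’s detection method. The function names and the 0.98 threshold are assumptions, and any real threshold would need calibration against known traffic and visitor time zones.

```python
import math
from collections import Counter
from datetime import datetime


def hourly_entropy(timestamps: list[datetime]) -> float:
    """Normalized Shannon entropy of requests over the hours of the day.

    Returns a value in [0, 1]: 1.0 means requests are spread perfectly
    evenly across all 24 hours, lower values mean they are concentrated.
    """
    if not timestamps:
        raise ValueError("at least one timestamp is required")
    counts = Counter(ts.hour for ts in timestamps)
    total = len(timestamps)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return entropy / math.log2(24)


def looks_uniform(timestamps: list[datetime], threshold: float = 0.98) -> bool:
    """Flag traffic with no visible quiet period.

    The threshold is a hypothetical value chosen for illustration only.
    """
    return hourly_entropy(timestamps) >= threshold
```

A source that sends one request every hour scores 1.0, while one that is active only between 08:00 and midnight scores about 0.87. On its own this is one weak input, since a scheduled monitor and an adversarial script can both run around the clock.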
In practice, these families work in combination rather than isolation. A single suspicious indicator is rarely enough; consistent patterns across multiple families are what produce a reliable verdict.
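One simple way to picture that corroboration is to count how many signal families independently look suspicious before reaching a verdict. The sketch below is deliberately naive: the `Signals` fields, the `min_families` default of 2, and the verdict labels are all assumptions, and production systems weight signals rather than counting them. The `headless_hint` check relies on the `HeadlessChrome` token that headless Chrome has historically included in its default User-Agent, which any operator can override.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """One boolean per signal family (hypothetical, simplified inputs)."""
    behavioral_anomaly: bool   # behavioral: e.g. no pointer movement
    datacenter_ip: bool        # network: request came from a hosting ASN
    headless_user_agent: bool  # fingerprint: headless browser indicator
    identity_mismatch: bool    # identity: inconsistent across sessions


def headless_hint(user_agent: str) -> bool:
    """Weak fingerprint hint: the default headless Chrome token is present."""
    return "HeadlessChrome" in user_agent


def suspicious_families(signals: Signals) -> int:
    """Count how many of the four families raised a flag."""
    return sum([
        signals.behavioral_anomaly,
        signals.datacenter_ip,
        signals.headless_user_agent,
        signals.identity_mismatch,
    ])


def verdict(signals: Signals, min_families: int = 2) -> str:
    """Require agreement across families before calling traffic automated."""
    if suspicious_families(signals) >= min_families:
        return "likely_automated"
    return "inconclusive"
```

A lone datacenter IP stays “inconclusive” under this rule, which matches the point above that a single indicator is rarely enough.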
Responding to Bots
Modern responses go beyond a single block-or-allow switch. The industry has converged on a spectrum of actions that can be matched to the entity and its assessed intent:
- Allow legitimate traffic without friction
- Monitor traffic that is borderline but not yet acting against the site
- Step-up verification when the situation calls for an extra layer of confidence
- Constrain what an entity is permitted to do (read but not write, browse but not transact)
- Throttle suspicious traffic to limit its impact while gathering more signal
- Misdirect confirmed adversarial traffic toward a controlled environment
- Block entities where there is high confidence and clear policy
This is what proportional enforcement looks like in practice. Binary blocking either over-blocks (cutting off legitimate automation and search crawlers along with bad actors) or under-blocks (letting adversarial traffic through because the only available verdict is “human”). A spectrum lets the response match the risk.
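To show how a spectrum differs from a switch, here is a minimal sketch of a policy function that maps an entity label and a risk score to one of the seven actions listed above. The entity labels, the numeric thresholds, and the `sensitive_step` flag are all hypothetical choices made for illustration. A real policy would be tuned to the organization’s own risk tolerance.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    MONITOR = "monitor"
    STEP_UP = "step_up"
    CONSTRAIN = "constrain"
    THROTTLE = "throttle"
    MISDIRECT = "misdirect"
    BLOCK = "block"


def choose_action(entity: str, risk: float, sensitive_step: bool = False) -> Action:
    """Pick a proportional response (thresholds are illustrative only).

    entity: hypothetical label such as "human", "authorized_agent",
            "unknown", or "confirmed_adversarial".
    risk: score in [0, 1] produced by upstream detection.
    sensitive_step: True for steps like login, checkout, account creation.
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be between 0 and 1")
    if entity == "confirmed_adversarial":
        # Block only at very high confidence; otherwise contain and observe.
        return Action.BLOCK if risk >= 0.9 else Action.MISDIRECT
    if entity == "authorized_agent":
        # Read freely, but limit what the agent may do at sensitive steps.
        return Action.CONSTRAIN if sensitive_step else Action.ALLOW
    if risk < 0.2:
        return Action.ALLOW
    if risk < 0.5:
        return Action.MONITOR
    # Elevated risk: ask for more proof where it matters, slow down elsewhere.
    return Action.STEP_UP if sensitive_step else Action.THROTTLE
```

The same elevated risk score produces a step-up challenge at checkout and throttling on a catalog page, which is the kind of distinction a block-or-allow switch cannot express.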
This is also why the way organizations approach bot management has changed structurally over the last few years. The deeper read covers the detection-to-enforcement pipeline, signal taxonomy, and how teams calibrate response policies to their specific risk tolerance.
Operationally, the spectrum is implemented through policy-based controls across humans, bots, and AI agents rather than as a single static rule applied to everything.
AI Agents and the New Governance Question
AI agents are now a meaningful share of automated web traffic and a category that legacy bot tools were not built to handle. They shop on retail sites, research on healthcare and finance sites, and complete tasks on behalf of human users across the open web.
AI Agents vs Traditional Bots
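The distinction between AI agents and traditional bots is structural, not incremental. Traditional bots follow a fixed script that breaks when the page layout changes, while AI agents interpret a page, decide what to do next, and adapt in real time.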
Groups of AI Agents and the Governance Gap
Three groups of agents matter for this discussion:
- Authorized agents: a user has explicitly delegated a task to ChatGPT, Claude, Gemini, or a similar assistant.
- Unauthorized agents: the operator behind the agent is acting against the site’s policies, often to scrape content for training data.
- Spoofed agents: adversarial software pretends to be a known assistant in order to get the treatment that the real assistant would receive.
These three share an interface but warrant very different verdicts, which is what makes a one-size-fits-all “is this an AI agent” rule inadequate.
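One way to picture the difference is to separate two checks: whether the request really comes from the assistant it claims to be, and whether the site’s policy permits that assistant. Some operators publish the IP ranges their crawlers and agents use, so a claimed identity can be compared against them. The sketch below assumes that approach. The assistant name `ExampleAssistant`, the address ranges (reserved documentation addresses), and the `allowed_by_policy` flag are all hypothetical, and IP matching is only one of several possible verification methods.

```python
import ipaddress

# Hypothetical published ranges for a made-up assistant. The networks are
# reserved documentation ranges, not real operator infrastructure.
PUBLISHED_RANGES = {
    "ExampleAssistant": [ipaddress.ip_network("192.0.2.0/24")],
}


def classify_agent(claimed_name: str, source_ip: str, allowed_by_policy: bool) -> str:
    """Sort a self-declared agent into one of the three groups.

    Returns "authorized", "unauthorized", "spoofed", or "unknown" when
    there are no published ranges to verify the claim against.
    """
    ranges = PUBLISHED_RANGES.get(claimed_name)
    if ranges is None:
        return "unknown"
    address = ipaddress.ip_address(source_ip)
    if not any(address in network for network in ranges):
        # Claims a known identity but comes from somewhere else.
        return "spoofed"
    return "authorized" if allowed_by_policy else "unauthorized"
```

The three outcomes come from identical request headers. Only the source address and the site’s own policy separate them.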
The structural problem this creates is what the industry has started calling the Governance Gap.
Traditional bot tools were built for a binary world: separate humans from bots, then either allow or block the bots.
AI agents do not fit cleanly on either side of that line. Some are valuable visitors. Some are unauthorized. Some are adversarial. The same request shape can mean very different things depending on context, and a tool that only outputs “human or bot” cannot tell those situations apart.
AI agent governance is the framing the industry is using for the structural response to that gap.
A Modern Framework for AI Agent Governance
Any modern approach to this problem has to answer four questions:
- What is hitting the page?
- Can it be trusted?
- On whose behalf is it acting?
- What response is appropriate given the answer to the first three?
Three use case patterns make the abstraction concrete.
The first is retail. AI agents are starting to compare products, complete checkouts, and manage subscriptions on behalf of users, which is why sites need a way to govern AI agents transacting in commerce journeys rather than treating every agent identically.
The second is hybrid journeys, where a human starts a research task, hands off to an agent, then comes back to complete the transaction. Entity type changes mid-flow, which is why hybrid human-agent journeys need governance that does not break when the entity changes.
The third is scraping. Large language model training has created sustained demand for proprietary content, and the line between legitimate research crawling and unauthorized extraction has gotten harder to draw.
The shared thread across all three is that organizations need to govern automated traffic, not only identify and stop it.
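The four questions can also be read as a small data structure that is filled in for every request rather than once per session, which is what keeps a hybrid journey from breaking when the entity changes. The sketch below is one hypothetical way to express that. The `Assessment` fields, the step names, and the response rules are assumptions for illustration, not a description of any product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assessment:
    entity: str               # 1. What is hitting the page? e.g. "human", "ai_agent"
    trusted: bool             # 2. Can it be trusted?
    principal: Optional[str]  # 3. On whose behalf is it acting? None if unknown


def respond(assessment: Assessment, step: str) -> str:
    """4. What response fits, given the first three answers?"""
    if not assessment.trusted:
        # Give a person the chance to prove themselves; stop untrusted automation.
        return "step_up" if assessment.entity == "human" else "block"
    if assessment.entity == "human":
        return "allow"
    if assessment.principal is None:
        return "monitor"
    # Trusted agent with a known principal: read freely, confirm before buying.
    return "step_up" if step == "checkout" else "allow"


def evaluate_journey(steps: list[tuple[str, Assessment]]) -> list[str]:
    """Re-assess every step so a mid-flow change of entity is handled."""
    return [respond(assessment, step) for step, assessment in steps]
```

In a journey where a person researches, an agent compares products, and the person returns to check out, each step is judged on its own assessment, so the handoff does not reset or break the flow.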
Where to Go Next
This guide covers the foundational layer. The links below go deeper into the specific questions that come next:
- Bot traffic in detail. The CHEQ bot traffic guide breaks down each bot traffic type, its business impact, and the signals that distinguish one from another.
- Bot management in practice. A closer look at how organizations approach bot management, including the detection-to-enforcement pipeline and how teams calibrate response policies to their risk tolerance.
Teams ready to put these ideas into practice can see how CHEQ applies policy-based controls across humans, bots, and AI agents in a single governance framework.