Web Crawler Reference
Complete guide to 31+ web crawlers, bots, and spiders. Learn about their user agents, operators, and purposes.
| Category | Crawlers |
|---|---|
| Search Engine | 7 |
| AI & Machine Learning | 11 |
| Social Media | 8 |
| SEO & Analytics Tools | 5 |
Search Engine Crawlers
Major search engines that index web content for search results
| Bot Name | User Agent | Operator | Robots.txt | Docs |
|---|---|---|---|---|
| Googlebot | `Googlebot/2.1; +http://www.google.com/bot.html...` | Google | ✓ | Docs → |
| Googlebot Smartphone | `Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X) Apple...` | Google | ✓ | Docs → |
| Bingbot | `Mozilla/5.0 (compatible; bingbot/2.0; +http://www....` | Microsoft | ✓ | Docs → |
| DuckDuckBot | `DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckb...` | DuckDuckGo | ✓ | Docs → |
| Slurp | `Mozilla/5.0 (compatible; Yahoo! Slurp; http://help...` | Yahoo | ✓ | Docs → |
| Baiduspider | `Mozilla/5.0 (compatible; Baiduspider/2.0; +http://...` | Baidu | ✓ | Docs → |
| YandexBot | `Mozilla/5.0 (compatible; YandexBot/3.0; +http://ya...` | Yandex | ✓ | Docs → |
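As a minimal sketch of how the product tokens in this table might be used, the following matches a `User-Agent` header against the listed search engine crawlers. The function name `identify_crawler` and the dictionary layout are illustrative assumptions, not part of any crawler's documentation. A match only shows what a client claims to be, since user agent strings can be spoofed.

```python
# Product tokens and operators taken from the table above.
# The mapping and function name are hypothetical, for illustration only.
KNOWN_CRAWLERS = {
    "Googlebot": "Google",
    "bingbot": "Microsoft",
    "DuckDuckBot": "DuckDuckGo",
    "Slurp": "Yahoo",
    "Baiduspider": "Baidu",
    "YandexBot": "Yandex",
}


def identify_crawler(user_agent):
    """Return (token, operator) for the first known token found, else None."""
    ua = user_agent.lower()
    for token, operator in KNOWN_CRAWLERS.items():
        # Case-insensitive substring match on the product token.
        if token.lower() in ua:
            return token, operator
    return None


# Shortened example strings, not the full user agents.
print(identify_crawler("Mozilla/5.0 (compatible; bingbot/2.0)"))
print(identify_crawler("Mozilla/5.0 (Windows NT 10.0) Firefox/120.0"))
```

The first call prints `('bingbot', 'Microsoft')` and the second prints `None`.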
AI & Machine Learning Crawlers
Crawlers used by AI companies and dataset providers for training language models, AI search, and user-requested page fetches
| Bot Name | User Agent | Operator | Robots.txt | Docs |
|---|---|---|---|---|
| GPTBot | `GPTBot/1.0 (+https://openai.com/gptbot)...` | OpenAI | ✓ | Docs → |
| ChatGPT-User | `ChatGPT-User (+https://openai.com/bot)...` | OpenAI | ✓ | Docs → |
| OAI-SearchBot | `OAI-SearchBot/1.0 (+https://openai.com/searchbot)...` | OpenAI | ✓ | Docs → |
| ClaudeBot | `ClaudeBot/1.0 (+https://www.anthropic.com)...` | Anthropic | ✓ | Docs → |
| Claude-Web | `Claude-Web (+https://www.anthropic.com)...` | Anthropic | ✓ | Docs → |
| Google-Extended | `Google-Extended...` | Google | ✓ | Docs → |
| PerplexityBot | `PerplexityBot (+https://perplexity.ai)...` | Perplexity AI | ✓ | Docs → |
| CCBot | `CCBot/2.0 (https://commoncrawl.org/faq/)...` | Common Crawl | ✓ | Docs → |
| Bytespider | `Bytespider (https://zhanzhang.toutiao.com/)...` | ByteDance | ✓ | Docs → |
| cohere-ai | `cohere-ai...` | Cohere | ✓ | Docs → |
| Amazonbot | `Amazonbot/0.1 (+https://developer.amazon.com/amazo...` | Amazon | ✓ | Docs → |
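Each crawler listed here can be addressed in robots.txt by its bot name. As a rough sketch, assuming you want one explicit `Allow: /` group per crawler, the groups could be generated from a list of tokens. The helper name `build_allow_groups` is hypothetical.

```python
# Tokens copied from the Bot Name column above; trim the list as needed.
AI_CRAWLER_TOKENS = [
    "GPTBot",
    "ClaudeBot",
    "Google-Extended",
    "PerplexityBot",
    "CCBot",
]


def build_allow_groups(tokens):
    """Return robots.txt text with one 'Allow: /' group per token."""
    groups = [f"User-agent: {token}\nAllow: /" for token in tokens]
    # Blank lines separate groups; the file ends with a single newline.
    return "\n\n".join(groups) + "\n"


print(build_allow_groups(AI_CRAWLER_TOKENS))
```

Swapping `Allow: /` for `Disallow: /` in the template would produce the opposite policy for the same tokens.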
SEO & Analytics Tools
Professional SEO tools that crawl sites for analysis
| Bot Name | User Agent | Operator | Robots.txt | Docs |
|---|---|---|---|---|
| AhrefsBot | `Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ah...` | Ahrefs | ✓ | Docs → |
| SemrushBot | `Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://...` | Semrush | ✓ | Docs → |
| MJ12bot | `Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj...` | Majestic | ✓ | Docs → |
| Screaming Frog SEO Spider | `Screaming Frog SEO Spider...` | Screaming Frog | ✓ | Docs → |
| DotBot | `DotBot/1.2 (+https://opensiteexplorer.org/dotbot)...` | Moz | ✓ | Docs → |
Example robots.txt for Maximum Crawler Access
# Allow all crawlers
User-agent: *
Allow: /
# Explicitly allow AI crawlers
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: CCBot
Allow: /
# Sitemap location
Sitemap: https://example.com/sitemap.xml

This site uses a comprehensive robots.txt that explicitly welcomes all major crawlers.
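One way to sanity-check a file like the example above is to parse it with Python's standard `urllib.robotparser` module and ask whether specific crawlers may fetch a URL. This is a sketch under stated assumptions: the helper name `check_access`, the shortened robots.txt text, and the test URL are illustrative, and `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the example robots.txt, for illustration.
ROBOTS_TXT = """\
# Allow all crawlers
User-agent: *
Allow: /

# Explicitly allow AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

# Sitemap location
Sitemap: https://example.com/sitemap.xml
"""


def check_access(robots_txt, agents, url):
    """Parse robots.txt text; return ({agent: allowed}, sitemap list)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = {agent: parser.can_fetch(agent, url) for agent in agents}
    return allowed, parser.site_maps()


allowed, sitemaps = check_access(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "SomeOtherBot"],  # last one is a made-up name
    "https://example.com/any/page",
)
print(allowed)
print(sitemaps)
```

With this input every agent is reported as allowed, including the made-up `SomeOtherBot`, which falls back to the `User-agent: *` group.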
Social Media Crawlers
Crawlers that fetch content for link previews and sharing