Web Crawler Reference

A reference guide to 31 web crawlers, bots, and spiders, grouped by purpose, with each one's user agent string and operator.

Search Engine: 7
AI & Machine Learning: 11
Social Media: 8
SEO & Analytics Tools: 5

Search Engine Crawlers

Major search engines that index web content for search results

| Bot Name | Operator | User Agent (truncated) |
|---|---|---|
| Googlebot | Google | Googlebot/2.1; +http://www.google.com/bot.html... |
| Googlebot Smartphone | Google | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X) Apple... |
| Bingbot | Microsoft | Mozilla/5.0 (compatible; bingbot/2.0; +http://www.... |
| DuckDuckBot | DuckDuckGo | DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckb... |
| Slurp | Yahoo | Mozilla/5.0 (compatible; Yahoo! Slurp; http://help... |
| Baiduspider | Baidu | Mozilla/5.0 (compatible; Baiduspider/2.0; +http://... |
| YandexBot | Yandex | Mozilla/5.0 (compatible; YandexBot/3.0; +http://ya... |

AI & Machine Learning Crawlers

Crawlers and robots.txt tokens used by AI companies for model training, AI-powered search, and user-initiated page fetches

| Bot Name | Operator | User Agent (truncated) |
|---|---|---|
| GPTBot | OpenAI | GPTBot/1.0 (+https://openai.com/gptbot)... |
| ChatGPT-User | OpenAI | ChatGPT-User (+https://openai.com/bot)... |
| OAI-SearchBot | OpenAI | OAI-SearchBot/1.0 (+https://openai.com/searchbot)... |
| ClaudeBot | Anthropic | ClaudeBot/1.0 (+https://www.anthropic.com)... |
| Claude-Web | Anthropic | Claude-Web (+https://www.anthropic.com)... |
| Google-Extended | Google | Google-Extended... |
| PerplexityBot | Perplexity AI | PerplexityBot (+https://perplexity.ai)... |
| CCBot | Common Crawl | CCBot/2.0 (https://commoncrawl.org/faq/)... |
| Bytespider | ByteDance | Bytespider (https://zhanzhang.toutiao.com/)... |
| cohere-ai | Cohere | cohere-ai... |
| Amazonbot | Amazon | Amazonbot/0.1 (+https://developer.amazon.com/amazo... |

Social Media Crawlers

Crawlers that fetch content for link previews and sharing

| Bot Name | Operator | Robots.txt | User Agent (truncated) |
|---|---|---|---|
| Twitterbot | X (Twitter) | ~ | Twitterbot/1.0... |
| FacebookBot | Meta | ~ | facebookexternalhit/1.1 (+http://www.facebook.com/... |
| LinkedInBot | LinkedIn | ~ | LinkedInBot/1.0 (compatible; Mozilla/5.0; +http://... |
| Applebot | Apple | | Mozilla/5.0 (Applebot/0.1; +http://www.apple.com/g... |
| Slackbot | Slack | ~ | Slackbot-LinkExpanding 1.0 (+https://api.slack.com... |
| Discordbot | Discord | ~ | Mozilla/5.0 (compatible; Discordbot/2.0; +https://... |
| TelegramBot | Telegram | ~ | TelegramBot (like TwitterBot)... |
| WhatsApp | Meta/WhatsApp | ~ | WhatsApp/2.0 (+https://www.whatsapp.com/)... |

SEO & Analytics Tools

Professional SEO tools that crawl sites for analysis

| Bot Name | Operator | User Agent (truncated) |
|---|---|---|
| AhrefsBot | Ahrefs | Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ah... |
| SemrushBot | Semrush | Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://... |
| MJ12bot | Majestic | Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj... |
| Screaming Frog SEO Spider | Screaming Frog | Screaming Frog SEO Spider... |
| DotBot | Moz | DotBot/1.2 (+https://opensiteexplorer.org/dotbot)... |
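
The four tables above can be used to label incoming requests by crawler category. The following is a minimal Python sketch, not a definitive detector: the names `CRAWLER_CATEGORIES` and `classify_user_agent` are hypothetical, the tokens are taken from the bot names and user agent strings listed above, and case-insensitive substring matching is an assumption of this example. It also assumes the `googlebot` token appears in the full Googlebot Smartphone string, which is truncated in the table.

```python
from typing import Optional

# Tokens come from the tables above. Matching them as case-insensitive
# substrings is an assumption of this sketch, not a documented rule.
CRAWLER_CATEGORIES = {
    "Search Engine": [
        "googlebot", "bingbot", "duckduckbot", "slurp",
        "baiduspider", "yandexbot",
    ],
    "AI & Machine Learning": [
        "gptbot", "chatgpt-user", "oai-searchbot", "claudebot",
        "claude-web", "google-extended", "perplexitybot", "ccbot",
        "bytespider", "cohere-ai", "amazonbot",
    ],
    "Social Media": [
        "twitterbot", "facebookexternalhit", "linkedinbot", "applebot",
        "slackbot", "discordbot", "telegrambot", "whatsapp",
    ],
    "SEO & Analytics Tools": [
        "ahrefsbot", "semrushbot", "mj12bot",
        "screaming frog seo spider", "dotbot",
    ],
}


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the category of the first matching token, or None."""
    ua = user_agent.lower()
    for category, tokens in CRAWLER_CATEGORIES.items():
        if any(token in ua for token in tokens):
            return category
    return None


if __name__ == "__main__":
    print(classify_user_agent("GPTBot/1.0 (+https://openai.com/gptbot)"))
```

A user agent header is self-reported and can be spoofed, so a label produced this way is a hint rather than proof of who is crawling.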

Example robots.txt for Maximum Crawler Access

# Allow all crawlers
User-agent: *
Allow: /

# Explicitly allow AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: CCBot
Allow: /

# Sitemap location
Sitemap: https://example.com/sitemap.xml
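
To confirm that rules like these behave as intended, the file can be parsed with Python's standard `urllib.robotparser` module. This is a sketch under stated assumptions: the `build_parser` helper and the test URL `https://example.com/any/page` are hypothetical, and the rules are pasted inline rather than fetched from a live site.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, inlined so the sketch needs no network access.
ROBOTS_TXT = """\
# Allow all crawlers
User-agent: *
Allow: /

# Explicitly allow AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: CCBot
Allow: /

# Sitemap location
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Check a listed AI crawler and one that only matches the wildcard group.
for agent in ("GPTBot", "ClaudeBot", "Bingbot"):
    allowed = parser.can_fetch(agent, "https://example.com/any/page")
    print(f"{agent}: allowed={allowed}")

# site_maps() is available in Python 3.8 and later.
print(parser.site_maps())
```

Because the wildcard group already allows everything, the per-crawler groups do not change the outcome here; they only make the intent explicit.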

This site uses a comprehensive robots.txt that explicitly welcomes all major crawlers.
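
A file with one explicit allow group per crawler can be generated rather than written by hand. The sketch below is illustrative only: `render_robots_txt` is a hypothetical helper, and the agent list and sitemap URL are placeholders taken from the example on this page.

```python
from typing import Iterable


def render_robots_txt(agents: Iterable[str], sitemap_url: str) -> str:
    """Build robots.txt text with a wildcard group plus one group per agent."""
    lines = ["# Allow all crawlers", "User-agent: *", "Allow: /", ""]
    for agent in agents:
        # One explicit allow group per named crawler.
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    ai_agents = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "CCBot"]
    print(render_robots_txt(ai_agents, "https://example.com/sitemap.xml"))
```

Keeping the crawler names in a list makes it straightforward to add or remove a group when a new crawler appears.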
