Web Crawler Reference

Complete guide to 31+ web crawlers, bots, and spiders. Learn about their user agents, operators, and purposes.

Search Engine Crawlers

Crawlers from major search engines that index web content for search results

Bot Name	User Agent (truncated)	Operator	Purpose	Robots.txt
Googlebot	Googlebot/2.1; +http://www.google.com/bot.html...	Google	Primary crawler for Google Search indexing	✓
Googlebot Smartphone	Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X) Apple...	Google	Mobile-first indexing crawler	✓
Bingbot	Mozilla/5.0 (compatible; bingbot/2.0; +http://www....	Microsoft	Primary crawler for Bing Search	✓
DuckDuckBot	DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckb...	DuckDuckGo	Crawler for DuckDuckGo search engine	✓
Slurp	Mozilla/5.0 (compatible; Yahoo! Slurp; http://help...	Yahoo	Yahoo Search crawler	✓
Baiduspider	Mozilla/5.0 (compatible; Baiduspider/2.0; +http://...	Baidu	Primary crawler for Baidu (Chinese search engine)	✓
YandexBot	Mozilla/5.0 (compatible; YandexBot/3.0; +http://ya...	Yandex	Primary crawler for Yandex (Russian search engine)	✓

AI & Machine Learning Crawlers

Crawlers used by AI companies to collect training data and to fetch pages for AI search and assistant features

Bot Name	User Agent (truncated)	Operator	Purpose	Robots.txt
GPTBot	GPTBot/1.0 (+https://openai.com/gptbot)...	OpenAI	Collects training data for GPT models	✓
ChatGPT-User	ChatGPT-User (+https://openai.com/bot)...	OpenAI	Fetches pages when users share URLs with ChatGPT	✓
OAI-SearchBot	OAI-SearchBot/1.0 (+https://openai.com/searchbot)...	OpenAI	OpenAI's search feature crawler	✓
ClaudeBot	ClaudeBot/1.0 (+https://www.anthropic.com)...	Anthropic	Collects training data for Claude AI models	✓
Claude-Web	Claude-Web (+https://www.anthropic.com)...	Anthropic	Fetches web content for Claude's web features	✓
Google-Extended	Google-Extended...	Google	Robots.txt control token for use of crawled content in Google's AI models; not a separate crawler and has no user-agent string of its own	✓
PerplexityBot	PerplexityBot (+https://perplexity.ai)...	Perplexity AI	Crawler for Perplexity AI search engine	✓
CCBot	CCBot/2.0 (https://commoncrawl.org/faq/)...	Common Crawl	Builds the open Common Crawl dataset used by many AI projects	✓
Bytespider	Bytespider (https://zhanzhang.toutiao.com/)...	ByteDance	ByteDance/TikTok's web crawler	✓
cohere-ai	cohere-ai...	Cohere	Training data collection for Cohere AI models	✓
Amazonbot	Amazonbot/0.1 (+https://developer.amazon.com/amazo...	Amazon	Amazon's crawler for Alexa and other AI services	✓

Social Media Crawlers

Crawlers that fetch content for link previews and sharing

Bot Name	User Agent (truncated)	Operator	Purpose	Robots.txt
Twitterbot	Twitterbot/1.0...	X (Twitter)	Fetches content for Twitter Card previews	~
FacebookBot	facebookexternalhit/1.1 (+http://www.facebook.com/...	Meta	Fetches Open Graph data for Facebook shares	~
LinkedInBot	LinkedInBot/1.0 (compatible; Mozilla/5.0; +http://...	LinkedIn	Fetches content for LinkedIn post previews	~
Applebot	Mozilla/5.0 (Applebot/0.1; +http://www.apple.com/g...	Apple	Crawler for Siri and Spotlight suggestions	✓
Slackbot	Slackbot-LinkExpanding 1.0 (+https://api.slack.com...	Slack	Fetches content for link previews in Slack	~
Discordbot	Mozilla/5.0 (compatible; Discordbot/2.0; +https://...	Discord	Fetches content for link embeds in Discord	~
TelegramBot	TelegramBot (like TwitterBot)...	Telegram	Fetches content for Telegram link previews	~
WhatsApp	WhatsApp/2.0 (+https://www.whatsapp.com/)...	Meta/WhatsApp	Fetches link previews for WhatsApp shares	~

SEO & Analytics Tools

Professional SEO tools that crawl sites for analysis

Bot Name	User Agent (truncated)	Operator	Purpose	Robots.txt
AhrefsBot	Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ah...	Ahrefs	Backlink analysis and SEO research	✓
SemrushBot	Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://...	Semrush	SEO and competitive analysis	✓
MJ12bot	Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj...	Majestic	Link intelligence and SEO metrics	✓
Screaming Frog SEO Spider	Screaming Frog SEO Spider...	Screaming Frog	Technical SEO auditing tool	✓
DotBot	DotBot/1.2 (+https://opensiteexplorer.org/dotbot)...	Moz	Moz's link analysis crawler	✓
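
Each crawler in the tables above can be recognized by a token inside its User-Agent header. The following is a minimal Python sketch of how a log processor might map a raw header to an operator and category. The `CRAWLER_TOKENS` table, the category labels, and the `classify_user_agent` function are hypothetical names chosen for illustration; the table covers only a few of the listed bots, and case-insensitive substring matching is an assumption rather than a rule any operator documents. A User-Agent header can also be spoofed, so a match is a hint, not proof of identity.

```python
from typing import Optional, Tuple

# Hypothetical lookup table: lowercase token -> (operator, category).
# Order matters because matching is by substring: the TelegramBot user
# agent contains "TwitterBot", so "telegrambot" must be checked first.
CRAWLER_TOKENS = {
    "googlebot": ("Google", "search"),
    "bingbot": ("Microsoft", "search"),
    "duckduckbot": ("DuckDuckGo", "search"),
    "gptbot": ("OpenAI", "ai"),
    "claudebot": ("Anthropic", "ai"),
    "ccbot": ("Common Crawl", "ai"),
    "telegrambot": ("Telegram", "social"),
    "twitterbot": ("X (Twitter)", "social"),
    "facebookexternalhit": ("Meta", "social"),
    "ahrefsbot": ("Ahrefs", "seo"),
    "semrushbot": ("Semrush", "seo"),
}


def classify_user_agent(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (operator, category) for a known crawler token, else None."""
    ua = user_agent.lower()
    # Dicts preserve insertion order, so earlier tokens win on overlap.
    for token, info in CRAWLER_TOKENS.items():
        if token in ua:
            return info
    return None


if __name__ == "__main__":
    samples = [
        "GPTBot/1.0 (+https://openai.com/gptbot)",
        "TelegramBot (like TwitterBot)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ]
    for sample in samples:
        print(sample, "->", classify_user_agent(sample))
```

The ordering comment reflects a real pitfall visible in the Social Media table: a naive substring check for "twitterbot" would also match Telegram's user agent.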

Example robots.txt for Maximum Crawler Access

# Allow all crawlers
User-agent: *
Allow: /

# Explicitly allow AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: CCBot
Allow: /

# Sitemap location
Sitemap: https://example.com/sitemap.xml
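
One way to confirm that a robots.txt like the one above grants the access it intends is to parse it with Python's standard `urllib.robotparser` module and query it per user agent. This is a sketch only: `build_parser`, `check_access`, the `ExampleBot` agent, and the page URL are hypothetical names for illustration, and the standard-library parser may interpret edge cases differently from any individual crawler.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from this page, embedded as a string so the
# sketch runs without network access.
ROBOTS_TXT = """\
# Allow all crawlers
User-agent: *
Allow: /

# Explicitly allow AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: CCBot
Allow: /

# Sitemap location
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def check_access(parser: RobotFileParser, agents, url: str) -> dict:
    """Map each user agent to whether it may fetch the given URL."""
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # "ExampleBot" is a made-up agent with no group of its own, so it
    # falls back to the "User-agent: *" group.
    agents = ["GPTBot", "ClaudeBot", "CCBot", "ExampleBot"]
    for agent, ok in check_access(parser, agents, "https://example.com/some/page").items():
        print(f"{agent}: {'allowed' if ok else 'blocked'}")
    # site_maps() requires Python 3.8 or later.
    print("Sitemaps:", parser.site_maps())
```

Because the `User-agent: *` group already allows everything, the named groups do not change the outcome for this file; they make the intent explicit for operators who look for their own token.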

This site uses a comprehensive robots.txt that explicitly welcomes all major crawlers.
