# === ALLOWED: Major search engines & legitimate crawlers ===
User-agent: Googlebot
Allow: /

User-agent: Googlebot-Image
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /

User-agent: DuckDuckBot
Allow: /

User-agent: Baiduspider
Allow: /

User-agent: YandexBot
Allow: /

# === BLOCKED: SEO scrapers ===
User-agent: SemrushBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: serpstatbot
Disallow: /

User-agent: DataForSeoBot
Disallow: /

# === BLOCKED: AI training crawlers ===
User-agent: GPTBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

# === BLOCKED: Aggressive/malicious crawlers ===
User-agent: SiteAuditBot
Disallow: /

User-agent: ZoominfoBot
Disallow: /

User-agent: Linguee Bot
Disallow: /

User-agent: meanpathbot
Disallow: /

User-agent: DomainCrawler
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

# === DEFAULT: Allow legitimate unknown bots with crawl delay ===
User-agent: *
Crawl-delay: 10