Inktomi/Slurp does not respect robots.txt
My [url=http://en.wikipedia.org/wiki/Honeypot_%28electronics%29]honeypots[/url] tell me that Inktomi/Slurp (Yahoo! Search) does not respect robots.txt.
Example:
User-agent: *
Disallow: /honeypot/
This directory is requested periodically by Slurp, e.g.:
[url=http://whois.sc/66.196.91.15]66.196.91.15[/url]
User-agent: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

Is anyone else seeing the same behavior?
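As a sanity check, here is a minimal sketch using Python's standard-library robots.txt parser: with only the `User-agent: *` group quoted above, any compliant crawler (including Slurp) should stay out of the honeypot.

```python
# Parse the robots.txt quoted above with Python's stdlib parser and ask
# whether Slurp may fetch the honeypot. Under the robots exclusion
# standard, the "*" group applies to every crawler that has no
# user-agent group of its own.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /honeypot/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# With only a "*" group present, no compliant crawler should fetch it.
print(rp.can_fetch("Yahoo! Slurp", "/honeypot/"))  # → False
```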
-
I reported the problem to Yahoo! Search. Here is a copy-and-paste of their reply, which I think is of public interest:
Hello Xxx,
Thanks for writing to Yahoo! Search and Directory Support.
I have investigated the issue, and it seems that your robots.txt is written in a way that prevents our crawler from recognizing the excluded directories. When Slurp accesses the robots.txt, it searches for either user-agent: slurp or * (asterisk); once it finds one or the other, it obeys those exclusions and does not look further. In your specific case, that means Slurp will only see the following:
User-Agent: Slurp
Crawl-Delay: 20

Since Slurp does not look any further, it does not see the exclusions for user-agent: *. To remedy this, I would suggest adding the excluded folders to the user-agent: slurp part of your robots.txt, so it will read:
User-Agent: Slurp
Crawl-Delay: 20
Disallow: /honeypot/
Disallow: /etc/

Once you update this, Slurp should stop crawling those folders within a day or two. I apologize for the inconvenience.
For answers to other questions you may have regarding Yahoo! Search, please see:
http://help.yahoo.com/help/us/ysearch/
For answers to other questions you may have regarding the Yahoo! Directory, please see:
http://help.yahoo.com/help/us/dir/

Xxx
Search & Directory Support
Yahoo! Inc.

Original Message Follows:
Mail-Id: xxx
I'd like to ask about ...
Reporting a Problem

What is your name?
Xxx Xxx

Please enter your question, comment, or suggestion:
Hello,

Slurp is not respecting the robots.txt file at
http://www.example.com/robots.txt
(/honeypot/ is disallowed for all user-agents).

============================================
2006-02-22 (Wed) 14:03:02
"HTTP_HOST: www.example.com"
"HTTP_ACCEPT: */*"
"HTTP_USER_AGENT: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
"HTTP_ACCEPT_ENCODING: gzip, x-gzip"
"REMOTE_ADDR: 66.196.91.137"
"SERVER_PROTOCOL: HTTP/1.0"
"REQUEST_METHOD: GET"
"QUERY_STRING: "
"REQUEST_URI: /honeypot/"

Xxx Xxx
Webmaster @ Example.com

While Viewing:
Yahoo ID: unknown : no amt link
Browser: Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
REMOTE_ADDR: xxx.xxx.xxx.xxx
REMOTE_HOST: unknown
Date Originated: Wednesday February 22, 2006 - 05:42:15
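The group-selection rule Yahoo! describes can be reproduced with Python's standard-library robots.txt parser, which follows the same convention: a crawler obeys the most specific matching user-agent group and ignores the "*" group entirely once a group of its own exists. A sketch, assuming the original file contained a Slurp group with only a Crawl-delay:

```python
# Demonstrate why Slurp crawled the honeypot, and why Yahoo!'s
# suggested fix works, using Python's stdlib robots.txt parser.
from urllib.robotparser import RobotFileParser

def slurp_may_fetch(robots_txt: str, path: str = "/honeypot/") -> bool:
    """Parse a robots.txt string and ask whether Slurp may fetch path."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch("Yahoo! Slurp", path)

# The original file: Slurp matches its own group (Crawl-delay only),
# so the Disallow under "*" never applies to it.
before = """\
User-agent: Slurp
Crawl-delay: 20

User-agent: *
Disallow: /honeypot/
"""

# The fix Yahoo! suggests: repeat the exclusions inside Slurp's group.
after = """\
User-agent: Slurp
Crawl-delay: 20
Disallow: /honeypot/
Disallow: /etc/

User-agent: *
Disallow: /honeypot/
"""

print(slurp_may_fetch(before))  # → True  (honeypot gets crawled)
print(slurp_may_fetch(after))   # → False (honeypot excluded)
```

So, strictly by that convention, Slurp was honoring the file as it parsed it; the surprise is only that a group with nothing but a Crawl-delay still counts as "Slurp's group" and shadows the exclusions under `*`.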