Inktomi/Slurp does not respect robots.txt
My [url=http://en.wikipedia.org/wiki/Honeypot_%28electronics%29]honeypots[/url] tell me that Inktomi/Slurp (Yahoo! Search) does not respect robots.txt.
User-agent: *
Disallow: /honeypot/
This directory is requested periodically by Slurp, e.g.:
User-agent: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Is anyone else seeing the same behavior?
I reported the problem to Yahoo! Search. I'm pasting their reply here, since I think it may be of general interest:
Thanks for writing to Yahoo! Search and Directory Support.
I have investigated the issue and it seems that your robots.txt is written in a way that will prevent our crawler from recognizing the excluded directories. When Slurp accesses the robots.txt, it will search for either user-agent: slurp OR * (asterisk); once it finds one or the other, it will obey the exclusions and not look further. In your specific case that means that Slurp will only see the following:
Since Slurp does not look any further, it does not see the exclusions for user-agent: *. To remedy this, I would suggest adding the excluded folder to the user-agent: slurp part of your robots.txt, so it will read:
Once you update this, Slurp should stop crawling those folders within a day or two. I apologize for the inconvenience.
Search & Directory Support
Original Message Follows:
I'd like to ask about ...
Reporting a Problem
What is your name?
Please enter your question, comment, or suggestion:
Slurp is not respecting the robots.txt file at
(/honeypot/ is disallowed for all user-agents).
2006-02-22 (Wed) 14:03:02
"HTTP_HOST: www.example.com" "HTTP_ACCEPT: */*" "HTTP_USER_AGENT: Mozilla/5.0"
(compatible; Yahoo! Slurp;
"HTTP_ACCEPT_ENCODING: gzip, x-gzip"
Webmaster @ Example.com
Yahoo ID: unknown : no amt link
Browser: Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:22.214.171.124)
Date Originated: Wednesday February 22, 2006 - 05:42:15
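Yahoo's reply says that Slurp obeys the first group that matches it (user-agent: slurp or *) and ignores the rest, so a rule listed only under * is never applied once a slurp group exists. Here is a minimal sketch of that effect using Python's standard urllib.robotparser. The two robots.txt texts are hypothetical (the real slurp section is not shown in this thread, and /private/ is a made-up path), and Python's parser is not Slurp's, though in this case it behaves the same way: a specific match takes precedence over *.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a dedicated "slurp" group.
# The /private/ rule is invented for illustration only.
BEFORE = """\
User-agent: slurp
Disallow: /private/

User-agent: *
Disallow: /honeypot/
"""

# Same file with the honeypot rule repeated inside the "slurp" group,
# along the lines of what the support reply suggests.
AFTER = """\
User-agent: slurp
Disallow: /private/
Disallow: /honeypot/

User-agent: *
Disallow: /honeypot/
"""


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for label, text in (("before", BEFORE), ("after", AFTER)):
        print(
            label,
            "| Slurp may fetch /honeypot/:", allowed(text, "Slurp", "/honeypot/"),
            "| OtherBot may fetch /honeypot/:", allowed(text, "OtherBot", "/honeypot/"),
        )
```

With the first file, the agent matching the slurp group is allowed into /honeypot/ while an agent that falls back to * is not. Repeating the Disallow line in every agent-specific group closes that gap.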