I reported the problem to Yahoo! Search. I'm pasting their reply here, since I think it may be of general interest:
Hello Xxx,
Thanks for writing to Yahoo! Search and Directory Support.
I have investigated the issue and it seems that your robots.txt is written in a way that will prevent our crawler from recognizing the excluded directories. When Slurp accesses the robots.txt it will search for either user-agent: slurp OR * (asterisk); once it finds one or the other it will obey the exclusions in that section and not look further. In your specific case that means that Slurp will only see the following:
User-Agent: Slurp
Crawl-Delay: 20
Since Slurp does not look any further, it does not see the exclusions for user-agent: *. To remedy this, I would suggest adding the excluded folder to the user-agent: slurp part of your robots.txt, so it will read:
User-Agent: Slurp
Crawl-Delay: 20
Disallow: /honeypot/
Disallow: /etc/
Once you update this, Slurp should stop crawling those folders within a day or two. I apologize for the inconvenience.
For answers to other questions you may have regarding Yahoo! Search, please see:
http://help.yahoo.com/help/us/ysearch/
For answers to other questions you may have regarding the Yahoo! Directory, please see:
http://help.yahoo.com/help/us/dir/
Xxx
Search & Directory Support
Yahoo! Inc.
Original Message Follows:
Mail-Id: xxx
I'd like to ask about ...
Reporting a Problem
What is your name?
Xxx Xxx
Please enter your question, comment, or suggestion:
Hello,
Slurp is not respecting the robots.txt file at
http://www.example.com/robots.txt
(/honeypot/ is disallowed for all user-agents).
============================================
2006-02-22 (Wed) 14:03:02
"HTTP_HOST: www.example.com"
"HTTP_ACCEPT: */*"
"HTTP_USER_AGENT: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
"HTTP_ACCEPT_ENCODING: gzip, x-gzip"
"REMOTE_ADDR: 66.196.91.137"
"SERVER_PROTOCOL: HTTP/1.0"
"REQUEST_METHOD: GET"
"QUERY_STRING: "
"REQUEST_URI: /honeypot/"
Xxx Xxx
Webmaster @ Example.com
While Viewing:
Yahoo ID: unknown : no amt link
Browser: Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
REMOTE_ADDR: xxx.xxx.xxx.xxx
REMOTE_HOST: unknown
Date Originated: Wednesday February 22, 2006 - 05:42:15
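
For anyone who wants to check this locally: the rule described in the reply (a crawler uses the section that names it and ignores the `*` section) can be reproduced with Python's standard `urllib.robotparser`. This is only a rough sketch. Python's parser is a stand-in and not Slurp's actual code, `ORIGINAL` is my guess at what the old file looked like (the full file is not shown above), and `allowed` and `OtherBot` are made-up names for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of the old file: Slurp has its own section
# with only a Crawl-Delay, and the exclusions live under "*".
ORIGINAL = """\
User-Agent: Slurp
Crawl-Delay: 20

User-Agent: *
Disallow: /honeypot/
Disallow: /etc/
"""

# The layout suggested in the reply: exclusions repeated in the Slurp section.
FIXED = """\
User-Agent: Slurp
Crawl-Delay: 20
Disallow: /honeypot/
Disallow: /etc/

User-Agent: *
Disallow: /honeypot/
Disallow: /etc/
"""


def allowed(robots_txt, agent, path):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for label, text in (("original", ORIGINAL), ("fixed", FIXED)):
        for agent in ("Slurp", "OtherBot"):
            verdict = "allowed" if allowed(text, agent, "/honeypot/") else "blocked"
            print(f"{label:8} {agent:8} /honeypot/ -> {verdict}")
```

With the assumed original layout, the Slurp section matches first and contains no Disallow lines, so /honeypot/ counts as allowed for Slurp while other bots fall back to the `*` section. Repeating the Disallow lines inside the Slurp section closes that gap.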