- Home
- Categorie
- Digital Marketing
- SEO
- Robots.txt e esclusione user-agent
- 
							
							
							
							
							
Robots.txt e esclusione user-agentStavo cercando una versione della pagina di un grosso portale italiano con archive.org ma inserendo il nome del dominio mi son trovato la risposta che il crawling non era abilitato a causa delle impostazioni del robots.txt. 
 A questo punto curiosando come fosse configurato questo robots.txt ho visto che vengono messi in disallow certi user-agent. Controllando anche su giorgiotave.it esiste una lista, ma è diversa da questa.
 La mia domanda a questo punto è: è una best-practise da adottare quella di bloccare certi user-agent?
 Esiste una lista aggiornata e affidabile degli user-agent che è meglio siano bloccati?
 User-agent: Googlebot 
 ************ omesso ******************User-agent: Mediapartners-Google 
 Allow: /User-agent: * 
 *********** omesso **********************Some bots are known to be trouble, particularly those designed to copyentire sites. Please obey robots.txt.User-agent: grub-client 
 Disallow: /User-agent: grub 
 Disallow: /User-agent: looksmart 
 Disallow: /User-agent: WebZip 
 Disallow: /User-agent: larbin 
 Disallow: /User-agent: b2w/0.1 
 Disallow: /User-agent: psbot 
 Disallow: /User-agent: Python-urllib 
 Disallow: /User-agent: NetMechanic 
 Disallow: /User-agent: URL_Spider_Pro 
 Disallow: /User-agent: CherryPicker 
 Disallow: /User-agent: EmailCollector 
 Disallow: /User-agent: EmailSiphon 
 Disallow: /User-agent: WebBandit 
 Disallow: /User-agent: EmailWolf 
 Disallow: /User-agent: ExtractorPro 
 Disallow: /User-agent: CopyRightCheck 
 Disallow: /User-agent: Crescent 
 Disallow: /User-agent: SiteSnagger 
 Disallow: /User-agent: ProWebWalker 
 Disallow: /User-agent: CheeseBot 
 Disallow: /User-agent: LNSpiderguy 
 Disallow: /User-agent: ia_archiver 
 Disallow: /User-agent: ia_archiver/1.6 
 Disallow: /User-agent: Teleport 
 Disallow: /User-agent: TeleportPro 
 Disallow: /User-agent: MIIxpc 
 Disallow: /User-agent: Telesoft 
 Disallow: /User-agent: Website Quester 
 Disallow: /User-agent: moget/2.1 
 Disallow: /User-agent: WebZip/4.0 
 Disallow: /User-agent: WebStripper 
 Disallow: /User-agent: WebSauger 
 Disallow: /User-agent: WebCopier 
 Disallow: /User-agent: NetAnts 
 Disallow: /User-agent: Mister PiX 
 Disallow: /User-agent: WebAuto 
 Disallow: /User-agent: TheNomad 
 Disallow: /User-agent: WWW-Collector-E 
 Disallow: /User-agent: RMA 
 Disallow: /User-agent: libWeb/clsHTTP 
 Disallow: /User-agent: asterias 
 Disallow: /User-agent: httplib 
 Disallow: /User-agent: turingos 
 Disallow: /User-agent: spanner 
 Disallow: /User-agent: InfoNaviRobot 
 Disallow: /User-agent: Harvest/1.5 
 Disallow: /User-agent: Bullseye/1.0 
 Disallow: /User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95) 
 Disallow: /User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0 
 Disallow: /User-agent: CherryPickerSE/1.0 
 Disallow: /User-agent: CherryPickerElite/1.0 
 Disallow: /User-agent: WebBandit/3.50 
 Disallow: /User-agent: NICErsPRO 
 Disallow: /User-agent: Microsoft URL Control - 5.01.4511 
 Disallow: /User-agent: DittoSpyder 
 Disallow: /User-agent: Foobot 
 Disallow: /User-agent: WebmasterWorldForumBot 
 Disallow: /User-agent: SpankBot 
 Disallow: /User-agent: BotALot 
 Disallow: /User-agent: lwp-trivial/1.34 
 Disallow: /User-agent: lwp-trivial 
 Disallow: /User-agent: BunnySlippers 
 Disallow: /User-agent: Microsoft URL Control - 6.00.8169 
 Disallow: /User-agent: URLy Warning 
 Disallow: /User-agent: Wget/1.6 
 Disallow: /User-agent: Wget/1.5.3 
 Disallow: /User-agent: Wget 
 Disallow: /User-agent: LinkWalker 
 Disallow: /User-agent: cosmos 
 Disallow: /User-agent: moget 
 Disallow: /User-agent: hloader 
 Disallow: /User-agent: humanlinks 
 Disallow: /User-agent: LinkextractorPro 
 Disallow: /User-agent: Offline Explorer 
 Disallow: /User-agent: Mata Hari 
 Disallow: /User-agent: LexiBot 
 Disallow: /User-agent: Web Image Collector 
 Disallow: /User-agent: The Intraformant 
 Disallow: /User-agent: True_Robot/1.0 
 Disallow: /User-agent: True_Robot 
 Disallow: /User-agent: BlowFish/1.0 
 Disallow: /User-agent: JennyBot 
 Disallow: /User-agent: MIIxpc/4.2 
 Disallow: /User-agent: BuiltBotTough 
 Disallow: /User-agent: ProPowerBot/2.14 
 Disallow: /User-agent: BackDoorBot/1.0 
 Disallow: /User-agent: toCrawl/UrlDispatcher 
 Disallow: /User-agent: WebEnhancer 
 Disallow: /User-agent: suzuran 
 Disallow: /User-agent: VCI WebViewer VCI WebViewer Win32 
 Disallow: /User-agent: VCI 
 Disallow: /User-agent: Szukacz/1.4 
 Disallow: /User-agent: QueryN Metasearch 
 Disallow: /User-agent: Openfind data gathere 
 Disallow: /User-agent: Openfind 
 Disallow: /User-agent: Xenu's Link Sleuth 1.1c 
 Disallow: /User-agent: Xenu's 
 Disallow: /User-agent: Zeus 
 Disallow: /User-agent: RepoMonkey Bait & Tackle/v1.01 
 Disallow: /User-agent: RepoMonkey 
 Disallow: /User-agent: Microsoft URL Control 
 Disallow: /User-agent: Openbot 
 Disallow: /User-agent: URL Control 
 Disallow: /User-agent: Zeus Link Scout 
 Disallow: /User-agent: Zeus 32297 Webster Pro V2.9 Win32 
 Disallow: /User-agent: Webster Pro 
 Disallow: /User-agent: EroCrawler 
 Disallow: /User-agent: LinkScan/8.1a Unix 
 Disallow: /User-agent: Keyword Density/0.9 
 Disallow: /User-agent: Kenjin Spider 
 Disallow: /User-agent: Iron33/1.0.2 
 Disallow: /User-agent: Bookmark search tool 
 Disallow: /User-agent: GetRight/4.2 
 Disallow: /User-agent: FairAd Client 
 Disallow: /User-agent: Gaisbot 
 Disallow: /User-agent: Aqua_Products 
 Disallow: /User-agent: Radiation Retriever 1.1 
 Disallow: /User-agent: Flaming AttackBot 
 Disallow: /User-agent: Oracle Ultra Search 
 Disallow: /User-agent: MSIECrawler 
 Disallow: /User-agent: PerMan 
 Disallow: /User-agent: searchpreview 
 Disallow: /User-agent: UbiCrawler 
 Disallow: /User-agent: IsraBot 
 Disallow: /User-agent: Orthogaffe 
 Disallow: /User-agent: DOC 
 Disallow: /User-agent: Zao 
 Disallow: /BFrank, 25.04.07