Robots.txt and user-agent exclusion
I was looking for an archived version of a page from a large Italian portal on archive.org, but when I entered the domain name I got a response saying that crawling was not enabled because of the site's robots.txt settings.
Curious about how that robots.txt was configured, I looked at it and saw that certain user-agents are set to Disallow. I also checked giorgiotave.it, which has a list too, but it is different from this one.
So my question is: is blocking certain user-agents a best practice worth adopting?
Is there an up-to-date, reliable list of the user-agents that are best blocked?
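To see what rules of this kind actually do, here is a minimal sketch using `urllib.robotparser` from the Python standard library. It parses a three-entry excerpt of the file quoted below and asks whether a given user-agent may fetch a URL. The `is_allowed` helper, the `ExampleBot` agent name, and the example.com URL are assumptions made up for illustration; they do not come from the portal.

```python
from urllib.robotparser import RobotFileParser

# Short excerpt of the quoted robots.txt (three of the blocked agents).
ROBOTS_EXCERPT = """\
User-agent: ia_archiver
Disallow: /
User-agent: Wget
Disallow: /
User-agent: WebCopier
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_EXCERPT) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/some-page.html"  # placeholder URL
    for agent in ("ia_archiver", "Wget", "ExampleBot"):
        print(agent, is_allowed(agent, url))
```

With this excerpt, `ia_archiver` is refused for every path, which would be consistent with the message I got from archive.org, while an agent that is not listed is allowed by default. Keep in mind that robots.txt is only a request: it stops crawlers that choose to honor it, not those that ignore it.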
User-agent: Googlebot
************ omitted ******************
User-agent: Mediapartners-Google
Allow: /
User-agent: *
*********** omitted **********************
Some bots are known to be trouble, particularly those designed to copy
entire sites. Please obey robots.txt.
User-agent: grub-client
Disallow: /
User-agent: grub
Disallow: /
User-agent: looksmart
Disallow: /
User-agent: WebZip
Disallow: /
User-agent: larbin
Disallow: /
User-agent: b2w/0.1
Disallow: /
User-agent: psbot
Disallow: /
User-agent: Python-urllib
Disallow: /
User-agent: NetMechanic
Disallow: /
User-agent: URL_Spider_Pro
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: WebBandit
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: CopyRightCheck
Disallow: /
User-agent: Crescent
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: ProWebWalker
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: LNSpiderguy
Disallow: /
User-agent: ia_archiver
Disallow: /
User-agent: ia_archiver/1.6
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: MIIxpc
Disallow: /
User-agent: Telesoft
Disallow: /
User-agent: Website Quester
Disallow: /
User-agent: moget/2.1
Disallow: /
User-agent: WebZip/4.0
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: WebSauger
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: NetAnts
Disallow: /
User-agent: Mister PiX
Disallow: /
User-agent: WebAuto
Disallow: /
User-agent: TheNomad
Disallow: /
User-agent: WWW-Collector-E
Disallow: /
User-agent: RMA
Disallow: /
User-agent: libWeb/clsHTTP
Disallow: /
User-agent: asterias
Disallow: /
User-agent: httplib
Disallow: /
User-agent: turingos
Disallow: /
User-agent: spanner
Disallow: /
User-agent: InfoNaviRobot
Disallow: /
User-agent: Harvest/1.5
Disallow: /
User-agent: Bullseye/1.0
Disallow: /
User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /
User-agent: CherryPickerSE/1.0
Disallow: /
User-agent: CherryPickerElite/1.0
Disallow: /
User-agent: WebBandit/3.50
Disallow: /
User-agent: NICErsPRO
Disallow: /
User-agent: Microsoft URL Control - 5.01.4511
Disallow: /
User-agent: DittoSpyder
Disallow: /
User-agent: Foobot
Disallow: /
User-agent: WebmasterWorldForumBot
Disallow: /
User-agent: SpankBot
Disallow: /
User-agent: BotALot
Disallow: /
User-agent: lwp-trivial/1.34
Disallow: /
User-agent: lwp-trivial
Disallow: /
User-agent: BunnySlippers
Disallow: /
User-agent: Microsoft URL Control - 6.00.8169
Disallow: /
User-agent: URLy Warning
Disallow: /
User-agent: Wget/1.6
Disallow: /
User-agent: Wget/1.5.3
Disallow: /
User-agent: Wget
Disallow: /
User-agent: LinkWalker
Disallow: /
User-agent: cosmos
Disallow: /
User-agent: moget
Disallow: /
User-agent: hloader
Disallow: /
User-agent: humanlinks
Disallow: /
User-agent: LinkextractorPro
Disallow: /
User-agent: Offline Explorer
Disallow: /
User-agent: Mata Hari
Disallow: /
User-agent: LexiBot
Disallow: /
User-agent: Web Image Collector
Disallow: /
User-agent: The Intraformant
Disallow: /
User-agent: True_Robot/1.0
Disallow: /
User-agent: True_Robot
Disallow: /
User-agent: BlowFish/1.0
Disallow: /
User-agent: JennyBot
Disallow: /
User-agent: MIIxpc/4.2
Disallow: /
User-agent: BuiltBotTough
Disallow: /
User-agent: ProPowerBot/2.14
Disallow: /
User-agent: BackDoorBot/1.0
Disallow: /
User-agent: toCrawl/UrlDispatcher
Disallow: /
User-agent: WebEnhancer
Disallow: /
User-agent: suzuran
Disallow: /
User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /
User-agent: VCI
Disallow: /
User-agent: Szukacz/1.4
Disallow: /
User-agent: QueryN Metasearch
Disallow: /
User-agent: Openfind data gathere
Disallow: /
User-agent: Openfind
Disallow: /
User-agent: Xenu's Link Sleuth 1.1c
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Zeus
Disallow: /
User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /
User-agent: RepoMonkey
Disallow: /
User-agent: Microsoft URL Control
Disallow: /
User-agent: Openbot
Disallow: /
User-agent: URL Control
Disallow: /
User-agent: Zeus Link Scout
Disallow: /
User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /
User-agent: Webster Pro
Disallow: /
User-agent: EroCrawler
Disallow: /
User-agent: LinkScan/8.1a Unix
Disallow: /
User-agent: Keyword Density/0.9
Disallow: /
User-agent: Kenjin Spider
Disallow: /
User-agent: Iron33/1.0.2
Disallow: /
User-agent: Bookmark search tool
Disallow: /
User-agent: GetRight/4.2
Disallow: /
User-agent: FairAd Client
Disallow: /
User-agent: Gaisbot
Disallow: /
User-agent: Aqua_Products
Disallow: /
User-agent: Radiation Retriever 1.1
Disallow: /
User-agent: Flaming AttackBot
Disallow: /
User-agent: Oracle Ultra Search
Disallow: /
User-agent: MSIECrawler
Disallow: /
User-agent: PerMan
Disallow: /
User-agent: searchpreview
Disallow: /
User-agent: UbiCrawler
Disallow: /
User-agent: IsraBot
Disallow: /
User-agent: Orthogaffe
Disallow: /
User-agent: DOC
Disallow: /
User-agent: Zao
Disallow: /

BFrank, 25.04.07