An Alternative Way to Build the Sitemap: Leveraging Googlebot!

giorgiotave

The title caught my attention: An Alternative Approach to XML Sitemaps.

Since there are very knowledgeable people here on Connect.gt, I am always hunting for gems to share. A title like that, in a technical field... I told myself it had to be interesting!

Look at how he generates the sitemap:

You have a list of URLs you want Googlebot to crawl.

You generate an XML sitemap based on a restricted slice of the most recent of these URLs (in this example, the top 20,000).

You monitor your access logs for Googlebot requests.

Whenever Googlebot makes a request to one of the URLs you are monitoring, the URL is removed from your list. This removes it from /uncrawled.xml

This is appended to the appropriate long-term XML Sitemap (e.g. /posts-sitemap-45.xml). This step is optional.
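
To make the quoted steps a bit more concrete, here is a minimal Python sketch of the loop. It is not the article author's code: the function names (`googlebot_paths`, `remove_crawled`, `build_sitemap`), the assumption that the access log uses the Combined Log Format, and the `https://example.com` base URL are all hypothetical and only for illustration. It also trusts the user-agent string, which can be spoofed, so a real setup would probably want to verify Googlebot by other means (for example a reverse DNS check).

```python
import re
from xml.sax.saxutils import escape

# Size of the "restricted slice" used in the quoted example.
SITEMAP_LIMIT = 20_000

# Assumption: access log lines follow the Combined Log Format.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_paths(log_lines):
    """Yield the request path of every line whose user agent mentions Googlebot."""
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            yield match.group("path")


def remove_crawled(uncrawled, log_lines):
    """Split the monitored list into (still_uncrawled, crawled), keeping its order."""
    seen = set(googlebot_paths(log_lines))
    still_uncrawled = [path for path in uncrawled if path not in seen]
    crawled = [path for path in uncrawled if path in seen]
    return still_uncrawled, crawled


def build_sitemap(paths, base_url, limit=SITEMAP_LIMIT):
    """Render the first `limit` paths as an XML sitemap string."""
    entries = "".join(
        f"  <url><loc>{escape(base_url + path)}</loc></url>\n"
        for path in paths[:limit]
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}"
        "</urlset>\n"
    )
```

With the list ordered most recent first, a periodic job could call `remove_crawled` on the new log lines, write `build_sitemap(still_uncrawled, "https://example.com")` to /uncrawled.xml, and (optionally) append the `crawled` paths to the long-term sitemap.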

According to him, it works very well.

I recommend reading it, and I hope to hear your opinions.

kal

Super interesting... while reading it I came across this:

I’ve been intrigued with the idea that the amount of short-term storage available impacting the indexing of lower quality content (I think Gary Illyes said this recently but can’t find a reference)

AH, SO IT WASN'T JUST AN OBSESSION OF MINE AFTER ALL?!

Now I'm going to send him the link to Illyes's tweet.

In any case, I think this article, which @lowlevel wrote quite a long time ago, is VERY relevant:

http://www.lowlevel.it/seo-serendipita-cosa-si-scopre-su-googlebot-quando-meno-te-laspetti/

It even comes with a video:

So the 1000-URL limit is something very concrete, "hardcoded" into Googlebot's crawling behavior.

This certainly has an impact on XML sitemaps as well.