asymptoticdesign

About Google Sitemaps

A Google sitemap is a way to submit a site to search engines such as Google, Yahoo and Ask.com, and soon MSN, and to view statistics and feedback on how the site is crawled (in Google's case through the Google Webmaster Tools). An XML or text file, which acts as a mini-database of the site's crawlable URLs, is created, uploaded to the site's server and submitted to Google and Yahoo.

To submit sitemaps to Yahoo, follow the link "Submit Your Site for Free" on the "Submit Your Site to Yahoo" page. To submit sitemaps to Ask.com, as explained in the Ask.com FAQ, ping the sitemap URL via http://submissions.ask.com/ping?sitemap=http%3A//yoursitemapurl
MSN has new Webmaster Tools for submitting sitemaps and viewing crawling stats at the Live Search Webmaster Center.
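
The Ask.com ping URL above embeds the sitemap address in percent-encoded form. A minimal Python sketch of how such a ping URL could be assembled follows. The function name build_ask_ping_url and the example sitemap address are illustrative assumptions, and the sketch only builds the string without sending any request.

```python
from urllib.parse import quote

# Ping endpoint as quoted in the text above.
ASK_PING_ENDPOINT = "http://submissions.ask.com/ping?sitemap="


def build_ask_ping_url(sitemap_url):
    """Return the ping URL for a sitemap (hypothetical helper name)."""
    # Keep "/" unescaped so the result matches the pattern shown in the
    # text, where only ":" is percent-encoded (http%3A//...).
    return ASK_PING_ENDPOINT + quote(sitemap_url, safe="/")


# Example address is an assumption, not a real sitemap.
print(build_ask_ping_url("http://www.domain.com/sitemap.xml"))
# prints http://submissions.ask.com/ping?sitemap=http%3A//www.domain.com/sitemap.xml
```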

The main search engines can automatically find the sitemap URL from a Sitemap: line in the robots.txt file, as described on sitemaps.org under "Specifying the sitemap location in robots.txt":

Sitemap: http://www.domain.com/some_sitemap_url.xml
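
To illustrate how a crawler might pick up that line, here is a small Python sketch that scans the text of a robots.txt file for Sitemap: entries. The function name find_sitemaps is hypothetical, the field name is assumed to be matched case-insensitively, and real crawlers may apply further rules.

```python
from typing import List


def find_sitemaps(robots_txt: str) -> List[str]:
    """Collect the URLs given on 'Sitemap:' lines (hypothetical helper)."""
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Split only at the first colon so the URL's own colons survive.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


example = """User-agent: *
Disallow: /private/
Sitemap: http://www.domain.com/some_sitemap_url.xml
"""
print(find_sitemaps(example))
# prints ['http://www.domain.com/some_sitemap_url.xml']
```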

A reference website for sitemaps is sitemaps.org and there are many other external links.
The Google sitemap can be in several formats: a plain text file, UTF-8 encoded, listing crawlable URLs one per line; an XML file as specified in the sitemaps.org protocol, which can give for each URL the date it was last modified, an estimate of its change frequency and its priority relative to the site's other pages, with a separate version for news sitemaps; RSS 2.0 and Atom 0.3 syndication feeds; or OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). Besides the text format and the sitemaps.org XML format, Yahoo accepts feeds in RSS 0.9, RSS 1.0, RSS 2.0 or Atom 0.3 format. Several sitemaps can be submitted together by listing them in a sitemap index file, for example when there is a very large number of URLs or when URLs with a high change frequency are kept in separate sitemaps from the rest. The information provided by sitemaps can help search engines, but there is no guarantee of indexing or of improved page ranking.
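
As an illustration of the sitemaps.org XML format described above, the following Python sketch writes a small sitemap with the standard library. The function name build_sitemap, the dictionary layout of the entries and the example URLs and dates are assumptions made for this sketch. The element names (urlset, url, loc, lastmod, changefreq, priority) and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a UTF-8 encoded XML sitemap as bytes (hypothetical helper).

    Each entry is a dict with a required "loc" key and optional
    "lastmod", "changefreq" and "priority" keys.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # ElementTree escapes characters such as "&" in URLs automatically.
    # The xml_declaration argument needs Python 3.8 or later.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Example data is made up for illustration.
sitemap = build_sitemap([
    {"loc": "http://domain.com/", "lastmod": "2007-01-01",
     "changefreq": "weekly", "priority": 0.8},
    {"loc": "http://domain.com/page.html?a=1&b=2"},
])
print(sitemap.decode("utf-8"))
```

The plain text format needs no such structure: it is simply the same URLs written one per line in a UTF-8 file.
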
Some aspects require care, such as the UTF-8 encoding of non-ASCII characters and the type of URLs listed in the sitemap: these should be publicly accessible documents, not restricted ones, and should return real content rather than, for example, a not-found error page. The URLs listed in the sitemap have to be on the same domain as the submitted sitemap itself. For example, a sitemap submitted to Google at http://domain.com/sitemap.xml should contain only URLs starting with http://domain.com/, not with http://www.domain.com/ (Google Sitemaps treats domain.com and www.domain.com as different domains).
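
The same-domain rule above can be checked before submitting. The Python sketch below compares the scheme and host of each listed URL with those of the sitemap URL. The function name urls_outside_sitemap_host and the example URLs are assumptions, and the check covers only the domain rule described here, not any other validation a search engine may perform.

```python
from urllib.parse import urlsplit


def urls_outside_sitemap_host(sitemap_url, page_urls):
    """Return the page URLs whose scheme or host differs from the sitemap's.

    Hypothetical helper: domain.com and www.domain.com count as different.
    """
    sitemap = urlsplit(sitemap_url)
    expected = (sitemap.scheme, sitemap.netloc.lower())
    offenders = []
    for page_url in page_urls:
        parts = urlsplit(page_url)
        if (parts.scheme, parts.netloc.lower()) != expected:
            offenders.append(page_url)
    return offenders


print(urls_outside_sitemap_host(
    "http://domain.com/sitemap.xml",
    ["http://domain.com/a.html", "http://www.domain.com/b.html"],
))
# prints ['http://www.domain.com/b.html']
```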

Search engines can find URLs from the sitemap alone, without links from other pages. I made a standalone test page that Googlebot found only through an XML sitemap, but it appears in the supplemental index. A page usually stays in the supplemental index when there is some problem with it, so having no links from other pages seems to be treated as a problem. I am now linking to the previously standalone page to see whether that moves it from the supplemental index to the main index.

Google stats and robots.txt analysis

The feedback and stats in the Google Webmaster Tools consist of: lists of URLs Googlebot could not crawl and why; Google search queries that lead to the website (top search queries and top search query clicks); the most common words in the content and in the anchor text of inbound links; lists of external inbound links and internal links; and the download status of, and feedback on, the robots.txt file. A Google sitemap file is not required to view this information. Google requires verification before making these statistics available, done either by uploading a file with a specific name to the server or by including a verification meta tag in the home page. It is good practice for a website to return a 404 (Not Found) HTTP response rather than 200 (OK) for a non-existent URI, so that search engines can tell which URIs are valid, and Google needs this set-up for verification by file. Verification by meta tag is the alternative for websites that redirect non-existent URLs to an error page returning HTTP status 200 (OK).
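
Whether a site returns 404 for non-existent URIs can be tested by requesting an address that almost certainly does not exist. The Python sketch below does this with the standard library. The function names fetch_status and returns_proper_404 and the random probe path are assumptions for this sketch, and it says nothing about how Google performs its own check.

```python
import urllib.error
import urllib.request
import uuid


def fetch_status(url):
    """Return the final HTTP status code for url (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx and 5xx responses; the code is on the error.
        return error.code


def returns_proper_404(site_root, fetch=fetch_status):
    """Probe a random, presumably non-existent path under site_root.

    Returns True if the server answers 404, False for anything else,
    for example a redirect to an error page that returns 200.
    """
    probe_url = site_root.rstrip("/") + "/" + uuid.uuid4().hex + ".html"
    return fetch(probe_url) == 404
```

A call such as returns_proper_404("http://www.domain.com/") performs a real request. The fetch parameter exists so that the check can be tried with a stand-in function instead of network access.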

The Google Webmaster Tools panel also provides a feedback and testing tool for the robots.txt file, explained on the Google Blog in "Analyzing a robots.txt file". Other features are the ability to choose the preferred domain for the Google index (www.domain.com or domain.com), downloading data as CSV files, submitting a re-inclusion request and reporting spam.
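
The robots.txt testing tool belongs to Google's panel, but a similar check can be approximated locally. The sketch below uses Python's standard urllib.robotparser module. The helper name is_allowed, the rules and the URLs are made-up examples, and the standard-library parser may not match Googlebot's interpretation in every detail.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules are assumptions for illustration.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

for page in ("http://www.domain.com/private/page.html",
             "http://www.domain.com/index.html"):
    print(page, is_allowed(EXAMPLE_RULES, "Googlebot", page))
```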

Google gives an indexing summary on the Google Site Status page.

Results

Google sitemaps can help search results by giving search engines information about crawlable URLs and by giving better access to crawling stats, but there is always more to be done to optimise search results.

Some websites with sitemaps:

www.viviendas-valencia.com is a beautiful website with properties in beautiful... Valencia and surroundings

www.storkit.com (mobile mini storage and self moving service, Vancouver, British Columbia) is an established website with good search results in Yahoo and MSN for keywords like "mobile storage Vancouver". The Google sitemap improved Google search results. Permanent redirections with HTTP status 301 were used to concentrate all search results on canonical URLs; see Matt Cutts's page on canonical URLs.

www.imperial-newton.com impact sockets and wrenches, torque multipliers and other products.

www.chilternimageservice.co.uk plan copying services for architects, builders, and new developers

www.romaniatours.us guided tours to Romania, special tour at Halloween, vacations to palaces, mountain resorts, painted monasteries, medieval towns, great food, and much more.
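
The permanent redirections mentioned above for www.storkit.com can be sketched in a few lines. The Python below decides, for a requested URL, whether a 301 redirect to the canonical host is needed. The constant CANONICAL_HOST, the function name canonical_redirect and the choice of the www form as canonical are assumptions for illustration, not the set-up actually used on that site. In practice such redirects are usually configured in the web server.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed canonical host; a site could equally prefer the bare domain.
CANONICAL_HOST = "www.domain.com"


def canonical_redirect(requested_url):
    """Return (301, target_url) if a redirect is needed, otherwise None."""
    parts = urlsplit(requested_url)
    if parts.netloc.lower() == CANONICAL_HOST:
        return None  # Already canonical, serve the page normally.
    target = urlunsplit((parts.scheme, CANONICAL_HOST,
                         parts.path or "/", parts.query, parts.fragment))
    return 301, target


print(canonical_redirect("http://domain.com/page.html?x=1"))
# prints (301, 'http://www.domain.com/page.html?x=1')
print(canonical_redirect("http://www.domain.com/page.html"))
# prints None
```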