What Is an XML Sitemap?

An XML sitemap is a file that lists the important URLs on your site so search engines can discover them efficiently. The protocol is defined by sitemaps.org and is supported by Google, Bing, Yahoo, and Yandex. Without a sitemap, Googlebot (Google's crawling robot) must follow links from page to page to find your content, which is slower and can miss pages that few other pages link to.

XML Structure of a Sitemap

A valid sitemap starts with the XML declaration and uses the root element <urlset>. Each page is wrapped in a <url> element with the following tags:

  • loc: the absolute URL of the page (required)
  • lastmod: last modification date in W3C Datetime format, a profile of ISO 8601, e.g. 2025-11-01 (recommended)
  • changefreq: estimated modification frequency (always, hourly, daily, weekly, monthly, yearly, never); advisory only, and Google ignores it
  • priority: relative importance from 0.0 to 1.0 (default 0.5); advisory only, and Google ignores it

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-11-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
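
The same structure can also be produced programmatically. Here is a minimal Python sketch using only the standard library; the function name build_sitemap and the (loc, lastmod) input shape are illustrative assumptions, not part of the protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (bytes) for an iterable of (loc, lastmod) pairs.

    build_sitemap is a hypothetical helper. lastmod may be None, in which
    case the optional <lastmod> tag is omitted for that URL.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & when serializing.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


xml_bytes = build_sitemap([
    ("https://example.com/", "2025-11-01"),
    ("https://example.com/search?q=a&page=2", None),
])
```

changefreq and priority are left out here because they are advisory; adding them would follow the same SubElement pattern.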

Limits and Sitemap Index

A single sitemap cannot exceed 50,000 URLs or 50 MB uncompressed. Large sites (e-commerce, media) use a sitemap index: a master file that references multiple child sitemaps, each listing its own URLs. An index can reference up to 50,000 sitemaps, and Google Search Console accepts up to 500 sitemap index files per site.
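
To make the splitting concrete, here is a rough Python sketch that cuts a URL list into chunks of at most 50,000 and builds the index that references them. The helper names and the sitemap-N.xml file naming are assumptions made for illustration, and the sketch checks only the URL count, not the 50 MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from the protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(child_locs):
    """Return a sitemap index (bytes) referencing each child sitemap URL."""
    index = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for loc in child_locs:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# Hypothetical catalog of 120,000 product pages.
urls = [f"https://example.com/p/{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
# Assumed naming scheme for the child files: sitemap-1.xml, sitemap-2.xml, ...
child_locs = [
    f"https://example.com/sitemap-{i}.xml" for i in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(child_locs)
```

Each chunk would then be written out as its own urlset file at the matching child URL.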

Specialized sitemaps also exist for images (Google Images) and videos (Google Video), using dedicated namespace extensions.

Submitting Your Sitemap to Google

Two complementary methods exist. The first is adding a directive in your robots.txt:

Sitemap: https://example.com/sitemap.xml

The second, and more reliable, is to submit the sitemap directly via Google Search Console (Indexing › Sitemaps menu). You then get statistics: submitted URLs, indexed URLs, detected errors.
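
To confirm that the robots.txt directive is picked up the way a crawler would read it, you can parse the file with Python's standard urllib.robotparser module (Python 3.8 or later for site_maps()). This is a small sketch; the wrapper name sitemaps_from_robots is an assumption, and it works on text you have already downloaded.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the Sitemap URLs declared in a robots.txt body (may be empty)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


robots_txt = """\
User-agent: *
Disallow: /cart

Sitemap: https://example.com/sitemap.xml
"""
declared = sitemaps_from_robots(robots_txt)
```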

Common Mistakes to Avoid

The most frequent: including pages marked noindex (meta robots tag or X-Robots-Tag header) in the sitemap. This is contradictory: you are simultaneously asking Google to index and not index the URL. Google honors the noindex, so the entry only wastes crawl requests and is typically flagged in Search Console as "Submitted URL marked 'noindex'". Only include pages you want indexed, accessible (HTTP 200), and not blocked by robots.txt.

Other mistakes: relative URLs instead of absolute ones, HTTP URLs on a site served over HTTPS, URLs that 301-redirect elsewhere, or a sitemap that is never updated.
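
These checks are easy to automate once you have crawl data for each URL. The Python sketch below is one possible shape: audit_entry and its parameters are hypothetical, it assumes the site is meant to be served over HTTPS, and it does no network requests itself (status codes and robots directives are expected to come from your own crawl).

```python
from urllib.parse import urlsplit


def audit_entry(loc, status, x_robots_tag="", meta_robots=""):
    """Return a list of problems for one sitemap entry.

    `status`, `x_robots_tag` and `meta_robots` are assumed to come from a
    crawl you ran yourself; this function performs no network I/O.
    """
    problems = []
    parts = urlsplit(loc)
    if not parts.scheme or not parts.netloc:
        problems.append("relative URL")
    elif parts.scheme != "https":
        problems.append("not HTTPS")
    if 300 <= status < 400:
        problems.append("redirect")
    elif status != 200:
        problems.append(f"HTTP {status}")
    # A noindex in either the header or the meta tag contradicts the sitemap.
    directives = f"{x_robots_tag},{meta_robots}".lower()
    if "noindex" in directives:
        problems.append("noindex")
    return problems


report = audit_entry("http://example.com/old", 301, meta_robots="noindex, follow")
```

An empty list means the entry passed every check in this sketch; it does not cover robots.txt blocking, which needs the site's robots.txt rules.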

Dynamic vs. Static Generation

For small sites, a static sitemap is sufficient. For frequently updated sites (blogs, e-commerce), dynamic generation is preferred: the sitemap is produced on the fly by your CMS (WordPress, Shopify) or web framework, so it always reflects the actual state of the catalog. Validate your sitemap against the sitemaps.org XML schema before submitting, then check the Sitemaps report in Google Search Console for parsing errors.
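
Whichever way the file is generated, a quick structural check before submitting catches the most basic problems. The following Python sketch is a lightweight stand-in for full schema validation, not a replacement for it; check_sitemap is a hypothetical helper and it only looks at well-formedness, the root element, the size limits, and the presence of an absolute loc.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB uncompressed


def check_sitemap(xml_bytes):
    """Return a list of structural problems found in a sitemap document."""
    if len(xml_bytes) > MAX_BYTES:
        return ["file larger than 50 MB"]
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != f"{NS}urlset":
        return [f"unexpected root element: {root.tag}"]
    problems = []
    urls = root.findall(f"{NS}url")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the 50,000 limit")
    for i, url in enumerate(urls, start=1):
        loc = url.findtext(f"{NS}loc", default="").strip()
        if not loc.startswith(("http://", "https://")):
            problems.append(f"entry {i}: missing or relative <loc>")
    return problems


sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://example.com/</loc></url>"
    b"<url><loc>/about</loc></url>"
    b"</urlset>"
)
problems = check_sitemap(sample)
```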

Audit Your Sitemap with TheSiteFuse

A misconfigured sitemap directly hurts your indexation. Run a free audit to detect duplicated pages, noindex URLs in your sitemap, and XML structure errors.