Guides · 14 min read

Sitemap Generator: The Complete Guide for 2026

Everything you need about sitemap generators — what they do, how to pick one, step-by-step setup, and common mistakes that kill crawl coverage.

A sitemap generator is one of the most underrated tools in your SEO toolkit. It creates a structured file — usually XML — that tells search-engine crawlers which pages exist on your site, how often they change, and which ones matter most. Without one, Google has to discover your pages by following links alone. With one, you hand the crawler a roadmap. This guide covers everything from the basics to advanced configuration, so you can stop guessing and start getting indexed.

Key takeaways

  • An XML sitemap doesn't guarantee indexing or rankings, but it helps crawlers find pages they would otherwise miss, especially on new sites, large sites, and pages buried deep in your navigation.
  • Most tools are free for sites under a few hundred pages; paid tiers add scheduling, change detection, and image/video sitemaps.
  • Submit your sitemap to Google Search Console and Bing Webmaster Tools. It takes a few minutes per engine and lets you monitor how each one processes it.
  • Dynamic sites (e-commerce, SaaS, blogs) should auto-generate their sitemap; static sites can get away with a one-time tool.
  • Alee's sitemap source is one of the fastest ways to train an AI chatbot on your full site — more on that below.

---

What a sitemap generator actually does

At its core, a sitemap generator crawls your website (or reads your CMS database) and outputs a text file that follows the Sitemap Protocol — an open standard maintained by sitemaps.org and supported by all major search engines. The resulting file looks like structured XML:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2026-05-10</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Four tags matter here. <loc> is the page's canonical URL and the only required tag. <lastmod> tells crawlers when you last meaningfully edited the page. Google uses it only when it is consistently accurate, so if you fib here, Google will notice the mismatch and eventually ignore the field altogether. <changefreq> is a hint, not a command; treat it as advisory. <priority> (0.0–1.0, default 0.5) signals relative importance within your own site. Google has said it ignores both <changefreq> and <priority>, but other crawlers may read them, so if you set them, set them accurately.

The tool automates all of this. Instead of building and maintaining that XML by hand across hundreds of URLs, the generator discovers, formats, and — if it's a good one — keeps the file current.
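The formatting half of that job is small enough to sketch. The JavaScript below assumes page discovery has already happened and that each page arrives as a plain object. `buildSitemap` and `escapeXml` are hypothetical helper names and are not part of any particular tool.

```js
// Minimal sketch of a sitemap generator's "format" step.
// Assumes discovery already happened; each page is a plain object.

// The Sitemap Protocol requires entity-escaping in values such as <loc>.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// pages: [{ loc, lastmod?, changefreq?, priority? }]; only loc is required.
function buildSitemap(pages) {
  const entries = pages.map((page) => {
    const tags = [`    <loc>${escapeXml(page.loc)}</loc>`];
    if (page.lastmod) tags.push(`    <lastmod>${escapeXml(page.lastmod)}</lastmod>`);
    if (page.changefreq) tags.push(`    <changefreq>${escapeXml(page.changefreq)}</changefreq>`);
    if (page.priority !== undefined) {
      // Valid priority values run from 0.0 to 1.0 (this also rejects NaN).
      if (!(page.priority >= 0 && page.priority <= 1)) {
        throw new RangeError(`priority must be between 0.0 and 1.0, got ${page.priority}`);
      }
      tags.push(`    <priority>${page.priority}</priority>`);
    }
    return `  <url>\n${tags.join("\n")}\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```

Calling `buildSitemap([{ loc: "https://example.com/about", lastmod: "2026-05-10", changefreq: "monthly", priority: 0.8 }])` produces the same `<url>` entry as the example above. The escaping step matters more than it looks: a raw `&` in a query-string URL makes the whole file invalid XML.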

Types of sitemaps

  • XML sitemap — the standard, supported by all crawlers. Use this.
  • HTML sitemap — a human-readable page listing your site's structure; useful for UX but not what search engines primarily read.
  • Image sitemap — an extension that lists images on each page, helping Google Images index your visual content.
  • Video sitemap — similar, but for hosted video; it helps Google find and understand your videos for video search results.
  • News sitemap — for Google News partners, with a 48-hour freshness window.

Most businesses only need an XML sitemap. If you run a photography portfolio or product catalog with lots of images, add an image sitemap extension. Don't overthink the rest.
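If you do need the image extension, it adds one namespace and an `<image:image>` block per image. The JavaScript sketch below assumes you already know which image URLs appear on each page. `buildImageSitemap` is a hypothetical helper. The namespace URI and the `<image:loc>` tag follow Google's image sitemap extension.

```js
// Sketch: <url> entries carrying Google's image sitemap extension.
function escapeXml(text) {
  return String(text).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;").replace(/'/g, "&apos;");
}

// pages: [{ loc, images: [absoluteImageUrl, ...] }]
function buildImageSitemap(pages) {
  const entries = pages.map((page) => {
    const lines = [`    <loc>${escapeXml(page.loc)}</loc>`];
    for (const src of page.images) {
      lines.push("    <image:image>", `      <image:loc>${escapeXml(src)}</image:loc>`, "    </image:image>");
    }
    return `  <url>\n${lines.join("\n")}\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    // The image namespace must be declared on the root element.
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"',
    '        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```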

---

Why your site needs a sitemap generator right now

If your site has fewer than five pages and every page is linked from your homepage, Google will almost certainly find everything. The sitemap becomes critical the moment any of these are true:

  1. You have more than ~50 pages. Navigation links don't always reach every URL, especially filtered or paginated pages.
  2. Your site is new. Fresh domains have low PageRank and fewer external links pointing at them. A sitemap compensates while you build authority.
  3. You publish frequently. Blogs, news sites, and stores with rotating inventory benefit from a sitemap that signals update cadence.
  4. You have pages with few or no inbound links. These are invisible to link-following crawlers. Your sitemap is the only way to surface them.
  5. You use JavaScript-heavy rendering. Crawlers are getting better at JS, but they can still miss dynamically rendered pages that never appear in anchor tags.

Google's own documentation puts it plainly: sitemaps help Google "learn about pages on your site it might not otherwise discover." That's the job.

---

The best sitemap generator tools compared

There's no single best option — the right tool depends on your CMS, site size, and whether you need real-time updates or a one-off file.

| Tool | Best for | Price | Auto-updates | Image sitemap | Notes |
|------|----------|-------|--------------|---------------|-------|
| Yoast SEO (WordPress) | WordPress sites | Free / €99/yr Premium | Yes (on publish) | Yes | Industry standard for WP; tightly integrated |
| Rank Math (WordPress) | WordPress power users | Free / ~$59/yr | Yes | Yes | More granular controls than Yoast |
| Google Search Console | Verifying & monitoring | Free | No (manual submit) | No | Not a generator, but the submission endpoint |
| XML-sitemaps.com | Static / non-CMS sites | Free (500 pages) / ~$3.99/mo | No (manual crawl) | Yes | Good for one-off audits |
| Screaming Frog | Technical SEO audits | Free (500 URLs) / £259/yr | Semi-manual | Yes | Best for developers and agencies |
| Jetpack (WordPress) | WordPress + performance | Free tier available | Yes | Limited | Part of a broader plugin suite |
| Next.js / framework built-ins | React/Node apps | Free (open source) | Yes (at build) | Configurable | next-sitemap npm package is widely used; the App Router also supports a built-in app/sitemap.js or .ts file |
| Astro / Gatsby plugins | Jamstack | Free | Yes (at build) | Yes | @astrojs/sitemap, gatsby-plugin-sitemap |

For most WordPress sites, Yoast SEO's free tier is all you need. It auto-generates and updates your sitemap every time you publish or edit. If you want per-post schema control, Rank Math gives you more knobs.

For non-CMS or custom-coded sites, use xml-sitemaps.com for a quick one-off file, or integrate a framework plugin if you're building on a Jamstack framework.

For large sites (10,000+ URLs), use Screaming Frog or a server-side solution. Online crawlers time out on large sites and don't give you the filtering you need.

---

How to generate your sitemap: step-by-step

WordPress (using Yoast SEO)

  1. Install and activate Yoast SEO from the plugin directory.
  2. Go to SEO → General → Features and confirm "XML sitemaps" is toggled on.
  3. Click the question-mark icon next to XML sitemaps and then "See the XML sitemap." Your sitemap lives at yourdomain.com/sitemap_index.xml.
  4. Decide what to include. Under SEO → Search Appearance, flip the toggle for post types and taxonomies you don't want indexed — author archives, tag pages, and attachment pages are common exclusions.
  5. Submit the URL to Google Search Console (covered below).

Yoast creates a sitemap index (a sitemap of sitemaps) by default, with separate child sitemaps for posts, pages, categories, and custom post types. This keeps individual files under Google's 50,000-URL limit.

Static or custom sites (xml-sitemaps.com)

  1. Go to xml-sitemaps.com and enter your domain.
  2. Set crawl options: maximum depth (3–5 is usually enough), last modification date format, change frequency, and priority. Defaults are fine for most sites.
  3. Click Start and wait for the crawl.
  4. Download the resulting sitemap.xml.
  5. Upload it to your site's root (/sitemap.xml) via FTP, cPanel, or your hosting dashboard.
  6. Submit to Search Console.

One limitation: the free tier caps at 500 pages. If your site is larger, either use the paid tier or switch to Screaming Frog.
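The "maximum depth" setting in step 2 is what limits the crawl. Pages that sit more clicks away from the start page than that limit are never discovered. Below is a minimal breadth-first sketch in JavaScript. It assumes a hypothetical `getLinks(url)` function that returns the hrefs found on a page; in practice that would be a fetch plus an HTML parser.

```js
// Sketch: breadth-first URL discovery with a maximum click depth.
// getLinks(url) is hypothetical: an async function returning the hrefs on a page.
async function discoverUrls(startUrl, getLinks, maxDepth = 3) {
  const start = new URL(startUrl);
  const seen = new Set([start.href]);
  let frontier = [start.href];
  // Each pass follows links one more click away from the start page.
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const pageUrl of frontier) {
      for (const link of await getLinks(pageUrl)) {
        const target = new URL(link, pageUrl); // resolve relative hrefs
        target.hash = ""; // #fragments point at the same document
        if (target.origin !== start.origin || seen.has(target.href)) continue; // stay on this site
        seen.add(target.href);
        next.push(target.href);
      }
    }
    frontier = next;
  }
  return [...seen];
}
```

The sketch stays on the starting origin and drops `#fragments`, two filters most site crawlers also apply. As a result, `https://example.com/a#top` and `https://example.com/a` count as one page.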

Next.js (next-sitemap)

  1. npm install next-sitemap
  2. Create next-sitemap.config.js in your project root:

```js
/** @type {import('next-sitemap').IConfig} */
module.exports = {
  siteUrl: 'https://yourdomain.com',
  generateRobotsTxt: true,
  // Wildcard patterns: skip every route under /admin and /api
  exclude: ['/admin/*', '/api/*'],
};
```

  3. Add a postbuild script to package.json: "postbuild": "next-sitemap".
  4. Deploy. Your sitemap will be at /sitemap.xml and /sitemap-0.xml, with a robots.txt pointing to it.

The exclude array is important — you don't want internal API routes or admin pages in your sitemap.

---

Submitting your sitemap to Google and Bing

Generating the file is only half the job. You need to tell search engines where it lives.

Google Search Console

  1. Go to search.google.com/search-console and select your property.
  2. In the left sidebar, click Sitemaps under the Index section.
  3. Paste your sitemap URL (e.g., sitemap_index.xml) and click Submit.
  4. Check back in a few days to confirm the status shows "Success" and the discovered URL count looks right.

If the count is significantly lower than your actual page count, investigate which pages Google is seeing versus which exist. Common culprits: noindex meta tags on pages you meant to include, canonical tags pointing elsewhere, or URLs blocked in robots.txt.
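You can spot-check the first two culprits in a page's HTML. The JavaScript sketch `diagnosePage` below is hypothetical. It uses regular expressions for brevity, so it can miss unusual markup. A production audit should use a real HTML parser and also check the `X-Robots-Tag` HTTP header.

```js
// Sketch: flag a robots noindex meta tag or a canonical pointing elsewhere.
// Regex-based for brevity; not a substitute for a real HTML parser.
function diagnosePage(pageUrl, html) {
  const problems = [];
  const robotsMeta = html.match(/<meta[^>]+name=["']robots["'][^>]*>/i);
  if (robotsMeta && /content=["'][^"']*noindex/i.test(robotsMeta[0])) {
    problems.push("noindex meta tag");
  }
  const canonicalTag = html.match(/<link[^>]+rel=["']canonical["'][^>]*>/i);
  const href = canonicalTag && canonicalTag[0].match(/href=["']([^"']+)["']/i);
  if (href) {
    const canonical = new URL(href[1], pageUrl).href; // resolve relative canonicals
    if (canonical !== new URL(pageUrl).href) problems.push(`canonical points to ${canonical}`);
  }
  return problems;
}
```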

Bing Webmaster Tools

Bing still drives meaningful traffic in many markets, especially in the US, India (via Microsoft Edge), and for older demographics. Submitting there takes three minutes:

  1. Go to bing.com/webmasters and verify your site.
  2. Click Sitemaps → Submit sitemap.
  3. Paste the URL and submit.

Bing also picks up your sitemap automatically if you include a Sitemap: directive in your robots.txt (Google reads this directive too), which makes a nice passive backup:

```
Sitemap: https://yourdomain.com/sitemap.xml
```
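To confirm which sitemap URLs a crawler will find in robots.txt, you can extract the directives yourself. The JavaScript sketch below uses a hypothetical helper, `sitemapsFromRobotsTxt`. It relies on two properties of the directive: the field name is case-insensitive, and `Sitemap:` lines apply to the whole file rather than to a single `User-agent` group.

```js
// Sketch: list the sitemap URLs declared in a robots.txt file.
function sitemapsFromRobotsTxt(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.replace(/#.*$/, "").trim()) // drop comments and whitespace
    .filter((line) => /^sitemap\s*:/i.test(line)) // field name is case-insensitive
    .map((line) => line.slice(line.indexOf(":") + 1).trim())
    .filter((url) => url.length > 0);
}
```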

---

Advanced sitemap configuration

Sitemap index files

If your site has more than 50,000 URLs — or even just multiple content types you want to organize separately — use a sitemap index. It's a master XML file that lists individual sitemap files, each capped at 50,000 URLs and 50 MB uncompressed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2026-06-18</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-06-18</lastmod>
  </sitemap>
</sitemapindex>
```
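Splitting by size instead of by content type follows the same pattern. This JavaScript sketch chunks a flat URL list into child files of at most 50,000 URLs each and builds the index that points at them. `planSitemapIndex` and the `sitemap-N.xml` file names are hypothetical choices. The sketch does not check the 50 MB size cap.

```js
const MAX_URLS_PER_SITEMAP = 50000; // Sitemap Protocol limit per file

// Split URLs into child sitemaps and build the index that lists them.
// baseUrl should end with "/" so child files resolve at the site root.
function planSitemapIndex(baseUrl, urls, lastmod, maxPerFile = MAX_URLS_PER_SITEMAP) {
  const files = [];
  for (let i = 0; i < urls.length; i += maxPerFile) {
    files.push({ name: `sitemap-${files.length}.xml`, urls: urls.slice(i, i + maxPerFile) });
  }
  const entries = files.map((file) =>
    [
      "  <sitemap>",
      `    <loc>${new URL(file.name, baseUrl).href}</loc>`,
      `    <lastmod>${lastmod}</lastmod>`,
      "  </sitemap>",
    ].join("\n"));
  const index = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</sitemapindex>",
  ].join("\n");
  return { index, files };
}
```

Each entry in `files` would then be rendered as an ordinary `<urlset>` file. Ideally, each child's `<lastmod>` reflects the newest change inside that file. The sketch uses one date for all of them to stay short.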

Hreflang sitemaps (multilingual sites)

If your site serves content in multiple languages, you can add hreflang annotations directly in the sitemap instead of in <head> tags. This is cleaner for large sites where editing every page template isn't practical. Check Google's documentation on this format — it uses the xhtml:link extension namespace.
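Consider one page published in English and German, plus an `x-default` fallback. This JavaScript sketch shows what the annotations look like. `buildHreflangSitemap` and the example URLs are hypothetical. The `xhtml` namespace and the `<xhtml:link rel="alternate">` elements follow Google's documented format. In that format, every language version gets its own `<url>` entry, and each entry lists all alternates, including itself.

```js
function escapeXml(text) {
  return String(text).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;").replace(/'/g, "&apos;");
}

// alternates maps hreflang codes to URLs, e.g. { en: ".../en/pricing", de: ".../de/preise" }.
function buildHreflangSitemap(alternates) {
  // The same full set of alternate links is repeated inside every <url>.
  const links = Object.entries(alternates).map(
    ([lang, href]) => `    <xhtml:link rel="alternate" hreflang="${escapeXml(lang)}" href="${escapeXml(href)}"/>`);
  const entries = Object.values(alternates).map(
    (loc) => `  <url>\n    <loc>${escapeXml(loc)}</loc>\n${links.join("\n")}\n  </url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"',
    '        xmlns:xhtml="http://www.w3.org/1999/xhtml">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```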

Dynamic sitemaps

For SaaS apps, marketplaces, or any site where URLs are created by user actions, a static generated file won't stay current. The cleaner solution is a server-rendered sitemap endpoint — a route in your app (/sitemap.xml) that queries your database and streams XML on the fly, with appropriate caching headers. This way the sitemap is always accurate without a separate generation step.
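A minimal version of that endpoint for a plain Node.js `http` server might look like the sketch below. `loadPublishedPages` is a hypothetical stand-in for your database query. The one-hour default `max-age` is an arbitrary assumption; tune it to your publishing cadence.

```js
// Sketch: a request-time /sitemap.xml route for a plain Node.js http server.
// loadPublishedPages is hypothetical and should resolve to [{ loc, lastmod? }].
function escapeXml(text) {
  return String(text).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;").replace(/'/g, "&apos;");
}

function createSitemapHandler(loadPublishedPages, maxAgeSeconds = 3600) {
  return async function handleSitemap(req, res) {
    // Match on the path only, ignoring any query string.
    if (new URL(req.url, "http://localhost").pathname !== "/sitemap.xml") return false;
    const pages = await loadPublishedPages();
    const entries = pages.map((p) => {
      const lastmod = p.lastmod ? `\n    <lastmod>${escapeXml(p.lastmod)}</lastmod>` : "";
      return `  <url>\n    <loc>${escapeXml(p.loc)}</loc>${lastmod}\n  </url>`;
    });
    const body = [
      '<?xml version="1.0" encoding="UTF-8"?>',
      '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
      ...entries,
      "</urlset>",
    ].join("\n");
    res.writeHead(200, {
      "Content-Type": "application/xml; charset=utf-8",
      // Lets a CDN or browser cache reuse the response instead of querying the DB on every crawl.
      "Cache-Control": `public, max-age=${maxAgeSeconds}`,
    });
    res.end(body);
    return true;
  };
}
```

To wire it up, you could call `http.createServer((req, res) => handler(req, res).then((handled) => { if (!handled) { res.statusCode = 404; res.end(); } }))`. This version builds the whole document in memory. For very large sites, you would write entries to `res` incrementally or split the output into a sitemap index.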

---

Common sitemap mistakes to avoid

These errors come up repeatedly in SEO audits, and each one chips away at crawl efficiency.

Including non-canonical URLs. If example.com/page and www.example.com/page both appear in your sitemap, you're sending mixed signals. Every URL in your sitemap should be the canonical version — the one you're telling Google to prefer via <link rel="canonical">.

Listing noindex pages. A page with <meta name="robots" content="noindex"> should never appear in your sitemap. Including it tells Google: "index this" while the page itself says "don't index this." Google will usually respect the noindex, but the contradiction wastes crawl budget.

Wrong `lastmod` dates. Either pull this from your CMS's actual last-modified timestamp or leave the field out. Setting static dates or lying about modification times trains Google to ignore the field.

Forgetting to update after structural changes. If you change your URL structure, add a subdomain, or launch a new section, your sitemap needs to reflect it immediately. Auto-generated sitemaps handle this; manually maintained ones often don't.

Blocking the sitemap in robots.txt. Sounds obvious, but it happens. Check that robots.txt doesn't disallow /sitemap.xml or the directory where your sitemap lives.
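A quick way to catch this is to test the sitemap path against your robots.txt rules. The JavaScript below is a simplified sketch, and `isBlockedForAllAgents` is a hypothetical helper. It reads only the `User-agent: *` group and applies the longest matching Allow or Disallow prefix, with Allow winning ties as Google documents. It ignores `*` and `$` wildcards, so treat its result as a hint, not a verdict.

```js
// Sketch: is `path` disallowed for the User-agent: * group? Wildcards are not supported.
function isBlockedForAllAgents(robotsTxt, path) {
  const rules = [];
  let inStarGroup = false;
  let previousWasUserAgent = false;
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const match = raw.replace(/#.*$/, "").trim().match(/^([A-Za-z-]+)\s*:\s*(.*)$/);
    if (!match) continue; // blank lines and junk
    const field = match[1].toLowerCase();
    const value = match[2].trim();
    if (field === "user-agent") {
      // A User-agent line after rules starts a new group; consecutive ones share a group.
      if (!previousWasUserAgent) inStarGroup = false;
      if (value === "*") inStarGroup = true;
      previousWasUserAgent = true;
      continue;
    }
    previousWasUserAgent = false;
    // An empty Disallow means "allow everything", so it adds no rule.
    if (inStarGroup && (field === "allow" || field === "disallow") && value !== "") {
      rules.push({ allow: field === "allow", prefix: value });
    }
  }
  // The longest matching prefix wins; on a tie, Allow wins.
  let best = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.prefix)) continue;
    const longer = !best || rule.prefix.length > best.prefix.length;
    const tieAllow = best && rule.prefix.length === best.prefix.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best !== null && !best.allow;
}
```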

Overloading with low-quality pages. Pagination pages, thin tag archives, and URL parameter variations bloat your sitemap and dilute the crawl budget. Be selective: if a page isn't worth indexing, keep it out.
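These rules are easiest to enforce at generation time. The JavaScript sketch below assumes a hypothetical page record with `url`, `canonicalUrl`, `noindex`, `lowValue`, and `updatedAt` fields pulled from your CMS. How you decide `lowValue` (thin tag archives, parameter variants) is up to you.

```js
// Sketch: keep only pages that belong in the sitemap.
// Page shape (hypothetical): { url, canonicalUrl, noindex, lowValue, updatedAt: Date | null }
function selectSitemapEntries(pages) {
  const seen = new Set();
  const entries = [];
  for (const page of pages) {
    if (page.noindex) continue; // never list noindex pages
    if (page.canonicalUrl !== page.url) continue; // only self-canonical URLs
    if (page.lowValue) continue; // thin archives, parameter variants, etc.
    if (seen.has(page.url)) continue; // no duplicates
    seen.add(page.url);
    const entry = { loc: page.url };
    // Use the real modification time; omit lastmod rather than invent one.
    if (page.updatedAt instanceof Date) entry.lastmod = page.updatedAt.toISOString().slice(0, 10);
    entries.push(entry);
  }
  return entries;
}
```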

---

Sitemap generators and AI chatbot training

Here's a connection worth knowing about: sitemap files are one of the fastest ways to feed an AI chatbot all the content from your website at once.

When you set up Alee, an AI chatbot that you train on your own content, you can point it at your sitemap URL rather than pasting dozens of individual page links. Alee reads the sitemap, crawls every listed URL, chunks the content, and embeds it into a private knowledge base. When you ask it a question, it retrieves the most relevant chunks and has an LLM write an answer grounded only in your content, with source links, which keeps hallucinations to a minimum.

This means the quality of your sitemap directly affects the quality of your chatbot. If your sitemap is incomplete or outdated, your bot won't know about pages you've published. If it includes low-quality tag archives, those get ingested too. Clean sitemaps make sharper bots — another reason to spend five minutes getting the generator configuration right.

You can start for free with one chatbot and up to 200 messages per month. Browse the resources section for examples of teams using sitemap-driven chatbots in production.

---

How to audit your existing sitemap

Even if you've had a sitemap running for years, it's worth auditing it periodically. Here's a quick checklist:

  • [ ] Fetch your sitemap URL in a browser and confirm it loads without errors
  • [ ] Check the URL count against your actual page count — large discrepancies signal missing pages or over-inclusion
  • [ ] Open Google Search Console → Sitemaps and confirm status is "Success"
  • [ ] Compare "Submitted" vs "Indexed" URL counts; a large gap suggests indexing issues worth investigating
  • [ ] Run the sitemap through a validator (xml-sitemaps.com has one, as does Screaming Frog)
  • [ ] Confirm no noindex pages appear in the sitemap
  • [ ] Verify all canonical URLs match the <loc> values
  • [ ] Check that lastmod values are realistic and recent for pages you've updated
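The URL-count check is easy to automate once you have a list of the pages you expect to be indexed. This JavaScript sketch extracts the `<loc>` values from a single `<urlset>` file and reports what's missing and what's extra. `extractLocs` and `compareCoverage` are hypothetical helpers. For a sitemap index, run it on each child file instead, because the index's `<loc>` entries are sitemap URLs, not pages.

```js
// Sketch: compare the URLs listed in one <urlset> sitemap with the pages you expect.
function extractLocs(sitemapXml) {
  // Undo XML entity escaping; &amp; goes last so "&amp;lt;" decodes correctly.
  const unescape = (s) => s.replace(/&lt;/g, "<").replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"').replace(/&apos;/g, "'").replace(/&amp;/g, "&");
  // <image:loc> and similar extension tags do not match "<loc>".
  return [...sitemapXml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => unescape(m[1]));
}

function compareCoverage(sitemapXml, expectedUrls) {
  const listed = new Set(extractLocs(sitemapXml));
  const expected = new Set(expectedUrls);
  return {
    missing: [...expected].filter((u) => !listed.has(u)), // pages the sitemap forgot
    extra: [...listed].filter((u) => !expected.has(u)), // listed but not expected
  };
}
```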

If your Search Console shows hundreds of submitted URLs but only a fraction indexed, the problem is usually page quality, not the sitemap itself. Thin content, duplicate content, and slow page speed are the top culprits. The tool told Google where to look; the pages themselves need to be worth indexing.

---

Platform-specific sitemap generators and choosing the right one

Shopify

Shopify auto-generates a sitemap at yourdomain.com/sitemap.xml with no setup required. It includes products, collections, pages, and blog posts. You can't remove individual URLs from it natively (without an app), but you can prevent specific pages from appearing in search by adding noindex via a theme edit or an SEO app.

Squarespace

Squarespace also auto-generates sitemaps at /sitemap.xml. Coverage is reasonably complete, though by design it leaves out password-protected and disabled pages. No plugins needed.

Wix

Wix generates sitemaps automatically through its SEO Wiz. You can access and verify it via Marketing & SEO → Get Found on Google in your Wix dashboard.

Ghost

Ghost includes a built-in sitemap at /sitemap.xml for all published posts, pages, tags, and authors. Configuration is minimal — you can exclude specific pages from indexing in the page settings.

Custom / headless builds

If you're running a headless CMS (Contentful, Sanity, Strapi) with a custom frontend, build a server-rendered /sitemap.xml route in your frontend framework. Fetch published content from your CMS API at build time (or request time with caching) and generate the XML dynamically. This is the most reliable approach because it stays synchronized with your content model without a separate crawl step.

A decision framework

Ask yourself three questions:

1. What CMS or framework am I on?
WordPress → Yoast or Rank Math. Shopify/Squarespace/Wix/Ghost → built-in. Next.js/Astro/Gatsby → framework plugin. Custom → roll your own endpoint.

2. How often does my content change?
Daily → you need auto-generation tightly coupled to your publish workflow. Monthly or less → a one-off crawl tool with a reminder to regenerate works fine.

3. How large is my site?
Under 500 pages → any free tool handles it. 500–50,000 → most paid tiers cover it. Above 50,000 → you need programmatic generation with sitemap index files, not an online crawler.

If you're on a Jamstack build and feeding your content into an AI assistant, point Alee at your sitemap during onboarding — it'll crawl everything in one pass and keep the knowledge base fresh when you regenerate and re-import. Check the tutorials for a step-by-step walkthrough of connecting a sitemap to an Alee chatbot.

For a full breakdown of how Alee compares to similar tools, Alee vs SiteGPT walks through the differences in knowledge ingestion, embedding quality, and chatbot customization. See pricing to find the plan that fits your site size.

---

Frequently asked questions

Does having a sitemap guarantee Google will index my pages?

No. A sitemap tells Google where your pages are; it doesn't guarantee they'll be indexed. Google still evaluates each URL for crawlability, page quality, and duplicate content. Think of the sitemap as a nomination — Google decides whether to accept it.

How often should I update my sitemap?

For auto-generated sitemaps (Yoast, Shopify, framework plugins), updates happen automatically and you don't need to think about it. For manually generated files, regenerate whenever you add new pages, change your URL structure, or delete content. At a minimum, audit quarterly.

What's the difference between a sitemap and robots.txt?

A sitemap is an invitation: it tells crawlers what you want them to visit. robots.txt is a restriction: it tells crawlers what they cannot visit. The two are independent mechanisms, which is why they can conflict. A URL in your sitemap can still be blocked by robots.txt (a configuration error), and pages not in your sitemap can still be crawled if they have inbound links.

Can a sitemap help with my site's rankings?

Indirectly, yes. Better crawl coverage means more of your pages are eligible to rank. If important pages were previously undiscovered, getting them indexed can drive additional organic traffic. The sitemap itself doesn't boost the ranking of pages that are already indexed — content quality and links do that.

How do I use my sitemap to train an AI chatbot on my website content?

Tools like Alee accept a sitemap URL as an input source during setup. Alee crawls every URL listed, extracts the content, and builds a private knowledge base from it. When a visitor asks a question, the system retrieves the most relevant content chunks and generates a grounded answer using an LLM. It's the most efficient way to get an AI chatbot up to speed on a large site — far faster than pasting pages one at a time.

---

Ready to put your sitemap to work? Once your sitemap is clean and submitted, use it as the foundation for an AI chatbot that answers visitor questions around the clock — start free on Alee and have your first bot live in under ten minutes.
