Guides · 14 min read

Free Sitemap Analytics: The Complete Guide

Master free sitemap analytics: track crawl coverage, fix indexing gaps, and turn raw XML data into real SEO wins — without spending a cent.

Your sitemap is a promise you make to search engines — "these are my pages, please index them." Free sitemap analytics tells you how well that promise is being kept. Without it, you're publishing content into a black box, hoping Google found it, with no idea whether it did.

This guide covers everything you need to know about tracking and acting on that data: the tools worth using, what the numbers mean, how to diagnose the gaps that actually hurt rankings, and how to set up a monitoring routine you'll actually stick to. No paid subscriptions required for any of it.

Key takeaways

  • Google Search Console is the best free sitemap analytics tool, and most sites use only a fraction of what it offers
  • A submitted vs. indexed gap above 10% deserves a look; above 20% it is a red flag worth investigating immediately
  • Orphaned URLs (in your sitemap but not linked internally) are the most common cause of low indexing rates
  • Sitemap analytics should feed your content strategy, not just your dev checklist
  • Combining sitemap coverage data with chatbot question analytics is a powerful way to find content gaps

---

What "free sitemap analytics" actually means

The phrase covers two related but distinct things. First, it means using free tools to analyze the technical health of your XML sitemap — errors, format validity, URL counts, coverage ratios. Second, it means the broader practice of using free analytics data to understand which of your sitemapped pages are being crawled, indexed, and surfaced in search.

Most guides focus entirely on the technical layer. That's a mistake. A sitemap with zero errors and 100% indexing is just a formality if the indexed pages aren't getting any traffic. The real goal of sitemap analytics is to close the loop: submitted → crawled → indexed → visited → converted.

The sitemap as a signal, not a guarantee

Submitting a URL in your sitemap doesn't force Google to index it. It signals that you consider the URL important. Google then decides whether to crawl it, and separately decides whether to index it. Your analytics work starts where that decision-making diverges from your expectations.

---

The best free sitemap analytics tools (and what each does well)

You don't need to pay to get meaningful sitemap data. These free tools, used together, cover most of what a paid tool offers for the sitemap-specific use case.

Google Search Console (the non-negotiable one)

If you haven't submitted your sitemap in Search Console and aren't reviewing the Sitemaps report weekly, that's the single highest-leverage change you can make right now. It's free, it's authoritative (the data comes from Google itself), and the Sitemaps report gives you:

  • Submitted vs. discovered vs. indexed counts per sitemap file
  • Specific indexing errors with URL-level detail
  • Last crawled dates so you can see stale pages
  • Coverage warnings broken into categories: Crawled - currently not indexed, Discovered - currently not indexed, Blocked by robots.txt, Duplicate without user-selected canonical, etc.

The Coverage report (now the Page indexing report, listed as "Pages" in the newer Search Console interface) is where the real diagnostic work happens. Filter it by sitemap to see only URLs you've explicitly submitted. The gap between that count and the indexed count is your first signal.

Bing Webmaster Tools

Often overlooked, Bing Webmaster Tools offers its own sitemap submission and coverage data. For B2B and enterprise sites especially, Bing can drive a meaningful share of organic search traffic — often more than site owners expect. The Sitemaps section shows crawl status, errors, and indexing progress in much the same way Search Console does. Worth five minutes to set up and check monthly.

Screaming Frog SEO Spider (free tier)

The free version of Screaming Frog crawls up to 500 URLs. For smaller sites that's plenty. Its sitemap-specific value is that you can:

  1. Upload your XML sitemap(s)
  2. Run a crawl
  3. Compare sitemap URLs against crawled URLs to find orphans, redirects, and broken links hiding in your sitemap

That "Sitemap vs. Crawl" comparison is something Google Search Console doesn't give you directly, so these two tools complement each other well.
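The comparison itself boils down to set arithmetic. Here is a minimal Python sketch, assuming you have the sitemap URLs as a list and the link-following crawl as a mapping of URL to HTTP status. The function name and both input shapes are illustrative, not Screaming Frog's export format.

```python
def compare_sitemap_to_crawl(sitemap_urls, crawled):
    """Compare sitemap URLs with a link-following crawl.

    `crawled` maps each URL the crawler reached via internal links to the
    HTTP status it returned. Both inputs are hypothetical export shapes.
    """
    sitemap_set = set(sitemap_urls)
    crawled_set = set(crawled)
    in_both = sitemap_set & crawled_set
    return {
        # In the sitemap, but never reached by following links.
        "orphans": sorted(sitemap_set - crawled_set),
        # In the sitemap, but answering with a redirect.
        "redirects": sorted(u for u in in_both if 300 <= crawled[u] < 400),
        # In the sitemap, but answering with a client or server error.
        "broken": sorted(u for u in in_both if crawled[u] >= 400),
    }
```

Each of the three lists maps to one of the problems named above: orphans, redirects, and broken links hiding in the sitemap.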

XML sitemap validation (Search Console's sitemap details or third-party validators)

Before you analyze coverage data, make sure the file itself is valid. Common format errors (URLs with unescaped special characters, malformed <lastmod> dates, sitemap index files that reference non-existent child sitemaps) can prevent search engines from parsing your file at all. Google Search Console surfaces these under Sitemaps > [your sitemap URL] > "See details." Third-party validators like XML-Sitemaps.com's validator or Sitechecker's free tool catch errors before you've even submitted.
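A rough local pre-check for two of these errors can be sketched in Python with only the standard library. This is an illustration, not a replacement for a full validator: `check_sitemap` is a hypothetical helper, and the date pattern only approximates the W3C Datetime format that the sitemap protocol specifies for <lastmod>.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Approximation of W3C Datetime: YYYY, YYYY-MM, YYYY-MM-DD, or a full
# timestamp with a timezone designator (Z or +hh:mm / -hh:mm).
LASTMOD_RE = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?$"
)


def check_sitemap(xml_text):
    """Return a list of problems found in a <urlset> sitemap document."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Unescaped characters, such as a bare "&" in a URL, end up here.
        return [f"XML parse error: {exc}"]

    problems = []
    for url in root.iter(f"{SITEMAP_NS}url"):
        loc = (url.findtext(f"{SITEMAP_NS}loc") or "").strip()
        lastmod = url.findtext(f"{SITEMAP_NS}lastmod")
        if not loc.startswith(("http://", "https://")):
            problems.append(f"missing or relative <loc>: {loc!r}")
        if lastmod is not None and not LASTMOD_RE.match(lastmod.strip()):
            problems.append(f"bad <lastmod> {lastmod.strip()!r} for {loc}")
    return problems
```

An empty list means the file parsed and every entry passed these two checks. It does not confirm that child sitemaps in an index file exist, which still needs a fetch of each referenced URL.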

A quick comparison table

| Tool | Crawl data | Indexing data | Error detail | URL limit (free) | Best for |
|---|---|---|---|---|---|
| Google Search Console | Yes (Googlebot view) | Yes | Granular | Unlimited | Primary analytics, error diagnosis |
| Bing Webmaster Tools | Yes (Bingbot view) | Yes | Moderate | Unlimited | Bing coverage, secondary check |
| Screaming Frog (free) | Yes (your crawl) | No | High technical detail | 500 URLs | Orphan detection, redirect audits |
| XML-Sitemaps Validator | No | No | Format errors only | 500 URLs | Pre-submission validation |
| Ahrefs Webmaster Tools (free) | Partial | No | Moderate | Unlimited* | Backlink + basic crawl data |

*Ahrefs Webmaster Tools is free for verified site owners, gives you crawl errors and some sitemap insights, and is worth adding to your toolkit.

---

Reading the numbers: what your sitemap analytics data actually means

Having data and knowing what it means are different things. Here's how to interpret the core metrics from your sitemap coverage reports.

Submitted vs. indexed: the gap that matters most

Pull up Search Console > Sitemaps and look at two numbers side by side: URLs submitted and URLs indexed. The difference — often called the indexing gap — is your primary health indicator.

A gap under 10% is generally fine, especially for large sites where some pages are legitimately thin or duplicative. A gap between 10% and 20% warrants investigation. A gap above 20% is a problem that is likely costing you rankings right now.
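As a worked example of the arithmetic, here is a small Python sketch. The function names are illustrative, and treating everything above 20% as a problem is a simplifying assumption based on the rough bands above.

```python
def indexing_gap(submitted, indexed):
    """Share of submitted URLs that are not indexed, as a fraction of 1."""
    if submitted <= 0:
        raise ValueError("submitted must be a positive URL count")
    return (submitted - indexed) / submitted


def classify_gap(gap):
    """Map a gap onto the rough bands described in the text."""
    if gap < 0.10:
        return "fine"
    if gap <= 0.20:
        return "investigate"
    return "problem"


# Example: 1,200 URLs submitted, 1,020 indexed -> 180 missing, a 15% gap.
status = classify_gap(indexing_gap(1200, 1020))
```

Recording this fraction each month, rather than the raw counts, keeps the trend comparable as the sitemap grows.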

What causes large indexing gaps?

  • Thin or near-duplicate content — pages Google considers too similar to others to index separately
  • Orphaned URLs — pages in your sitemap but not reachable through internal links (Googlebot follows links; if a page has no inbound links, it may never actually get crawled despite being in the sitemap)
  • Slow server response — Googlebot times out and deprioritizes crawling
  • Crawl budget issues — for large sites (10,000+ URLs), Google allocates a crawl budget; bloated sitemaps with low-value URLs eat into budget for your important pages
  • Manual actions or penalties — visible in Search Console under Security & Manual Actions

Crawled but not indexed: the most actionable category

Search Console breaks your unindexed pages into subcategories. "Crawled — currently not indexed" is the most actionable one. Google found the page, fetched the content, and decided not to index it. That's almost always a content quality signal.

The typical fix is to consolidate, improve, or noindex pages in this category. Don't keep submitting them in your sitemap if you haven't changed anything — that just uses crawl budget without result.

Discovered but not indexed: the queue category

"Discovered — currently not indexed" means Google knows the URL exists (from your sitemap or internal links) but hasn't crawled it yet. This is often a crawl budget issue or a sign that Google has deprioritized your site based on past quality signals.

For new sites or recently added content, some lag here is normal. If pages stay in this state for more than 4–8 weeks, that's the signal to investigate.
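Search Console does not show how long a URL has been waiting, so this check depends on your own records. A minimal Python sketch, assuming you keep a mapping of URL to the date you first saw it in the "discovered" bucket (that mapping and the function name are hypothetical):

```python
from datetime import date, timedelta


def stale_discovered(first_seen, today, max_weeks=8):
    """Return URLs that have sat in the discovered bucket longer than max_weeks.

    `first_seen` maps URL -> date you first recorded it as discovered but
    not indexed. This is your own log, not a Search Console export.
    """
    cutoff = today - timedelta(weeks=max_weeks)
    return sorted(url for url, seen in first_seen.items() if seen < cutoff)
```

Setting `max_weeks=4` gives the stricter end of the 4–8 week window.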

Sitemap-specific error codes

| Error type | What it means | Fix |
|---|---|---|
| noindex tag | URL is in sitemap but page has noindex meta tag — contradictory | Remove from sitemap or remove the noindex |
| Redirect | Sitemap URL redirects to another URL | Update sitemap to use the canonical destination |
| 4xx (404, 410) | URL returns an error | Remove from sitemap, fix or redirect the page |
| Soft 404 | Page loads but Google considers content as "not found" | Add real content or redirect to a relevant page |
| Blocked by robots.txt | Sitemap URL is disallowed in robots.txt | Fix the robots.txt conflict |

These appear in the Coverage report filtered by Sitemap. Work through them in priority order: 4xx errors first (they waste crawl budget), then redirect chains, then noindex conflicts.
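If you export the errors to a list, that priority order is a one-line sort. A small Python sketch, where the short error labels are illustrative shorthand rather than Search Console's exact wording:

```python
# Lower number = fix sooner. The order follows the text; any error type
# not listed here (for example soft 404s) is placed last.
FIX_ORDER = {"4xx": 0, "redirect": 1, "noindex": 2}


def triage(issues):
    """Sort (url, error_type) pairs into fix order, keeping ties in input order."""
    return sorted(issues, key=lambda issue: FIX_ORDER.get(issue[1], len(FIX_ORDER)))
```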

---

How to set up a sitemap monitoring routine (under an hour a month)

Checking sitemap coverage once and forgetting it is the most common mistake. Free sitemap analytics only pays off when you build a consistent review habit. Here's a minimal routine that fits into any workflow.

Weekly (5 minutes)

Log into Google Search Console. Check if there are new errors in the Coverage / Pages report. If the submitted URL count has changed significantly (new sitemap file or content published), check that the number makes sense.

Monthly (20 minutes)

  1. Open the Sitemaps report and record submitted vs. indexed counts in a simple spreadsheet
  2. Look at the trend: is the indexing gap growing, shrinking, or stable?
  3. Export the "Crawled — currently not indexed" URLs and bucket them by template (blog posts, product pages, landing pages, etc.)
  4. Identify which template has the highest failure rate — that's where your content quality work should go
  5. Cross-reference with your top organic landing pages (Performance report, sorted by clicks) to make sure your highest-traffic pages don't appear in the unindexed list
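Steps 3 and 4 of the monthly review can be sketched in Python. The sketch assumes the first path segment identifies the template (for example /blog/ or /products/), which will not hold for every site, so adjust `template_of` to your own URL structure. Both function names are illustrative.

```python
from collections import Counter
from urllib.parse import urlparse


def template_of(url):
    """Use the first path segment as a rough template label (an assumption)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else "(root)"


def failure_rates(submitted_urls, unindexed_urls):
    """Fraction of submitted URLs per template that are not indexed."""
    submitted = Counter(template_of(u) for u in submitted_urls)
    unindexed = Counter(template_of(u) for u in unindexed_urls)
    return {template: unindexed[template] / count
            for template, count in submitted.items()}
```

The template with the highest value is the one to look at first, for example `max(rates, key=rates.get)` on the returned dictionary.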

Quarterly (30 minutes)

Run Screaming Frog against your full sitemap (the free tier stops at 500 URLs, so on larger sites this covers only a sample). Export the "Sitemap" tab and compare it to your crawl results. Look for:

  • Pages in sitemap that return 3xx or 4xx
  • Pages in sitemap that have no internal links pointing to them
  • Pages NOT in sitemap that are receiving organic traffic (add them)
  • Canonical mismatches between sitemap URL and the canonical tag on the page
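The canonical check in the last bullet is easy to get wrong by comparing raw strings. A minimal Python sketch, assuming you have pairs of sitemap URL and the canonical URL found on that page. The normalization rules here (case-insensitive scheme and host, empty path treated as "/") are a deliberate minimum, and a trailing slash elsewhere still counts as a difference.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lowercase scheme and host, drop the fragment, treat an empty path as '/'."""
    parts = urlsplit(url.strip())
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        parts.query,
        "",
    ))


def canonical_mismatches(rows):
    """Return (sitemap_url, canonical_url) pairs that point at different URLs.

    Rows with an empty canonical are skipped; they need a separate check.
    """
    return [(url, canonical) for url, canonical in rows
            if canonical and normalize(url) != normalize(canonical)]
```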

---

Common sitemap mistakes that undermine your analytics

Including every URL, including the ones that shouldn't be indexed

A sitemap is not an inventory of every URL on your site. It's a curated list of pages you want indexed. Including thin category pages, faceted navigation URLs, session-parameter URLs, or admin pages in your sitemap actively hurts your crawl efficiency.

Audit your sitemap file with fresh eyes once a quarter. If a URL wouldn't deserve to rank in Google, it probably shouldn't be in your sitemap.
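One way to make that audit repeatable is to filter candidate URLs before they reach the sitemap. The Python sketch below uses hypothetical parameter names and site sections; replace both sets with whatever your own site generates.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical examples of faceted-navigation and session parameters.
BLOCKED_PARAMS = {"sessionid", "sid", "sort", "filter", "color", "size"}
# Hypothetical site sections that should never be indexed.
BLOCKED_SECTIONS = {"admin", "cart", "search"}


def belongs_in_sitemap(url):
    """Return False for URLs in blocked sections or carrying blocked parameters."""
    parts = urlsplit(url)
    first_segment = parts.path.strip("/").split("/")[0]
    if first_segment in BLOCKED_SECTIONS:
        return False
    params = {name.lower()
              for name in parse_qs(parts.query, keep_blank_values=True)}
    return not (params & BLOCKED_PARAMS)
```

Matching on the whole first path segment, rather than a string prefix, keeps a page like /administration-guide from being excluded by accident.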

Submitting a sitemap and never checking back

Google Search Console reports when a sitemap file returns a 404 (yes, your sitemap URL itself can break), when the format becomes invalid, and when indexing drops. But you will only see those reports if you log in, unless email notifications are enabled (you can do this under User settings > Email preferences in Search Console). Turn those on.

Confusing "submitted" with "indexed" when reporting to stakeholders

"We have 1,200 pages in our sitemap" is not the same as "we have 1,200 indexed pages." If you're presenting SEO progress to a client or executive team, make sure the distinction is clear and you're reporting the indexed count, not the submitted count.

Ignoring the content question behind the analytics

Free sitemap analytics will tell you that a page isn't indexed. It won't tell you why on its own — you have to read the content. A "Crawled — currently not indexed" signal on a 150-word product page is almost certainly a thin-content problem, not a technical one. The analytics surface the symptom; you have to diagnose the cause.

---

Using sitemap data to improve your content strategy

This is where tracking your sitemap coverage shifts from a technical SEO task to a genuine content intelligence tool.

Finding content gaps through question analytics

One of the most effective techniques is combining sitemap coverage data with the questions real visitors are asking. If 40% of your blog posts are in the "discovered but not indexed" bucket and you're getting hundreds of chatbot conversations a week asking questions those posts were meant to answer, that's a direct signal to rewrite or consolidate.

Tools like Alee let you see exactly what questions your visitors are asking your AI chatbot — not the keywords you guessed they'd search for, but the exact words they use. When you cross-reference those question clusters against your sitemap analytics, you often find a clear pattern: the pages that aren't being indexed are the ones that don't actually answer the question behind the search query.

Prioritizing content updates

Sort your unindexed pages by estimated traffic potential (use your keyword research data or the impressions column in Search Console Performance). Update the top 10–20% first: improve depth, add examples, cover related questions. Then request indexing for each updated page through the URL Inspection tool and track whether its indexing status changes within 30 days. That's a simple, repeatable improvement cycle.
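Picking that first batch can be sketched in a few lines of Python. The input is assumed to be a mapping of URL to whatever traffic estimate you use (impressions or keyword volume). The function name and the default 20% share are illustrative.

```python
import math


def pick_update_batch(potential_by_url, share=0.2):
    """Return the top `share` of URLs by estimated traffic potential."""
    ranked = sorted(potential_by_url, key=potential_by_url.get, reverse=True)
    if not ranked:
        return []
    # Round up so that short lists still yield at least one page.
    batch_size = max(1, math.ceil(len(ranked) * share))
    return ranked[:batch_size]
```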

Identifying your site's content authority clusters

Sitemap analytics will often reveal uneven indexing across different content categories. You might find that your tutorial pages have 90% indexing rates while your comparison pages have 40%. That tells you where Google trusts your content and where it doesn't — information that should directly influence where you invest writing effort.

Tutorials and resource pages on sites with good topical authority typically index faster and more completely than thin promotional pages. Check your own sitemap analytics to see whether that pattern holds for your site.

---

Free sitemap analytics across different site types

The same principles apply across site types, but the priorities differ.

E-commerce sites

The main challenge is URL sprawl: faceted navigation, filter parameters, and variant URLs can create thousands of near-duplicate URLs. Free sitemap analytics almost always reveals massive indexing gaps on e-commerce sites for this reason.

Focus on:

  • Keeping product sitemap limited to canonical product URLs only
  • Separating category, product, and blog sitemaps into separate files in a sitemap index
  • Monitoring the indexing rate per sitemap file to catch which section is underperforming
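For the second point, a sitemap index is just a small XML file listing the child sitemaps. A minimal Python sketch using the standard library follows. The child file names in the example are placeholders, and real index files usually also carry an XML declaration and optional <lastmod> entries.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(child_sitemap_urls):
    """Build a <sitemapindex> document listing one <sitemap> per child file."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in child_sitemap_urls:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = url
    return ET.tostring(root, encoding="unicode")


# Placeholder file names, one per site section.
index_xml = build_sitemap_index([
    "https://example.com/sitemap-categories.xml",
    "https://example.com/sitemap-products.xml",
    "https://example.com/sitemap-blog.xml",
])
```

Submitting each child file separately in Search Console is what makes the per-file indexing rate visible.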

Content-heavy blogs and media sites

The common problem here is publishing velocity that outpaces crawl budget allocation. Posting 20 articles a week on a domain with modest authority can mean new content takes 4–8 weeks to index.

The fix isn't to submit more — it's to build internal link structures that give Googlebot a path to new content from your high-authority pages, and to consistently improve existing content so Google allocates more crawl budget to your domain overall.

SaaS and software product sites

SaaS sites tend to have well-structured sitemaps but often miss key sections: help center articles, feature landing pages, and comparison pages frequently go un-submitted. Check that your sitemap covers features, comparison pages (like Alee vs SiteGPT), and your full documentation or help content.

Pricing pages deserve special attention. They convert, they attract links, and many SaaS sites accidentally noindex them or omit them from sitemaps entirely. Check yours — and see Alee's pricing as a reference for a well-structured SaaS pricing page.

---

Going beyond free: when it's worth paying

Free sitemap analytics tools handle 80–90% of what most sites need. The scenarios where a paid tool earns its cost:

  • Large sites (100,000+ URLs) where you need daily crawl data and can't wait for Googlebot's schedule
  • Enterprise multi-domain setups where you need consolidated analytics across dozens of properties
  • JavaScript-heavy sites where a standard crawler doesn't render the page the way Googlebot does and you need a JS-rendering crawl

Even then, start with the free layer — Google Search Console's data is authoritative and free, and it should always anchor your analysis even if you layer paid tools on top.

---

How Alee connects sitemap health to chatbot content

If you're running an AI chatbot trained on your own content, your sitemap and your knowledge base are tightly linked. Pages that aren't indexed often aren't being included in your chatbot's training content either — and both problems trace back to the same root cause: the content isn't substantive enough, or isn't being reached by crawlers.

Alee's knowledge brain ingests content from your sitemap URL, individual pages, PDFs, and other sources, then surfaces the closest content when a visitor asks a question. When you review your sitemap analytics and find gaps, checking whether those same pages are effectively answering chatbot questions gives you a second confirmation signal. If visitors are asking about something your bot can't answer well AND those topic pages aren't indexing, you have clear evidence of a content quality problem worth fixing.

Start free and connect your sitemap to see which pages are actually powering your chatbot's answers — it's a fast way to cross-reference sitemap coverage with real visitor intent.

---

Frequently asked questions

What is sitemap analytics?

Sitemap analytics is the practice of measuring how many of the URLs in your XML sitemap are actually being crawled and indexed by search engines, and using that data to improve your site's visibility. It draws primarily on data from Google Search Console's Sitemaps and Coverage reports, supplemented by crawl tools like Screaming Frog.

Is Google Search Console really free for sitemap analytics?

Yes. It is completely free, with no URL limits for your verified properties. Google Search Console is the most authoritative source for sitemap coverage data because the data comes from Google's own crawling and indexing systems. There's no paid tier; it's a free tool for all site owners.

How often should I check my sitemap analytics?

A quick weekly check (under 5 minutes) for new errors, plus a deeper monthly review of the submitted-vs-indexed gap and the error breakdown categories. Quarterly, run a crawler audit to catch redirect chains, orphaned URLs, and canonical mismatches that Search Console doesn't surface clearly on its own.

Why are some of my sitemapped pages not being indexed?

The most common reasons: thin or near-duplicate content that Google doesn't consider distinct enough to index; orphaned pages with no internal links making them hard for Googlebot to reach; crawl budget limitations on large sites; pages that have a noindex tag contradicting the sitemap submission; or pages returning redirects or errors that make the sitemap URL invalid. The Coverage report in Search Console breaks this down by category for each specific URL.

Can sitemap analytics help me improve my chatbot's answers?

Yes. If you're using a content-trained AI chatbot, the pages your chatbot draws from should overlap closely with your indexed pages. Free sitemap analytics helps you identify content gaps — topics visitors are asking about where you don't have indexed, substantive pages. Fixing those gaps improves both your chatbot's knowledge base and your organic search coverage simultaneously.

---

Start tracking your sitemap health today — [start free](/signup) with Alee, connect your site, and see exactly which pages are indexed, which are being asked about, and where your biggest content gaps are hiding.
