How to Perform a Sitemap Audit

You can think of an XML sitemap as your website’s roadmap. It’s what search engine crawlers follow to find and index the important pages of your site. A sitemap audit is an important part of SEO because it ensures that all of the correct URLs are included in the XML sitemap.

Google has an in-depth explainer about sitemaps if you’re interested in learning more.

What is an XML Sitemap?

An XML sitemap helps search engines understand the content and structure of your website. Search engines crawl and index the important pages listed in it so that they can show them to people who are searching for the information your site provides.

You may not need an XML sitemap (but you probably have one anyway).

Typically, an XML sitemap and a sitemap audit matter most for large websites. If you have a small website, you may not have to worry about this just yet. (However, if your site runs on WordPress, you likely still have a sitemap, since WordPress creates one automatically.)

On a large website, incorrect URLs in the sitemap can have a negative impact on the crawl budget.

A crawl budget is how many pages a search engine will crawl within a certain timeframe. If a web crawler wastes time on incorrect URLs, it may not be able to get to the correct URLs, which means they won’t get indexed.

There’s an exception, though. If you have a small website without any backlinks, a sitemap will help search engine crawlers find and index your site. Otherwise, since no other sites are pointing to it, it could get missed completely.

Only important URLs should be part of a sitemap.

Not every single URL on your website needs to be part of your sitemap. Only the URLs that you want search engines to crawl and index should be included.

Pages to include in your sitemap:

  • Pages that you want to rank for in search results.
  • Pages that search engines may have a hard time finding.

You don’t need to include under-construction pages, non-canonical URLs, and other unimportant or duplicate pages.

How to Audit an XML Sitemap

When you perform a sitemap audit, check that the XML sitemap only contains URLs that return a 200 OK status code. You can read about status codes here.
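The 200-only rule can also be checked with a short script. What follows is a minimal sketch, not a feature of any tool mentioned in this article: fetch_status and non_200_urls are hypothetical helper names, and the example.com URLs are placeholders. Redirects are deliberately not followed, so a 301 is reported as a 301 rather than as the 200 of its destination.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx codes stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 3xx, 4xx and 5xx responses land here
    except urllib.error.URLError:
        return None  # DNS failure, refused connection, timeout, etc.


def non_200_urls(urls, get_status=fetch_status):
    """Map each URL whose status is not 200 to the status it returned."""
    problems = {}
    for url in urls:
        status = get_status(url)
        if status != 200:
            problems[url] = status
    return problems


# Offline demonstration with made-up statuses instead of real requests.
sample = {"https://example.com/": 200, "https://example.com/old": 301}
print(non_200_urls(sample, get_status=sample.get))
```

Calling non_200_urls with only a list of URLs uses real HEAD requests. Some servers answer HEAD differently from GET, so treat the results as a starting point for manual review.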

Find the Website’s XML Sitemap

To perform a sitemap audit, you’ll want to find and review the XML sitemap for your website. There are a few go-to places to check first.

1. Add one of the following common sitemap locations to the end of your homepage URL:

  • /sitemap.xml
  • /sitemap_index.xml
  • /sitemap/

For example, when I enter lindsaypietroluongo.com/sitemap.xml or lindsaypietroluongo.com/sitemap_index.xml, I see my sitemap.

An XML sitemap for a website.

2. If none of the above work, try a few less-common sitemap locations:

  • /sitemap.php
  • /sitemap.txt
  • /sitemap.xml.gz

3. Add /robots.txt to the end of the URL. The sitemap URL is usually on a line that starts with Sitemap:, often near the end of the file.

For example, when I go to lindsaypietroluongo.com/robots.txt, this is what I see:

The XML sitemap URL in the robots.txt file.

I could then go to that URL to find my sitemap. You may also find our article on how to find a sitemap useful.
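The lookup steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions: candidate_sitemap_urls and sitemaps_from_robots are hypothetical helper names, example.com is a placeholder, and the path list simply repeats the locations named in steps 1 and 2.

```python
from urllib.parse import urljoin

# Locations listed in steps 1 and 2, most common first.
COMMON_SITEMAP_PATHS = [
    "/sitemap.xml",
    "/sitemap_index.xml",
    "/sitemap/",
    "/sitemap.php",
    "/sitemap.txt",
    "/sitemap.xml.gz",
]


def candidate_sitemap_urls(homepage):
    """Build the sitemap URLs worth trying for a homepage URL."""
    return [urljoin(homepage, path) for path in COMMON_SITEMAP_PATHS]


def sitemaps_from_robots(robots_txt):
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own colons survive.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


robots = "User-agent: *\nDisallow: /wp-admin/\n\nSitemap: https://example.com/sitemap_index.xml\n"
print(candidate_sitemap_urls("https://example.com")[:2])
print(sitemaps_from_robots(robots))
```

The functions only build and extract URLs. Whether a candidate actually exists still has to be confirmed by opening it in a browser or requesting it.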

Use Google Search Console to Check the XML Sitemap

When performing a sitemap audit, you should always check the Google Search Console (GSC) Sitemaps report. This report will tell you the following:

  • If Google has received a submission of the XML sitemap.
  • If there are any problems with fetching your XML sitemap.
  • Any indexing issues with submitted URLs.

This is an important step of a sitemap audit because it tells you how well the XML sitemap is performing and if there are any problems that need attention. Here’s how to check your sitemaps in GSC:

1. Go to Google Search Console and log in.

2. Click Sitemaps in the left sidebar and select an entry under Submitted sitemaps.

Google Search Console navigation to begin a sitemap audit.

3. Select one of the sitemaps indexed by Google.

Sitemaps in GSC.

4. Click Open Sitemap in the top-right corner.

Arrow to open up a sitemap in GSC for a sitemap audit.

Now you’ll see the sitemap, and you can review it.

A pages sitemap in GSC.
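If you would rather review the sitemap's contents as a list than read raw XML, a small parser helps. The sketch below assumes the file follows the standard sitemaps.org format; parse_sitemap is a hypothetical helper name and the example.com URLs are placeholders. It reports whether the file is a regular sitemap (urlset) or a sitemap index (sitemapindex) and returns the URLs inside.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol, in ElementTree's {uri} form.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text):
    """Return (kind, locs) where kind is 'urlset' or 'sitemapindex'."""
    root = ET.fromstring(xml_text)
    kind = root.tag.replace(SITEMAP_NS, "")
    # Only <loc> elements in the sitemap namespace are collected, so
    # extension entries such as image locations are ignored.
    locs = [
        loc.text.strip()
        for loc in root.iter(SITEMAP_NS + "loc")
        if loc.text
    ]
    return kind, locs


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>"""
print(parse_sitemap(sample))
```

For a sitemap index, the returned URLs point to child sitemaps, and each of those would need to be fetched and parsed in turn.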

Address Issues that GSC Found

Aside from reviewing your XML sitemap as part of your sitemap audit, you can also review issues that Google found when trying to crawl your site.

1. Go to GSC, click Sitemaps in the left sidebar, then click one of your submitted sitemaps.

2. Select See Page Indexing toward the top.

A page indexing report link in GSC.

3. Uncheck the green Indexed box so that only the gray Not indexed box is selected.

The "not indexed" sitemap audit section in GSC.

4. Scroll down to the Why pages aren’t indexed section and click on a Reason to find out more.

Reasons for why pages aren't indexed in GSC.

Now you can look into the problem more and troubleshoot if necessary. For example, when I select Duplicate, Google chose different canonical than user, this is what I see:

Duplicate content report in GSC for a sitemap audit.

I can now research this URL to determine if Google’s indexing the correct URL or not. If not, I’ll take the steps necessary to fix it and have it reflected on my XML sitemap.

Use SEO Spider for Website Crawling and a Sitemap Audit

The Screaming Frog dashboard.

Aside from GSC, you can use the SEO Spider tool from Screaming Frog (or another similar tool) to crawl the website and XML sitemap or the sitemap only. This helps you find orphan URLs, meaning URLs that are in the sitemap but are not internally linked on the website. It also shows you URLs that were discovered in the crawl but are not included in the sitemap.

Crawl the Website and Sitemap

1. In the navigation bar at the very top, select Configuration > Spider > Crawl. This will open a pop-up box over the main Spider dashboard.

Navigation options in Spider for a sitemap audit.

2. The Crawl option should be automatically chosen for you in the left sidebar.

Left sidebar in navigation bar.

In the XML Sitemaps section, check Crawl Linked XML Sitemaps and Crawl These Sitemaps, then enter the sitemap URL for your website.

Configuring crawl settings in Spider.
Source: Screaming Frog

3. Go back to the Spider dashboard. Enter the website URL and click Start to begin the crawl.

Running a website crawl to help with a sitemap audit.

This will crawl the website and the XML sitemaps at the same time.

4. Click the Sitemaps tab. If you can’t find it, click the Down Arrow and uncheck tabs you don’t need at the moment. That should make the Sitemaps tab visible in the tab bar.

Navigating to the sitemaps tab.

5. Run a post-crawl analysis. Go to Crawl Analysis > Configure and make sure Sitemaps is checked.

Crawl analysis settings in Spider.
Source: Screaming Frog

Next, select Crawl Analysis > Start from the top navigation bar.

Crawl analysis settings.

When the analysis is complete, you’ll see a green box with Analysis 100% at the top of the Spider tool.

The green box that shows the analysis is complete.

6. Review the filters on the left side of the Spider tool. The dropdown menu lets you choose from seven filters.

Sitemap audit filter dropdown menu.

Here’s the data that each of them shows you to help with your sitemap audit:

  • URLs in Sitemap: All of the URLs included in the sitemap.
  • URLs not in Sitemap: URLs discovered in the website crawl that are not included in the sitemap. Review these URLs and add the important ones to the sitemap.
  • Orphan URLs: URLs included in the sitemap but not found during the crawl. You can either remove the pages from the sitemap or link to them internally on your website so they appear in the next website crawl.
  • Non-indexable URLs in Sitemap: URLs that are in the sitemap but are not indexable. Either they should be removed from the sitemap, or you should fix their ability to be indexed. Yoast has a helpful article about what to do with an accidental noindex post.
  • URLs in Multiple Sitemaps: URLs that appear in more than one sitemap. While this may not be a problem, if a URL shouldn’t be appearing in multiple sitemaps, you can remove the extra instances.
  • XML Sitemap with over 50k URLs: This filter shows your sitemaps with over 50k URLs. According to Google, a single sitemap is limited to 50,000 URLs or 50MB uncompressed.
  • XML Sitemap over 50MB: This filter shows your sitemaps with a file size over 50MB, which is above the Google sitemap limit.
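Several of these filters boil down to set comparisons between the crawled URLs and the sitemap URLs. The sketch below illustrates that idea only; it is not how Screaming Frog is implemented. compare_crawl_to_sitemaps is a hypothetical helper name, the input data is made up, and the file-size filter is left out because it needs the raw file rather than a URL list.

```python
from collections import Counter

MAX_URLS_PER_SITEMAP = 50_000  # URL limit quoted in the list above


def compare_crawl_to_sitemaps(crawled_urls, sitemaps):
    """Compare crawled URLs with sitemap contents.

    sitemaps maps a sitemap name to the list of URLs it contains.
    """
    crawled = set(crawled_urls)
    in_sitemap = set()
    counts = Counter()  # number of distinct sitemaps each URL appears in
    for urls in sitemaps.values():
        unique = set(urls)
        in_sitemap |= unique
        counts.update(unique)
    return {
        "not_in_sitemap": sorted(crawled - in_sitemap),
        "orphan": sorted(in_sitemap - crawled),
        "in_multiple_sitemaps": sorted(u for u, n in counts.items() if n > 1),
        "over_50k_urls": sorted(
            name for name, urls in sitemaps.items()
            if len(urls) > MAX_URLS_PER_SITEMAP
        ),
    }


report = compare_crawl_to_sitemaps(
    ["https://example.com/", "https://example.com/blog/"],
    {
        "pages": ["https://example.com/", "https://example.com/old/"],
        "posts": ["https://example.com/old/"],
    },
)
for name, values in report.items():
    print(name, values)
```

Exact string matching is used here, so differences such as a trailing slash or http versus https count as different URLs. Normalizing URLs first may be necessary on a real site.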

Crawl the Sitemap Only

While this won’t be as comprehensive as crawling both the website and the sitemap because you won’t be able to compare the two, you will get an overview of your sitemap with this method.

This is particularly useful if you’re only after certain information, like noindex pages or pages that return a status code other than 200 OK.

For large websites, it’s helpful to get that information so those pages can be removed from the sitemap when possible. That way, the crawl budget won’t get used up so quickly.

1. From the top navigation bar, select Mode > List.

Settings to run a sitemap audit without a website crawl.

2. Click the Upload dropdown menu and select Download XML Sitemap.

Option to download an XML sitemap to run an audit.

3. Enter the sitemap URL in the box that appears and click OK.

Entering the sitemap URL.

4. Perform a post-crawl analysis. From the top navigation bar, select Crawl Analysis > Start. Wait until you see the green box with Analysis 100% at the top.

5. Select All in the Filters dropdown on the left side of the tool. You can then scroll through all of the links and associated information, or you can click on the headers to sort them that way. For example, if I click Indexability, the noindex pages show up first.

Organizing the sitemap audit links by indexing.

Non-indexable URLs should either be removed from the sitemap or updated so they can be indexed.
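One common reason a URL is non-indexable is a noindex directive, which can appear in a robots meta tag or in an X-Robots-Tag response header. The sketch below checks for that single case, assuming you already have a page's HTML and header value. is_noindex is a hypothetical helper name, and it does not cover other causes of non-indexability such as canonical tags or non-200 responses.

```python
from html.parser import HTMLParser

# "none" is shorthand for "noindex, nofollow", so it blocks indexing too.
BLOCKING_DIRECTIVES = {"noindex", "none"}


class _RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def is_noindex(html, x_robots_tag=""):
    """Return True if the page or its X-Robots-Tag header blocks indexing."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    header = [d.strip().lower() for d in x_robots_tag.split(",")]
    return any(d in BLOCKING_DIRECTIVES for d in parser.directives + header)


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindex(page))
```

Header values that target a specific crawler (for example, a user-agent name followed by a colon) are not handled in this sketch.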

Go to the Overview section on the right to dig into the URLs more. For example, there’s a Canonicals dropdown menu that lets you see just your website’s canonical links.

The canonical report in Spider.

There’s also a dropdown for Response Codes, a tab with all sorts of Issues, etc.

The issues report in Spider for a sitemap audit.

Dig into these different sections (and any others you want) to perform a thorough sitemap audit.

Final Thoughts About Your Sitemap Audit

Finding your sitemap and running an audit isn’t difficult, but combing through your website to figure out what to update can be time-consuming. Since your XML sitemap is what search engines use to find, index and show your content in search results, though, it’s a crucial part of website SEO.

Also, keep in mind that websites with strong internal linking make it easier for Google to crawl the site and find the important URLs listed in the sitemap. Build internal links the smart way with Link Whisper.
