How to Do a Website Content Inventory Without Missing Key Pages

Learn how to build a website content inventory so you can decide what to keep, update, merge, redirect, or retire without missing important pages.

A website content inventory is not glamorous, but it is how you avoid deleting something important, duplicating what already works, or redirecting a valuable page into a wall.

If you are trying to clean up a website and asking “Which URLs actually matter?”, “What should stay live instead of getting merged?”, “Which files are still linked from somewhere obscure?”, and “Who is supposed to approve each decision?”, this is the workflow that turns the mess into a usable queue.

Search engines and visitors both prefer structure over improvisation. Google’s guidance on sitemaps and redirects exists because websites tend to fail in familiar ways: important URLs disappear, weaker pages compete with stronger ones, and once-useful files linger long after anyone remembers why.

By the end of this guide, you will have a practical way to inventory every meaningful URL, classify it by intent, assign an owner, choose the next action, and run the quality checks that keep the final site tidy instead of accidentally theatrical.

Figure: Content inventory spreadsheet for website content planning. A useful inventory makes pages, files, owners, and next actions visible in one place.

What “content inventory” means in a real site cleanup

A content inventory is a structured list of the pages, posts, attachments, PDFs, and other public-facing assets on a website, plus the decision-making data attached to each one. It tells you what exists, where it lives, who owns it, and what should happen next.

It helps to separate three terms that teams love to blend together until they become soup:

  • Inventory: the master list of URLs and assets.
  • Audit: the evaluation of quality, accuracy, usefulness, and overlap.
  • Pruning: the execution step where you update, merge, redirect, or retire content.

If you skip the inventory and jump straight into edits, you will make decisions from memory. Memory is not a content management system. It is an excellent way to forget the PDF that still gets downloads, the old service page with three strong backlinks, or the tutorial that quietly answers half of your support questions.

I treat the inventory as the operating system for the project. Once every URL has an owner, a status, and a next action, the work stops feeling like debate and starts feeling like execution.

Define the scope before the spreadsheet turns into folklore

The first job is deciding what belongs in the inventory. If you leave scope fuzzy, the sheet expands forever and nobody trusts the numbers.

Domains and subdomains

List every hostname connected to the site project:

  • Main website
  • Blog subdomain or resource hub
  • Help center or docs section
  • Landing page tools or campaign microsites
  • Regional or language-specific versions

A surprising amount of important content lives just outside the neat part of the navigation. If you only inventory the primary domain, you will miss the support section, the old campaign pages, and the file library that people still link to from somewhere on the internet with the stubbornness of a mislabeled cable.

Languages and audience variants

If the site has more than one language, add a language column immediately. Do not trust yourself to infer it later from the URL structure. Once English pages, translated pages, and regional variants are mixed together, the inventory becomes difficult to sort and even harder to redirect correctly.

Content types

At minimum, track these content types separately:

  • Pages
  • Blog posts
  • Category or index pages
  • Service or product pages
  • FAQ and support articles
  • PDFs and downloadable files
  • Media attachments tied to real user journeys

That last item matters more than teams expect. Attachments are often treated as scenery until a form guide disappears, a sales PDF breaks, or an image embed turns into a blank rectangle. The interface looked clean. The underlying system was less disciplined.

Choose inventory sources that expose different kinds of truth

No single source gives you the full picture, because each one tells a different truth about the site. Plan on combining several.

Sitemap

Your XML sitemap is usually the fastest starting point for the URL list. It reflects the pages the site intends to surface publicly. Use it to seed the sheet, then compare it with other sources to find the gaps.
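
As a minimal sketch of seeding the sheet, the snippet below pulls every <loc> entry out of a sitemap using only the Python standard library. The function name urls_from_sitemap and the example.com URLs are illustrative assumptions; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> value found in a sitemap document, in order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample; a real run would read the site's own sitemap file.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/services/site-audit/</loc></url>
</urlset>"""

seed_urls = urls_from_sitemap(sample)
```

If the site publishes a sitemap index, the same function returns the child sitemap locations, which would then need to be fetched and parsed in turn.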

CMS export

A CMS export shows the editorial layer: pages, posts, authors, categories, attachments, and update dates. On WordPress, the official export guide is enough to pull a clean content export without treating the database like a dare.

Crawl data

A site crawl shows the public-facing version of the system. Good crawl data gives you:

  • Status code
  • Canonical target
  • Title and meta description
  • Word count
  • Internal inlinks
  • Depth from the homepage

If a URL exists in the CMS but not in a crawl, that tells you something. If a URL appears in a crawl but nowhere in the CMS, that also tells you something, and usually something slightly annoying.
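
One hedged way to surface those gaps is a plain set comparison. The sketch below assumes both sources have already been reduced to comparable URL paths; compare_sources and the sample paths are hypothetical.

```python
def compare_sources(cms_urls: set[str], crawl_urls: set[str]) -> dict[str, set[str]]:
    """Split URLs into those seen only in the CMS, only in the crawl, or in both."""
    return {
        "cms_only": cms_urls - crawl_urls,    # exists editorially, not reachable by crawl
        "crawl_only": crawl_urls - cms_urls,  # publicly reachable, unknown to the CMS
        "both": cms_urls & crawl_urls,
    }


# Hypothetical sample data.
cms = {"/about/", "/services/site-audit/", "/drafts/unlinked-page/"}
crawl = {"/about/", "/services/site-audit/", "/downloads/legacy-guide.pdf"}

gaps = compare_sources(cms, crawl)
```

The comparison is only as good as the normalization before it: trailing slashes, letter case, and protocol differences should be made consistent first, or the same page will show up on both sides of the gap.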

Analytics and internal search

Analytics tells you which pages still attract attention. Landing pages, organic entrances, conversions, and assisted conversions are the practical signals. Google Analytics explains its Pages and screens reporting well enough to help you pull a quick top-pages view.

Internal search terms are even better than many teams realize. They show what visitors expected to find and could not find easily through navigation. That gives you user intent, not just page inventory.

Backlink and file references

If you have backlink data, use it. If you do not, at least pull server logs, Search Console exports, or internal link reports where available. The goal is simple: do not retire a quiet page that still has earned links or a file that sales, support, or customers still reference.

Build a tracking sheet with columns that actually change decisions

The best inventory sheets are not encyclopedias. They are decision tools. Add columns that change what happens next; skip the ones that only make the file feel impressively busy.

Column | Why it matters | Example value
URL | The base unit of work | /services/site-audit/
Content type | Keeps pages, posts, PDFs, and attachments from being treated as identical | Landing page
Topic / primary keyword | Reveals overlap and intent | website content inventory
Primary goal | Shows what success means | Contact request
Owner | Assigns responsibility for the decision | Content lead
Last updated | Flags stale content without guesswork | 2025-10-14
Traffic or entrances | Separates useful pages from decorative debris | 1,240 sessions in 90 days
Backlinks | Useful when deciding whether to preserve or redirect | 8 referring domains
Status | Keeps the workflow moving | Needs review
Next action | Turns analysis into execution | Update and keep URL

If you need a smaller starter version, keep seven columns: URL, type, topic, owner, traffic, status, and next action. That is enough to drive decisions without drowning the team in decorative metadata.

A simple starter template

URL: /guides/content-audit-checklist/
Type: Blog post
Topic: content audit checklist
Primary goal: Newsletter signup
Owner: Content team
Traffic: Medium
Backlinks: Yes
Status: Review
Next action: Update, keep URL, refresh examples
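
If the sheet lives in a script rather than a spreadsheet app, the template above could be mirrored as a small record type and written out as CSV. This is a sketch under assumptions: InventoryRow, its field names, and rows_to_csv are hypothetical names chosen to match the template labels.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class InventoryRow:
    """One row of the starter template; field names mirror its labels."""
    url: str
    content_type: str
    topic: str
    primary_goal: str
    owner: str
    traffic: str
    backlinks: str
    status: str
    next_action: str


def rows_to_csv(rows: list[InventoryRow]) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(InventoryRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


starter = InventoryRow(
    url="/guides/content-audit-checklist/",
    content_type="Blog post",
    topic="content audit checklist",
    primary_goal="Newsletter signup",
    owner="Content team",
    traffic="Medium",
    backlinks="Yes",
    status="Review",
    next_action="Update, keep URL, refresh examples",
)

csv_text = rows_to_csv([starter])
```

The resulting text can be saved to a file and opened in any spreadsheet tool, which keeps the column set fixed while the team fills in rows.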

Example rows that show how the sheet works

URL | Intent | Signal | Decision
/about/ | Trust and orientation | Low traffic, high importance | Keep, update messaging, improve internal links
/blog/old-checklist/ | Informational | Strong backlinks, dated examples | Update in place
/downloads/legacy-guide.pdf | Support resource | Steady downloads, poor mobile UX | Replace with HTML page and redirect file URL if appropriate

The first pass should be quick. Resist the temptation to rewrite copy while inventorying. Inventory first. Then make decisions. Then edit. Mixing those steps is how a one-day review turns into a two-week wandering session with better spreadsheets and worse momentum.

Classify content by intent so unlike things stop competing

Most content inventories become noisy because everything is labeled “page.” That is not classification. That is surrender.

Classify each item by user intent:

  • Core landing pages: home, services, pricing, contact, about, key conversion pages.
  • Support and how-to content: tutorials, setup guides, FAQs, troubleshooting.
  • Category or index pages: blog index, resource hubs, directories, listing pages.
  • Seasonal or campaign content: temporary pages tied to launches, events, or promotions.
  • Internal resources exposed publicly: PDFs, attachment pages, forms, handbooks, specification files.

This classification matters because the decision rules are different. A core landing page with conversions and backlinks should usually stay at its URL and get updated carefully. A stale campaign page with no visits may be a clean redirect candidate. A PDF with consistent downloads may deserve an HTML replacement, not quiet deletion.

If you want a useful reference point for image and text accessibility while reviewing assets, the W3C image alt decision tree is a practical companion during the media pass.

Decide actions with a clear decision tree

Every row in the sheet should land on a specific next action. The standard set is usually enough:

  • Keep as-is: current, accurate, useful, and already serving the right intent.
  • Update: good URL, weak execution.
  • Consolidate: multiple pages chase the same topic or intent.
  • Replace: the topic matters, but the current page format is wrong.
  • Redirect: the old URL should send users to a better page with matching intent.
  • Retire: the page has no meaningful user value and no sensible replacement.

A simple decision tree looks like this:

  1. Does this URL still serve a real user need?
  2. If yes, is the topic still strategically relevant?
  3. If yes, does the current page satisfy that need well enough?
  4. If it does not, should the page be updated in place, replaced, or merged into a stronger page?
  5. If no user need remains, is there a clearly relevant destination for a redirect?
  6. If there is no relevant destination, retire the page cleanly instead of inventing a fake match.

Some quick examples:

  • A tutorial with backlinks and traffic but outdated screenshots: update.
  • Three thin pages targeting the same service intent: consolidate.
  • An old downloadable guide with a stronger HTML replacement: replace or redirect, depending on whether the file still matters.
  • An expired event page with no traffic, no links, and no ongoing purpose: retire.
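
The decision tree and the quick examples above can be sketched as a small function. Everything here is an illustrative assumption rather than a fixed rule set: the PageSignals fields, the order in which consolidate and replace are checked, and the choice to treat a page on a no-longer-relevant topic the same way as a page with no remaining user need.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    """Hypothetical yes/no answers gathered during the inventory review."""
    serves_user_need: bool
    topic_relevant: bool
    satisfies_need: bool
    overlaps_other_pages: bool = False
    wrong_format: bool = False
    has_redirect_target: bool = False


def next_action(page: PageSignals) -> str:
    """Walk the decision tree and return a single next action."""
    if page.serves_user_need and page.topic_relevant:
        if page.satisfies_need:
            return "keep"
        if page.overlaps_other_pages:
            return "consolidate"  # several pages chase the same intent
        if page.wrong_format:
            return "replace"      # topic matters, format is wrong
        return "update"           # good URL, weak execution
    # Assumption: no need, or an irrelevant topic, leads to redirect or retire.
    return "redirect" if page.has_redirect_target else "retire"


tutorial = PageSignals(True, True, False)
thin_service_page = PageSignals(True, True, False, overlaps_other_pages=True)
old_pdf_guide = PageSignals(True, True, False, wrong_format=True)
expired_event = PageSignals(False, False, False)
```

Returning exactly one label per row is the point of the exercise: judgment still happens when the signals are filled in, but the sheet never ends up with a row that says "maybe update, maybe merge."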

Handle redirects like infrastructure, not glitter

Redirects are helpful when they preserve intent. They become damaging when they are used to hide weak decisions.

Redirect a page when:

  • The old URL still has links, traffic, or bookmarks
  • There is a close destination that serves the same user need
  • You are merging duplicates or removing an outdated version of a still-relevant topic

Do not redirect everything to the homepage because it feels convenient. That is not a redirect strategy. It is a polite way of telling users you gave up halfway through the map.

Keep your redirect sheet explicit:

Old URL | New URL | Reason | Status
/old-checklist/ | /website-content-inventory/ | Merged into a stronger, current guide | Ready
/downloads/old-guide.pdf | /guides/content-audit/ | HTML replacement offers the same information with better usability | Ready

Then check for chains. If /a redirects to /b and /b redirects to /c, point /a directly to /c. Browsers tolerate chains. Users dislike them. Crawlers rarely send flowers.
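
A minimal sketch of that chain check, assuming the redirect sheet has been loaded as a mapping from old URL to new URL. The function name flatten_redirects is hypothetical, and raising an error on loops is a design choice made here, not something the sheet itself requires.

```python
def flatten_redirects(redirects: dict[str, str]) -> dict[str, str]:
    """Point every old URL directly at its final destination.

    Raises ValueError if following a chain ever revisits a URL (a loop).
    """
    flattened = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Keep following while the target is itself redirected somewhere else.
        while target in redirects:
            if target in seen:
                raise ValueError(f"Redirect loop detected starting at {source}")
            seen.add(target)
            target = redirects[target]
        flattened[source] = target
    return flattened


chained = {"/a": "/b", "/b": "/c"}
direct = flatten_redirects(chained)
```

Running this before the redirects go live also catches loops, which are worse than chains because the visitor never arrives anywhere.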

Inventory media and attachments before they become invisible liabilities

Pages get most of the attention. Files cause a large share of the trouble.

For images, PDFs, and embedded files, capture:

  • File URL
  • File type
  • Where it is used
  • Whether it still supports an active page or journey
  • Whether the file needs better naming, replacement, or alt text

Media review usually exposes three problems quickly:

  • Broken embeds: the page is live, but the linked file moved or vanished.
  • Attachment clutter: old uploads exist publicly with no clear purpose.
  • Accessibility gaps: useful images have weak or missing alt text, especially in older content.

Prioritize alt text for images that carry information, not just decoration. For PDFs, decide whether the file should remain a file, be replaced with an HTML page, or be linked only from specific support flows. If a PDF is important enough to keep, it is important enough to track in the inventory.

Run quality checks after the sheet is complete

The inventory is not finished when the rows are filled in. It is finished when the map still holds up under verification.

Internal link coverage

Spot-check whether important pages are reachable from the homepage, the main navigation, and relevant hubs like the blog. Pages with no internal links are easy to forget and even easier for visitors to miss.

Canonical and metadata consistency

Confirm that every kept page has a clear title, a coherent meta description, and a canonical URL that points where you expect. Metadata drift is common when teams merge content quickly and forget to review what search engines and social previews will actually read.

Orphan page detection

An orphan page is a URL with no meaningful internal links pointing to it. Some pages are meant to be quiet; most are just stranded. Crawl data, CMS exports, and internal link reports together make these easier to find.
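
As a rough sketch, orphan candidates fall out of a set difference once the crawl's internal links are available as (source, target) pairs. The name find_orphans, the sample paths, and the decision to exempt entry points such as the homepage are assumptions for illustration.

```python
def find_orphans(
    known_urls: set[str],
    internal_links: list[tuple[str, str]],
    entry_points: set[str],
) -> set[str]:
    """Return known URLs that no other page links to.

    Self-links are ignored, and entry points (for example the homepage)
    are never reported because visitors reach them directly.
    """
    linked_to = {target for source, target in internal_links if source != target}
    return known_urls - linked_to - entry_points


# Hypothetical sample data.
known = {"/", "/about/", "/blog/", "/blog/old-checklist/", "/landing/spring-promo/"}
links = [
    ("/", "/about/"),
    ("/", "/blog/"),
    ("/blog/", "/blog/old-checklist/"),
    ("/landing/spring-promo/", "/landing/spring-promo/"),  # self-link only
]

orphans = find_orphans(known, links, entry_points={"/"})
```

The output is a list of candidates, not verdicts. Some of those pages are intentionally quiet, so each one still needs a decision in the sheet.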

A final review checklist

  • Every meaningful URL exists in the sheet
  • Every row has an owner
  • Every row has one next action
  • Redirects point directly to the best destination
  • Important files are tracked, not assumed
  • Internal links support the pages you decided to keep
  • Metadata matches the final URL strategy
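
Two of these checks, owner present and exactly one next action, are easy to automate. The sketch below assumes rows are dictionaries with hypothetical keys url, owner, and next_action, and that the allowed actions match the six listed earlier in this guide. The remaining checklist items still need a human or a crawl.

```python
ALLOWED_ACTIONS = {"keep", "update", "consolidate", "replace", "redirect", "retire"}


def review_rows(rows: list[dict[str, str]]) -> list[str]:
    """Return a human-readable problem for every row that fails a check."""
    problems = []
    for row in rows:
        url = row.get("url", "(missing URL)")
        if not row.get("owner", "").strip():
            problems.append(f"{url}: no owner")
        action = row.get("next_action", "").strip().lower()
        # A blank value, an unknown value, or two actions in one cell all fail.
        if action not in ALLOWED_ACTIONS:
            problems.append(f"{url}: next action is not a single allowed value")
    return problems


# Hypothetical sample rows.
sheet = [
    {"url": "/about/", "owner": "Content lead", "next_action": "update"},
    {"url": "/old-event/", "owner": "", "next_action": "redirect, retire"},
]

problems = review_rows(sheet)
```

An empty result does not prove the inventory is right, only that no row is missing the two fields that keep the project moving.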

If the list is complete but no one knows who approves the actions, the project is still under-specified. Ownership is part of the inventory, not a separate meeting that appears later to drain morale.

Turn the inventory into a working workflow

A solid content inventory does not try to answer every editorial question at once. It creates a reliable queue: keep, update, merge, redirect, or retire. That is enough to move quickly without losing important pages or quietly breaking the site’s structure.

If you want a clean place to start, begin with the homepage, the About page, the Contact page, and the top few entries in your main navigation. Then expand into the long tail. The glamorous part can happen later. Systems work first. Drama is optional.

If you need help turning a messy inventory into an actual action plan, the team behind this site shares more practical guides in the blog, and you can use the contact page when you want a second set of eyes on the workflow.