The Problem
Logged into Google Search Console for a client last week and found 643 URLs sitting under Indexed, not submitted in sitemap. The site has roughly 180 product pages and 40 marketing pages. The submitted sitemap.xml contained exactly those 220 URLs. The other 423 had been crawled, indexed, and were quietly competing with the canonical product pages for the same keywords.
A few of them were obvious: trailing-slash variants of pages we serve without a slash, lowercase variants of routes that should have been case-normalised. Most of them were not. There were faceted-search URLs like /shop?colour=red&size=42&sort=price-asc, paginated archive pages from a category we had not enabled pagination on, and a long tail of ?utm_source= parameter combinations that should have collapsed to the canonical. None of these were in the sitemap. All of them were indexed. Search Console was correctly flagging the inconsistency, and the report was telling us we had three problems at once: a sitemap that did not match reality, internal links pointing at non-canonical URLs, and a canonical strategy that was being ignored.
This was a project where the App Router migration had moved the sitemap generation from a static file to app/sitemap.ts, and nobody had checked which URLs Google was actually finding versus the ones we listed.
Why It Happens
"Indexed, not submitted in sitemap" is the wrong report to panic about until you read it carefully. It does not mean Google is doing something wrong. It means Google found URLs on your site through internal links, external links, or its own historical crawl data that you did not list as canonical in your sitemap. Three things tend to cause it on Next.js sites.
First, the sitemap only enumerates the canonical routes, but the site links to non-canonical variants. A product card on the listing page links to /product/blue-shirt?ref=home, the page renders fine, the <link rel="canonical"> points to /product/blue-shirt, and Google ends up indexing both. The ref-tagged URL is technically a duplicate but Google treats them as separate URLs until it consolidates them, and during the window of inconsistency they appear in this report.
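Second, trailing-slash handling. By default Next.js redirects a trailing-slash URL to its non-slash counterpart (and the reverse when trailingSlash is true in next.config.ts), but that only protects you if the redirect actually fires. A CDN or proxy rewrite, custom middleware, or the skipTrailingSlashRedirect option can leave both forms returning 200. When that happens, whichever variant gets discovered first ends up in the index, and the sitemap usually only lists one form. Same content, two URLs, one in the sitemap, the other in the report.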
Third, search and filter pages with query parameters are crawlable by default. The robots policy on /shop allows crawling, the link from the navigation includes a default filter as a parameter, and Google crawls the parameterised URL and indexes it as a separate page. The canonical tag is honoured only if Google believes it. With low-quality signal and inconsistent internal links, Google often ignores the canonical and keeps both.
The Search Console help on this status is short and underestimates the diagnostic work involved. The status itself is informational, but the URLs underneath it are a list of holes in your canonical strategy.
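To make the first cause concrete, here is a minimal sketch of how a ref-tagged or UTM-tagged URL maps back to its canonical form. The helper name stripTracking and the list of tracking parameters are assumptions for illustration, not something taken from the client codebase; adjust the list to whatever your own links carry.

```typescript
// Hypothetical list of parameters that never change what the page renders.
const TRACKING_PARAMS = new Set(['ref', 'gclid', 'fbclid'])

function isTrackingParam(name: string): boolean {
  return name.startsWith('utm_') || TRACKING_PARAMS.has(name)
}

// Returns the URL with tracking parameters and the fragment removed.
// Content-bearing parameters (filters, sort order) are left untouched.
function stripTracking(input: string): string {
  const url = new URL(input)
  // Copy the keys first so deleting does not disturb the iteration.
  for (const name of Array.from(url.searchParams.keys())) {
    if (isTrackingParam(name)) url.searchParams.delete(name)
  }
  url.hash = ''
  return url.toString()
}
```

Both /product/blue-shirt and /product/blue-shirt?ref=home reduce to the same string here, which is the consolidation Google has to work out on its own when internal links disagree with the canonical.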
The Fix
Three things to do, in order. Find what Google actually found, decide which version is canonical, then close the loop with the sitemap and the canonical tag.
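For the first of those, finding what Google actually found, it helps to bucket the exported URLs by the reason they fall outside the sitemap. The sketch below assumes you have the report's URLs as an array of strings and the sitemap URLs as a Set; the Reason labels and the function names classify and summarise are made up for this example.

```typescript
type Reason =
  | 'in-sitemap'
  | 'tracking-param'
  | 'query-param'
  | 'trailing-slash'
  | 'case'
  | 'unknown'

// Assigns one reason per URL. The order of checks is a design choice:
// tracking parameters win over other query parameters, and query
// parameters win over path-level differences.
function classify(raw: string, sitemap: Set<string>): Reason {
  if (sitemap.has(raw)) return 'in-sitemap'
  const url = new URL(raw)
  const params = Array.from(url.searchParams.keys())
  if (params.some((p) => p.startsWith('utm_') || p === 'ref')) return 'tracking-param'
  if (params.length > 0) return 'query-param'
  if (url.pathname.length > 1 && url.pathname.endsWith('/')) return 'trailing-slash'
  if (url.pathname !== url.pathname.toLowerCase()) return 'case'
  return 'unknown'
}

// Counts how many URLs fall into each bucket.
function summarise(urls: string[], sitemap: Set<string>): Record<Reason, number> {
  const counts: Record<Reason, number> = {
    'in-sitemap': 0,
    'tracking-param': 0,
    'query-param': 0,
    'trailing-slash': 0,
    case: 0,
    unknown: 0,
  }
  for (const url of urls) counts[classify(url, sitemap)] += 1
  return counts
}
```

Whatever lands in the unknown bucket is worth reading by hand: those are the URLs that none of the usual explanations cover.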
Step 1: Generate the sitemap from the same source of truth as your routes. On the App Router, app/sitemap.ts should enumerate every URL you actually want indexed, derived from the database, not hand-maintained:
import type { MetadataRoute } from 'next'
import { db } from '@/lib/db'

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  // Only published products belong in the sitemap.
  const products = await db
    .from('products')
    .select('slug, updated_at')
    .eq('status', 'published')

  const base = 'https://example.com'

  // One entry per canonical product URL: no parameters, no trailing slash.
  const productEntries = (products.data ?? []).map((p) => ({
    url: `${base}/product/${p.slug}`,
    lastModified: new Date(p.updated_at),
    changeFrequency: 'weekly' as const,
    priority: 0.8,
  }))

  // Marketing routes that do not live in the database.
  const staticEntries = [
    { url: `${base}/`, priority: 1.0 },
    { url: `${base}/services`, priority: 0.7 },
    { url: `${base}/blog`, priority: 0.6 },
  ]

  return [...staticEntries, ...productEntries]
}
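This is one URL per canonical resource, no parameters, no trailing-slash variants. The list mirrors the database, so a newly published product appears the next time the sitemap is generated, and an unpublished one disappears. Next.js caches sitemap.ts by default, so if the list has to refresh between deploys, control that with a route segment option such as revalidate or dynamic.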
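Those invariants are cheap to enforce before the sitemap ships. The following is a rough sketch of a check you could run in a test against the generated URL list; sitemapViolations is a hypothetical helper name, and the rules it encodes assume the trailingSlash: false policy used in this post.

```typescript
// Returns a human-readable problem for every entry that breaks the
// "one clean URL per resource" rule. An empty array means the list is clean.
function sitemapViolations(urls: string[]): string[] {
  const problems: string[] = []
  const seen = new Set<string>()
  for (const raw of urls) {
    const url = new URL(raw)
    if (url.search !== '') problems.push(`${raw}: has a query string`)
    // The root path "/" is allowed; anything longer must not end in a slash.
    if (url.pathname.length > 1 && url.pathname.endsWith('/')) {
      problems.push(`${raw}: has a trailing slash`)
    }
    if (seen.has(raw)) problems.push(`${raw}: duplicate entry`)
    seen.add(raw)
  }
  return problems
}
```

Feeding it the url field of every entry returned by sitemap() turns a silent drift between policy and output into a failing test.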
Step 2: Pin a trailing-slash policy and a canonical on every page. In next.config.ts, decide once and serve only that form:
import type { NextConfig } from 'next'

const config: NextConfig = {
  trailingSlash: false,
}

export default config
Then in your route's generateMetadata, emit a self-referencing canonical with the same trailing-slash policy and no query parameters:
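import type { Metadata } from 'next'

// params is awaited below, so it is typed as a Promise here.
type Props = { params: Promise<{ slug: string }> }

export async function generateMetadata({ params }: Props): Promise<Metadata> {
  const { slug } = await params
  return {
    alternates: {
      // Absolute, self-referencing, no query string, no trailing slash.
      canonical: `https://example.com/product/${slug}`,
    },
  }
}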
The canonical is a hint Google chooses to follow. If your internal links also point at this exact URL form, Google has no signal that the parameterised variants are anything other than tracking duplicates, and consolidation kicks in.
Step 3: Block crawling on faceted search URLs. Anything that produces a permutation of query parameters but does not represent unique indexable content goes in robots.txt:
User-agent: *
Disallow: /shop?
Disallow: /search?
Disallow: /*?utm_
Allow: /shop$
Allow: /search$
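The $ anchor allows the parameter-less base path. The trailing ? on the Disallow rules blocks any URL with a query string at those paths. The /*?utm_ rule blocks UTM-tagged variants of any URL on the site when the UTM parameter comes first in the query string (add a /*&utm_ rule if it can appear later), which stops the cycle of indexing-then-canonicalising every campaign landing variant. Keep in mind that a blocked URL is no longer crawled, so Google cannot read its canonical tag; variants that are already indexed may linger for a while.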
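Rules like these are easy to get subtly wrong, so it is worth testing them against real URLs before deploying. Below is a simplified sketch of the matching behaviour Google documents for robots.txt (the * wildcard, the $ end anchor, longest matching rule wins, Allow wins a tie). It is not Google's parser, and the names Rule, toRegExp, and isAllowed are assumptions for this example.

```typescript
type Rule = { allow: boolean; pattern: string }

// The rules from the robots.txt above, in the same order.
const RULES: Rule[] = [
  { allow: false, pattern: '/shop?' },
  { allow: false, pattern: '/search?' },
  { allow: false, pattern: '/*?utm_' },
  { allow: true, pattern: '/shop$' },
  { allow: true, pattern: '/search$' },
]

// Converts a robots.txt path pattern into a RegExp: "*" matches any run of
// characters, a trailing "$" anchors the end, everything else is literal.
function toRegExp(pattern: string): RegExp {
  const anchored = pattern.endsWith('$')
  const body = anchored ? pattern.slice(0, -1) : pattern
  const escaped = body
    .split('*')
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*')
  return new RegExp('^' + escaped + (anchored ? '$' : ''))
}

// Takes the path plus query string. The longest matching pattern decides;
// on equal length Allow wins; no match at all means the URL is crawlable.
function isAllowed(pathAndQuery: string, rules: Rule[]): boolean {
  let best: Rule | undefined
  for (const rule of rules) {
    if (!toRegExp(rule.pattern).test(pathAndQuery)) continue
    const longer = !best || rule.pattern.length > best.pattern.length
    const tieForAllow = !!best && rule.pattern.length === best.pattern.length && rule.allow
    if (longer || tieForAllow) best = rule
  }
  return best ? best.allow : true
}
```

Running a handful of exported URLs through isAllowed(path, RULES) shows which ones the file really blocks, including cases such as /product/blue-shirt?ref=home&utm_source=x that slip past the /*?utm_ pattern.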
Verify the fix is closing the gap. Two weeks after deploying, recheck the report. The number of URLs in Indexed, not submitted in sitemap should be trending down as Google reprocesses the canonicals and consolidates the duplicates. To check a single URL sooner, open it in the URL Inspection tool in Search Console and compare User-declared canonical with Google-selected canonical in the indexed result. The live test only confirms the declared canonical, because Google's choice is determined after indexing. If Google-selected differs from your declared canonical, your signals are not consistent yet, usually because a leftover internal link points at the wrong form. A search like site:example.com inurl:utm_source lists the variants that are still indexed, which narrows down where to look.
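The declared half of that comparison can also be checked locally, without waiting on Google. Here is a small sketch that pulls the canonical out of a page's HTML and compares it with the URL you expect. It uses a regular expression rather than a real HTML parser, so treat it as a quick smoke test; extractCanonical and canonicalMatches are hypothetical helper names.

```typescript
// Returns the href of the first <link rel="canonical"> tag, or null if none.
// Regex-based on purpose: good enough for a smoke test, not a full parser.
function extractCanonical(html: string): string | null {
  const links = html.match(/<link\b[^>]*>/gi) ?? []
  for (const tag of links) {
    if (!/\brel\s*=\s*["']?canonical["']?/i.test(tag)) continue
    const href = tag.match(/\bhref\s*=\s*["']([^"']+)["']/i)
    if (href) return href[1] ?? null
  }
  return null
}

// True when the page declares exactly the canonical we expect.
function canonicalMatches(html: string, expected: string): boolean {
  return extractCanonical(html) === expected
}
```

Fetching a parameterised variant such as /product/blue-shirt?ref=home and passing its HTML through canonicalMatches with the clean URL confirms that every variant declares the same canonical.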
The Lesson
The report is a list of URLs Google found that your sitemap did not claim. Most of the time it is internal links pointing at non-canonical variants, query-parameter pages that should be blocked, or trailing-slash inconsistencies. Build the sitemap from the database, pin a trailing-slash policy, emit a self-referencing canonical, and block the parameter permutations in robots.txt. The report drains over a couple of weeks.
If your Search Console coverage is a mess after an App Router migration and you need somebody to read the canonicals against the sitemap and clean it up, that is a job I do. See my services. For a related canonical issue, read Duplicate canonical in Search Console.
