The Problem
A client pinged me because their Search Console coverage report had quietly doubled in crawled URLs, and the new ones all looked like this:
https://example.com/blog/some-post?_rsc=1f9k2
https://example.com/products/widget?_rsc=8a3qz
https://example.com/?_rsc=0b71m
Every one of them was flagged either "Alternate page with proper canonical tag" or "Duplicate, Google chose different canonical than user". Crawl stats were climbing, real pages were getting crawled less often, and a handful of important URLs had slipped out of the index entirely. The site is a standard Next.js App Router build on Vercel, and nobody had ever written a link with ?_rsc= in it.
The pages themselves are fine. Canonicals are correct, the rendered HTML is clean. Googlebot was just discovering and crawling a parallel universe of ?_rsc= versions of every page on the site.
Why It Happens
The App Router <Link> component prefetches. When a link scrolls into view, Next.js fetches the React Server Components (RSC) payload for the destination so the navigation feels instant. To do that it requests the same URL with a ?_rsc= query parameter appended and an RSC: 1 request header set. The query parameter is a cache buster: it gives the RSC payload a different URL from the HTML document, so a cache keyed on the URL does not serve one in place of the other.
That mechanism is meant to be invisible to crawlers. The problem is it is not. Googlebot renders pages, sees the prefetch traffic, and discovers those ?_rsc= URLs as ordinary links. It then crawls them as GET requests. Because Googlebot does not send the RSC: 1 header, the server returns the full HTML page rather than the RSC payload, so Google ends up with what looks like a complete duplicate of the canonical page sitting on a different URL.
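The three cases in play can be modeled in a few lines. This is a minimal sketch of the behavior described above, not Next.js internals; the function name classifyRequest and the ResponseKind labels are made up for illustration.

```typescript
// Hypothetical model of how the same path can produce different responses.
// The names here are illustrative and are not part of Next.js.
type ResponseKind = 'rsc-payload' | 'html-duplicate' | 'html';

function classifyRequest(rawUrl: string, headers: Headers): ResponseKind {
  const url = new URL(rawUrl);
  const hasRscParam = url.searchParams.has('_rsc');
  const hasRscHeader = headers.get('RSC') === '1';

  // A real <Link> prefetch sends the RSC: 1 header and gets the RSC payload.
  if (hasRscHeader) return 'rsc-payload';

  // No header: the server answers with full HTML. If the URL still carries
  // ?_rsc=, that HTML is a duplicate of the clean page on a different URL.
  return hasRscParam ? 'html-duplicate' : 'html';
}

// What a crawler sends: the parameter, but no RSC header.
console.log(classifyRequest('https://example.com/?_rsc=0b71m', new Headers()));
```

The 'html-duplicate' case is the one that shows up in Search Console.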
The page's canonical tag does point back to the clean URL, which is why most of these land in the "alternate page with proper canonical" bucket rather than getting indexed outright. But it still burns crawl budget, and on larger sites it dilutes how often Google crawls the URLs you actually care about. There is a long-running Next.js discussion tracking the _rsc Search Console issue, and the short version is that the parameter is baked into how RSC prefetch works, so you have to handle it at the edge rather than wait for a config flag.
The Fix
Two layers. Tell crawlers not to crawl ?_rsc= URLs, and back it up with a header so anything that slips through is not indexed.
Step 1: Disallow the parameter in robots.txt. If you generate robots dynamically, add a rule in app/robots.ts:
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      allow: '/',
      disallow: '/*?*_rsc=',
    },
    sitemap: 'https://example.com/sitemap.xml',
  }
}
The /*?*_rsc= pattern matches any path that carries an _rsc query parameter, so Googlebot stops requesting them. This is the single highest-impact change, and it takes effect once Google refetches robots.txt, which it generally caches for up to 24 hours.
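To sanity-check a pattern before deploying it, the wildcard rules can be approximated locally. The sketch below assumes Google-style matching (* matches any run of characters, a trailing $ anchors the end, and everything else is a prefix match against the path plus query string). The function robotsPatternMatches is a hypothetical helper, not a real robots.txt parser.

```typescript
// Hypothetical helper approximating Google-style robots.txt pattern matching.
// It handles only `*` wildcards and a trailing `$`; it is not a full parser.
function robotsPatternMatches(pattern: string, pathAndQuery: string): boolean {
  const anchored = pattern.endsWith('$');
  const body = anchored ? pattern.slice(0, -1) : pattern;

  // Escape regex metacharacters in each literal chunk, then join the chunks
  // with `.*` so every `*` in the pattern matches any run of characters.
  const source = body
    .split('*')
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');

  return new RegExp('^' + source + (anchored ? '$' : '')).test(pathAndQuery);
}

const rule = '/*?*_rsc=';
console.log(robotsPatternMatches(rule, '/blog/some-post?_rsc=1f9k2')); // true
console.log(robotsPatternMatches(rule, '/blog/some-post')); // false
```

One thing this makes visible: the rule also matches any parameter whose name merely ends in _rsc, such as a hypothetical ?foo_rsc=1. If the site uses a parameter like that, the rule needs tightening.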
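Step 2: Add a noindex header for any _rsc request that is not an actual RSC prefetch. robots.txt stops well-behaved crawling, but URLs already discovered can linger, and not every crawler honors the file. A header makes the intent unambiguous. One caveat: Googlebot only sees a response header on URLs it is allowed to fetch, so for Google the header matters on fetches made before it picks up the new robots.txt; after that, the disallow does the work. In middleware.ts (or proxy.ts on Next 16, where the exported function is named proxy), tag any request that carries the _rsc param without the matching RSC header: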
import { NextRequest, NextResponse } from 'next/server'

export function middleware(request: NextRequest) {
  const hasRscParam = request.nextUrl.searchParams.has('_rsc')
  const isRealPrefetch = request.headers.get('RSC') === '1'

  // A genuine prefetch always sends RSC: 1. A crawler hitting the
  // bare ?_rsc= URL does not, so mark that response noindex.
  if (hasRscParam && !isRealPrefetch) {
    const response = NextResponse.next()
    response.headers.set('X-Robots-Tag', 'noindex')
    return response
  }

  return NextResponse.next()
}

export const config = {
  matcher: '/((?!_next/static|_next/image|favicon.ico).*)',
}
The logic is the part that matters: a real prefetch from the <Link> component always carries RSC: 1, so it is untouched and prefetch still works. A crawler requesting the naked ?_rsc= URL has no such header, so it gets X-Robots-Tag: noindex, and Google can drop the URL from the index once it fetches it and sees the header.
Step 3: Confirm your canonical is self-referencing and absolute. Each page should declare its own clean URL as canonical so the signal is consistent. In the route's metadata:
export const metadata = {
  alternates: {
    canonical: 'https://example.com/blog/some-post',
  },
}
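If canonicals are derived from the incoming URL rather than hard-coded per route, the _rsc parameter has to be stripped first, or the duplicate would declare itself canonical. A minimal sketch, assuming a hypothetical helper named canonicalFor (not a Next.js API):

```typescript
// Hypothetical helper: build an absolute canonical URL from a request URL
// by dropping the _rsc cache-buster and leaving other parameters alone.
function canonicalFor(rawUrl: string): string {
  const url = new URL(rawUrl);
  url.searchParams.delete('_rsc');
  return url.toString();
}

console.log(canonicalFor('https://example.com/blog/some-post?_rsc=1f9k2'));
// https://example.com/blog/some-post
```

Whether other query parameters belong in the canonical is a per-site decision; this sketch leaves them in place.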
Step 4: Verify the header is firing. Request a _rsc URL the way a crawler would, with no RSC header, and check for the noindex tag:
curl -sI "https://example.com/?_rsc=test" | grep -i x-robots-tag
# expect: x-robots-tag: noindex
Then request it as a real prefetch and confirm the tag is absent:
curl -sI -H "RSC: 1" "https://example.com/?_rsc=test" | grep -i x-robots-tag
# expect: no output
After deploying, spot-check a couple of the ?_rsc= URLs with the URL Inspection tool in Search Console, then start validation from the affected status in the Page indexing report. As Google re-evaluates the URLs against the new rules, the count starts dropping over the following weeks.
The Lesson
Next.js App Router prefetch requests each link target with ?_rsc= appended, and Googlebot crawls those URLs as duplicate pages, wasting crawl budget and muddying your canonical signals. Disallow /*?*_rsc= in robots.txt, add a conditional X-Robots-Tag: noindex for _rsc requests that lack the RSC: 1 header, and keep canonicals self-referencing. Prefetch keeps working; the crawl noise goes away.
If your Search Console coverage report is filling up with URLs you never created, that is the kind of technical SEO cleanup I handle on client sites. See my services. For a related canonical headache, read fixing duplicate canonical errors in Search Console.
