The Problem
A SaaS client pinged me last Friday: their blog had published 84 posts in the last quarter, every one well written, properly tagged, schema-valid. Google Search Console showed 71 of them parked under "Discovered – currently not indexed". Not crawled and rejected; never crawled at all. The URL Inspection tool confirmed it: "URL is known to Google but not currently indexed", with the last crawled date empty.
If you have a sitemap full of URLs that GSC has acknowledged but never visited, you are looking at a crawl-budget signal, not a quality signal. The fix is mechanical and I have run it on five client sites this year.
Why It Happens
"Discovered – currently not indexed" means Googlebot found the URL (usually through your sitemap or an internal link) but decided not to spend budget fetching it. Two things drive that decision:
- Predicted low value. Google's crawl scheduler runs a cheap pre-fetch model that scores URLs by similarity to URLs it has already indexed. New URLs on a site with a thin link graph get scored low and queued behind everything else. If the site has thousands of low-value templated pages (tag archives, paginated category pages, parameter URLs), the scheduler de-prioritises genuinely new posts.
- Server response weight. Googlebot caps the per-host fetch rate by response time and error rate. If your server takes 1500ms+ TTFB, returns soft 404s, or serves a chain of redirects, the scheduler drops the priority. I measured one client site at 2.4s mean TTFB. Googlebot was happy to discover URLs but very unhappy to fetch them.
The second one compounds. A slow server pushes the queue back, the scheduler spreads the budget across other hosts, and new URLs sit in "Discovered" indefinitely.
A third factor worth ruling out: a noindex header served only to Googlebot because of a misconfigured CDN rule or a broken bot-detection plugin. Always check the rendered HTML through GSC's URL Inspection (Live Test, not the cached view).
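The Live Test is the ground truth, but you can catch the blunt cases yourself by diffing what the two user agents are served. A minimal sketch, assuming Node 18+ for the global fetch and with the URL as a placeholder; note that bot detection keyed on Googlebot's IP ranges will not show up this way:

// Compare status, X-Robots-Tag header, and meta robots across user agents.
const url = "https://example.com/blog/your-stuck-url/";
const agents: Record<string, string> = {
  googlebot:
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  browser: "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
};

for (const [name, ua] of Object.entries(agents)) {
  const res = await fetch(url, { headers: { "User-Agent": ua } });
  const html = await res.text();
  const meta = html.match(/<meta[^>]+name=["']robots["'][^>]*>/i)?.[0] ?? "none";
  console.log(`${name}: ${res.status} x-robots-tag=${res.headers.get("x-robots-tag") ?? "none"} meta=${meta}`);
}

If the two runs print different robots directives, you have found your CDN or plugin rule before Google spends another crawl on it.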
The Fix
Step 1: Confirm Googlebot can fetch a sample URL fast. Run a real-world fetch test with the Googlebot user agent and look at TTFB plus the rendered status:
# -L follows redirects so num_redirects reports the real chain length
curl -sL -o /dev/null \
  -H "User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
  -w "ttfb=%{time_starttransfer}s total=%{time_total}s code=%{http_code} redirects=%{num_redirects}\n" \
  https://example.com/blog/your-stuck-url/
Anything above ttfb=0.6s is a problem. If redirects is non-zero, fix the redirect at the canonical level so Googlebot hits a 200 directly. The Vercel Speed Insights docs cover capturing real-user TTFB if you want to confirm the fix on production traffic.
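If you want a zero-dependency version of that field measurement, the browser's Navigation Timing API exposes the same number. A browser-side sketch, where /api/ttfb is a placeholder endpoint you would stand up yourself:

// Runs in the page. TTFB is responseStart relative to the navigation start
// (startTime is 0 for the navigation entry, per the Navigation Timing spec).
const [nav] = performance.getEntriesByType("navigation") as PerformanceNavigationTiming[];

if (nav) {
  const ttfb = nav.responseStart - nav.startTime; // milliseconds
  navigator.sendBeacon("/api/ttfb", JSON.stringify({ ttfb, path: location.pathname }));
}

Aggregate those beacons by path and you can verify the fix against real traffic instead of a single curl from your own network.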
Step 2: Cut the low-value URLs from the sitemap. If your sitemap contains tag archives, paginated lists, attachment pages, or parameter URLs, you are diluting your discovery budget. In WordPress this is one Yoast or Rank Math setting plus a robots.txt rule. For a custom sitemap on Next.js, return only canonical content URLs:
// app/sitemap.ts
import type { MetadataRoute } from "next";
import { getPosts } from "@/lib/posts";
export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
const posts = await getPosts({ status: "published", noindex: false });
return posts
.filter((p) => p.canonical === p.url)
.map((p) => ({
url: p.url,
lastModified: p.updatedAt,
changeFrequency: "weekly",
priority: p.featured ? 0.9 : 0.6,
}));
}
The filter on canonical === url is the key. Never list a URL whose <link rel="canonical"> points elsewhere. That single filter cleared 31 of 71 stuck URLs on the SaaS client.
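For the WordPress route mentioned above, the robots.txt side usually looks something like the excerpt below. Treat the patterns as assumptions to adapt to your permalink structure, and only block paths you genuinely never want crawled:

User-agent: *
Disallow: /*?          # parameter URLs
Disallow: /tag/        # tag archives
Disallow: /*/page/     # paginated archives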
Step 3: Build internal links from already-indexed pages. This is the single highest-leverage move. For every stuck post, find three already-indexed pages that should naturally link to it and add the link in the body, not the footer or sidebar. A link from your homepage or a top-trafficked guide is worth more than fifty sitemap entries.
I script the audit in WP-CLI on WordPress sites:
wp eval '
$stuck = [ "/blog/post-a/", "/blog/post-b/" ]; // your Discovered URLs
// Fetch the published posts once, not once per stuck path
$posts = get_posts( [ "post_status" => "publish", "posts_per_page" => -1 ] );
foreach ( $stuck as $path ) {
    $count = 0;
    foreach ( $posts as $p ) {
        if ( strpos( $p->post_content, $path ) !== false ) {
            $count++;
        }
    }
    echo "$path -> $count internal links\n";
}
'
Anything below three internal links is a candidate to receive more.
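On non-WordPress stacks the same audit is a few lines of TypeScript. A sketch assuming Node 18+ for the global fetch, with both URL lists supplied by you:

// Count links from known-indexed pages to each stuck URL.
const stuck = ["/blog/post-a/", "/blog/post-b/"]; // your Discovered URLs
const indexed = [
  "https://example.com/",
  "https://example.com/guides/big-guide/",
]; // pages GSC already reports as indexed

// Fetch each source page once, then check which ones link to each stuck path.
const pages = await Promise.all(indexed.map(async (u) => (await fetch(u)).text()));

for (const path of stuck) {
  const count = pages.filter((html) => html.includes(`href="${path}"`)).length;
  console.log(`${path} -> ${count} internal links`);
}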
Step 4: Issue an Indexing API ping for the highest-priority URLs. Google's Indexing API is officially for JobPosting and BroadcastEvent schema, but the URL submission endpoint reliably accelerates crawl scheduling on regular content if used sparingly. Submit the top 10 URLs, not all 71:
import { google } from "googleapis";
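// Prerequisite: the service account's email must be added as an Owner of the
// verified Search Console property, or publish calls come back 403.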
const auth = new google.auth.GoogleAuth({
credentials: JSON.parse(process.env.INDEXING_API_KEY!),
scopes: ["https://www.googleapis.com/auth/indexing"],
});
const indexing = google.indexing({ version: "v3", auth });
await indexing.urlNotifications.publish({
requestBody: {
url: "https://example.com/blog/your-stuck-url/",
type: "URL_UPDATED",
},
});
Hammering this with hundreds of URLs trips the per-day quota. Ten URLs per day for a week is the cadence that works.
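If you script it, cap the batch in code so a cron job cannot blow through the quota. A sketch reusing the indexing client from the block above, with the URL list as your own priority-ordered input:

const queue = [
  "https://example.com/blog/post-a/",
  "https://example.com/blog/post-b/",
  // ...the rest of your stuck URLs, highest priority first
];

for (const url of queue.slice(0, 10)) { // hard cap at ten per run
  await indexing.urlNotifications.publish({
    requestBody: { url, type: "URL_UPDATED" },
  });
}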
Step 5: Recheck after seven days, not seven hours. GSC's Page Indexing report updates on a 2–3 day rolling window. Refreshing it on day one and seeing the same numbers is normal. If the count has not dropped after seven days of clean fetches, your TTFB or redirect chain is still dirty; go back to Step 1.
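You can script the recheck too. The Search Console URL Inspection API returns the coverage state per URL; a sketch assuming a service account with access to the property, where GSC_CREDENTIALS is a hypothetical env var holding the service-account JSON:

import { google } from "googleapis";

const auth = new google.auth.GoogleAuth({
  credentials: JSON.parse(process.env.GSC_CREDENTIALS!),
  scopes: ["https://www.googleapis.com/auth/webmasters.readonly"],
});

const searchconsole = google.searchconsole({ version: "v1", auth });

const res = await searchconsole.urlInspection.index.inspect({
  requestBody: {
    inspectionUrl: "https://example.com/blog/your-stuck-url/",
    siteUrl: "https://example.com/", // "sc-domain:example.com" for a domain property
  },
});

// coverageState reads e.g. "Discovered - currently not indexed"
console.log(res.data.inspectionResult?.indexStatusResult?.coverageState);

Loop that over your stuck list on day seven and you get the before/after delta without waiting for the report UI to catch up.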
The Lesson
"Discovered – currently not indexed" is almost always a crawl-budget problem caused by slow responses and a bloated sitemap, not a content problem. Cut the dead URLs out of the sitemap, lower TTFB, build real internal links from indexed pages to the stuck ones, and use the Indexing API surgically. The backlog clears within a fortnight on healthy sites.
If your editorial calendar is shipping content into a black hole because Google never crawls it, this is the kind of technical SEO work I do; see my services. For a related symptom that needs a different fix, see my post on the "Crawled – currently not indexed" GSC fix.