INP Regression From Sentry Session Replay 9 on Mobile

Sentry Session Replay 9 tanking INP on mobile after the upgrade? See what changed in the rrweb sampler and the config fix that pulls INP back under 200ms.
Performance · Core Web Vitals · Sentry
May 16, 2026 · 6 min read · 1089 words

The Problem

I ran into this on a client SaaS dashboard last week. They upgraded @sentry/nextjs from 8.x to 9.x for the new tracing integration. The next morning, their Core Web Vitals report from CrUX showed mobile INP jumping from 162ms (Good) to 412ms (Poor) in a single 24-hour window. Lab data from Lighthouse was fine. WebPageTest on a desktop was fine. Only real users on mid-range Android devices were seeing the regression, and only field data was measuring them.

The dashboard had not changed. No new third-party scripts, no design refactor, no React updates. The only deploy in the window was the Sentry bump. Disabling Session Replay in production brought INP back to 158ms within a day. That confirmed the source, but you do not want to ship without session replay on a paid SaaS product, so I had to find the actual fix.

Why It Happens

Sentry Session Replay 9 ships a new default for the rrweb DOM mutation observer. In 8.x, mutation batching ran on a 50ms debounce with a hard cap of 200 mutations per flush. In 9.x, the default flipped to a "fidelity-first" sampler that flushes on every input event and every 16ms animation frame to capture smoother replay timelines.

The animation-frame flush is the part that wrecks INP on mobile. On a desktop, the main thread has enough headroom that a 16ms task does not push interaction latency past the threshold. On a mid-range Android device with a slow CPU, the same flush runs in 80-140ms and lands inside the same task as the click handler that triggered the interaction. INP measures the time from the input to the next paint after it, so everything running on that task counts against the interaction. A 120ms rrweb flush plus a 60ms React render plus paint puts you well past 200ms on every other tap.

Three things make this hard to spot:

  1. Lab tests miss it entirely. Lighthouse runs in a desktop Chrome instance with throttling applied to network and CPU separately, but rrweb's flush cost scales with DOM size, not CPU throttle multiplier. Lab Lighthouse shows the same INP as before.
  2. The Sentry SDK keeps replay disabled in dev by default. You will never see this number in npm run dev. It only appears in production where replaysSessionSampleRate and replaysOnErrorSampleRate are non-zero.
  3. The regression only shows up on the third or fourth interaction. The first input is fine because the mutation buffer is empty. Each subsequent tap adds to the buffer; once it crosses the buffer threshold, every flush gets expensive. Real-user monitoring picks this up because real users do more than one tap; synthetic tests stop at one.
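The staircase behaviour in point 3 is easy to confirm in the field with the Event Timing API. A minimal sketch: the 200ms/500ms cut-offs are the standard Core Web Vitals INP ratings, and the 40ms `durationThreshold` is my choice to surface interactions below the API's 104ms default reporting floor:

```typescript
// Core Web Vitals rating thresholds for INP.
export type InpRating = 'good' | 'needs-improvement' | 'poor'

export function rateInteraction(durationMs: number): InpRating {
  if (durationMs <= 200) return 'good'
  if (durationMs <= 500) return 'needs-improvement'
  return 'poor'
}

// Log every discrete interaction so the "first tap fine, third tap slow"
// pattern is visible in the console on a real device.
export function logInteractions() {
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      console.log(entry.name, Math.round(entry.duration), rateInteraction(entry.duration))
    }
  })
  observer.observe({
    type: 'event',
    durationThreshold: 40, // surface interactions below the 104ms default
  } as PerformanceObserverInit)
}
```

Call logInteractions() once on page load, tap around on a throttled device, and watch the durations climb as the mutation buffer fills.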

The web.dev INP documentation explains the measurement, and Sentry's Session Replay performance tuning covers the new defaults.

The Fix

You need to bring back the 8.x-style batching, lower the sample rate on slow devices, and prove the change worked with field data before you trust it. Lab numbers will not tell you.

Step 1: Configure the replay integration with explicit batching. In your sentry.client.config.ts:

import * as Sentry from '@sentry/nextjs'

const isSlowDevice =
  typeof navigator !== 'undefined' &&
  (navigator.hardwareConcurrency ?? 8) <= 4

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  replaysSessionSampleRate: isSlowDevice ? 0.0 : 0.1,
  replaysOnErrorSampleRate: 1.0,
  integrations: [
    Sentry.replayIntegration({
      maskAllText: false,
      blockAllMedia: false,
      mutationLimit: 200,
      mutationBreadcrumbLimit: 250,
      flushMinDelay: 5000,
      flushMaxDelay: 15000,
      _experiments: {
        captureMutationsManually: false,
      },
    }),
  ],
})

The two pieces that matter for INP are mutationLimit and flushMinDelay. mutationLimit: 200 reinstates the 8.x cap that bails out of recording when a single tick produces too many DOM changes. flushMinDelay: 5000 pushes the buffer flush off the input-handling task entirely. The replay stays useful for debugging; it just stops trying to capture every animation frame.
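To make the batching model concrete, here is a simplified standalone sketch of what the 8.x-style behaviour amounts to. This is not Sentry's actual implementation; the 50ms debounce and 200-mutation cap just mirror the old defaults:

```typescript
// Simplified 8.x-style mutation batching: a debounce plus a hard cap.
export type Mutation = { target: string; kind: 'attr' | 'childList' | 'text' }

export class MutationBatcher {
  private buffer: Mutation[] = []
  private timer: ReturnType<typeof setTimeout> | null = null
  flushes = 0

  constructor(
    private onFlush: (batch: Mutation[]) => void,
    private debounceMs = 50, // 8.x debounce window
    private cap = 200,       // 8.x hard cap per flush
  ) {}

  record(mutation: Mutation) {
    this.buffer.push(mutation)
    // Cap path: a burst of mutations flushes immediately instead of
    // accumulating into one long main-thread task later.
    if (this.buffer.length >= this.cap) {
      this.flush()
      return
    }
    // Debounce path: quiet periods get batched into one flush.
    if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.debounceMs)
    }
  }

  private flush() {
    if (this.timer !== null) {
      clearTimeout(this.timer)
      this.timer = null
    }
    if (this.buffer.length === 0) return
    this.onFlush(this.buffer.splice(0))
    this.flushes++
  }
}
```

The cap path is the one that matters for INP: a mutation storm in a single tick gets cut off at 200 entries rather than growing into a long flush on the next input-handling task.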

Step 2: Gate Session Replay by device class. Mid-range Android devices report hardwareConcurrency <= 4 and are the entire source of the regression in the field data I have seen. Cutting replay off for those users brings the p75 mobile INP back to where it was. Power users on flagship Android and all iOS devices still get session replay.
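The cores-only check from Step 1 can be made a little more robust by folding in the Device Memory and Save-Data hints where the browser exposes them. The thresholds below are my judgment calls, not Sentry guidance, and every signal is optional, so the check degrades gracefully:

```typescript
// Device-class gate: all signals are optional browser hints, so missing
// values default to "fast" and never over-disable replay.
export interface DeviceHints {
  hardwareConcurrency?: number // navigator.hardwareConcurrency
  deviceMemory?: number        // navigator.deviceMemory (Chromium-only, in GB)
  saveData?: boolean           // navigator.connection?.saveData
}

export function isSlowDevice(hints: DeviceHints): boolean {
  const cores = hints.hardwareConcurrency ?? 8
  const memoryGb = hints.deviceMemory ?? 8
  return hints.saveData === true || cores <= 4 || memoryGb <= 4
}

// Browser-side helper: read the hints off navigator when it exists.
export function readDeviceHints(): DeviceHints {
  const nav = (globalThis as {
    navigator?: {
      hardwareConcurrency?: number
      deviceMemory?: number
      connection?: { saveData?: boolean }
    }
  }).navigator
  if (!nav) return {}
  return {
    hardwareConcurrency: nav.hardwareConcurrency,
    deviceMemory: nav.deviceMemory,
    saveData: nav.connection?.saveData,
  }
}
```

In the Sentry config, `isSlowDevice(readDeviceHints())` replaces the inline check.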

Step 3: Move the SDK initialisation to load after interactions are unblocked. Sentry's default loader runs at the top of _app.tsx, which means the integration is initialising during the first paint. Defer it:

// app/components/sentry-loader.tsx
'use client'
import { useEffect } from 'react'

export function SentryLoader() {
  useEffect(() => {
    const idle =
      'requestIdleCallback' in window
        ? window.requestIdleCallback.bind(window) // bind: a detached call can throw "Illegal invocation"
        : (cb: () => void) => window.setTimeout(cb, 1)
    idle(() => {
      // Lazy-load the Sentry config so init cost lands after first paint.
      import('@/sentry.client.config')
    })
  }, [])
  return null
}

Mount <SentryLoader /> once in your root layout. The integration now boots after first paint, so it never competes with the initial input. You lose error capture on the very first interaction; if that is unacceptable, ship the bare error capture eagerly and lazy-load only the replay integration.
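If losing errors from the very first interaction is unacceptable, the split looks roughly like this: boot the bare SDK eagerly and attach only the replay integration at idle. This is a sketch; Sentry.addIntegration exists in recent SDK versions, but verify it against the version you ship:

```typescript
// Sketch: eager error capture, lazy replay.
import * as Sentry from '@sentry/nextjs'

// Boot the bare SDK immediately so errors from the very first
// interaction are still captured. Sample rates must be set at init
// even though the replay integration arrives later.
Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  replaysSessionSampleRate: 0.1,
  replaysOnErrorSampleRate: 1.0,
})

// Attach only the replay integration once the main thread is idle.
const idle =
  typeof requestIdleCallback === 'function'
    ? requestIdleCallback
    : (cb: () => void) => setTimeout(cb, 1)

idle(() => {
  Sentry.addIntegration(
    Sentry.replayIntegration({
      mutationLimit: 200,
      flushMinDelay: 5000,
    }),
  )
})
```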

Step 4: Validate with real field data. Lab data is not going to confirm this. Watch CrUX in PageSpeed Insights or pull the Chrome UX Report from BigQuery for the 72 hours after deploy:

-- CrUX materialized summary tables; column names have shifted between
-- dataset versions, so check the current schema before running.
SELECT
  p75_inp,
  fast_inp,  -- share of loads with good INP (<= 200ms)
  slow_inp   -- share of loads with poor INP (> 500ms)
FROM `chrome-ux-report.materialized.device_summary`
WHERE origin = 'https://example.com'
  AND device = 'phone'
  AND date = '2026-05-01'

INP regressions take 72 hours to show in the 28-day rolling window, so do not panic if PageSpeed Insights still shows the bad number on day one. Real-user monitoring from Vercel Speed Insights or Sentry's own performance tab shows the change within an hour.
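If you run your own RUM pipeline, the web-vitals library gives you the same signal within hours. A sketch, assuming a hypothetical /vitals collection endpoint:

```typescript
import { onINP } from 'web-vitals'

// Report every INP candidate to your own endpoint; '/vitals' is a
// placeholder for whatever your RUM collector exposes.
onINP(
  (metric) => {
    navigator.sendBeacon?.(
      '/vitals',
      JSON.stringify({
        name: metric.name,     // 'INP'
        value: metric.value,   // milliseconds
        rating: metric.rating, // 'good' | 'needs-improvement' | 'poor'
      }),
    )
  },
  { reportAllChanges: true },
)
```

With reportAllChanges, you see the per-interaction distribution instead of just the final page-level value, which makes the "every other tap" pattern obvious.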

The Lesson

Session replay tools record DOM mutations, and that costs main-thread time on every interaction. When the default sampler turns more aggressive, the cost lands on input handlers and tanks INP on slow devices. Set mutationLimit and flushMinDelay, gate the integration by hardware concurrency, defer init past first paint, and trust field data over lab tests.

If you have a Core Web Vitals regression you cannot reproduce in Lighthouse but real users are seeing, that is the kind of investigation I do. See my services. For another mobile INP gotcha I wrote about recently, see INP regression from GTM third-party tags.
