Why your Playwright tests break every sprint (and it's not Playwright's fault)

If you're a Lead QA reading this on a Monday morning, there's a decent chance you've already spent 30 minutes fixing a test that worked Friday afternoon. The selector pointed at .btn-primary >> nth=2. A dev moved a button. CI is red. You sigh, open DevTools, build a new path, push, merge. You'll do it again next sprint.

Most teams in this situation eventually conclude that "Playwright is flaky". They consider switching to Cypress. Or to Selenium. Or, dramatically, to "no e2e at all, we'll just do unit tests."

That's the wrong diagnosis. Playwright didn't change. Cypress didn't change. The DOM did. And your selectors were tied to things that were never contract material.

This article walks through the real culprits, with names, examples, and a verdict on each.

The mental model: contractual vs accidental selectors

Before naming names, here's the frame that makes everything else click.

A selector is contractual when both sides (the dev who writes the markup, the QA who writes the test) explicitly agree that this string represents a stable handle. Renaming it requires a coordinated change. Both sides own the contract.

A selector is accidental when it works today because of how the markup happens to be structured, but nobody signed up to keep it stable. The CSS class .btn-primary is accidental. The third button child is accidental. The text "Submit" is accidental (the marketing team can rename it to "Save changes" tomorrow without telling you).

The vast majority of your test breakage isn't Playwright misbehaving. It's accidental selectors becoming false at the next refactor.
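To make the distinction concrete, here is a minimal sketch of a classifier built on that frame. The function name and the patterns are illustrative assumptions, not rules taken from Playwright or any other tool. Syntax alone cannot tell a hand-written id from a generated one, so ids are deliberately left out.

```typescript
type SelectorKind = "contractual" | "accidental";

// Hypothetical heuristic: a selector counts as contractual only when it targets
// an attribute that dev and QA have explicitly agreed to keep stable.
function classifySelector(selector: string): SelectorKind {
  const contractualPatterns: RegExp[] = [
    /\[data-(testid|test|cy)=/, // purpose-built test hooks
    /\[aria-label=/, // assumes the team treats aria-label as a stable handle
  ];
  return contractualPatterns.some((pattern) => pattern.test(selector))
    ? "contractual"
    : "accidental";
}

const samples = [
  '[data-testid="signup-submit"]',
  ".btn-primary",
  "div.cards > div:nth-of-type(2)",
  "text=Submit",
];
for (const selector of samples) {
  console.log(`${classifySelector(selector)}: ${selector}`);
}
```

Only the first sample comes out as contractual. The other three work today by accident of markup, styling, or copy.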

Culprit 1: CSS class selectors

What it looks like:

await page.locator('.btn-primary').click()
await page.locator('.user-row > .actions > button').click()

Why it breaks:

CSS classes have one job, and it's not testing. They exist to apply styles. The day a designer decides "primary buttons should now use the variant .btn-action-emphasis", every test using .btn-primary breaks. The day a frontend dev migrates from BEM to Tailwind, all your class-based selectors die at once.

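Worse: tooling that generates class names (CSS Modules, most CSS-in-JS libraries) typically emits hashed names that change when the styles or the build configuration change, and migrations (Vue 2 to Vue 3, CSS-in-JS to vanilla CSS, Tailwind v3 to v4) often rename or regenerate classes wholesale. Your tests don't survive the upgrade.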

Verdict: weak. Acceptable as a temporary patch when nothing else is available, but it's borrowing time.

Culprit 2: nth-child / nth-of-type

What it looks like:

await page.locator('table tr:nth-child(3) td:nth-child(4) button').click()
await page.locator('div.cards > div:nth-of-type(2)').click()

Why it breaks:

Position-based selectors couple your test to the order of elements in the DOM. The day a dev adds a new column, swaps two rows for accessibility, or wraps a section in an extra div, the index shifts and the test points at the wrong element.

The truly insidious failure mode is the silent one: the test still passes, but it's now clicking a different button than intended. Your suite stays green while testing nothing useful.
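The silent failure is easy to reproduce in a toy model. The sketch below is not Playwright code: it represents a table row as a plain array of cell labels (made-up names) and looks cells up by 1-based position, the way td:nth-child(n) does.

```typescript
// Position-based lookup, analogous to td:nth-child(n) (1-based).
function cellAt(row: string[], n: number): string | undefined {
  return row[n - 1];
}

const before = ["Name", "Email", "Role", "Delete"];
// A dev inserts a new "Edit" column ahead of "Delete".
const after = ["Name", "Email", "Role", "Edit", "Delete"];

console.log(cellAt(before, 4)); // "Delete"
console.log(cellAt(after, 4)); // "Edit": the lookup still succeeds, on the wrong target
```

Nothing throws in the second lookup, which is exactly the problem: a test built on that position keeps running against a different element.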

Verdict: weakest of all. Avoid even as a temporary patch.

Culprit 3: text content

What it looks like:

await page.getByText('Submit').click()
await page.getByRole('button', { name: 'Save' }).click()

Why it breaks:

Text-based selectors are tightly coupled to copy. Three real-world examples:

The marketing team rebrands "Submit" to "Save changes" because user research showed it converts better.
The product is internationalized. Your tests run against the EN build, then someone runs the suite against the FR build, and every getByText('Submit') call breaks.
A copy review removes a comma. The button now reads "Submit please", and getByText('Submit, please') no longer matches.

Worth noting: getByRole('button', { name: 'Submit' }) is better than raw getByText because the role itself is structural and won't move with copy. But the name option still matches against the accessible name, which is often the visible text. Same risk, slightly mitigated.
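The internationalization case above can be modeled in a few lines. This is a simplified sketch with hypothetical data (one button, two locale builds, a made-up test ID), not real Playwright matching logic.

```typescript
// Hypothetical data: the same button as it might render in two locale builds.
interface Button {
  testId: string;
  label: string;
}

const builds: Record<"en" | "fr", Button[]> = {
  en: [{ testId: "signup-submit", label: "Submit" }],
  fr: [{ testId: "signup-submit", label: "Envoyer" }],
};

// Lookup by visible copy versus lookup by an agreed test handle.
const byText = (buttons: Button[], text: string) =>
  buttons.find((button) => button.label === text);
const byTestId = (buttons: Button[], id: string) =>
  buttons.find((button) => button.testId === id);

for (const locale of ["en", "fr"] as const) {
  const foundByText = byText(builds[locale], "Submit") !== undefined;
  const foundById = byTestId(builds[locale], "signup-submit") !== undefined;
  console.log(`${locale}: byText=${foundByText} byTestId=${foundById}`);
}
```

The text lookup finds the button only in the EN build, while the test ID lookup finds it in both.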

Verdict: usable for crucial CTAs whose copy you control, fragile elsewhere. Treat as middle-tier.

Culprit 4: auto-generated IDs

What it looks like:

await page.locator('#mui-component-select-3847').click()
await page.locator('[id="radix-:r19:"]').click()

Why it breaks:

Modern UI libraries (Material UI, Radix, Headless UI, etc.) generate IDs at runtime, often from a counter or from the component's position in the render tree. They look stable in DevTools, but nothing guarantees the same value on the next page load: the suffix can change whenever another component mounts before this one. Your test passes locally and fails in CI because the order of mounts differs.
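A counter-based generator shows why mount order matters. The sketch below is an illustrative stand-in, not the actual code of Material UI, Radix, or React; the prefix and component names are made up.

```typescript
// Illustrative stand-in for a library's internal ID generator.
function createIdGenerator(prefix: string): () => string {
  let counter = 0;
  return () => `${prefix}-${++counter}`;
}

// Mount components in the given order and record the ID each one receives.
function mount(order: string[]): Map<string, string> {
  const nextId = createIdGenerator("lib-select");
  return new Map(order.map((name): [string, string] => [name, nextId()]));
}

const local = mount(["country", "language"]);
// In CI, one extra component (say, a banner) happens to mount first.
const ci = mount(["banner", "country", "language"]);

console.log(local.get("country")); // "lib-select-1"
console.log(ci.get("country")); // "lib-select-2"
```

The same component gets a different ID in each run, so a selector pinned to #lib-select-1 silently points at the banner in CI.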

Verdict: trap. They look like real IDs but behave like nth-child.

Culprit 5: long XPath

What it looks like:

await page.locator('xpath=//*[@id="root"]/div/div[2]/main/section[3]/form/button[2]').click()

Why it breaks:

Long XPath compounds every fragility above. It depends on tree structure, on positions, on element types, and on attribute values, and a change to any one of them invalidates the path. Worse, long XPath is what test recorders tend to generate when they can't find a stable identifier, which means it's a signal that no stable identifier exists on this element.

Your test recorder isn't being lazy. It's telling you the dev didn't make this element testable.

Verdict: code smell at the element level, not at the test level. The fix isn't a better XPath, it's a better element.

What "contractual" looks like

Three patterns survive refactors, framework migrations, and copy changes:

| Selector | Why it survives |
| --- | --- |
| `data-testid` (or `data-cy`, `data-test`) | Purpose-built for tests. Nobody styles it, localizes it, or renames it for marketing reasons. |
| Hand-written stable `id` | A real `id` that the dev treats as a contract (not auto-generated by a library). |
| `aria-label` owned by the accessibility team | The a11y team has a vested interest in keeping it stable. Bonus: improves a11y at the same time. |

If your selector strategy doesn't lean primarily on those three, your test suite is on borrowed time.
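In Playwright, those three handles map onto getByTestId, locator with an id selector, and getByLabel (which also matches aria-label). The sketch below assumes made-up handle names and declares a minimal PageLike interface locally so it compiles without the library installed; in a real suite you would use Page from @playwright/test instead.

```typescript
// Hypothetical minimal subset of Playwright's Page and Locator, for illustration only.
interface LocatorLike {
  click(): Promise<void>;
}
interface PageLike {
  getByTestId(testId: string): LocatorLike;
  getByLabel(text: string): LocatorLike;
  locator(selector: string): LocatorLike;
}

// Every handle name below is an assumption standing in for one your team agreed on.
async function submitSignup(page: PageLike): Promise<void> {
  // data-testid="signup-plan-pro", added by the dev for this test
  await page.getByTestId("signup-plan-pro").click();
  // aria-label="Accept terms", owned by the accessibility team
  await page.getByLabel("Accept terms").click();
  // id="signup-submit", hand-written and treated as a contract
  await page.locator("#signup-submit").click();
}
```

None of these three lines depends on class names, element order, or button copy, so a restyle or a reshuffled layout leaves the test untouched.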

The fix isn't more locator engineering. It's testability culture.

The trap most teams fall into when they realize their selectors are fragile: they spend a week refactoring tests to use "smarter" CSS paths or more flexible XPath. They feel productive. The fragility comes back next sprint.

The real fix is less glamorous. It's making testability a shared concern between dev and QA, with three rules:

New code adds data-testid (or aria-label) on every interactive element that QA actually touches. Reviewers reject PRs that don't.
When a test breaks, the fix isn't a better selector. It's asking the dev to add a stable handle, then updating the test. One ticket, two minutes of dev time, permanent fix.
Use a tool to audit what's missing instead of inspecting every element by hand. TestID Hunter records a QA session, ranks every interacted element by selector stability (Solid / Usable / Weak), and generates a ready-to-paste ticket for the dev: "add these data-testid to component X to stabilize the signup flow."

That third point matters because manual auditing doesn't scale past 5-6 flows. You'll skip it, hit Monday morning chaos, and conclude "Playwright is flaky" all over again.
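The first rule can be partly automated at review time. Here is a rough sketch of such a check, assuming you can obtain a list of rendered elements with their attributes (the ElementInfo shape, the tag list, and the sample data are all hypothetical). It is not a description of how TestID Hunter or any other tool works internally.

```typescript
// Hypothetical element description, e.g. collected from a rendered component.
interface ElementInfo {
  tag: string;
  attributes: Record<string, string>;
}

const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);
const STABLE_ATTRIBUTES = ["data-testid", "aria-label"];

// Return the interactive elements that carry none of the agreed stable handles.
function findMissingHandles(elements: ElementInfo[]): ElementInfo[] {
  return elements.filter(
    (element) =>
      INTERACTIVE_TAGS.has(element.tag.toLowerCase()) &&
      !STABLE_ATTRIBUTES.some((attribute) => attribute in element.attributes),
  );
}

const signupForm: ElementInfo[] = [
  { tag: "input", attributes: { "data-testid": "signup-email" } },
  { tag: "button", attributes: { class: "btn-primary" } },
  { tag: "div", attributes: { class: "card" } },
];
console.log(findMissingHandles(signupForm)); // only the button is reported
```

A check like this only flags what is missing. Deciding which flagged elements QA actually touches is still a conversation between dev and QA.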

The bottom line

Stop looking for a better framework. Start looking for a better contract.

A selector that survives the next refactor is the one your dev signed up to keep stable. Everything else, no matter how clever, is borrowing time from your future self.