How to Automate Design System Audits and Testing

Setup guide and 5 practical examples for any design team

Oct 24, 2025


I’m not affiliated with or sponsored by any companies mentioned. Everything shared here is based on what I’ve learned and tested firsthand.

What you’ll learn:

How Playwright + AI agents can automate design (system) testing
The three specialized Playwright agents: Planner, Generator, Healer
5 useful examples: token audits, behavior testing, accessibility, docs validation, visual regression
Setup guide (you need about 10 minutes)

You ship a button update.
Three months later, someone finds the old version still living in the checkout flow.

You deprecate a component.
It’s still there. With 12 variations you didn’t know existed.

Manual testing doesn’t scale.

Your design system grows. Your codebase grows. Your team grows.
But your time? Still stuck at 24 hours a day. 🫠

What if AI could:

Explore your app and automatically create test plans?
Write the actual test code for you?
Fix broken tests when your UI changes?

It can. Here’s how you can do it with Playwright.

What is Playwright?

Playwright is Microsoft’s browser automation tool. Think of it as a robot that can:

Open a real browser
Click buttons, fill forms, navigate pages
Test everything just like a human would
Run far faster than manual testing
Never get tired 😅

Since its release in 2020, it has become one of the most widely used tools for automated browser testing.

But here’s what changed in 2025:
Microsoft added AI to Playwright. Not just “AI-assisted” code completion.
Full AI agents that explore, write, and fix tests autonomously.

They call it Playwright Test Agents + Playwright MCP.

Who this is for

Before 2025:
Playwright was a developer tool. You needed to:

Write TypeScript/JavaScript
Understand CSS selectors
Debug async code
Maintain test suites manually

Design system teams had two options: wait for dev resources (slow) or learn to code (steep learning curve).

After AI Agents:
Designers can now:

Describe tests in plain English
Let AI write the code
Get automatic maintenance when UI changes
Own quality without engineering bottlenecks

✨ This isn’t about replacing developers. ✨ It’s about empowering design system teams to validate their work independently, ship faster, and catch issues before they need dev intervention.

What are Playwright test agents and MCP?

Two technologies working together:

1. Playwright MCP (Model Context Protocol)
The invisible infrastructure that connects AI assistants to browsers.

Uses structured accessibility snapshots of the page instead of screenshots
Fast, lightweight, deterministic
Works with Claude, Cursor, VS Code, and other AI tools

2. Playwright test agents
Three specialized AI agents that run on top of MCP:

🎭 Planner: Explores your app like a user, creates test plans in Markdown
🎭 Generator: Reads the plan, writes executable test code
🎭 Healer: Monitors tests, auto-fixes when they break

Here’s exactly how it works.

So, what is the fuss about the three agents?

🎭 Planner Agent

Opens your app like a real user
Clicks around, explores flows
Generates human-readable test plans (Markdown)
Identifies edge cases and scenarios

🎭 Generator Agent

Reads the test plan
Writes actual Playwright test code
Verifies selectors exist before writing assertions
Creates executable, production-ready tests

🎭 Healer Agent

Monitors test failures automatically
Inspects UI to find what changed
Patches selectors and assertions
Suggests fixes when manual intervention is needed

Let’s set it up

Step 1: Install Playwright

Run this command in your terminal:

npm init playwright@latest

Pick TypeScript when asked, accept the defaults for everything else.

Step 2: Initialize Test Agents

If you use Claude Code (the Playwright docs also list --loop=vscode and --loop=opencode for other tools):

npx playwright init-agents --loop=claude

This creates three agent files that your AI assistant will use to help you test.

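What gets created (the exact folder and file names of the agent definitions depend on the loop you chose and your Playwright version):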

repo/
├── .github/
│   ├── planner-agent.md      # Planner instructions
│   ├── generator-agent.md    # Generator instructions
│   └── healer-agent.md       # Healer instructions
├── specs/                    # Where test plans live
├── tests/                    # Where actual tests live
└── playwright.config.ts

Step 3: Set up in Cursor or Claude Code

For Cursor:

Install the “Playwright” extension
Open .github/planner-agent.md
Your AI agent is now Playwright-aware

For Claude Code:

Point Claude to your project
Say: “Read the Playwright agent definitions.”
Claude now knows how to use all three agents

Step 4: Create a seed test

Create one simple test file to teach the agents how your app works.

This could be as simple as:

Navigate to your app’s homepage
Wait for it to load
Check that the main heading appears

The agents will copy this pattern when creating new tests.
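
A minimal sketch of such a seed test. The file name (tests/seed.spec.ts), a `baseURL` set in playwright.config.ts, and a top-level heading on the homepage are all assumptions about your project, not requirements from Playwright:

```typescript
// tests/seed.spec.ts (hypothetical file name)
import { test, expect } from '@playwright/test';

test('seed: homepage loads', async ({ page }) => {
  // Assumes `use.baseURL` is set in playwright.config.ts so '/' resolves to your app.
  // page.goto waits for the page's load event by default.
  await page.goto('/');

  // Assumes the homepage has a level-1 heading.
  // The assertion retries until the heading is visible or the timeout expires.
  await expect(page.getByRole('heading', { level: 1 })).toBeVisible();
});
```

Run it once with `npx playwright test` to confirm it passes before handing it to the agents.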

How the agents work together

Step 1: Planner creates test plan

Your prompt:

“Planner: Audit all button components for consistent spacing and color tokens”

Planner explores your app and creates:

specs/button-audit.md:

# Button Component Audit

## Test Scenarios

### 1. Token Consistency
- Navigate to /components/buttons
- Find all button variants
- Check each button uses CSS variables (not hardcoded values)
- Verify spacing matches design tokens

### 2. Responsive Behavior
- Test buttons at 375px (mobile)
- Test buttons at 1440px (desktop)
- Ensure touch targets are 44x44px minimum

### 3. State Variations
- Default state
- Hover state
- Disabled state
- Focus state (keyboard navigation)

You can add instructions on where to put reports and tests. This part is totally up to you.

Step 2: Generator writes test code

Your prompt:

“Generator: Convert button-audit.md into Playwright tests”

It will generate code based on your markdown file.
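
For illustration, here is roughly what tests for the "Responsive Behavior" and "State Variations" scenarios could look like. This is a hand-written sketch, not actual Generator output. It assumes a `baseURL` in your config, that /components/buttons renders real, enabled button elements, and a 667px viewport height (the plan only specifies the width):

```typescript
import { test, expect } from '@playwright/test';

test.describe('Button component audit', () => {
  test('touch targets are at least 44x44px on mobile', async ({ page }) => {
    // 375px wide, as in the plan; the height is an arbitrary assumption.
    await page.setViewportSize({ width: 375, height: 667 });
    await page.goto('/components/buttons');

    const buttons = page.getByRole('button');
    // Wait until at least one button is rendered before counting.
    await expect(buttons.first()).toBeVisible();
    const count = await buttons.count();

    for (let i = 0; i < count; i++) {
      // boundingBox() returns null when the element is not visible.
      const box = await buttons.nth(i).boundingBox();
      expect(box, `button #${i} has no bounding box`).not.toBeNull();
      expect(box!.width, `button #${i} width`).toBeGreaterThanOrEqual(44);
      expect(box!.height, `button #${i} height`).toBeGreaterThanOrEqual(44);
    }
  });

  test('first button can receive keyboard focus', async ({ page }) => {
    await page.goto('/components/buttons');
    const first = page.getByRole('button').first();
    await first.focus();
    await expect(first).toBeFocused();
  });
});
```

Whatever the Generator produces, read it the same way you would read this sketch: check that each assertion maps back to a line in your plan.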

Step 3: Healer fixes broken tests

You ship a UI update. Test breaks.

Healer automatically:

Detects the failure
Replays the test step-by-step
Inspects the new DOM structure
Finds equivalent selectors
Patches the test

5 Useful design system use cases 🙌

1. Token audit: catch hardcoded values

You have 100+ components across 15 pages. Some use design tokens. Others have hardcoded hex values, px spacing, or magic numbers scattered everywhere.

Prompt:

“Audit my prototype to find hardcoded values (colors, spacing, typography) that should use design tokens from our system.”

You will get an extensive report with suggestions in Markdown, which you can also convert to HTML.

The results in markdown and HTML
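
To make the idea concrete, here is a small sketch of the classification step behind such an audit: given CSS declarations as authored (not computed styles, because the browser resolves `var()` before you can read it), flag raw colors and px lengths. The function and type names are hypothetical, and the rules are deliberately simple:

```typescript
// One audit finding: which property carries a raw value, and why it was flagged.
interface Finding {
  property: string;
  value: string;
  reason: string;
}

const HEX_COLOR = /#[0-9a-fA-F]{3,8}\b/;
const COLOR_FUNCTION = /\b(?:rgb|rgba|hsl|hsla)\(/i;
const PX_LENGTH = /\b\d+(?:\.\d+)?px\b/;

// Remove var(...) references, including simple fallbacks, so only raw values remain.
// Handles nested var() calls; a fallback that is itself a function such as rgb()
// is not stripped and will still be flagged.
function stripVarReferences(value: string): string {
  let previous: string;
  let current = value;
  do {
    previous = current;
    current = current.replace(/var\([^()]*\)/g, '');
  } while (current !== previous);
  return current;
}

// Flags declarations that use raw colors or px lengths instead of tokens.
function findHardcodedValues(declarations: Record<string, string>): Finding[] {
  const findings: Finding[] = [];
  for (const [property, value] of Object.entries(declarations)) {
    // Custom properties are where tokens are defined, so raw values are expected there.
    if (property.startsWith('--')) continue;

    const raw = stripVarReferences(value);
    if (HEX_COLOR.test(raw) || COLOR_FUNCTION.test(raw)) {
      findings.push({ property, value, reason: 'hardcoded color' });
    } else if (PX_LENGTH.test(raw)) {
      findings.push({ property, value, reason: 'hardcoded px length' });
    }
  }
  return findings;
}
```

For example, `color: #1a73e8` and `padding: 12px 16px` would be flagged, while `margin: var(--space-4)` would pass.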

2. Behavior & state validation

Test dynamic components like dropdowns, modals, tooltips, and tabs.

Playwright can handle login flows, store session cookies, and run all tests while authenticated. You write the login steps once, and every test reuses the saved session until it expires.
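
A minimal sketch of that login-once pattern, using Playwright's `storageState`. The /login route, field labels, button name, /dashboard URL, environment variable names, and file paths are all hypothetical placeholders for your own app:

```typescript
// tests/auth.setup.ts (hypothetical file name)
import { test as setup } from '@playwright/test';

// Where the authenticated session (cookies and local storage) is saved.
// Keep this file out of version control.
const authFile = 'playwright/.auth/user.json';

setup('authenticate', async ({ page }) => {
  // All selectors and routes below are assumptions about your login page.
  await page.goto('/login');
  await page.getByLabel('Email').fill(process.env.TEST_USER_EMAIL ?? '');
  await page.getByLabel('Password').fill(process.env.TEST_USER_PASSWORD ?? '');
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Wait until the app has actually logged us in before saving the session.
  await page.waitForURL('/dashboard');
  await page.context().storageState({ path: authFile });
});
```

To reuse the session, playwright.config.ts would typically define a setup project that matches `*.setup.ts` files, and have the other projects list it under `dependencies` and point `use.storageState` at the same file.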

Prompt:

“Log into the app using my test credentials, then test all interactive components. Simulate real user interactions: click, hover, keyboard navigation. Verify focus management works correctly. Ensure Escape closes modals, Tab navigates properly, and Enter activates buttons.”

What it catches:

Broken keyboard navigation
Focus management issues
Missing keyboard shortcuts
State transitions that don’t work
Focus traps that leak
Components that behave differently when logged in/out

3. Accessibility audit (WCAG compliance)

We want every component to pass WCAG AA standards.

Prompt:

“Audit the entire design system for WCAG 2.1 AA compliance. Check color contrast, keyboard navigation, ARIA labels, and focus management. Group violations by severity.”

What Planner creates:

Full accessibility test plan
Component-by-component scenarios
Integration with axe-core
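
As a sketch of the "group violations by severity" step: axe-core reports each violation with an `id` and an `impact` of critical, serious, moderate, or minor. The helper names below are hypothetical, and treating a missing impact as minor is an assumption of this sketch:

```typescript
// Severity levels used by axe-core, ordered from most to least severe.
const SEVERITY_ORDER = ['critical', 'serious', 'moderate', 'minor'] as const;
type Impact = (typeof SEVERITY_ORDER)[number];

// The subset of an axe-core violation that this sketch needs.
interface Violation {
  id: string; // rule id, e.g. "color-contrast"
  impact?: Impact | null;
  nodes: unknown[]; // affected elements
}

// Buckets violations by impact. Assumption: a missing impact counts as "minor".
function groupBySeverity(violations: Violation[]): Record<Impact, Violation[]> {
  const groups: Record<Impact, Violation[]> = {
    critical: [],
    serious: [],
    moderate: [],
    minor: [],
  };
  for (const violation of violations) {
    groups[violation.impact ?? 'minor'].push(violation);
  }
  return groups;
}

// Produces one report line per non-empty severity level, most severe first.
function summarize(violations: Violation[]): string[] {
  const groups = groupBySeverity(violations);
  return SEVERITY_ORDER.filter((level) => groups[level].length > 0).map(
    (level) => `${level}: ${groups[level].map((v) => v.id).join(', ')}`
  );
}
```

In a Playwright test, the input would typically be the `violations` array returned by `new AxeBuilder({ page }).withTags(['wcag2a', 'wcag2aa', 'wcag21a', 'wcag21aa']).analyze()` from the `@axe-core/playwright` package.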

4. Documentation validation

Your docs look great. But do the code examples actually work? Are there console errors? Do the components render as expected?

Prompt:

“Crawl our design system documentation site. Verify all code examples render without console errors. Check that live preview components match their descriptions. Detect broken code snippets or missing imports.”

What it catches:

Broken code examples
Console errors in live previews
Missing imports or syntax errors
Components that don’t render
Broken internal links
Documentation drift from actual code
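
A minimal sketch of the console-error check from that list. The documentation routes are hypothetical, and a `baseURL` is assumed to be set in your config:

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical documentation routes; replace with your own pages.
const docPages = ['/docs/button', '/docs/input', '/docs/card'];

for (const route of docPages) {
  test(`no console errors on ${route}`, async ({ page }) => {
    const errors: string[] = [];

    // Collect console.error output from the page.
    page.on('console', (msg) => {
      if (msg.type() === 'error') errors.push(msg.text());
    });
    // Collect uncaught exceptions thrown in the page.
    page.on('pageerror', (error) => errors.push(error.message));

    await page.goto(route);

    // An empty list means the live previews loaded without errors.
    expect(errors).toEqual([]);
  });
}
```

This only covers errors raised while the page loads. Checking that previews match their descriptions still needs a human or an agent to read both.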

5. Visual regression testing

Catch unintended visual changes before they hit production.

Prompt:

“Create visual regression tests for every component variant in our design system. Capture baseline screenshots and fail if any visual differences exceed 0.1% pixel diff.”

What it catches:

Spacing changes that break the system
Color token mismatches
Theme switcher bugs
Visual regressions
Token application failures
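
A minimal sketch of such a test using Playwright's built-in screenshot comparison. The variant names and routes are hypothetical; `maxDiffPixelRatio: 0.001` corresponds to the 0.1% threshold in the prompt:

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical variant names; replace with the variants in your system.
const variants = ['primary', 'secondary', 'ghost'];

for (const variant of variants) {
  test(`button ${variant} matches its baseline`, async ({ page }) => {
    // Assumes each variant has its own preview route and a baseURL is configured.
    await page.goto(`/components/buttons/${variant}`);

    // Compares the element against the stored baseline image.
    // Fails when more than 0.1% of pixels differ.
    await expect(page.getByRole('button').first()).toHaveScreenshot(
      `button-${variant}.png`,
      { maxDiffPixelRatio: 0.001 }
    );
  });
}
```

The first run has no baseline to compare against, so Playwright records one. After an intentional visual change, run `npx playwright test --update-snapshots` to accept the new baselines.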

Reality check

🕐 Time saved: 2 weeks of manual testing → hours

🙌 Impact: HUGE!

My recommendations

🙅‍♀️ Don’t skip the seed test.
Agents need to understand your setup. Spend 5 minutes on a good seed test, save hours later.

✅ Always review suggestions.

✅ Be specific (as always with AI).
“Audit buttons” is better than “Check the app.”

✅ Do start with one component type.
Buttons, inputs, or cards. Get the workflow down, then scale.

✅ Customize agent instructions.
Tailor them to your design system’s unique needs.

✅ Iterate on test plans.
Plans get better as you refine your prompts, seed test, and agent instructions.

Design systems aren’t just about creating components.
They’re about maintaining them at scale.

Useful Links

🔗 Playwright Test Agents

🔗 Playwright MCP server

🔗 Playwright Official Website

Stay tuned for more and enjoy exploring, ⚡️
Romina


💎 Community Gems

✨ Google AI Studio is out, and it is FREE 🔥

Unified Playground → access Gemini, GenMedia (Veo 3.1), text-to-speech, and Live models in one place without tab switching
Redesigned homepage → central command center for platform capabilities, updates, and quick project access
Real-time rate limit page → clear visibility into usage and limits to avoid surprises
Google Maps grounding → integrate real-world location data directly into AI workflows


When AI Meets Design Systems: My Storybook Webinar Recap

A behind-the-scenes look at my recent Storybook webinar on AI-powered design systems, including the lessons, surprises, and why this conversation matters.

From TJ Pitre

