How to Automate Design System Audits and Testing
Setup guide and 5 practical examples for any design team
I'm not affiliated with or sponsored by any companies mentioned. Everything shared here is based on what I've learned and tested firsthand.
What you'll learn:
How Playwright + AI agents can automate design (system) testing
The three specialized Playwright agents: Planner, Generator, Healer
5 useful examples: token audits, behavior testing, accessibility, docs validation, visual regression
Setup guide (you need 10 minutes)
You ship a button update.
Three months later, someone finds the old version still living in the checkout flow.
You deprecate a component.
It's still there. With 12 variations you didn't know existed.
Manual testing doesn't scale.
Your design system grows. Your codebase grows. Your team grows.
But your time? Still stuck at 24 hours a day.
What if AI could:
Explore your app and automatically create test plans?
Write the actual test code for you?
Fix broken tests when your UI changes?
It can. Here's how you can do it with Playwright.
What is Playwright?
Playwright is Microsoft's browser automation tool. Think of it as a robot that can:
Open a real browser
Click buttons, fill forms, navigate pages
Test everything just like a human would
Run 100x faster than manual testing
Never get tired
First released in 2020, it has become one of the most widely used tools for automated browser testing.
But here's what changed in 2025:
Microsoft added AI to Playwright. Not just "AI-assisted" code completion.
Full AI agents that explore, write, and fix tests autonomously.
They call it Playwright Test Agents + Playwright MCP.
Who this is for
Before 2025:
Playwright was a developer tool. You needed to:
Write TypeScript/JavaScript
Understand CSS selectors
Debug async code
Maintain test suites manually
Design system teams had two options: wait for dev resources (slow) or learn to code (steep learning curve).
After AI Agents:
Designers can now:
Describe tests in plain English
Let AI write the code
Get automatic maintenance when UI changes
Own quality without engineering bottlenecks
⨠This isnât about replacing developers. ⨠Itâs about empowering design system teams to validate their work independently, ship faster, and catch issues before they need dev intervention.
What are Playwright test agents and MCP?
Two technologies working together:
1. Playwright MCP (Model Context Protocol)
The invisible infrastructure that connects AI assistants to browsers.
Uses structured data instead of screenshots
Fast, lightweight, deterministic
Works with Claude, Cursor, VS Code, and other AI tools
2. Playwright test agents
Three specialized AI agents that run on top of MCP:
Planner: Explores your app like a user, creates test plans in Markdown
Generator: Reads the plan, writes executable test code
Healer: Monitors tests, auto-fixes when they break
Here's exactly how it works.
So, what is the fuss about the three agents?
Planner Agent
Opens your app like a real user
Clicks around, explores flows
Generates human-readable test plans (Markdown)
Identifies edge cases and scenarios
Generator Agent
Reads the test plan
Writes actual Playwright test code
Verifies selectors exist before writing assertions
Creates executable, production-ready tests
Healer Agent
Monitors test failures automatically
Inspects UI to find what changed
Patches selectors and assertions
Suggests fixes when manual intervention is needed
Let's set it up
Step 1: Install Playwright
Run this command in your terminal:
npm init playwright@latest
Pick TypeScript when asked, accept the defaults for everything else.
Step 2: Initialize Test Agents
If you use Cursor:
npx playwright init-agents --loop=claude
This creates three agent files that your AI assistant will use to help you test.
What gets created:
repo/
├── .github/
│   ├── planner-agent.md      # Planner instructions
│   ├── generator-agent.md    # Generator instructions
│   └── healer-agent.md       # Healer instructions
├── specs/                    # Where test plans live
├── tests/                    # Where actual tests live
└── playwright.config.ts
Step 3: Set up in Cursor or Claude Code
For Cursor:
Install the "Playwright" extension
Open .github/planner-agent.md
Your AI agent is now Playwright-aware
For Claude Code:
Point Claude to your project
Say: "Read the Playwright agent definitions."
Claude now knows how to use all three agents
Step 4: Create a seed test
Create one simple test file to teach the agents how your app works.
This could be as simple as:
Navigate to your app's homepage
Wait for it to load
Check that the main heading appears
The agents will copy this pattern when creating new tests.
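A seed test along those lines might look like the sketch below. The URL is a placeholder, and treating the first level-1 heading as the "main heading" is my assumption, not something the agents require.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical address of the app under test; replace with your own.
const APP_URL = 'http://localhost:3000';

test('seed: homepage loads', async ({ page }) => {
  // goto() resolves once the page has fired its load event.
  await page.goto(APP_URL);

  // Assumes the main heading is the first <h1> on the page.
  await expect(page.getByRole('heading', { level: 1 }).first()).toBeVisible();
});
```

Keeping it this small is deliberate: the seed only has to show how to reach the app and what a passing check looks like.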
How the agents work together
Step 1: Planner creates test plan
Your prompt:
"Planner: Audit all button components for consistent spacing and color tokens"
Planner explores your app and creates:
specs/button-audit.md:
# Button Component Audit
## Test Scenarios
### 1. Token Consistency
- Navigate to /components/buttons
- Find all button variants
- Check each button uses CSS variables (not hardcoded values)
- Verify spacing matches design tokens
### 2. Responsive Behavior
- Test buttons at 375px (mobile)
- Test buttons at 1440px (desktop)
- Ensure touch targets are 44x44px minimum
### 3. State Variations
- Default state
- Hover state
- Disabled state
- Focus state (keyboard navigation)
You can add instructions on where to put reports and tests. This part is totally up to you.
Step 2: Generator writes test code
Your prompt:
"Generator: Convert button-audit.md into Playwright tests"
It will generate code based on your markdown file.
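To give a feel for what such code has to verify, the touch-target scenario from the plan reduces to a simple size comparison. The sketch below is not Generator output; the names `ButtonBox` and `findUndersizedTargets` and the sample measurements are made up for illustration. In a real test the sizes would come from Playwright's `locator.boundingBox()`.

```typescript
// Size of one rendered button. In a real test these numbers would come from
// Playwright's locator.boundingBox(); here they are hard-coded sample data.
interface ButtonBox {
  name: string;
  width: number;
  height: number;
}

// Minimum touch target from the plan above (44x44px).
const MIN_TOUCH_TARGET_PX = 44;

// Returns the names of buttons that are too small in either dimension.
function findUndersizedTargets(
  boxes: ButtonBox[],
  min: number = MIN_TOUCH_TARGET_PX,
): string[] {
  return boxes
    .filter((box) => box.width < min || box.height < min)
    .map((box) => box.name);
}

// Hypothetical measurements taken at the 375px mobile viewport.
const mobileButtons: ButtonBox[] = [
  { name: 'primary', width: 120, height: 48 },
  { name: 'icon-only', width: 32, height: 32 },
  { name: 'link', width: 80, height: 40 },
];

console.log(findUndersizedTargets(mobileButtons)); // [ 'icon-only', 'link' ]
```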
Step 3: Healer fixes broken tests
You ship a UI update. Test breaks.
Healer automatically:
Detects the failure
Replays the test step-by-step
Inspects the new DOM structure
Finds equivalent selectors
Patches the test
5 Useful design system use cases
1. Token audit: catch hardcoded values
You have 100+ components across 15 pages. Some use design tokens. Others have hardcoded hex values, px spacing, or magic numbers scattered everywhere.
Prompt:
"Audit my prototype to find hardcoded values (colors, spacing, typography) that should use design tokens from our system."
You will get an extensive report with suggestions in markdown (left image), which you can translate into HTML as well (right image).
The results in markdown and HTML
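To make the idea concrete, here is a minimal sketch of the kind of check a token audit performs: scan CSS declarations and flag hex colors or px values that are not wrapped in `var(...)`. The function name `auditCss`, the `Finding` shape, and the sample stylesheet are illustrative assumptions, and the one-declaration-per-line parsing is a simplification. How the agents actually run the audit may differ.

```typescript
// One hardcoded value found in a stylesheet.
interface Finding {
  line: number;
  property: string;
  value: string;
  kind: 'color' | 'length';
}

const HEX_COLOR = /#[0-9a-fA-F]{3,8}\b/;
const PX_LENGTH = /\b\d+(?:\.\d+)?px\b/;
// Matches a single "property: value;" declaration on its own line.
const DECLARATION = /^\s*([\w-]+)\s*:\s*([^;]+);?\s*$/;

function auditCss(css: string): Finding[] {
  const findings: Finding[] = [];
  css.split('\n').forEach((text, index) => {
    const match = DECLARATION.exec(text);
    if (!match) return;
    const property = match[1] ?? '';
    const value = match[2] ?? '';
    // Token definitions (custom properties) are where raw values belong.
    if (property.startsWith('--')) return;
    // Ignore anything already wrapped in var(...).
    const raw = value.replace(/var\([^)]*\)/g, '');
    const color = HEX_COLOR.exec(raw);
    if (color) findings.push({ line: index + 1, property, value: color[0], kind: 'color' });
    const length = PX_LENGTH.exec(raw);
    if (length) findings.push({ line: index + 1, property, value: length[0], kind: 'length' });
  });
  return findings;
}

// Hypothetical component styles mixing tokens and hardcoded values.
const sample = [
  '.button {',
  '  color: var(--color-text);',
  '  background: #ff5500;',
  '  padding: 12px var(--space-4);',
  '}',
].join('\n');

// Reports the hex color on line 3 and the 12px padding on line 4.
console.log(auditCss(sample));
```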
2. Behavior & state validation
Test dynamic components like dropdowns, modals, tooltips, and tabs.
Playwright can handle login flows, store session cookies, and run all tests while authenticated. You write the login steps once, and it runs automatically forever.
Prompt:
"Log into the app using my test credentials, then test all interactive components. Simulate real user interactions: click, hover, keyboard navigation. Verify focus management works correctly. Ensure Escape closes modals, Tab navigates properly, and Enter activates buttons."
What it catches:
Broken keyboard navigation
Focus management issues
Missing keyboard shortcuts
State transitions that don't work
Focus traps that leak
Components that behave differently when logged in/out
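One of these checks, "Escape closes modals", could be written roughly as follows. This is a hand-written sketch rather than agent output; the URL, the button label, and the assumption that the modal exposes `role="dialog"` are placeholders to adapt.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical page and button label; replace with your own.
const MODAL_DEMO_URL = 'http://localhost:3000/components/modal';
const TRIGGER_NAME = 'Open modal';

test('modal closes on Escape and returns focus', async ({ page }) => {
  await page.goto(MODAL_DEMO_URL);

  const trigger = page.getByRole('button', { name: TRIGGER_NAME });
  await trigger.click();

  // Assumes the modal exposes role="dialog".
  const dialog = page.getByRole('dialog');
  await expect(dialog).toBeVisible();

  await page.keyboard.press('Escape');
  await expect(dialog).toBeHidden();

  // Good focus management returns focus to the element that opened the modal.
  await expect(trigger).toBeFocused();
});
```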
3. Accessibility audit (WCAG compliance)
We want every component to pass WCAG AA standards.
Prompt:
"Audit the entire design system for WCAG 2.1 AA compliance. Check color contrast, keyboard navigation, ARIA labels, and focus management. Group violations by severity."
What Planner creates:
Full accessibility test plan
Component-by-component scenarios
Integration with axe-core
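axe-core runs the contrast check for you, but it helps to know what the AA threshold means. The sketch below implements the WCAG 2.x contrast-ratio formula for 6-digit hex colors; the function names are my own, and it ignores transparency and shorthand hex values.

```typescript
// Parse "#rrggbb" into its red, green, and blue channels (0-255).
function parseHex(hex: string): [number, number, number] {
  const digits = hex.replace(/^#/, '');
  if (!/^[0-9a-fA-F]{6}$/.test(digits)) {
    throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  }
  const value = parseInt(digits, 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// Convert one sRGB channel to linear light, per the WCAG 2.x definition.
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const [r, g, b] = parseHex(hex);
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Ratio of the lighter to the darker color, from 1 (identical) to 21.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// WCAG AA: 4.5:1 for normal text, 3:1 for large text.
function meetsAA(foreground: string, background: string, largeText = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}

console.log(contrastRatio('#000000', '#ffffff').toFixed(2)); // 21.00
console.log(meetsAA('#767676', '#ffffff')); // true
```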
4. Documentation validation
Your docs look great. But do the code examples actually work? Are there console errors? Do the components render as expected?
Prompt:
"Crawl our design system documentation site. Verify all code examples render without console errors. Check that live preview components match their descriptions. Detect broken code snippets or missing imports."
What it catches:
Broken code examples
Console errors in live previews
Missing imports or syntax errors
Components that don't render
Broken internal links
Documentation drift from actual code
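The console-error part of this check is easy to picture in code. Here is a minimal sketch, assuming a hypothetical list of documentation URLs; it only catches errors raised while the page loads, not ones triggered later by interaction.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical documentation pages to check; replace with your own URLs.
const DOC_PAGES = [
  'http://localhost:6006/docs/button',
  'http://localhost:6006/docs/modal',
];

for (const url of DOC_PAGES) {
  test(`no console errors on ${url}`, async ({ page }) => {
    const errors: string[] = [];

    // console.error() calls made by the page.
    page.on('console', (message) => {
      if (message.type() === 'error') errors.push(message.text());
    });
    // Uncaught exceptions thrown in the page.
    page.on('pageerror', (error) => errors.push(error.message));

    await page.goto(url);
    expect(errors).toEqual([]);
  });
}
```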
5. Visual Regression Testing
Catch unintended visual changes before they hit production.
Prompt:
"Create visual regression tests for every component variant in our design system. Capture baseline screenshots and fail if any visual differences exceed 0.1% pixel diff."
What it catches:
Spacing changes that break the system
Color token mismatches
Theme switcher bugs
Visual regressions
Token application failures
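In Playwright, a threshold like the one in the prompt maps to the `maxDiffPixelRatio` option of `toHaveScreenshot`, where 0.1% is written as `0.001`. To show what that ratio measures, here is a simplified sketch that compares two raw RGBA buffers pixel by pixel. The function names are mine, and the exact-match comparison is an assumption; Playwright's own comparison also tolerates small color differences.

```typescript
// Fraction of pixels that differ between two same-sized RGBA images.
function diffPixelRatio(baseline: Uint8Array, current: Uint8Array): number {
  if (baseline.length !== current.length || baseline.length % 4 !== 0) {
    throw new Error('Images must be RGBA buffers of the same size');
  }
  const totalPixels = baseline.length / 4;
  if (totalPixels === 0) return 0;

  let differing = 0;
  for (let i = 0; i < baseline.length; i += 4) {
    // A pixel differs if any of its four channels differs.
    if (
      baseline[i] !== current[i] ||
      baseline[i + 1] !== current[i + 1] ||
      baseline[i + 2] !== current[i + 2] ||
      baseline[i + 3] !== current[i + 3]
    ) {
      differing++;
    }
  }
  return differing / totalPixels;
}

// 0.1% expressed as a ratio between 0 and 1.
const MAX_DIFF_RATIO = 0.001;

// Passes when the share of changed pixels stays within the threshold.
function passesVisualCheck(baseline: Uint8Array, current: Uint8Array): boolean {
  return diffPixelRatio(baseline, current) <= MAX_DIFF_RATIO;
}
```

With a 2,000-pixel image, one changed pixel (0.05%) passes and three changed pixels (0.15%) fail.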
Reality check
Time saved: 2 weeks of manual testing → hours
Impact: HUGE!
My recommendations
Don't skip the seed test.
Agents need to understand your setup. Spend 5 minutes on a good seed test, save hours later.
Always review suggestions.
Be specific (as always with AI).
"Audit buttons" is better than "Check the app."
Do start with one component type.
Buttons, inputs, or cards. Get the workflow down, then scale.
Customize agent instructions.
Tailor them to your design system's unique needs.
Iterate on test plans.
Planner gets better as it learns your system.
Design systems aren't just about creating components.
They're about maintaining them at scale.
Useful Links
Playwright Official Website
Stay tuned for more and enjoy exploring,
Romina
Community Gems
Google AI Studio is out, and it is FREE
Unified Playground: access Gemini, GenMedia (Veo 3.1), text-to-speech, and Live models in one place without tab switching
Redesigned homepage: central command center for platform capabilities, updates, and quick project access
Real-time rate limit page: clear visibility into usage and limits to avoid surprises
Google Maps grounding: integrate real-world location data directly into AI workflows
When AI Meets Design Systems: My Storybook Webinar Recap
A behind-the-scenes look at my recent Storybook webinar on AI-powered design systems, including the lessons, surprises, and why this conversation matters.
From TJ Pitre










