Spec-Driven Development with Claude Code: A Guided Tutorial

Learn how to write a spec, turn it into a plan, and let Claude Code build using spec-driven development. Compare Superpowers, Spec Kit, and BMAD-METHOD to find the right tool for your workflow.

May 19, 2026 · 15 min read

Vibe-coding with Claude Code works fine on small jobs. You describe a change, the agent writes it, and you check the result. The trouble starts when a feature touches many files at once. By then, the hard part is the design decision, not the implementation.

Spec-driven development handles that design decision in writing, before any code runs. You write a short spec that says what the change should do. You turn the spec into a plan of numbered tasks. Claude Code then writes code against the plan, one task at a time. A human reviews the output of each phase before the next one begins.

This tutorial teaches the workflow end-to-end. It walks through three open-source setups that run it inside Claude Code: Superpowers, GitHub Spec Kit, and BMAD-METHOD.

If you're new to Claude Code, I recommend starting with the Claude Code 101 course. To get hands-on experience with various AI coding tools, check out our AI for Software Engineering skill track.

What Is Spec-Driven Development?

Spec-driven development is a workflow built on three documents written in order: a spec that says what a change should do, a plan that breaks the change into steps, and code written against the plan. A human review sits between every pair.

The three review points a feature passes through in spec-driven development.

A spec is a short document, written in plain language before any code, that says what a change should do. Take a feature like "let users export their data." A spec for it pins down the answers an agent would otherwise guess at. It lists:

Supported file formats
The delivery mode
The behavior during a half-finished export
The parts of the feature that are intentionally left out

Here is the real opening of a spec Claude Code wrote for a workout-shape-verification change in a Telegram-based accountability app of mine. The change replaces a brittle heart-rate threshold with a check on the shape of the heart-rate curve over time:

# Workout Shape-Based Verification — Design Spec
 
**Created:** 2026-05-05
**Status:** Draft
**Supersedes (partially):** [2026-03-17-calisthenics-verification-design.md]
  — replaces the absolute-HR thresholds for the Workout activity type.
  Run / Ride / Walk verification is unchanged.
 
## Problem
 
The current Workout verifier accepts an activity only if absolute heart-rate
levels clear fixed cutoffs: avg ≥ 120, max ≥ 140, range ≥ 30, suffer_score ≥ 3.
Two failures in production:
 
1. **False-negative risk.** As cardiovascular fitness improved
   (resting HR ~80), real calisthenics sessions with disciplined rest now
   average 115–125 bpm. Recent sessions have come within 4 bpm of the 120 floor.
 
<!-- ... continues for hundreds of lines through Solution, Risks,
 	Out of scope, and What is removed / added / changed / unchanged -->
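
To make the Problem section concrete, here is a minimal sketch of the kind of absolute-threshold check it describes. The four cutoffs come from the spec excerpt. The `Activity` fields, the function name, and the reading of "range" as max minus min heart rate are assumptions made for illustration, not the app's actual code.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """Hypothetical summary of one recorded session."""
    avg_hr: float
    max_hr: float
    min_hr: float
    suffer_score: float


def passes_absolute_thresholds(a: Activity) -> bool:
    """Accept only if every fixed cutoff from the spec excerpt is cleared."""
    hr_range = a.max_hr - a.min_hr  # assumed meaning of "range"
    return (
        a.avg_hr >= 120
        and a.max_hr >= 140
        and hr_range >= 30
        and a.suffer_score >= 3
    )


# A session averaging 118 bpm fails on the avg cutoff alone,
# even though every other number clears its floor.
session = Activity(avg_hr=118, max_hr=152, min_hr=85, suffer_score=6)
print(passes_absolute_thresholds(session))  # False
```

A two-beat shortfall on one field rejects the whole session, which is the false-negative risk the spec calls out.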

The plan is the next document. It breaks the spec above into numbered tasks that the agent can work on one at a time, each task naming a file, a change, an order, and a test. Where the spec answers "what," the plan answers "in what steps."
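
One way to picture a plan is as an ordered list of small records. The sketch below is hypothetical: the `PlanTask` fields mirror the four things the paragraph above says each task names, but the class, the field names, and the sample file and test names are invented for illustration and are not the format any particular tool writes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PlanTask:
    order: int   # position in the plan
    file: str    # file the task touches
    change: str  # what to change, in one sentence
    test: str    # test that proves the task is done


def next_task(tasks: list[PlanTask], done: set[int]) -> PlanTask | None:
    """Return the lowest-numbered task not yet completed, or None."""
    for task in sorted(tasks, key=lambda t: t.order):
        if task.order not in done:
            return task
    return None


# Hypothetical two-task plan, deliberately listed out of order.
plan = [
    PlanTask(2, "verifier.py", "Call the shape check for Workout", "test_workout_uses_shape"),
    PlanTask(1, "shape.py", "Add the shape-check function", "test_shape_check"),
]
print(next_task(plan, done={1}).order)  # 2
```

Keeping the order explicit is what lets an agent pick up the plan in a later session and resume at the right task.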

The code comes last, written against the plan one task at a time.

Three documents. A human review sits between every pair. You review the spec before it becomes a plan. You review the plan before it becomes code. You review the code before it merges.
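
The gating rule can be stated as a tiny state machine: work on an artifact may only start once the one before it has been approved. This is a hypothetical sketch of that rule, with names of my own choosing. None of the frameworks covered below expose an API like this.

```python
from enum import Enum


class Stage(Enum):
    SPEC = 1
    PLAN = 2
    CODE = 3


class Pipeline:
    """Tracks which artifacts a human has approved."""

    def __init__(self) -> None:
        self.approved = set()

    def approve(self, stage: Stage) -> None:
        # A stage can only be approved after the previous one was.
        if stage.value > 1 and Stage(stage.value - 1) not in self.approved:
            raise RuntimeError(f"{Stage(stage.value - 1).name} not approved yet")
        self.approved.add(stage)

    def can_start(self, stage: Stage) -> bool:
        """SPEC can always start; later stages need the prior approval."""
        return stage is Stage.SPEC or Stage(stage.value - 1) in self.approved

    def can_merge(self) -> bool:
        """Merging requires the final code review."""
        return Stage.CODE in self.approved


p = Pipeline()
print(p.can_start(Stage.PLAN))  # False
p.approve(Stage.SPEC)
print(p.can_start(Stage.PLAN))  # True
```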

How spec-driven development differs from plan mode

You may have used Claude Code's built-in plan mode (press Shift+Tab twice to enter it) and wondered why this is different. Plan mode produces a plan inside a single chat turn. The plan lives in memory, with no persisted spec and no review step between phases.

Spec-driven development persists the spec and the plan as files on disk. Each one passes a human review before the next phase starts, and the artifacts survive across sessions. Plan mode compresses two phases of software development into one chat turn. That works on small jobs and fails as soon as the codebase grows and starts serving real users.

Why Vibe-Coding Hits a Wall

Vibe-coding works on prototypes, single files, and throwaway scripts. It gets worse in real applications with users to answer to and in large existing codebases. The line worth drawing is at about 4 files. Any change touching that many files needs a spec, as does any refactor with a coherent end state and any task where "what should this do exactly?" is the hard part.
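
The rule of thumb reads naturally as a predicate. A minimal sketch, assuming the three triggers named above. The function and parameter names are mine, and the threshold of 4 is the approximate line drawn in the text, not a hard limit.

```python
def needs_spec(
    files_touched: int,
    refactor_with_end_state: bool = False,
    behavior_is_the_hard_part: bool = False,
) -> bool:
    """Return True when any of the three triggers from the text applies."""
    return (
        files_touched >= 4          # "about 4 files" in the text
        or refactor_with_end_state
        or behavior_is_the_hard_part
    )


print(needs_spec(1))                                  # False
print(needs_spec(5))                                  # True
print(needs_spec(2, behavior_is_the_hard_part=True))  # True
```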

The failure has a clear cause. A vague prompt like "add photo sharing to my app" makes the model guess at thousands of unstated requirements.

Take a single one of those requirements: notification preferences. The product manager assumes per-channel toggles. The backend builds an on/off switch. The frontend assumes OS-level integration. Three reasonable readings of two words, three different products.

Each review step in spec-driven development catches a different class of mistake before it gets expensive. The spec review catches scope creep and wrong-root-cause framings. The plan review catches half-finished implementations and conflicting patterns. The code review catches plans that read fine but break on the first failing test.

Failure mode	What goes wrong	Caught at
Scope creep mid-task	Agent expands the feature past the original ask	Spec review
Half-finished implementations	Agent declares done at 80% with stubs and TODOs	Plan review
Conflicting patterns	Agent picks a pattern that differs from the rest of the codebase	Plan review
Wrong-root-cause fixes	Agent patches a symptom instead of the underlying bug	Spec review
Plans that break on contact	Plan reads fine, but doesn't survive the first failing test	Code review

The payoff is real, and it builds slowly. The spec phase costs hours of writing before any code runs, and the first few features feel slower than vibe-coding. My own break-even point came around the fourth or fifth feature. By then, the specs were catching design mistakes I would otherwise have shipped and rewritten a week later.

The next three sections walk through three open-source approaches that run this workflow inside Claude Code. They are ordered from lightest to heaviest in the structure they enforce.

Superpowers

Superpowers is the lightest of the three. It is the one I use day to day, and the one we will cover in the most detail.

What is Superpowers?

Superpowers is a Claude Code plugin by Jesse Vincent (obra/superpowers, MIT license), with around 194k stars on GitHub.

It ships a set of skills. A Claude skill, in Claude Code, is a named instruction file that the agent loads on demand to follow a specific workflow. Superpowers ships skills that hold Claude Code to the spec-driven loop instead of letting it jump straight to code.

The Superpowers project page on GitHub.

How to install Superpowers

Install it through Claude Code's official plugin marketplace:

/plugin install superpowers@claude-plugins-official

A SessionStart hook auto-loads the using-superpowers skill, so the workflow is active the moment you start typing. (Claude Code hooks are scripts the agent runs at a specific lifecycle event.) There is nothing to wire up per project.

The Superpowers workflow

Afterward, four skills manage your daily work:

Skill	What it does
`brainstorming`	Talks through the design with you and produces the spec document
`writing-plans`	Turns the approved spec into a numbered task list
`subagent-driven-development`	Executes the plan one task at a time, with a test-first cycle and a code-review subagent after each task
`requesting-code-review`	Runs an independent code-review subagent over the full diff before merge

A subagent is a separate Claude Code instance that the parent dispatches to do focused work in its own context window. The reviewers in the table above run as subagents, so they read the code cold, without the parent's framing.

How to use Superpowers

You invoke the four skills by describing what you want in plain language. The brainstorming skill hears "let's discuss this new feature" and kicks off the spec conversation on its own. The others trigger the same way.

The four Superpowers skills in order, with the two human review points sitting between brainstorming and writing-plans.

The walkthrough below uses the same workout-shape-verification feature from the spec excerpt above.

Stage 1: brainstorm to spec

I open Claude Code and type:

Let's discuss a new feature. The Workout verifier in make-me-work uses absolute heart-rate cutoffs and is now misfiring as my resting HR drops. I want to replace the absolute cutoffs with a check on the shape of the HR curve over the session.

The brainstorming skill takes over and asks ten or so questions back, among them:

What counts as the right "shape"
Which data streams to combine
What to do with sessions that look right on shape but fail an old cutoff
Whether the change should apply to Run and Ride too

Two human review points land here. The first is the design review, where I confirm the answers I gave match what I want. The second is the spec review. I read the file Claude has written and approve it before any plan work begins.

Stage 2: spec to plan

I run the writing-plans skill. It reads the approved spec and writes a plan file with four parts:

A definition of what “Done” means
A file map of touched files
A user journey through the demo path
A numbered task list of checkbox sub-steps

I review the plan, push back on tasks that look out of order or too coarse, and approve.

Stage 3: plan to code

I run subagent-driven-development. From this point the loop runs without me. For each task in the plan, the skill:

Writes a failing test
Writes the code to pass it
Refactors
Dispatches a code-review subagent that reads the diff cold

If the reviewer flags an issue, the loop fixes it before moving to the next task. There is no human review point inside this stage. The reviews that matter for this stage are the two before it.
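
The control flow of this stage can be sketched in a few lines. This illustrates the loop as described above and is not the skill's implementation: the skill is an instruction file rather than Python, and the callables here (`write_failing_test`, `implement`, `refactor`, `review`, `fix`) are hypothetical stand-ins for work the agent and its reviewer subagent do. The `max_fix_rounds` cap is also my assumption, added so the sketch always terminates.

```python
from typing import Callable, List


def run_plan(
    tasks: List[str],
    write_failing_test: Callable[[str], None],
    implement: Callable[[str], None],
    refactor: Callable[[str], None],
    review: Callable[[str], List[str]],  # returns issues found in the diff
    fix: Callable[[str, List[str]], None],
    max_fix_rounds: int = 3,
) -> List[str]:
    """Work tasks in order; a task is done only once review comes back clean."""
    completed = []
    for task in tasks:
        write_failing_test(task)
        implement(task)
        refactor(task)
        issues = review(task)
        rounds = 0
        while issues:
            if rounds == max_fix_rounds:
                raise RuntimeError(f"review never passed for task: {task}")
            fix(task, issues)
            rounds += 1
            issues = review(task)  # re-review after every fix
        completed.append(task)
    return completed


# Stub run: the reviewer flags the first task once, then passes everything.
log = []
pending = {"add shape check": ["missing edge-case test"]}

done = run_plan(
    ["add shape check", "wire into verifier"],
    write_failing_test=lambda t: log.append(("test", t)),
    implement=lambda t: log.append(("code", t)),
    refactor=lambda t: log.append(("refactor", t)),
    review=lambda t: pending.pop(t, []),
    fix=lambda t, issues: log.append(("fix", t)),
)
print(done)  # ['add shape check', 'wire into verifier']
```

The point of the inner loop is that a flagged task blocks the next one, so review findings never pile up until the end.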

Stage 4: full-diff review

Once the plan is done, I run requesting-code-review. A fresh subagent reads the whole diff against the spec and the plan, and posts a review. I work through its suggestions before merging.

When a task in the plan reveals a contradiction with the spec, the loop stops and asks. I can edit the spec (or let Claude do it) and regenerate the affected tasks. The other option is a one-off correction in the task itself. Superpowers does not silently work around spec errors.

Real specs and plans on disk

Here is the spec for the workout-shape-verification feature, open in an editor:

The spec file as it lands on disk after the brainstorming skill writes it.

The header carries the Created, Status, and Supersedes fields that the brainstorming skill writes by default. The Problem section follows. None of it is code. The file continues beyond the screenshot through sections for the proposed solution and the constraints on what the change should and should not touch.

The matching plan opens with its User Journey:

The plan file that the writing-plans skill produces from the approved spec.

The journey walks the demo path five steps at a time, naming the exact commands, files, and arguments at every step. The numbered tasks that follow translate each step into checkbox sub-steps that the subagent-driven-development skill can work through.

The two documents pair up like this:

Spec and plan side by side. The spec answers what changes and why. The plan answers in what steps.

For larger specs and plans, I add one step the official loop does not have: a red-team pass. Before I sign off, I have one or several Opus subagents read the spec cold, looking for holes from different angles. That is a personal habit, not a Superpowers feature. It has caught enough bad assumptions that I keep it.

When Superpowers is the wrong choice

Superpowers fits solo work on a single repo. It works best when the whole codebase fits in one Claude Code session, and you will actually read a 2-page spec. The detailed comparison lives in the How to Choose Between the Frameworks section further down. The short version: Superpowers struggles with multi-repo features and with work that needs clear role separation.

One developer described another failure mode in a public complaint about the plugin: “Even the smallest of tasks takes forever, with Claude spinning up subagents and writing plans that are completely overkill. Changing some CSS now takes forever.”

The fix is to skip Superpowers for tiny changes. The skills only activate on the brainstorming trigger. A one-line CSS edit can go through Claude Code without ever invoking the spec loop. The real failure mode there is over-applying the workflow to work that does not need a spec.

GitHub Spec Kit

Spec Kit is the choice when the spec has to outlast any single Claude Code session. It is also the right pick when people who never open Claude Code need to read the spec.

What is the GitHub spec-kit?

Spec Kit is a GitHub project (github/spec-kit, MIT license), maintained by GitHub itself, with over 100k stars. It ships a CLI plus a workflow that runs the same way across every major AI coding agent. Claude Code, Cursor, Aider, Cline, and Roo Code are all supported. The agent-neutral design is what lets the spec live outside Claude Code.

The Spec Kit project page on GitHub.

How to install the GitHub spec-kit

There is no official PyPI package yet, so install the CLI from the Git tag with uv:

uv tool install specify-cli --from git+https://github.com/github/spec-kit.git@vX.Y.Z

Replace vX.Y.Z with the current release tag. The package is specify-cli, and the command it registers is specify.

The GitHub spec-kit workflow

The workflow runs through nine slash commands that the CLI installs into your agent's slash-command list. Six are core to the loop, and three are optional, for cases the core loop does not cover.

Slash command	Type	Description
`/speckit.constitution`	Core	Writes the project rules that every later artifact has to follow
`/speckit.specify`	Core	Produces the spec
`/speckit.plan`	Core	Produces the architecture document
`/speckit.tasks`	Core	Produces the numbered task list
`/speckit.taskstoissues`	Core	Turns those tasks into GitHub issues
`/speckit.implement`	Core	Works the tasks one at a time
`/speckit.clarify`	Optional	Asks the user follow-up questions when the spec has gaps
`/speckit.analyze`	Optional	Looks for contradictions across spec, plan, and tasks
`/speckit.checklist`	Optional	Runs a quality check on the artifacts before implementation

The separator between command group and verb is a dot, not a colon: /speckit.specify, not /speckit:specify.
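
If you script around these commands, a small lookup can catch the colon slip early. The sketch below is hypothetical helper code, not part of the `specify` CLI. The command names and the core/optional split are copied from the table above.

```python
CORE = {"constitution", "specify", "plan", "tasks", "taskstoissues", "implement"}
OPTIONAL = {"clarify", "analyze", "checklist"}


def normalize(command: str) -> str:
    """Return the dot-separated form of a Spec Kit slash command.

    Raises ValueError if the verb is not one of the nine listed above.
    """
    name = command.strip().lstrip("/")
    # Accept the common colon typo by rewriting the first colon to a dot.
    group, sep, verb = name.replace(":", ".", 1).partition(".")
    if group != "speckit" or not sep or verb not in CORE | OPTIONAL:
        raise ValueError(f"unknown Spec Kit command: {command!r}")
    return f"/speckit.{verb}"


print(normalize("/speckit:specify"))  # /speckit.specify
```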

The nine Spec Kit slash commands: six core commands on the pipeline, three optional commands hanging off it.

The artifacts these commands produce are the same spec and plan you saw in the Superpowers section, also written to disk and tracked by Git. The difference is portability: Spec Kit's artifacts are designed to work with any AI coding agent, not just Claude Code, and the workflow is built for stakeholder review via GitHub pull requests rather than as a byproduct of a single tool's loop.

When to use GitHub spec-kit

On a solo project, you probably will not need Spec Kit. Reach for it when:

The project grows past one person
Your spec needs review by people who never open Claude Code
You are running a non-Claude-Code agent for some of the work
You want a spec format that lives outside any one tool and still reads months later

The BMAD Method

Where Spec Kit organizes artifacts, BMAD organizes people. It splits the spec-to-code workflow into four phases, each run by one or two named role-agents.

What is BMAD?

BMAD-METHOD (bmad-code-org/BMAD-METHOD, MIT license, about 47k stars) is on version 6. The acronym, in the project's own docs, expands to "Breakthrough Method for Agile AI-Driven Development." It runs on top of Claude Code and other agents, and it installs as a module ecosystem. The default install gives you a core module that carries six role-agents, four workflow phases, and 34 or more named workflows.

The BMAD-METHOD project page on GitHub.

How to install BMAD

Install BMAD with Node:

npx bmad-method install

The six role-agents are prompt personas the user activates by name from inside the agent host. In Claude Code, that means typing the activation command BMAD installs. Check the README for the exact syntax, which shifts between releases.

Introducing the BMAD coworker agents and artifacts

Once activated, a persona's instructions, voice, and outputs stay in effect until you switch to another one. The six are:

Mary, the Analyst
Paige, the Technical Writer
John, the Product Manager
Sally, the UX Designer
Winston, the Architect
Amelia, the Developer

Two roles you might expect are missing in v6: there is no Scrum Master agent and no standalone QA agent. Sprint planning and story prep fall to the Developer agent, and QA test generation is a workflow that the Developer triggers.

The artifact set is heavier than a single spec. You get:

a product brief
a PRD (Product Requirements Document)
a UX spec
an architecture document
epics broken into user stories (what users can do once the work ships)

The PRD and the architecture document together play the same role as the Superpowers spec. The split puts them across two role-agents and into a more formal format. The artifact set as a whole covers a full software-development lifecycle, with each feature inheriting context from the layer above.

The BMAD workflow

The v6 workflow runs in four phases.

The four BMAD phases and the role-agent running each one. The Quick Flow track skips the first three phases for small work.

Phase 1, analysis, is optional. Mary (Analyst) and Paige (Tech Writer) run research and produce a product brief. Skip the phase if you already know what you are building.

Phase 2, planning, is required. John (PM) writes the PRD. Sally (UX Designer) adds a UX spec when the feature has a UI.

Phase 3, solutioning, is Winston's phase. The Architect drafts the architecture first, then John breaks requirements into epics and stories. Putting stories after the architecture is a v6 choice that sizes them against real implementation boundaries. Winston then runs an implementation-readiness check that ends in a PASS, CONCERNS, or FAIL verdict.

Phase 4, implementation, is where Amelia (Developer) works story by story: create the story, build it, and code-review it. Once a full epic is done, she triggers a QA test generation workflow across the whole epic. This is the phase where Claude Code does the actual coding, working as Amelia.

For small, well-scoped work, BMAD ships a "Quick Flow" track that activates Amelia directly and skips the first three phases. The activation command is in the BMAD README (the exact syntax shifts between releases). Quick Flow produces no PRD and no architecture document, just a short story and the code that satisfies it. It is the answer to the "this is overkill for a button change" objection.

When the spec turns out to be wrong mid-implementation, BMAD loops back through Winston's Phase 3 verdict. A FAIL verdict sends you back to Phase 2 to rewrite the PRD. A CONCERNS verdict lets work proceed with Winston's noted risks attached to the story. The split lets you keep moving on small inconsistencies and stop hard on large ones.
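
The three verdicts amount to a small routing table. A minimal sketch, assuming the behavior described above. The enum and function are hypothetical names, not BMAD's own, and the PASS branch is inferred from the check being an implementation-readiness gate.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    CONCERNS = "CONCERNS"
    FAIL = "FAIL"


def next_step(verdict: Verdict) -> str:
    """Map a readiness verdict to what happens next."""
    if verdict is Verdict.FAIL:
        return "return to Phase 2 and rewrite the PRD"
    if verdict is Verdict.CONCERNS:
        return "proceed to Phase 4 with the noted risks attached to the story"
    # PASS: assumed to mean implementation starts with nothing attached.
    return "proceed to Phase 4"


for v in Verdict:
    print(v.value, "->", next_step(v))
```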

When the complexity pays off

BMAD pays off on long-running projects with real users to answer to. It also fits multi-developer teams that hand work off between people. In both cases, the phase-and-role separation has to save more time than it costs.

It is the wrong fit for a one-person side project. On solo work, the four-phase, six-agent split is mostly overhead. There is no second person on the team for the role separation to matter.

How to Choose Between the Frameworks

Framework	Install	Where the work lives	Best for
Superpowers	`/plugin install superpowers@claude-plugins-official` (CC marketplace)	Skills auto-loaded inside Claude Code	Solo work, single-repo features, long unattended runs
GitHub Spec Kit	`uv tool install specify-cli --from git+https://github.com/github/spec-kit.git@vX.Y.Z` (CLI)	Nine /speckit.* slash commands producing spec, plan, and tasks artifacts on disk	Cross-team spec review, spec-to-code traceability
BMAD-METHOD	`npx bmad-method install` (Node)	Six named role-agents across four phases (Analysis, Planning, Solutioning, Implementation)	Long-running projects, a real PM in the loop, multi-dev handoffs

Three rules decide the choice.

Use Spec Kit if the spec has to be read by people who never open Claude Code, or has to live in Git as a long-term artifact.
Use BMAD if several people work across distinct roles, or a real PM-style stakeholder is in the loop.
Otherwise, use Superpowers.
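
The three rules can be written as a short function. This is a sketch under one assumption of mine: the rules are checked in the order listed, so the first match wins when more than one applies. The parameter names are hypothetical.

```python
def choose_framework(
    spec_read_outside_claude_code: bool,
    spec_is_long_term_git_artifact: bool,
    distinct_roles_or_pm_in_loop: bool,
) -> str:
    """Apply the three rules top to bottom and return the first match."""
    if spec_read_outside_claude_code or spec_is_long_term_git_artifact:
        return "Spec Kit"
    if distinct_roles_or_pm_in_loop:
        return "BMAD"
    return "Superpowers"


# A solo developer whose spec only ever lives inside Claude Code.
print(choose_framework(False, False, False))  # Superpowers
```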

Three questions about your project, four framework choices on the other side.

There is a fourth option that the decision tree names: combine Spec Kit with Superpowers. Use Spec Kit for the spec phase so the artifacts live in Git for cross-team review. Then point Superpowers' subagent-driven-development skill at the Spec Kit plan file in one line of config. You get the durable spec from Spec Kit alongside the tight implementation loop from Superpowers.

Conclusion

Spec-driven development is three documents in order. The spec says what to build, the plan says in what steps, and the code follows the plan. A human review sits between every pair.

Run the decision tree above to pick a framework. For most readers, it will land on Superpowers. Install it and pick one feature you would otherwise vibe-code, something that touches 3 to 5 files. Run it end-to-end through brainstorm, spec, plan, and execute. One real run teaches the workflow better than any description.

If you want to refresh Claude Code fundamentals first, DataCamp has a practical Claude Code tutorial, a best-practices guide covering plan mode, CLAUDE.md, and TDD, and a deep dive on plan mode itself.


Author

Bex Tuychiev

Topics

AI Agents

Artificial Intelligence
