AI Hub · Field Report

How I Build Production Software with Claude Code

A structured system of specialized AI agents, persistent context, and automated quality gates — not just "chatting with AI and hoping."

Patrick Kass · Functional Analyst & Solo Developer · May 2026

Why Not Just Use ChatGPT?

Most chat-based AI assistants work without project context, without persistent memory, and without quality control. Claude Code takes a fundamentally different approach.

ChatGPT / Gemini / Copilot Chat

  • No access to filesystem or project
  • Context lost after every session
  • Generic answers without project knowledge
  • Copy-paste between browser and IDE
  • One model for everything — no specialization
  • No automatic quality checks
  • No access to database, CI/CD, docs

VS

Claude Code (Agentic Coding)

  • Direct filesystem access: read, write, edit
  • Persistent memory across sessions
  • Knows the entire project including architecture
  • Works directly in terminal — no copy-paste
  • Specialized agents for different tasks
  • Automatic build, lint, and security checks
  • Tool integration: DB, Confluence, browser, email

The secret: It's not the AI alone that makes the difference — it's the structure around it. A well-designed system of context, specialization, and verification turns a language model into a real development partner.

Three Layers, One System

The setup is organized in three layers: global standards, project-specific context, and runtime tools.

The global configuration applies to all projects. It defines coding standards, language rules, team structure, preferred technologies, and the agent workflow.

~/.claude/
  CLAUDE.md                        # ~400 lines: coding standards, tech stack, git rules
  agents/
    assessment-manager.md          # Feasibility, legal, ROI, market analysis
    planner.md                     # Break requirements into feature specs
    architect.md                   # DB schema, API design, migrations
    developer.md                   # Implementation in small batches
    quality-engineer.md            # Testing, bug docs, release gate
  context/
    agent-driven-development.md    # Workflow reference
    design-principles.md           # Colors, typography, spacing
  settings.json                    # MCP servers, permissions, hooks

Key principle: The global CLAUDE.md is the "constitution" — it contains rules that Claude loads automatically at every conversation start. Naming conventions, tech stack preferences, accessibility standards, git workflow: defined once, applied consistently.

Each project has its own lean CLAUDE.md, a features folder for feature specs, and a docs/dev folder with detailed documentation.

project/
  CLAUDE.md                  # Project overview, dev commands, quick reference
  features/
    AUTH-001-mfa.md          # Feature spec: grows through the pipeline
    UI-003-dashboard.md      # Requirements → Design → Code → Tests
  docs/dev/
    ARCHITECTURE.md          # Tech stack, DB schema, components
    CHANGELOG.md             # Update history with working hours
    PENDING-TASKS.md         # Open tasks, current work
    IMPACT-CHECKLIST.md      # Pre-change checkpoints (18 sections)
    FUTURE-ROADMAP.md        # Strategic feature planning
    TESTDATA-SET.md          # Test accounts, demo data
    PRODUCT-SUMMARY.md       # Product overview, synced to Confluence

Session start: Claude automatically reads CLAUDE.md and instantly knows: which app, which DB schema, which edge functions, which coding patterns. No context ramp-up — every session starts with full knowledge.
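How the global and project layers stack can be sketched in a few lines. This is an illustration only, assuming the global file is read first and the project file second. `load_context` and its two directory arguments are hypothetical names, not part of Claude Code, which performs this loading on its own.

```python
from pathlib import Path


def load_context(global_dir: Path, project_dir: Path) -> str:
    """Concatenate the global and project CLAUDE.md files, global first.

    Hypothetical helper for illustration; not a Claude Code API.
    """
    parts = []
    for label, folder in (("global", global_dir), ("project", project_dir)):
        path = folder / "CLAUDE.md"
        if path.is_file():  # a missing layer is skipped, not treated as an error
            body = path.read_text(encoding="utf-8").strip()
            parts.append(f"## {label} layer\n{body}")
    return "\n\n".join(parts)
```

Calling `load_context(Path.home() / ".claude", Path("project"))` would return the global rules followed by the project overview, which mirrors the "constitution first, project second" ordering described above.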

At runtime, Claude Code connects to external services via Model Context Protocol (MCP) and uses an arsenal of specialized skills and sub-agents.

MCP Servers

Supabase (DB, migrations), Atlassian (Confluence/Jira), Playwright (browser), Sequential Thinking, Google Suite, Context7 (library docs)

Filesystem

Read, Edit, Write, Glob, Grep, Bash terminal — direct access to all project files without intermediary

Sub-Agents

Parallel specialists: Explore (codebase search), Plan (architecture), /devteam agents (pipeline)

MCP Servers — Claude Talks to the World

The Model Context Protocol connects Claude Code to external services. Not chatting about code — but taking direct action.

Supabase

Direct SQL queries, deploy migrations, manage edge functions, read logs, verify RLS policies — all without the Supabase Dashboard.

execute_sql · apply_migration · deploy_edge_function · get_logs

Atlassian

Read and update Confluence pages, create Jira issues, keep product documentation in sync — bidirectionally.

getConfluencePage · updateConfluencePage · searchJiraIssues

Playwright

Remote-control browsers: navigate pages, take screenshots, test forms, verify accessibility, detect visual regressions.

navigate · screenshot · click · evaluate

Sequential Thinking

Think through complex problems step by step: schema design, architecture decisions, migration plans — structured and deliberate.

sequentialthinking

Google Suite

Read Gmail and create drafts, manage Google Drive files, calendar events — integrated into the workflow.

Gmail · Drive · Calendar

Context7

Fetch live documentation for libraries and APIs — current docs instead of outdated training data. Proactively used for every new library.

query-docs

The difference: Instead of "write me a SQL query that I'll paste into the dashboard," Claude executes the query directly, sees the result, and adapts the code accordingly. Instead of "describe how the page looks," Claude takes a screenshot and verifies it.

Agent-Driven Development

My custom-built workflow: 5 specialized agents with clear roles, strict boundaries, and a human at every gate.

AGENT 01
Assessment Manager
Evaluates feasibility, legal/GDPR, ROI, and market fit. Creates business case.
WebSearch · Sequential Thinking
✗ Never writes code
✗ Never creates feature specs

AGENT 02
Planner
Breaks requirements into testable feature specs with acceptance criteria.
Brainstorming · Sequential Thinking
✗ Never writes code
✗ No architecture decisions

AGENT 03
Architect
Designs DB schema, API design, migrations. Creates mockups and diagrams.
Sequential Thinking · Supabase MCP
✗ No frontend code
✗ No migration without approval

AGENT 04
Developer
Implements in small batches. Build must be green after every step.
TDD · Sub-Agents
✗ Never changes DB schema
✗ Max 5-7 files per commit

AGENT 05
Quality Engineer
Tests against acceptance criteria. Documents bugs. GO/NO-GO gate.
Verification · Systematic Debugging
✗ Never fixes bugs
✗ No GO with critical bugs

✓ HUMAN GATE between every agent

Core pattern: Negative Constraints. Every agent has explicit NEVER rules. The Planner can't write code. The Developer can't change the schema. The Quality Engineer can't fix bugs. This prevents scope creep and enforces clean handoffs.
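The negative-constraint pattern can be sketched as a lookup that rejects forbidden actions. In the setup described here the rules live as text in the agent definition files. The programmatic check below, the action labels, and the names `FORBIDDEN`, `ConstraintViolation`, and `check_action` are assumptions made for illustration.

```python
# NEVER rules per agent, paraphrased from the agent overview.
# The action labels are hypothetical names invented for this sketch.
FORBIDDEN = {
    "assessment-manager": {"write_code", "create_feature_spec"},
    "planner": {"write_code", "decide_architecture"},
    "architect": {"write_frontend_code", "apply_unapproved_migration"},
    "developer": {"change_db_schema"},
    "quality-engineer": {"fix_bug", "approve_with_critical_bugs"},
}


class ConstraintViolation(Exception):
    """Raised when an agent attempts an action outside its role."""


def check_action(agent: str, action: str) -> None:
    """Raise if the action is on the agent's NEVER list; otherwise do nothing."""
    if action in FORBIDDEN.get(agent, set()):
        raise ConstraintViolation(f"{agent} must never {action}")
```

With this table, `check_action("developer", "write_code")` passes silently, while `check_action("developer", "change_db_schema")` raises `ConstraintViolation`, so the schema change has to go back to the Architect.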

The Feature Spec: Single Source of Truth

Every task is documented as a feature spec. It grows through the pipeline — each agent adds their part.

AUTH-005: Multi-Factor Authentication (Done)
Status: Assessed → Planned → In Progress → Testing → Done
Effort: 3 sessions (estimated: 2-3)
Affected Files: auth/, api/, components/auth/, edge-functions/
Acceptance Criteria: 6/6 fulfilled, 0 bugs, QA: GO
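A feature spec that grows through the pipeline could be modeled roughly as follows. The status order comes from the card above. The class name, field names, and the rule that a status change requires an approved human gate are assumptions of this sketch, not the author's actual file format.

```python
from dataclasses import dataclass, field

# Status order as shown on the feature spec card.
STATUSES = ["Assessed", "Planned", "In Progress", "Testing", "Done"]


@dataclass
class FeatureSpec:
    spec_id: str
    title: str
    status: str = "Assessed"
    sections: dict = field(default_factory=dict)  # agent name -> contribution

    def add_section(self, agent: str, text: str) -> None:
        """Each agent adds its own part; parts from other agents stay untouched."""
        self.sections[agent] = text

    def advance(self, human_approved: bool) -> str:
        """Move to the next status, but only past an approved human gate."""
        if not human_approved:
            raise PermissionError("human gate not approved")
        position = STATUSES.index(self.status)
        if position < len(STATUSES) - 1:  # "Done" is terminal
            self.status = STATUSES[position + 1]
        return self.status
```

Keeping the status and every agent's contribution in one object reflects the single-source-of-truth idea: nobody has to reconcile a plan, a design document, and a test report that live in different places.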

How a Feature Is Built

From idea to verified implementation — every step with clear responsibility.

ASSESSMENT

Create Business Case

Assessment Manager researches: Is it feasible? Legal hurdles? Worth the investment? Result: GO / NO-GO / CONDITIONAL GO with clear reasoning.

PLANNING

Define Feature Spec

Planner breaks the requirement into testable units. Acceptance criteria, dependencies, edge cases, rollback plan — all documented before a single line of code is written.

DESIGN

Tech Design & Schema

Architect designs DB schema, RLS policies, API endpoints. Creates mockups or architecture diagrams. Security analysis included. Migration SQL is written and reviewed.

CODE

Implementation in Batches

Developer implements strictly according to spec. Small commits (max 5-7 files). Build must be green after every batch. Existing patterns are followed, no independent architecture decisions.

TESTING

QA & Release Gate

Quality Engineer tests against all acceptance criteria. Bugs are documented with severity and handed back — never fixed by QA. Only after a green suite: GO for deployment.
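The release decision at the end of the pipeline can be written down as a small rule. The sketch assumes that GO requires every acceptance criterion to pass and no open critical bug, and that open non-critical bugs alone do not block a release. The severity labels and the names `Bug` and `release_gate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    title: str
    severity: str  # assumed labels: "critical", "major", "minor"


def release_gate(criteria_passed: int, criteria_total: int, open_bugs: list) -> str:
    """Return "GO" only if all criteria pass and no critical bug is open."""
    if criteria_passed < criteria_total:
        return "NO-GO"
    if any(bug.severity == "critical" for bug in open_bugs):
        return "NO-GO"
    return "GO"
```

Note that the function only reports a verdict. Fixing whatever caused a NO-GO is deliberately left to the Developer, in line with the rule that QA never fixes bugs.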

Documentation as Code

7 mandatory files per project. Each with a clear purpose. Together they form the project's memory.

CLAUDE.md
Project overview, dev commands, quick reference. Loaded automatically at every session start.
ARCHITECTURE.md
Tech stack, DB schema, component overview. The "what exists" reference.
CHANGELOG.md
Update history with commits and working hours. Not just "what" but "why" and "how long."
PENDING-TASKS.md
Open tasks, current work list. Single source of truth for "what's happening right now."
IMPACT-CHECKLIST.md
18-section pre-change verification. Checked before EVERY change. Prevents side effects.
FUTURE-ROADMAP.md
Strategic feature planning. Separated from Pending Tasks: long-term vision vs. current work.
TESTDATA-SET.md
Test accounts, demo data, test scenarios. Serves as seed template and documentation.

Impact Checklist in detail: Before any change, the checklist verifies 18 areas, including DB schema, exports (11 categories!), imports, test data, edge functions, i18n, realtime subscriptions, admin components, storage buckets, WCAG compliance, and native app behavior. Working through it before every change makes a forgotten side effect far less likely.
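The gating idea behind the checklist can be sketched as a simple set comparison. Only the areas named in the text are included. The full checklist has 18 sections, and the ones not listed here are deliberately left out rather than guessed. The function names are hypothetical.

```python
# Areas named in the text; this is a subset of the 18 checklist sections.
KNOWN_AREAS = [
    "DB schema", "exports", "imports", "test data", "edge functions",
    "i18n", "realtime subscriptions", "admin components",
    "storage buckets", "WCAG compliance", "native app behavior",
]


def unchecked_areas(checked: set) -> list:
    """Return the areas not yet confirmed, in checklist order."""
    return [area for area in KNOWN_AREAS if area not in checked]


def may_proceed(checked: set) -> bool:
    """A change is allowed only when every listed area has been confirmed."""
    return not unchecked_areas(checked)
```

For example, `unchecked_areas({"DB schema", "i18n"})` returns the nine remaining areas, and `may_proceed` stays `False` until that list is empty.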

Auto-Memory: Remembering Across Sessions

Claude stores learnings in a file-based memory system. Four types: user (preferences), feedback (corrections), project (context), reference (external resources).

# Example: Feedback Memory
---
name: feedback-no-mocks-in-tests
description: Integration tests must use real DB
metadata:
  type: feedback
---
Integration tests always against real database, never mocks.
**Why:** Mock tests passed but prod migration failed.
**How to apply:** For every DB-related test.
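A memory file of this shape is easy to read programmatically. The parser below is a minimal sketch that assumes a front matter block delimited by `---` lines, flat `key: value` pairs, and at most one level of indented nesting. `parse_memory` is a hypothetical helper, and a real setup would more likely rely on a YAML library.

```python
SAMPLE = """---
name: feedback-no-mocks-in-tests
description: Integration tests must use real DB
metadata:
  type: feedback
---
Integration tests always against real database, never mocks.
"""


def parse_memory(text: str) -> tuple:
    """Split a memory file into (front matter dict, body text)."""
    lines = [line.rstrip() for line in text.strip().splitlines()]
    if not lines or lines[0] != "---" or "---" not in lines[1:]:
        return {}, text.strip()  # no front matter: everything is body
    end = lines.index("---", 1)  # closing delimiter
    meta, parent = {}, None
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, value = line.strip().partition(":")
        value = value.strip()
        if line.startswith((" ", "\t")) and parent is not None:
            meta[parent][key] = value  # nested under the last bare key
        elif value:
            meta[key] = value
        else:
            meta[key] = {}
            parent = key
    return meta, "\n".join(lines[end + 1:]).strip()
```

Running `parse_memory(SAMPLE)` yields the metadata as a dictionary, with `type` nested under `metadata`, plus the rule text as the body, so memories could be filtered by type before they are loaded into a session.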

38 Test Types Across 18 Categories

A global testing strategy that applies to all projects. Documented on Confluence and as Markdown — always in sync.

38 Test Types · 18 Categories · A–T Coverage · WCAG Level AA

A) Functional Tests (2)
  • Unit Tests (isolated logic)
  • Widget / Component Tests

B) Security Tests (4)
  • Input Validation (SQL Injection, XSS)
  • API Key Exposure
  • Auth Bypass
  • Data Isolation (RLS)

C) Integration Tests (5)
  • Cross-User RLS (Live DB)
  • DB Operations (CRUD)
  • E2E Flow Tests
  • API Contract Validation
  • Payment / Stripe Webhooks

D) Auth & Identity (3)
  • OAuth Flow Tests
  • UUID as Identifier
  • Session & Token Handling

E) UI / UX Tests (4)
  • Responsive / Device Sizes
  • Accessibility (Touch, Contrast)
  • Screen Element Verification
  • UI Consistency & Contrast

F) Regression & Quality (4)
  • Regression (known bugs)
  • UAT (User Acceptance)
  • Tolerant Search
  • Subscription / Tier Tests

G–J) Resilience (4)
  • Offline / Network Resilience
  • AI / LLM Integration
  • Performance / Startup
  • i18n Completeness

K–T) Specialized (12)
  • Deep Links, DB Migration, Error Tracking
  • OWASP Mobile, Rate Limiting
  • App Store Compliance, Backward Compat.
  • WCAG 2.1 AA, Cross-Browser Playwright

Verification strategy: Each test type has a clear schema: What is tested, When (trigger), What it checks (checklist), which Tools are used. The testing strategy lives as a Markdown file and is automatically synchronized with Confluence.
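The four-part schema (what, when, checklist, tools) maps naturally onto a small record that renders to Markdown. The class name `StrategyEntry`, its field names, and the Markdown layout are assumptions of this sketch. The actual strategy file format is not shown in this report.

```python
from dataclasses import dataclass, field


@dataclass
class StrategyEntry:
    category: str   # e.g. "B) Security Tests"
    name: str       # e.g. "Data Isolation (RLS)"
    what: str       # what is tested
    trigger: str    # when the test runs
    checklist: list = field(default_factory=list)  # what it checks
    tools: list = field(default_factory=list)      # which tools are used

    def to_markdown(self) -> str:
        """Render the entry as a Markdown section with a task-list checklist."""
        lines = [
            f"### {self.name}",
            f"Category: {self.category}",
            f"What: {self.what}",
            f"When: {self.trigger}",
            "Checklist:",
        ]
        lines += [f"- [ ] {item}" for item in self.checklist]
        lines.append("Tools: " + ", ".join(self.tools))
        return "\n".join(lines)
```

Generating the Markdown from one structured source is one way to keep the local file and the Confluence page identical, since both can be produced from the same entries.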

Verification Levels

Automated

Build check after every commit. Lint & TypeCheck. Playwright E2E tests. axe-core for WCAG. CI/CD pipeline on GitHub Actions.

Agent-Based

Quality Engineer checks code against spec. /design-review takes screenshots and verifies UI. /security-review analyzes vulnerabilities.

Manual

UAT on real devices (mobile + desktop). Human gate at every pipeline transition. Final approval before deployment.

Specialized Capabilities

Skills are predefined workflows that extend Claude Code with domain-specific capabilities. Invoked via slash commands.

/devteam             Orchestrate agent pipeline
/design-review       Visual UI review via Playwright
/webapp-testing      Automate browser tests
/frontend-design     Generate high-quality UI
/security-review     Security analysis of changes
/review              Review pull requests
/claude-api          Build Claude API apps
/simplify            Review code quality
/pptx /docx /xlsx    Create office documents
/init                CLAUDE.md for new project
/skill-creator       Develop custom skills
/schedule /loop      Automated routines

Custom skills: Using /skill-creator, domain-specific skills can be defined. /devteam is a custom-built skill that orchestrates the entire 5-agent pipeline — including human gates, feature spec management, and handoff logic.
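The handoff logic of such an orchestrator boils down to a loop with a stop condition. The sketch below is not the /devteam skill itself. It assumes the five agents run in the order shown earlier, that a human decision follows every agent, and that a rejection halts the pipeline. `run_agent` and `approve` are hypothetical callbacks.

```python
from typing import Callable

# Pipeline order from the agent overview.
PIPELINE = [
    "assessment-manager",
    "planner",
    "architect",
    "developer",
    "quality-engineer",
]


def run_pipeline(
    run_agent: Callable[[str], str],
    approve: Callable[[str, str], bool],
) -> list:
    """Run agents in order and stop at the first rejected human gate.

    run_agent(name) returns that agent's output; approve(name, output)
    is the human decision at the gate after that agent.
    """
    completed = []
    for name in PIPELINE:
        output = run_agent(name)
        if not approve(name, output):
            break  # rejected: later agents never start
        completed.append(name)
    return completed
```

If the human rejects the Architect's design, the function returns only the assessment and planning phases as completed, and the Developer never starts on an unapproved schema.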

What Happens in a Session

A typical feature — from requirement to deployment. Not hypothetical, but real metrics from production.

1 Prompt · ~60 Tool Calls · 30+ Files Analyzed · 5 Agent Phases · 3 Self-Found Bugs · <2h Total Duration

SESSION START
├─ CLAUDE.md loaded             Project context, tech stack, patterns
├─ Memory loaded                30+ stored learnings
├─ PENDING-TASKS.md read        Current work list
├─ IMPACT-CHECKLIST.md ready    18 checkpoints activated
└─ MCP servers connected        Supabase, Atlassian, Playwright

ASSESSMENT (Assessment Manager)
├─ Requirement analyzed
├─ GDPR implications checked
├─ Market research via WebSearch
└─ Business case: GO
╰ HUMAN GATE → Approved

PLANNING (Planner)
├─ 3 feature specs created
├─ 12 acceptance criteria defined
└─ Dependencies documented
╰ HUMAN GATE → Approved

DESIGN (Architect)
├─ DB schema designed
├─ RLS policies defined
├─ Migration SQL written
└─ Mockup created
╰ HUMAN GATE → Approved

CODE (Developer)
├─ Batch 1: API + Types           build green
├─ Batch 2: Components + Hooks    build green
├─ Batch 3: Edge Function         build green
└─ Batch 4: i18n + Integration    build green
╰ HUMAN GATE → Approved

TESTING (Quality Engineer)
├─ 12/12 Acceptance Criteria
├─ Security: RLS, Auth, Input
├─ 3 bugs found → back to Developer
├─ Bugs fixed, re-test
└─ Release gate: GO

DEPLOYMENT
├─ git push → Cloudflare auto-deploy
├─ CHANGELOG.md updated
├─ PENDING-TASKS.md updated
└─ Confluence synchronized

Why It Works

Six principles that turn a language model into a real development partner.

Context Over Copy-Paste

Claude knows the entire project: schema, architecture, patterns, history. Not "here's my code, what's wrong?" but "feature X doesn't work" — and Claude knows where to look.

Specialization Over Generalism

5 agents with clear roles and strict boundaries. The Assessment Manager evaluates, the Planner plans, the Architect designs, the Developer codes, the Quality Engineer tests. No agent does everything.

Memory Over Amnesia

Persistent memory system across sessions. Preferences, corrections, project context, and external references — learned once, available forever.

Quality Gates Over Blind Trust

Human gate at every pipeline transition. Impact checklist before every change. 38 test types in QA. Build check after every commit. No change without verification.

Tool Integration Over Isolation

Directly interact with database, Confluence, browser, email. No intermediary steps, no copy-paste. Claude acts — not just advises.

Documentation as Code

7 mandatory files per project. Every change updates the relevant docs. CHANGELOG tracks working hours. PRODUCT-SUMMARY syncs with Confluence. Nothing is lost.

Vibe Coding with Claude Code is not "chatting with AI and hoping for good code." It's a structured system of context, specialization, quality control, and memory — enabling a single developer to build production-grade software with the velocity of a small team.