Turning an ambitious idea into a working product takes organization. To keep everything on track, I maintain a dedicated task list for each project and each development phase. This ensures I never lose sight of the bigger picture while my three-step AI stack handles the heavy lifting.
Step 1: Planning and Research (Gemini Pro)
Before writing any code, I use AI to brainstorm and research the best programming languages and tech stack for the job. This upfront research serves two vital purposes: it heads off problems before they happen, and it ensures that I enter the build phase with a solid plan. Because I have already explored alternative ways to achieve my goals, that plan remains highly adaptable.
Step 2: The Foundation (AI Studio / Project Synapse)
Once the plan is mapped out, I use AI Studio to generate the initial codebase from it. For less complex projects, I will occasionally route this phase through Project Synapse, my own custom AI assistant. Either way, this step gives me a massive head start and completely eliminates the blank-page phase.
Step 3: Review, Testing, and Management (Claude Code)
This is the step I lean on hardest. I use Claude Code via the CLI as my self-managing engineering assistant to review the initial code, verify it against the Gemini Pro plan, stress-test the logic, and suggest architectural improvements.
How I Built a Self-Managing Engineering Assistant
That third step is where most of my time goes, so it is the step I invested in automating. Used out of the box, an AI coding assistant has clear friction points: it rereads the same files several times in a single session, interrupts with repetitive permission prompts, and balloons the session context until responses slow down. To fix this, I wrote a configuration that teaches the tool my environment, stops it from asking repetitive questions, and lets it manage itself.
What follows is the technical detail behind that configuration. If that isn't your thing, skip ahead to The Result; you won't miss the story.
My Workflow with Claude: Three Layers
Layer 1: Durable Rules
A single standing-instructions file read at the start of every session. Not a script or an agent, just non-negotiable rules that override default behavior:
- Plan first: Sessions start in planning mode; no changes happen until I've seen and approved a plan. (Also enforced by the session default.)
- UI verification: Every interface change is verified in a real browser via Playwright before it's called done: navigate the page, exercise the flow, read the console and network activity, and capture a screenshot. If browser verification isn't available, the assistant stops and tells me.
- Read first, change second: A file must be read in the current session before it is edited; knowledge from earlier sessions doesn't count. Behavior must be verified in the code before writing copy that describes it. If I say "stop guessing" or "read first," it stops and reads.
- Dangerous-command block: If a risky command is blocked by the guard, the assistant stops, reports what it was attempting, and waits. No rephrasing or working around it.
- Test before deploy: Anything beyond a trivial copy tweak must have passing tests before it ships. A clean build is not the gate; passing tests are.
- Graceful deploys only: Every deploy uses a graceful reload so the site stays up for users in the middle of a request.
- Memory files: Canonical facts and prior feedback are stored so the assistant references them instead of rediscovering them each session.
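The rules themselves are plain-language instructions, not code, but two of them (read first, change second; test before deploy) reduce to checks simple enough to sketch. What follows is a minimal illustration in Python, not the actual configuration: the SessionRules class, its method names, and the example file path are all hypothetical, and in practice the assistant follows these rules because the instructions file tells it to.

```python
from typing import Set


class SessionRules:
    """Hypothetical model of two standing rules; every name here is illustrative."""

    def __init__(self) -> None:
        # Files read during THIS session only; nothing carries over from earlier ones.
        self.read_this_session: Set[str] = set()

    def record_read(self, path: str) -> None:
        self.read_this_session.add(path)

    def may_edit(self, path: str) -> bool:
        # Read first, change second: an edit is allowed only after a read
        # in the current session.
        return path in self.read_this_session

    @staticmethod
    def may_deploy(trivial_copy_tweak: bool, build_clean: bool, tests_passed: bool) -> bool:
        # build_clean is deliberately ignored: a clean build is not the gate,
        # passing tests are. Only a trivial copy tweak is exempt.
        if trivial_copy_tweak:
            return True
        return tests_passed


rules = SessionRules()
print(rules.may_edit("app/routes.py"))  # False: not read yet this session
rules.record_read("app/routes.py")
print(rules.may_edit("app/routes.py"))  # True
print(SessionRules.may_deploy(False, build_clean=True, tests_passed=False))  # False
```

The last call is the case the rule exists for: the build is clean, the tests are not passing, and the deploy is refused.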
Layer 2: Automatic Hooks
Small scripts that fire automatically on specific triggers:
- Session start: A fast snapshot of the running environment status, disk usage, and dirty git repositories; this saves several discovery steps up front.
- Prompt submit: A context injector that supplies the relevant project's details when I mention it, so the assistant doesn't guess which app I mean.
- Before reads: An advisory that flags a file already read this session.
- Before risky commands: Guards that refuse destructive operations with a clear error, and steer risky deploy actions toward the safe path instead. Ordinary operations are untouched; only the specifically dangerous forms are blocked.
- After edits: Automatically formats the edited file to a consistent style.
- End of turn: Counts tool activity in the session and suggests compacting the context once it grows large, then reminds me periodically after that. It only advises; it never halts the work.
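To make three of these hooks concrete, here is a rough sketch of their decision logic as plain Python functions. It is an illustration under stated assumptions, not the scripts themselves: the blocked patterns, the systemctl example, the thresholds, and all function names are hypothetical, and how each function gets wired to its trigger depends on the assistant's hook mechanism, which is not shown here.

```python
import re
from typing import Optional, Set

# Hypothetical patterns: the real guard's list is not given in this post.
DESTRUCTIVE = [
    (re.compile(r"\brm\s+-\w*r\w*\s+/(?:\s|$)"), "recursive delete of the filesystem root"),
    (re.compile(r"\bgit\s+push\b.*--force(?:\s|$)"), "force push"),
    (re.compile(r"\bgit\s+reset\s+--hard\b"), "hard reset discards uncommitted work"),
]
# Risky deploy actions are steered toward a graceful alternative (also hypothetical).
STEER = [
    (re.compile(r"\bsystemctl\s+restart\b"), "systemctl reload"),
]


def check_command(command: str) -> Optional[str]:
    """Before risky commands: return a refusal message, or None to allow."""
    for pattern, reason in DESTRUCTIVE:
        if pattern.search(command):
            return f"Blocked ({reason}). Stop, report what you were attempting, and wait."
    for pattern, safe in STEER:
        if pattern.search(command):
            return f"Blocked. Use the graceful path instead: {safe}"
    return None


def read_advisory(path: str, seen: Set[str]) -> Optional[str]:
    """Before reads: flag a file already read this session. Advisory only."""
    if path in seen:
        return f"Note: {path} was already read this session."
    seen.add(path)
    return None


def compaction_advice(tool_calls: int, last_advised: Optional[int],
                      threshold: int = 150, every: int = 50) -> bool:
    """End of turn: advise once past the threshold, then periodically after that."""
    if tool_calls < threshold:
        return False
    return last_advised is None or tool_calls - last_advised >= every


print(check_command("rm -rf ./build"))       # None: ordinary cleanup is untouched
print(check_command("git push --force"))     # a refusal message
print(compaction_advice(155, None))          # True: first advisory
print(compaction_advice(170, 155))           # False: too soon to remind again
```

Each function only returns a message or a flag. Keeping the decision separate from the side effect is what lets the guard refuse a command while the read and compaction hooks stay purely advisory.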
Layer 3: Manual Tools and Agents
Run on demand, invoked either by me or by an agent acting on my behalf when a task requires it. Each has a single responsibility and tight boundaries:
- Specialized agents: One drives a real browser and returns a clear pass or fail with evidence ("it compiles" is not "it works"); another does read-only health checks on active services without restarting anything; others clean up dead code, plan safe breakups of large files, and check performance impact. More exist for exploration, planning, debugging, and migrations.
- Tools and skills: Browser automation for verification, plus small skills that streamline the workflow and cut repetitive prompts.
- Design principle: Narrowness builds trust. Because the read-only and non-mutating agents can't change code or restart services, they're safe to invoke without close supervision.
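The "clear pass or fail with evidence" contract of the browser agent can be sketched without any browser automation. The version below is a simplified illustration: the Evidence and Verdict types, their field names, and the example URL are hypothetical, and the real agent gathers this evidence by driving a browser rather than receiving it as arguments.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evidence:
    """What a verification run collected (hypothetical shape)."""
    url: str
    flow_completed: bool = False
    console_errors: List[str] = field(default_factory=list)
    failed_requests: List[str] = field(default_factory=list)
    screenshot_path: Optional[str] = None


@dataclass
class Verdict:
    passed: bool
    reasons: List[str]


def judge(evidence: Evidence) -> Verdict:
    """Pass only when every check is clean; otherwise list why it failed."""
    reasons: List[str] = []
    if not evidence.flow_completed:
        reasons.append("user flow did not complete")
    if evidence.console_errors:
        reasons.append(f"{len(evidence.console_errors)} console error(s)")
    if evidence.failed_requests:
        reasons.append(f"{len(evidence.failed_requests)} failed request(s)")
    if evidence.screenshot_path is None:
        reasons.append("no screenshot captured")
    return Verdict(passed=not reasons, reasons=reasons)


ok = judge(Evidence(url="http://localhost:3000/login",
                    flow_completed=True, screenshot_path="login.png"))
print(ok.passed)  # True

bad = judge(Evidence(url="http://localhost:3000/login", flow_completed=True,
                     console_errors=["TypeError: x is undefined"],
                     screenshot_path="login.png"))
print(bad.passed, bad.reasons)  # False ['1 console error(s)']
```

The function reads evidence and returns a verdict; it has no way to edit code or restart a service, which is the narrowness the design principle relies on.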
The Result
This setup means I start every session with an environment that already knows the rules of the house. It blocks destructive mistakes, keeps the context window lean, and tests the app like a real user. I still do the essential work (reviewing the final code, making architectural decisions, and improving the UX), but automating the tedious parts frees me up to actually finish the project. It's the difference between booting up an environment that forgot your preferences overnight and loading a perfect snapshot with the kind of digital muscle memory that lets you pick up exactly where you left off.