Origin Story
The Challenge
Could AI, given only prompts and a clear goal, build a production-quality tool from scratch? No human writing a single line of code. Just: describe what you need, review what comes back, redirect if needed. Ship it.
This project started — as the best ones do — from several directions at once. A real governance gap worth solving. A curiosity about where vibe-coding actually breaks down. A desire to demonstrate something concrete rather than just talk about AI potential. And a personal interest in AI risk as a domain worth taking seriously.
The hypothesis was simple: if you have a business problem and some AI tokens, you should be able to go from idea to working prototype in a day or two. Not a toy. Not a clickable mockup. A real tool — one that a real organisation could actually use.
🎯 The Business Problem
Organisations are adopting AI faster than their risk frameworks can keep up. Existing tools are either too generic (standard IT risk templates), too expensive (vendor-managed platforms), or too complex for a non-specialist to run without a consultant. A structured, accessible, free assessment tool was missing.
🧪 The Experiment
Pure vibe coding. Human involvement was limited to: state the goal, review the output, redirect if wrong. No writing HTML. No adjusting JavaScript. No manually designing components. Everything — from the risk taxonomy to the visual design — generated by AI.
The rules of engagement
- No human-written code — not a single line
- No manual UI design — layout, colours, and components chosen by AI
- No copy-pasting from templates or external libraries
- Prompting only: state the outcome, review the result, redirect
- Build something genuinely useful, not a proof-of-concept toy
The Point
If you have a business problem and some tokens, AI can build you a working solution. The bottleneck is no longer coding skill — it is knowing clearly what you want to build. That is the human's job now.
Architecture
The Dilemma
Before writing a single prompt, there was one architectural question that mattered more than any other. It is also the question anyone building an AI-powered tool must answer first: should AI power the tool at runtime, or should AI just build it?
Option A
Full GenAI Engine
Feed each system description to an LLM at assessment time. Let the model identify which risks apply and score them dynamically for that specific system.
+ Contextually aware of the specific system being assessed
+ No need to pre-define a fixed risk taxonomy upfront
+ Could ask nuanced follow-up questions per system
− Non-reproducible: same input can score differently each run
− Not auditable: "the AI said so" fails governance requirements
− Costs tokens per assessment: every run charges the user
− Requires live internet and an API key at assessment time
− Black-box scores: impossible to challenge or defend
✓ Chosen
Deterministic Model
Pre-define the risk taxonomy. Build a mathematical scoring engine — weights, caps, matrix lookups — that runs entirely in the browser with no AI at assessment time.
+ Fully reproducible: same inputs always produce the same output
+ Auditable: transparent scoring formula, no black box
+ Zero cost at assessment time: runs offline, no API needed
+ Configurable: risk leads can tune the engine to their standards
+ Defensible in governance, regulatory, and audit contexts
− Requires a well-designed taxonomy (more upfront AI effort)
− Less adaptive to AI system types not yet in the taxonomy
The Key Insight
Use AI to build the tool — not to run it. A deterministic model means every result can be explained, challenged, and defended — which is what governance actually requires. AI's contribution is in the design, taxonomy, and engineering. The assessor's contribution is the evidence. That division of labour matters.
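To make the reproducibility point concrete, here is a minimal Python sketch of a deterministic 5×5 likelihood × impact lookup. The tool itself runs as client-side code in the browser, so this only illustrates the idea. The names `rating_band`, `BAND_THRESHOLDS`, and `MATRIX` and the band cut-offs are assumptions made for this example, not the tool's actual calibration.

```python
# Hypothetical band cut-offs on the likelihood x impact product (1..25).
# These values are illustrative assumptions, not the tool's real calibration.
BAND_THRESHOLDS = [(4, "LOW"), (9, "MEDIUM"), (16, "HIGH"), (25, "EXTREME")]


def rating_band(likelihood: int, impact: int) -> str:
    """Map a 1-5 likelihood and a 1-5 impact to a rating band."""
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("likelihood and impact must both be in 1..5")
    product = likelihood * impact
    # Return the first band whose upper bound covers the product.
    for upper_bound, band in BAND_THRESHOLDS:
        if product <= upper_bound:
            return band
    raise AssertionError("unreachable: product never exceeds 25")


# Precompute the whole matrix so assessment-time scoring is a plain table read.
MATRIX = {(l, i): rating_band(l, i) for l in range(1, 6) for i in range(1, 6)}
```

Because the function has no randomness, no network call, and no hidden state, the same pair of inputs always yields the same band, which is the property an auditor can check by hand.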
Implementation
How It Was Built
The build followed four phases — each with a distinct AI role. The progression was: understand the domain, model the math, lock down the architecture, then build.
📚 Research — Building the Taxonomy
Claude and Gemini were used in parallel to survey the major AI risk frameworks: NIST AI RMF, ISO 42001, the EU AI Act, OWASP LLM Top 10, MITRE ATLAS, Google SAIF, and Microsoft's Responsible AI framework. Each model was asked to synthesise a risk taxonomy independently, and the outputs were compared, gaps identified, and the lists merged and refined. The result was 60 distinct AI risks grouped into 10 families, each mapped to a phase of the AI lifecycle. Using two models in parallel was faster and more thorough than one — gaps in one were covered by the other, and disagreements flagged important edge cases.
📊 Excel Modelling — Fine-Tuning the Math
Before a single line of HTML was written, the scoring logic was validated in a spreadsheet. Base scores (1–10) were assigned to each risk. The 5×5 likelihood × impact matrix was calibrated. Answer weight mappings (Yes = 1.0, Partial = 0.5, No = 0.0) were tested against synthetic assessment data. Survey caps and control reduction caps were tuned to produce meaningful LOW/MEDIUM/HIGH/EXTREME distributions — not everything clustering at HIGH or EXTREME. This phase caught several mathematical edge cases before they could cause inconsistent behaviour in the app. AI assisted throughout: generating test data, checking distributions, and validating the formula logic.
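The interplay of answer weights and caps can be sketched in a few lines of Python. Only the answer weights (Yes = 1.0, Partial = 0.5, No = 0.0) and the 1–10 base score range come from the text. The formula, the direction of the adjustment (a "Yes" is assumed to mean a safeguard is in place, which lowers the score), and the constants `PER_ANSWER_REDUCTION`, `SURVEY_CAP`, and `CONTROL_CAP` are assumptions for illustration.

```python
ANSWER_WEIGHTS = {"Yes": 1.0, "Partial": 0.5, "No": 0.0}  # from the text

# Everything below is an illustrative assumption, not the tool's calibration.
PER_ANSWER_REDUCTION = 0.1  # each fully positive answer trims 10% off the base
SURVEY_CAP = 0.3            # survey evidence may reduce the base by at most 30%
CONTROL_CAP = 0.6           # controls may reduce the inherent score by at most 60%


def inherent_score(base: float, answers: list[str]) -> float:
    """Adjust a 1-10 base score using survey answers, limited by the survey cap."""
    if not 1 <= base <= 10:
        raise ValueError("base score must be in 1..10")
    total_weight = sum(ANSWER_WEIGHTS[a] for a in answers)
    reduction = min(total_weight * PER_ANSWER_REDUCTION, SURVEY_CAP)
    return base * (1 - reduction)


def residual_score(inherent: float, control_reductions: list[float]) -> float:
    """Apply implemented controls, limited by the control reduction cap."""
    reduction = min(sum(control_reductions), CONTROL_CAP)
    return inherent * (1 - reduction)
```

With these assumed constants, a base score of 8 with answers Yes, Partial, No, Yes gives a 25% reduction, while five Yes answers would give 50% and is therefore held at the 30% cap. The caps are what stop a long questionnaire or a long control list from driving every score towards zero.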
🏗️ Technology Decisions — Security First
Three principles shaped every technology choice. Serverless: with no server there is no server-side attack surface, no maintenance, no infrastructure cost, and no deployment complexity, and the tool can be emailed or shared via a network drive. No data storage in the system: assessments live only on the assessor's machine, and the tool itself transmits, logs, and retains nothing, so it adds no data-processing footprint of its own for GDPR purposes. No installation: open the file in any modern browser and it works immediately. The entire application (styles, logic, risk data, and controls) lives in one self-contained HTML file, which also makes it auditable by design: anyone can open the source and read exactly what it does.
⚙️ AI Platform — The Right Tool for Each Phase
Not all AI tools are equal for all tasks. Research used Claude and Gemini — both are strong at framework synthesis and comparative analysis. Coding used Claude Code/Cowork and Codex, both working on the same codebase and contributing components across sessions. The key learning: treat AI coding tools like contractors. Give each one clear scope, good context, and well-defined interfaces. Where one ran into context limits or hit an edge case, the other picked it up. Sharing a consistent codebase with coherent conventions made hand-offs between models clean and reduced rework.
AI Prompting
Prompting Strategy
How you prompt determines everything about the quality of what you get back. Two techniques were central to this build: reverse prompting and goal-oriented development. Together they produced a more coherent result than traditional back-and-forth prompting would have.
Reverse Prompting
Instead of writing a prompt and hoping it was complete, the process was flipped. A rough goal was described at a high level, and the AI was asked to generate the best prompt for achieving it — including all the clarifying questions it would need answered first. Those questions were answered. Then a second AI model reviewed the resulting prompt for gaps, contradictions, and ambiguities before it was used for actual development. This cross-validation step caught missing requirements that would otherwise have caused rework mid-build.
1. Describe the goal at a high level
"I want to build an enterprise AI risk assessment tool. Help me write the best possible development prompt for this."
2. AI generates its clarifying questions
The model produces 8–12 targeted questions covering scope, constraints, target users, output format, security requirements, and edge cases it needs decided before it can write the prompt well.
3. Answer the questions
Short, clear answers. This is the human's highest-value contribution in the entire process: deciding what the tool actually needs to do, and what it does not.
4. AI generates the full development prompt
A comprehensive, structured brief is produced, including architecture constraints, output requirements, edge case handling, and visual design direction.
5. Second AI validates the prompt
A different model reviews the generated prompt for gaps, contradictions, and missing requirements before it is used. This cross-validation step consistently caught things the first model missed.
6. Run the validated prompt for development
Only now does actual coding begin, with a rigorously defined brief rather than an improvised one.
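The first five steps can be sketched as a small Python function. This is a sketch under stated assumptions, not the workflow's actual tooling: a "model" is represented as any function from prompt text to response text, so no vendor API is implied, and the names `reverse_prompt`, `drafter`, `reviewer`, and `answer_questions` are hypothetical.

```python
from typing import Callable

# A "model" here is any callable from prompt text to response text.
# No specific vendor client is assumed; plug in whatever you use.
Model = Callable[[str], str]


def reverse_prompt(
    goal: str,
    drafter: Model,
    reviewer: Model,
    answer_questions: Callable[[str], str],
) -> tuple[str, str]:
    """Run steps 1-5 and return (draft development prompt, reviewer feedback)."""
    # Steps 1-2: state the goal and ask the drafting model what it needs to know.
    questions = drafter(
        f"Goal: {goal}\n"
        "List the clarifying questions you need answered before you can "
        "write the best possible development prompt."
    )
    # Step 3: the human decides what the tool must and must not do.
    answers = answer_questions(questions)
    # Step 4: the drafting model writes the full development prompt.
    draft = drafter(
        f"Goal: {goal}\nQuestions:\n{questions}\nAnswers:\n{answers}\n"
        "Write the full development prompt."
    )
    # Step 5: a different model checks the draft before it is used.
    review = reviewer(
        "Review this development prompt for gaps, contradictions, and "
        f"missing requirements:\n{draft}"
    )
    return draft, review
```

Returning the draft and the review separately leaves the decision about what to change with the human, who then runs the revised prompt for development in step 6.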
Goal-Oriented Development
Throughout the build, prompts were framed around outcomes, not implementation details. "Build a risk scoring engine that produces auditable, reproducible results from survey evidence and control assessments" — rather than "write a function that takes a score and returns a band."
This gave each AI model the latitude to own the architecture, produce coherent components, and avoid the fragmentation that comes from over-prescriptive, line-by-line prompting. When AI designs the structure rather than just filling in blanks, the result is a more consistent codebase with fewer contradictions between parts.
Why this works
When the AI owns the architecture, naming is consistent, components fit together, and context is preserved across a session. Telling AI what to achieve rather than how to achieve it produces a better result, and reveals faster when the goal itself needs rethinking.
Cost & Time
The full build — from first research prompt to the three-file public release — was completed within existing subscription allowances. No per-token API billing. No additional spend.
- Human time: 5h (prompting, reviewing, directing, testing)
- Tokens estimated: ~2.7M (input + output across all models and sessions)
- Incremental cost: $0 (covered by existing subscriptions)
- Delivered files: 3 (Dashboard, Config Editor, User Guide)
Where the tokens went
| Phase | Activity | Est. tokens |
| --- | --- | --- |
| Research | AI risk framework synthesis across Claude + Gemini, taxonomy building and cross-model comparison | ~200K |
| Excel modelling | Formula validation, scoring distribution testing, edge case analysis with AI assistance | ~100K |
| Dashboard | Main 6-step assessment app: 60 risks, 241 controls, heat map, exports (multiple development iterations) | ~1.2M |
| Config Editor | 5-tab configuration tool with live preview, preset profiles, and JSON export | ~400K |
| User Guide | 22-section documentation with sidebar navigation, full taxonomy reference, and glossary | ~300K |
| Fixes & polish | Bug fixes (v0.6), full visual redesign (v0.7), security fixes (v0.8), final testing pass (v0.9) | ~500K |
| Total | | ~2.7M |
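As a quick arithmetic check, the per-phase estimates can be summed and turned into shares. The figures below are copied from the table. The variable names are only for this example.

```python
# Estimated tokens per phase, in thousands, copied from the table above.
PHASE_TOKENS_K = {
    "Research": 200,
    "Excel modelling": 100,
    "Dashboard": 1200,
    "Config Editor": 400,
    "User Guide": 300,
    "Fixes & polish": 500,
}

total_k = sum(PHASE_TOKENS_K.values())

# Share of the total per phase, as a percentage rounded to one decimal place.
shares = {phase: round(100 * k / total_k, 1) for phase, k in PHASE_TOKENS_K.items()}

print(f"Total: ~{total_k / 1000:.1f}M tokens")
for phase, pct in shares.items():
    print(f"{phase}: {pct}%")
```

The phases sum to 2,700K, which matches the ~2.7M total, and the Dashboard alone accounts for a little under half of the estimate.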
What this would have cost traditionally
Without AI
- 2–4 weeks of a senior full-stack developer
- 1 week of a risk consultant to build the taxonomy
- UI/UX design time for layout and visual design
- A QA cycle with documented test cases
- Technical writing for the user guide
- Estimated cost: $20,000–$50,000+
With AI (this build)
- 5 hours of human prompting, reviewing, and directing
- Existing Claude Pro + Gemini Advanced subscriptions
- No developer, no designer, no consultant hired
- Designed, built, and shipped in under 3 days
- Production quality — documented, security-first, configurable
- Incremental cost: $0
The Implication
The limiting factor is no longer time, budget, or coding skill. It is clarity of thought about what you want to build. If you can describe the problem precisely, AI can build the solution. That changes a lot.
The Product
So what actually got built? Here is the summary — in exactly 100 words.
AI Risk Map v1.0
A structured, browser-based AI risk assessment suite — two tools in one. The Dashboard guides users through a six-step workflow covering 60 AI risks across 10 families, producing Base, Inherent, and Residual risk ratings from a configurable, deterministic scoring engine. The Config Editor lets risk leads tune the 5×5 matrix, answer weights, survey evidence caps, and control treatment factors to match their organisation's risk appetite. Built entirely in client-side HTML with no server, no installation, and no data transmission. Every assessment stays on the assessor's machine. Deployable by email. Fully auditable. Entirely AI-built.
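A configuration of this kind is easy to sanity-check before it is loaded. The sketch below assumes a JSON-like dictionary with hypothetical keys (`matrix`, `answer_weights`, `survey_cap`, `control_cap`); the real Config Editor's schema is not documented here and may differ.

```python
def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means it passed.

    The key names checked here are assumptions made for illustration.
    """
    errors = []

    # The scoring matrix must be exactly 5 rows of 5 cells.
    matrix = config.get("matrix", [])
    if len(matrix) != 5 or any(len(row) != 5 for row in matrix):
        errors.append("matrix must be 5x5")

    # Each answer weight is a fraction between 0 and 1.
    for answer, weight in config.get("answer_weights", {}).items():
        if not 0.0 <= weight <= 1.0:
            errors.append(f"answer weight for {answer!r} must be in [0, 1]")

    # Caps are fractions too, and both must be present.
    for key in ("survey_cap", "control_cap"):
        value = config.get(key)
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            errors.append(f"{key} must be a number in [0, 1]")

    return errors
```

Collecting all problems instead of stopping at the first one lets a risk lead fix a tuned profile in a single pass.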
60 AI Risks
Across 10 families: Model, Data, Security, Governance, Operational, Business & Reputation, Human & Ethical, Monitoring, Agentic AI, and Fail-Safe. Each with a base score, explanation, and mitigation guidance.
241 Controls
Built-in controls mapped to risk codes and classified by effect type: preventive, detective, corrective, governance, and both. Custom controls can be added per assessment.
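One way to picture the mapping between controls and risk codes is a small data model. This is a hedged sketch: the effect types are the five listed above, but the `Control` class, its field names, and the example codes are hypothetical rather than taken from the tool.

```python
from dataclasses import dataclass

# The five effect types named in the text.
EFFECT_TYPES = {"preventive", "detective", "corrective", "governance", "both"}


@dataclass(frozen=True)
class Control:
    code: str                    # hypothetical identifier, e.g. "CTL-001"
    title: str
    effect_type: str
    risk_codes: tuple[str, ...]  # risks this control is mapped to

    def __post_init__(self) -> None:
        # Reject effect types outside the documented set.
        if self.effect_type not in EFFECT_TYPES:
            raise ValueError(f"unknown effect type: {self.effect_type!r}")


def controls_for_risk(controls: list[Control], risk_code: str) -> list[Control]:
    """Return the controls mapped to a given risk code, in their original order."""
    return [c for c in controls if risk_code in c.risk_codes]
```

A control can list several risk codes, so one entry in the library may appear against more than one risk in an assessment.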
Three Risk Views
Base (taxonomy starting point), Inherent (after vendor & internal survey evidence), and Residual (after implemented controls). Each visualised in an interactive heat map.
Multiple Exports
JSON (full assessment backup & reload), CSV (risk register), Markdown (governance report narrative), and PNG (heat map image for board packs and presentations).
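The JSON backup and CSV risk register can be illustrated with the Python standard library. This is a minimal sketch, not the tool's export code: the assessment shape, the `REGISTER_FIELDS` column names, and the function names are assumptions.

```python
import csv
import io
import json

# Hypothetical risk register columns; the tool's real columns may differ.
REGISTER_FIELDS = ["code", "title", "inherent", "residual"]


def to_json(assessment: dict) -> str:
    """Serialise the full assessment so it can be saved and reloaded later."""
    return json.dumps(assessment, indent=2, sort_keys=True)


def from_json(text: str) -> dict:
    """Reload an assessment from a JSON backup."""
    return json.loads(text)


def to_csv(assessment: dict) -> str:
    """Flatten the assessment's risks into a CSV risk register."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REGISTER_FIELDS, lineterminator="\n")
    writer.writeheader()
    for risk in assessment["risks"]:
        # Keep only the register columns; extra keys stay in the JSON backup.
        writer.writerow({field: risk[field] for field in REGISTER_FIELDS})
    return buffer.getvalue()
```

JSON keeps everything needed to resume an assessment, while the CSV is a deliberately narrower view meant for a spreadsheet or risk register import.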
The Takeaway
If you have a business problem and some tokens, you can ship. The question worth asking is not "can AI build this?" It is "what do I actually need?" Answer that clearly, and the rest follows faster than you expect.