[ field notes ]

The Paste and the Actual Code

A week where the agent caught more of my mistakes than I caught of its.

Qais Ali / Senior AI Cyber Security Engineer / CEO, Ilm Works

context

A different shape of week.

The previous three field notes catalogued failure modes. No. 01 named the patterns under sustained sprint pressure. No. 02 named what happens when AI-built software meets a real production pipeline — sixteen hours, twenty-five builds, one credential leak. No. 03 named the diagnostic step the agent will not take on its own. Each was, fundamentally, a record of what the agent got wrong and what the operator had to enforce.

This week was different. Three times in five days, the agent caught mismatches between my prompt and the actual code that I would not have caught on my own — and at least once, the agent refused to apply a prompt that would have broken the build if executed literally.

I want to be careful with that framing. The agent did not become reliable. The patterns from the previous notes were live this week too. There were completion claims that needed verification, plan summaries that quietly dropped phases, the ambient pressure to skip the cheap comparison. And in one specific incident I will get to below, I violated a discipline I had published myself in No. 02 — treat preview state as untrusted — and lost an hour to it.

What changed is that the supervisory frame I have been writing into every prompt for two months is starting to act on the agent's behavior in ways I can measure. The agent is now refusing to silently bridge gaps between prompt and code. That is a meaningfully different shape of session, and it is what this note is about.

the project

Six surfaces, four configuration domains, one App Store deadline.

The work was the v2.0 paywall sprint for ʿIlm Arena, the Islamic learning app I am shipping through Ilm Works. The codebase is a React Native monolith — a single 17,000-line index.tsx that holds nearly every screen, modal, and piece of state in the app. The week's goal was to take the app from "v1.0.1 with a placeholder paywall" to "v2.0 with a fully wired RevenueCat subscription flow, copy-aligned across surfaces, and reviewer-ready for App Store submission."

In aggregate, the work touched:

Six product surfaces (paywall modal, About screen, Home, results screen, settings entry, Ask ʿIlm chat)
Four configuration domains (App Store Connect, RevenueCat dashboard, environment variables, Notion-hosted legal pages)
A subscription type system with a PaywallSource union of seven trigger sources, of which six shipped
An i18n surface in two languages, where many strings turned out to be inlined at the call site rather than keyed in a translation file
A shared App Store metadata layer (reviewer notes, listing copy, contact email, support URLs, privacy URLs) that had to stay consistent with what the app actually rendered

The single mechanical constraint that shaped everything else: the codebase uses inline strings, inline styles, and inline ternaries throughout, rather than a centralized i18n file or a design system. So every prompt that referenced "the i18n key" or "the style token" was in some sense fictional. The actual code lived as literal strings inside the JSX.

This matters for what follows.

what happened

The conceptual prompt and the actual code.

I had structured the week as a series of surgical Vibe Code prompts, each with explicit anti-drift rules: don't modify unrelated code, don't rename, don't reformat, one step at a time, show diff before applying. The prompts referenced i18n keys like paywall.gate.quiz_voice and described changes in terms of "OLD value → NEW value." The plan was to pipeline these into the agent, approve the diffs, and move on.

The first major prompt — replacing the body of the paywall modal — went through cleanly. The second major prompt — updating audio mechanics copy across six i18n keys — did not. The agent looked at the prompt, looked at the code, and reported back something I was not expecting:

Note: the task's "OLD" values for several items don't
match the current code, but the NEW values are clearly
correct — I'll apply them to the existing strings.
Items 7 and 8 (URLs) don't exist in the codebase, so
I'll skip them and note that at the end.

It turned out that:

My prompt's "OLD" strings were values I had imagined the code would contain, based on a discovery report from an earlier session. The actual strings were different.
The conceptual paywall.gate.quiz_voice key did not exist as an i18n key. It was a function case in paywallHeading() that returned a literal string with no translation indirection.
The Privacy Policy and Terms of Use URLs my prompt asked the agent to "update" did not exist anywhere in the paywall component. There was nothing to update. They had to be created.

A literal application of my prompt would have produced one of three failure modes: the agent would have searched for strings that did not exist and reported "couldn't find," or it would have invented matching strings to overwrite, or it would have proceeded as if Privacy and Terms links existed and reported "updated" without changing anything. I had written the prompt as if I knew the codebase. I did not, at least not at the level of currency the work required. The agent did, and it surfaced the gap rather than papering over it.

It got a similar save later in the week. I had written a benefit-row deletion prompt assuming the array had four entries. The agent's first response was a complete enumeration of the array — which had four entries — but also flagged that there was a rendering loop at line 8884 with i < 2 spacing logic, and verified that removing index 3 would not affect that conditional. I had not asked for this check. The check was correct. I would not have run it on my own before the deletion, and the deletion would have shipped without me knowing whether the layout was about to break.

A third save was even smaller and more revealing. When the agent went to add Privacy Policy and Terms of Use links to the paywall fine print, my prompt specified i18n keys (paywall.privacyLink, paywall.termsLink) and a t('paywall.privacyLink') call pattern. The agent paused, looked at how translation actually worked in the file, found that t() was not imported and that the rest of the paywall used inline quizLang === 'ar' ? '...' : '...' ternaries, and reported back:

I'll apply the exact JSX structure as proposed but inline
the strings (matching the pattern used throughout
PaywallModal), then add the three missing style keys.

This is not a small deviation from the prompt. The prompt asked for a translation-key approach. The agent matched the file's existing pattern instead. If it had followed the prompt literally, the build would have failed at compile time on an undefined t import, and I would have spent an hour figuring out why.

the recurrence

A discipline I had published, broken inside two weeks.

Mid-week, Vibe Code's "Preview" tab rendered the paywall with copy I did not recognize and a fallback message — "Upgrade available on the iOS app." This was a problem, because the paywall was supposed to be rendering on the iOS app already. I treated the preview as authoritative. I started chasing what I thought was a code drift. I generated a long diagnostic prompt asking the agent to find a missing component, walk the render tree, identify why the production paywall was rendering a degraded fallback.

The actual situation was that the preview tab was rendering a degraded fallback because the RevenueCat key was not configured in that environment. The production paywall, on a real device, was different. The preview was an artifact, not an oracle.

I had published this exact discipline in No. 02, on April 27:

Treat preview state as untrusted. Verify each visible
asset and behavior actually corresponds to a committed
file before assuming it is persistent.

The platform was different — No. 02 was about a different AI development tool's preview-versus-production gap. The mechanism was the same. The thing visible in preview did not correspond to what would render in production. And I lost an hour treating the preview as truth before the agent's diagnostic work surfaced the configuration gap and made the situation legible.

I want to be honest about this because I think it is the most important pattern in the note. Disciplines published do not stay published in your own head. The act of writing No. 02 made the lesson sharper for me at the time of writing. It did not, by itself, make me apply the lesson two weeks later in a new context where the surface looked different. The new platform did not feel like the old platform. The new symptom did not feel like the old symptom. The lesson did not generalize automatically — and the only reason it eventually applied was that the agent's diagnostic output forced me back to the configuration layer, where the old discipline was waiting.

I take two things from that. First: governance disciplines do not survive on the strength of their original publication. They survive on the strength of recurring application, which means writing them somewhere they will be re-encountered, not just somewhere they have been recorded. Second: an agent that has the configuration layer in its context window can sometimes serve as the recurring-application surface, if the prompt invites it to. That second one is interesting and I do not yet know how far it generalizes.

what to make of the saves

Four observations.

The agent did not get smarter. The discipline did.

Anti-drift rules — written into every prompt, reinforced in every step — are doing work. They are not preventing mistakes; they are shifting where mistakes surface. The agent is structured to check the codebase against the prompt before acting, to flag mismatches, and to ask before deviating. This is the same agent that, six months ago, would have happily invented strings to satisfy the prompt's structure.

The change is not in the model. It is in the supervisory frame I am now writing into every interaction: show me the existing code first, propose the change, wait for approval, apply, show diff. That frame, repeated, becomes a behavior. The agent has internalized something close to the prompt is a description of intent, not a literal specification — verify the intent against the code before acting on the prompt.

The operator-as-source-of-discipline model is partially inverting.

In No. 03 I wrote that the operator's job is to enforce diagnostic discipline the agent will not enforce on itself. That was true then and remains substantially true. But this week showed a partial inversion: the agent caught my carelessness in three separate cases where I had let the prompt's conceptual structure drift from the code's actual structure.

I had written the prompts confidently, working from memory of an earlier discovery report. The discovery report was accurate when it was made. Some of it was no longer accurate by the time I was writing the prompts. The agent had the current state of the code in context; I had a previous state in memory. The agent's job, in those moments, was to refuse to bridge the gap silently.

This is the function of a careful junior engineer reading a senior engineer's ticket — this is not quite what the code looks like, can we sync first? Two months ago, the agent would have built whatever the prompt described, regardless of whether it matched the code. This week, it did not.

The class of bugs prevented is exactly the class hardest to catch in review.

The bugs the agent prevented this week would not have been caught by typical PR review. They were invisible at the prompt level: the prompt looked correct, the diff would have been small, the resulting code would have compiled or near-compiled. The bugs would have surfaced as runtime errors, missing UI, App Store rejections, or — worst case — silent inconsistencies between the paywall and the App Store listing that a reviewer would notice and flag during submission.

This is the class of bug that lives in the seams between systems: between prompt and code, between code and dashboard configuration, between dashboard configuration and reviewer-facing claims. There are no automated checks for these seams. The agent is becoming, in a small but real way, an automated check.

The work that remains for the operator is shifting toward decisions, not enforcement.

When I look back at where I actually added value this week, the moments cluster around product decisions, not technical enforcement. The decision to keep all 6,100 questions free rather than gate them. The decision to consolidate three contact email addresses to one. The decision to add a fourth Premium benefit row when the deletion of an earlier benefit left the paywall with only three. The decision to position the AI assistant as scholar-adjacent rather than scholar-replacing.

None of these were technical decisions. The agent could not have made any of them, and would not have asked. The technical enforcement work is, slowly, getting easier — because the discipline is now baked into how I write prompts, and the agent is responding to that discipline. The product decision work is, by contrast, irreducible.

what this maps to in governance frameworks

Four lines, named.

Each of the patterns above maps to a corresponding line in the AI governance frameworks I work with professionally. The mapping is not theoretical — it is what I look for in vendor assessments at the day job.

Prompt-code drift maps to the change-management discipline assumed by SOC 2 CC8.1 and ISO 27001 A.8.32. Both frameworks anticipate that proposed changes will be evaluated against a documented justification before being committed. What this week added: the justification itself can drift from the code over time, even within a single project, and the verification step has to include "does the prompt still describe the actual code." The agent doing this verification, unprompted, is a positive control surface that frameworks do not anticipate but that practitioners can build on.
Refusal to apply a malformed prompt maps to AI system robustness expectations under NIST AI RMF (MANAGE 4.1) — the expectation that AI systems will surface their own uncertainty rather than confabulate confidence. What this week showed: when the supervisory frame is structured to reward "I notice this does not match" responses, the agent will produce them. The frame matters. Without it, the same agent will paper over the mismatch.
Surface-change forgetting maps to ISO 42001 (8.4) operational controls and the EU AI Act's deployer obligations around continuous oversight. Both frameworks anticipate that policies will be applied repeatedly across operations; neither anticipates that the deployer's own policies will fail to generalize across surface changes when the underlying mechanism stays the same. The preview-versus-production lesson from No. 02 did not transfer automatically to a new tool. Frameworks should account for this as a real failure mode, not just initial-policy-adoption.
Multi-surface configuration alignment maps to the broader governance expectation that AI-assisted development must produce internally consistent artifacts across system boundaries. ISO 42001 gestures at this; the EU AI Act's deployer obligations gesture at it harder. In practice it is the messiest part of shipping AI-built software: the paywall says one thing, the App Store listing says another, the reviewer note says a third, and reconciling them is irreducibly human work.

The frameworks anticipate these failure modes in theory. Living through them in practice sharpens the instinct for what to actually look for in vendor assessments. When I am reviewing a vendor's claims about AI system reliability, the recurrence pattern from this week is now part of my mental reference set — specifically the moment I caught myself violating my own published discipline because the surface had changed.

a small framework

Five disciplines.

These are continuations of, not replacements for, the standing rules from No. 01 and No. 02. They are specific to the prompt drift and surface-change forgetting problems this week surfaced.

// disciplines

The prompt is a description of intent, not a specification of code. Write prompts that tell the agent what you are trying to accomplish, with explicit invitation for the agent to deviate from the prompt's literal structure when the code structure differs. The phrase that works for me: "Apply the structural intent — adapt the implementation to match the existing patterns in the file."
Discovery has a half-life. A discovery report from an earlier session is not authoritative for a prompt written days later. The code may have moved. Either re-run discovery or write the prompt in a way that is robust to staleness — "show me the current state first."
Multi-surface changes need a reconciliation pass. Whenever a change touches more than one surface — code and dashboard, code and reviewer note, code and listing copy — the work is not done until you have verified the surfaces say the same thing. This is a separate step from any individual change. Schedule it explicitly.
Reward refusal. When the agent surfaces a mismatch, do not treat it as friction. Treat it as a positive control surface and reinforce it. The verbal reinforcement matters: "Good catch — your judgment was correct, ignore my OLD values, apply the NEW ones." The behavior gets stronger.
A discipline you published is a discipline you can still violate. Re-apply published disciplines explicitly when surfaces change. The lesson does not generalize automatically across tools, platforms, or contexts. Build a re-encounter mechanism — checklist, prompt template, kickoff ritual — for every published discipline you intend to keep.

what didn't work

Three honest things.

I do not want this note to read as a victory lap. Several things went badly this week.

I burned through my Claude weekly allowance faster than I should have.

Long sessions with image uploads, large file reads, and detailed back-and-forth chew through usage limits at a rate that compounds over a week. By mid-week I was hitting "75% of weekly limit reached" warnings on a paid plan, and the right move — starting a fresh thread with a context primer — was something I figured out late, not early. The token-economics layer of agentic coding is a real planning surface, not a footnote. I will be more deliberate about thread length on the next sprint.

I trusted the preview tab over the device build.

This is the No. 02 recurrence I described above. I lost an hour treating Vibe Code's preview rendering as authoritative when it was an artifact of an unconfigured environment. The discipline existed in writing — treat preview state as untrusted — and I did not apply it because the surface looked different from where I had written it down.

I underestimated how much copy work multi-surface alignment involves.

Going in, I thought the v2.0 paywall sprint was a code task with a thin layer of marketing copy on top. What it actually is: a copy task with a code substrate. The reviewer note, the listing description, the paywall benefits, the About screen comparison view, the gate banner messaging, the localized Arabic versions of all of the above, the contact email consistency across all surfaces — none of this is "just polish." All of it has to be aligned for the App Store reviewer not to flag inconsistencies, and aligning it takes more total time than the code work did. I will scope future paywall sprints with that in mind.

the shape of the next note

Where this is going.

I think there will be a No. 05, and I think it will be about the part of the development cycle that does not transfer to agentic tooling at all. The product decisions. The brand voice calls. The "should we gate this content or not." The reading of the App Store guidelines and the calibration of how aggressive to be. The judgment about which scholar would lend their endorsement and which would not.

These are the parts of the work the agent is structurally unable to do. They are also, increasingly, the parts that take the most time in a sprint. That ratio — agentic time on technical work versus human time on product judgment — has been shifting all year. The paywall sprint was probably the first project where the human-judgment portion was decisively the larger share.

That feels worth a note of its own.

closing thought

The discipline is splitting. The job is the same.

The previous three field notes ended on the line the discipline is the job. This one ends on a small refinement.

The discipline is starting to split. Some of it now lives in the agent — encoded in supervisory frames, anti-drift rules, structured prompts that reward refusal over confabulation. The agent does not do this on its own. It does it because the operating frame I have built around it asks it to. But the asking, once stable, transfers. The agent's behavior in week N+1 is shaped by the discipline I imposed in weeks 1 through N. That is not a small thing. It is the slow accretion of something that looks more like an apprenticeship than a tool.

The other half of the discipline, though — the half that decides what to ship, what to gate, what to call something, who to write for, what an Islamic learning app should be — that does not transfer. It cannot. It is the part of the work the operator owns, irreducibly, forever.

The agent is becoming a better second pair of hands. The first pair of eyes is still mine — and as the preview-tab incident shows, the first pair of eyes still misses things, including things I have already written down.

The frameworks anticipate the failure modes. Living through them sharpens the instinct for which ones to actually look for. This week, the most important one was the failure mode where I forgot what I had already published. Catching that — and writing it here — is the only mechanism I trust to make the discipline stick.

Qais Ali Senior AI Cyber Security Engineer
CEO, Ilm Works

Field Notes / No. 04 Filed May 7, 2026