
Learning to Co-Work with AI: A Software Engineer's Field Guide

Published May 6, 2026 · Essay · 17 min read · Source: AI-synthesised

A field guide for software engineers in the AI era: six skill clusters (taste, harness, alignment-first planning, agent-friendly architecture, verification, strategic positioning), daily practices, anti-patterns, and a 90-day plan.


Question#

What's the best way to learn and co-work with AI models and services nowadays — especially for a software engineer who may no longer require heavy coding tasks in a new AI era? Create guidelines for individuals to learn and develop skills for upcoming changes.

TL;DR#

Coding skill is becoming the baseline, not the differentiator. The job is migrating from writing code to deciding what to build, designing the environment the agent works in, and verifying the output. Six skill clusters earn their tokens in 2026 and beyond:

  1. Product taste — picking the right thing to build (see Engineer PM Convergence, Printing Press Software Democratization)
  2. Harness engineering — designing the scaffolding around the model (see Agent Harness Engineering, Claude Code Best Practices)
  3. Alignment-first planning — reach a shared design concept before any artifact (see Design Concept Grilling, Vertical Slice Tracer Bullets)
  4. Architecture for agents — codebase shape that conserves model attention (see Deep Modules for Agents, Context Window Smart Zone)
  5. Verification & review — mechanical feedback loops + fresh-context review (see Agent Loop Pattern, Harness Shrinkage as Models Improve)
  6. Strategic positioning — pick moats AI doesn't dissolve (see Seven Powers Applied to AI)

The frame is co-worker, not tool: interview the model, treat its failures as harness signals, build for the model six months out, prune crutches every release. Soft skills (judgment, EQ, taste) and domain knowledge become more valuable, not less.


I. The mindset shift#

From "I write code" to "I decide what gets built and verify it works"#

Boris Cherny's printing-press analogy frames this directly: software-writing is at the same democratization inflection that literacy hit in 1400 (Printing Press Software Democratization). The cost of production collapses; what you build, and for whom, becomes the differentiator. Boris's claim — "the best person to write accounting software is a really good accountant, not an engineer, because they know the domain really well and coding is the easy part" — is a directive: invest in domain depth, not coding cleverness.

Cat Wu's blunter version: "As code becomes much cheaper to write, the thing that becomes more valuable is deciding what to write" (Engineer PM Convergence).

Implications for an individual engineer:

  • Hire-yourself bar shifts to taste. Cat's hiring bias at Claude Code is "engineers with great product taste." If you can't articulate why feature X matters more than feature Y, that gap is now your bottleneck — not your TypeScript.
  • Domain depth compounds. A backend engineer who deeply understands clinical workflows beats a senior engineer with no domain. Pick a domain. Stay long enough to know its tacit constraints.
  • Cross-disciplinary range matters more than vertical depth. Cat reports that every functional role on the Claude Code team codes — designers ship code, PMs ship code, data scientists ship code (Engineer PM Convergence). In the reverse direction, engineers who can also do design, PM, or data work compound their leverage.

From "tool I drive" to "co-worker I interview"#

The most underrated technique Cat Wu names: when the agent does something wrong, ask it why (Model Introspection Feedback). Don't re-prompt with corrections. Read the model's account of its own reasoning, then fix the harness — not the model — based on what surfaces.

The reframe: the model's behavior is a function of the harness; the failure is information about the harness. Your job is to design an environment where the model can succeed, not to make the model smarter.

Internalize the constraint: smart zone, not 1M tokens#

Matt Pocock (citing Dex Hardy) frames the hardest constraint: LLMs degrade quadratically with context size because attention is O(n²). The first ~100K tokens are the smart zone; beyond that the model "gets dumber and dumber" regardless of advertised window (Context Window Smart Zone). 1M-token windows shipped in 2026 "just shipped a lot more dumb zone" — useful for retrieval, not reasoning.

Practical: every minute spent learning to manage context budget pays back tenfold. Status-line token counters are essential, not optional.
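
The status-line counter can be approximated in a few lines. A minimal sketch, assuming roughly 4 characters per token and a ~100K-token smart zone (both are heuristics, not tokenizer truth):

```typescript
// Rough token estimator for a status-line counter.
// ~4 characters per token is a heuristic, not a real tokenizer.
const CHARS_PER_TOKEN = 4;
const SMART_ZONE_TOKENS = 100_000; // approximate "smart zone" ceiling

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function statusLine(contextChunks: string[]): string {
  const used = contextChunks.reduce((sum, c) => sum + estimateTokens(c), 0);
  const pct = Math.round((used / SMART_ZONE_TOKENS) * 100);
  const warning = used > SMART_ZONE_TOKENS ? " | past smart zone, /clear?" : "";
  return `ctx ~${used} tokens (${pct}% of smart zone)${warning}`;
}

console.log(statusLine(["a".repeat(40_000), "b".repeat(8_000)]));
```

The exact ratio matters less than having the number in view at all: the point is to notice when a session crosses into the dumb zone before the model's output tells you.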


II. Six skill clusters to develop#

1. Product taste#

What it is: the ability to pick the right thing to build and recognize when a response is on-character or off-character.

How to develop it:

  • Ship things, get feedback, iterate fast. AI Native Product Cadence reports Anthropic's Claude Code team going from "see user feedback on Twitter to shipped product by end of week" — the loop tightness is how taste calibrates.
  • Maintain a "what would I build differently?" file. When you use a product, note what's wrong and what you'd do instead. Compare your judgment to what the team actually shipped six months later.
  • Practice character work. Claude Character as Product shows character (low-ego, lighthearted, bias-toward-action, honest feedback) is real product surface. Try to articulate why a given AI response feels right or wrong — that's the same eval skill in miniature.
  • Lunchtime vibe-checks. Cat Wu runs team lunches asking each member "what is your vibe on the model?" before looking at metrics. Qualitative-first, data-second is a discipline you can practice on every model release.

2. Harness engineering#

What it is: designing the scaffolding around the agent — context files, skills, hooks, subagents, permission classifiers, mechanical verifiers (Agent Harness Engineering).

How to develop it:

  • Build a CLAUDE.md / AGENTS.md for every project you own. Treat it like code: review when things go wrong, prune ruthlessly, keep it as a table of contents pointing at deeper docs (Claude Code Best Practices). 250K-token system prompts push the model into the dumb zone before it does anything.
  • Practice push-vs-pull discipline (Deep Modules for Agents): always-in-context (CLAUDE.md, system prompt) for reviewer agents who need standards to compare against; on-demand skills for implementer agents.
  • Run the introspection-debugging loop. When an agent fails, ask it why, then fix the harness — not the model.
  • Read your own system prompt at every model launch. Cat Wu's discipline at Claude Code: "We read through the entire system prompt and reflect on, for each section, does the model really need this reminder anymore? If not, remove it." Most teams only add — subtract on cadence (Harness Shrinkage as Models Improve).
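
A pruned, table-of-contents style CLAUDE.md might look like this sketch; the commands, paths, and rules are hypothetical placeholders, not a recommended canonical layout:

```markdown
# CLAUDE.md (table of contents, not a manual)

## Commands
- `pnpm test` runs unit tests; `pnpm lint --fix` auto-fixes lint

## Deeper docs (pull on demand; do not preload)
- Architecture: docs/architecture.md
- Release process: docs/release.md

## Hard rules (the only always-in-context section)
- Never commit directly to `main`
- Every new module needs its own test boundary
```

Any section that stops earning its tokens gets deleted at the next model launch.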

3. Alignment-first planning#

What it is: reaching shared understanding (the "design concept" in Frederick Brooks's sense) before any artifact. The output of grilling is alignment; PRDs and plans are downstream (Design Concept Grilling).

How to develop it:

  • Adopt a grill-me discipline. Matt Pocock's skill, verbatim: "Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the decision tree, resolving dependencies one by one. For each question provide your recommended answer. Ask the questions one at a time." Use this on yourself before writing PRDs.
  • Reject specs-to-code as vibe coding. Pocock's strong claim: writing a careful spec, handing it to AI, and refusing to look at the code is vibe coding by another name. The code is the battleground, not the spec (Design Concept Grilling).
  • Slice vertically, not horizontally. Don't do "all schema → all services → all UI." Do "thin slice through every layer, end-to-end, then the next slice" (Vertical Slice Tracer Bullets). Agents default to horizontal — push back actively.
  • Build a Kanban with explicit blocking edges, not a phase plan. A numbered phase list locks one agent into sequential execution; a Kanban with blocked-by: lets multiple agents drain it in parallel (Agent Loop Pattern).
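
The blocked-by Kanban is just a dependency edge on each card. A minimal sketch (the `blockedBy` field and the task names are illustrative, not a real Claude Code schema):

```typescript
// Kanban with explicit blocking edges: any task whose blockers are all
// done is ready, so multiple agents can drain the board in parallel
// instead of one agent walking a numbered phase list.
type Task = { id: string; blockedBy: string[]; done: boolean };

function readyTasks(board: Task[]): Task[] {
  const done = new Set(board.filter(t => t.done).map(t => t.id));
  return board.filter(
    t => !t.done && t.blockedBy.every(dep => done.has(dep))
  );
}

const board: Task[] = [
  { id: "schema-slice-1", blockedBy: [], done: true },
  { id: "service-slice-1", blockedBy: ["schema-slice-1"], done: false },
  { id: "ui-slice-1", blockedBy: ["service-slice-1"], done: false },
  { id: "schema-slice-2", blockedBy: [], done: false },
];

// Two tasks are ready in parallel: service-slice-1 and schema-slice-2.
console.log(readyTasks(board).map(t => t.id));
```

Finishing one card may unblock several more, so the board drains wavefront-style rather than phase by phase.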

4. Architecture for agents#

What it is: codebase shape that lets agents work effectively — deep modules, clear test boundaries, conserved smart-zone budget (Deep Modules for Agents).

How to develop it:

  • Internalize Ousterhout's deep-vs-shallow distinction. Deep module = small interface, large behavior, one natural test boundary. Shallow module = many small files, dense graph, unclear boundaries. Agents drift toward shallow by default; push back.
  • Keep a module map in your PRD. When planning, name the modules to be modified explicitly. This connects planning to architecture and prevents the agent from inventing new shallow modules instead of extending existing deep ones.
  • Run periodic refactor passes that consolidate. Pocock's improve-code-base-architecture skill scans for clusters of related shallow modules and proposes deepening them. Schedule this work — it doesn't happen on its own.
  • Reviewer in fresh context. If implementation used 80K tokens of smart zone, a same-context reviewer reads the diff in the dumb zone. Clear and review fresh (Deep Modules for Agents, Context Window Smart Zone).
  • Pair with model selection. Matt Pocock: Sonnet for implementation, Opus for review — "I need the smarts then."
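
Ousterhout's distinction in miniature, sketched in TypeScript; the name-handling example is invented for illustration:

```typescript
// Shallow: many tiny functions, a dense call graph, and no single
// natural test boundary. This is what agents drift toward by default.
const trimName = (s: string) => s.trim();
const lowerName = (s: string) => s.toLowerCase();
const validateName = (s: string) => s.length > 0;

// Deep: one small interface hiding all of the behavior above.
// One natural place to test, one obvious place for an agent to extend.
function normalizeName(raw: string): string {
  const name = raw.trim().toLowerCase();
  if (name.length === 0) throw new Error("name must be non-empty");
  return name;
}

console.log(normalizeName("  Ada "));
```

A refactor pass that notices the first cluster and proposes the second is exactly the consolidation work that doesn't happen on its own.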

5. Verification and review#

What it is: mechanical feedback loops (tests, types, linters, lints-as-instructions) that set the ceiling on what loops can do. Without good verification, you are coding blind (Agent Loop Pattern, Agent Harness Engineering).

How to develop it:

  • Treat tests/types/linters as the ceiling. Matt Pocock: "If your code base doesn't have feedback loops, you're never ever ever going to get decent AI output. The quality of your feedback loops influences how good your AI can code. That is the ceiling." Invest in this infrastructure before scaling agent use.
  • Write lint error messages as remediation instructions. OpenAI's Codex team writes lint error messages as instructions injected directly into agent context — the agent reads the lint output and knows how to fix it (Agent Harness Engineering).
  • Adopt the AFK vs human-in-loop split (Agent Loop Pattern). AFK tasks (implementation, refactoring, doc gardening, CI healing) are loop-eligible. Human-in-loop tasks (alignment, design choices, prioritization, QA) are not. Trying to loop human-in-loop work produces drift.
  • Prepare for the new bottleneck: review. Matt Pocock and Cat Wu make the same observation: when agents ship more code, humans review more code. It is the unsolved 2026 problem. Develop your code-review fluency now — it's the durable skill loops can't replace.
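
The lint-as-instructions pattern can be sketched as a mapping from rule IDs to remediation text; the rule names and message format here are assumptions, not the Codex team's actual configuration:

```typescript
// Lint findings phrased as remediation instructions: when this text is
// injected into the agent's context, it reads as a step to execute,
// not just a diagnostic to interpret.
type Finding = { rule: string; file: string; line: number };

const remediation: Record<string, (f: Finding) => string> = {
  "no-floating-promise": f =>
    `In ${f.file}:${f.line}, a Promise is not awaited. ` +
    `Add \`await\`, or prefix the call with \`void\` if fire-and-forget is intended.`,
  "deep-module-boundary": f =>
    `${f.file}:${f.line} adds a new top-level export. ` +
    `Extend an existing deep module instead of creating a new shallow one.`,
};

function formatForAgent(findings: Finding[]): string {
  return findings
    .map(f => remediation[f.rule]?.(f) ?? `Fix ${f.rule} at ${f.file}:${f.line}.`)
    .join("\n");
}

console.log(formatForAgent([
  { rule: "no-floating-promise", file: "src/sync.ts", line: 42 },
]));
```

The design choice is that the string the agent sees is already an instruction ("add `await`"), so no interpretation layer sits between the lint run and the fix.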

6. Strategic positioning#

What it is: choosing problems and moats that survive the AI shift, not ones that erode under it (Seven Powers Applied to AI).

How to develop it:

  • Audit the moat of any business / project / role you bet on. Process power and switching costs erode under AI; network effects, scale economies, and cornered resources persist. Counter-positioning amplifies — startups can choose business models incumbents structurally can't.
  • At the personal-career level, the same logic applies. "I have 15 years of process knowledge nobody else has" is process-power that AI is now hill-climbing. "I have a network of trusted relationships in this niche" is network effects that AI doesn't replicate.
  • Build AI-native from day one. Boris Cherny: a startup builds AI-native; an incumbent has to retrain people, change processes, overcome internal resistance. The same applies to your individual workflow — rebuild your habits AI-native rather than bolt AI onto pre-AI workflows.

III. Daily practices#

| Practice | Cadence | Source |
| --- | --- | --- |
| Run a grill-me session before any non-trivial feature | Per feature | Design Concept Grilling |
| `/clear` between unrelated tasks | Every task switch | Claude Code Best Practices |
| Keep a status-line token counter visible | Always | Context Window Smart Zone |
| Slice work vertically; reject horizontal phasing | Per planning session | Vertical Slice Tracer Bullets |
| Reviewer agent in fresh context (different model OK) | Per non-trivial diff | Deep Modules for Agents |
| Read your CLAUDE.md / system prompt at every model release; prune | Per model launch | Harness Shrinkage as Models Improve |
| Ask the model why it failed before re-prompting | On any unexpected behavior | Model Introspection Feedback |
| Run AFK loops on Kanban backlogs overnight | Continuously | Agent Loop Pattern |
| Build for the model six months out, not today's | Strategic horizon | Harness Shrinkage as Models Improve |
| Lunchtime vibe-check on new model releases | Per model release | Claude Character as Product |

IV. Anti-patterns to unlearn#

| Anti-pattern | Why it fails | What to do instead |
| --- | --- | --- |
| Treating the context window as "1M tokens, plenty of room" | Quadratic attention; the ~100K smart zone is real | Status-line counter; `/clear` aggressively; subagents for investigation |
| Adding to the system prompt forever, never removing | Crutches accrete; old crutches contradict new model behavior | Prune at every model launch; every section must justify its tokens |
| Asking the agent for a plan before alignment | Agent papers over open questions; rework cost is paid in implementation | grill-me first; PRD only after alignment |
| Horizontal layered phases ("all schema, then all services") | No end-to-end feedback until phase 3; mismatches paid late | Vertical slices; tracer-bullet thin paths |
| Same-context reviewer | Implementer's smart zone is exhausted; reviewer reads in the dumb zone | Fresh context for review; consider a stronger model for review |
| Specs-to-code without engaging the code | "Vibe coding by another name" — the feedback loop runs through the wrong layer | Stay in the code; specs are downstream of alignment |
| Looping human-in-loop work | Agent makes plausible-but-wrong calls; drift accumulates | AFK tagging; human-in-loop tasks stay synchronous |
| "Bigger model = no design needed" | Bad codebases produce bad agents regardless of model size | Deep modules; mechanical verification |
| Treating model failure as "model is dumb" | Misses signal about harness gaps | Introspect: ask the model why; fix the harness |
| Defending switching-cost / process-power moats | These erode under AI | Pivot to network effects / scale / cornered resources / counter-positioning |

V. What stays human#

Cat Wu explicitly names what isn't merging into the model: tacit, common-sense, EQ-heavy work — knowing the right venue to communicate with stakeholders, sensing when a launch is ready, knowing what counts as a fair trade-off (Engineer PM Convergence). Humans still provide the connective tissue across a launch.

Concretely durable human skills:

  • Code review fluency — the new bottleneck once agents ship faster (AI Native Product Cadence, Matt Pocock confession)
  • Convicted articulation — Amanda's character-work skill: saying why a given output is on-character or off-character with conviction (Claude Character as Product)
  • Cross-functional EQ — knowing when to escalate, what the right venue is, how to read a stakeholder's reluctance
  • Mission/values clarity as tiebreaker — Cat: "If there's two competing priorities, we'll talk about which one is more important for Anthropic's mission." Removes coordination cost (AI Native Product Cadence)
  • Domain depth — the accountant who can now write accounting software beats the engineer with no accounting context (Printing Press Software Democratization)

VI. A 90-day learning plan#

Days 1–14 — Get fluent in the harness.

  • Build a CLAUDE.md for each project you own; keep it a table of contents and prune ruthlessly
  • Put a status-line token counter in view; /clear between unrelated tasks
  • Run the introspection loop: when an agent fails, ask it why before re-prompting

Days 15–30 — Adopt alignment-first planning.

  • Install or write a grill-me skill; use it before any feature
  • Slice your next two features vertically; resist horizontal layering
  • Convert your todo list into a Kanban with blocked-by: edges

Days 31–60 — Build mechanical feedback infrastructure.

  • Add or strengthen tests/types/linters in one project until they catch agent drift
  • Write lint error messages as remediation instructions
  • Set up reviewer-in-fresh-context for non-trivial diffs (different model preferred)

Days 61–90 — Run AFK loops; develop product taste.

  • Set up a Ralph loop or /loop cron on your Kanban backlog overnight
  • Keep a "what would I build differently?" journal for products you use
  • Practice introspection-debugging: when the agent fails, ask why, fix harness
  • Audit the moat of one business / project / domain you care about against Seven Powers Applied to AI

VII. Source confidence and gaps#

  • High confidence: smart-zone framing, harness shrinkage, vertical slicing, deep modules, AFK/human-in-loop split, introspection technique. Multiple converging sources from inside Anthropic and from independent practitioners (Matt Pocock).
  • Medium confidence: 100-line Claude Code prediction (hyperbolic by Boris Cherny's own framing); printing-press analogy timeline (faster than 50 years, exact rate uncertain); product-taste-as-bottleneck (true at small Anthropic-style teams, scaling unclear).
  • Open questions: How much of Anthropic's cadence is process vs talent density? Does engineer-PM convergence scale beyond ~50-person teams? How reliable are 4.7-class introspection reports? When does a stronger model render the harness unnecessary entirely vs requiring different harness?

The wiki source set leans heavily on Anthropic's own narrative and one independent practitioner (Matt Pocock). Treat as well-grounded for individual workflow guidance, less battle-tested for organization-scale deployment.


About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.
