Howardism · Vol. 03Plate II · No. 02
Product & Org, in order.
Notes6DomainProduct & OrgOpen Qs17Newest23 May 2026Oldest6 May 2026
Product cadence, org design, and the AI-native team.
Map of Content for the product-org domain — 6 concepts. Curated entry point; see Home for all domains.
- AI Native Product Cadence (hub) — Cat Wu's 6mo→1mo→1day cadence at Anthropic: research-preview branding, mission-as-tiebreaker, evergreen launch room, lighter PRDs, weekly metrics readouts
- Dogfooding as Product Discipline — Product sense is built by relentless first-hand use ("ant food"); Mr. Peanut catch; cross-source (Cat Wu vibe-checks, Glasgow founder-led sales)
- Engineer PM Convergence — Generalists across disciplines; product taste as bottleneck skill; Anthropic Claude Code team as case study; "just do things" cultural substrate
- Evals as Product Spec — Cat Wu's framing of evals as the emerging core PM skill: ten great evals beats a hundred mediocre; encode what done looks like for ambiguous AI features; companion to introspection (hypothesis) and vibe-check (direction)
- Managers as ICs — Every Claude Code manager starts as an IC; flat org; agentic coding collapsed the onboarding cost that pushed managers out of the codebase
- Model Introspection Feedback — Cat Wu's underrated technique: ask the model why it failed; treat answer as harness-debugging signal not model criticism; caveats around model self-report fidelity
Open questions 17 open
- AI Native Product Cadence
- Does the cadence scale beyond ~100 people? Anthropic itself is bigger (~30-40 PMs alone), but the [[claude-code]] team that visibly drives cadence is small.
- What's the equivalent of research-preview branding for B2B enterprise launches where customers expect stability? Cat doesn't address.
- How much of the cadence is structural (process choices) vs cultural (talent density)? Probably both, ratio unclear.
- Dogfooding as Product Discipline
- Dogfooding works when the team *is* the user (Claude Code) or near it (Cat Wu, Boris). How do you build product sense for users very unlike you — does "talk to customers" fully substitute, as Glasgow/Fung's small-business work suggests?
- Can dogfooding scale, or does it implicitly cap how large an AI-native product org can stay taste-driven before it reverts to dashboards?
- Engineer PM Convergence
- Does this scale beyond ~50-person Claude Code-style teams? Boris hedges: "I think this is going to be a question for years."
- What happens to formal PM career ladders in companies where engineers do PM work? Open at Anthropic per Cat.
- Cross-disciplinary generalist is a hiring bar — where does the supply come from? Career changers, or new-grad bias toward AI-native education?
- Evals as Product Spec
- How do you write an eval for *taste*-driven features like [[claude-character-as-product|character]]? Amanda's role is canonical for being eval-resistant; Cat names her as someone who *is* good at evals here, but doesn't describe the technique.
- The 10-vs-100 number is given without justification. Is there a Goldilocks zone, or does it depend on feature surface area? [[client-side-agent-optimization]]'s framing of combos suggests evals also have a combinatorial explosion problem.
- How do evals interact with [[harness-shrinkage-as-models-improve]]? When a harness asset shrinks because the model now handles it natively, the evals built around the old harness may become artifacts rather than guardrails. Does Anthropic retire evals or repurpose them?
- Is there a single non-Anthropic example of a PM-as-eval-writer to cite, or is this currently a Cat-Wu-singular framing? The Matt Pocock workshop reaches the same place from a different vocabulary, but no third source has been ingested yet.
- Managers as ICs
- Fung's own open question: "Do you still need separate iOS and Android orgs?" — if engineers flex across platforms via Claude, the traditional platform-split org may dissolve too. How far does flattening go?
- Does manager-as-IC scale past a certain org size, or only work while Claude Code is small and the codebase is Claude-legible?
- Model Introspection Feedback
- How reliable are 4.7-class introspective reports? Anthropic's interpretability research suggests partial fidelity but not full. Empirically, Cat reports it's good enough to drive harness fixes — but unclear at what model scale this technique becomes load-bearing.
- Does adversarial introspection ("why did you fail?") yield different signal than neutral ("walk me through your reasoning")? Worth probing.
- Could a meta-agent run introspection automatically against logged failures? Sounds tractable but no public implementation.