
Perplexity

Published: June 15, 2026 · Filed: Entity · Domain: Entities · Tags: Entity, Org, AI Lab, Deep Research · Reading: 3 min · Source: AI-synthesised

AI answer-engine company; maker of Perplexity Deep Research (the leading system on its own DRACO benchmark) and publisher of DRACO. It runs Claude Opus 4.5/4.6 as base models inside its orchestration, which makes it simultaneously an Anthropic customer and a benchmark competitor.


Summary

Perplexity is an AI answer-engine / search company. In this corpus it appears as the author of the DRACO benchmark (arXiv:2602.11685, Feb 2026, with a Harvard co-author) and the maker of Perplexity Deep Research, the agentic deep-research system that tops every domain and rubric axis on that benchmark. It is the first vendor in the wiki whose contribution is a production-sourced evaluation built from its own deployment traffic.

What it does (in this corpus)

  • Perplexity Deep Research: a deep-research agent that decomposes a query, iteratively retrieves from many sources, and synthesizes a cited report. On DRACO it scores 70.5% normalized (Opus 4.6 base) with a 72.8% pass rate, ahead of Gemini Deep Research, OpenAI Deep Research (o3 / o4-mini), and bare Claude Opus 4.5/4.6 with tools. Its profile is the highest score and lowest latency among deep-research systems, combined with the largest input-token footprint (~779k tokens per task). In other words, it is retrieval-heavy and output-lean.
  • DRACO: Perplexity sampled tens of millions of its own Deep Research queries (Sep–Oct 2025), then de-identified, augmented, filtered, and curated them into 100 tasks graded against expert-written rubrics. This is an instance of Production-Sourced Evaluation. The benchmark is publicly released on Hugging Face.
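
The decompose, retrieve, synthesize loop described in the first bullet can be illustrated with a toy sketch. This is not Perplexity's implementation, which the source does not describe at this level. Every name here (`decompose`, `retrieve`, `deep_research`, `CORPUS`, the example URLs) is hypothetical, the "retrieval" is a keyword match over an in-memory list, and a real system would call an LLM and live search at each step.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    url: str
    text: str


@dataclass
class Report:
    body: str
    citations: list[str] = field(default_factory=list)


# Hypothetical in-memory corpus standing in for live web retrieval.
CORPUS = [
    Source("https://example.com/a", "solar capacity grew in 2024"),
    Source("https://example.com/b", "wind capacity costs fell"),
    Source("https://example.com/c", "battery storage costs fell"),
]


def decompose(query: str) -> list[str]:
    """Split a query into sub-questions (assumption: a real agent asks an LLM)."""
    return [part.strip() for part in query.split(" and ") if part.strip()]


def retrieve(sub_question: str, seen: set[str]) -> list[Source]:
    """Return not-yet-seen sources sharing at least one word with the sub-question."""
    words = set(sub_question.lower().split())
    return [
        src for src in CORPUS
        if src.url not in seen and words & set(src.text.lower().split())
    ]


def deep_research(query: str, max_rounds: int = 3) -> Report:
    """Iterate retrieval over sub-questions, then assemble a cited report."""
    sub_questions = decompose(query)
    seen: set[str] = set()
    findings: list[tuple[str, Source]] = []

    for _ in range(max_rounds):
        new_hits = 0
        for sub in sub_questions:
            for src in retrieve(sub, seen):
                seen.add(src.url)
                findings.append((sub, src))
                new_hits += 1
        if new_hits == 0:  # stop once a full round finds nothing new
            break

    # Synthesis step: one line per finding, with a numbered citation marker.
    lines = [
        f"{sub}: {src.text} [{i}]"
        for i, (sub, src) in enumerate(findings, start=1)
    ]
    return Report(body="\n".join(lines), citations=[s.url for _, s in findings])


report = deep_research("solar capacity and storage costs")
print(report.body)
print(report.citations)
```

Even in this toy form, most of the work goes into reading sources while the report itself stays short, which is the same shape as the retrieval-heavy, output-lean profile noted above.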

The notable structural fact: customer and competitor

Perplexity Deep Research runs Claude Opus 4.5 / 4.6 as its base models (per the paper's experiment setting). On DRACO, therefore, Perplexity's orchestrated product (Opus base) is benchmarked against the bare Anthropic Opus models, and it beats them by ~10pp. Perplexity is simultaneously an Anthropic API customer and the entity demonstrating that its orchestration layer adds substantial value on top of Anthropic's model. This is the cleanest concrete instance of the "orchestration beyond the base model" finding, and a live counter-datapoint to Harness Shrinkage as Models Improve.

Connections

  • DRACO Benchmark: the benchmark Perplexity authored and on which its product leads
  • Deep Research Agents: Perplexity Deep Research is the canonical leading instance of this system class
  • Production-Sourced Evaluation: DRACO's method, a benchmark built from Perplexity's own production traffic
  • Anthropic: Perplexity runs Claude Opus 4.5/4.6 as base models and benchmarks against bare Opus; customer and competitor at once
  • Google DeepMind: a competitor (Gemini Deep Research is evaluated on DRACO) whose Gemini-3-Pro model Perplexity also chose as DRACO's primary judge
  • LLM-as-a-Judge: DRACO's grading method; Perplexity selected the judge via a human-alignment study

Open questions

  • A vendor publishing a benchmark that its own product wins is an obvious incentive problem. How is DRACO's credibility maintained as it ages, and will Perplexity actually run the automatable refresh?
  • Perplexity depends on Anthropic (and others) for base models while competing with them on the end product. How durable is the orchestration advantage if base-model makers ship their own deep-research mode?

