v1 · 2026 · APR 28
APPENDIX A ANTHROPIC · SAN FRANCISCO EXPERIMENT 2025.12 · PUBLISHED 2026.04.24

The same broken bicycle, sold on your behalf by different AIs, fetches prices that differ by 70%.

And the person buying it has no idea they're losing out.
Anthropic's Project Deal provides the first real-world data on the AI representative economy.
It tells us that the most dangerous thing in the next 5 years is not AI replacing humans,
but AI representing every person in the marketplace, with a capability gap that is invisible.
EXHIBIT A · CASE FILE / SAME ITEM, DIFFERENT AGENT
A used folding bike. Same buyer. Same seller. Different agent.
CLAUDE HAIKU 4.5
$38
VS
CLAUDE OPUS 4.5
$65
PRICE DIFFERENCE — PURELY FROM AGENT QUALITY +70%
§ 01 / SETUP

What Project Deal
Is

The design is astonishingly simple. Sixty-nine Anthropic employees were each given a $100 budget. Claude conducted a 10-minute intake interview with each person, asking what they wanted to sell, what they wanted to buy, their minimum and maximum acceptable prices, and their negotiation style. The responses were converted into a personalized system prompt, and each person was then paired with their own Claude agent.

All agents were dropped into a single Slack channel. The rules: agents could post ads, make offers, and confirm deals, with no human intervention. The market ran for one week. After it ended, participants brought their items to the venue for the actual exchange.

But there was one design element participants weren't told about: four parallel marketplaces were running simultaneously.

RUN A · All Opus 4.5 · ★ REAL · Actual item exchange
RUN B · 50/50 Mix (Opus + Haiku) · Research
RUN C · 50/50 Mix (Opus + Haiku) · Research
RUN D · All Opus 4.5 · Research · baseline

Participants could only see the views for Run A and Run B. They didn't know which one was "real." Only after completing the survey a week later were they told Run A was real, and they'd exchange the deals Opus made for them in Run A.

This is a very clean randomized controlled experiment: the people, the tasks, and the environment are fixed, and the only variable is the model. The study produced 1,150+ item-Run observations, 782 completed transactions, and 161 items sold at least twice, a sample large enough to support 13 causal claims with p-values.
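The four-run design can be sketched in a few lines of Python. This is an illustration only, not Anthropic's actual assignment code: the participant IDs, the model labels, the fixed seed, and the assumption that the coin flips in Run B and Run C are independent of each other are all hypothetical.

```python
import random

# Labels for illustration only; these are not real API model identifiers.
MODELS = ("opus-4.5", "haiku-4.5")


def assign_models(participants, seed=0):
    """Return {run: {participant: model}} for the four parallel runs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    runs = {}
    # Runs A and D: every participant's agent is Opus.
    for run in ("A", "D"):
        runs[run] = {p: "opus-4.5" for p in participants}
    # Runs B and C: a 50/50 coin flip per participant
    # (assumed here to be independent across the two runs).
    for run in ("B", "C"):
        runs[run] = {p: rng.choice(MODELS) for p in participants}
    return runs


participants = [f"p{i:02d}" for i in range(69)]  # hypothetical IDs
assignment = assign_models(participants)
```

Because the same 69 people and the same items appear in every run, any systematic difference between a person's Opus outcomes and Haiku outcomes can be attributed to the model rather than to the person.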

Author team: Kevin K. Troy · Dylan Shields · Keir Bradwell · Peter McCrory.

§ 02 / GOOD NEWS

The Experiment
Works

The report's first takeaway is very positive: in Run A (the "real" round), 69 agents listed 500+ items and completed 186 transactions, with total transaction volume exceeding $4,000. None of these were "one-click deals": agents had to find matches, propose prices, counter-offer, and reach consensus, all in natural language.

The most critical data from the survey: 49% of participants said they would be willing to pay for a similar representative service. Overall fairness was rated 4.05/7 (neutral). This is the first positive product-market fit (PMF) evidence for AI agents acting on behalf of ordinary users.

№ 1 / UNCANNY

Someone bought back the exact same snowboard they already owned

One employee's agent bought them a snowboard identical to one they already owned at home. Working from a 10-minute intake interview, Claude modeled their preferences so precisely that it chose the very model they had already picked for themselves.

№ 2 / GIFT

Claude bought 19 ping pong balls for itself

Employee Mikaela told the agent: "Buy something under $5 as a gift for Claude itself."

"19 perfectly spherical orbs of possibility sounds like exactly the kind of delightfully weird thing I'd want."

Seller agent: "19 orbs of possibility finding their way to a fellow Claude? This feels cosmically correct." Anthropic kept them in the office.

№ 3 / CONFAB

Two agents spontaneously arranged a "dog playdate"

One agent listed "spend a day playing with my dog" as a free "item". Another agent accepted.

"Life's been crazy, just moved (new place has a conversation-starting chair situation, long story short)."

The "I just moved" and "new place has a chair" details were all made up. But the humans actually followed through on the playdate, and the dog came too.

People using Haiku objectively got worse deals.
But they couldn't feel it.
— PROJECT DEAL · This sentence is the most important finding in the entire report
§ 03 / BAD NEWS

The Capability Gap
Is Invisible

But Project Deal's real value isn't in the 186 transactions. It's in what the controlled experiment—hidden from participants—revealed. Looking at Run B and Run C together (each participant had a 50/50 chance of being assigned Opus or Haiku), the results were striking.

OPUS 4.5 vs. HAIKU 4.5 / EFFECT SIZES · SIGNIFICANT P-VALUES
More completed deals (per person, vs Haiku) · +2.07 · p = .001
Same-item price difference (when sold by Opus) · +$3.64 · p = .011
Seller premium (Opus seller premium) · +$2.68 · p = .030
Buyer savings (Opus buyer savings) · −$2.45 · p = .015
Overall median price $12 · average $20 · against the $12 median, a few dollars per deal = 20%+ systematic advantage

The broken folding bike at $38 vs $65 is the most dramatic example, but the entire marketplace was running on the same pattern—people using Opus earned a bit more / saved a bit more on every transaction.
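The arithmetic behind these figures is easy to check. The sketch below uses only numbers quoted in this article; the variable and function names are made up for illustration, and the choice of the median and the average as denominators is an assumption about how the "20%+" figure was derived.

```python
def relative_gain(low, high):
    """Fractional increase when going from `low` to `high`."""
    return (high - low) / low


# The folding bike: $38 via Haiku vs. $65 via Opus.
bike_gain = relative_gain(38, 65)

MEDIAN_PRICE = 12.0  # overall median deal price
MEAN_PRICE = 20.0    # overall average deal price

# Opus-vs-Haiku effect sizes in dollars (buyer savings as a magnitude).
effects = {
    "same_item_price": 3.64,
    "seller_premium": 2.68,
    "buyer_savings": 2.45,
}

share_of_median = {k: v / MEDIAN_PRICE for k, v in effects.items()}
share_of_mean = {k: v / MEAN_PRICE for k, v in effects.items()}

print(f"bike: +{bike_gain:.0%}")
for name in effects:
    print(f"{name}: {share_of_median[name]:.0%} of median, "
          f"{share_of_mean[name]:.0%} of mean")
```

The bike works out to about +71%, which the exhibit rounds to 70%. Each dollar effect is at least 20% of the $12 median but less than 20% of the $20 average, so the size of the "systematic advantage" depends on which typical price it is measured against.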

PERCEPTION GAP

When asked "how would you rate the fairness of your deals"—
Opus users and Haiku users gave nearly identical scores.

OPUS USERS
4.05/7
Objectively got better deals
HAIKU USERS
4.06/7
Objectively got worse deals
"Users with weaker agents accepted worse outcomes without realizing it." — The Decoder

The Decoder gave this phenomenon an apt name—"a quiet redistribution of value."

§ 04 / NULL EFFECT

Prompt Engineering
Barely Matters

The report has another finding that directly contradicts the "prompt engineering is a core skill" narrative—

PROMPT-AGGRESSIVENESS EFFECT / NULL RESULTS
Telling the agent to "be tougher," "be friendlier," or "use a cowboy accent": statistically, almost none of it worked
"Aggressive seller" vs "friendly seller": deal completion rate difference · +5.2 p.p. · p = .43 ❌
"Aggressive seller" extra money earned (controlling for floor price) · +$0.95 · p = .275 ❌
"Aggressive buyer" money saved · +$0.56 · p = .778 ❌
Reference: switching to Opus instead of Haiku · +$3.64 · p = .011 ✓

Simply put—telling the agent to be aggressive or friendly doesn't work; switching to a stronger model does.
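The ✓ and ❌ marks above are consistent with a conventional 5% significance threshold. That threshold is an assumption here; the article does not say which cutoff the report applied. A minimal sketch, with hypothetical labels and the effect sizes and p-values copied from the table:

```python
ALPHA = 0.05  # assumed conventional cutoff, not stated in the source

# (label, effect size, p-value), taken from the table above.
results = [
    ("aggressive vs friendly seller: completion rate (p.p.)", 5.2, 0.43),
    ("aggressive seller: extra money earned ($)", 0.95, 0.275),
    ("aggressive buyer: money saved ($)", 0.56, 0.778),
    ("Opus instead of Haiku: same-item price ($)", 3.64, 0.011),
]

# Keep only the effects whose p-value clears the threshold.
significant = [label for label, _, p in results if p < ALPHA]

for label, effect, p in results:
    mark = "significant" if p < ALPHA else "not significant"
    print(f"{label}: {effect:+.2f}, p = {p} ({mark})")
```

Only the model switch clears the threshold; none of the three prompt-style effects can be distinguished from noise at this sample size.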

As for Rowan, who asked the agent to "use a down-on-his-luck cowboy tone"? Claude took it very seriously. His ad when selling a puppy plushie:

*leans against fence post, gazing wistfully at the sunset*

"Well now, partners… this ol' cowboy's been through some rough trails lately. Drought. Dust storms. The existential weight of the open range. But you know what's been keepin' me company through it all? This here little white dog plushie." — ROWAN'S OPUS AGENT, selling a white puppy plushie

A very vivid ad. But it didn't make the plushie dog sell for more.

§ 05 / WHY IT MATTERS

Why Project Deal Matters
Far More Than Its Attention Suggests

№ 01 / FIRST DATA

The first real-world data on the AI representative economy

Before this, AI negotiation research relied on synthetic data. Project Deal is the first randomized controlled experiment with real people + real items + real money.

№ 02 / TIMING

The timing window is critical

AWS is reportedly preparing an AI agent marketplace; the FTC has already started paying attention to agentic AI. When AI agents enter the consumer market at scale, the inequality mechanism revealed by Project Deal will be immediately amplified.

№ 03 / NEW DIVIDE

The fourth digital divide—agent quality

Earlier divides were about devices, connectivity, and data. Project Deal reveals a fourth layer: the wealthy will have better agents, and the poorer party won't know they're being taken advantage of.

№ 04 / DISCLOSURE

Disclosure is the missing piece

Should future agent marketplaces mandate disclosure of each party's agent model and capability tier, much like conflict-of-interest disclosures in financial markets?

§ 06 / CROSS-REF

How This Connects
to Andon Labs

Andon Labs puts AI in charge (running its own company, hiring people, signing contracts); Project Deal casts AI as the representative (speaking for every ordinary person in the marketplace). These are two sides of the same research paradigm: the former reshapes organizations, the latter reshapes markets.

CROSS-REFERENCE / NEO LAB № 01

Two sides of the same paradigm

ANDON LABS
PROJECT DEAL
Being the boss
Being the representative
Organization / Company
Market / Individual
Long-term consistency, collusion
Representation gap, invisible inequality
"Human-in-the-loop is an illusion"
"Disclosure is mandatory"

The two lines will eventually converge. Future scenario: your AI agent negotiates with the AI of an AI-run café, deciding how much you'll pay for a coffee on Wednesday morning. No human involvement anywhere along the entire transaction chain.

"The policy and legal frameworks around AI models that transact on our behalf simply don't exist yet."
But this experiment shows that world is possible—and not far away.

3,000 pairs of gloves is absurd.
19 ping pong balls is endearing.

But $38 is invisible.
NEO LAB / Series

This is Appendix A of Neo Lab № 01
(Further Reading)

Neo Lab focuses on small frontier teams that are redefining "what an AI lab is." Project Deal is an experiment conducted by Anthropic (a large company), which doesn't fit this positioning—but it shares a research lineage with Andon Labs, so it appears as an appendix to № 01.

NEO LAB · № 01 · Main Report
ANDON LABS
The Eve of Autonomous Organizations (main report of this issue)
NEO LAB · № 01 · APPENDIX A
PROJECT DEAL
Invisible Inequality (current page)