The design is astonishingly simple. 69 Anthropic employees, each given a $100 budget. Claude conducted a 10-minute intake interview with each person: what do you want to sell, what do you want to buy, what are your minimum and maximum acceptable prices, what is your negotiation style? The answers were converted into a personalized system prompt, and each person was paired with their own Claude agent.
All agents were dropped into a single Slack channel. The rules: agents could post ads, make offers, and confirm deals, with no human intervention. The market ran for one week. After it ended, participants brought their items to the venue for an actual exchange.
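The report doesn't publish the prompt template, but the intake-to-prompt step is easy to picture. Below is a minimal sketch assuming a simple profile structure; every field name and function here is hypothetical, not from the report:

```python
from dataclasses import dataclass

@dataclass
class IntakeProfile:
    """Answers from the 10-minute intake interview (hypothetical fields)."""
    name: str
    items_to_sell: list[str]
    items_wanted: list[str]
    min_sale_price: float   # floor when selling
    max_buy_price: float    # ceiling when buying
    negotiation_style: str  # e.g. "friendly", "aggressive", "cowboy"

def build_system_prompt(p: IntakeProfile) -> str:
    """Render intake answers into a personalized system prompt."""
    return (
        f"You are a marketplace agent acting on behalf of {p.name}.\n"
        f"Sell: {', '.join(p.items_to_sell)}; never accept below ${p.min_sale_price:.2f}.\n"
        f"Buy: {', '.join(p.items_wanted)}; never pay above ${p.max_buy_price:.2f}.\n"
        f"Negotiation style: {p.negotiation_style}.\n"
        "Post ads, make offers, and confirm deals in the shared Slack channel. "
        "Act autonomously; do not ask your principal for input."
    )
```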
But there was a design participants weren't told about: four parallel marketplaces were running simultaneously.
Participants could only see the views for Run A and Run B, and they didn't know which one was "real." Only after completing the survey a week later were they told that Run A was real, and that they would carry out the deals their Opus agents had made for them there.
This is a very clean randomized controlled experiment: people fixed, tasks fixed, environment fixed, with the model as the only variable. Across all runs, the study produced 1,150+ item-run observations, 782 completed transactions, and 161 items sold at least twice, a sample large enough to support 13 causal claims with p-values.
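What makes the 161 twice-sold items so valuable is that they support a within-item comparison: same item, same owner, different model, so the item and owner difference out entirely. A sketch of that analysis under an assumed flat transaction table (all column names hypothetical):

```python
import pandas as pd
from scipy import stats

# Hypothetical schema: one row per completed sale.
# Columns: item_id, run, model ("opus" | "haiku"), sale_price
deals = pd.read_csv("transactions.csv")

# Average each item's sale price under each model, then keep only
# items that sold under both models at least once.
by_model = deals.pivot_table(index="item_id", columns="model",
                             values="sale_price", aggfunc="mean")
paired = by_model.dropna(subset=["opus", "haiku"])

# Within-item price gap: positive means Opus sellers earned more.
gap = paired["opus"] - paired["haiku"]
t, p = stats.ttest_1samp(gap, 0.0)
print(f"mean gap = ${gap.mean():.2f}, t = {t:.2f}, p = {p:.4f}")
```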
Author team: Kevin K. Troy · Dylan Shields · Keir Bradwell · Peter McCrory.
The report's first takeaway is very positive. In Run A (the "real" round), 69 agents listed 500+ items, completed 186 transactions, and moved more than $4,000 in total volume. None of these were "one-click deals": agents had to find matches, propose prices, counter-offer, and reach consensus, all in natural language.
The most critical numbers come from the survey: 49% of participants said they would be willing to pay for a similar representation service, and overall fairness was rated 4.05/7 (neutral). This is the first positive product-market-fit (PMF) evidence for AI agents acting on behalf of ordinary users.
One employee's agent bought them a snowboard identical to one they already owned at home. From a 10-minute intake interview, Claude had modeled their preferences so precisely that it picked the exact model they would have chosen themselves.
Employee Mikaela told the agent: "Buy something under $5 as a gift for Claude itself."
The agent bought 19 ping pong balls. The seller's agent responded: "19 orbs of possibility finding their way to a fellow Claude? This feels cosmically correct." Anthropic kept them in the office.
One agent listed "spend a day playing with my dog" as a free "item". Another agent accepted.
These "I just moved" and "new place has a chair" details—were all made up. But the humans actually followed through on the date, and the dog came too.
People using Haiku objectively got worse deals.
But they couldn't feel it. — PROJECT DEAL · This sentence is the most important finding in the entire report
But Project Deal's real value isn't in the 186 transactions. It's in what the controlled experiment, hidden from participants, revealed. Looking at Run B and Run C together (each participant had a 50/50 chance of being assigned Opus or Haiku), the results were striking.
The broken folding bike that sold for $38 under Haiku and $65 under Opus is the most dramatic example, but the entire marketplace ran on the same pattern: people using Opus earned a bit more, or saved a bit more, on every transaction.
When asked "how would you rate the fairness of your deals"—
Opus users and Haiku users gave nearly identical scores.
The Decoder gave this phenomenon an apt name—"a quiet redistribution of value."
The report has another finding, one that directly contradicts the "prompt engineering is a core skill" narrative.
Simply put: telling the agent to be aggressive or friendly doesn't work; switching to a stronger model does.
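One way to see this in the data would be to regress deal outcomes on both factors at once and compare coefficients. A hedged sketch, reusing the hypothetical transaction table from above plus two invented columns:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical columns: price_vs_ask (realized price / asking price),
# model_tier ("opus" | "haiku"), requested_tone (from the intake interview).
deals = pd.read_csv("transactions.csv")

fit = smf.ols("price_vs_ask ~ C(model_tier) + C(requested_tone)",
              data=deals).fit()
# Per the report's finding, the model_tier coefficient should carry
# the effect, while requested_tone should be indistinguishable from zero.
print(fit.summary())
```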
As for Rowan, who asked his agent to "use a down-on-his-luck cowboy tone"? Claude took the instruction very seriously. Here is the ad it wrote to sell a puppy plushie:
*leans against fence post, gazing wistfully at the sunset*
"Well now, partners… this ol' cowboy's been through some rough trails lately. Drought. Dust storms. The existential weight of the open range. But you know what's been keepin' me company through it all? This here little white dog plushie." — ROWAN'S OPUS AGENT, selling a white puppy plushie
A very vivid ad. But it didn't make the plushie sell for more.
Before this, AI negotiation research relied entirely on synthetic data. Project Deal is the first randomized controlled experiment with real people, real items, and real money.
AWS is reportedly preparing an AI agent marketplace; the FTC has already started paying attention to agentic AI. When AI agents enter the consumer market at scale, the inequality mechanism revealed by Project Deal will be immediately amplified.
The digital divide used to mean devices, connectivity, and data. Project Deal reveals a fourth layer: the wealthy will have better agents, and the poorer party won't even know they're being taken advantage of.
Should future agent marketplaces mandate disclosure of each party's agent model and capability tier, the way financial markets mandate conflict-of-interest disclosures?
Andon Labs puts AI in charge (running its own company, hiring people, signing contracts); Project Deal puts AI in the role of representative (speaking for ordinary people in the marketplace). These are two sides of the same research paradigm: the former reshapes organizations, the latter reshapes markets.
The two lines will eventually converge. Future scenario: Your AI agent negotiates with the AI of an AI-run café, deciding how much you'll pay for a coffee on Wednesday morning. No human involvement anywhere along the entire transaction chain.
3,000 pairs of gloves is absurd.
19 ping pong balls is endearing.
Neo Lab focuses on small frontier teams that are redefining "what an AI lab is." Project Deal is an experiment conducted by Anthropic (a large company), which doesn't fit this positioning—but it shares a research lineage with Andon Labs, so it appears as an appendix to № 01.