beacn
PREPARED BY KT · MAY 2026
BUYER BRIEF · FOR PRIYA ANAND

An LLM underwriter that can show its work — or a CFPB enforcement action waiting to happen.

FOR: Priya Anand, VP Engineering
COMPANY: Mercury Bench
FROM: KT, Agentic Ventures
01 · STRATEGIC VERDICT

Don't pitch observability. Pitch the answer to the question Priya already started asking on stage at AI Engineer Summit two weeks ago: how do you ship LLM-driven decisions you can defend to a regulator? Helicone's eval traces plus prompt-level cost attribution are the only credible answer in the room for an underwriting workflow under CFPB scrutiny. Anchor everything to the Series C announcement language — 'AI underwriting that explains itself' — and the conversation reframes from tool evaluation to compliance infrastructure.


02 · WHAT CHANGED RECENTLY

Mercury Bench closed a $42M Series C on April 8 led by Bain Capital Ventures, with explicit messaging that the round funds 'production-grade AI underwriting at scale.' Two weeks later, Priya gave a 22-minute talk at AI Engineer Summit titled 'Underwriting at the prompt layer,' where she said — direct quote from the recording — 'our biggest unsolved problem is reproducibility, not accuracy.' On the regulatory front, the CFPB's Circular 2022-03 and Circular 2023-03 both establish that lenders using AI or complex algorithms must provide specific, applicant-level explanations in adverse action notices — generic checklist reasons are insufficient. While the current administration withdrew 67 CFPB guidance documents in May 2025, the underlying ECOA and Regulation B obligations remain law; lenders cannot use 'the algorithm is too complex to explain' as a defense. Mercury Bench is exactly the kind of shop those circulars were written about. Their public LinkedIn shows three senior ML hires in the last 90 days; no observability tooling has been announced.

03 · COMPANY SNAPSHOT

Series C fintech (52 employees, mostly engineering) building automated small-business loan underwriting. Live in 6 states, lending against bank-feed and Stripe-revenue signals via Plaid and Stripe Issuing. Reported $11M ARR at the Series C close. Tech stack: LangChain + Anthropic Claude + their own retrieval layer over borrower financials. Underwrites $50K–$2M lines. Loss rates publicly stated at 'better than incumbents,' which is the kind of phrasing regulators read carefully.

04 · STAKEHOLDER PROFILE

Priya Anand · VP Engineering. Ex-Stripe Capital (4 years on the underwriting platform), ex-Square (2 years on Cash App Borrow). She owns the entire ML and platform org at Mercury Bench — 18 people reporting up. Joined March 2025 as employee #12. Her AI Engineer Summit talk shows she thinks in terms of system properties (reproducibility, latency, cost-per-decision), not features. She does not respond well to demo theater; her LinkedIn comments consistently push back on vendors who lead with capabilities instead of operational characteristics.

05 · WHAT WE KNOW ALREADY

First call. No prior history beyond a 2-line LinkedIn DM exchange where Priya said 'send me the 15-min version, no deck.' Her ask sets the tone — terse, technical, no patience for marketing. Mutual connection: Helicone's CTO went to CMU with one of Mercury Bench's ML engineers (Devesh Rao). Devesh has used Helicone at a previous job and posted positively about it in 2024. Worth surfacing but not leaning on — Priya makes her own calls.

06 · PAIN POINTS

  • Reproducibility of underwriting decisions when prompts and models change weekly

    Their current setup logs nothing systematically; engineers paste failing traces into Slack. This is the problem Priya named on stage.

  • Cost attribution per loan decision

    They're spending an undisclosed but rumored-to-be-significant amount on Anthropic tokens and have no view into which prompts, which retrieval calls, or which model versions are driving the bill.

  • Adverse action notice compliance under ECOA

    CFPB Circulars 2022-03 and 2023-03 require AI-driven lenders to produce specific factor-level explanations for each credit denial — not generic checklist reasons. The current administration withdrew some CFPB guidance in May 2025, but the ECOA obligations themselves are statutory; state AGs and private litigants still enforce. Mercury Bench's current system can produce a decision but cannot reliably produce the underlying factor weights.

  • Audit trail for state-level regulators

    They expanded into California in February, which means DFPI scrutiny. California will ask for decision lineage on individual loans; they have no tooling to produce it.
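
For the technical part of the call, the first two pain points (per-decision cost and decision lineage) can be sketched in a few lines. This is an illustrative Python sketch only, not Helicone's API or Mercury Bench's code: the `LLMCall` record, the `PRICE_PER_MTOK` rates, and the `lineage_fingerprint` helper are all hypothetical names and numbers chosen for the example.

```python
import hashlib
import json
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical per-million-token prices in USD; real rates depend on model and contract.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


@dataclass(frozen=True)
class LLMCall:
    decision_id: str      # one loan decision may span several LLM calls
    prompt_version: str   # hypothetical label for the prompt template in use
    model: str
    input_tokens: int
    output_tokens: int


def call_cost(call: LLMCall) -> float:
    """Dollar cost of a single call under the assumed prices."""
    return (
        call.input_tokens * PRICE_PER_MTOK["input"]
        + call.output_tokens * PRICE_PER_MTOK["output"]
    ) / 1_000_000


def cost_per_decision(calls) -> dict:
    """Sum call costs by loan decision."""
    totals = defaultdict(float)
    for call in calls:
        totals[call.decision_id] += call_cost(call)
    return dict(totals)


def lineage_fingerprint(prompt_template: str, model: str, params: dict, inputs: dict) -> str:
    """Stable hash over everything that shaped the request (key order does not matter)."""
    payload = json.dumps(
        {"prompt": prompt_template, "model": model, "params": params, "inputs": inputs},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Made-up example traffic: two calls for one decision, one call for another.
calls = [
    LLMCall("loan-001", "v12", "model-a", 4_000, 500),
    LLMCall("loan-001", "v12", "model-a", 1_000, 200),
    LLMCall("loan-002", "v13", "model-a", 2_000, 300),
]
costs = cost_per_decision(calls)
```

Storing the fingerprint next to each decision does not make model output deterministic; it only records which prompt, model, parameters, and inputs produced a given decision, which is the starting point for answering a lineage request on an individual loan.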

07 · QUESTIONS TO ASK

  • Q01 · When you said on stage that reproducibility is your biggest unsolved problem, what does 'solved' look like? What would you need to be able to do that you can't today?
  • Q02 · Walk me through what happens when a loan officer or a regulator asks you to explain a specific decline from three months ago. Where does that investigation start?
  • Q03 · How are you currently attributing token spend to specific underwriting features versus engineering experimentation? Is anyone in finance asking that question yet?
  • Q04 · The CFPB circulars on adverse action notices (2022-03 and 2023-03): how are they landing inside Mercury Bench? Is anyone owning the response, or is it still 'we'll figure it out before it matters'?
  • Q05 · What's the gap between your eng team's confidence in the underwriting model and your CCO's confidence in your ability to defend it?

08 · OBJECTIONS TO EXPECT

  • PUSHBACK

    We're going to build this internally.

    RESPONSE

    Likely true short-term — they have the talent — but the carrying cost of platform tooling for an 18-person team building a lending product is the trap. Frame Helicone as the eval/observability layer they don't want their ML engineers writing in Python by hand.

  • PUSHBACK

    LangSmith / Arize / Weights & Biases already does this.

    RESPONSE

    LangSmith is the real competitor here. The differentiator is Helicone's cost attribution per request and per user, plus the option to self-host. For a regulated lender, data residency matters, so bring this up before they do.

  • PUSHBACK

    Send me pricing.

    RESPONSE

    Don't send pricing in a first call. Pricing is volume-based, but the number that matters is cost per traced decision relative to their token spend, which they haven't shared. Frame it as a fraction-of-a-percent overhead on that spend and resist the urge to anchor on a list price.

  • PUSHBACK

    Our compliance team hasn't asked for this yet.

    RESPONSE

    True for now, but the ECOA obligation is statutory: it predates and survives any administration's guidance withdrawals. State attorneys general and state regulators such as California's DFPI actively enforce adverse action notice requirements on fintech lenders. If their compliance team hasn't mapped the Circulars to their stack, that's the meeting they need next.

09 · WHAT TO WALK AWAY WITH

A second meeting on the calendar — specifically, a working session with Priya plus whoever owns compliance response at Mercury Bench (likely their COO, possibly outside counsel). The conversation we want next is not another sales call; it's a joint scoping of what defensible LLM underwriting infrastructure looks like at their scale. If we walk away with that meeting booked and an introduction to the compliance owner, the deal is in motion. If we walk away with 'send us pricing,' we've lost — that means she's evaluating us as a tool, not a partner.
