Build notes

Build notes from a freight broker's SDR replacement

LAST UPDATED · MAY 9, 2026

By Shrey Vaughn · 2026-03-28 · Build notes · 6 min

What we shipped, what we cut, and the eval that mattered more than the model choice.

A 60-person freight broker came in with a familiar request. Replace the outbound SDR seat, keep the meetings booked, cut the cost. The honest answer is that no one ships an SDR replacement on day one. You ship a system that earns trust over six weeks, and at the end of those six weeks the seat is gone because the work no longer needs it.

The first thing we cut was the model debate. The team had spent three weeks comparing GPT, Claude, and a fine-tuned in-house model on first-touch quality. The differences were real and almost irrelevant. The variance from data quality dwarfed the variance from model choice. Half the prospect records had stale titles. A third of the company records had wrong industry codes. The CRM was the bottleneck, not the writer.

What we shipped, in order. Layer 1 first. Six weeks of context interviews with the founder and the head of sales, captured into a structured brain that every downstream component reads from. Voice, ICP, sequences, objection patterns, three years of closed-won notes. Layer 2 next. Live signal from HubSpot, calendar, and the freight tender feed wired into a warehouse. Layer 3, the outbound automations, only after the brain and the signal were stable.

What we cut. We cut the per-prospect personalization that everyone asks for in week one. It does not move reply rate at this volume. We cut the second-touch follow-up that the team had built in n8n. Replaced with a single eval-gated cadence we could actually trust. We cut the dashboard the founder asked for and shipped a daily Telegram brief instead, because the dashboard was a thing he opened twice a week and the brief was a thing that opened him every morning.

The eval was the unlock. We ran a weekly eval on every cohort the system touched. Reply rate, meeting quality, ICP fit, and one human-graded score on whether the message sounded like a competent operator. The eval caught a regression in week four when a model update silently softened the tone. We rolled the prompt back inside an hour. The team trusted the system after that. The seat closed in week eight.

The model debate did not matter. The eval cadence did. The data layer did. The context capture did. Build the boring layers first and the interesting layers ship themselves.

Subscribe

Less reading. Build it.

A 30 minute consultation where we will talk through your stack, the workflow that is eating your team, and what the first engagement would look like.

Book consultation Read the playbook