AI that listens to your calls and tells you what to ask next

How does AI sales coaching work?

The mechanics, step by step: how an AI coach listens to a call, transcribes it, detects the moments that matter, and prompts the rep with the right move in real time.


Works on Zoom, Teams & Google Meet · Mac & Windows · 7-day free trial

ConversationPilot live overlay (example)

Objection handling: They're comparing you to a competitor. ↳ “What would make us the clear choice over them for your team?”

Next best question: “When does your current contract renew?”

Signal detection: Budget mentioned · Decision maker · Competitor: Looker · Renewal: March

AI sales coaching works by running a continuous four-step loop during the call. It listens to both speakers, transcribes the audio into text, detects the moments that matter (objections, buying signals, qualification gaps) and prompts the rep with a specific next move, targeting under two seconds end to end. Instead of a human manager reviewing a recording days later, an AI coach does the listening, interpreting and prompting live, on every call, for every rep. ConversationPilot is a clear example of this loop in action.
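A minimal Python sketch of the loop's shape follows. Every function in it is a hypothetical stand-in. The "audio" is already text, and detection is a toy keyword rule, not the model-based interpretation ConversationPilot uses. The sketch only shows how the four stages hand off to each other.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

@dataclass
class Utterance:
    speaker: str  # "rep" or "prospect", known from the stream the audio arrived on
    text: str

def transcribe(speaker: str, audio_chunk: str) -> Utterance:
    # Hypothetical stand-in for speech-to-text: in this sketch the "audio" is already text.
    return Utterance(speaker, audio_chunk)

def detect(utt: Utterance) -> Optional[str]:
    # Toy placeholder for the detection model (a real coach interprets intent, not keywords).
    if utt.speaker == "prospect" and "competitor" in utt.text.lower():
        return "objection_competitor"
    return None

def suggest(event: str) -> str:
    # Hypothetical stand-in for the prompting step: one glanceable line per event type.
    lines = {"objection_competitor": "What would make us the clear choice over them for your team?"}
    return lines.get(event, "")

def coaching_loop(chunks: Iterable[Tuple[str, str]]) -> List[str]:
    """Run listen -> transcribe -> detect -> prompt over (speaker, chunk) pairs."""
    prompts = []
    for speaker, chunk in chunks:           # 1. listen: chunks arrive tagged by stream
        utt = transcribe(speaker, chunk)    # 2. transcribe
        event = detect(utt)                 # 3. detect
        if event:
            prompts.append(suggest(event))  # 4. prompt
    return prompts
```

In this sketch only the prospect's mention of a competitor produces a prompt. The same word spoken by the rep is ignored, because attribution comes from the stream.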

The phrase “AI sales coaching” covers both the live assist and the post-call reinforcement. The mechanically interesting part is the real-time loop, because running it fast enough to be useful is genuinely hard. A prompt that arrives ten seconds late is worthless; the conversation has moved on. The whole architecture is therefore built around latency: fast where it must be, thorough where it can afford to be.

This page opens the box and walks through the mechanics: listen, transcribe, detect, prompt, and then reinforce afterward. We use ConversationPilot to make each step concrete — what it captures, which models do which job, and why the design choices matter.

Step 1 — Listen: capturing both speakers

Coaching starts with hearing the call accurately, and the key design choice is how the audio is captured. ConversationPilot captures two separate streams: your microphone (you, the operator) and the meeting or system audio (the counterpart). It does not take a single mixed channel and try to untangle it afterward.

This matters more than it first appears. With separate streams, the system knows from the first word exactly who is speaking — no guessing, no diarisation errors. That precision flows into everything downstream: speaking analytics are exact rather than estimated, an objection is correctly attributed to the prospect rather than confused with the rep's own words, and the coach never wastes precious milliseconds figuring out the channel. Capturing the two sides separately is the unglamorous foundation that makes the rest of the coaching loop both fast and trustworthy.
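Separate capture makes speaking analytics exact rather than estimated. The sketch below shows why, assuming each transcript segment records which capture stream it came from. The `Segment` type, the `"mic"`/`"system"` stream labels and the function names are hypothetical. The point is that attribution becomes a lookup, not an inference.

```python
from dataclasses import dataclass
from typing import Dict, List

# With dual-stream capture the speaker is a property of the stream, so attribution
# is a dictionary lookup rather than a diarisation guess. Labels are hypothetical.
SPEAKER_BY_STREAM = {"mic": "rep", "system": "prospect"}

@dataclass
class Segment:
    stream: str   # "mic" (you, the operator) or "system" (the counterpart)
    start: float  # seconds from the start of the call
    end: float

def talk_time(segments: List[Segment]) -> Dict[str, float]:
    """Total seconds spoken by each side."""
    totals = {"rep": 0.0, "prospect": 0.0}
    for seg in segments:
        totals[SPEAKER_BY_STREAM[seg.stream]] += seg.end - seg.start
    return totals

def rep_talk_share(segments: List[Segment]) -> float:
    """Fraction of spoken time that belonged to the rep (0.0 if nobody spoke)."""
    totals = talk_time(segments)
    spoken = totals["rep"] + totals["prospect"]
    return totals["rep"] / spoken if spoken else 0.0
```

With a single mixed channel, `SPEAKER_BY_STREAM` would have to be replaced by a diarisation model, and every number downstream would inherit its error rate.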

Live scorecard (example)

Need: Covered · Budget: Partial · Authority: Covered · Timeline: Open · Competition: Covered

Call score: strong qualification

Step 2 — Transcribe: audio into text in real time

Once the audio is captured, it has to become text the models can reason over. ConversationPilot transcribes both streams continuously and live using Whisper, so the conversation is converted to text as it is spoken rather than in a batch after the call.

Live transcription is what makes everything else possible. Because the text exists almost immediately, the detection and prompting steps can run against it in near real time. The transcript is also speaker-labelled from the start, since the two streams were captured separately — so when the model reads the text, it already knows which lines are the rep and which are the prospect. Continuous transcription means the coach is never waiting for a chunk of audio to finish before it can react; it is always working with the latest words, which is essential for fitting the whole loop inside a two-second budget.
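Continuous transcription can be sketched with a common live speech-to-text pattern: buffer each stream separately and hand short windows of audio to the transcriber as soon as they fill. This is an assumption, not how ConversationPilot drives Whisper. `transcribe` is any callable you supply, and the frame and window sizes are hypothetical values.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

FRAME_MS = 20    # assumed length of one captured audio frame
WINDOW_MS = 500  # assumed window handed to the transcriber; hypothetical value

def live_transcripts(
    frames: Iterable[Tuple[str, bytes]],
    transcribe: Callable[[bytes], str],
) -> Iterator[Tuple[str, str]]:
    """Yield (speaker, text) as soon as each stream has a full window of audio."""
    buffers: Dict[str, List[bytes]] = defaultdict(list)
    for speaker, frame in frames:
        buffers[speaker].append(frame)
        if len(buffers[speaker]) * FRAME_MS >= WINDOW_MS:
            # The label comes from the stream, so the text is speaker-labelled from the start.
            yield speaker, transcribe(b"".join(buffers[speaker]))
            buffers[speaker].clear()
    for speaker, buf in buffers.items():  # flush partial windows when the input ends
        if buf:
            yield speaker, transcribe(b"".join(buf))
```

Because this is a generator, downstream detection can consume each window the moment it is yielded instead of waiting for the call to end. Smaller windows cut latency but give the transcriber less context. That trade-off is where the two-second budget gets spent.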

Post-call report (example)

Buying signal: asked for pricing to share with CFO

Risk: contract renews in March — short window

Step 3 — Detect: finding the moments that matter

Transcribed text is raw material; coaching requires understanding which moments in it actually matter. The detection step reads the live transcript and identifies the meaningful events: objections by type (price, timing, status quo, competitor), buying signals, competitor mentions, budget references, decision-maker cues, timelines, procurement hurdles and renewal dates.

In ConversationPilot this runs on Claude Haiku 4.5, a fast model chosen specifically because detection has to keep pace with a live conversation. The model is not just spotting keywords — it is interpreting intent, so a prospect saying they are happy with their current provider registers as a status-quo objection rather than a passing remark. Each detected moment is what triggers the next step. Without accurate, fast detection, the coach would either miss the moments that count or fire prompts at the wrong time, so this step is where the conversational intelligence really lives.
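Because detection is model-driven, its output needs checking before it can trigger a prompt. The sketch below assumes the detection model replies with a JSON list of events. The event labels mirror the categories listed above, but the exact strings, field names and schema are hypothetical. Anything malformed is dropped rather than guessed at.

```python
import json
from dataclasses import dataclass
from typing import List

# Labels follow the categories named in the text; the exact strings are hypothetical.
ALLOWED_EVENTS = {
    "objection_price", "objection_timing", "objection_status_quo", "objection_competitor",
    "buying_signal", "competitor_mention", "budget", "decision_maker",
    "timeline", "procurement", "renewal_date",
}

@dataclass
class DetectedEvent:
    type: str
    speaker: str
    quote: str

def parse_detections(model_output: str) -> List[DetectedEvent]:
    """Parse the model's JSON reply, keeping only well-formed, known events."""
    try:
        raw = json.loads(model_output)
    except json.JSONDecodeError:
        return []  # a bad reply means no prompt, which is better than a wrong one
    if not isinstance(raw, list):
        return []
    events = []
    for item in raw:
        if not isinstance(item, dict):
            continue
        if item.get("type") in ALLOWED_EVENTS and item.get("speaker") in {"rep", "prospect"}:
            events.append(DetectedEvent(item["type"], item["speaker"], str(item.get("quote", ""))))
    return events
```

For example, a reply tagging “We're happy with our current provider” as `objection_status_quo` from the prospect passes through. An unknown label is silently discarded.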

Step 4 — Prompt: the right move, in under two seconds

Detection is only useful if it turns into a move the rep can make. The prompting step takes a detected moment and generates a single, glanceable suggestion: the next best question, a specific objection response, a qualification prompt, or a nudge to stop talking and listen. It is condensed to one line so the rep can absorb it without losing their place in the conversation.

The defining constraint is speed. ConversationPilot targets a sub-two-second budget end to end, so the prompt appears almost as soon as the prospect finishes speaking — early enough to act on before the next sentence. The guidance is specific to what was just said, not a generic tip the rep has to translate. And it is suggestive, never coercive: the rep stays in control and decides whether to use it. This is the payoff of the whole loop — the right move, in the moment it can still change the outcome.
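The page states the two-second target but not what happens when it is missed. One plausible policy is to discard any prompt that would arrive after the budget rather than show it late. The sketch below illustrates that policy; it is an assumption, not documented ConversationPilot behaviour, and the names are hypothetical. The clock is injectable so the policy can be tested.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

BUDGET_S = 2.0  # end-to-end target from the prospect's last word to the prompt

@dataclass
class Prompt:
    text: str
    utterance_end: float  # time.monotonic() reading when the prospect stopped speaking

def deliverable(prompt: Prompt, now: Callable[[], float] = time.monotonic) -> Optional[str]:
    """Return the prompt text if it is still inside the budget, otherwise None."""
    if now() - prompt.utterance_end > BUDGET_S:
        return None  # stale: the conversation has moved on, so showing it would distract
    return prompt.text
```

A monotonic clock is used because wall-clock time can jump, for example during an NTP adjustment, which would make the elapsed-time check unreliable.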

The scorecard running underneath

Alongside the moment-to-moment prompts, an AI coach keeps a running model of the whole conversation's completeness. ConversationPilot maintains a live qualification scorecard throughout the call — for sales: Need, Budget, Authority, Timeline, Competition and Current Solution — marking each covered, partial or still open as the conversation progresses.

This is coaching at a different timescale than the individual prompt. While a prompt addresses the current moment, the scorecard addresses the arc of the call: it tells the rep, at a glance, what they still have to uncover before they hang up. The two work together. A prompt might surface the exact question to ask about decision-making authority precisely because the scorecard shows Authority still open. The scorecard rolls into a single call score and feeds the post-call report, so the same structure that guides the live call also grounds the coaching afterward.
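The scorecard behaves like a small state machine. The sketch below uses the sales criteria listed above, with two assumptions that are not taken from the page. First, a criterion's status only moves forward during a call. Second, the call score is a simple average, with Covered counting as 1, Partial as 0.5 and Open as 0. All names are hypothetical.

```python
from enum import IntEnum
from typing import Dict, List

class Status(IntEnum):
    OPEN = 0
    PARTIAL = 1
    COVERED = 2

CRITERIA = ["Need", "Budget", "Authority", "Timeline", "Competition", "Current Solution"]

def new_scorecard() -> Dict[str, Status]:
    return {c: Status.OPEN for c in CRITERIA}

def update(card: Dict[str, Status], criterion: str, status: Status) -> None:
    # Assumption: progress never goes backwards, so a covered item stays covered.
    card[criterion] = max(card[criterion], status)

def call_score(card: Dict[str, Status]) -> float:
    # Hypothetical scoring: Covered = 1, Partial = 0.5, Open = 0, scaled to 0-100.
    return 100 * sum(int(s) for s in card.values()) / (2 * len(card))

def still_open(card: Dict[str, Status]) -> List[str]:
    """Criteria the rep has not touched yet: candidates for the next prompt."""
    return [c for c, s in card.items() if s == Status.OPEN]
```

`still_open` is the link between the two timescales. If it returns `"Authority"`, the prompting step has a reason to surface a decision-making question.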

Step 5 — Reinforce: coaching after the call

The loop does not end at hang-up. Live coaching changes the current call; post-call reinforcement cements the lesson and removes the admin. The moment the call ends, ConversationPilot generates a structured report automatically — an executive summary, key points, objections raised, buying signals, risks, recommended next actions, CRM notes and a follow-up email draft.

This runs on a stronger model, Claude Sonnet 4.6, which can afford to be thorough because it is no longer racing a live conversation. That split — fast model for live prompts, strong model for deep analysis — is what lets the coaching be both instant in the moment and deep afterward. The report is itself a coaching artefact: the rep can review what they missed before the next call, and a manager can coach from specific, real moments in a call review library. Fast guidance live, thorough reflection after, with each reinforcing the other.
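The fast-live, strong-after split amounts to a routing table. The sketch below makes that concrete. The model names are the display names used on this page, not API model identifiers. The task names are hypothetical. The page says only "fast model for live prompts", so assigning live prompting to the same model as detection is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelChoice:
    model: str               # display name from this page, not an API model identifier
    latency_sensitive: bool  # True when the result must land inside the live budget

# Hypothetical task names; "live_prompt" on the detection model is an assumption.
ROUTES = {
    "live_detection": ModelChoice("Claude Haiku 4.5", True),
    "live_prompt": ModelChoice("Claude Haiku 4.5", True),
    "post_call_report": ModelChoice("Claude Sonnet 4.6", False),
}

# Sections of the automatic post-call report, as listed above.
REPORT_SECTIONS = (
    "executive summary", "key points", "objections raised", "buying signals",
    "risks", "recommended next actions", "CRM notes", "follow-up email draft",
)

def route(task: str) -> ModelChoice:
    """Pick the model tier for a task, failing loudly on anything unrecognised."""
    if task not in ROUTES:
        raise ValueError(f"unknown task: {task!r}")
    return ROUTES[task]
```

Keeping the split in one table makes the trade-off explicit. Any task flagged `latency_sensitive` must fit the live budget, and everything else can use the slower, more thorough tier.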

How playbooks shape what the coach says

The same conversation can call for very different coaching depending on what kind of call it is, so an AI coach needs a sense of context. ConversationPilot supplies this through AI Playbooks — Sales Discovery, Enterprise Sales, Customer Success, Investor Pitch and more — which sit underneath the loop and shape both what counts as a coachable moment and what the scorecard measures.

Mechanically, the playbook tunes the detection and prompting steps. On a discovery call the coach prioritises open questions and the qualification gaps that matter early; on an enterprise call it weights procurement and multi-stakeholder dynamics more heavily. The scorecard criteria shift to match the scenario. This is why the coaching feels relevant rather than rote — the loop is not running a single generic model of a good call, but the specific model that fits the conversation you selected before you dialled. A team can encode how its best reps run each call type into a playbook, so the coach guides everyone toward that proven standard rather than a textbook one.
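One way to picture a playbook tuning the loop is as data: a scorecard definition plus weights that decide which detected events deserve a prompt. In the sketch below, the weights, the event names, the threshold and the Enterprise Sales scorecard criteria are all hypothetical. The page says only that enterprise calls weight procurement and multi-stakeholder dynamics more heavily.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Playbook:
    name: str
    scorecard: List[str]
    # Hypothetical weights: how strongly each event type should trigger a prompt.
    event_weights: Dict[str, float] = field(default_factory=dict)

DISCOVERY = Playbook(
    "Sales Discovery",
    ["Need", "Budget", "Authority", "Timeline", "Competition", "Current Solution"],
    {"qualification_gap": 1.0, "open_question": 0.9, "procurement": 0.3},
)

ENTERPRISE = Playbook(
    "Enterprise Sales",
    ["Need", "Budget", "Authority", "Timeline", "Procurement", "Stakeholders"],  # hypothetical
    {"procurement": 1.0, "multi_stakeholder": 0.9, "qualification_gap": 0.6},
)

def rank_events(playbook: Playbook, events: List[str], threshold: float = 0.5) -> List[str]:
    """Keep the events this playbook cares about, most important first."""
    weighted = [(playbook.event_weights.get(e, 0.0), e) for e in events]
    weighted.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep call order
    return [e for w, e in weighted if w >= threshold]
```

The same three detected events produce one prompt-worthy moment under Sales Discovery and three under Enterprise Sales. This is how a single detection model can feel tuned to the call type.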

Why the coach stays accurate and honest

A coaching loop is only worth following if its inputs are accurate and its claims are honest, so both are designed in deliberately. Accuracy starts with the dual-stream capture: because the rep and the counterpart are separate channels, every detected signal and every speaking metric is attributed to the right person rather than inferred from a mixed recording. A flagged objection genuinely came from the prospect; a talk-listen ratio reflects who actually spoke.

Honesty is about not overclaiming. The optional webcam analysis returns banded engagement indicators (High, Moderate or Low, plus states such as Attention Shift or Camera Off), each accompanied by a confidence level. It never asserts lie detection, emotion certainty or mind reading. The coach guides; it does not pretend to know more than it can. That restraint is what makes the loop trustworthy: a rep can rely on the signals where they are strong and is never misled by a confident-sounding claim the system cannot support. It is also what lets the coach earn a lasting place on real calls instead of being switched off after a week of irritation. You also remain responsible for complying with call-recording and consent laws in your jurisdiction.
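Restraint can be built into the output type itself. The sketch below maps raw visual measurements to the coarse bands named above and always attaches a confidence level. The output has no field for emotion, truthfulness or intent, so those claims cannot be made. The thresholds, the 0-to-1 input scales and the function name are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EngagementReading:
    band: str        # "High", "Moderate", "Low", "Attention Shift" or "Camera Off"
    confidence: str  # "high", "medium" or "low"
    # Deliberately no field for emotion, truthfulness or intent.

def to_reading(camera_on: bool, gaze_on_screen: bool, score: float, signal_quality: float) -> EngagementReading:
    """Map raw measurements (0..1 scales, hypothetical) to a coarse band plus confidence."""
    if not camera_on:
        return EngagementReading("Camera Off", "high")  # the absence of video is directly observable
    if signal_quality >= 0.8:
        confidence = "high"
    elif signal_quality >= 0.5:
        confidence = "medium"
    else:
        confidence = "low"
    if not gaze_on_screen:
        return EngagementReading("Attention Shift", confidence)
    if score >= 0.66:
        band = "High"
    elif score >= 0.33:
        band = "Moderate"
    else:
        band = "Low"
    return EngagementReading(band, confidence)
```

Reporting a band instead of the raw score is itself a form of honesty. It avoids implying a precision the measurement does not have.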

AI sales coaching vs. manual call review

Capability	ConversationPilot AI	Manual call review
When coaching happens	Live, on every call	Days later, sampled calls
Listening	Dual-stream, automatic	Manager listens manually
Detecting key moments	Live, model-driven	Whatever the reviewer catches
Prompting the rep	Under 2 seconds, in-call	Not possible after the fact
Coverage	100% of calls	A handful per week
Post-call report	Automatic, structured	Manual notes