Radiology report drafting needs assistive AI that is transparent and verifiable in real time. RaDialog Copilot explores how confidence-aware NLP can support radiologists during drafting. It integrates calibrated confidence methods into a feedback loop that aligns AI suggestions with clinician input, making drafting both faster and more verifiable.
We compared three families of confidence estimation, weighing latency against calibration quality: logit-based entropy, sentence-embedding calibration (Duan), and entailment-based NLI (Kuhn).
| Method | Family | Latency (approx.) | Notes | Verdict |
|---|---|---|---|---|
| Simple Entropy | Logit-based baseline | ~2 s | Baseline calibration; low semantic signal | Baseline only |
| Duan (med. ST) | Sentence embeddings (medical) | ~30 s | Best ascending calibration; domain-trained | Most promising |
| Kuhn (NLI) | Entailment (DeBERTa-MNLI) | ~9.5 s | Bidirectional entailment; non-linear below 0.6 | Inconsistent below 0.6 |
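The two simpler scores in the table can be sketched as follows. This assumes raw per-step logits and a handful of sampled completions are available. `token_entropy` is the simple-entropy baseline. `semantic_entropy` follows the Kuhn-style idea of grouping samples by bidirectional entailment, but it scores clusters by sample counts rather than sequence likelihoods. The `entails` callable is a hypothetical stand-in for a DeBERTa-MNLI classifier. This is an illustration, not the project's implementation.

```python
import math
from typing import Callable, List, Sequence


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (nats) of the softmax over one decoding step's logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)


def semantic_entropy(samples: List[str],
                     entails: Callable[[str, str], bool]) -> float:
    """Cluster sampled completions by bidirectional entailment, then return
    the entropy of the cluster frequencies (count-based variant)."""
    if not samples:
        raise ValueError("need at least one sample")
    clusters: List[List[str]] = []
    for s in samples:
        for cluster in clusters:
            rep = cluster[0]
            # Two completions share a meaning only if each entails the other.
            if entails(rep, s) and entails(s, rep):
                cluster.append(s)
                break
        else:
            clusters.append([s])  # no match: start a new meaning cluster
    n = len(samples)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)


# Hypothetical toy entailment check standing in for an NLI model.
def toy_entails(a: str, b: str) -> bool:
    return a.strip().lower() == b.strip().lower()
```

For example, `semantic_entropy(["No effusion.", "no effusion.", "Small effusion."], toy_entails)` forms two clusters of sizes 2 and 1. Lower values indicate that the samples agree in meaning.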
Unlike one-shot text assistants, RaDialog Copilot responds as the radiologist types, surfacing confidence-weighted completions that can be accepted with a keystroke or ignored without disruption.
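The accept-or-ignore interaction can be modeled as a tiny keystroke handler. This is a minimal sketch that assumes Tab is the accept key; `ACCEPT_KEY`, `Suggestion`, and `handle_key` are hypothetical names, not the system's API. The handler shows why ignoring a suggestion is non-disruptive: any other key is treated as ordinary typing, and the stale suggestion is simply dropped.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ACCEPT_KEY = "\t"  # hypothetical: Tab accepts the current suggestion


@dataclass
class Suggestion:
    text: str          # completion rendered as ghost text after the cursor
    confidence: float  # calibrated confidence in [0, 1]


def handle_key(draft: str, suggestion: Optional[Suggestion],
               key: str) -> Tuple[str, Optional[Suggestion]]:
    """Return (new_draft, remaining_suggestion) after a single keystroke."""
    if suggestion is not None and key == ACCEPT_KEY:
        # Accept: splice the completion into the draft and clear it.
        return draft + suggestion.text, None
    # Any other key is normal typing; the old suggestion is discarded
    # so that it never blocks or rewrites what the radiologist types.
    return draft + key, None
```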

*Figure: The complete RaDialog Copilot interface.*
The calibration and responsiveness gains translate into smoother workflows: fewer edits, faster acceptance, and less frustration with overconfident AI text.
| Decision | Rationale | Trade-off |
|---|---|---|
| Fast vs. SOTA modes | ~2 s for a responsive UX; ~20 s when high certainty is needed | Speed vs. calibration quality |
| 0.7 confidence threshold | Balances acceptance rate and precision | May hide rare, low-confidence findings |
| Hybrid early stop | Avoids jitter and run-on sentences | Occasionally clips a clause |
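The threshold and early-stop decisions can be sketched together. The 0.7 cutoff comes from the table above. The meaning of "hybrid" is an assumption here: generation is read as stopping at the first sentence-final punctuation or at a token budget, whichever comes first. `SENTENCE_END`, `MAX_TOKENS`, `gate`, and `early_stop` are hypothetical names, and the budget of 40 tokens is illustrative only.

```python
from typing import Iterable, List, Optional

CONFIDENCE_THRESHOLD = 0.7      # from the design table
SENTENCE_END = (".", "?", "!")  # assumption: sentence-final punctuation
MAX_TOKENS = 40                 # hypothetical token budget


def gate(completion: str, confidence: float) -> Optional[str]:
    """Surface the completion only if its calibrated confidence clears 0.7."""
    return completion if confidence >= CONFIDENCE_THRESHOLD else None


def early_stop(tokens: Iterable[str], max_tokens: int = MAX_TOKENS) -> str:
    """Assumed hybrid stop: cut at the first sentence boundary or at the
    token budget, whichever is reached first."""
    out: List[str] = []
    for tok in tokens:
        out.append(tok)
        if tok.rstrip().endswith(SENTENCE_END) or len(out) >= max_tokens:
            break
    return "".join(out)
```

A sentence boundary keeps suggestions from running on. The budget bounds latency when no boundary appears, at the cost of the occasional clipped clause noted in the table.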