RadAgent improves chest CT interpretation with explainable reasoning
RadAgent, a tool-using AI agent, generates chest CT reports through stepwise reasoning with fully inspectable decision traces, improving clinical accuracy by 6.0 macro-F1 points over the 3D VLM baseline CT-Chat while introducing faithfulness as a new capability.
Developers building medical AI systems can use RadAgent's tool-augmented reasoning approach to create interpretable, auditable decision traces that clinicians can inspect and validate, moving beyond opaque end-to-end models toward trustworthy clinical AI.
RadAgent addresses a critical limitation in vision-language model (VLM) approaches to medical imaging: the lack of interpretable reasoning. While VLMs have advanced CT interpretation and report generation, they typically present only final outputs without exposing the reasoning process, leaving clinicians unable to inspect, validate, or refine intermediate decisions. RadAgent restructures this task as an explicit, tool-augmented, iterative reasoning process. Each generated chest CT report is accompanied by a fully inspectable trace documenting intermediate decisions and tool interactions, allowing clinicians to understand exactly how reported findings are derived.
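To make the idea of an inspectable trace concrete, here is a minimal sketch of a tool-calling loop that records every step next to the final report. It is not RadAgent's implementation: the names `TraceStep`, `ReportWithTrace`, `run_agent`, the two stand-in tools, and their canned outputs are all hypothetical, and the fixed plan replaces the model-driven tool selection a real agent would perform.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class TraceStep:
    """One inspectable step: which tool ran, on what input, and what it returned."""
    tool: str
    tool_input: str
    observation: str


@dataclass
class ReportWithTrace:
    """A report paired with the ordered list of steps that produced it."""
    report: str
    trace: List[TraceStep] = field(default_factory=list)


def run_agent(
    plan: List[Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
) -> ReportWithTrace:
    """Run a fixed plan of (tool name, input) calls and log every step.

    Assumption: a real agent would pick the next tool based on earlier
    observations; a fixed plan keeps this sketch deterministic.
    """
    result = ReportWithTrace(report="")
    findings = []
    for tool_name, tool_input in plan:
        observation = tools[tool_name](tool_input)
        # Record the step so a reviewer can see where each finding came from.
        result.trace.append(TraceStep(tool_name, tool_input, observation))
        findings.append(observation)
    result.report = " ".join(findings)
    return result


# Hypothetical stand-in tools that return canned text (not real models).
def detect_nodules(region: str) -> str:
    return f"No nodules detected in {region}."


def assess_effusion(region: str) -> str:
    return f"No pleural effusion in {region}."


TOOLS = {"detect_nodules": detect_nodules, "assess_effusion": assess_effusion}
PLAN = [("detect_nodules", "right lung"), ("assess_effusion", "left pleural space")]

output = run_agent(PLAN, TOOLS)
for i, step in enumerate(output.trace, start=1):
    print(f"step {i}: {step.tool}({step.tool_input!r}) -> {step.observation}")
print("report:", output.report)
```

Because each sentence of the report maps to exactly one logged step, a reviewer can check any finding against the tool call that produced it, which is the property the trace is meant to provide.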
The paper demonstrates substantial improvements over CT-Chat, a 3D VLM baseline. Clinical accuracy improves by 6.0 points in macro-F1 (36.4% relative improvement) and 5.4 points in micro-F1 (19.6% relative improvement). Robustness under adversarial conditions improves by 24.7 points (41.9% relative improvement). Most significantly, RadAgent achieves 37.0% faithfulness—a new capability entirely absent in the 3D VLM counterpart. By grounding CT interpretation in explicit, tool-augmented reasoning traces, RadAgent advances the goal of transparent and reliable AI for radiology.
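The absolute and relative gains quoted above can be related with simple arithmetic. The sketch below assumes the relative improvement is defined as the absolute gain divided by the baseline score, and back-calculates the baseline and RadAgent scores that this implies. These implied values are derived here for illustration, are not figures reported in the summary, and are only approximate because the inputs are rounded. The helper name `implied_scores` is hypothetical.

```python
from typing import Tuple


def implied_scores(absolute_gain: float, relative_gain_pct: float) -> Tuple[float, float]:
    """Back-calculate (baseline, improved) scores from a reported gain.

    Assumes relative_gain_pct = 100 * absolute_gain / baseline.
    """
    baseline = absolute_gain / (relative_gain_pct / 100.0)
    return baseline, baseline + absolute_gain


# (absolute gain in points, relative gain in percent) as quoted in the text.
REPORTED = {
    "macro-F1": (6.0, 36.4),
    "micro-F1": (5.4, 19.6),
    "robustness": (24.7, 41.9),
}

for name, (abs_gain, rel_pct) in REPORTED.items():
    baseline, improved = implied_scores(abs_gain, rel_pct)
    print(f"{name}: implied baseline ~{baseline:.1f}, implied RadAgent ~{improved:.1f}")
```

Each pair of quoted figures is consistent with a single plausible baseline, which is a quick sanity check that the absolute and relative numbers describe the same comparison.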
Key facts
- RadAgent generates chest CT reports through stepwise, interpretable reasoning with fully inspectable traces of intermediate decisions and tool interactions
- Clinical accuracy improves by 6.0 macro-F1 points (36.4% relative) and 5.4 micro-F1 points (19.6% relative) compared to the CT-Chat baseline
- Robustness under adversarial conditions improves by 24.7 points (41.9% relative) over the 3D VLM counterpart
- RadAgent achieves 37.0% faithfulness, a capability entirely absent in CT-Chat
- The tool-using agent approach enables clinicians to inspect and validate how reported findings are derived from imaging data