Jun 9, 2026 · 1 min read · Applications & Use Cases

Fine-tuned Parakeet 0.6B medical ASR model released as open weights

u/MajesticAd2862, founder of Omi Health, fine-tuned NVIDIA's Parakeet TDT 0.6B v2 into Omi Med STT v1, a CC-BY-4.0 medical ASR model that runs locally on Mac, Windows, and Linux and benchmarks competitively against cloud transcription APIs.

r/LocalLLaMA·u/MajesticAd2862

Composite score: 5.6 out of 10. Weights applied: Novelty 25%, Impact 43%, Credibility 12%, Depth 20%.

Why it matters

Omi Med STT v1 is among the strongest locally running open models on this benchmark: only VibeVoice-ASR 9B, a model roughly 15× larger, posts a lower M-WER. At 0.6B parameters it reaches cloud-competitive M-WER while keeping patient audio entirely on-device.

Summary: our read of the original

u/MajesticAd2862, founder of Omi Health, fine-tuned NVIDIA's Parakeet TDT 0.6B v2 on clinical speech and released the result as Omi Med STT v1 under a CC-BY-4.0 license. The model is distributed via `pip install omi-med-stt` and ships with a runtime for Mac, Windows, and Linux that automatically selects the appropriate backend: MLX on Apple Silicon, NeMo on CUDA, and GGUF/parakeet.cpp on CPU. The default quantization is q8; a q4 variant was benchmarked but not shipped because its drug-name accuracy regressed too far. Training used approximately 127 hours of audio, roughly 71% real and 29% synthetic, drawn from licensed, openly available, and custom synthetic sources targeting hard-to-source medical speech.
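The backend selection described above can be pictured with a small sketch. This is not the package's actual code: `pick_backend`, `detect_backend`, the returned labels, and the use of `nvidia-smi` as a CUDA probe are all assumptions made for illustration. Only the mapping itself (MLX on Apple Silicon, NeMo on CUDA, GGUF/parakeet.cpp on CPU) comes from the summary.

```python
import platform
import shutil


def pick_backend(system: str, machine: str, has_cuda: bool) -> str:
    """Hypothetical selection rule mirroring the summary, not omi-med-stt's code."""
    # Apple Silicon machines report Darwin + arm64.
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    # A usable NVIDIA GPU routes to the NeMo backend.
    if has_cuda:
        return "nemo"
    # Everything else falls back to the CPU path (GGUF/parakeet.cpp).
    return "gguf"


def detect_backend() -> str:
    # Assumption: nvidia-smi on PATH is treated as a cheap stand-in for "CUDA available".
    has_cuda = shutil.which("nvidia-smi") is not None
    return pick_backend(platform.system(), platform.machine(), has_cuda)


if __name__ == "__main__":
    print(detect_backend())
```

Keeping the decision in a pure function (`pick_backend`) and the probing in a separate one makes the rule easy to check without access to each kind of hardware.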

Evaluation was conducted on a locked held-out split of 1,513 clips totaling 7.18 hours, scored using medical word error rate (M-WER), which counts errors on clinical terms only, and RTFx (× realtime speed). Omi Med STT v1 achieved an M-WER of 2.37%, an overall WER of 8.30%, and a drug M-WER of 4.75% at 145× RTFx on an A10 GPU. Among open/local models, only VibeVoice-ASR 9B beats it on M-WER (1.78%), but that model is approximately 15× larger, ran on an H100 in the evaluation, and posted a higher overall WER (11.10% versus 8.30%). Against cloud APIs, Omi Med STT v1 sits ahead of Deepgram Nova-3 Medical and Corti Transcripts on M-WER and is competitive with general-purpose cloud scribes, while keeping audio on-device.

A notable benchmark finding: both Gemini 3.1 Pro Preview and Gemini 3.5 Flash exhibited a hallucination failure mode. On a stress lane of 420 benign, non-diagnostic clips, they fabricated entire consultations (3.1 Pro on 33/420 clips, 3.5 Flash on 87/420), while every dedicated ASR model scored 0 on that lane.
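To make the difference between WER and M-WER concrete, here is a minimal sketch of one way such a metric could be computed. The benchmark's own scoring code is not shown in the source, so this makes several assumptions: single-word clinical terms, a caller-supplied term set, plain lowercase whitespace tokenization, and only substituted or deleted reference terms counted as clinical errors. Spurious inserted terms are ignored here. The function names are hypothetical.

```python
def align(ref, hyp):
    """Word-level Levenshtein alignment. Returns a list of (op, ref_word, hyp_word)."""
    n, m = len(ref), len(hyp)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match or substitution
                             dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1)         # insertion
    # Walk back from the corner to recover one optimal sequence of operations.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                ops.append(("ok" if cost == 0 else "sub", ref[i - 1], hyp[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    errors = sum(1 for op, _, _ in align(ref, hyp) if op != "ok")
    return errors / len(ref) if ref else 0.0


def m_wer(reference: str, hypothesis: str, clinical_terms: set) -> float:
    """Assumed definition: share of reference clinical terms that were substituted or deleted."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    total = sum(1 for w in ref if w in clinical_terms)
    wrong = sum(1 for op, r, _ in align(ref, hyp)
                if op in ("sub", "del") and r in clinical_terms)
    return wrong / total if total else 0.0


if __name__ == "__main__":
    ref = "patient takes metformin and lisinopril daily"
    hyp = "patient takes metformin and listen april daily"
    terms = {"metformin", "lisinopril"}
    print(f"WER   = {wer(ref, hyp):.3f}")
    print(f"M-WER = {m_wer(ref, hyp, terms):.3f}")
```

In this toy example a single mangled drug name costs two of six words in overall WER but one of two clinical terms in M-WER, which is why the two metrics can rank models differently, as with VibeVoice-ASR 9B above.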

Drug-name accuracy is identified as the primary weakness and the top priority for v2. Planned future work includes a streaming version and a multilingual variant.

Key facts

01 Omi Med STT v1 is a fine-tune of NVIDIA's Parakeet TDT 0.6B v2, released under CC-BY-4.0.
02 Installable via `pip install omi-med-stt`; auto-selects MLX (Apple Silicon), NeMo (CUDA), or GGUF/parakeet.cpp (CPU) backends.
03 Benchmark: 1,513 clips / 7.18 hours of held-out medical audio, scored by M-WER (errors on clinical terms only) and RTFx (× realtime speed).
04 Achieved an M-WER of 2.37% at 145× RTFx on an A10 GPU, a roughly 3.5× reduction from the base Parakeet TDT 0.6B v2's M-WER of 8.36%.
05 Spurious drug mentions dropped from 131 with the base model to 9.
06 Gemini 3.1 Pro Preview and Gemini 3.5 Flash fabricated entire consultations on 33/420 and 87/420 benign clips respectively; every dedicated ASR model scored 0 on that lane.
07 Training used ~127 hours of audio (~71% real, ~29% synthetic); a q4 quantization was tested but not shipped due to drug-name accuracy regression.
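The ratios behind these figures are easy to check by hand. The short script below redoes the arithmetic from the numbers quoted above. The processing-time line is illustrative only: it assumes RTFx means audio duration divided by processing time and that 145× held across the whole 7.18-hour set, neither of which the source states explicitly.

```python
# All inputs are figures quoted in the summary; derived values are illustrative.
base_mwer, tuned_mwer = 8.36, 2.37
mwer_reduction = base_mwer / tuned_mwer              # the "~3.5x" improvement

spurious_base, spurious_tuned = 131, 9
spurious_reduction = spurious_base / spurious_tuned  # fold reduction in spurious drug mentions

# Assumption: RTFx = audio duration / processing time, applied to the full benchmark.
audio_seconds = 7.18 * 3600
processing_seconds = audio_seconds / 145

hallucination_rate = {
    "Gemini 3.1 Pro Preview": 33 / 420,
    "Gemini 3.5 Flash": 87 / 420,
}

train_hours = 127
real_hours = train_hours * 0.71
synthetic_hours = train_hours * 0.29

size_ratio = 9 / 0.6  # VibeVoice-ASR 9B versus the 0.6B model

if __name__ == "__main__":
    print(f"M-WER reduction: {mwer_reduction:.2f}x")
    print(f"Spurious drug mentions: {spurious_reduction:.1f}x fewer")
    print(f"Benchmark audio at 145x realtime: {processing_seconds:.0f} s")
    for name, rate in hallucination_rate.items():
        print(f"{name}: fabricated output on {rate:.1%} of stress clips")
    print(f"Training audio: {real_hours:.0f} h real, {synthetic_hours:.0f} h synthetic")
    print(f"Size ratio: {size_ratio:.0f}x")
```

On these numbers the real/synthetic split works out to roughly 90 and 37 hours, and the Flash model fabricated output on about one in five stress clips.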

Topics

#fine-tuning #open-source #medical-asr #benchmarks #model-release

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 17:05 UTC.
