⏳🔧
Whisper Hangs at 99%? 8 Fixes That Work
Most common cause: RAM pressure
Second most common: ANE conflict
Fastest fix: Restart app
Permanent fix: Switch model size
TL;DR: Whisper transcription stuck at 99% on Mac usually means one of three problems: memory pressure (Whisper large-v3 needs 10 GB RAM peak; runs out on 8 GB Macs), Apple Neural Engine conflict (another app is already using ANE for ML inference), or corrupt model file. The fastest fix: quit the Whisper app, free RAM by closing Chrome and Slack, restart the app. The permanent fix: switch to large-v3-turbo or smaller which uses 6 GB RAM peak. This guide covers all 8 causes and fixes, applicable to whisper.cpp, MetaWhisp, MacWhisper, SuperWhisper, and any other Whisper-based Mac app.
Whisper transcription stuck at 99 percent diagnostic diagram showing RAM pressure ANE conflict corrupt model and swap thrashing root causes on Mac

Why Does Whisper Get Stuck at 99% on Mac?

When Whisper appears to hang at 99% progress, the model has finished most of the inference work but cannot complete the final step. The progress bar shows 99% because the encoder pass succeeded and most decoder iterations completed, but the final token generation or output flush is blocked. The block usually has one of three causes:
  1. Memory pressure — macOS is swapping to SSD because Whisper exceeded available RAM. Each swap operation is 100-1000× slower than RAM access, which makes the final 1% take 30-300× longer than the previous 99%.
  2. Apple Neural Engine conflict — Another app on your Mac is using ANE for ML inference. Whisper queues behind it. On macOS 14+, only one app can use ANE at a time per Apple's Core ML documentation.
  3. Corrupt model file — The Whisper model file on disk got partially written or interrupted during download. The final inference step reaches a malformed weight and stalls.
I'm Andrew Dyuzhov, solo founder of MetaWhisp. I built MetaWhisp's Whisper pipeline on Apple Neural Engine and have worked through this exact failure mode while developing the app. This guide explains each root cause documented in the Whisper community (GitHub issues, Reddit threads, whisper.cpp discussions) and the fix that addresses it.
The "stuck at 99%" symptom is misleading because the percentage bar usually reflects encoder progress, not total work remaining. Whisper's encoder runs once per 30-second audio chunk and represents roughly 80-95% of the per-chunk compute. The decoder runs autoregressively for each output token and represents the remaining 5-20%. When the progress bar is showing "99%", it's typically reporting "encoder done, decoder in progress" — and the decoder can stall for reasons unrelated to encoder success. This architectural detail matters because it tells you where to look for problems: if you're stuck at 99% across multiple files of different lengths, the problem is in the decoder path (memory pressure, ANE contention, output flush) rather than in the encoder. If you're stuck before 99%, the problem is earlier — file format issues, audio sample rate mismatch, or model load failure. The two failure modes have different fix sets and approaching them with the wrong assumption wastes hours.

Fix 1: Check Activity Monitor for Memory Pressure

The most common cause of stuck-at-99% is memory pressure. macOS shows this visually in Activity Monitor. Steps to diagnose:
  1. Open Activity Monitor (Applications → Utilities → Activity Monitor)
  2. Click the Memory tab at the top
  3. Look at the Memory Pressure graph at the bottom
  4. Green = healthy, Yellow = approaching limit, Red = swapping to SSD
If the graph is yellow or red while Whisper is running, you have memory pressure. The fix depends on your Mac's RAM: To verify the fix worked, restart your Whisper app and watch Activity Monitor while it runs. If pressure stays green throughout, memory was the issue.

Fix 2: Close Apps Competing for Apple Neural Engine

On Apple Silicon Macs, only one app can use Apple Neural Engine at a time. If another app holds ANE — Photos doing face detection, Final Cut Pro doing scene analysis, an AI background-removal tool, Apple Intelligence Writing Tools — Whisper queues behind it. Apps known to use ANE significantly: To diagnose, check what's running:
  1. In Activity Monitor, sort by CPU descending
  2. Look for processes with high CPU that aren't your Whisper app
  3. Quit ML-heavy apps (Photos can be Force Quit safely)
  4. Restart your Whisper app
For users who run multiple ML apps regularly, the workaround is to manually schedule: dictate while Photos isn't doing background processing, then let Photos catch up after. There's no automatic ANE scheduling that respects app priorities.
Pro tip: Apple Intelligence's Writing Tools is one of the worst ANE-hoggers because it runs continuously in the background watching for text-rewrite opportunities. If you're stuck at 99% repeatedly, disable Apple Intelligence temporarily via System Settings → Apple Intelligence & Siri → toggle off, restart your Whisper app, and see if the symptom disappears.

Fix 3: Re-Download the Whisper Model File

Whisper models are large binary files (39 MB to 1.55 GB depending on size) downloaded from Hugging Face or the app vendor on first launch. If the download was interrupted, the file may be corrupted but appear complete. To verify model integrity:
  1. Find the Whisper model file on disk:
    • whisper.cpp: ~/whisper.cpp/models/ggml-large-v3-turbo.bin
    • MetaWhisp: ~/Library/Application Support/MetaWhisp/models/
    • MacWhisper: ~/Library/Application Support/MacWhisper/Models/
  2. Check file size against expected size from the official Hugging Face model card
  3. If size mismatch: delete and re-download
  4. For SHA-256 verification: shasum -a 256 ggml-large-v3-turbo.bin and compare against the published hash
To re-download: Re-downloading on a stable connection (wired Ethernet or 5 GHz Wi-Fi) eliminates the partial-write risk that causes most model corruption.
Whisper model file integrity SHA-256 verification diagram showing good versus corrupt download paths with re-download action on Mac

Fix 4: Reduce Whisper Beam Size

Whisper's decoder supports beam search — exploring multiple possible token sequences to find the highest-probability output. Higher beam sizes produce better accuracy but use more memory and time. Default beam size is 5 in most implementations per OpenAI's reference transcribe.py. If you're stuck at 99% during the decode phase, reducing beam size frees memory and speeds up the final tokens: Beam size 1 trades some accuracy (0.5-1.5 percentage points WER on edge cases) for substantially faster decode and less memory. For most dictation use cases, the accuracy hit is imperceptible while the speed and stability gain is meaningful.
The beam size parameter has an underrated impact on stuck-at-99% issues because the decoder's memory footprint grows linearly with beam size. With beam_size=5 (default), the decoder holds 5 parallel hypothesis sequences plus their cumulative log-probabilities in memory. For long audio with many tokens, this can push total memory usage from "fits in RAM" to "needs swap" right at the end of inference, which manifests as the 99% hang. Beam size 1 cuts that working memory by 5×, often resolving the symptom on memory-constrained Macs without changing the model size. The accuracy trade-off is small enough that the official Whisper large-v3-turbo model card documents the default beam size mainly as a research-result reproducibility choice rather than a strong production recommendation. For production deployments, lower beam sizes are typically preferred specifically because they avoid the memory cliff that causes stuck-at-99% under load.

Fix 5: Switch to a Smaller Whisper Model

The single most reliable fix for stuck-at-99% on RAM-constrained Macs is using a smaller Whisper model. Memory requirements scale roughly linearly with model size:
ModelDisk sizeRAM peak (decode)WER (clean English)
tiny39 MB~1.0 GB~13%
base74 MB~1.2 GB~9%
small244 MB~2.1 GB~5.7%
medium769 MB~5.0 GB~4.4%
large-v3-turbo809 MB~6.0 GB~5.7%
large-v31.55 GB~10.0 GB~3.5%
For 8 GB Macs, **large-v3-turbo is the largest model that runs without swap** for most users. It uses 6 GB peak RAM, which fits comfortably with 2 GB headroom for macOS and active apps. The accuracy is within 2.2 percentage points of large-v3 (5.7% vs 3.5% WER) — for most dictation use cases, the difference is imperceptible. For 16 GB Macs, large-v3 works but you may need to close Chrome and Slack first. large-v3-turbo is the safer default unless you specifically need the marginal accuracy gain. Most Whisper apps let you switch models in settings:

Fix 6: Free Up RAM Before Starting Whisper

Even on adequate-RAM Macs, other apps can squeeze Whisper into swap. The aggressive cleanup pattern:
  1. Quit Chrome (often the #1 RAM hog; 3-5 GB across active tabs)
  2. Quit Slack desktop (1-2 GB)
  3. Quit Discord (1-2 GB)
  4. Quit Zoom or any video conferencing app (1 GB even when idle)
  5. Quit Docker Desktop if running (2-4 GB for active containers)
  6. Quit any IDE running language servers (VS Code, IntelliJ, etc. — 1-3 GB)
  7. Restart your Whisper app with a clean memory state
For routine dictation, the realistic version is to keep Chrome closed during long file-transcription sessions. For real-time dictation throughout the day, this is impractical — switch to a smaller model permanently rather than juggling app quits.
The Apple Activity Monitor memory documentation covers how to interpret the memory graph and which apps to suspect. For programmatic monitoring, the vm_stat command in Terminal shows real-time memory pressure if you want to script alerts when pressure climbs into yellow or red.

Fix 7: Restart Without Auto-Resume

Some Whisper apps support "resume" of interrupted transcriptions. If you killed a stuck-at-99% job and the app auto-resumed it on next launch, the stuck state can persist because the same memory pressure or ANE conflict re-occurs. To force a clean start:
  1. Quit the Whisper app
  2. Find any in-progress transcription cache files:
    • whisper.cpp: .tmp files in the working directory
    • MetaWhisp: ~/Library/Application Support/MetaWhisp/in-progress/
    • MacWhisper: ~/Library/Application Support/MacWhisper/processing/
  3. Delete the in-progress cache files
  4. Launch the app fresh
  5. Re-import the audio file as a new transcription
This forces the app to start from scratch with full available memory rather than picking up where the previous stuck state left off.

Fix 8: Disable Real-Time Display and Other Live Features

Many Whisper apps display the transcript in real-time as decoding proceeds. This is nice to watch but adds overhead — the UI update path competes with the inference path for memory and CPU. For stuck-at-99% specifically, disabling real-time display sometimes resolves the issue by removing the UI competition: The transcript still appears when processing completes; you just don't see it incrementally. For users debugging recurring 99% hangs, this is worth trying as a temporary diagnostic. The Obsidian forums, r/macapps subreddit, and whisper.cpp issue tracker all contain user-reported cases where disabling live preview resolved 99% hangs that other fixes hadn't addressed.
Mac Activity Monitor diagnostic screenshot for Whisper stuck at 99 percent showing red memory pressure swap usage and process memory annotations
Whisper stuck at 99 percent troubleshooting 8-step flowchart for Mac showing memory pressure ANE conflict model corruption beam size diagnostic path

How to Prevent Stuck-at-99% From Happening Again

After diagnosing and fixing your current stuck job, the prevention pattern:
  1. Match model size to RAM — On 8 GB Macs, large-v3-turbo or smaller. On 16 GB, large-v3 with care. On 24 GB+, anything.
  2. Disable Apple Intelligence during heavy Whisper work — Until Apple ships ANE scheduling that respects priorities, manual coordination is the only option
  3. Keep Chrome and Slack closed during long transcription sessions — The RAM saved often makes the difference between swap and no-swap
  4. Use beam size 1 or 2 instead of default 5 — Imperceptible accuracy hit, materially better stability
  5. Verify model file integrity after any interrupted download — SHA-256 check or just delete and redownload to be safe
  6. Pick an app that handles ANE conflicts gracefully — MetaWhisp falls back to GPU Metal automatically when ANE is busy, avoiding hangs entirely
For users debugging recurring issues across multiple Whisper apps, the common pattern is RAM-related. Upgrading to a Mac with more RAM is the most reliable long-term fix; otherwise, switching to large-v3-turbo or smaller models permanently solves most stuck-at-99% reports.
The prevention checklist above maps to specific failure modes documented in the openai/whisper GitHub issues tracker and whisper.cpp's issues. Memory pressure (matching model to RAM) is the single highest-yield fix because it eliminates the most common stall path. ANE contention (closing Apple Intelligence dependent apps) is the second-highest yield because Apple's scheduling does not yet expose priorities for ML workloads. The remaining items — reducing beam size, checking model integrity, picking apps with graceful ANE fallback — are smaller individual wins that compound. Users who apply the entire checklist rarely encounter stuck-at-99% jobs again; users who apply only some of it may still see occasional stalls during heavy multi-app workloads. The most reliable preventive setup is a 16 GB+ Mac running Whisper large-v3-turbo with Apple Intelligence and Chrome closed, which removes the three biggest stall drivers simultaneously.

Which Whisper Apps Have Fewer Stuck-at-99% Issues?

Different Mac Whisper apps handle memory pressure and ANE contention differently. From most-robust to most-fragile based on user reports: For users hitting recurring stuck-at-99% issues with whisper.cpp or custom Python implementations, switching to a desktop app with proper memory management (MetaWhisp, MacWhisper, SuperWhisper) usually resolves the symptom without needing to upgrade hardware.
The architectural difference that matters most for stuck-at-99% resilience is whether the app pre-allocates Whisper's working memory in a single contiguous block or grows it dynamically during inference. Pre-allocated apps fail fast at startup if there isn't enough RAM, which is annoying but predictable — you know immediately to use a smaller model. Dynamic-allocation apps appear to start successfully then stall later when RAM is exhausted mid-inference, which produces the stuck-at-99% symptom. MetaWhisp pre-allocates its Whisper memory at launch by default to surface RAM constraints early, per the architectural choice documented in whisper.cpp's README on memory management which influenced most Mac apps in this category. Users encountering recurring 99% hangs across multiple apps should consider this difference when choosing tools — the predictable failure mode of pre-allocated apps is actually preferable to the deceptive "looks like it's working" failure mode of dynamic-allocation apps.

Frequently Asked Questions About Whisper Stuck at 99%

Why does Whisper hang at 99% on my Mac?

Three most common causes: (1) memory pressure — Whisper large-v3 needs 10 GB RAM peak, runs out on 8 GB Macs and starts swapping to SSD; (2) Apple Neural Engine conflict — another app is using ANE so Whisper queues behind it; (3) corrupt model file from interrupted download. Check Activity Monitor's Memory Pressure graph: if yellow or red while Whisper runs, memory is the issue.

How do I fix Whisper stuck at 99% on 8 GB MacBook Air?

Switch from Whisper large-v3 (10 GB peak RAM) to large-v3-turbo (6 GB peak). Large-v3-turbo fits comfortably on 8 GB Macs without swap and produces only 2.2 percentage points worse accuracy (5.7% vs 3.5% WER), which is imperceptible for most dictation. Also close Chrome (3-5 GB), Slack (1-2 GB), and Zoom before running long transcription jobs.

Does Apple Intelligence cause Whisper to hang?

Sometimes. Apple Intelligence's Writing Tools runs continuously on Apple Neural Engine watching for text-rewrite opportunities. Only one app can use ANE at a time, so Whisper queues behind Apple Intelligence. If you're stuck at 99% repeatedly, temporarily disable Apple Intelligence via System Settings → Apple Intelligence & Siri → toggle off, restart your Whisper app, and see if the symptom disappears.

How long should Whisper take to finish the last 1%?

Normally a few seconds to a minute, depending on audio length. If it's been over 5 minutes at 99% with no progress, you have a hang — kill the job and apply the 8 fixes. The decoder's final token generation is fast on healthy systems; only memory pressure or ANE conflict produces multi-minute hangs. SSD swap operations are 100-1000× slower than RAM, which is why 99% can stretch from seconds to hours when memory is the issue.

Is there a way to verify my Whisper model file isn't corrupted?

Yes. Compare the file's SHA-256 hash against the official hash published on Hugging Face for that model version. Command: `shasum -a 256 ggml-large-v3-turbo.bin`. If the hash doesn't match what's published at huggingface.co/openai/whisper-large-v3-turbo, the file is corrupt — delete and re-download from a stable connection. File size alone isn't sufficient because a partially-written file can appear correct-sized but contain garbage in the middle.

Does reducing beam size affect accuracy?

Slightly. Whisper's default beam size is 5; reducing to 1 (no beam search) typically increases word error rate by 0.5-1.5 percentage points on edge cases. For most dictation use cases, the difference is imperceptible. Beam size 1 cuts decoder memory usage by 5×, often resolving stuck-at-99% on memory-constrained Macs. Worth trying as a first diagnostic step when you're hitting recurring hangs.

Will upgrading from 8 GB to 16 GB Mac fix the stuck issue?

For most users, yes. 16 GB RAM lets Whisper large-v3 (10 GB peak) run with 6 GB headroom for macOS and other apps. For users running heavy IDE workloads, Docker, or multiple browsers, even 16 GB can be tight — 24 GB is the comfortable point. For Mac Mini and MacBook Pro users buying new hardware, the RAM upgrade is often more valuable than CPU upgrade for ML-heavy workflows including Whisper, especially for users who keep many apps open during transcription.

Why doesn't this happen with MetaWhisp?

MetaWhisp uses large-v3-turbo by default (6 GB peak RAM, fits on 8 GB Macs) and falls back to GPU Metal when ANE is busy with another app's inference. The architecture eliminates the two most common stuck-at-99% causes — RAM overflow and ANE contention — at the design level rather than requiring user troubleshooting. Other Whisper-based apps that default to large-v3 are more prone to the symptom on memory-constrained hardware.

About the Author

Andrew Dyuzhov is the solo founder and CEO of MetaWhisp, a free on-device voice-to-text app for macOS that runs Whisper large-v3-turbo on Apple Neural Engine. He architected MetaWhisp's memory and ANE-fallback logic specifically to avoid the stuck-at-99% failure modes that plague Whisper apps using default beam sizes and large-v3 on 8 GB Macs. This troubleshooting guide reflects testing across whisper.cpp, MetaWhisp, MacWhisper, and SuperWhisper on M1, M2, and M3 hardware. Connect on X or GitHub.

Related Reading