
Why Does Whisper Keep Filler Words in Transcripts?
OpenAI's Whisper is designed for verbatim accuracy. The model transcribes what you actually said, including hesitations, restarts, and filler words. This is a deliberate design choice in the architecture: removing fillers would require post-processing logic that introduces accuracy trade-offs and language-specific tuning, per OpenAI's Whisper repository documentation. For some use cases (legal transcription, medical dictation, journalism quotes), verbatim output is exactly what you want, because every "um" is part of the record. For most other use cases (emails, Slack messages, blog drafts, meeting notes), fillers are noise that needs to come out before the transcript is useful.
The common filler words in English:
- Hesitation markers: um, uh, hmm, er, ah
- Discourse particles: like, you know, I mean, sort of, kind of
- Intensifiers used as fillers: basically, literally, actually, honestly
- Repetition fragments: "I-I-I think", "we-we need", "the the the"
- Restart phrases: "wait, actually", "no, scratch that", "let me start over"
Method 1: MetaWhisp Clean Mode (Automatic at Dictation Time)
The lowest-friction option is using a voice-to-text app that handles filler removal automatically as part of the transcription pipeline. MetaWhisp's Clean mode does this: your audio runs through Whisper for accurate transcription, then through a lightweight GPT pass that strips fillers and fixes grammar without changing your voice.
Setup:
- Download MetaWhisp (free, requires Apple Silicon M1+)
- Open MetaWhisp Settings → Processing Modes
- Select Clean mode
- For free tier: enter your OpenAI API key (costs roughly $0.01-0.05/day of normal use). For Pro tier: no API key needed.
- Press Right Option, speak naturally with fillers, release
- The text that pastes into your active app has fillers removed automatically
Before (Raw mode output):
"so um I was thinking like we should probably uh move the deadline you know to next Friday because um the design team needs like more time to you know finish things"
After (Clean mode output):
"I was thinking we should move the deadline to next Friday because the design team needs more time to finish."
The key property of Clean mode: it preserves your original meaning and phrasing but removes fillers, fixes grammar, and adds punctuation. It does NOT restructure sentences or upgrade vocabulary; that is what Rewrite mode does. Clean is the right setting when you want polished text that still sounds like you.
Pro tip: Set MetaWhisp's default mode to Clean for daily dictation. Switch to Raw mode only for specific use cases (meeting notes, journaling, legal transcription) where verbatim output matters. This way you get filler-free output automatically without having to remember to enable it per-recording.
Method 2: AI Post-Processing with ChatGPT or Claude
If you already have a Whisper transcript from another app (MacWhisper file transcription, Wispr Flow, raw whisper.cpp, Word M365 Transcribe), the easiest cleanup path is pasting it into ChatGPT or Claude with a targeted prompt.
The prompt that works well:
You are a transcript editor. Below is a verbatim voice transcript that contains
filler words and hesitations. Remove fillers (um, uh, like, you know, I mean,
basically, literally, actually as fillers), fix grammar, add proper punctuation,
and capitalize sentence starts. PRESERVE the original meaning, vocabulary, and
sentence structure. Do NOT rewrite, paraphrase, or upgrade word choice. Just
clean up.
Transcript:
[paste your transcript here]
This works in:
- ChatGPT (free tier sufficient for short transcripts; Plus tier for long ones)
- Claude (free tier sufficient; Pro for higher-volume use)
- Google Gemini (similar capability)
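If you reuse the prompt often, it can help to keep it in a small helper. The following is a minimal sketch, not part of any app: `build_cleanup_prompt` and `split_transcript` are hypothetical names, and the 1,500-word chunk size is an assumed value you would tune to whatever the chat tool accepts comfortably.

```python
# The cleanup prompt from above, stored once so every transcript gets the same wording.
CLEANUP_PROMPT = (
    "You are a transcript editor. Below is a verbatim voice transcript that contains "
    "filler words and hesitations. Remove fillers (um, uh, like, you know, I mean, "
    "basically, literally, actually as fillers), fix grammar, add proper punctuation, "
    "and capitalize sentence starts. PRESERVE the original meaning, vocabulary, and "
    "sentence structure. Do NOT rewrite, paraphrase, or upgrade word choice. Just "
    "clean up.\n\nTranscript:\n"
)


def build_cleanup_prompt(transcript: str) -> str:
    """Return the full text to paste (or send) for one transcript."""
    return CLEANUP_PROMPT + transcript.strip()


def split_transcript(transcript: str, max_words: int = 1500) -> list[str]:
    """Split a long transcript into chunks of at most max_words words.

    max_words is an assumed limit for this sketch, not a documented one.
    """
    words = transcript.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
```

For a long transcript, you would call `split_transcript` first and build one prompt per chunk. Splitting on a fixed word count can cut a sentence in half, so check the seams after pasting the cleaned chunks back together.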

Method 3: Regex Script for Bulk Processing
For batch processing dozens or hundreds of transcripts, AI APIs are overkill and slow. A regex-based script handles roughly 90% of filler removal at near-zero cost and millisecond latency.
Python script:
import re

# Patterns that are deleted outright.
FILLERS = [
    r'\b(?:um+|uh+|hmm+|er+|ah+|mhm+)\b',              # hesitations
    r'\b(?:like|you know|I mean|sort of|kind of)\b',   # discourse particles
    r'\b(?:basically|literally|actually|honestly)\b',  # intensifiers (use with care)
]
# Repetitions such as "the the the" or "I-I-I" collapse to a single word.
REPEATS = r'\b(\w+)(?:[-\s]+\1\b)+'

def remove_fillers(text):
    for pattern in FILLERS:
        text = re.sub(pattern, '', text, flags=re.IGNORECASE)
    text = re.sub(REPEATS, r'\1', text, flags=re.IGNORECASE)
    # Collapse runs of whitespace (this also flattens line breaks)
    text = re.sub(r'\s+', ' ', text)
    # Remove the space left in front of punctuation
    text = re.sub(r'\s+([.,!?])', r'\1', text)
    return text.strip()

# Process a transcript file
with open('transcript.txt', encoding='utf-8') as f:
    raw = f.read()
clean = remove_fillers(raw)
with open('transcript-clean.txt', 'w', encoding='utf-8') as f:
    f.write(clean)
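Bash one-liner for quick cleanup:
sed -E 's/\b(um|uh|like|you know|I mean|basically|literally)\b//gi' transcript.txt
The \b word boundary and the case-insensitive i flag are GNU sed extensions. If the stock sed on macOS rejects or ignores them, install GNU sed (available through Homebrew as gsed) and run the same command with gsed.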
Strengths: Zero cost. Millisecond processing. Easily scriptable for batch jobs (`for file in *.txt; do ... done`).
Weaknesses: Doesn't handle context. Removes "basically" even when the word is being used legitimately ("the basically free option"). Doesn't fix grammar. Doesn't restructure restart phrases.
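The batch loop mentioned under Strengths can also be written in Python. This is a sketch under a few assumptions: the function names are made up for illustration, the transcripts are UTF-8 `.txt` files in one folder, and only clear hesitation markers are stripped so that the context problem described under Weaknesses does not come up.

```python
import re
from pathlib import Path

# Conservative pattern: hesitation markers only (a choice made for this sketch).
HESITATIONS = re.compile(r'\b(?:um+|uh+|hmm+)\b', flags=re.IGNORECASE)


def clean_text(text: str) -> str:
    text = HESITATIONS.sub('', text)
    text = re.sub(r'[ \t]+', ' ', text)         # collapse runs of spaces, keep line breaks
    text = re.sub(r'\s+([.,!?])', r'\1', text)  # no whitespace before punctuation
    return text.strip()


def clean_folder(folder: str) -> int:
    """Write a '-clean.txt' copy next to every .txt file; return the number processed."""
    count = 0
    # sorted() materializes the file list before any new files are written.
    for path in sorted(Path(folder).glob('*.txt')):
        if path.stem.endswith('-clean'):
            continue  # skip output left by an earlier run
        cleaned = clean_text(path.read_text(encoding='utf-8'))
        path.with_name(path.stem + '-clean.txt').write_text(cleaned, encoding='utf-8')
        count += 1
    return count
```

Calling `clean_folder('transcripts')` leaves the originals untouched, which makes it easy to diff a raw file against its cleaned copy before trusting the pattern on a larger batch.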
Method 4: Manual Cleanup in Word or BBEdit
For one-off transcripts where you want full control, manual cleanup with find-and-replace in any text editor works. This is the fallback when other methods produce edge cases you need to fix anyway.
In Microsoft Word:
- Press Cmd+F to open the Find pane
- Click the gear icon → Advanced Find & Replace
- Enable Use wildcards in the search options
- Find: <um> → Replace: (empty) → Replace All. Word's wildcard syntax uses < and > as word-start and word-end anchors and has no alternation, so a regex such as \b(um|uh|hmm|er|ah)\b does not work here; run one search per filler (<uh>, <hmm>, <er>, <ah>)
- Repeat with discourse particles, one at a time: <like>, <you know>, <I mean>
- Manual pass for remaining edge cases
In BBEdit:
- Cmd+F for Find
- Enable Grep for regex search
- Search pattern: \b(um|uh|hmm|like|you know|I mean)\b
- Replace with empty string
- Use Replace All
What Counts as a Filler Word in Different Contexts?
The same word can be filler in one context and content in another.
Context-sensitive filler removal:
| Word | Filler use | Content use |
|---|---|---|
| like | "It was, like, really cold" | "I like coffee" |
| basically | "Basically, I mean, you know" | "This is basically a regex engine" |
| literally | "I literally just woke up" | "Translated literally from French" |
| actually | "It's, actually, kind of fine" | "That's actually wrong; the correct answer is X" |
| honestly | "Honestly, I don't, you know" | "He answered honestly about the bug" |
| you know | "It was, you know, complicated" | "You know my brother, right?" |
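One cheap way to approximate the distinction in the table is to remove an ambiguous word only when commas set it off on both sides, as in most of the filler examples above. The sketch below assumes the transcript is already punctuated; the function name is hypothetical, and the heuristic will miss fillers that appear without commas or at the end of a sentence.

```python
import re

AMBIGUOUS = r'(?:like|you know|actually|basically|literally|honestly)'
# ", like," in mid-sentence is treated as filler; "I like coffee" has no commas around it.
COMMA_BRACKETED = re.compile(r',\s*' + AMBIGUOUS + r'\s*,', flags=re.IGNORECASE)


def strip_parenthetical_fillers(text: str) -> str:
    """Remove ambiguous words only when a comma precedes and follows them."""
    return COMMA_BRACKETED.sub('', text)


print(strip_parenthetical_fillers("It was, like, really cold"))  # It was really cold
print(strip_parenthetical_fillers("I like coffee"))              # I like coffee
```

This errs on the side of keeping words: "I literally just woke up" passes through unchanged, so the remaining cases still need a manual or AI pass.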
How Many Filler Words Does the Average Person Use?
Linguistics research suggests filler frequency varies widely by speaker, context, and language. Per Wikipedia's overview of filler words in linguistics, English speakers use fillers at rates of roughly 2-5% of total spoken words in casual conversation, dropping to 1-2% in formal presentations and rising to 5-8% in spontaneous unprepared speech.
Practical implication: a 1,000-word raw Whisper transcript of casual conversation contains 20-50 filler words. A presentation transcript contains 10-20. An unprepared brainstorm transcript can contain 50-80. The cleanup workload scales accordingly.
For voice-to-text users who dictate routine work content (Slack, emails, notes), the filler rate tends to land around 3-4%, which means roughly 30-40 fillers per 1,000 words of dictation. Removing them by hand takes 2-3 minutes per 1,000 words; automatically via Clean mode or AI takes seconds.
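To see where your own transcripts fall relative to these ranges, you can count fillers directly. The following is a rough sketch: the word list is an assumption, a multi-word filler such as "you know" counts once, and content uses such as "I like coffee" are counted too, so the result is an upper estimate.

```python
import re

# Hesitations plus common discourse particles (an assumed list for this sketch).
FILLER_RE = re.compile(
    r"\b(?:um+|uh+|hmm+|er|ah|like|you know|I mean|sort of|kind of)\b",
    flags=re.IGNORECASE,
)


def filler_rate(text: str) -> tuple[int, int, float]:
    """Return (filler_count, word_count, fillers as a percentage of words)."""
    words = len(text.split())
    fillers = len(FILLER_RE.findall(text))
    rate = 100.0 * fillers / words if words else 0.0
    return fillers, words, rate


sample = "so um I was thinking like we should uh go"
print(filler_rate(sample))  # (3, 10, 30.0)
```

Short samples like this one give inflated percentages; the figure only becomes meaningful over a few hundred words or more.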
What's the Difference Between Filler Removal and Rewrite Mode?
Two distinct operations that often get confused:
- Filler removal (Clean / Correct mode): Removes fillers, fixes grammar, adds punctuation. Preserves your original meaning, sentence structure, and vocabulary. The output reads like a polished version of what you said.
- Rewrite mode: Restructures sentences, upgrades vocabulary, changes tone. The output reads like what you said was professionally rewritten by someone else. Useful for client emails or documentation, but loses your voice.
Original (Raw): "so um I was thinking like we should probably uh move the deadline you know to next Friday because um the design team needs like more time"
Clean mode: "I was thinking we should move the deadline to next Friday because the design team needs more time."
Rewrite mode: "I'd like to propose extending the deadline to next Friday. The design team requires additional time to deliver quality work."
Clean mode keeps your conversational voice ("I was thinking we should..."). Rewrite mode makes it more formal ("I'd like to propose..."). Use Clean for Slack messages and casual emails; use Rewrite for client communication and published documentation.
How Do Other Voice-to-Text Apps Handle Filler Removal?
A quick survey of how the major Mac voice-to-text apps approach filler word removal in 2026:
- MetaWhisp Clean mode: Built-in, uses GPT post-processing, preserves your voice. Free with own OpenAI key or included in Pro.
- Wispr Flow: Built-in AI editing on Pro tier ($12/month). Tends to lean toward Rewrite-style restructuring rather than minimal filler removal. Less voice preservation.
- SuperWhisper Custom Modes: User defines per-mode AI prompts. You can write a Clean-style prompt or a Rewrite-style prompt; the flexibility is the feature. Pro tier $8.49/month.
- Otter.ai: Has "Insights" that summarize meetings but doesn't offer per-transcript filler removal as a separate feature. Their default transcripts include fillers.
- MacWhisper: Pure Whisper output with no built-in cleanup. Run post-processing yourself via AI or script.
- Apple Dictation: Doesn't include AI cleanup. Output contains fillers verbatim.

Does Filler Removal Hurt Transcript Accuracy?
Filler removal is a post-processing step that doesn't affect the underlying Whisper transcription accuracy. The model still hears "um" and outputs it; the cleanup pass then removes it. The Whisper word error rate (5-7% on clean English per OpenAI's model card) measures core transcription accuracy, separate from filler retention.
What can hurt accuracy is over-aggressive filler removal that strips meaningful words. The intensifier problem (literally, actually, basically as both filler and content) is the most common source of accidental content loss. The fixes:
- Conservative regex patterns: Don't auto-strip intensifiers; only strip clear hesitation markers (um, uh, hmm)
- AI cleanup with context understanding: Let GPT or Claude decide which usage is filler vs content
- Two-pass workflow: Regex for clear wins, then manual or AI for ambiguous cases
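The two-pass workflow can be sketched in a few lines. This is an illustration under stated assumptions rather than a finished tool: `two_pass` is a hypothetical name, the first pass strips only um/uh/hmm, and the second pass deletes nothing and merely reports the ambiguous words so a person or an AI model can decide.

```python
import re

HESITATIONS = re.compile(r'\b(?:um+|uh+|hmm+)\b', flags=re.IGNORECASE)
AMBIGUOUS = re.compile(
    r'\b(?:like|literally|actually|basically|honestly)\b', flags=re.IGNORECASE
)


def two_pass(text: str) -> tuple[str, list[str]]:
    """Pass 1 strips clear hesitations; pass 2 only flags ambiguous words."""
    cleaned = HESITATIONS.sub('', text)
    cleaned = re.sub(r'\s+', ' ', cleaned).strip()  # tidy the gaps left by pass 1
    flagged = [m.group(0).lower() for m in AMBIGUOUS.finditer(cleaned)]
    return cleaned, flagged


text, review = two_pass("Um I literally just uh woke up")
print(text)    # I literally just woke up
print(review)  # ['literally']
```

Transcripts that come back with an empty review list need no second pass, which keeps AI usage limited to the files that contain ambiguous words.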
Frequently Asked Questions About Filler Word Removal
How do I automatically remove filler words from Whisper transcripts?
Use a voice-to-text app with built-in cleanup like MetaWhisp's Clean mode. It runs your audio through Whisper for accurate transcription, then through a lightweight GPT pass that removes fillers (um, uh, like, you know) and fixes grammar without changing your voice. Setup takes 5 minutes. Cost is roughly $0.01-0.05 per day on free tier with your own OpenAI key, or included in Pro tier.
Why does Whisper keep "um" and "uh" in transcripts?
Whisper is designed for verbatim accuracy. The training data (680,000 hours of audio with paired transcripts) preserved fillers because professional captioning retains them for accuracy. Whisper's neural decoder doesn't have an explicit filler-skipping layer like older rule-based ASR systems. Post-processing is the only way to remove fillers from Whisper output, which is what Clean mode and AI cleanup do.
What's the cheapest way to clean up Whisper transcripts?
For occasional cleanup: paste the transcript into free ChatGPT or Claude with a cleanup prompt. Zero cost for most users. For high-volume: write a regex script in Python or sed — millisecond processing, zero cost. For ongoing daily dictation: MetaWhisp Clean mode with your own OpenAI API key costs roughly $1-1.50 per month for typical use.
Should I use regex or AI for filler removal?
Use regex for clear hesitation markers (um, uh, hmm, er, ah) — fast, free, no false positives. Use AI (ChatGPT, Claude, or MetaWhisp Clean mode) for context-sensitive intensifiers (literally, actually, basically) where the word might be filler or content. Two-pass workflow combines both: regex first for cheap wins, AI second for ambiguous cases.
Does filler removal change my voice in the transcript?
Clean mode and similar "filler removal only" methods preserve your voice — they remove hesitations and fix grammar but keep your original sentence structure and vocabulary. Rewrite mode (a different operation) does change your voice by restructuring sentences and upgrading vocabulary. Use Clean for casual writing where authenticity matters; use Rewrite for formal client communication where polish matters more than voice.
Can I batch-process many Whisper transcripts at once?
Yes. For batch processing, use a regex script in Python or sed that loops over your transcript files. Each file processes in milliseconds. For higher quality batch processing with context awareness, write a small Python script that calls the OpenAI API or Anthropic API with the cleanup prompt for each file. Cost via API is roughly $0.001-0.005 per 1,000-word transcript.
What about filler words in other languages?
Each language has its own fillers. Spanish: "eh", "este", "o sea". French: "euh", "ben", "tu vois". German: "ähm", "halt", "äh". Russian: "ну", "это", "как бы". Whisper preserves these too. AI cleanup with GPT or Claude handles non-English fillers because the LLMs are multilingual. Regex needs language-specific patterns. MetaWhisp's Clean mode handles fillers for 99 Whisper-supported languages via its GPT post-processing.
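For the regex route, language-specific patterns can live in a lookup table. The sketch below is deliberately conservative and rests on an assumption of its own: it strips only hesitation sounds, and leaves out words such as "este", "halt", or "это" because they also occur as ordinary content words. The table and function names are hypothetical.

```python
import re

# Hesitation sounds only, keyed by language code (an assumed, partial table).
HESITATIONS_BY_LANG = {
    "en": ["um", "uh", "hmm"],
    "es": ["eh"],
    "fr": ["euh"],
    "de": ["ähm", "äh"],
}


def remove_hesitations(text: str, lang: str) -> str:
    """Strip the hesitation sounds listed for lang; unknown languages pass through."""
    words = HESITATIONS_BY_LANG.get(lang, [])
    if not words:
        return text
    # In Python 3, \b is Unicode-aware for str patterns, so "ähm" is matched as a whole word.
    pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'
    text = re.sub(pattern, '', text, flags=re.IGNORECASE)
    return re.sub(r'\s+', ' ', text).strip()


print(remove_hesitations("Das ist ähm ziemlich gut", "de"))  # Das ist ziemlich gut
```

Discourse particles in any language carry the same filler-versus-content ambiguity as "like" in English, so they are better left to the AI pass.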
How accurate is automatic filler removal?
For clear hesitation markers (um, uh, hmm), automatic removal is essentially 100% accurate — these are never content words. For discourse particles (like, you know), automatic removal hits 90-95% accuracy depending on the method. For intensifiers (literally, actually, basically), accuracy drops to 80-90% because context determines whether the word is filler or content. AI-based methods score higher on intensifiers than regex-based methods.
About the Author
Andrew Dyuzhov is the solo founder and CEO of MetaWhisp, a free on-device voice-to-text app for macOS that runs Whisper large-v3-turbo on Apple Neural Engine. He built MetaWhisp's Clean mode and Rewrite mode to handle filler removal and text polish for users who want dictated content that doesn't sound dictated. The four methods in this article reflect his hands-on experience tuning the Clean mode prompts, testing regex scripts against real Whisper transcripts, and working with users who needed transcript cleanup for journalism, legal, and academic workflows. Connect on X or GitHub.
Related Reading
- What Is Whisper large-v3-turbo? Local AI for Mac — the underlying transcription model
- Why Whisper Hallucinates in Silence — adjacent transcription quality issue
- Voice-to-Text for ADHD Writers — when Raw vs Clean mode matters
- 7 Best Voice-to-Text Apps for Mac (2026) — apps with built-in cleanup compared
- How to Transcribe an Audio File in Word on Mac — Word M365 + cleanup workflow