- Built-in cloud transcription works fine for most meetings — fast, accurate, free on paid plans. Audio sits on Zoom's servers 30 days.
- Zoom AI Companion adds summaries and action items, also cloud-processed.
- Local recording + on-device transcription is the private path: zero data leaves your Mac. Adds ~3 minutes of post-meeting work.
- Third-party tools (Otter, Fathom, Fireflies) add speaker diarization but bring back cloud privacy questions.
- For NDA, medical, legal meetings: switch Zoom Settings → Recording to Local, transcribe with MetaWhisp or Whisper Transcription. Audit with Little Snitch.
- What Zoom offers natively in 2026
- Method 1: Zoom cloud transcription (built-in)
- Method 2: Zoom AI Companion
- Method 3: Local recording + on-device transcription (private)
- Method 4: Third-party meeting tools
- The privacy reality of Zoom transcription
- All four methods compared
- How to set up private Zoom transcription on Mac
- Real-world workflows
- FAQ
- About the author
What Zoom offers natively in 2026
Zoom shipped three transcription products. All three process audio on Zoom's infrastructure.
Live captions
Free on every Zoom plan since 2022. Click CC in any meeting, choose Show captions. Real-time speech-to-text appears at the bottom of the call. Useful for accessibility, multilingual teams, and noisy environments. Live captions are processed in real time on Zoom's servers, with no storage by default unless you also enable recording. Latency is typically 1-2 seconds. Accuracy is decent on clean audio in supported languages. Off-script speech, heavy accents, and multi-speaker overlap reduce quality.
Audio transcription of cloud recordings
When you record a meeting to Zoom's cloud, Zoom can also generate a written transcript. Available on paid plans (Pro and higher). The transcript shows up alongside the recording in your Zoom web dashboard, usually within 30 minutes after the meeting ends.
Behind the scenes: cloud recording uploads the meeting to Zoom's storage. Their backend transcription service processes the audio, returns a VTT file (subtitles format) and a TXT file (plain text), and stores both alongside the recording.
This is the "default" transcription most users encounter. It's good enough for internal team meetings, training videos, and casual recordings.
Zoom AI Companion
Released 2023, expanded heavily in 2024-2025. AI Companion goes beyond raw transcription: it produces a structured meeting summary, identifies action items, breaks the recording into chapters, and lets you query the recording in plain English ("What did we decide about pricing?"). Available on Pro plans and higher (often included at no extra cost in 2026). Per Zoom's documentation, AI Companion data is processed on Zoom's infrastructure and does not contribute to model training without explicit admin opt-in.
For team meetings and sales calls where you want notes plus action items, AI Companion saves real time. For meetings where you don't want any AI to summarize the content, you can disable it per-meeting.
---
Method 1: Zoom cloud transcription (built-in)
How it works
- Sign in to zoom.us in a browser.
- Settings → Recording → enable Cloud recording.
- Within Cloud recording settings, enable Audio transcript.
- Schedule and host a meeting normally. Click Record → Record to the Cloud.
- End the meeting. Zoom processes for 5-30 minutes. You receive an email with the recording link and transcript file.
What you get
- VTT file (subtitle format with timestamps)
- TXT file (plain text transcript, no timestamps)
- MP4 recording with embedded captions
- Web-based playback with searchable transcript
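Because the VTT file is plain text, it can be post-processed on your own machine. The following is a minimal sketch, assuming cue timestamps use the hh:mm:ss.mmm form and that any speaker name is part of the cue text. The sample transcript and the function name vtt_to_cues are made up for illustration and are not Zoom's own tooling.

```python
import re

# Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.000".
# Assumption: timestamps always include the hours field.
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}\.\d{3}\s+-->\s+\d{2}:\d{2}:\d{2}\.\d{3}"
)


def vtt_to_cues(vtt_text):
    """Return a list of (start_timestamp, text) tuples from WebVTT content."""
    cues = []
    start = None   # start timestamp of the cue being read, if any
    lines = []     # text lines collected for that cue
    for raw in vtt_text.splitlines():
        line = raw.strip()
        if TIMING.match(line):
            start = line.split("-->")[0].strip()
            lines = []
        elif not line:
            # A blank line closes the current cue.
            if start is not None and lines:
                cues.append((start, " ".join(lines)))
            start, lines = None, []
        elif start is not None:
            lines.append(line)
        # Anything else (the WEBVTT header, cue numbers) is ignored.
    if start is not None and lines:
        cues.append((start, " ".join(lines)))
    return cues


# Hypothetical sample content, not real Zoom output.
SAMPLE = """WEBVTT

1
00:00:01.000 --> 00:00:04.000
Alex: Let's start with pricing.

2
00:00:04.500 --> 00:00:07.250
Dani: Agreed, the Pro tier first.
"""

cues = vtt_to_cues(SAMPLE)
for start, text in cues:
    print(f"[{start}] {text}")
```

Keeping the start timestamp with each cue makes it easy to jump back to the matching point in the recording when a line looks wrong.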
Strengths
- Zero post-meeting work — Zoom handles it
- Searchable across past recordings
- Built-in sharing (links to recording + transcript)
- Decent accuracy on clear audio
- Free with any paid Zoom plan
Weaknesses
- Audio recording stored on Zoom's servers (30 days default; longer based on plan)
- Transcript can be reviewed by Zoom employees for "service improvement" per their privacy policy
- Not HIPAA-compatible without a Business Associate Agreement (paid add-on)
- No speaker diarization in the basic transcript (just one block of text)
- Auto-deletes after retention period — extending costs more
When to use it
Internal team meetings. All-hands. Training videos. Anything where the content is internal, low-sensitivity, and you want zero friction.
---
Method 2: Zoom AI Companion
How it works
Pre-meeting: in your Zoom account settings, enable AI Companion (admin level on business plans, account level on Pro).
Per-meeting: as host, click the AI Companion icon during the call to start summarization. Or set it to auto-start on every meeting.
After the meeting, AI Companion produces a structured summary that includes:
- Brief meeting overview (3-5 sentences)
- Key topics discussed (bullet points)
- Action items with owners
- Decisions made
- Chapter timestamps for navigation
- Q&A: ask plain-English questions about the meeting
What it's good for
Sales calls. Customer success interviews. Cross-functional planning meetings. Any meeting where you'd write a summary email afterward: AI Companion drafts that for you in 30 seconds.
Privacy posture
Per Zoom's privacy documentation, AI Companion processes audio on Zoom's infrastructure. Data is not used to train Zoom's AI models without explicit account-admin opt-in. However, audio and transcripts still flow through Zoom's pipeline during processing, the same posture as cloud recording transcription. For sensitive meetings (legal, medical, financial, HR), AI Companion adds value but doesn't change the underlying privacy reality: your audio is processed in Zoom's cloud.
Cost
Included with Pro plans and higher in most regions in 2026. Zero incremental cost beyond your Zoom subscription. Compared to standalone meeting AI tools (Otter Pro $17/mo, Fathom $19/mo, Fireflies $20/mo), AI Companion is essentially free if you already pay for Zoom Pro.
---
Method 3: Local recording + on-device transcription (private)
This is the private path. The audio never leaves your Mac.
The architecture
┌─────────────────────────────────────────────────────────┐
│                        YOUR MAC                         │
│                                                         │
│  ┌──────────┐   ┌────────────┐   ┌─────────────┐        │
│  │ Zoom     │──▶│ M4A file   │──▶│ Whisper     │        │
│  │ (local   │   │ saved to   │   │ large-v3    │        │
│  │ record)  │   │ ~/Zoom/    │   │ on ANE      │        │
│  └──────────┘   └────────────┘   └─────────────┘        │
│       │                                 │               │
│       └─ network: only Zoom call        │               │
│                                         ▼               │
│                                  ┌─────────────┐        │
│                                  │ Plain text  │        │
│                                  │ transcript  │        │
│                                  └─────────────┘        │
│                                                         │
│  ┌──────────── NETWORK BOUNDARY ────────────┐           │
│  │ ❌ no transcript / audio egress          │           │
│  └──────────────────────────────────────────┘           │
└─────────────────────────────────────────────────────────┘
The Zoom call itself uses the network (you can't eliminate that). But the recording and transcription stay local.
Setup
Switch Zoom from Cloud to Local recording
In Zoom Settings → Recording: disable Cloud recording (if enabled), enable Local recording. Choose a folder for saved recordings (default: ~/Documents/Zoom).
Record the meeting locally
As host, in any meeting click Record → Record on this Computer. Zoom captures both video (.mp4) and audio (.m4a) to your Mac. End the meeting; Zoom processes the file in 1-3 minutes for a 30-minute call.
Find the audio-only file
In your Zoom recordings folder, look for audio_only_*.m4a. This is the cleaner input for transcription — no video processing overhead.
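Locating those files can be scripted. This is a small sketch, assuming recordings live under ~/Documents/Zoom (one subfolder per meeting) and follow the audio_only_*.m4a pattern described above. The function name find_audio_only is illustrative; adjust the folder and pattern if your Zoom version names files differently.

```python
from pathlib import Path


def find_audio_only(recordings_root, pattern="audio_only_*.m4a"):
    """Return matching audio files under recordings_root, newest first."""
    root = Path(recordings_root).expanduser()
    if not root.is_dir():
        return []
    # rglob searches the root and every per-meeting subfolder.
    matches = root.rglob(pattern)
    return sorted(matches, key=lambda p: p.stat().st_mtime, reverse=True)


# Assumed default folder from the settings step; change it to match yours.
for path in find_audio_only("~/Documents/Zoom"):
    print(path)
```

Sorting by modification time puts the meeting you just finished at the top of the list.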
Transcribe locally
Drag the M4A into MetaWhisp for batch file transcription, or Whisper Transcription by Good Snooze. Both use Whisper large-v3-turbo on Apple Neural Engine. A 30-minute meeting transcribes in 30-90 seconds.
(Optional) Audit with Little Snitch
If you need compliance proof: install Little Snitch, run a transcription, observe network activity. You'll see zero outbound connections from the transcription tool. This is what HIPAA / NDA compliance teams want to see.
Strengths
- Zero data leaves your Mac after the call ends
- HIPAA-compatible by architecture for the recording and transcription step (no Business Associate needed there, since the recording and transcript are never transmitted to a third party)
- Works in airplane mode after the call
- One-time cost (free options available)
- Audio file stays in your control — keep it, encrypt it, delete it
Weaknesses
- 3 extra minutes per meeting vs. fully automatic cloud transcription
- No speaker diarization in the basic flow (one block of text)
- Requires you to be the host, or to have the host grant you permission to record locally
- No built-in sharing — you handle distribution
When to use it
Client calls under NDA. Medical or legal consultations. HR conversations. Therapy or counseling sessions. Any meeting where the conversation is sensitive enough that "Zoom's privacy policy" isn't sufficient assurance.
---
Method 4: Third-party meeting tools
Several products integrate with Zoom and add features. They share one common cost: your audio passes through their servers too.
Otter.ai
Joins Zoom meetings as a virtual participant (the now-familiar "Otter.ai has joined" message). Provides real-time transcription, speaker labels (who said what), and a searchable archive. Free tier offers 300 minutes/month. Pro is $16.99/month. Strengths: speaker diarization is solid, searchable cross-meeting transcripts, mobile-friendly. Weaknesses: cloud-based, the bot's presence changes meeting dynamics, audio retained on Otter's servers.
Fireflies.ai
Similar approach to Otter: a bot joins the meeting, transcribes, and summarizes. $19/user/month for Pro. Integrates with CRMs (Salesforce, HubSpot).
Fathom
Free for individual users (paid for teams). Records and summarizes Zoom meetings without joining as a separate bot, using local app integration instead. Fast adoption among sales teams.
Read AI
Focuses on sentiment analysis and engagement metrics during meetings, in addition to transcription. Useful for managers analyzing meeting effectiveness.
Common privacy issue
All of these tools require uploading audio (or letting their bot capture it) to their cloud. Each has its own privacy policy, retention period, and data-handling pipeline. None are HIPAA-compatible by default; most offer BAAs as paid add-ons. For meetings where speaker diarization or AI features matter more than maximum privacy, these tools fill a real need. For maximum privacy, they don't.
---
The privacy reality of Zoom transcription
Let me show what actually happens to your audio in each method.
| Method | Where audio goes | Retention | Human review? |
|---|---|---|---|
| Zoom cloud transcription | Zoom servers (US/EU based on region) | 30 days default; up to years on paid retention | Possible per privacy policy |
| Zoom AI Companion | Zoom servers + AI processing pipeline | Until you delete | Possible per privacy policy |
| Local recording + Whisper local | Your Mac only | You decide | Impossible |
| Otter.ai | Otter servers | Until you delete | Sampled per ToS |
| Fireflies.ai | Fireflies servers | Until you delete | Sampled per ToS |
| Fathom | Fathom servers | Until you delete | Per ToS |
What "human review" actually means
Most cloud meeting tools state in their privacy policies that recordings may be reviewed for service improvement, abuse detection, model improvement, or legal compliance. In practice, this is usually a small sample, often anonymized, often just for QA. But it does happen. For a casual team meeting, none of this matters. For a doctor-patient consultation, a lawyer-client privileged conversation, an HR investigation, or a board meeting discussing M&A, it matters a lot.
HIPAA, NDA, and compliance
Zoom offers a HIPAA Business Associate Agreement (BAA) as a paid add-on for healthcare organizations. With a BAA, Zoom commits to specific data-handling practices that align with HIPAA requirements. Without a BAA, recording PHI with Zoom is a HIPAA violation. For NDA-sensitive business meetings, contractual terms vary. The safest path: don't generate a third-party copy of the audio at all. Use local recording + on-device transcription. The legal exposure is limited to your own device.
The on-device alternative for compliance
When transcription happens entirely on your Mac:
- No Business Associate exists (because no third party processes the data)
- No HIPAA BAA needed (because no PHI transmission)
- No retention policy depends on a vendor (you delete when you choose)
- No employee at any company can review your audio
- A subpoena can only target your device, not multiple servers
All four methods compared
| Criteria | Zoom cloud | AI Companion | Local + Whisper | Otter / Fireflies |
|---|---|---|---|---|
| Setup time | 2 min | 5 min | 10 min | 10 min |
| Per-meeting effort | 0 | 0 | 3 min | 0 |
| Privacy | Cloud | Cloud | Local-only | Cloud |
| HIPAA without BAA | No | No | Yes (architecture) | No |
| Speaker labels | Limited | Yes | Limited | Yes |
| Action items / summary | No | Yes | No (or via post) | Yes |
| Search across meetings | Yes | Yes | Manual | Yes |
| Offline capable | No | No | Yes (after model d/l) | No |
| Cost (per user/year) | $180+ for Pro plan | Included Pro+ | $0-30 | $200-240 |
| Languages supported | 30+ | 10+ | 30+ (Whisper) | 3-5 |
Decision matrix
- Internal team meeting? Zoom cloud + AI Companion. Zero friction.
- Sales call you want to review later? Otter or Fathom for the speaker labels.
- Client call with NDA? Local recording + on-device transcription.
- Medical / legal / HR? Local recording + on-device transcription. No question.
- Multi-language meeting? Local + Whisper for the language coverage. Whisper handles code-switching better than most cloud tools.
- You're not the host? You can't record locally unless the host grants you recording permission. Best option: ask the host, or accept whatever method they choose.
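The matrix above can be written down as a small lookup. This is only a sketch: the function name, the flag names, and especially the order in which the rules are checked are assumptions layered on top of the list, which does not rank its own entries.

```python
def recommend_method(sensitive=False, can_record_locally=True,
                     multilingual=False, need_speaker_labels=False):
    """Map meeting traits to one of the approaches in the decision matrix.

    The rule order (recording rights first, then sensitivity, then language
    coverage, then speaker labels) is an assumption made for illustration.
    """
    if not can_record_locally:
        return "Ask the host, or accept the host's method"
    if sensitive:
        # NDA, medical, legal, or HR content.
        return "Local recording + on-device transcription"
    if multilingual:
        return "Local recording + on-device transcription"
    if need_speaker_labels:
        return "Otter or Fathom"
    return "Zoom cloud + AI Companion"


print(recommend_method())                          # internal team meeting
print(recommend_method(sensitive=True))            # client call under NDA
print(recommend_method(need_speaker_labels=True))  # sales call to review
```

Checking sensitivity before speaker labels reflects the article's position that privacy requirements override convenience features when both apply.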
How to set up private Zoom transcription on Mac
The complete recipe for someone who wants the on-device path. Time to first transcript: ~15 minutes including downloads.
Update Zoom settings
Open Zoom on Mac. Settings (⌘,) → Recording. Disable Cloud recording. Enable Local recording. Set "Store my recordings at" to ~/Documents/Zoom or another folder.
Choose your transcription tool
Two solid options on Mac:
- MetaWhisp — free, real-time dictation (system-wide), and now batch file transcription. 30+ languages. On-device by default. Optional cloud at $30/year.
- Whisper Transcription by Good Snooze — paid one-time, dedicated file-based UI, drag-and-drop batch processing.
For mixing real-time dictation with file transcription, MetaWhisp covers both. For pure file batch work, Whisper Transcription is more focused.
Wait for the model download
Both tools download Whisper large-v3-turbo on first run (~1.5 GB). One-time download over the internet. After that, all transcription is fully offline.
Test on a short recording
Record a 2-minute test meeting (or use any existing M4A file). Drag into your tool of choice. Verify: transcript appears within ~10 seconds, accuracy looks reasonable (4-7% errors on clean speech), no network activity during transcription (Little Snitch test if you want proof).
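The error percentage can be checked as a word error rate (WER): the word-level edit distance between a reference transcript you corrected by hand and the tool's output, divided by the number of reference words. A rough sketch follows. The two sentences are invented, and a real evaluation would normalize punctuation more carefully than this lowercase-and-split approach.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in output
            insertion = cur[j - 1] + 1   # extra word in output
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


# Invented 20-word example with a single substituted word.
reference = ("let us ship the new pricing page by friday and notify "
             "the sales team before the next big customer call")
hypothesis = ("let us ship the new prizing page by friday and notify "
              "the sales team before the next big customer call")

wer = word_error_rate(reference, hypothesis)
print(f"{wer:.1%}")  # prints 5.0%
```

One wrong word in twenty is 5%, which sits inside the 4-7% range quoted for clean speech.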
Build a workflow
For most users: end meeting → wait 1-3 min for Zoom local processing → drag M4A into transcription tool → transcript ready. Total time vs. cloud auto-transcription: ~3 extra minutes per meeting. Worth it for sensitive content.
Decide retention
You now have local audio (M4A) and transcript (TXT or DOCX). Decide your retention policy: encrypt with FileVault (default on macOS), keep for X days, delete after, etc. This is a real decision your previous setup made for you (Zoom defaulted to 30 days).
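A retention policy is easier to keep when a script enforces it. Here is a minimal sketch, assuming recordings are .m4a and .mp4 files under ~/Documents/Zoom and that file modification time is an acceptable proxy for meeting date. The function names and the one-day limit are illustrative, and the default is a dry run that only lists what would be removed.

```python
import time
from pathlib import Path

MEDIA_SUFFIXES = {".m4a", ".mp4"}


def expired_recordings(folder, max_age_days, now=None):
    """Return media files under folder last modified before the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    root = Path(folder).expanduser()
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and p.suffix.lower() in MEDIA_SUFFIXES
        and p.stat().st_mtime < cutoff
    )


def purge(folder, max_age_days, dry_run=True):
    """Delete expired recordings; with dry_run=True only report them."""
    victims = expired_recordings(folder, max_age_days)
    for path in victims:
        if dry_run:
            print("would delete", path)
        else:
            path.unlink()
    return victims


if __name__ == "__main__":
    # Dry run against the assumed default folder; nothing is deleted.
    purge("~/Documents/Zoom", max_age_days=1)
```

Transcripts saved as TXT or DOCX are left alone, so you can delete audio on a short schedule and keep notes for longer.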
(Optional) Document compliance posture
If you work in regulated industries: write a one-page memo describing your transcription pipeline. "Audio recorded locally, transcribed via Whisper large-v3-turbo running on Apple Neural Engine, no third-party data processing." Provide network-activity logs if your compliance officer asks. This conversation is much easier with on-device tools than with cloud tools requiring BAAs.
Real-world workflows
These are composite profiles drawn from MetaWhisp users and conversations with people who chose specific Zoom transcription approaches. Names changed.
Alex bills $250/hour. Most clients sign NDAs. He used to enable Zoom cloud transcription "for convenience" until a client asked if his transcripts were stored anywhere. He couldn't answer with confidence.
His current flow: Local recording on every client call. After the meeting, drags M4A into MetaWhisp. Reviews transcript, copies key quotes into his client doc, deletes the M4A within 24 hours. Total post-meeting work: 5 minutes.
What changed: Compliance conversations with new clients are dramatically simpler. "I record locally, transcribe locally, delete in 24h" beats "Zoom retains it 30 days, here's a BAA."
Maya does telehealth therapy via Zoom. She originally avoided ALL recording because she didn't want PHI on Zoom's servers, even with a BAA. But she missed having session notes for review.
Her flow: Zoom Healthcare Plan with BAA for the call itself. Local recording only with patient consent (rare). When recording, transcribes immediately on Mac with MetaWhisp, generates her session notes, deletes the audio. Audio never goes to any third party.
Compliance posture: Documented. Audited by her practice's compliance lead. Approved.
Dani uses cloud transcription extensively. Her sales team needs speaker diarization, action items, and integration with HubSpot. Privacy is not the binding constraint — sales velocity is.
Her stack: Zoom Pro + AI Companion for internal recordings. Otter.ai for external customer calls (better speaker labels, CRM integration). Both routed through their respective clouds.
Why not local: "We're moving too fast for the manual workflow. We accept the privacy tradeoff because we're discussing software pricing, not health records."
Boris interviews sources on sensitive topics, sometimes from countries with hostile press environments. Cloud-based recording is a non-starter — anything that touches a third-party server creates discovery exposure.
His flow: Local recording always. Transcription on his offline-capable Mac after each interview. Audio encrypted with FileVault and isolated to a separate user account on his Mac. Source identities never touch any cloud service.
Why this matters: "The moment audio touches a server, that server's jurisdiction becomes part of my source's exposure. On-device removes that variable."
Frequently asked questions
Can Zoom transcribe meetings automatically?
Yes. Three native features: live captions during calls (free on all plans), audio transcription of cloud recordings (paid plans with cloud recording), and Zoom AI Companion (Pro+ plans). All three process audio on Zoom's servers; cloud recordings and their transcripts are retained for 30 days by default.
How do I enable Zoom transcription on Mac?
Sign in to zoom.us → Settings → Recording → enable Cloud recording → enable Audio transcript. Your next cloud-recorded meeting will have a transcript emailed to you within ~30 minutes. For live captions: in any meeting click CC → Show captions.
Is Zoom transcription private?
Not by default. Zoom processes audio on its servers, stores recordings and transcripts for 30 days (longer with paid retention), and per its privacy policy, transcripts may be reviewed by Zoom employees for service improvement. For NDA, medical, legal, or HR-sensitive meetings, on-device transcription is the only fully private option.
How do I transcribe a Zoom meeting privately on Mac?
Switch Zoom Settings → Recording from Cloud to Local. Record meetings to your Mac. Transcribe the saved M4A file with an on-device tool: MetaWhisp (free, system-wide voice-to-text plus file batch) or Whisper Transcription (paid, dedicated file UI). The recording and transcript never leave your Mac.
Does Zoom AI Companion store my data?
Yes. AI Companion processes audio on Zoom's infrastructure and stores meeting summaries in your Zoom account. Per Zoom's documentation, AI Companion data is not used to train Zoom's AI models without explicit admin opt-in, but audio and transcripts still flow through Zoom's pipeline.
What's the most accurate Zoom transcription on Mac?
For clean English speech, all major tools achieve 4-7% word error rate. Zoom native and AI Companion are highly accurate on clear audio. On-device tools using Whisper large-v3-turbo (MetaWhisp, Whisper Transcription) match cloud accuracy after a one-time 1.5 GB model download. Otter.ai and Fathom add speaker diarization, which is harder to do on-device.
Can I transcribe an old Zoom meeting recording on Mac?
Yes. Download the recording (zoom.us → Recordings → Download) or use a local one. Drop the MP4 or M4A file into a transcription tool. MetaWhisp processes locally; Whisper Transcription handles batch files; Otter or Rev offer cloud-based transcription with speaker labels. For privacy, use local. For speaker labels, use cloud.
How much does Zoom transcription cost?
Basic transcription is included with paid Zoom plans starting at $14.99/user/month. Live captions are free. AI Companion is included on Pro+ tiers. Third-party alternatives: Otter Pro $16.99/month with unlimited transcription; MetaWhisp free for unlimited on-device, $30/year optional cloud features.
What's the difference between Zoom cloud recording and local recording?
Cloud recording uploads the meeting to Zoom's servers, generates an automatic transcript, and retains it for 30 days by default. Local recording saves files (MP4 + M4A) directly to the host's Mac with no Zoom-server storage and no automatic transcript. Local is the privacy-first option.
Can Zoom transcription handle multiple languages?
Zoom supports 30+ languages for transcription. Live captions auto-detect for English and major European languages. AI Companion summaries are available in fewer languages. For multilingual meetings (multiple languages in one call), on-device tools using Whisper large-v3-turbo handle code-switching well.
About the author
Andrew Dyuzhov
CEO & Solo Founder, MetaWhisp
I record client calls weekly. Half are under NDA. The rest are general-purpose. I built MetaWhisp partly because I couldn't find a clean, fast, on-device path for transcribing Zoom recordings without involving a third-party server.
This guide reflects the actual workflow I use, plus conversations with consultants, therapists, journalists, and sales leaders who picked different paths for different reasons. There is no single right answer — privacy posture, speaker diarization, AI summarization, and cost all trade off against each other.
What MetaWhisp adds for Zoom users on Mac:
- Drag any M4A or MP4 from Zoom local recordings → instant on-device transcription
- System-wide voice-to-text in any app (not just inside Zoom or Word)
- 30+ languages with auto-detection
- Free for unlimited local use; $30/year optional cloud features
- HIPAA-compatible by architecture (no data transmitted)
If something in this guide is wrong or your workflow looks different, email me. Follow the build journey on X (@hypersonq).