Every podcaster knows the feeling: you record a great conversation, then spend an hour clicking through a waveform in Audacity, cutting out every pause, filler word, and cough. It does not have to take that long — and it does not require desktop software.
This guide describes the complete podcast editing workflow, from raw recording to a polished episode ready for Spotify, Apple Podcasts, and your RSS feed. Each step runs in a browser. The workflow handles multi-segment recording assembly, automatic silence removal, noise reduction, loudness normalization to the podcast standard (−16 LUFS), and format export at the correct settings for major podcast platforms.
Whether you produce a solo recording, a co-hosted show, or an interview podcast with guests calling in from different locations, the same workflow applies.
The Complete Podcast Editing Workflow
The typical podcast editing workflow has six steps, in this order:
1. Assemble segments — merge intro music, main content (possibly multiple takes), and outro
2. Remove dead air — automatic silence removal to clean up pauses
3. Reduce noise — address room noise, hum, or background sounds
4. Level the volume — normalize loudness to −16 LUFS for podcast distribution
5. Export in the right format — MP3 128 kbps mono
6. Quality check — listen to the first and last 30 seconds plus two spots in the middle
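Skipping steps or doing them out of order causes problems. Normalizing before removing silence raises or lowers the noise floor along with the speech, so a silence threshold chosen for the raw recording no longer separates pauses from speech correctly. Normalizing before merging measures each segment on its own, so the assembled episode is not guaranteed to hit the target as a whole; even if you level individual segments in Step 1, run the final normalization on the merged file. Follow the order above.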
Step 1: Assemble Multi-Segment Recordings
Podcast recordings rarely come from a single continuous take. You may have:
- Intro music recorded separately
- Main content split across multiple takes (recording stopped and restarted)
- A separate outro or sponsor read
- Guest audio recorded on a different device at a different volume
Merging the segments:
Use the Merge Audio tool to combine segments in order: intro → main content (in take order) → outro. Drag to arrange the correct order before merging.
Handling different volume levels between segments: If one segment (typically the guest audio or a second microphone) is consistently louder or quieter than another, normalize each segment individually before merging. Use the Volume Normalizer on each file separately at −16 LUFS, then merge the normalized segments.
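The arithmetic behind per-segment leveling is simple because loudness is measured on a decibel scale: the gain a segment needs is the target minus its measured loudness, applied to the samples as a linear factor of 10^(gain/20). The sketch below assumes each segment's integrated loudness has already been measured with a LUFS meter (measurement is not shown). The function names and example values are hypothetical and do not describe the Volume Normalizer's internals. It also does not guard against clipping, which the true-peak check in Step 4 handles.

```python
def gain_to_target_db(measured_lufs: float, target_lufs: float = -16.0) -> float:
    """Gain (dB) that moves a segment's integrated loudness onto the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to the factor each sample is multiplied by."""
    return 10 ** (gain_db / 20)


def apply_gain(samples: list[float], gain_db: float) -> list[float]:
    """Scale every sample by the same factor (no clipping protection)."""
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]


# Hypothetical measurements: host mic at -18.5 LUFS, guest at -23.0 LUFS.
host_gain = gain_to_target_db(-18.5)   # +2.5 dB
guest_gain = gain_to_target_db(-23.0)  # +7.0 dB
```

Because the same gain is applied to every sample, the segment's dynamics are untouched; only its overall level moves.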
Cross-fades at segment boundaries: When joining two continuous speech sections, add a 100–200 ms cross-fade between them to prevent a click at the cut point. When joining speech to music, a 500 ms to 1 s cross-fade creates a more natural transition.
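A cross-fade overlaps the end of one segment with the start of the next, fading one out while the other fades in. This minimal sketch works on plain lists of float samples at a known sample rate; the function name and parameters are hypothetical. It offers a linear fade and an equal-power (sine/cosine) fade, a common choice when the two sides are unrelated material such as speech into music, because it keeps the combined power roughly constant.

```python
import math


def crossfade(a: list[float], b: list[float], sample_rate: int,
              fade_ms: float, equal_power: bool = True) -> list[float]:
    """Join a and b, overlapping the last fade_ms of a with the first fade_ms of b."""
    n = min(int(sample_rate * fade_ms / 1000), len(a), len(b))
    if n == 0:
        return a + b
    overlap = []
    for i in range(n):
        t = (i + 0.5) / n  # position inside the fade, between 0 and 1
        if equal_power:
            # Sine/cosine gains keep total power steady for unrelated signals.
            fade_out, fade_in = math.cos(t * math.pi / 2), math.sin(t * math.pi / 2)
        else:
            # Linear gains always sum to 1.
            fade_out, fade_in = 1.0 - t, t
        overlap.append(a[len(a) - n + i] * fade_out + b[i] * fade_in)
    return a[:len(a) - n] + overlap + b[n:]
```

A 150 ms fade at 44.1 kHz overlaps int(44100 × 0.15) = 6,615 samples, so the joined result is that many samples shorter than the two segments laid end to end.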
The "room tone" technique: When you cut between recordings made in different environments, or cut a section out of one recording, the background noise level can change abruptly at the edit. If you recorded a few seconds of background noise alone in each environment, use these as "room tone" patches: short inserts of ambient sound at cut points that smooth the transition.
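A room tone patch is just ambient noise laid into the gap where digital silence or an abrupt jump would otherwise be. This sketch assumes the room tone clip and the audio are lists of float samples at the same sample rate; the helper names are hypothetical. Looping a short clip can create an audible seam, so in practice you would cross-fade the patch edges as described above for segment boundaries.

```python
def room_tone_patch(room_tone: list[float], length: int) -> list[float]:
    """Return `length` samples of ambient noise, looping the clip if it is shorter."""
    if not room_tone:
        raise ValueError("room_tone must contain at least one sample")
    return [room_tone[i % len(room_tone)] for i in range(length)]


def splice_with_room_tone(before: list[float], after: list[float],
                          room_tone: list[float], gap_samples: int) -> list[float]:
    """Join two pieces with a bed of room tone between them instead of silence."""
    return before + room_tone_patch(room_tone, gap_samples) + after
```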
Step 2: Remove Dead Air and Long Pauses
After assembling, run automatic silence removal. A 60-minute interview can easily contain 8–15 minutes of dead air: pauses between questions, filler hesitations, "um"s followed by silence, and gaps around coughs. Removing it tightens the pacing dramatically.
Recommended silence removal settings for podcasts:
- Silence threshold: −38 dB to −42 dB. The exact value depends on your noise floor. If your room is very quiet, go lower (−45 dB). If there is notable background hum, go higher (−35 dB) to treat the noise floor as silence.
- Minimum silence duration: 0.8–1.2 seconds. This removes pauses that are genuinely dead air while preserving natural conversational rhythm. Setting it below 0.5 seconds creates an unnaturally rushed pace.
- Padding: 100–150 ms. Leaving a short tail of the pause on each side of a removed section keeps speech from sounding abruptly cut, because the next word no longer starts the instant the previous one ends.
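The three settings above work together roughly as in this simplified sketch: frames quieter than the threshold count as silent, only silent runs at least the minimum duration are treated as dead air, and the padding is kept at both ends of each run. It assumes mono float samples in [-1, 1] and measures level as the RMS of 10 ms frames; the frame size, the RMS detector, and the function names are assumptions, and real tools may detect silence differently.

```python
import math


def frame_dbfs(frame: list[float]) -> float:
    """RMS level of a frame in dBFS (samples assumed to be floats in [-1, 1])."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else -math.inf


def remove_silence(samples: list[float], sample_rate: int,
                   threshold_db: float = -40.0, min_silence_s: float = 1.0,
                   padding_s: float = 0.12, frame_ms: float = 10.0) -> list[float]:
    frame_len = max(1, round(sample_rate * frame_ms / 1000))
    pad = round(padding_s * sample_rate)

    # 1. Classify each short frame as silent (below threshold) or not.
    silent = [frame_dbfs(samples[i:i + frame_len]) < threshold_db
              for i in range(0, len(samples), frame_len)]

    # 2. Find silent runs long enough to count as dead air and mark
    #    their interior (minus padding on each side) for removal.
    cuts = []
    run_start = None
    for idx, is_silent in enumerate(silent + [False]):  # sentinel ends a trailing run
        if is_silent and run_start is None:
            run_start = idx
        elif not is_silent and run_start is not None:
            start = run_start * frame_len
            end = min(idx * frame_len, len(samples))
            if end - start >= min_silence_s * sample_rate and end - start > 2 * pad:
                cuts.append((start + pad, end - pad))
            run_start = None

    # 3. Keep everything outside the cut ranges.
    out, pos = [], 0
    for start, end in cuts:
        out.extend(samples[pos:start])
        pos = end
    out.extend(samples[pos:])
    return out
```

With these defaults, a 2-second pause shrinks to 240 ms (120 ms of padding on each side), while a 0.5-second pause is left alone because it is shorter than `min_silence_s`.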
What silence removal does NOT do: It removes pauses, not spoken filler words ("um," "uh," "you know"); those require manual cutting. It also does not remove noise: if your noise floor is above the threshold, the noisy pauses are treated as sound and kept.
After silence removal: Listen to three random spots in the edited audio. If speech feels unnaturally rushed or words are being clipped, increase the minimum silence duration or add more padding.
Step 3: Noise Reduction
Noise in podcast recordings comes from several sources, each with different treatments:
HVAC/fan hum (steady broadband noise): The most common recording problem, a constant rumble or whoosh from air conditioning or a computer fan. AI noise removal handles this effectively at medium aggressiveness settings. Alternative for the low-frequency rumble: a high-pass (low-cut) filter at 80–100 Hz removes it while leaving most voice frequencies untouched.
Electrical hum (50 Hz or 60 Hz and harmonics): A characteristic buzz from ground loops, cheap USB audio interfaces, or fluorescent lighting. Frequencies: 50/60 Hz, 100/120 Hz, 150/180 Hz, and so on up the harmonic series. The Equalizer tool can apply narrow notch filters at these exact frequencies. This is a free solution that removes hum with minimal effect on voice quality, because each notch cuts only a narrow band.
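One way to build these notch filters is a cascade of biquad notch sections, one per hum frequency. This sketch uses the notch formulas from Robert Bristow-Johnson's Audio EQ Cookbook; that is an assumption about technique, not a description of how the Equalizer tool works. The Q of 30 (a notch roughly 2 Hz wide at 60 Hz), the four-harmonic default, and the function names are illustrative choices.

```python
import cmath
import math


def hum_harmonics(mains_hz: float, count: int = 4) -> list[float]:
    """Fundamental plus harmonics, e.g. 60 -> [60, 120, 180, 240]."""
    return [mains_hz * k for k in range(1, count + 1)]


def notch_coefficients(f0: float, sample_rate: float, q: float = 30.0):
    """Biquad notch from the RBJ Audio EQ Cookbook, normalized so a[0] == 1."""
    w0 = 2 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = (1 / a0, -2 * cos_w0 / a0, 1 / a0)
    a = (1.0, -2 * cos_w0 / a0, (1 - alpha) / a0)
    return b, a


def magnitude_at(b, a, freq: float, sample_rate: float) -> float:
    """Linear filter gain at one frequency, from the transfer function H(z)."""
    z_inv = cmath.exp(-2j * math.pi * freq / sample_rate)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return abs(num / den)


def biquad(samples: list[float], b, a) -> list[float]:
    """Apply one biquad section (Direct Form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def remove_hum(samples: list[float], sample_rate: float, mains_hz: float,
               count: int = 4) -> list[float]:
    """Cascade one narrow notch per hum frequency."""
    for f0 in hum_harmonics(mains_hz, count):
        b, a = notch_coefficients(f0, sample_rate)
        samples = biquad(samples, b, a)
    return samples
```

Use `mains_hz=50` where the power grid runs at 50 Hz. A higher Q makes each notch narrower, leaving more of the voice untouched, but the filter then rings longer before it fully settles.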
Keyboard/mouse clicks: Transient sounds. These require manual identification and cutting — listening through the recording and cutting around each click. There is no automated solution that handles this without risk of cutting into speech.
Room reverb (echo): Cannot be cleanly removed after recording. Dereverberation algorithms exist, but they tend to introduce artifacts. The only reliably effective solution is acoustic treatment before recording; a closet full of clothes is the classic budget option.
Recommendation for most podcasters: If you have significant background noise (audible fan, traffic, HVAC), use AI noise removal at 30–50% aggressiveness. This preserves voice naturalness while significantly reducing the noise floor. Over-processing creates a characteristic "swimmy" or "underwater" artifact that is more distracting than moderate noise.
Step 4: Normalize to −16 LUFS
Loudness normalization is the last processing step before export. Some platforms and players apply their own loudness normalization while others play the file as delivered, so matching your loudness to the target before upload gives you control over how the episode sounds everywhere.
Common loudness targets:
- Apple Podcasts: −16 LUFS
- Spotify: −14 LUFS (Spotify applies its own playback normalization to podcast streams, as it does to music)
- Pocket Casts, Overcast, and most other players: no fixed target; they generally play the file at the loudness you publish in your RSS feed
- Most podcast hosting platforms recommend: −16 LUFS integrated, −1 dBFS true peak maximum
Why −16 LUFS specifically: Speech has a high dynamic range compared to mastered music. At −16 LUFS, voices are loud enough to hear clearly in noisy environments (commuting, gym) without the compression artifacts that result from pushing speech too loud.
LUFS vs peak normalization for podcasts: Use LUFS, not peak. Peak normalization only sets the level of the loudest sample, so a recording with a single loud cough or door slam is normalized based on that one peak, leaving the speech too quiet. LUFS measures perceived loudness across the entire file, giving a consistent and correct result.
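A toy comparison makes the cough problem concrete. This sketch uses plain RMS level as a rough stand-in for LUFS (real LUFS, defined in ITU-R BS.1770, adds K-weighting and gating), and the signal is a hypothetical quiet "speech" pattern with one loud sample appended. The peak-based gain collapses to almost nothing once the cough is present, while the average-level gain barely moves.

```python
import math


def to_db(amplitude: float) -> float:
    return 20 * math.log10(amplitude)


def peak_gain_db(samples: list[float], target_peak_db: float = -1.0) -> float:
    """Gain that puts the single loudest sample at the target peak."""
    return target_peak_db - to_db(max(abs(s) for s in samples))


def rms_gain_db(samples: list[float], target_db: float = -16.0) -> float:
    """Gain based on average (RMS) level, a rough stand-in for LUFS."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return target_db - to_db(rms)


speech = [0.05, -0.05] * 5000   # hypothetical quiet speech, about -26 dBFS
with_cough = speech + [0.9]     # the same speech plus one loud cough sample

print(round(peak_gain_db(speech), 1), round(peak_gain_db(with_cough), 1))  # 25.0 -0.1
print(round(rms_gain_db(speech), 1), round(rms_gain_db(with_cough), 1))    # 10.0 9.9
```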
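After normalization: Check the true peak level. The true peak should not exceed −1 dBFS. If it does, apply a true-peak limiter with its ceiling at −1 dBFS, then re-measure the integrated loudness; a limiter that only catches brief peaks changes it very little. Most normalization tools handle this automatically.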
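Whether a limiter is needed can be predicted before any processing, because a fixed gain moves the true peak by the same number of decibels as the loudness. This sketch assumes integrated loudness and true peak have already been measured; the function and field names are hypothetical.

```python
from typing import NamedTuple


class NormalizationPlan(NamedTuple):
    gain_db: float                   # gain needed to reach the loudness target
    predicted_true_peak_dbfs: float  # true peak after applying that gain
    needs_limiter: bool              # True if that peak would exceed the ceiling
    peak_reduction_db: float         # how far the limiter must pull peaks down


def plan_normalization(measured_lufs: float, measured_true_peak_dbfs: float,
                       target_lufs: float = -16.0,
                       ceiling_dbfs: float = -1.0) -> NormalizationPlan:
    gain = target_lufs - measured_lufs
    # A fixed gain shifts the true peak by the same number of dB.
    peak = measured_true_peak_dbfs + gain
    over = max(0.0, peak - ceiling_dbfs)
    return NormalizationPlan(gain, peak, over > 0, over)


# Hypothetical quiet recording: -22 LUFS integrated, -4 dBFS true peak.
print(plan_normalization(-22.0, -4.0))
# NormalizationPlan(gain_db=6.0, predicted_true_peak_dbfs=2.0, needs_limiter=True, peak_reduction_db=3.0)
```

Here the +6 dB gain would push the peak to +2 dBFS, so the limiter has to take 3 dB off the loudest peaks to stay under −1 dBFS.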
Step 5: Export Settings for Podcast Distribution
Format: MP3. Every podcast app and directory supports MP3. AAC at 128 kbps is a valid alternative, but MP3 is the universal choice.
Bitrate: 128 kbps constant bitrate (CBR). This is the podcast industry standard. 96 kbps is acceptable for speech-only shows where file size matters significantly. 192+ kbps wastes bandwidth with no perceptible benefit for speech.
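Channels: Mono. A voice captured by a single microphone is a mono signal, so stereo adds nothing for listeners. At the same bitrate a stereo file is no smaller, but it divides those bits between two channels; matching mono quality takes a higher bitrate (up to twice as much) and therefore a larger file. The only reason to use stereo is if you have music or spatial audio effects that genuinely benefit from stereo separation.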
Sample rate: 44100 Hz. Some platforms specify 44100, some accept 48000. 44100 is universally accepted.
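The browser tools set these options for you. If you ever script the same export, a command-line equivalent using FFmpeg could look like the sketch below, which builds the argument list in Python. FFmpeg is an assumption here (this guide does not use it), the build must include the libmp3lame encoder, and the file names are placeholders.

```python
import shlex


def podcast_export_cmd(src: str, dst: str, bitrate_kbps: int = 128,
                       channels: int = 1, sample_rate: int = 44100) -> list[str]:
    """FFmpeg arguments for a constant-bitrate MP3 podcast export."""
    return [
        "ffmpeg", "-i", src,
        "-ac", str(channels),        # 1 = mono
        "-ar", str(sample_rate),     # output sample rate in Hz
        "-codec:a", "libmp3lame",    # LAME MP3 encoder
        "-b:a", f"{bitrate_kbps}k",  # with libmp3lame, -b:a selects CBR
        dst,
    ]


cmd = podcast_export_cmd("episode_master.wav", "episode.mp3")
print(shlex.join(cmd))
# ffmpeg -i episode_master.wav -ac 1 -ar 44100 -codec:a libmp3lame -b:a 128k episode.mp3
# To run it (requires ffmpeg on PATH): subprocess.run(cmd, check=True)
```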
ID3 tags: Set these before upload: title (episode title), artist (podcast name), album (podcast name), track number (episode number), year, and embedded podcast artwork (1400×1400px to 3000×3000px, JPEG). Most podcast hosts read these tags, and properly tagged episodes display correctly in more apps and players.
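Each of these fields corresponds to a standard ID3v2 frame. The sketch below maps episode metadata onto ID3v2.4 frame IDs and checks the artwork dimensions against the range above; the helper names and dictionary layout are hypothetical. Actually writing the frames, including the APIC frame that carries embedded artwork, requires a tagging library such as mutagen, which is not shown.

```python
# ID3v2.4 frame IDs for the tags listed above.
ID3_FRAMES = {
    "title": "TIT2",   # episode title
    "artist": "TPE1",  # podcast name
    "album": "TALB",   # podcast name
    "track": "TRCK",   # episode number
    "year": "TDRC",    # recording time (ID3v2.3 uses TYER instead)
}


def episode_tags(episode_title: str, podcast_name: str,
                 episode_number: int, year: int) -> dict[str, str]:
    """Map episode metadata onto ID3 frame IDs as text values."""
    values = {
        "title": episode_title,
        "artist": podcast_name,
        "album": podcast_name,
        "track": str(episode_number),
        "year": str(year),
    }
    return {ID3_FRAMES[key]: value for key, value in values.items()}


def artwork_ok(width: int, height: int) -> bool:
    """Square artwork between 1400x1400 and 3000x3000 pixels."""
    return width == height and 1400 <= width <= 3000


# Hypothetical episode metadata.
tags = episode_tags("Editing in the Browser", "My Show", 12, 2024)
```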
File size as a sanity check: A one-hour episode at MP3 128 kbps CBR is about 57.6 MB (128,000 bits/s × 3,600 s ÷ 8 bits per byte, roughly 55 MiB). If your export is significantly larger, check the format settings. If it is around 10× smaller, check that the quality settings were not accidentally set to minimum.
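The expected size follows directly from bitrate and duration. The helper below is a hypothetical illustration; it counts audio data only, so ID3 tags and embedded artwork add a little on top of the figure it returns.

```python
def cbr_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Audio payload of a constant-bitrate file (tags and artwork not included)."""
    return bitrate_kbps * 1000 * duration_s / 8


one_hour = cbr_size_bytes(128, 3600)
print(one_hour / 1e6, round(one_hour / 2**20, 1))  # 57.6 54.9  (MB, MiB)
```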
Summary
A polished podcast episode is the result of a repeatable six-step workflow, not hours of manual waveform clicking. Assemble segments, remove dead air, reduce noise, normalize to −16 LUFS, export at 128 kbps mono, and listen-check. The total time for a well-recorded episode should be 20–40 minutes of editing, not hours.
The investment in good recording conditions (a quiet room, decent microphone, proper positioning) pays far more dividends than elaborate post-processing. Better source material means less editing time and better results. But for recordings you already have, the workflow above extracts every bit of quality available from the raw material.