How to Edit a Podcast Online: Complete Workflow Guide

Every podcaster knows the feeling: you record a great conversation, then spend an hour clicking through a waveform in Audacity, cutting out every pause, filler word, and cough. It does not have to take that long — and it does not require desktop software.

This guide describes the complete podcast editing workflow, from raw recording to a polished episode ready for Spotify, Apple Podcasts, and your RSS feed. Each step runs in a browser. The workflow handles multi-segment recording assembly, automatic silence removal, noise reduction, loudness normalization to the podcast standard (−16 LUFS), and format export at the correct settings for major podcast platforms.

Whether you produce a solo recording, a co-hosted show, or an interview podcast with guests calling in from different locations, the same workflow applies.

The Complete Podcast Editing Workflow

The typical podcast editing workflow has six steps, in this order:

1. Assemble segments — merge intro music, main content (possibly multiple takes), and outro

2. Remove dead air — automatic silence removal to clean up pauses

3. Reduce noise — address room noise, hum, or background sounds

4. Level the volume — normalize loudness to −16 LUFS for podcast distribution

5. Export in the right format — MP3 128 kbps mono

6. Quality check — listen to the first and last 30 seconds plus two spots in the middle

Skipping steps or doing them out of order causes problems. Normalizing before removing silence applies gain to the whole file, which shifts the noise floor and invalidates a silence threshold chosen for the raw recording. Merging new segments after the final normalization means the assembled episode may no longer sit at one consistent loudness. Follow the order above.

Step 1: Assemble Multi-Segment Recordings

Podcast recordings rarely come from a single continuous take. You may have:

Intro music recorded separately
Main content split across multiple takes (recording stopped and restarted)
A separate outro or sponsor read
Guest audio recorded on a different device at a different volume

Merging the segments:

Use the Merge Audio tool to combine segments in order: intro → main content (in take order) → outro. Drag to arrange the correct order before merging.

Handling different volume levels between segments: If one segment (typically the guest audio or a second microphone) is consistently louder or quieter than another, normalize each segment individually before merging. Use the Volume Normalizer on each file separately at −16 LUFS, then merge the normalized segments.

Cross-fades at segment boundaries: Add a 100–200ms cross-fade between segments when joining two continuous speech sections. This eliminates any click at the cut point. For joining speech to music, a 500ms–1s cross-fade creates a more natural transition.
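
As a rough illustration of what a cross-fade does at a join, here is a minimal sketch that overlaps the end of one clip with the start of the next using an equal-power curve. It assumes mono audio held as plain lists of floats; the function name `crossfade` and its parameters are made up for this example and are not the Merge Audio tool's implementation.

```python
import math

def crossfade(a, b, sample_rate, fade_ms):
    """Join two mono sample lists, overlapping them with an equal-power fade."""
    n = int(sample_rate * fade_ms / 1000)
    n = min(n, len(a), len(b))  # never fade longer than either clip
    if n == 0:
        return list(a) + list(b)
    head = list(a[:len(a) - n])   # part of `a` before the overlap
    tail = list(b[n:])            # part of `b` after the overlap
    overlap = []
    for i in range(n):
        t = (i + 0.5) / n                      # position 0..1 across the overlap
        fade_out = math.cos(t * math.pi / 2)   # gain applied to the outgoing clip
        fade_in = math.sin(t * math.pi / 2)    # gain applied to the incoming clip
        overlap.append(a[len(a) - n + i] * fade_out + b[i] * fade_in)
    return head + overlap + tail

# 150 ms speech-to-speech join at 44.1 kHz, using synthetic placeholder clips.
take_1 = [0.2] * 44100
take_2 = [0.1] * 44100
joined = crossfade(take_1, take_2, sample_rate=44100, fade_ms=150)
```

The joined clip is shorter than the two inputs combined by the length of the overlap, which is why long music fades noticeably change the episode duration while 100–200ms speech fades do not.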

The "room tone" technique: When you cut out a section, the background noise level may change abruptly (one recording environment to another). If you have a few seconds of just background noise from each recording environment, use these as "room tone" patches — short inserts of ambient sound at cut points to smooth the transition.

Merge podcast segments: combine intros, main content, and outros

Step 2: Remove Dead Air and Long Pauses

After assembling, run automatic silence removal. A typical 60-minute interview contains 8–15 minutes of dead air — pauses between questions, filler hesitations, "um"s followed by silence, coughing gaps. Removing this tightens the pacing dramatically.

Recommended silence removal settings for podcasts:

Silence threshold: −38 dB to −42 dB. The exact value depends on your noise floor. If your room is very quiet, go lower (−45 dB). If there is notable background hum, go higher (−35 dB) to treat the noise floor as silence.
Minimum silence duration: 0.8–1.2 seconds. This removes pauses that are genuinely dead air while preserving natural conversational rhythm. Setting it below 0.5 seconds creates an unnaturally rushed pace.
Padding: 100–150ms. Leaving a short tail on each side of a removed section prevents the speech from feeling abruptly cut. The next word does not need to start immediately.
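
To show how the three settings above interact, here is a simplified sketch of threshold-based silence detection. It assumes mono float samples in the range −1 to 1 and measures level as RMS over 20 ms windows; the function name, the window size, and the defaults are illustrative assumptions, not the behavior of any particular tool.

```python
import math

def find_silences(samples, sample_rate, threshold_db=-40.0,
                  min_silence_s=1.0, padding_s=0.125, window_s=0.02):
    """Return (start_s, end_s) spans that could be cut from a mono float signal."""
    win = max(1, int(sample_rate * window_s))
    threshold = 10 ** (threshold_db / 20)  # dBFS -> linear amplitude
    spans, run_start = [], None
    n_windows = math.ceil(len(samples) / win)
    for w in range(n_windows + 1):
        if w < n_windows:
            chunk = samples[w * win:(w + 1) * win]
            rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
            quiet = rms < threshold
        else:
            quiet = False  # sentinel pass closes a quiet run that reaches the end
        if quiet and run_start is None:
            run_start = w
        elif not quiet and run_start is not None:
            start = run_start * win / sample_rate
            end = min(w * win, len(samples)) / sample_rate
            if end - start >= min_silence_s:
                # Keep padding on both sides so neighboring words are not clipped.
                cut_start, cut_end = start + padding_s, end - padding_s
                if cut_end > cut_start:
                    spans.append((cut_start, cut_end))
            run_start = None
    return spans

# One second of signal, two seconds of dead air, one second of signal.
demo = [0.5] * 1000 + [0.0] * 2000 + [0.5] * 1000
print(find_silences(demo, sample_rate=1000))
```

Raising `threshold_db` treats more of the noise floor as silence, raising `min_silence_s` leaves more natural pauses alone, and raising `padding_s` shrinks each cut from both ends.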

What silence removal does NOT do: It does not remove filler words ("um," "uh," "you know") that are spoken, only pauses. Manual cutting is required for those. It does not remove noise — if your noise floor is above the threshold, the noisy sections will be kept.

After silence removal: Listen to three random spots in the edited audio. If speech feels unnaturally rushed or words are being clipped, increase the minimum silence duration or add more padding.

Remove silence automatically: configurable threshold and padding for podcast editing

Step 3: Noise Reduction

Noise in podcast recordings comes from several sources, each with different treatments:

HVAC/fan hum (steady broadband noise): The most common recording problem. A constant low-frequency rumble from air conditioning or a computer fan. AI noise removal handles this effectively at medium aggressiveness settings. Alternative: a low-shelf EQ cut below 80–100 Hz removes rumble without touching voice frequencies.

Electrical hum (50 Hz or 60 Hz and harmonics): A characteristic buzz from ground loops, cheap USB audio interfaces, or fluorescent lighting. Frequencies: 50/60 Hz, 100/120 Hz, 150/180 Hz (and up). The Equalizer tool can apply narrow notch filters at these exact frequencies. This is a free solution that handles hum without affecting voice quality.

Keyboard/mouse clicks: Transient sounds. These require manual identification and cutting — listening through the recording and cutting around each click. There is no automated solution that handles this without risk of cutting into speech.

Room reverb (echo): Cannot be cleanly removed after recording. Dereverberation algorithms exist but produce audible artifacts. The only effective solution is acoustic treatment before recording. A closet full of clothes is the classic budget solution.
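
For the electrical hum case, the narrow notch filters can be sketched as biquad sections, one per harmonic. The coefficients below follow the widely used "Audio EQ Cookbook" notch form; the function names and the Q value of 30 are assumptions chosen for illustration and say nothing about how the Equalizer tool is built.

```python
import cmath
import math

def notch_coefficients(freq_hz, sample_rate, q=30.0):
    """Normalized biquad notch coefficients (b, a) centered on freq_hz."""
    w0 = 2 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)   # higher q -> narrower notch
    a0 = 1 + alpha
    b = [1 / a0, -2 * math.cos(w0) / a0, 1 / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0]
    return b, a

def gain_at(b, a, freq_hz, sample_rate):
    """Magnitude of the filter's frequency response at one frequency."""
    z = cmath.exp(-1j * 2 * math.pi * freq_hz / sample_rate)
    num = b[0] + b[1] * z + b[2] * z * z
    den = a[0] + a[1] * z + a[2] * z * z
    return abs(num / den)

def hum_notches(mains_hz, sample_rate, harmonics=3):
    """One notch per harmonic: 50 -> 50/100/150 Hz, 60 -> 60/120/180 Hz."""
    return [notch_coefficients(mains_hz * k, sample_rate)
            for k in range(1, harmonics + 1)]

b, a = notch_coefficients(60, 44100)
print(gain_at(b, a, 60, 44100))    # response at the hum frequency
print(gain_at(b, a, 1000, 44100))  # response in the voice range
```

The response falls to nearly zero at the center frequency and stays close to unity a few hertz away, which is why narrow notches can remove hum without audibly changing the voice.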

Recommendation for most podcasters: If you have significant background noise (audible fan, traffic, HVAC), use AI noise removal at 30–50% aggressiveness. This preserves voice naturalness while significantly reducing the noise floor. Over-processing creates a characteristic "swimmy" or "underwater" artifact that is more distracting than moderate noise.

Remove background noise with AI: AI-powered noise reduction (uses credits)

Step 4: Normalize to −16 LUFS

Loudness normalization is the last processing step before export. Some platforms apply their own loudness normalization and many players simply play the file as delivered, so matching the target loudness before upload gives you control over how the episode sounds everywhere.

The podcast loudness standard:

Apple Podcasts: −16 LUFS
Spotify: −14 LUFS (Spotify normalizes podcast streams the same way it normalizes music)
Pocket Casts, Overcast, and most other players: play the file at the loudness delivered in your RSS feed
Most podcast hosting platforms recommend: −16 LUFS integrated, −1 dBFS true peak maximum

Why −16 LUFS specifically: Speech has a high dynamic range compared to mastered music. At −16 LUFS, voices are loud enough to hear clearly in noisy environments (commuting, gym) without the compression artifacts that result from pushing speech too loud.

LUFS vs peak normalization for podcasts: Use LUFS, not peak. Peak normalization only sets the level of the loudest sample, so a recording with a single loud cough or door slam is normalized to that one peak and the speech around it ends up too quiet. LUFS measures the perceived loudness across the entire file, giving a consistent and correct result.
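
The difference is easy to see on a synthetic signal. The sketch below uses plain RMS level as a stand-in for loudness; real LUFS measurement adds frequency weighting and gating as defined in ITU-R BS.1770, so treat this only as an approximation of the idea. The −20 dB RMS target is an arbitrary value for the demonstration.

```python
import math

def peak_db(samples):
    """Level of the single loudest sample, in dBFS."""
    return 20 * math.log10(max(abs(x) for x in samples))

def rms_db(samples):
    """Average level across the whole signal, in dBFS."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10 * math.log10(mean_square)

# Quiet speech-like tone (1 second at 8 kHz) with one loud transient, like a door slam.
speech = [0.05 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
speech[4000] = 0.9

peak_gain = -1.0 - peak_db(speech)   # gain a peak normalizer would apply (-1 dBFS target)
rms_gain = -20.0 - rms_db(speech)    # gain an average-level normalizer would apply

print(f"peak-based gain: {peak_gain:+.2f} dB")
print(f"average-based gain: {rms_gain:+.2f} dB")
```

Because the transient already sits near full scale, the peak-based gain is close to zero and the speech stays quiet, while the average-based gain lifts the speech by several decibels.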

After normalization: Check true peak level. The true peak should not exceed −1 dBFS. If it does, apply a limiter to −1 dBFS before re-normalizing. Most normalization tools handle this automatically.
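
The arithmetic behind this check is simple: a uniform gain moves the integrated loudness and the peak by the same number of decibels. The sketch below assumes you already have a measured loudness and peak from a meter; the function name and the dictionary keys are illustrative, and the peak here is whatever your meter reports rather than a true-peak measurement computed by this code.

```python
def normalization_plan(measured_lufs, measured_peak_dbfs,
                       target_lufs=-16.0, peak_ceiling_dbfs=-1.0):
    """Work out the gain to reach the target and whether the peak will overshoot."""
    gain_db = target_lufs - measured_lufs
    peak_after = measured_peak_dbfs + gain_db  # uniform gain shifts the peak equally
    return {
        "gain_db": gain_db,
        "gain_linear": 10 ** (gain_db / 20),   # multiplier to apply to each sample
        "peak_after_dbfs": peak_after,
        "needs_limiter": peak_after > peak_ceiling_dbfs,
        "overshoot_db": max(0.0, peak_after - peak_ceiling_dbfs),
    }

# A quiet recording with a hot peak: the gain needed pushes the peak past the ceiling.
print(normalization_plan(measured_lufs=-23.0, measured_peak_dbfs=-6.0))

# A recording with more headroom: no limiter needed.
print(normalization_plan(measured_lufs=-19.0, measured_peak_dbfs=-8.0))
```

When `needs_limiter` is true, `overshoot_db` is roughly how much the limiter has to take off the loudest moments before the file can sit at the target.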

Normalize to −16 LUFS: podcast preset at −16 LUFS with true peak limiting

Step 5: Export Settings for Podcast Distribution

Format: MP3. Every podcast app and directory supports MP3. AAC at 128 kbps is a valid alternative but MP3 is the universal choice.

Bitrate: 128 kbps constant bitrate (CBR). This is the podcast industry standard. 96 kbps is acceptable for speech-only shows where file size matters significantly. 192+ kbps wastes bandwidth with no perceptible benefit.

Channels: Mono. A stereo file needs roughly twice the bitrate, and therefore twice the file size, to match the per-channel quality of a mono file, with zero benefit for listeners of a speech show: a single voice is a mono source. The only reason to use stereo is if you have music or spatial audio effects that genuinely benefit from stereo separation.

Sample rate: 44100 Hz. Some platforms specify 44100, some accept 48000. 44100 is universally accepted.

ID3 tags: Set before upload: title (episode title), artist (podcast name), album (podcast name), track number (episode number), year, and embed your podcast artwork (1400×1400px to 3000×3000px, JPEG). Most podcast hosts read these tags. Properly tagged episodes display correctly in every directory.
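
If you ever need to reproduce these settings outside the browser, they map onto a command line fairly directly. The sketch below builds an ffmpeg invocation in Python; ffmpeg is not part of the workflow described in this guide, the file names and tag values are hypothetical, and it assumes an ffmpeg build with the libmp3lame encoder. Embedding artwork is left out.

```python
import subprocess

def mp3_export_command(src, dst, title, show, episode,
                       bitrate_kbps=128, sample_rate=44100, channels=1):
    """Build an ffmpeg argument list for a tagged podcast MP3."""
    return [
        "ffmpeg", "-i", src,
        "-c:a", "libmp3lame",          # MP3 encoder
        "-b:a", f"{bitrate_kbps}k",    # constant bitrate
        "-ac", str(channels),          # 1 = mono
        "-ar", str(sample_rate),       # output sample rate in Hz
        "-id3v2_version", "3",         # write ID3v2.3 tags
        "-metadata", f"title={title}",
        "-metadata", f"artist={show}",
        "-metadata", f"album={show}",
        "-metadata", f"track={episode}",
        dst,
    ]

cmd = mp3_export_command("episode_042_master.wav", "episode_042.mp3",
                         title="Editing Without the Pain",
                         show="Example Podcast", episode=42)
print(" ".join(cmd))
# To actually run it (requires ffmpeg on PATH):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with titles that contain spaces or punctuation.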

File size as a sanity check: A one-hour episode at MP3 128 kbps mono is approximately 58 MB (about 55 MiB), plus a little for tags and artwork. If your export is significantly larger, check the format settings. If it is 10× smaller, check that quality settings were not accidentally set to minimum.
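
The expected size follows directly from bitrate times duration, so it is easy to compute for any episode length. This small sketch assumes constant bitrate and ignores ID3 tags and embedded artwork, which add a little on top; the function name is illustrative.

```python
def mp3_size_bytes(duration_s, bitrate_kbps):
    """Approximate audio payload size for a constant-bitrate file."""
    return duration_s * bitrate_kbps * 1000 / 8  # kilobits/s -> bytes

size = mp3_size_bytes(duration_s=3600, bitrate_kbps=128)
print(f"{size / 1e6:.1f} MB")      # decimal megabytes
print(f"{size / 2**20:.1f} MiB")   # binary mebibytes, as many file managers report
```

For a one-hour episode at 128 kbps this works out to 57.6 MB, or about 54.9 MiB, depending on which unit your operating system displays. At a fixed bitrate the result is the same for mono and stereo, so an unexpectedly large file usually points to a higher bitrate or a different format rather than the channel setting.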

Frequently Asked Questions

What loudness level should my podcast be?

−16 LUFS integrated loudness, with a true peak not exceeding −1 dBFS. This is the widely accepted podcast loudness standard and matches Apple Podcasts' normalization target. Spotify normalizes at −14 LUFS, so episodes at −16 LUFS will be slightly boosted by Spotify — this is fine. Going louder than −14 LUFS means platforms will turn your episode down, which you cannot control.

Should I use stereo or mono for my podcast?

Mono, almost always. A single recorded voice is inherently mono and conveys no spatial information. Matching mono quality in stereo takes roughly double the bitrate, which doubles file size and upload time with no improvement to the listener experience. The only exceptions are music-heavy shows with genuine stereo audio production, or comedy/narrative shows with spatial sound design.

How do I edit out my filler words (um, uh)?

Filler words require manual editing — there is no automated solution that reliably removes spoken fillers without also removing real speech. In the Trim & Cut tool, zoom into the waveform to find each filler word, mark the exact start and end, and delete it. For heavy filler removal across a long recording, reduce the problem at the recording stage by pausing before speaking rather than filling silence with sound.

My guest was recorded at a different volume than me. How do I fix this?

Normalize each track to −16 LUFS separately before merging. Upload each file individually to the Volume Normalizer, set the target to −16 LUFS, download each normalized file, then merge the normalized files in the Merge Audio tool. This brings both speakers to the same perceived loudness level.

What is the best file format for podcast submission?

MP3 128 kbps constant bitrate, mono, 44100 Hz sample rate. This is accepted by every podcast host and directory. Set your ID3 tags (title, artist, episode number) and embed your podcast artwork before uploading.

How long should it take to edit a one-hour podcast episode?

With the automated workflow described in this guide (silence removal + LUFS normalization), editing a one-hour episode can take 15–30 minutes. Manual heavy editing (removing every filler word, extensive restructuring) takes 2–4 hours for a one-hour episode, which is why most successful podcasters find a quality level they can achieve efficiently rather than over-editing.

Summary

A polished podcast episode is the result of a repeatable six-step workflow, not hours of manual waveform clicking. Assemble segments, remove dead air, reduce noise, normalize to −16 LUFS, export at 128 kbps mono, and listen-check. The total time for a well-recorded episode should be 20–40 minutes of editing, not hours.

The investment in good recording conditions (a quiet room, decent microphone, proper positioning) pays far more dividends than elaborate post-processing. Better source material means less editing time and better results. But for recordings you already have, the workflow above extracts every bit of quality available from the raw material.