Silence padding sounds like the simplest part of audiobook mastering: add some silence at the beginning and end of each chapter, done. But the details matter in ways that aren't obvious until your audiobook gets flagged in quality review. The concept isn't hard; the catch is that ACX requires the right kind of silence.
ACX requires specific silence durations at the head and tail of every chapter file.[1] More importantly, it requires room tone, not digital silence. Get the duration wrong and you fail automated checks. Use digital silence instead of room tone and you fail human quality review. This guide covers the required durations, how to record and apply room tone, when to crossfade, where padding sits in the mastering chain, and the rejections that padding mistakes trigger. For full chapter-level formatting including file naming and credits, see the chapter formatting guide.
What Are ACX's Silence Requirements?
Every chapter file submitted to ACX must have:[1]
- Head silence: 0.5 to 1 second before narration begins
- Tail silence: 1 to 5 seconds after narration ends
Durations are measured from file start to first audible speech (head) and from last audible speech to file end (tail). Breaths before the first word or after the last word count as narration, not silence.
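The measurement rule above can be sketched in a few lines of Python. This is a minimal illustration, not ACX's actual checker: the function names are hypothetical, samples are assumed to be floats between -1.0 and 1.0, and the `threshold` value is an assumption you would tune so that it sits above your room tone but below quiet breaths (which count as narration).

```python
def measure_padding(samples, sample_rate, threshold=0.01):
    """Return (head_seconds, tail_seconds) around the audible region.

    `threshold` is an assumed amplitude above which a sample counts as
    audible; it is not a value published by ACX.
    """
    audible = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not audible:
        raise ValueError("no audible content found above threshold")
    first, last = audible[0], audible[-1]
    head = first / sample_rate                      # file start to first audible sample
    tail = (len(samples) - 1 - last) / sample_rate  # last audible sample to file end
    return head, tail


def check_padding_ranges(head, tail):
    """Compare durations with the ranges quoted above (0.5-1 s head, 1-5 s tail)."""
    return 0.5 <= head <= 1.0, 1.0 <= tail <= 5.0
```

Calling `measure_padding` on a decoded chapter and passing the result to `check_padding_ranges` gives a quick pass/fail pair for head and tail before you submit.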
Why These Specific Durations?
Head silence prevents the first word from being clipped on playback devices with buffering delays. When a listener starts a chapter, the system needs a fraction of a second to stabilise. Without head silence, the first syllable can be cut off or pop.
Tail silence provides a natural conclusion and prevents abrupt endings. The longer range (up to 5 seconds) accommodates different ending styles. A dramatic chapter ending can use more tail silence than one ending mid-conversation.
The durations also ensure consistent behaviour across different playback apps. Some apps add their own brief silence between chapters; some don't.
What Is the Difference Between Room Tone and Digital Silence?
This is where most producers make their mistake.
Room Tone
Room tone is the natural ambient sound of your recording environment when nobody is speaking. Every room has a unique acoustic signature: HVAC, electrical hum, distant traffic, building resonance, air movement. Even a very quiet room has measurable room tone.[2]
Room tone sounds like "silence" to the human ear in context. It's what listeners perceive between sentences. Your brain filters it as background.
Digital Silence
Digital silence is absolute zero: every sample value is exactly 0. No room tone, no ambient noise, nothing.
Why Does Digital Silence Cause Problems?
The transition from digital silence to room tone is audible. When narration begins, room tone starts. If the preceding silence was digital zero, listeners hear a subtle "activation" as the room tone switches on, like a gate opening or a channel unmuting. At the end, the reverse happens: room tone abruptly cuts to nothing.
This is especially noticeable on headphones and earbuds, which is how most Audible listeners consume audiobooks. ACX's quality reviewers listen for it specifically.
How Do You Record Room Tone?
Before or after your narration session, record 30 to 60 seconds of pure room tone:
- Set up your microphone in exactly the same position as narration
- Keep the same input gain level
- Sit in your normal position but don't speak, move, or breathe loudly
- Record for at least 30 seconds
This room tone recording becomes your silence padding source. Use the same room tone for all chapters recorded in the same session. If you record across multiple sessions, capture fresh room tone for each.
Store room tone files carefully. You may need them months later for resubmission or additional chapters.
How Do You Add Silence Padding?
Method 1: Paste Room Tone (Recommended)
Head silence:
- Open your room tone recording
- Select 0.75 seconds of clean room tone (no artefacts)
- Copy
- Open chapter file, place cursor at position 0:00
- Paste
Tail silence:
- From the same room tone recording, select 3 seconds
- Copy
- Place cursor at end of chapter file
- Paste
Using 0.75 seconds head and 3 seconds tail places you comfortably within ACX's ranges with margin.
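If you script your mastering instead of pasting by hand, the same steps might look like the sketch below. It assumes audio is already decoded into Python lists of float samples, and `pad_with_room_tone` is a hypothetical helper name. Taking the tail from a different stretch of the room tone clip than the head is a design choice of this sketch, not something ACX specifies.

```python
def pad_with_room_tone(narration, room_tone, sample_rate,
                       head_seconds=0.75, tail_seconds=3.0):
    """Return narration with room tone pasted at the head and tail."""
    head_n = round(head_seconds * sample_rate)
    tail_n = round(tail_seconds * sample_rate)
    if len(room_tone) < head_n + tail_n:
        raise ValueError("room tone clip is shorter than the padding needed")
    head = list(room_tone[:head_n])
    # Use a different stretch of the clip for the tail so the same
    # noise pattern is not repeated at both ends of the chapter.
    tail = list(room_tone[head_n:head_n + tail_n])
    return head + list(narration) + tail
```

A 30 to 60 second room tone recording is far longer than the 3.75 seconds this needs, so the length check should only trip if the wrong clip is passed in.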
Method 2: Extend Existing Room Tone
If your recording already has some room tone at the beginning and end, extend it by copying and pasting a portion of the existing room tone. This preserves the natural room tone of the specific recording.
Method 3: Generate Silence (Last Resort)
If you have no room tone recording, generating digital silence is acceptable only when your noise floor is very low (below -65 dBFS), where the transition is inaudible.
In Audacity: Place cursor → Generate → Silence → 0.75 seconds
How Do Crossfades Improve Transitions?
A crossfade gradually blends the end of one audio segment into the beginning of the next, smoothing the transition.
At the Head (Room Tone → Narration)
Select the last 50–100 ms of room tone padding and the first 50–100 ms of narration. Apply a crossfade in. The result is a smooth 100–200 ms transition.
At the Tail (Narration → Room Tone)
Select the last 50–100 ms of narration and the first 50–100 ms of room tone padding. Apply a crossfade out.
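For readers who want to see what a crossfade does to the samples, here is a minimal sketch using linear gain ramps. The linear curve and the `crossfade` function name are assumptions for illustration; DAWs typically offer several curve shapes and may implement the blend differently.

```python
def crossfade(a, b, overlap):
    """Join `a` and `b`, blending the last `overlap` samples of `a`
    with the first `overlap` samples of `b` using linear gains."""
    if overlap <= 0:
        return list(a) + list(b)
    if overlap > len(a) or overlap > len(b):
        raise ValueError("overlap is longer than one of the segments")
    blended = []
    for i in range(overlap):
        fade_in = (i + 1) / (overlap + 1)  # gain applied to b, rising
        fade_out = 1.0 - fade_in           # gain applied to a, falling
        blended.append(a[len(a) - overlap + i] * fade_out + b[i] * fade_in)
    return list(a[:len(a) - overlap]) + blended + list(b[overlap:])
```

For a 75 ms blend at 44.1 kHz, `overlap` would be `int(0.075 * 44100)`. Because the two segments overlap, the result is shorter than the two inputs laid end to end by `overlap` samples, so re-measure head and tail durations after crossfading.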
When Are Crossfades Unnecessary?
If padding room tone is from the same session and levels match, the transition is already natural. Crossfades are most valuable when room tone was recorded at a different time, noise reduction changed the character, or you're using generated digital silence.
Where Does Silence Padding Fit in the Mastering Chain?
Add silence padding after all processing is complete but before MP3 export and final verification:[3]
- Format conversion
- High-pass filter
- Noise treatment
- Compression
- Loudness normalisation
- Peak limiting
- Silence padding ← here
- MP3 export
- Verification
Adding silence before processing causes problems. Compression, normalisation, and limiting can alter the silence sections. The complete mastering guide covers the full chain, and the Audacity tutorial includes silence padding as step 9 of a 12-step process.
What About Silence Padding for Credits Files?
Opening and closing credits follow the same requirements:[1]
- Opening credits: 0.5–1s head silence, state book title / author / narrator, 1–5s tail silence
- Closing credits: 0.5–1s head silence, end statement plus additional credits, 1–5s tail silence
Room tone in credits should match chapter room tone if recorded in the same environment. Inconsistent room tone between credits and chapters is a quality review flag.
What Are Common Silence Padding Mistakes?
No Silence at All
Raw recordings often start and end with speech immediately. Without explicit padding, there's no buffer for playback systems.
Too Much Silence
More than 1 second of head silence or 5 seconds of tail silence triggers ACX's automated check. Excessive silence also lowers the overall RMS measurement because silence contributes to the full-file average.
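The size of that RMS effect is easy to estimate. The sketch below assumes the padding's energy is negligible next to the narration's, so the full-file mean square simply scales with the fraction of the file that is narration. `rms_drop_db` is a hypothetical helper, not part of any ACX tool.

```python
import math


def rms_drop_db(content_seconds, padding_seconds):
    """Approximate change in full-file RMS (dB) when near-silent padding
    is added, assuming the padding contributes negligible energy."""
    fraction = content_seconds / (content_seconds + padding_seconds)
    return 10.0 * math.log10(fraction)
```

Under this assumption a 10-minute chapter with 4 seconds of padding moves by only a few hundredths of a dB, while a 10-second credits file with 5 seconds of padding drops by roughly 1.8 dB. Short files are where excess silence affects the measurement most.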
Inconsistent Duration Across Chapters
If some chapters have 0.5s head and others have 1.0s, the listening experience feels uneven. ACX reviewers may flag this. Pick a duration and use it everywhere: 0.75 seconds head, 3 seconds tail.
Different Room Tone for Different Chapters
Chapters from different sessions may have different room tone. Using room tone from session A to pad a chapter from session B creates a mismatch. Match each chapter's padding to its own recording session.
How Does Silence Differ on Other Platforms?
Other audiobook distribution platforms have similar requirements:[4]
- Findaway Voices: 0.5 to 1 second head, 1 to 5 seconds tail (same as ACX)
- Author's Republic: 1 to 5 seconds at head and tail
- Google Play Books: recommends silence but doesn't specify exact durations
- Kobo: no published silence requirements
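Files mastered to ACX's silence specs satisfy most of these requirements.[4] The one to check is Author's Republic: the 1-second head minimum listed above overlaps ACX's 0.5 to 1 second range only at exactly 1 second, so a 0.75-second head would need lengthening for that platform.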
Does Room Tone Type Affect Your Measurements?
Yes, and this is a subtle trap. Room tone that's too loud raises your overall RMS and noise floor measurements. Room tone that's too quiet (close to digital silence) can still produce an audible transition click.
The target for room tone used as padding is a noise floor below -60 dBFS. This is the same threshold ACX applies to silent sections within your narration. If your room tone is noisier than -60 dBFS, you have two problems: the silence sections fail, and your padding sounds loud against the narration's natural gaps.
Measure your room tone recording before you use it. In Audacity, select the room tone clip, then go to Analyse > Contrast and measure the "Noise RMS" value. It should read below -60 dBFS. If it doesn't, your recording environment is too noisy for ACX compliance regardless of how you handle the silence padding.
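Outside Audacity, the same measurement can be approximated in code. This sketch assumes float samples between -1.0 and 1.0 and reports RMS relative to a full-scale value of 1.0; some meters use a different dBFS reference, so readings may differ by a few dB from your DAW. The function names are hypothetical.

```python
import math


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0) for float samples."""
    if not samples:
        raise ValueError("empty selection")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # every sample is zero: digital silence
    return 10.0 * math.log10(mean_square)


def room_tone_ok(samples, limit_dbfs=-60.0):
    """True if the clip is real room tone and quieter than the limit."""
    level = rms_dbfs(samples)
    # -inf means digital silence, which is not usable room tone.
    return level != float("-inf") and level < limit_dbfs
```

Run `room_tone_ok` on the exact stretch of room tone you plan to paste, not on the whole recording, so a stray click elsewhere in the clip does not skew the result.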
Recording in a noisier space means your only option for padding is to use that same noisy room tone. Switching to cleaner room tone from a different session creates a mismatch. This is why treating your recording space before you start matters so much.
What Silence-Related Rejections Are Most Common?
ACX's QA team flags silence padding issues under several different rejection categories. Knowing which specific problem caused your rejection tells you exactly what to fix.[1]
Long Gaps of Silence Inside a Chapter
Internal silence gaps within a chapter file are flagged separately from head/tail padding problems. If a pause between sections contains more than a few seconds of absolute silence, the QA reviewer treats it as an editing error: a missed cut or a missing room tone fill.
Fix: Go through each chapter in your editing software. Select all sections of digital silence and replace them with room tone. The ACX production blog is explicit: replace gaps of silence with room tone, not with nothing.
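Finding digital silence by ear is slow. As a rough aid, a script can list every run of exact-zero samples so you know where to paste room tone. The sketch below assumes decoded float or integer samples; the `min_seconds` default is an arbitrary assumption that skips isolated zero crossings, and `find_digital_silence` is a hypothetical name.

```python
def find_digital_silence(samples, sample_rate, min_seconds=0.05):
    """Return (start_seconds, end_seconds) for each run of exact-zero
    samples lasting at least `min_seconds`."""
    min_len = max(1, round(min_seconds * sample_rate))
    runs = []
    start = None  # index where the current zero run began, if any
    for i, s in enumerate(samples):
        if s == 0:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start / sample_rate, i / sample_rate))
            start = None
    # Handle a zero run that reaches the end of the file.
    if start is not None and len(samples) - start >= min_len:
        runs.append((start / sample_rate, len(samples) / sample_rate))
    return runs
```

Each returned pair is a time range to replace with room tone from the same session. Note that this only detects exact zeros; it will not catch heavily gated audio that is merely very quiet.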
Digital Silence Instead of Room Tone
This is the most common silence problem caught by human review rather than by the automated check. The automated check only measures duration. A file with 0.75 seconds of digital silence at the head passes the automated duration check but fails human review because the QA listener hears the abrupt transition from absolute zero to your recording environment.
Fix: Replace generated silence with pasted room tone. If you no longer have the original room tone recording, record fresh room tone in the same space with the same microphone and gain settings, then re-pad the chapter.
Head Silence Too Long
More than 1 second of silence at the head fails the automated check. This is a precise cutoff, not an approximation. A chapter with 1.1 seconds of head silence will be rejected.
This often happens when narrators export from a DAW without trimming the pre-count or pre-roll that the software adds automatically. Always verify the first audible sound occurs within the 0.5–1 second window.
Fix: Trim the head silence to exactly 0.75 seconds (a comfortable midpoint within the required range). Measure using your DAW's ruler, not by ear.
Tail Silence Too Short or Too Long
Under 1 second or over 5 seconds both fail. Too short is common when exports are tightly trimmed. Too long happens when narrators add several seconds of room tone and then forget there's already some trailing room tone from the recording itself.
Fix: After adding tail padding, measure from the last syllable to the end of the file. If the recording already has trailing room tone before your padding, account for that when calculating how much to add.
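The accounting is simple arithmetic, but it is the step people skip. A minimal sketch, assuming you have already measured the existing trailing room tone in seconds and are aiming for the 3-second target used in this guide (`plan_tail` is a hypothetical helper):

```python
def plan_tail(existing_tail_seconds, target_seconds=3.0):
    """Return ("add" | "trim" | "keep", seconds) to reach the target tail."""
    delta = target_seconds - existing_tail_seconds
    if delta > 0:
        return ("add", delta)    # paste this much more room tone
    if delta < 0:
        return ("trim", -delta)  # cut this much from the end
    return ("keep", 0.0)
```

For example, a chapter that already ends with 1.2 seconds of room tone needs about 1.8 seconds added, not a full 3.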
Inconsistent Silence Across Chapters
The automated check doesn't flag this, but human reviewers do. If half your chapters have 0.5 seconds of head silence and half have 1.0 seconds, the listening experience feels uneven when moving between chapters. Set a single target for every file and apply it consistently.
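A consistency check is straightforward to script once you have a measured head duration for each chapter. The sketch below flags chapters that stray from a single target; the 0.05-second tolerance is an assumption chosen for illustration, and `inconsistent_chapters` is a hypothetical name.

```python
def inconsistent_chapters(head_by_chapter, target=0.75, tolerance=0.05):
    """Return the names of chapters whose head silence differs from the
    target by more than `tolerance` seconds, sorted by name."""
    return sorted(
        name
        for name, head in head_by_chapter.items()
        if abs(head - target) > tolerance
    )
```

Passing a dictionary such as `{"ch01": 0.75, "ch02": 0.5, "ch03": 0.78, "ch04": 1.0}` returns `ch02` and `ch04`: both are inside ACX's allowed range but would still sound uneven next to the others. The same function works for tail durations with `target=3.0`.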