How much silence does ACX require on each chapter?

Head silence: 0.5 to 1 second before narration begins. Tail silence: 1 to 5 seconds after narration ends. Using 0.75 seconds head and 3 seconds tail places you comfortably within range with margin on both sides.

When should silence padding be added in the mastering chain?

After all processing (compression, loudness normalisation, peak limiting) but before MP3 export and final verification. Adding silence before processing can cause compression and normalisation to alter the silence sections unpredictably.

Do other audiobook platforms have the same silence requirements as ACX?

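Findaway Voices matches ACX exactly. Author's Republic requires 1–5 seconds at both head and tail. Google Play recommends silence without specific durations. Kobo has no published requirements. Files meeting ACX specs satisfy these platforms with one caveat: Author's Republic's 1-second head minimum overlaps ACX's 0.5 to 1 second range only at exactly 1 second, so check its current specification before reusing a shorter head.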

Why does too much silence affect my RMS measurement?

ACX measures RMS across the entire file including silence sections. Long silence periods lower the overall RMS average, potentially pushing your measurement below the -23 dBFS minimum. Keep head silence under 1 second and tail silence under 5 seconds.

Audiobook Silence Padding Explained: Head/Tail Requirements, Room Tone, and Crossfades

What is the difference between room tone and digital silence?

Room tone is the natural ambient sound of your recording space when nobody is speaking. Digital silence is absolute zero signal — every sample at 0. Room tone sounds natural at transitions; digital silence creates audible clicks and channel activation artefacts on headphones.

Silence padding sounds like the simplest part of audiobook mastering: add some silence at the beginning and end of each chapter, done. But the details matter in ways that aren't obvious until your audiobook gets flagged in quality review. The concept isn't hard; the catch is that ACX requires the right kind of silence.

ACX requires specific silence durations at the head and tail of every chapter file.[1] More importantly, it requires room tone, not digital silence. Get the duration wrong and you fail automated checks. Use digital silence instead of room tone and you fail human quality review. This guide covers durations, room tone, crossfades, where padding sits in the mastering chain, and the most common silence-related rejections. For full chapter-level formatting including file naming and credits, see the chapter formatting guide.

What Are ACX's Silence Requirements?

Every chapter file submitted to ACX must have:[1]

Head silence: 0.5 to 1 second before narration begins
Tail silence: 1 to 5 seconds after narration ends

Durations are measured from file start to first audible speech (head) and from last audible speech to file end (tail). Breaths before the first word or after the last word count as narration, not silence.
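To make that measurement rule concrete, here is a minimal Python sketch that finds the first and last audible samples and converts the gaps to seconds. The `threshold` value and the function names are assumptions for illustration, not ACX's actual detection method; set the threshold above your room tone but below your quietest breath.

```python
def measure_padding(samples, sample_rate, threshold=0.01):
    """Return (head_seconds, tail_seconds) for mono float samples in [-1, 1].

    threshold is a hypothetical amplitude above which a sample counts as
    narration (speech or breath); it is not an ACX-published value.
    """
    audible = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not audible:
        raise ValueError("no audible content found")
    head = audible[0] / sample_rate
    tail = (len(samples) - 1 - audible[-1]) / sample_rate
    return head, tail


def within_acx_ranges(head, tail):
    """Check the ranges quoted above: 0.5-1 s head, 1-5 s tail."""
    return 0.5 <= head <= 1.0 and 1.0 <= tail <= 5.0


# Synthetic chapter at a toy 1 kHz sample rate:
# 0.75 s of quiet room tone, 2 s of "speech", 3 s of quiet room tone.
rate = 1000
chapter = [0.001] * 750 + [0.5] * 2000 + [0.001] * 3000
head, tail = measure_padding(chapter, rate)
print(head, tail, within_acx_ranges(head, tail))
```

A simple amplitude threshold like this treats a breath as narration only if the breath is louder than the threshold, so check borderline files by eye in your editor as well.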

Why These Specific Durations?

Head silence prevents the first word from being clipped on playback devices with buffering delays. When a listener starts a chapter, the system needs a fraction of a second to stabilise. Without head silence, the first syllable can be cut off or pop.

Tail silence provides a natural conclusion and prevents abrupt endings. The longer range (up to 5 seconds) accommodates different ending styles. A dramatic chapter ending can use more tail silence than one ending mid-conversation.

The durations also ensure consistent behaviour across different playback apps. Some apps add their own brief silence between chapters; some don't.

What Is the Difference Between Room Tone and Digital Silence?

This is where most producers make their mistake.

Room Tone

Room tone is the natural ambient sound of your recording environment when nobody is speaking. Every room has a unique acoustic signature: HVAC, electrical hum, distant traffic, building resonance, air movement. Even a very quiet room has measurable room tone.[2]

Room tone sounds like "silence" to the human ear in context. It's what listeners perceive between sentences. Your brain filters it as background.

Digital Silence

Digital silence is absolute zero: every sample value is exactly 0. No room tone, no ambient noise, nothing.

Why Does Digital Silence Cause Problems?

The transition from digital silence to room tone is audible. When narration begins, room tone starts. If the preceding silence was digital zero, listeners hear a subtle "activation", the room tone switching on, like a gate opening or a channel unmuting. At the end, the reverse happens: room tone abruptly cuts to nothing.
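Because digital silence is defined as every sample being exactly 0, it is easy to detect before a reviewer hears it. The sketch below is one way to check the padding regions of a chapter; the function names and the toy sample rate are assumptions for illustration.

```python
def is_digital_silence(samples):
    """True when the region is non-empty and every sample is exactly zero."""
    return len(samples) > 0 and all(s == 0 for s in samples)


def padding_is_digital_silence(samples, sample_rate, head_seconds, tail_seconds):
    """Return (head_is_digital, tail_is_digital) for the given padding lengths."""
    head_n = round(head_seconds * sample_rate)
    tail_n = round(tail_seconds * sample_rate)
    head_region = samples[:head_n]
    tail_region = samples[len(samples) - tail_n:]
    return is_digital_silence(head_region), is_digital_silence(tail_region)


rate = 1000  # toy sample rate to keep the example small
speech = [0.3] * 100

# Padding made with a "generate silence" command: all zeros.
generated = [0.0] * 750 + speech + [0.0] * 3000

# Padding pasted from a room tone recording: tiny but non-zero values.
room_tone = [0.001, -0.001] * 375 + speech + [0.001, -0.001] * 1500

print(padding_is_digital_silence(generated, rate, 0.75, 3.0))
print(padding_is_digital_silence(room_tone, rate, 0.75, 3.0))
```

A file that reports digital silence in either region is the kind that passes a duration check but risks the human-review flag described in this guide.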

This is especially noticeable on headphones and earbuds, which is how most Audible listeners consume audiobooks. ACX's quality reviewers listen for it specifically.

How Do You Record Room Tone?

Before or after your narration session, record 30 to 60 seconds of pure room tone:

Set up your microphone in exactly the same position as narration
Keep the same input gain level
Sit in your normal position but don't speak, move, or breathe loudly
Record for at least 30 seconds

This room tone recording becomes your silence padding source. Use the same room tone for all chapters recorded in the same session. If you record across multiple sessions, capture fresh room tone for each.

Store room tone files carefully. You may need them months later for resubmission or additional chapters.

How Do You Add Silence Padding?

Method 1: Paste Room Tone (Recommended)

Head silence:

Open your room tone recording
Select 0.75 seconds of clean room tone (no artefacts)
Copy
Open chapter file, place cursor at position 0:00
Paste

Tail silence:

From the same room tone recording, select 3 seconds
Copy
Place cursor at end of chapter file
Paste

Using 0.75 seconds head and 3 seconds tail places you comfortably within ACX's ranges with margin.
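The paste steps above can be sketched in a few lines of Python. This assumes the chapter has already been trimmed so it starts at the first audible sound and ends at the last one, and that audio is held as a plain list of mono samples; `pad_with_room_tone` is a hypothetical helper, not part of any editor's API.

```python
def pad_with_room_tone(chapter, room_tone, sample_rate,
                       head_seconds=0.75, tail_seconds=3.0):
    """Return the chapter with room tone pasted at the head and tail.

    Assumes `chapter` is already trimmed to its first and last audible
    samples, and `room_tone` is a clean recording from the same session.
    """
    head_n = round(head_seconds * sample_rate)
    tail_n = round(tail_seconds * sample_rate)
    if len(room_tone) < max(head_n, tail_n):
        raise ValueError("room tone recording is shorter than the padding needed")
    return room_tone[:head_n] + chapter + room_tone[:tail_n]


rate = 1000                      # toy sample rate
room_tone = [0.001] * 5000       # stands in for a 5 s room tone recording
narration = [0.5] * 2000         # stands in for 2 s of trimmed narration
padded = pad_with_room_tone(narration, room_tone, rate)
print(len(padded) / rate)        # total duration in seconds
```

Taking both slices from the start of the room tone file keeps the sketch short; in practice you would pick whichever stretch of the recording is free of artefacts.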

Method 2: Extend Existing Room Tone

If your recording already has some room tone at the beginning and end, extend it by copying and pasting a portion of the existing room tone. This preserves the natural room tone of the specific recording.

Method 3: Generate Silence (Last Resort)

If you have no room tone recording, generating digital silence is acceptable only when your noise floor is very low (below -65 dBFS), where the transition is inaudible.

In Audacity: Place cursor → Generate → Silence → 0.75 seconds

How Do Crossfades Improve Transitions?

A crossfade gradually blends the end of one audio segment into the beginning of the next, smoothing the transition.

At the Head (Room Tone → Narration)

Select the last 50–100 ms of room tone padding and the first 50–100 ms of narration. Apply a crossfade in. The result is a smooth 100–200 ms transition.

At the Tail (Narration → Room Tone)

Select the last 50–100 ms of narration and the first 50–100 ms of room tone padding. Apply a crossfade out.
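For readers who want to see what a crossfade does to the samples, here is a minimal linear crossfade in Python. It overlaps the end of one segment with the start of the next, which is one common way to implement a crossfade and not necessarily what your editor does internally; the function name and toy sample rate are assumptions.

```python
def crossfade(a, b, overlap):
    """Join segments a and b with a linear crossfade over `overlap` samples.

    The last `overlap` samples of a are blended with the first `overlap`
    samples of b, so the result is `overlap` samples shorter than a + b.
    """
    if overlap <= 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be positive and fit inside both segments")
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)          # weight of b rises towards 1
        mixed.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    return a[:len(a) - overlap] + mixed + b[overlap:]


rate = 1000                                  # toy sample rate
room = [0.001] * 750                         # 0.75 s of room tone padding
narration = [0.5] * 2000                     # 2 s of narration
joined = crossfade(room, narration, overlap=50)   # 50 ms blend
print(len(joined))
```

Because an overlapping crossfade shortens the file by the overlap length, re-measure head and tail silence after applying it.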

When Are Crossfades Unnecessary?

If padding room tone is from the same session and levels match, the transition is already natural. Crossfades are most valuable when room tone was recorded at a different time, noise reduction changed the character, or you're using generated digital silence.

Where Does Silence Padding Fit in the Mastering Chain?

Add silence padding after all processing is complete but before final verification and MP3 export:[3]

Format conversion
High-pass filter
Noise treatment
Compression
Loudness normalisation
Peak limiting
Silence padding ← here
MP3 export
Verification

Adding silence before processing causes problems. Compression, normalisation, and limiting can alter the silence sections. The complete mastering guide covers the full chain, and the Audacity tutorial includes silence padding as step 9 of a 12-step process.

What About Silence Padding for Credits Files?

Opening and closing credits follow the same requirements:[1]

Opening credits: 0.5–1s head silence, state book title / author / narrator, 1–5s tail silence
Closing credits: 0.5–1s head silence, end statement plus additional credits, 1–5s tail silence

Room tone in credits should match chapter room tone if recorded in the same environment. Inconsistent room tone between credits and chapters is a quality review flag.

What Are Common Silence Padding Mistakes?

No Silence at All

Raw recordings often start and end with speech immediately. Without explicit padding, there's no buffer for playback systems.

Too Much Silence

More than 1 second of head silence or 5 seconds of tail silence fails ACX's automated check. Excessive silence also lowers the overall RMS measurement because silence is included in the full-file average.
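The effect on RMS is easy to see with synthetic numbers. The sketch below assumes narration at a constant amplitude of 0.1 (about -20 dBFS) and room tone at 0.001 (about -60 dBFS); real audio varies, so treat the figures as an illustration of the trend, not a prediction for your files.

```python
import math


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0), over the whole list."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square)


rate = 100                               # toy sample rate to keep lists small
narration = [0.1] * (60 * rate)          # 60 s of "speech"
head = [0.001] * round(0.75 * rate)      # 0.75 s of room tone

compliant = head + narration + [0.001] * (3 * rate)     # 3 s tail
excessive = head + narration + [0.001] * (30 * rate)    # 30 s tail

for label, audio in [("narration only", narration),
                     ("0.75 s head, 3 s tail", compliant),
                     ("0.75 s head, 30 s tail", excessive)]:
    print(label, round(rms_dbfs(audio), 2))
```

In this toy case the compliant padding costs a fraction of a decibel, while the 30-second tail costs well over one, which is enough to matter if your narration already sits close to the lower RMS limit.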

Inconsistent Duration Across Chapters

If some chapters have 0.5s head and others have 1.0s, the listening experience feels uneven. ACX reviewers may flag this. Pick a duration and use it everywhere: 0.75 seconds head, 3 seconds tail.

Different Room Tone for Different Chapters

Chapters from different sessions may have different room tone. Using room tone from session A to pad a chapter from session B creates a mismatch. Match each chapter's padding to its own recording session.

How Does Silence Differ on Other Platforms?

Other audiobook distribution platforms have similar requirements:[4]

Findaway Voices: 0.5 to 1 second head, 1 to 5 seconds tail (same as ACX)
Author's Republic: 1 to 5 seconds at head and tail
Google Play Books: recommends silence but doesn't specify exact durations
Kobo: no published silence requirements

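Files mastered to ACX's silence specs meet these requirements with one caveat drawn from the figures above: Author's Republic's 1 to 5 second head range overlaps ACX's 0.5 to 1 second range only at exactly 1 second, so a 0.75-second head falls short there. Check that platform's current specification before reusing ACX masters.[4]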

Does Room Tone Type Affect Your Measurements?

Yes, and this is a subtle trap. Room tone that's too loud raises your overall RMS and noise floor measurements. Room tone that's too quiet (close to digital silence) can still produce an audible transition click.

The target for room tone used as padding is a noise floor below -60 dBFS. This is the same threshold ACX applies to silent sections within your narration. If your room tone is noisier than -60 dBFS, you have two problems: the silence sections fail, and your padding sounds loud against the narration's natural gaps.

Measure your room tone recording before you use it. In Audacity, select the room tone clip, then go to Analyse > Contrast, click "Measure selection", and read the RMS volume it reports. It should read below -60 dBFS. If it doesn't, your recording environment is too noisy for ACX compliance regardless of how you handle the silence padding.
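If you prefer to script the measurement, the sketch below reads a WAV file with Python's standard library and reports its RMS level. It assumes a 16-bit mono PCM file and measures against full scale (32768); some tools use a sine-wave reference and will read about 3 dB differently, so compare like with like. The file name and helper names are hypothetical.

```python
import array
import math
import os
import sys
import tempfile
import wave


def wav_rms_dbfs(path):
    """RMS level of a 16-bit mono PCM WAV file, in dB relative to full scale."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()               # WAV data is little-endian
    if not samples:
        raise ValueError("file contains no audio")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")             # digital silence
    return 10 * math.log10(mean_square / 32768.0 ** 2)


def write_synthetic_tone(path, amplitude, frames, rate=44100):
    """Write a stand-in 'room tone' file so the example runs anywhere."""
    data = array.array("h", [amplitude if i % 2 == 0 else -amplitude
                             for i in range(frames)])
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(data.tobytes())


with tempfile.TemporaryDirectory() as folder:
    tone_path = os.path.join(folder, "room_tone.wav")   # hypothetical file name
    write_synthetic_tone(tone_path, amplitude=16, frames=44100)
    level = wav_rms_dbfs(tone_path)

print(round(level, 1), level < -60.0)
```

To use it on a real recording, call `wav_rms_dbfs` with the path to your own room tone file and compare the result with the -60 dBFS target.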

Recording in a noisier space means your only option for padding is to use that same noisy room tone. Switching to cleaner room tone from a different session creates a mismatch. This is why treating your recording space before you start matters so much.

What Silence-Related Rejections Are Most Common?

ACX's QA team flags silence padding issues under several different rejection categories. Knowing which specific problem caused your rejection tells you exactly what to fix.[1]

Long Gaps of Silence Inside a Chapter

Internal silence gaps within a chapter file are flagged separately from head/tail padding problems. If a pause between sections contains more than a few seconds of absolute silence, the QA reviewer treats it as an editing error: either a missed cut or a missing room tone fill.

Fix: Go through each chapter in your editing software. Select all sections of digital silence and replace them with room tone. The ACX production blog is explicit: replace gaps of silence with room tone, not with nothing.

Digital Silence Instead of Room Tone

This is the most common silence flag raised in human review rather than by the automated check. The automated check only measures duration. A file with 0.75 seconds of digital silence at the head passes the automated duration check but fails human review because the QA listener hears the abrupt transition from absolute zero to your recording environment.

Fix: Replace generated silence with pasted room tone. If you no longer have the original room tone recording, record fresh room tone in the same space with the same microphone and gain settings, then re-pad the chapter.

Head Silence Too Long

More than 1 second of silence at the head fails the automated check. This is a precise cutoff, not an approximation. A chapter with 1.1 seconds of head silence will be rejected.

This often happens when narrators export from a DAW without trimming the pre-count or pre-roll that the software adds automatically. Always verify the first audible sound occurs within the 0.5–1 second window.

Fix: Trim the head silence to exactly 0.75 seconds (a comfortable midpoint within the required range). Measure using your DAW's ruler, not by ear.

Tail Silence Too Short or Too Long

Under 1 second or over 5 seconds both fail. Too short is common when exports are tightly trimmed. Too long happens when narrators add several seconds of room tone and then forget there's already some trailing room tone from the recording itself.

Fix: After adding tail padding, measure from the last syllable to the end of the file. If the recording already has trailing room tone before your padding, account for that when calculating how much to add.
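The arithmetic is simple but easy to get wrong in a hurry. As a sketch, assuming you have already measured the existing trailing room tone in seconds, a helper like the hypothetical `plan_tail` below tells you how much to add or trim to land on your target.

```python
def plan_tail(existing_tail, target_tail=3.0):
    """Return (seconds_to_add, seconds_to_trim) to reach target_tail.

    existing_tail is the measured gap from the last syllable to the end of
    the file before any padding is added. The 1-5 s limits are the ACX tail
    range quoted in this guide.
    """
    if not 1.0 <= target_tail <= 5.0:
        raise ValueError("target must sit inside the 1-5 second tail range")
    difference = target_tail - existing_tail
    if difference >= 0:
        return difference, 0.0
    return 0.0, -difference


# A tightly trimmed export with 0.4 s of trailing room tone.
print(plan_tail(0.4))

# An export that already has 6.2 s of trailing room tone.
print(plan_tail(6.2))
```

In the first case you add roughly 2.6 seconds of room tone; in the second you trim roughly 3.2 seconds instead of pasting more on top.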

Inconsistent Silence Across Chapters

The automated check doesn't flag this, but human reviewers do. If half your chapters have 0.5 seconds of head silence and half have 1.0 seconds, the listening experience feels uneven when moving between chapters. Set a single target for every file and apply it consistently.
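Since the automated check won't catch this for you, a quick consistency pass before upload can help. The sketch below assumes you have measured head silence for each chapter and stored it in a dictionary; the chapter names and the 0.05-second tolerance are made-up values for illustration, not an ACX rule.

```python
def inconsistent_chapters(durations, target, tolerance=0.05):
    """Return the names of chapters whose measured silence strays from target.

    durations maps chapter name -> measured silence in seconds.
    tolerance is a hypothetical allowance for measurement jitter.
    """
    return [name for name, seconds in durations.items()
            if abs(seconds - target) > tolerance]


# Hypothetical measured head silence per chapter, in seconds.
head_silence = {
    "chapter_01": 0.75,
    "chapter_02": 0.50,
    "chapter_03": 1.00,
    "chapter_04": 0.76,
}

print(inconsistent_chapters(head_silence, target=0.75))
```

Every chapter in this example is inside ACX's head range, yet two of them would still stand out to a listener moving between chapters.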
