ChapterPass

Audiobook Production Workflow: From Recording to Retail-Ready Files

ChapterPass Editorial Team

Audiobook production follows a four-stage pipeline: recording, editing, mastering, and submission. Each stage has specific goals, specific tools, and specific failure modes. Recording captures clean narration in a controlled environment. Editing removes mistakes, cleans artefacts, and produces a finished performance. Mastering adjusts the technical specifications (loudness, peaks, noise floor, format) to meet distributor requirements. Submission delivers compliant files to platforms like ACX and Findaway Voices. This guide covers the complete workflow and links to detailed resources for every stage.

The most common mistake in audiobook production is treating these stages as interchangeable. They are not. Recording quality cannot be fixed in mastering. Mastering cannot compensate for poor editing. And submission-ready files require that every upstream stage was done correctly. Understanding the pipeline, and doing each stage in order, is the difference between a smooth submission and weeks of re-work.

Stage 1: Recording Setup and Execution

Good recording is the foundation that everything else depends on. No amount of post-production can rescue audio captured in a noisy, reflective room with inconsistent mic technique.

Room Preparation

Your recording environment matters more than your microphone. A $100 mic in a quiet, treated closet produces better audiobook audio than a $1,000 mic in an untreated living room. Here is what you need:

Eliminate noise sources. Turn off HVAC, fans, refrigerators, and anything that produces a constant hum. Close windows. Record during quiet hours if you live near traffic. Your target is a raw noise floor below -65 dBFS, which gives approximately 5 dB of headroom for the gain adjustments you will make during mastering while staying below the -60 dBFS limit required by ACX.[1]
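The headroom figure above is simple decibel arithmetic, sketched below. The function names are illustrative rather than part of any tool, and the sketch assumes plain linear gain, which raises noise and speech by the same number of dB.

```python
ACX_NOISE_FLOOR_LIMIT_DB = -60.0  # limit quoted in the text


def max_safe_gain_db(raw_noise_floor_db, limit_db=ACX_NOISE_FLOOR_LIMIT_DB):
    """Gain (in dB) that can be added before the noise floor reaches the limit."""
    return limit_db - raw_noise_floor_db


def noise_floor_after_gain(raw_noise_floor_db, gain_db):
    """Plain gain lifts the noise floor by exactly the gain applied."""
    return raw_noise_floor_db + gain_db


if __name__ == "__main__":
    print(max_safe_gain_db(-65.0))             # 5.0 dB of headroom
    print(noise_floor_after_gain(-65.0, 8.0))  # -57.0, which is over the limit
```

A recording with a -65 dBFS floor that needs 8 dB of gain would end up at -57 dBFS, so it would need noise management before normalisation.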

Control reflections. Hard, flat surfaces (walls, desks, windows) reflect sound and create audible room echo. Soft, irregular surfaces (bookshelves, curtains, rugs, acoustic foam, moving blankets) absorb reflections. A closet full of clothes is acoustically excellent. If you are recording in a larger room, place absorptive material behind you, behind the microphone, and on any hard surface within arm's reach.

Set up consistently. Mark your mic position, your seating position, and your distance from the mic (6–8 inches is typical for spoken word). Consistency between recording sessions prevents the chapter-to-chapter tonal variation that ACX reviewers flag during human quality review.

Recording Technique

Record at a moderate input level. Your raw peaks should sit between -12 and -6 dBFS: loud enough to capture a clean signal, quiet enough to avoid clipping. Home narrators often record too quietly (peaks at -30 dBFS or lower), so more gain is needed later, and that gain raises the background noise by the same amount.
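As a rough illustration of the level check, the sketch below measures the sample peak of a test take and compares it with the -12 to -6 dBFS window. It assumes the samples are already floats between -1.0 and 1.0; `peak_dbfs` and `level_advice` are hypothetical helper names.

```python
import math


def peak_dbfs(samples):
    """Sample peak in dBFS for float samples in the range -1.0 to 1.0."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")
    return 20.0 * math.log10(peak)


def level_advice(peak_db, low=-12.0, high=-6.0):
    """Compare a measured peak with the recording window from the text."""
    if peak_db > high:
        return "too hot: lower the input gain"
    if peak_db < low:
        return "too quiet: raise the input gain"
    return "ok"


if __name__ == "__main__":
    test_take = [0.10, -0.35, 0.20]  # made-up sample values
    print(level_advice(peak_dbfs(test_take)))
```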

Use a pop filter. Plosive consonants (P, B, T) produce low-frequency bursts that cause distracting thumps in the recording. A mesh or foam pop filter placed 2–3 inches from the mic catches these without affecting vocal tone.

Record room tone. At the start of every session, record 30–60 seconds of silence in your recording environment. This room tone serves two purposes: it provides natural silence for chapter padding (ACX requires room tone, not digital silence), and it gives noise reduction tools a clean noise profile to work with.
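To show how the recorded room tone might later be turned into chapter padding, here is a minimal sketch that repeats a room-tone clip to the required length. It works on plain lists of samples, and the default head and tail durations are assumptions chosen from inside the ranges ACX allows. Simple repetition can sound looped if the clip is very short, which is one reason to record a full 30–60 seconds.

```python
def room_tone_pad(room_tone, seconds, sample_rate=44100):
    """Return `seconds` of padding built by repeating the room-tone samples."""
    if not room_tone:
        raise ValueError("room_tone must contain at least one sample")
    needed = int(round(seconds * sample_rate))
    repeats = -(-needed // len(room_tone))  # ceiling division
    return (list(room_tone) * repeats)[:needed]


def pad_chapter(narration, room_tone, head_s=0.75, tail_s=3.0, sample_rate=44100):
    """Place room tone before and after the narration samples."""
    head = room_tone_pad(room_tone, head_s, sample_rate)
    tail = room_tone_pad(room_tone, tail_s, sample_rate)
    return head + list(narration) + tail
```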

One chapter per session if possible. Recording a full chapter in one session produces consistent tonal characteristics throughout. If you split a chapter across sessions, differences in mic placement, room acoustics, or vocal quality can be audible. For long chapters, take breaks but do not change your physical setup.

Stage 2: Editing

Editing transforms a raw recording into a clean, mistake-free performance. This stage handles everything that a listener would notice as a problem but that is not a technical specification.

What Editing Covers

Remove mistakes and retakes. Delete false starts, re-reads, stumbles, and any passages you re-recorded. Ensure the final edit flows naturally with no audible jumps or mismatched cadences at edit points.

Clean artefacts. Remove clicks, pops, and mouth noise. These are the most common quality-review failures on ACX. Human reviewers listen specifically for them. Spectral editing tools (available in Audacity, Adobe Audition, and iZotope RX) let you surgically remove individual artefacts without affecting the surrounding audio. For detailed tool options, see best audiobook mastering tools compared.

Adjust pacing. Remove or shorten excessively long pauses. Add brief pauses at chapter transitions and section breaks. Consistent pacing is a hallmark of professional audiobook narration.

Verify manuscript accuracy. Listen to the full narration against the manuscript. Mispronounced words, skipped sentences, and incorrect readings cause ACX quality rejections. This is tedious but non-negotiable. Reviewers do check.

What Editing Does Not Cover

Editing does not adjust loudness, peak levels, noise floor, format, or any other technical specification. Those are mastering tasks. If you start mastering before editing is complete, any subsequent edit (removing a click, cutting a sentence) changes the loudness and noise characteristics of the file and invalidates your mastering work.

The rule: finish all editing before you start mastering. No exceptions.

Stage 3: Mastering

Mastering takes your clean, edited recording and processes it to meet the technical specifications that distributors require. For audiobooks, these specifications are fixed, measurable numbers, not creative choices. Either the file passes or it does not.

The Eight ACX Specifications

ACX enforces eight technical requirements on every submitted file:[1]

  1. RMS loudness: -23 to -18 dBFS (target -20 dBFS for margin)
  2. True peak: Below -3 dBFS
  3. Noise floor: Below -60 dBFS
  4. Sample rate: 44,100 Hz
  5. Channels: Mono
  6. Format: MP3, 192 kbps CBR
  7. Head silence: 0.5–1 second of room tone
  8. Tail silence: 1–5 seconds of room tone
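The eight requirements can be expressed as a simple checklist over measured values. The sketch below is only an illustration: the `FileMeasurement` fields are hypothetical names, the measurements themselves must come from a metering tool, and treating the range endpoints as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FileMeasurement:
    rms_db: float
    true_peak_db: float
    noise_floor_db: float
    sample_rate_hz: int
    channels: int
    codec: str
    bitrate_kbps: int
    constant_bitrate: bool
    head_silence_s: float
    tail_silence_s: float


def acx_failures(m):
    """Return the names of the specifications the measurement fails."""
    checks = [
        ("RMS loudness", -23.0 <= m.rms_db <= -18.0),
        ("true peak", m.true_peak_db < -3.0),
        ("noise floor", m.noise_floor_db < -60.0),
        ("sample rate", m.sample_rate_hz == 44100),
        ("channels", m.channels == 1),
        ("format", m.codec.upper() == "MP3"
                   and m.bitrate_kbps == 192 and m.constant_bitrate),
        ("head silence", 0.5 <= m.head_silence_s <= 1.0),
        ("tail silence", 1.0 <= m.tail_silence_s <= 5.0),
    ]
    return [name for name, passed in checks if not passed]
```

An empty list means every check passed; otherwise the list names the specifications to revisit.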

Other platforms use the same or similar specs. Findaway Voices matches ACX on loudness, peak, and noise floor while being more flexible on format (accepts stereo and FLAC).[2] Master to ACX specs and your files work everywhere. For the complete cross-platform comparison, see the audiobook format requirements guide.

The Mastering Signal Chain

The mastering process follows a fixed order. Each step depends on the output of the previous one:

1. Format conversion. If you recorded at 48 kHz or 96 kHz, resample to 44,100 Hz. Convert stereo to mono if needed. Do this first because every subsequent operation depends on the sample rate and channel configuration being correct.

2. High-pass filter at 80 Hz. This removes low-frequency rumble (HVAC, traffic, building vibration) that sits below the useful range of human speech. Fundamentals of speech start around 85 Hz, so an 80 Hz high-pass removes rumble while leaving almost all speech content intact. For a deeper explanation, see the audiobook noise floor guide.

3. Noise management. If your noise floor exceeds -60 dBFS after accounting for the gain you will add in step 5, address it now. A high-pass filter alone may be sufficient. If not, apply spectral noise reduction at 6–8 dB per pass. Multiple light passes sound more natural than one aggressive pass.

4. Compression. Apply gentle dynamic range compression at a 2:1 to 3:1 ratio with a threshold around -20 to -18 dBFS.[3] This reduces the gap between quiet and loud passages, which means less gain is needed in the next step (and therefore less noise amplification). Never exceed 4:1 for speech, as higher ratios destroy natural dynamics. An attack of 15–25 ms and a release of 200–500 ms match natural speech rhythm.

5. Loudness normalisation. Adjust overall gain to -20 dBFS RMS. This sits near the centre of ACX's -23 to -18 range, leaving 2 dB of margin below the upper limit and 3 dB above the lower limit. Use RMS mode, not LUFS, because ACX measures RMS. For the full loudness guide, see audiobook loudness: the complete guide.

6. Peak limiting. Apply a limiter to ensure no peak exceeds -3 dBFS. If you have a true-peak-aware limiter (FabFilter Pro-L 2, TDR Limiter No6), set the ceiling to -3.0 or -3.1 dBFS. If you are using Audacity's built-in limiter (which only detects sample peaks), set the ceiling to -3.5 dBFS to leave margin for inter-sample peaks. For a complete explanation, see audiobook true peak explained.

7. Silence padding. Add 0.5–1 second of room tone at the head and 1–5 seconds at the tail. Use the room tone you recorded during your session, not digital silence, which creates audible artefacts at transitions. For detailed guidance, see audiobook silence padding explained.

8. MP3 encoding. Export as MP3 at 192 kbps CBR using LAME (the industry-standard encoder). This must be the last step. MP3 encoding is lossy and can shift peak levels by 0.5–1 dB, so verify the final MP3, not the pre-encoding WAV.
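The loudness normalisation in step 5 comes down to measuring RMS and applying the difference as gain. The sketch below assumes float samples between -1.0 and 1.0 and a single RMS figure computed over the whole file. Metering tools may window or weight the signal differently, so treat the result as an estimate rather than the number ACX will report.

```python
import math


def rms_dbfs(samples):
    """Whole-file RMS level in dBFS for float samples in -1.0 to 1.0."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def gain_to_target_db(samples, target_db=-20.0):
    """Gain in dB needed to move the measured RMS to the target."""
    return target_db - rms_dbfs(samples)


def apply_gain(samples, gain_db):
    """Scale every sample by the linear equivalent of gain_db."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


if __name__ == "__main__":
    # A quiet 220 Hz test tone stands in for an edited chapter.
    tone = [0.05 * math.sin(2 * math.pi * 220 * n / 44100) for n in range(44100)]
    gain = gain_to_target_db(tone)
    print(round(rms_dbfs(apply_gain(tone, gain)), 2))  # -20.0
```

Because the same gain also raises peaks and noise, the limiter and the noise floor check still have to follow.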

For step-by-step instructions in Audacity, see the Audacity audiobook mastering tutorial. For the recommended Audacity effect settings specifically, see the Audacity ACX settings guide. For an overview of all mastering approaches, see the complete guide to audiobook mastering.
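To make the compression ratios in step 4 concrete, the sketch below models the static input/output curve of a hard-knee downward compressor. It is a simplification that ignores attack, release, and knee shape; the function name and the default values (taken from inside the ranges above) are illustrative.

```python
def compressed_level_db(input_db, threshold_db=-20.0, ratio=2.5):
    """Output level of a hard-knee compressor for a steady input level."""
    if ratio < 1.0:
        raise ValueError("ratio must be at least 1.0")
    if input_db <= threshold_db:
        return input_db  # below the threshold the signal is untouched
    # Above the threshold, each `ratio` dB of input yields 1 dB of output.
    return threshold_db + (input_db - threshold_db) / ratio


if __name__ == "__main__":
    quiet, loud = -30.0, -8.0
    print(loud - quiet)  # 22.0 dB range before compression
    after = compressed_level_db(loud, ratio=3.0) - compressed_level_db(quiet, ratio=3.0)
    print(after)         # 14.0 dB range after 3:1 compression
```

In this example a 22 dB gap between quiet and loud passages shrinks to 14 dB, which is why less make-up gain is needed afterwards.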

Verification

After mastering, verify every file against every specification. Open the final MP3 (not the project file, not a WAV) and measure:

  • RMS (should read between -23 and -18 dBFS)
  • True peak (should read below -3 dBFS)
  • Noise floor (should read below -60 dBFS)
  • Format metadata: MP3, 192 kbps CBR, 44,100 Hz, mono

Free tools for verification: Audacity with the ACX Check plugin (covers RMS, sample peak, noise floor), Youlean Loudness Meter (true peak), and MediaInfo (format metadata). Note that Audacity's ACX Check measures sample peak, not true peak, so you need a separate tool for true-peak verification. For the complete verification workflow, see how to check if your audiobook meets ACX requirements.

Check consistency across chapters. Individual chapters may each pass, but if RMS varies by more than 3 dB across your book, reviewers will flag it. Batch-process all chapters through identical settings and verify the full set.
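A quick way to run the consistency check is to compare the loudest and quietest chapters. The sketch below assumes you have already measured each chapter's RMS with a metering tool; the chapter names and values are made up for illustration.

```python
def rms_spread_db(chapter_rms):
    """Difference in dB between the loudest and quietest chapters."""
    values = list(chapter_rms.values())
    return max(values) - min(values)


def is_consistent(chapter_rms, max_spread_db=3.0):
    """True when the chapter-to-chapter RMS spread stays within the limit."""
    return rms_spread_db(chapter_rms) <= max_spread_db


if __name__ == "__main__":
    measured = {"ch01": -20.1, "ch02": -19.4, "ch03": -22.8}  # hypothetical
    print(is_consistent(measured))
    print("quietest:", min(measured, key=measured.get))
    print("loudest:", max(measured, key=measured.get))
```

Every chapter in this example is inside the -23 to -18 dBFS window, yet the set still fails the 3 dB spread check.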

Stage 4: Submission

With verified, compliant files in hand, the submission process is straightforward.

ACX Submission

Upload chapter files in sequence, with opening credits, closing credits, and your retail sample chapter. The retail sample should be 1–5 minutes long, start with narration (not credits), and showcase your best recording quality, since this is what potential buyers hear on Audible. For chapter structure details, see the audiobook chapter formatting guide.

ACX runs automated technical checks first (hours), then human quality review (days to weeks). Technical rejections specify which spec failed. Quality rejections identify recording issues. For a complete rejection troubleshooting guide, see how to fix every ACX rejection issue.

Wide Distribution

If you are distributing beyond ACX, upload the same mastered files to your chosen aggregator (Findaway Voices, Author's Republic) or directly to Google Play Books. No re-encoding or format changes needed. Your ACX-compliant files meet or exceed every other platform's requirements.

The Production Timeline

For a first-time narrator producing a 30,000-word non-fiction book (approximately 4 finished hours of audio):

  • Recording: 8–16 hours of raw recording time (expect 2–4x the finished audio length)
  • Editing: 16–32 hours (plan for 4–8 hours per finished hour)
  • Mastering: 2–4 hours with manual tools; significantly less with automated processing
  • Verification: 1–2 hours for thorough checking of all chapter files
  • Submission and review: a few hours for upload; days to weeks for ACX review

Total: roughly 30–60 hours of work for a first audiobook. The time drops significantly with experience. Second and third books typically take half the time as you develop efficient recording habits and a repeatable mastering workflow.
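The total can be cross-checked by summing the hands-on ranges from the list. The sketch below uses the hour ranges exactly as listed and leaves out upload time, which the list only gives as "a few hours".

```python
# Hour ranges (low, high) copied from the timeline above.
STAGE_HOURS = {
    "recording": (8, 16),
    "editing": (16, 32),
    "mastering": (2, 4),
    "verification": (1, 2),
}


def total_range(stage_hours):
    """Sum the low and high estimates across all stages."""
    low = sum(lo for lo, _ in stage_hours.values())
    high = sum(hi for _, hi in stage_hours.values())
    return low, high


if __name__ == "__main__":
    print(total_range(STAGE_HOURS))  # (27, 54)
```

Adding a few hours for upload to the 27–54 hour sum lands close to the 30–60 hour estimate.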

ChapterPass automates the mastering stage. Upload your edited chapter files and ChapterPass runs the complete signal chain: loudness normalisation, true-peak limiting, noise floor management, format conversion, silence padding, and verification. Skip the manual mastering steps and go straight from editing to submission-ready files.
