Audiobook Format Requirements: MP3 Settings, Sample Rate, and Platform Specs Explained

The standard audiobook deliverable is an MP3 encoded at 192 kbps CBR (constant bit rate) with a 44,100 Hz sample rate. Every major platform accepts it, and most require it.[1] ACX specifically requires mono channel audio, while platforms like Findaway Voices and Google Play Books accept stereo as well.[2] Getting the format wrong is one of the fastest ways to trigger an automated rejection. Format checks read the file's header metadata before any audio analysis begins, which is why format rejections come back almost instantly. This guide covers the exact file format specifications for every major audiobook distributor so you can encode correctly the first time.

If you're mastering your own audiobook, format settings are the foundation everything else depends on. Sample rate, channel configuration, bitrate, and encoding mode must be set before you touch loudness, peaks, or noise floor. For the complete technical requirements including loudness and peak specs, see the ACX audio requirements guide. For a broader platform comparison that includes royalties and distribution strategy, see the audiobook distribution platform comparison.

What File Format Do Audiobooks Use?

The standard audiobook delivery format is MP3: specifically, MP3 encoded at 192 kbps with constant bit rate (CBR) at a 44,100 Hz sample rate.[1] This applies to ACX/Audible, Findaway Voices (now Voices by INaudio), Author's Republic, PublishDrive, and Kobo. Google Play Books also accepts FLAC, WAV, and AAC/M4A alongside MP3.[3]

MP3 is a lossy compression format, meaning it reduces file size by permanently removing audio data that psychoacoustic models predict is inaudible. At 192 kbps CBR, the quality is more than sufficient for spoken word audio. Human speech fundamentals sit between 85 and 255 Hz, with overtones and sibilant energy extending to roughly 16 kHz. That range is well within what 192 kbps preserves transparently. Higher bitrates (256 or 320 kbps) would increase file size without meaningful quality improvement for narration.
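The arithmetic below shows what "lossy" means in numbers. It compares the 192 kbps target with the uncompressed PCM it is typically encoded from. This Python sketch assumes a 16-bit mono source at 44.1 kHz, which is a common recording format but not a platform requirement. The function names are illustrative.

```python
def pcm_bitrate(sample_rate: int, bit_depth: int, channels: int) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate * bit_depth * channels


def compression_ratio(pcm_bps: int, mp3_bps: int) -> float:
    """How many times smaller the MP3 stream is than the PCM source."""
    return pcm_bps / mp3_bps


# Assumed source: 16-bit mono at 44.1 kHz.
pcm_bps = pcm_bitrate(44_100, 16, 1)
ratio = compression_ratio(pcm_bps, 192_000)
bits_per_sample = 192_000 / 44_100  # average MP3 bits spent per mono sample

print(f"PCM {pcm_bps} bps vs MP3 192000 bps: about {ratio:.1f}:1")
print(f"about {bits_per_sample:.2f} bits per sample instead of 16")
```

Under these assumptions, the MP3 stream is roughly 3.7 times smaller than the PCM, at about 4.35 bits per sample. The psychoacoustic model decides which information is discarded.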

The "constant bit rate" part matters. CBR means every second of audio gets exactly 192,000 bits, regardless of content complexity. Variable bit rate (VBR) dynamically allocates more bits to complex passages and fewer to simple ones. While VBR can be more efficient, ACX rejects VBR files outright. The bitrate mode is read from the MP3 file header, and "Variable" triggers an instant rejection even if the average bitrate is 192 kbps or higher.[1]

What Sample Rate Should You Use for Audiobooks?

The required sample rate for audiobooks is 44,100 Hz (44.1 kHz).[1] This is CD-quality audio as defined by the IEC 60908 standard, and it has been the consumer distribution standard for decades.

A sample rate of 44,100 Hz means the audio signal is measured 44,100 times per second. By the Nyquist theorem, this captures frequencies up to 22,050 Hz, well above the useful range of speech and above the hearing range of most adults. The human voice produces fundamentals between 85 Hz (deep male voice) and 255 Hz (high female voice), with overtones and sibilant energy extending to about 16 kHz. A 44.1 kHz sample rate captures all of this with generous headroom.

If you recorded at 48 kHz or 96 kHz, you need to resample before any other processing. Many DAWs default to 48 kHz (the video/broadcast standard) or higher sample rates. Resampling should be your first processing step because every subsequent operation (filtering, compression, limiting) depends on the sample rate being correct. Most audio editors handle the conversion automatically when you change the project sample rate, but verify the output file header to confirm 44,100 Hz before submission.
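The arithmetic behind this step is easy to check. The Python sketch below computes the Nyquist limit for the target rate and the exact rational factor a resampler has to apply. The function names are illustrative. Converting 48 kHz to 44.1 kHz works out to a factor of 147/160, so it is not a simple integer decimation. Resamplers handle it with band-limited interpolation filters, which can shift sample peaks slightly. That is one more reason to resample before limiting.

```python
from fractions import Fraction

TARGET_RATE = 44_100  # Hz, required delivery rate


def nyquist_limit(sample_rate: int) -> float:
    """Highest frequency (Hz) a sample rate can represent: half the rate."""
    return sample_rate / 2


def resample_factor(source_rate: int, target_rate: int = TARGET_RATE) -> Fraction:
    """Exact rational conversion factor, reduced to lowest terms."""
    return Fraction(target_rate, source_rate)


def resampled_length(num_samples: int, source_rate: int, target_rate: int = TARGET_RATE) -> int:
    """Sample count after conversion (rounded down)."""
    return int(num_samples * resample_factor(source_rate, target_rate))


for rate in (48_000, 96_000):
    factor = resample_factor(rate)
    print(f"{rate} Hz -> {TARGET_RATE} Hz: upsample x{factor.numerator}, downsample /{factor.denominator}")

print(nyquist_limit(TARGET_RATE))              # 22050.0
print(resampled_length(10 * 48_000, 48_000))   # 441000
```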

Why 44.1 kHz and not 48 kHz? Historical convention. CD audio standardised on 44.1 kHz in the early 1980s, and consumer audio distribution has followed that convention. Audiobook platforms adopted the same standard. There is no audible quality difference between 44.1 and 48 kHz for spoken word, as both capture far more frequency range than speech requires.

Should Audiobooks Be Mono or Stereo?

ACX requires mono audio.[1] This is the simplest answer: if you're producing for Audible/Amazon, your files must be single-channel. Other platforms are more flexible. Findaway Voices accepts both mono and stereo (recommending joint stereo for stereo files), PublishDrive actually recommends stereo, and Google Play Books accepts either.[2]

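Mono means a single audio channel. Stereo means two channels (left and right) that can carry different audio signals. For a single narrator speaking into one microphone, there is no spatial information to preserve, so a stereo file just carries the same voice twice. Uncompressed, the stereo file is twice the size. In MP3, the file size is set by the bitrate rather than the channel count. A stereo encode must therefore either spread the same bit budget across two channels or use a higher bitrate. Joint stereo reduces this overhead but does not remove it. This is why Google Play asks for 256 kbps stereo versus 128 kbps mono. Either way, mono is the more efficient choice for streaming delivery across Audible's apps and devices, which is likely why ACX mandates it.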
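Here is a quick Python sketch of the size arithmetic. The helper names are illustrative, 16-bit PCM is an assumption, and tag overhead is ignored. For uncompressed PCM, size scales with channel count. For a CBR MP3, size is set by the bitrate alone, so a stereo encode spreads the same budget over two channels.

```python
def pcm_bytes(seconds: int, channels: int, sample_rate: int = 44_100, bit_depth: int = 16) -> int:
    """Uncompressed PCM size: doubles when you go from one channel to two."""
    return seconds * sample_rate * bit_depth * channels // 8


def cbr_mp3_bytes(seconds: int, bitrate_bps: int = 192_000) -> int:
    """Approximate CBR MP3 size: set by bitrate alone (tags ignored)."""
    return seconds * bitrate_bps // 8


hour = 3600
print("PCM mono:  ", pcm_bytes(hour, 1))            # 317520000
print("PCM stereo:", pcm_bytes(hour, 2))            # 635040000
print("MP3, mono or stereo:", cbr_mp3_bytes(hour))  # 86400000
# Rough per-channel budget if two channels split 192 kbps evenly (joint stereo can do better).
print("bits/s per channel in stereo:", 192_000 // 2)  # 96000
```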

The practical rule: master in mono. If your files meet ACX's mono requirement, they work on every platform. The platforms that accept stereo do so as an option, not a requirement. You gain nothing from stereo for single-narrator audiobooks, and you lose compatibility with the largest distributor.

If your recording setup captures stereo, downmix to a single channel during format conversion. Check for phase issues after downmixing. If the left and right channels were out of phase during recording (which can happen with certain mic configurations), summing them to mono causes partial cancellation that makes the audio sound thin or hollow. Listen to the mono version against the stereo original before committing.
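Alongside listening, a numeric phase check can flag trouble before you commit. The Python sketch below assumes you already have each channel's samples as floats (for example, decoded from a WAV file). The helper names and the synthetic test tone are illustrative. The sketch uses the standard phase-correlation idea. A value near +1 means the channels sum cleanly, while a strongly negative value predicts the cancellation described above.

```python
import math


def downmix(left: list[float], right: list[float]) -> list[float]:
    """Average the two channels into one (each channel contributes at -6 dB)."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def correlation(left: list[float], right: list[float]) -> float:
    """Pearson correlation between channels: +1 in phase, -1 fully inverted."""
    n = len(left)
    mean_l, mean_r = sum(left) / n, sum(right) / n
    cov = sum((l - mean_l) * (r - mean_r) for l, r in zip(left, right))
    var_l = sum((l - mean_l) ** 2 for l in left)
    var_r = sum((r - mean_r) ** 2 for r in right)
    if var_l == 0 or var_r == 0:
        return 0.0  # one side is silent: correlation is undefined
    return cov / math.sqrt(var_l * var_r)


def rms(samples: list[float]) -> float:
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Synthetic 200 Hz tone, paired once with itself and once with an inverted copy.
rate = 44_100
tone = [0.5 * math.sin(2 * math.pi * 200 * i / rate) for i in range(rate // 10)]
inverted = [-s for s in tone]

for label, right in (("in phase", tone), ("inverted", inverted)):
    mono = downmix(tone, right)
    print(f"{label}: correlation={correlation(tone, right):+.2f}, mono RMS={rms(mono):.3f}")
```

In the inverted case, the mono sum collapses to silence. That is the extreme version of the thin, hollow sound that partial phase problems produce.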

Multi-narrator and full-cast productions are the one case where stereo might add value. If different narrators are panned to different positions in the stereo field, switching to mono collapses that spatial separation. Even then, ACX requires mono, so you'll need separate mono mixes for ACX and stereo files for platforms that support them.

What Is CBR and Why Does It Matter for Audiobooks?

CBR stands for constant bit rate. It means every frame of the MP3 file is encoded at the same bitrate: in this case, 192,000 bits per second, or 192 kbps.[1] The alternative is VBR (variable bit rate), where the encoder dynamically adjusts bitrate frame by frame based on audio complexity.
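The per-frame arithmetic shows what "constant" means in practice. An MPEG-1 Layer III frame carries 1,152 samples. At 192 kbps and 44.1 kHz, each frame therefore needs 144 × 192,000 / 44,100 ≈ 626.94 bytes. Frames are whole bytes, so encoders add a one-byte padding slot to some frames to keep the average exact. The Python sketch below models this with a simple cumulative-rounding rule. It illustrates the arithmetic and is not LAME's actual padding logic.

```python
from fractions import Fraction

SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III


def exact_frame_bytes(bitrate_bps: int, sample_rate: int) -> Fraction:
    """Ideal frame length in bytes: 1152 samples' worth of bits, divided by 8."""
    return Fraction(SAMPLES_PER_FRAME * bitrate_bps, 8 * sample_rate)


def frame_sizes(bitrate_bps: int, sample_rate: int, num_frames: int) -> list[int]:
    """Whole-byte frame sizes whose running total stays within one byte of the ideal.

    Simplified model: each frame is sized so the cumulative total equals the
    rounded-down ideal, so some frames carry one extra padding byte.
    """
    exact = exact_frame_bytes(bitrate_bps, sample_rate)
    return [int((i + 1) * exact) - int(i * exact) for i in range(num_frames)]


sizes = frame_sizes(192_000, 44_100, 49)
seconds = Fraction(49 * SAMPLES_PER_FRAME, 44_100)  # playing time of 49 frames

print(float(exact_frame_bytes(192_000, 44_100)))  # about 626.94 bytes per frame
print(sorted(set(sizes)))                          # [626, 627]
print(sum(sizes) * 8 / seconds)                    # 192000 (exactly, over 49 frames)
```

Every frame header still carries the same 192 kbps bitrate index. Only the padding bit changes from frame to frame.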

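For audiobooks, CBR is mandatory across most platforms. ACX explicitly rejects VBR files regardless of average bitrate. Every MP3 frame header records that frame's bitrate. Encoders such as LAME also write a tag into the first frame: "Info" for CBR files and "Xing" for VBR files. Analysis tools use this header data to report the bitrate mode as "Constant" or "Variable," and ACX's automated check reads the same header data directly.[1]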
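Here is a simplified Python sketch of how a tool can tell the modes apart from header data. It assumes an MPEG-1 Layer III file, which is what a 44.1 kHz export produces. The sketch skips a leading ID3v2 tag, then looks in the first audio frame for the tag encoders write there: "Xing" or "VBRI" for VBR, and "Info" for LAME's CBR files. This is an illustration, not ACX's actual check. A file with no such tag would need a frame-by-frame bitrate comparison instead.

```python
def id3v2_length(data: bytes) -> int:
    """Bytes taken by a leading ID3v2 tag (0 if absent); size bytes are 7-bit 'syncsafe'."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    size = 0
    for byte in data[6:10]:
        size = (size << 7) | (byte & 0x7F)
    footer = 10 if data[5] & 0x10 else 0  # ID3v2.4 footer-present flag
    return 10 + size + footer


def bitrate_mode(data: bytes) -> str:
    """Guess CBR vs VBR from the info tag in the first MPEG-1 Layer III frame."""
    pos = id3v2_length(data)
    # Scan for the 11-bit frame sync: 0xFF followed by a byte whose top 3 bits are set.
    while pos + 4 <= len(data) and not (data[pos] == 0xFF and (data[pos + 1] & 0xE0) == 0xE0):
        pos += 1
    if pos + 4 > len(data):
        return "no MP3 frame found"
    is_mono = (data[pos + 3] >> 6) == 0b11  # channel mode 3 = single channel
    side_info = 17 if is_mono else 32       # MPEG-1 side-information size in bytes
    xing_at = pos + 4 + side_info           # Xing/Info tags follow the side info
    vbri_at = pos + 4 + 32                  # Fraunhofer VBRI tags sit at a fixed offset
    tag = data[xing_at:xing_at + 4]
    if tag == b"Xing" or data[vbri_at:vbri_at + 4] == b"VBRI":
        return "Variable"
    if tag == b"Info":
        return "Constant"
    return "unknown: no info tag, compare bitrates across frames"


# Synthetic first frames using MPEG-1 Layer III, 192 kbps, 44.1 kHz, mono header bytes.
header = bytes([0xFF, 0xFB, 0xB0, 0xC4])
print(bitrate_mode(header + bytes(17) + b"Info" + bytes(600)))  # Constant
print(bitrate_mode(header + bytes(17) + b"Xing" + bytes(600)))  # Variable
```

On a real file, pass in enough leading bytes to cover any ID3 tag and embedded cover art, or simply the whole file's contents.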

Why CBR over VBR? CBR provides predictable file sizes (important for streaming buffer management), a fixed bit budget for every passage, and simpler seeking in player apps. VBR's efficiency advantage (smaller files at equivalent quality) matters more for music, whose complexity varies widely from passage to passage. In spoken word, audio complexity is relatively constant.

Common mistake: Many MP3 encoding tools default to VBR mode. If you export from Audacity, Adobe Audition, or any DAW, check your export settings explicitly. Look for a "Bit Rate Mode" or "Encoding" setting and ensure it reads "Constant" or "CBR." Verify the output file with a tool like MediaInfo, which shows the exact bitrate mode from the file header.

Which MP3 Encoder Should You Use?

LAME is the industry-standard MP3 encoder and the safest choice for audiobook production.[4] It's free, open-source, and used internally by Audacity, FFmpeg, and many commercial DAWs. When you export an MP3 from Audacity, you're using LAME under the hood.

To encode for audiobook delivery with LAME, use these settings:

Bitrate: 192 kbps
Mode: CBR (constant bit rate)
Sample rate: 44,100 Hz
Channels: Mono (joint stereo if stereo is required)

If you're using FFmpeg from the command line, the equivalent command is:

ffmpeg -i input.wav -codec:a libmp3lame -b:a 192k -ar 44100 -ac 1 output.mp3

The -b:a 192k flag sets 192 kbps CBR (LAME defaults to CBR when a fixed bitrate is specified). The -ar 44100 flag ensures 44.1 kHz sample rate, and -ac 1 forces mono.
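If you batch-encode chapters from a script, building the argument list in one place keeps every file on identical settings. Here is a minimal Python sketch, assuming ffmpeg built with libmp3lame is on your PATH. The function names and chapter file names are placeholders, and the flags are exactly those in the command above.

```python
import shlex
import subprocess


def acx_mp3_args(src: str, dst: str) -> list[str]:
    """ffmpeg arguments for a 192 kbps CBR, 44.1 kHz, mono MP3."""
    return [
        "ffmpeg",
        "-i", src,
        "-codec:a", "libmp3lame",
        "-b:a", "192k",  # fixed bitrate, so libmp3lame encodes CBR
        "-ar", "44100",  # output sample rate
        "-ac", "1",      # one output channel (mono)
        dst,
    ]


def encode_chapter(src: str, dst: str) -> None:
    """Run ffmpeg; raises CalledProcessError if the encode fails."""
    subprocess.run(acx_mp3_args(src, dst), check=True)


# Print the command for review instead of running it.
print(shlex.join(acx_mp3_args("chapter01.wav", "chapter01.mp3")))
```

Because -b:a fixes the bitrate, libmp3lame encodes CBR. Swapping it for FFmpeg's -q:a quality option would switch the encoder to VBR, which ACX rejects.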

One important consideration with MP3 encoding: the lossy compression process can raise peak levels by 0.5–1 dB compared to the source WAV file.[4] This means a file that measures exactly -3.0 dBFS true peak before encoding might measure -2.3 dBFS after. Always do your peak limiting before encoding, but verify the peaks of the final MP3, not the pre-encoding WAV. If the encoded file exceeds -3 dBFS, lower your limiter ceiling slightly and re-encode. For more on peak management, see audiobook true peak explained.
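Here is a Python sketch of that final check. It assumes you have decoded the MP3 back to floating-point samples between -1.0 and 1.0 (for example, with your editor or ffmpeg). The sketch reports sample peak in dBFS, which can read slightly lower than true peak. True-peak meters following ITU-R BS.1770 oversample to catch inter-sample peaks, and this sketch does not. The sample values are made up to mirror the -3.0 to -2.3 dBFS example above. The function names are illustrative.

```python
import math

ACX_PEAK_CEILING_DBFS = -3.0


def sample_peak_dbfs(samples: list[float]) -> float:
    """Largest absolute sample value in dBFS (1.0 = 0 dBFS). Not a true-peak reading."""
    peak = max(abs(s) for s in samples)
    return -math.inf if peak == 0 else 20 * math.log10(peak)


def peak_margin_db(samples: list[float], ceiling_dbfs: float = ACX_PEAK_CEILING_DBFS) -> float:
    """Positive = headroom below the ceiling; negative = over it by that many dB."""
    return ceiling_dbfs - sample_peak_dbfs(samples)


# Made-up values: a peak of 0.707 (about -3.01 dBFS) before encoding, 0.767 after.
before_encoding = [0.20, -0.707, 0.50]
after_encoding = [0.21, -0.767, 0.52]

for label, samples in (("WAV", before_encoding), ("MP3", after_encoding)):
    margin = peak_margin_db(samples)
    verdict = "OK" if margin >= 0 else "over: lower the limiter ceiling and re-encode"
    print(f"{label}: {sample_peak_dbfs(samples):.2f} dBFS ({verdict})")
```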

How Do Format Requirements Compare Across Platforms?

While the core specs are remarkably consistent, each platform has nuances worth knowing. Here's the complete cross-platform comparison:

Spec	ACX/Audible	Findaway/INaudio	Author's Republic	Google Play Books	Kobo	PublishDrive
Format	MP3 only	MP3 or FLAC	MP3	MP3, AAC, FLAC, WAV	MP3 only	MP3
Bitrate	192+ kbps CBR	192+ kbps CBR	192 kbps CBR	128+ mono / 256+ stereo	Not specified	192+ kbps CBR
Sample Rate	44,100 Hz	44,100 Hz	44,100 Hz	44,100 Hz+	Not specified	44,100 Hz
Channels	Mono only	Mono or stereo	Mono or stereo (consistent)	Mono or stereo	Not specified	Stereo recommended
Max File Size	No published limit	No published limit	170 MB	No published limit	200 MB per file	No published limit
Max Duration	120 min	120 min	119 min	No published limit	Not specified	78 min

Key takeaways from this table:

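ACX is the most restrictive on encoding. Mono only, MP3 only, CBR only. No FLAC, no stereo, no exceptions. If your files meet ACX's encoding requirements, they also meet every other platform's codec, bitrate, sample rate, and channel requirements.[1] File length is the exception. PublishDrive caps files at 78 minutes, and Author's Republic caps them at 119 minutes and 170 MB, so a 120-minute ACX-compliant file can still be too long elsewhere.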

Findaway/INaudio supports lossless delivery. FLAC support means you can deliver full-quality lossless audio, which is ideal if you want the best possible master reaching downstream retailers.[2] That said, most retailers transcode to their own streaming format anyway, so the practical quality difference is minimal for the end listener.

Google Play Books is the most format-flexible. They accept MP3, AAC/M4A, FLAC, and WAV. Their minimum MP3 bitrate is lower (128 kbps for mono), and they don't publish specific loudness or noise floor targets.[3] This flexibility is convenient but doesn't mean you should use lower-quality settings. A file mastered to ACX specs sounds better on Google Play than one that just barely meets Google's minimum.

Kobo transcodes aggressively. Regardless of what you upload, Kobo re-encodes audio at 64 kbps upon ingestion, so your 192 kbps CBR file will be transcoded. Format quality at upload therefore matters less for Kobo specifically. Even so, starting with a clean 192 kbps master gives the transcoded version the best possible starting point.

PublishDrive recommends stereo, which is the opposite of ACX's mono requirement. If you're distributing through both ACX and PublishDrive, you'll need to decide: master everything in mono for universal compatibility, or maintain separate mono and stereo exports. For most single-narrator audiobooks, mono everywhere is the simpler and equally effective approach.
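Because CBR ties file size directly to duration, you can check chapter lengths against the table before encoding. The Python sketch below hard-codes only the size and duration limits listed in the table, using None where it shows no published limit or "Not specified." It assumes 1 MB = 1,000,000 bytes, since the table doesn't state a convention, and it ignores ID3 tag overhead. The names are illustrative.

```python
BITRATE_BPS = 192_000

# (max minutes, max MB) per platform, from the comparison table; None = no published limit.
LIMITS = {
    "ACX/Audible": (120, None),
    "Findaway/INaudio": (120, None),
    "Author's Republic": (119, 170),
    "Google Play Books": (None, None),
    "Kobo": (None, 200),
    "PublishDrive": (78, None),
}


def cbr_size_mb(minutes: float, bitrate_bps: int = BITRATE_BPS) -> float:
    """Approximate CBR MP3 size in MB (assumed 1 MB = 1,000,000 bytes), tags excluded."""
    return minutes * 60 * bitrate_bps / 8 / 1_000_000


def over_limit(minutes: float) -> list[str]:
    """Platforms whose duration or file-size limit a file of this length exceeds."""
    size_mb = cbr_size_mb(minutes)
    flagged = []
    for platform, (max_minutes, max_mb) in LIMITS.items():
        too_long = max_minutes is not None and minutes > max_minutes
        too_big = max_mb is not None and size_mb > max_mb
        if too_long or too_big:
            flagged.append(platform)
    return flagged


print(f"{cbr_size_mb(1):.2f} MB per minute")
print(over_limit(120))  # a maximum-length ACX file
print(over_limit(60))
```

At 1.44 MB per minute, a maximum-length 120-minute ACX file comes to about 172.8 MB. The sketch therefore flags Author's Republic (119 minutes, 170 MB) and PublishDrive (78 minutes), while a 60-minute chapter passes everywhere.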

What Happens If You Get the Format Wrong?

Format errors are the quickest rejections you'll receive because they're checked at the file metadata level, before any audio analysis runs. Here are the common format mistakes and what happens:

Wrong bitrate mode (VBR instead of CBR): Instant rejection from ACX. The file header says "Variable" and the automated check stops there. Fix: re-encode with CBR explicitly set.

Wrong sample rate (48 kHz instead of 44.1 kHz): Rejection. The sample rate is in the file header. Fix: resample to 44,100 Hz in your DAW and re-export.

Stereo instead of mono (on ACX): Rejection. The channel count is in the file header. Fix: downmix to mono and re-export.

WAV or FLAC submitted to ACX: Rejection. ACX only accepts MP3. Fix: encode to MP3 192 kbps CBR using LAME.

Low bitrate (128 kbps): Rejected by ACX, Findaway, Author's Republic, and PublishDrive. Only Google Play's mono minimum goes that low. Fix: re-encode at 192 kbps.

The good news: format errors don't require reprocessing your audio. Your loudness and noise floor are already set, so you just need to re-encode or re-export with the correct format settings. Re-check peaks on the new file, though, since resampling and MP3 encoding can shift them. Unlike a loudness rejection that requires remastering, a format rejection is usually a five-minute fix.

Do You Need Different File Formats for Each Platform?

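No. The safest and simplest strategy is to master everything to ACX format specifications (MP3, 192 kbps CBR, 44,100 Hz, mono) and distribute those files to every platform.[1] These encoding settings pass everywhere. The only remaining check is file length, since some platforms cap files more tightly than ACX: 78 minutes for PublishDrive, and 119 minutes and 170 MB for Author's Republic.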

The one exception is if you specifically want to take advantage of Findaway's FLAC support or Google Play's WAV/FLAC support for lossless delivery. In that case, keep your original lossless session files (WAV or FLAC) and export separate deliverables from those originals. Never transcode an existing MP3 to FLAC. Decoding a lossy file and re-encoding it as lossless doesn't recover the lost audio data; it just makes a larger file with the same quality as the MP3. Always export from your original pre-MP3 source.

For a deep dive into the full audio requirements beyond format (loudness, peaks, noise floor, and silence), see the ACX vs Findaway audio requirements comparison. That post covers the complete technical picture with all eight spec categories.

How Should You Verify Format Settings Before Submission?

Before you submit to any platform, verify the actual file, not your project settings, not your export dialog. Open the final MP3 and confirm:

File format: MP3 (not WAV, FLAC, or M4A)
Bitrate: 192 kbps, Constant (not Variable)
Sample rate: 44,100 Hz
Channels: 1 (mono) for ACX; 1 or 2 depending on your target platforms

MediaInfo is a free tool that reads all of this from the file header in seconds. Look for "Bit rate mode: Constant," which is the field that catches VBR files that slipped through.

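FFprobe (bundled with FFmpeg) gives you most of the same information from the command line. It reports the bitrate but not whether it is constant or variable, so confirm CBR with MediaInfo: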

ffprobe -v quiet -print_format json -show_format -show_streams output.mp3

Check codec_name (should be mp3), sample_rate (should be 44100), channels (should be 1 for mono), and bit_rate (should be 192000 or close).
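The same checks can be scripted against ffprobe's JSON output. The Python sketch below runs the command above and lists any problems against ACX targets. Several choices here are assumptions for illustration: the function names, the 1 kbps tolerance, and treating anything at or above 192 kbps as passing (per the "192+ kbps" entries in the table). ffprobe reports bit_rate as a number, not a mode, so pair this with MediaInfo for the CBR check.

```python
import json
import subprocess


def probe(path: str) -> dict:
    """Run the ffprobe command above and parse its JSON output."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def format_problems(info: dict, expected_channels: int = 1,
                    min_bps: int = 192_000, tolerance_bps: int = 1_000) -> list[str]:
    """Compare parsed ffprobe output with ACX format targets; [] means pass."""
    audio = [s for s in info.get("streams", []) if s.get("codec_type") == "audio"]
    if not audio:
        return ["no audio stream found"]
    stream = audio[0]
    problems = []
    if stream.get("codec_name") != "mp3":
        problems.append(f"codec is {stream.get('codec_name')}, expected mp3")
    # ffprobe emits sample_rate and bit_rate as strings and channels as an integer.
    if str(stream.get("sample_rate")) != "44100":
        problems.append(f"sample rate is {stream.get('sample_rate')}, expected 44100")
    if int(stream.get("channels", 0)) != expected_channels:
        problems.append(f"{stream.get('channels')} channel(s), expected {expected_channels}")
    bit_rate = int(stream.get("bit_rate", 0))
    if bit_rate < min_bps - tolerance_bps:
        problems.append(f"bit rate is {bit_rate} bps, expected at least {min_bps}")
    return problems
```

For a passing file, format_problems(probe("chapter01.mp3")) returns an empty list. The file name is a placeholder.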

The most common verification mistake is checking the project file or an intermediate WAV instead of the final exported MP3. MP3 encoding changes the file in ways that matter: peak levels shift, format metadata is written fresh, and the bitrate mode is set at encoding time. Your DAW project showing "192 kbps CBR" in the export dialog doesn't guarantee the output file matches. Verify the output.

For a complete pre-submission verification workflow covering all eight ACX specs (not just format), see how to check if your audiobook meets ACX requirements.

ChapterPass verifies format specs automatically. Upload your chapters and ChapterPass checks every format requirement (bitrate, sample rate, channels, encoding mode) alongside loudness, peaks, and noise floor. No manual MediaInfo checks needed.
