I pray to the British only that I be beheaded rather than hanged for this. Project Gutenberg has more than 75,000 books in the public domain that you can read on its website. However, the public-domain audiobooks are sometimes lacking. In March, Fish Audio released S2, which provides adequate-sounding text-to-speech. Here is a sample of King Charles’s voice reading the title of South, generated using S2.
Cloning a voice is fairly simple. You just need ~30 seconds of reference speech:
python fish_speech/models/dac/inference.py \
    -i reference.wav \
    -o work/ref.wav \
    --checkpoint-path checkpoints/s2-pro/codec.pth \
    -d cuda
This gives you a file, work/ref.npy, that represents the voice as tokens. You pass that file to the next command to generate semantic tokens for the new text, and then decode those tokens into an output .wav:
python fish_speech/models/text2semantic/inference.py \
    --text "This is the new sentence I want spoken." \
    --prompt-text "Exact transcript of the reference audio." \
    --prompt-tokens work/ref.npy \
    --checkpoint-path checkpoints/s2-pro \
    --device cuda \
    --no-compile \
    --output-dir work

python fish_speech/models/dac/inference.py \
    -i work/codes_0.npy \
    -o output.wav \
    --checkpoint-path checkpoints/s2-pro/codec.pth \
    -d cuda
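
If you would rather run the three steps from one script, here is a minimal Python sketch. It only reuses the paths and flags from the commands above. The function names build_commands and run_pipeline are made up for illustration, and it assumes the script is run from the directory that contains fish_speech/ and checkpoints/.

```python
import shlex
import subprocess
from pathlib import Path

# Checkpoint directory as it appears in the commands above.
CHECKPOINT_DIR = "checkpoints/s2-pro"


def build_commands(reference_wav, prompt_text, text, output_wav,
                   work_dir="work", device="cuda"):
    """Return the encode, generate, and decode commands as argument lists.

    Hypothetical helper: it mirrors the three shell commands one-to-one
    and adds no flags beyond the ones shown there.
    """
    work = Path(work_dir)
    codec = f"{CHECKPOINT_DIR}/codec.pth"

    # Step 1: encode the reference clip (yields work/ref.npy).
    encode = [
        "python", "fish_speech/models/dac/inference.py",
        "-i", str(reference_wav),
        "-o", str(work / "ref.wav"),
        "--checkpoint-path", codec,
        "-d", device,
    ]
    # Step 2: generate semantic tokens for the new text in the cloned voice.
    generate = [
        "python", "fish_speech/models/text2semantic/inference.py",
        "--text", text,
        "--prompt-text", prompt_text,
        "--prompt-tokens", str(work / "ref.npy"),
        "--checkpoint-path", CHECKPOINT_DIR,
        "--device", device,
        "--no-compile",
        "--output-dir", str(work),
    ]
    # Step 3: decode the generated tokens into audio.
    decode = [
        "python", "fish_speech/models/dac/inference.py",
        "-i", str(work / "codes_0.npy"),
        "-o", str(output_wav),
        "--checkpoint-path", codec,
        "-d", device,
    ]
    return [encode, generate, decode]


def run_pipeline(commands):
    """Run the commands in order; check=True stops at the first failure."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Calling run_pipeline(build_commands("reference.wav", "Exact transcript of the reference audio.", "This is the new sentence I want spoken.", "output.wav")) should run the same three commands as above. Keeping the commands as argument lists means the text and transcript need no shell quoting.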
I clipped a speech King Charles gave and used it to generate the entire audiobook of South by Ernest Shackleton. It sounds good enough that I can listen to the whole book. You can listen to it on my audiobook YouTube channel. What public-domain book should I do next? Email me.