Inference
In this stage, you should already have the environment set up and have the checkpoint(s) and config file ready.
The Inference stage is where we convert the voice of the input audio to the voice of the target speaker. People from the vocal synth community may also refer to this stage as "rendering".
The project provides two files for inference: inference.ipynb and infer.py.
The .ipynb one provides a more interactive experience and is great for testing different parameters, while the .py one requires slightly less effort to set up.
Make sure you have installed Jupyter Notebook, that you are in the diff-svc environment, and that you are under the diff-svc directory.
Run this command to start Jupyter Notebook; a page will open in your browser:
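The command itself appears to have been stripped from this page; presumably it is the standard Jupyter launch command, run from the diff-svc directory:

```shell
jupyter notebook
```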
Click on inference.ipynb to open the inference notebook.
In the first block, edit the values of the following:
project_name: The project name you used during training.
model_path: The path to your checkpoint file.
config_path: The path to the config file.
In the second block, edit the values of the following:
wav_fn: The path to the input audio.
wav_gen: The path where the output audio will be saved. (You can change the .wav extension to another one, such as .flac, to save in a different format.)
key=0 (Optional): Transpose parameter. The default value is 0 (NOT 1). The pitch of the input audio will be shifted by {key} semitones, then synthesized. For example, to change a male voice to a female voice, this value can be set to 8 or 12, etc. (12 shifts a whole octave up).
pndm_speedup (Optional): Inference acceleration multiplier. Defaults to 20, which means 1000 / 20 = 50 diffusion steps will be run during inference (the default number of diffusion steps is 1000). Increase it for faster inference. The number of diffusion steps must be divisible by this value.
use_crepe (Optional): Set it to True to use the CREPE algorithm for pitch extraction during inference, or False to use the much faster Parselmouth algorithm.
use_pe (Optional): This parameter is ignored on 44.1kHz models. Refer to the documentation for more details.
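Put together, the two notebook blocks amount to a handful of variable assignments. The sketch below uses the parameter names from this page; the project name and paths are hypothetical placeholders you would replace with your own:

```python
# First block: hypothetical project name and paths.
project_name = "my_singer"
model_path = "./checkpoints/my_singer/model_ckpt_steps_100000.ckpt"
config_path = "./checkpoints/my_singer/config.yaml"

# Second block: input/output and synthesis parameters.
wav_fn = "./raw/input.wav"        # input audio
wav_gen = "./results/output.wav"  # output path; use .flac etc. for other formats
key = 0                           # transpose in semitones (12 = one octave up)
pndm_speedup = 20                 # 1000 / 20 = 50 diffusion steps
use_crepe = True                  # True: CREPE pitch extraction; False: faster Parselmouth
use_pe = True                     # ignored on 44.1kHz models
```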
You should see the audio previews and the pitch visualization when all the cells have executed successfully. The output file will be saved under the wav_gen path you set.
Make sure you are in the diff-svc environment and under the diff-svc directory.
Open infer.py with a text editor and edit the parameters under the if __name__ == '__main__': section.
The parameters are mostly the same as those in inference.ipynb, with the following differences:
pndm_speedup is now accelerate.
wav_fn is now file_names, in list format (which means it can take multiple files), e.g. ["xxx.wav", "yyy.wav", "zzz.wav"].
key is now trans, in list format.
wav_gen is now set automatically to a path under ./results, and format is where you set the output file format.
Run this command:
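The command block appears to have been stripped here; presumably it is simply running the script from the diff-svc directory:

```shell
python infer.py
```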
The output files will be under ./results.
Refer to the Inference section of the documentation for more details on other adjustable parameters.
In the notebook, click Kernel -> Change Kernel and choose the kernel you created when setting up the environment, then click Cell -> Run All to execute all cells.