Inference
In this stage, you should already have the environment set up and have the checkpoint(s) and config file ready.
The Inference stage is where we convert the voice of the input audio to the voice of the target speaker. People from the vocal synth community may also refer to this stage as "rendering".
The project provides two files for inference: inference.ipynb and infer.py.
The .ipynb one provides a more interactive experience and is great for testing different parameters, while the .py one requires slightly less effort to set up.
Make sure you have installed Jupyter Notebook, that you are in the diff-svc environment, and that you are under the diff-svc directory.
Run this command to start Jupyter Notebook; a page will open in your browser:
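The command itself appears to have been stripped from this page; presumably it is the standard Jupyter launch command, run from the diff-svc directory:

```shell
jupyter notebook
```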
Click on inference.ipynb to open the inference notebook.
In the first block, edit the values of the following:
project_name: The project name you used during training.
model_path: The path to your checkpoint file.
config_path: The path to the config file.
In the second block, edit the values of the following:
wav_fn: The path to the input audio.
wav_gen: The path where the output audio will be saved. (You can change the .wav extension to another one, such as .flac, to save in a different format.)
key=0 (Optional): Transpose parameter. The default value is 0 (NOT 1). The pitch of the input audio will be shifted by {key} semitones, then synthesized. For example, to change a male voice to a female voice, this value can be set to 8 or 12, etc. (12 shifts a whole octave up).
pndm_speedup (Optional): Inference acceleration multiplier. Defaults to 20, which means 1000 / 20 = 50 diffusion steps will be run during inference (the default number of diffusion steps is 1000). Increase it for faster inference. The number of diffusion steps must be divisible by this value.
use_crepe (Optional): Set it to True to use the CREPE algorithm for pitch extraction during inference, or False to use the much faster Parselmouth algorithm.
use_pe (Optional): This parameter is ignored on 44.1kHz models. Refer to the documentation for more details.
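Put together, the two notebook blocks amount to a handful of variable assignments. The sketch below uses the parameter names from this page; the project name and paths are hypothetical placeholders you would replace with your own:

```python
# First block: hypothetical project name and paths.
project_name = "my_singer"
model_path = "./checkpoints/my_singer/model_ckpt_steps_100000.ckpt"
config_path = "./checkpoints/my_singer/config.yaml"

# Second block: input/output and synthesis parameters.
wav_fn = "./raw/input.wav"        # input audio
wav_gen = "./results/output.wav"  # output path; use .flac etc. for other formats
key = 0                           # transpose in semitones (12 = one octave up)
pndm_speedup = 20                 # 1000 / 20 = 50 diffusion steps
use_crepe = True                  # True: CREPE pitch extraction; False: faster Parselmouth
use_pe = True                     # ignored on 44.1kHz models
```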
You should see the audio previews and the pitch visualization when all the cells have executed successfully. The output file will be saved under the wav_gen path you set.
Make sure you are in the diff-svc environment and under the diff-svc directory.
Open infer.py with a text editor and edit the parameters under the if __name__ == '__main__': section.
The parameters are mostly the same as those in inference.ipynb, with the following differences:
pndm_speedup is now accelerate.
wav_fn is now file_names, in list format (which means it can take multiple files), e.g. ["xxx.wav", "yyy.wav", "zzz.wav"].
key is now trans, in list format.
wav_gen is now set automatically to a path under ./results, and format is where you set the output file format.
Run this command:
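The command block appears to have been stripped here; presumably it is simply running the script from the diff-svc directory:

```shell
python infer.py
```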
The output files will be under ./results.
Refer to the Inference section of the documentation for more details on other adjustable parameters.
In the notebook, click Kernel -> Change Kernel and choose the kernel you created when setting up the environment, then click Cell -> Run All to execute all cells.