The Beginner's Guide to Diff-SVC
Inference


Last updated 2 years ago


In this stage, you should already have the environment set up and have the checkpoint(s) and config file ready.

The Inference stage is where we convert the voice of the input audio to the voice of the target speaker. People from the vocal synth community may also refer to this stage as "rendering".

Process

The project provides two files for inference. You can use either inference.ipynb or infer.py.

The .ipynb one provides a more interactive experience and is great for testing different parameters, while the .py one requires slightly less effort to set up.

Inference Using Jupyter Notebook

Make sure you have installed Jupyter Notebook, are in the diff-svc environment, and are under the diff-svc directory.

  1. Run this command to start Jupyter Notebook; a page will open in your browser:

jupyter notebook

  2. Click on inference.ipynb to open the inference notebook, then click Kernel -> Change Kernel and choose the kernel you created when setting up the environment.

  3. In the first block, edit the value of the following:

    1. project_name: The project name you used during training.

    2. model_path: The path to your checkpoint file.

    3. config_path: The path to the config file.

  4. In the second block, edit the value of the following:

    1. wav_fn: The path to the input audio.

    2. wav_gen: The path where the output audio will be saved. (You can change .wav to other file extensions like .flac to save in other formats)

    3. key=0 (Optional): Transpose parameter. The default value is 0 (NOT 1). The pitch from the input audio will be shifted by {key} semitones, then synthesized. For example, to change a male voice to a female voice, this value can be set to 8 or 12, etc. (12 shifts the pitch up a whole octave).

    4. pndm_speedup (Optional): Inference acceleration multiplier. Default at 20, which means 1000/20 = 50 diffusion steps will be run during inference. (The default number of diffusion steps is 1000). Increase it for faster inference. The number of diffusion steps (1000) should be divisible by this value.

    5. use_crepe (Optional): Set it to True to use the CREPE algorithm for pitch extraction during inference. Change it to False to use the much faster Parselmouth algorithm.

    6. use_pe (Optional): This parameter will be ignored on 44.1kHz models. Refer to the documentation for more details.

  5. Click Cell -> Run All.

You should see the audio previews and the pitch visualization when all the cells are executed successfully. The output file will be saved under the wav_gen path you set.
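Putting the steps above together, the edited notebook blocks typically end up looking something like this. This is a minimal sketch only: the exact cell layout in inference.ipynb may differ between Diff-SVC versions, and all names and paths here are placeholders for your own.

```python
# Block 1: point the notebook at your trained model.
# All paths below are placeholders -- substitute your own.
project_name = "my_singer"
model_path = "./checkpoints/my_singer/model_ckpt_steps_100000.ckpt"
config_path = "./checkpoints/my_singer/config.yaml"

# Block 2: per-render settings, matching the parameter list above.
wav_fn = "./raw/input.wav"        # input audio
wav_gen = "./results/output.wav"  # output path; .flac etc. also works
key = 0            # transpose in semitones (12 = one octave up)
pndm_speedup = 20  # 1000 / 20 = 50 diffusion steps actually run
use_crepe = True   # True: CREPE pitch extraction; False: faster Parselmouth
use_pe = True      # ignored on 44.1kHz models

# Sanity check mirroring the note above: the number of diffusion
# steps (1000) must be divisible by the speedup multiplier.
assert 1000 % pndm_speedup == 0
```

With the defaults shown, 1000 // pndm_speedup gives the 50 diffusion steps mentioned earlier; raising pndm_speedup to, say, 50 would cut that to 20 steps at some cost in quality.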

Inference Using Python File

Make sure you are in the diff-svc environment and under the diff-svc directory.

Open infer.py with a text editor and edit the parameters under the if __name__ == '__main__': section. The parameters are mostly the same as those in inference.ipynb, except:
  • pndm_speedup is now accelerate

  • wav_fn is now file_names in list format (which means it can have multiple files)

    • eg. ["xxx.wav", "yyy.wav", "zzz.wav"]

  • key is now trans in list format

  • wav_gen is now set under ./results automatically, and format is where to set the output file format
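Under those renamed parameters, the edited section of infer.py might look roughly like this. This is a sketch under assumptions: the filenames and paths are placeholders, and the exact variables present in infer.py vary between Diff-SVC versions, so compare against your copy of the file.

```python
# Example values for the parameters under `if __name__ == '__main__':`
# in infer.py. Paths and filenames are placeholders.
project_name = "my_singer"
model_path = "./checkpoints/my_singer/model_ckpt_steps_100000.ckpt"
config_path = "./checkpoints/my_singer/config.yaml"

file_names = ["xxx.wav", "yyy.wav", "zzz.wav"]  # replaces wav_fn; a list of inputs
trans = [0]           # replaces key; transpose in semitones, in list format
accelerate = 20       # replaces pndm_speedup
format = "flac"       # output format; files land under ./results automatically
```

Batch rendering is the main gain here: every file listed in file_names is converted in one run, instead of re-running the notebook per input.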

Run this command:

python infer.py

The output files will be under ./results.

Refer to the Inference section of the documentation for more details on other adjustable parameters.
