Preprocessing

Before starting this stage, you should already have set up the environment and prepared your dataset.

In this stage, the program processes the raw audio data and converts it into binary files. It also writes some values back into the config file.

Depending on your dataset size, the pitch extraction algorithm you choose, and your hardware, this stage can take anywhere from a few minutes to several hours.

Process

Create folders under the diff-svc folder

  1. In your diff-svc folder, create a folder named data.

  2. Inside data, create a folder named raw.

  3. Put your dataset folder inside the raw folder. Name this folder after your project, because the config entries you edit in the next step refer to it by that name.
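If you prefer to script the folder setup, the three steps above can be sketched in Python with pathlib. create_raw_layout is a hypothetical helper, not part of diff-svc. It assumes you pass the path to your diff-svc folder and your project name, and it only creates the empty folders. You still copy the audio files into the returned folder yourself.

```python
from pathlib import Path


def create_raw_layout(diff_svc_dir: str, project_name: str) -> Path:
    """Create data/raw/<project_name> under the diff-svc folder and return its path.

    Hypothetical helper: it only creates folders, it does not copy any audio.
    """
    raw_project_dir = Path(diff_svc_dir) / "data" / "raw" / project_name
    # parents=True also creates data/ and data/raw/ if they are missing;
    # exist_ok=True makes it safe to run the helper more than once.
    raw_project_dir.mkdir(parents=True, exist_ok=True)
    return raw_project_dir
```

For example, create_raw_layout(".", "nyaru") run from inside the diff-svc folder creates data/raw/nyaru.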

Edit the config file in the training folder

  1. In the training folder, make a backup copy of config.yaml if you are using the 24kHz vocoder, or of config_nsf.yaml if you are using the 44.1kHz vocoder. Then open the original file (not the backup) in a text editor.

  2. Edit the following entries. By default, their values end with a speaker name, following the pattern .../{speaker_name} (e.g. data/binary/nyaru). Replace {speaker_name} with your project name:

    binary_data_dir: data/binary/nyaru
    # The path to the pre-processed data.
    
    raw_data_dir: data/raw/nyaru
    # Path to the directory of the raw data before pre-processing. 
    
    speaker_id: nyaru
    # The name of the target speaker. (Currently, this parameter is for reference only and has no functional impact)
    
    work_dir: checkpoints/nyaru
    # Directory where training checkpoints are saved. Change the last part to the project name.

  3. (Optional) Change the pitch extraction algorithm. By default, preprocessing uses CREPE for pitch (F0) extraction. Keep use_crepe set to true for better results, or set it to false to use Parselmouth instead for faster processing.

    use_crepe: true
    # Use CREPE to extract F0 for pre-processing. Enable it for better results, or disable it for faster processing.
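The three edits above can also be scripted. The following is a minimal sketch that uses only the Python standard library, so it edits the config as plain text rather than parsing YAML. It assumes the entries are top-level, unquoted, and have no trailing comments, as in the excerpts above. set_project_name, set_use_crepe, edit_config and the .bak backup name are all hypothetical. Check the result in a text editor before running preprocessing.

```python
import re
import shutil
from pathlib import Path

# Entries from step 2 whose values end with the speaker/project name.
NAME_KEYS = ("binary_data_dir", "raw_data_dir", "speaker_id", "work_dir")


def set_project_name(config_text: str, project_name: str) -> str:
    """Replace the last path segment of each NAME_KEYS value with project_name."""
    for key in NAME_KEYS:
        # Group 1 keeps "key: " plus any leading path such as "data/binary/";
        # only the final segment (e.g. "nyaru") is replaced.
        pattern = re.compile(
            rf"^({re.escape(key)}:[ \t]*(?:\S*/)?)[^/\s]+[ \t]*$", re.MULTILINE
        )
        config_text = pattern.sub(lambda m: m.group(1) + project_name, config_text)
    return config_text


def set_use_crepe(config_text: str, enabled: bool) -> str:
    """Set use_crepe to true (CREPE) or false (Parselmouth)."""
    value = "true" if enabled else "false"
    return re.sub(
        r"^(use_crepe:[ \t]*)\S+",
        lambda m: m.group(1) + value,
        config_text,
        flags=re.MULTILINE,
    )


def edit_config(config_path: str, project_name: str, use_crepe: bool = True) -> Path:
    """Back up the config (step 1), then apply steps 2 and 3. Returns the backup path."""
    path = Path(config_path)
    backup = path.with_name(path.name + ".bak")  # e.g. config_nsf.yaml.bak
    shutil.copy2(path, backup)  # keep an untouched copy before editing
    text = path.read_text(encoding="utf-8")
    text = set_project_name(text, project_name)
    text = set_use_crepe(text, use_crepe)
    path.write_text(text, encoding="utf-8")
    return backup
```

For example, edit_config("training/config_nsf.yaml", "myproject", use_crepe=False) saves training/config_nsf.yaml.bak, points all four entries at myproject, and switches pitch extraction to Parselmouth.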

Run commands

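Navigate to the diff-svc folder in the command line and make sure the diff-svc conda environment is activated. Then run the following commands (if you are using the 24kHz vocoder, replace training/config_nsf.yaml with training/config.yaml):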

    set PYTHONPATH=.
    set CUDA_VISIBLE_DEVICES=0
    python preprocessing/binarize.py --config training/config_nsf.yaml

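Run these commands in cmd (Anaconda Prompt), not PowerShell (Anaconda PowerShell Prompt). In PowerShell, set is an alias for Set-Variable, so these lines would not set the PYTHONPATH and CUDA_VISIBLE_DEVICES environment variables.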
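As an alternative that behaves the same in cmd, PowerShell, or a Linux shell, the commands above can be wrapped in a short Python script that sets the two environment variables for the child process only. This is a sketch, not a diff-svc tool: build_binarize_call and run_binarize are hypothetical names. It assumes the script is started from the activated diff-svc environment, so that sys.executable points at that environment's Python.

```python
import os
import subprocess
import sys


def build_binarize_call(config_path: str = "training/config_nsf.yaml", gpu: str = "0"):
    """Return (command, env) equivalent to the three cmd lines above."""
    env = dict(os.environ)
    env["PYTHONPATH"] = "."            # same as: set PYTHONPATH=.
    env["CUDA_VISIBLE_DEVICES"] = gpu  # same as: set CUDA_VISIBLE_DEVICES=0
    # sys.executable is the Python interpreter of the currently active environment.
    command = [sys.executable, "preprocessing/binarize.py", "--config", config_path]
    return command, env


def run_binarize(diff_svc_dir: str, config_path: str = "training/config_nsf.yaml") -> None:
    """Run binarize.py with cwd set to the diff-svc folder; raises if it fails."""
    command, env = build_binarize_call(config_path)
    # cwd must be the diff-svc folder so that PYTHONPATH=. and the relative paths resolve.
    subprocess.run(command, env=env, cwd=diff_svc_dir, check=True)
```

For example, call run_binarize(".") from inside the diff-svc folder, or run_binarize(".", "training/config.yaml") for the 24kHz vocoder.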

When the program finishes successfully, the preprocessed files will be in data/binary/{Your_project_name}, the binary_data_dir you set earlier in the config file.
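To confirm that the run produced output, a quick check like the following lists the files in the output folder. It is a hypothetical helper that assumes the default binary_data_dir layout (data/binary/{Your_project_name}). It does not know which specific files binarize.py writes, so it only checks that the folder exists and is not empty.

```python
from pathlib import Path


def list_binary_outputs(diff_svc_dir: str, project_name: str) -> list:
    """Return the sorted file names in data/binary/<project_name>.

    Raises if the folder is missing or empty. It does not validate file contents.
    """
    out_dir = Path(diff_svc_dir) / "data" / "binary" / project_name
    if not out_dir.is_dir():
        raise FileNotFoundError(f"{out_dir} does not exist; did binarize.py finish?")
    files = sorted(p.name for p in out_dir.iterdir() if p.is_file())
    if not files:
        raise RuntimeError(f"{out_dir} is empty")
    return files
```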

Notes

The raw dataset is no longer needed after this step, but keep a copy, since you may need to run preprocessing again.

You need to run preprocessing again if:

  • You add, remove or modify audio files in your dataset.

  • You want to switch from the 24kHz vocoder to the 44.1kHz vocoder or vice versa.

  • You want to switch between CREPE and Parselmouth for pitch extraction (i.e. you change use_crepe).
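If you preprocess often, you can record a fingerprint of the inputs listed above and compare it before each run. The sketch below is a hypothetical helper, not part of diff-svc. It hashes every file in the dataset folder, the name of the config file in use (which stands in for the vocoder choice: config.yaml for 24kHz, config_nsf.yaml for 44.1kHz), and the use_crepe setting. Write the stamp after a successful run, and keep the stamp file outside the dataset folder so it is not hashed itself. Reading every audio file makes this slow for large datasets. Comparing file sizes and modification times is a cheaper but less reliable alternative.

```python
import hashlib
import json
from pathlib import Path


def preprocessing_fingerprint(raw_dir: str, config_name: str, use_crepe: bool) -> str:
    """Hash the dataset files plus the settings that force a new preprocessing run."""
    h = hashlib.sha256()
    root = Path(raw_dir)
    # Sorted so the result does not depend on directory listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        h.update(path.relative_to(root).as_posix().encode("utf-8"))  # catches add/remove/rename
        h.update(hashlib.sha256(path.read_bytes()).digest())  # catches modified audio
    settings = {"config": config_name, "use_crepe": use_crepe}
    h.update(json.dumps(settings, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


def needs_preprocessing(stamp_file: str, fingerprint: str) -> bool:
    """True if no stamp exists yet or the stored fingerprint differs."""
    stamp = Path(stamp_file)
    return not stamp.exists() or stamp.read_text(encoding="utf-8").strip() != fingerprint


def write_stamp(stamp_file: str, fingerprint: str) -> None:
    """Record the fingerprint after a successful preprocessing run."""
    Path(stamp_file).write_text(fingerprint, encoding="utf-8")
```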

Refer to the documentation for more details on the adjustable parameters.

Now you can proceed to the next stage: training.
