Preprocessing
Before this stage, you should already have the environment set up and your dataset ready.
In this stage, the program processes the raw audio data and converts it into binary files. Some data will also be written into the config file.
This stage takes anywhere from minutes to hours depending on your dataset size, the pitch extraction algorithm you choose, and your hardware specifications.
1. In your diff-svc folder, create a folder named data.
2. Inside data, create a folder named raw.
3. Put your dataset folder under the raw folder. (To align with the next step, this dataset folder should be named after your project name.)
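The folder steps above can be sketched as a single command (a POSIX-style shell is assumed here; my_project is a placeholder for your project name):

```shell
# Create the data/raw hierarchy and the dataset folder in one go.
# NOTE: "my_project" is a placeholder; use your own project name.
# On Windows cmd, mkdir creates intermediate folders automatically;
# use backslashes there instead, e.g.  mkdir data\raw\my_project
mkdir -p data/raw/my_project
```

After this, copy your dataset audio files into data/raw/my_project.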
In the training folder, make a backup copy of config.yaml (if you are using the 24kHz vocoder) or config_nsf.yaml (if you are using the 44.1kHz vocoder), then open it with a text editor.
Edit the following entries. (By default, these config entries look like .../{speaker_name}, e.g. data/binary/nyaru; just replace {speaker_name} with your current project name.)
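As an illustration only (these key names follow typical Diff-SVC configs, so verify them against your own file), the entries for a project named my_project might look like:

```yaml
raw_data_dir: data/raw/my_project        # where you placed the dataset
binary_data_dir: data/binary/my_project  # where preprocessed files will be written
speaker_id: my_project
```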
(Optional) Change the pitch extraction algorithm. By default, the CREPE algorithm is used for pitch extraction during preprocessing. Keep this set to true for better results, or set it to false to use Parselmouth for faster processing.
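Assuming the toggle key is use_crepe (as in common Diff-SVC configs; check your file for the exact name), the setting looks like:

```yaml
use_crepe: true  # true: CREPE (slower, better results); false: Parselmouth (faster)
```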
Navigate to the Diff-SVC folder in the command line and make sure you are in the diff-svc
environment. Then run:
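A sketch of the usual Diff-SVC preprocessing invocation follows (check the exact script path against your copy of the repository; substitute training/config_nsf.yaml if you chose the 44.1kHz vocoder):

```shell
set PYTHONPATH=.
set CUDA_VISIBLE_DEVICES=0
python preprocessing/binarize.py --config training/config.yaml
```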
Make sure you are using cmd (Anaconda Prompt) instead of Powershell (Anaconda Powershell Prompt).
Remember: config.yaml is for 24kHz and config_nsf.yaml is for 44.1kHz. Go back and redo the previous step if you edited the wrong config file.
When the program finishes successfully, you should see the preprocessed files under the data/binary/{Your_project_name} directory you set in the previous step.
The dataset is no longer needed after this step, but keep it somewhere since you may need to run preprocessing again later.
You need to run preprocessing again if:
You add, remove or modify audio files in your dataset.
You want to switch from the 24kHz vocoder to the 44.1kHz vocoder or vice versa.
You want to switch from one pitch extraction algorithm for preprocessing to the other one.
Make sure you use the same config file for training that you used here, since some data is written to it during preprocessing.
Refer to the documentation for more details on the adjustable parameters.
Now you can proceed to the next stage: training.