Training
By this stage, you should have the environment set up and the preprocessed files ready. During training, the algorithm learns from the preprocessed files and generates checkpoints.

This stage may take hours or even days depending on your config and hardware, and how long to train is entirely up to you.
Process
Edit config
Edit the following entry in the config file if you didn't do it during the preprocessing stage.
Change `max_sentences`:

```yaml
max_sentences: 88  # The maximum limit of batch size
```

The default value of 88 corresponds to 80 GB of VRAM; adjust it according to your GPU.
If you are not sure what value should be used, you can proceed to the next step and run the training commands, then come back to lower this value if you encounter a CUDA OOM error.
The maximum batch size you can use depends on the audio lengths in your dataset and on your VRAM. If you set `max_sentences` to a very small value but still get CUDA OOM errors, double-check whether your dataset contains excessively long audio files.
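To spot such outliers, a quick script can list the longest clips in your dataset. This is a sketch, not part of Diff-SVC: the `data/raw` path and the 15-second threshold below are placeholders to adjust for your own setup.

```python
import os
import wave

def find_long_wavs(root, max_seconds=15.0):
    """Return (path, duration_sec) for .wav files longer than max_seconds."""
    long_files = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".wav"):
                continue
            path = os.path.join(dirpath, name)
            with wave.open(path, "rb") as wf:
                duration = wf.getnframes() / wf.getframerate()
            if duration > max_seconds:
                long_files.append((path, duration))
    # Longest first, so the worst offenders are at the top.
    return sorted(long_files, key=lambda item: -item[1])

if __name__ == "__main__":
    # "data/raw" is a placeholder -- point it at your own dataset folder.
    for path, duration in find_long_wavs("data/raw", max_seconds=15.0):
        print(f"{duration:6.1f}s  {path}")
```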
Optional Entries
The following entries won't affect the quality of your model. Edit them according to your need or ignore them for now.
Change `endless_ds` if needed:

```yaml
endless_ds: false  # Setting this to True will treat 1000 epochs as a single one.
```

If you have a small dataset, each epoch passes very quickly and a lot of time is wasted on validation. Set `endless_ds` to `true` to treat every 1000 epochs as one big epoch.
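As rough, illustrative arithmetic (the clip count below is a made-up number, not from the docs): with a small dataset, a normal epoch is only a handful of steps, so epoch-boundary overhead occurs constantly, while bundling 1000 epochs into one makes it 1000x rarer.

```python
# Illustrative numbers only -- substitute your own dataset size.
dataset_clips = 500      # hypothetical number of preprocessed clips
max_sentences = 88       # batch size from the config above

steps_per_epoch = -(-dataset_clips // max_sentences)  # ceiling division
print(steps_per_epoch)         # steps in one normal epoch
print(steps_per_epoch * 1000)  # steps in one "big" epoch with endless_ds
```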
Change `val_check_interval` if needed:

```yaml
val_check_interval: 2000  # Run inference on the test set and save a checkpoint every 2000 steps.
```

By default, the program saves a checkpoint every 2000 steps. Change this if you want to save checkpoints more or less frequently.
Change `num_ckpt_keep` if needed:

```yaml
num_ckpt_keep: 10
```

By default, the 10 latest checkpoints are kept. Increase this to keep more checkpoints for comparison, or decrease it to save disk space.
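Putting the optional entries together, a config for a small dataset on a smaller GPU might look like the fragment below. The values are illustrative examples, not recommendations:

```yaml
max_sentences: 12         # lowered for a GPU with much less than 80 GB of VRAM
endless_ds: true          # small dataset: treat 1000 epochs as one
val_check_interval: 2000  # validate and save a checkpoint every 2000 steps
num_ckpt_keep: 5          # keep fewer checkpoints to save disk space
```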
Advanced Optional Entries
The following entries are advanced settings; do not edit them unless you know what they do.
```yaml
lr: 0.0008          # Initial learning rate. This value corresponds to a batch size of 88; if your batch size is smaller, you can lower it a bit.
decay_steps: 20000  # Every 20,000 steps, the learning rate decays to half its previous value. If your batch size is small, increase this value.
residual_channels: 384
residual_layers: 20
```

`residual_channels` and `residual_layers` control the core network size. The higher the values, the more parameters the network has and the slower it trains, but this does not necessarily lead to better results. For larger datasets, you can raise `residual_channels` to 512. You can experiment with them on your own, but it is best to leave them as they are if you are not sure what you are doing.
Model configs with different network size parameters are NOT inter-compatible. You will get an error if the network size in the config file you use for training/inference is not the same as the actual network size of the model.
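Assuming the decay works as a simple step schedule (halving once every `decay_steps` steps, which matches the comment on `decay_steps` above), the learning rate at a given step can be sketched as:

```python
def lr_at_step(step, base_lr=0.0008, decay_steps=20000, gamma=0.5):
    # Step decay: the learning rate is multiplied by `gamma` (0.5)
    # once every `decay_steps` training steps.
    return base_lr * gamma ** (step // decay_steps)

print(lr_at_step(0))       # 0.0008
print(lr_at_step(20000))   # 0.0004
print(lr_at_step(60000))   # 0.0001
```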
Refer to the documentation for more details on other adjustable parameters.
Run commands
Navigate to the Diff-SVC folder in the command line and make sure you are in the `diff-svc` environment. Then run:

```shell
set CUDA_VISIBLE_DEVICES=0
python run.py --config training/config_nsf.yaml --exp_name {project_name} --reset
```

Use `config.yaml` for 22kHz and `config_nsf.yaml` for 44.1kHz. Replace `{project_name}` with your own project name (the same project name you used in Preprocessing). The `set` syntax is for the Windows Command Prompt; on Linux/macOS, use `export CUDA_VISIBLE_DEVICES=0` instead.
Model checkpoints will be saved to `/checkpoints/{your_project_name}` every `val_check_interval` steps (every 2000 steps by default). The .ckpt file and the config file are what you will need for the next stage.
Run Tensorboard (Optional)
Replace the path in the following command with the correct path and run:

```shell
tensorboard --logdir=checkpoints/{your_project_name}/lightning_logs/lastest
```
Open http://localhost:6006/ in your browser to see Tensorboard.
Notes
Make sure you edit and start training with the same config file you used in preprocessing.
Training will run for a long time and you can keep it running or stop it at any time. Listen to the audio in Tensorboard and determine if you should keep training or stop.
You can also stop training, do inference on your checkpoints, listen to the results, and determine if you should keep training or stop.
You can stop training at any time by pressing `Ctrl + C`, and resume it by running the same commands you used to start training.
Now you can proceed to the next stage: inference.