Training

Before starting this stage, you should already have the environment set up and the preprocessed files ready.

In this stage, the model will learn from the preprocessed files and generate checkpoints.

Training may take anywhere from hours to days depending on your config and hardware, and how far you take it is entirely up to you.

Process

Edit config

Edit the following entry in the config file if you didn't already do so during the preprocessing stage.

  • Change max_sentences

max_sentences: 88
# The maximum batch size

The default value of 88 corresponds to 80 GB of VRAM; adjust it according to your GPU's VRAM.

If you are not sure what value to use, proceed to the next step and run the training commands, then come back and lower this value if you encounter a CUDA OOM error.

The maximum batch size you can use depends on the audio lengths in your dataset and your VRAM. If you set max_sentences to a very small value but still keep getting CUDA OOM errors, double-check whether your dataset contains any excessively long audio files.
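
For example (a rough starting point under the assumption that the value scales roughly with available VRAM, not an exact rule), on a GPU with about 24 GB of VRAM you might try:

max_sentences: 26
# Illustrative value for ~24 GB of VRAM, scaled down from 88 at 80 GB; lower it further if you still hit CUDA OOM.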

Optional Entries

The following entries won't affect the quality of your model. Edit them according to your needs, or ignore them for now.

  • Change endless_ds if needed

endless_ds: False
# Setting this to True will treat 1000 epochs as a single one.

If you have a small dataset, each epoch will pass very quickly and a lot of time will be wasted on validation. Set endless_ds to True to treat every 1000 epochs as one big epoch.
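
For example (illustrative only):

endless_ds: True
# With a small dataset, this reduces how often training pauses for validation.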

  • Change val_check_interval if needed

val_check_interval: 2000
# Run inference on the test set and save a checkpoint every 2000 steps.

By default, the program will save a checkpoint every 2000 steps. Change it if you want to save checkpoints more or less frequently.
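
For example, to validate and save checkpoints half as often as the default:

val_check_interval: 4000
# Run validation and save a checkpoint every 4000 steps instead of 2000.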

  • Change num_ckpt_keep if needed

num_ckpt_keep: 10

By default, the 10 latest checkpoints will be kept. Change it to a larger value if you want to keep more checkpoints for comparison, or a smaller value if you want fewer checkpoints to save disk space.
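
For example, to keep a longer history of checkpoints at the cost of more disk space:

num_ckpt_keep: 20
# Keep the 20 most recent checkpoints instead of 10.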

Advanced Optional Entries

The following entries are advanced settings; do not edit them unless you understand what they do.

lr: 0.0008
# Initial learning rate: this value corresponds to a batch size of 88; if the batch size is smaller, you can lower this value a bit.

decay_steps: 20000
# The learning rate is halved every 20,000 steps. If the batch size is small, increase this value.

residual_channels: 384
residual_layers: 20
# A group of parameters that control the core network size. Higher values give the network more parameters and slow down training, but do not necessarily lead to better results. For larger datasets, you can change residual_channels to 512. You can experiment with these on your own, but it is best to leave them as they are if you are not sure what you are doing.
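
As an illustration only (a rough heuristic based on the comments above, not an official rule): if you train with a much smaller batch size than the default 88, you might lower lr and raise decay_steps along these lines, then fine-tune by watching the loss curves:

lr: 0.0004
# Hypothetical value for a batch size roughly half the default.
decay_steps: 40000
# Decay the learning rate less frequently to compensate for the smaller update steps.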

Refer to the documentation for more details on other adjustable parameters.

Run commands

Navigate to the Diff-SVC folder in the command line and make sure you are in the diff-svc environment. Then run:

  • Use config.yaml for 22kHz and config_nsf.yaml for 44.1kHz.

  • Replace {project_name} with your own project name. (This should be the same as the project name you used in Preprocessing)

set CUDA_VISIBLE_DEVICES=0
python run.py --config training/config_nsf.yaml --exp_name {project_name} --reset

Make sure you are using cmd (Anaconda Prompt) instead of PowerShell (Anaconda PowerShell Prompt).
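
If you are training on Linux or macOS instead of Windows (the commands above assume Windows cmd), the equivalent commands would be:

export CUDA_VISIBLE_DEVICES=0
python run.py --config training/config_nsf.yaml --exp_name {project_name} --reset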

Model checkpoints will be saved to checkpoints/{your_project_name} every val_check_interval steps (every 2000 steps by default). The .ckpt file and the config file are what you will need for the next stage.

Run TensorBoard (Optional)

Replace the path in the following command with the correct path and run:

tensorboard --logdir=checkpoints/{your_project_name}/lightning_logs/lastest

Open http://localhost:6006/ in your browser to see TensorBoard.
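
If you are training on a remote machine and want to view TensorBoard in your local browser, recent TensorBoard versions can listen on all network interfaces, for example:

tensorboard --logdir=checkpoints/{your_project_name}/lightning_logs/lastest --bind_all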

Notes

  • Training will run for a long time; you can keep it running or stop it at any time. Listen to the audio samples in TensorBoard to decide whether to keep training.

    • You can also stop training, run inference on your checkpoints, listen to the results, and then decide whether to resume.

  • You can stop training at any time by pressing Ctrl + C and resume it later by running the same commands you used to start training.

Now you can proceed to the next stage: inference.
