Training
Before starting this stage, you should already have the environment set up and the preprocessed files ready.
In this stage, the algorithm will learn from the preprocessed files and generate checkpoints.
This stage may take anywhere from hours to days depending on your config and hardware, and how long to train is entirely up to you.
Edit the following entry in the config file if you didn't do it during the preprocessing stage.
Change max_sentences
The default value of 88 is tuned for 80 GB of VRAM; adjust it according to the VRAM available on your GPU.
If you are not sure what value should be used, you can proceed to the next step and run the training commands, then come back to lower this value if you encounter a CUDA OOM error.
The maximum batch size you can use depends on the audio lengths in your dataset and your VRAM. If you keep getting CUDA OOM errors even with max_sentences set to a very small value, double-check your dataset for excessively long audio files.
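For example, a minimal sketch of the entry (key name and default value as described above):

```yaml
# Maximum number of samples per training batch.
# The default of 88 targets ~80 GB of VRAM; lower it
# until CUDA OOM errors stop on smaller GPUs.
max_sentences: 88
```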
The following entries won't affect the quality of your model. Edit them according to your need or ignore them for now.
Change endless_ds if needed
If you have a small dataset, each epoch will pass very quickly and a lot of time will be wasted on validation. Set endless_ds to True to treat every 1000 epochs as one big epoch.
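For example:

```yaml
# Treat every 1000 epochs as one big epoch so that validation
# runs less frequently on small datasets.
endless_ds: true
```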
Change val_check_interval if needed
By default, the program will save a checkpoint every 2000 steps. Change it if you want to save checkpoints more or less frequently.
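For example:

```yaml
# Save a checkpoint (and run validation) every N training steps.
val_check_interval: 2000
```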
Change num_ckpt_keep if needed
By default, the 10 latest checkpoints will be kept. Change it to a bigger value if you want to keep more checkpoints for comparison or a smaller value if you want fewer checkpoints to save space.
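For example:

```yaml
# Keep only the N most recent checkpoints; older ones are deleted.
num_ckpt_keep: 10
```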
The following entries are advanced settings; do not edit them if you don't know what they are.
Model configs with different network size parameters are NOT inter-compatible. You will get an error if the network size in the config file you use for training/inference is not the same as the actual network size of the model.
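As a rough illustration, the network-size keys in the config look like the following (the names and values here are assumptions based on the stock Diff-SVC config and may differ in your version):

```yaml
# Network-size parameters — the values used at inference must match
# the values the checkpoint was trained with, or loading will fail.
hidden_size: 256
residual_layers: 20
residual_channels: 384
```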
Refer to the documentation for more details on other adjustable parameters.
Navigate to the Diff-SVC folder in the command line and make sure you are in the diff-svc environment. Then run:
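A sketch of the training command, following the stock Diff-SVC run script (the config path is an assumption; adjust it to where your config file actually lives):

```sh
# Start (or resume) training with the given config and project name.
python run.py --config training/config.yaml --exp_name {project_name} --reset
```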
Use config.yaml for 22kHz models and config_nsf.yaml for 44.1kHz models.
Replace {project_name} with your own project name. (This should be the same as the project name you used in Preprocessing)
Make sure you are using cmd (Anaconda Prompt) instead of PowerShell (Anaconda PowerShell Prompt).
Model checkpoints will be saved to /checkpoints/{your_project_name} every val_check_interval steps (every 2000 steps by default). The .ckpt file and the config file are what you will need for the next stage.
Replace the path in the following command with the correct path and run:
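For example (a sketch; the exact log directory may differ depending on your version, so adjust the path to match your checkpoint folder):

```sh
# Point TensorBoard at the training logs for your project.
tensorboard --logdir checkpoints/{your_project_name}/lightning_logs
```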
Open http://localhost:6006/ in your browser to see TensorBoard.
Make sure you use the same config file you used in preprocessing to edit and start training.
Training will run for a long time, and you can keep it running or stop it at any point. Listen to the audio samples in TensorBoard to judge whether to keep training or stop. You can also stop training, run inference with your checkpoints, listen to the results, and decide whether to continue.
You can stop training at any time by pressing Ctrl + C and resume it later by running the same command you used to start training.
Now you can proceed to the next stage: inference.