# Preprocessing

Before starting this stage, you should already have the environment set up and the dataset ready.

In this stage, the program processes the raw audio data and converts it to binary files. It also writes some data into the config file.

<figure><img src="/files/fbar9SG6j32lCS2hGAFJ" alt=""><figcaption></figcaption></figure>

This stage takes anywhere from minutes to hours, depending on your dataset size, the pitch extraction algorithm you choose, and your hardware.

## Process

### **Create folders under the diff-svc folder**

1. In your diff-svc folder, create a folder named `data`.
2. Inside `data`, create a folder named `raw`.
3. Put your dataset folder under the `raw` folder.\
   (To match the config entries in the next step, name this dataset folder after your project.)
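
If you prefer to script the folder creation, the steps above can be sketched in Python as follows. The function name `create_raw_dataset_dir` and the project name `nyaru` are placeholders for illustration, not part of Diff-SVC.

```python
from pathlib import Path


def create_raw_dataset_dir(diff_svc_root: str, project_name: str) -> Path:
    """Create <diff_svc_root>/data/raw/<project_name> and return its path."""
    raw_dir = Path(diff_svc_root) / "data" / "raw" / project_name
    # parents=True also creates `data` and `raw` if they are missing;
    # exist_ok=True makes it safe to call again on an existing folder.
    raw_dir.mkdir(parents=True, exist_ok=True)
    return raw_dir
```

For example, calling `create_raw_dataset_dir(".", "nyaru")` from inside the diff-svc folder creates `data/raw/nyaru`. You still need to copy your audio files into that folder yourself.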

### **Edit the config file in the `training` folder**

1. In the `training` folder, make a backup copy of the config file that matches your vocoder, then open the config file with a text editor:
   * `config.yaml` if you are using the 24kHz vocoder, or
   * `config_nsf.yaml` if you are using the 44.1kHz vocoder.
2. Edit the following entries.\
   \
   By default, each of these entries ends in a speaker name, in the form **.../{speaker\_name}** (e.g. `data/binary/nyaru`). Replace {speaker\_name} with your current project name.

   <pre><code>binary_data_dir: data/binary/nyaru
   # The path to the pre-processed data.

   raw_data_dir: data/raw/nyaru
   # Path to the directory of the raw data before pre-processing.

   speaker_id: nyaru
   # The name of the target speaker. (Currently, this parameter is for reference only and has no functional impact.)

   work_dir: checkpoints/nyaru
   # Change the last part to the project name.
   </code></pre>
3. (Optional) Change the pitch extraction algorithm.\
   \
   By default, the CREPE algorithm is used for pitch extraction during preprocessing. Keep `use_crepe` at `true` for better results, or set it to `false` to use Parselmouth for faster processing.

   ```
   use_crepe: true
   # Use CREPE to extract F0 for pre-processing. Enable it for better results, or disable it for faster processing.
   ```
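
The project-name edits in step 2 can also be applied with a short script. The following is a minimal sketch that assumes the four entries appear as top-level `key: value` lines, as in the excerpt above. `set_project_name` is a hypothetical helper, not part of Diff-SVC. It works on plain text rather than a YAML parser so that the other lines and comments in the file stay untouched; an inline comment on one of the four rewritten lines would be dropped.

```python
import re


def set_project_name(config_text: str, project_name: str) -> str:
    """Return config_text with the four project-specific entries updated."""
    new_values = {
        "binary_data_dir": f"data/binary/{project_name}",
        "raw_data_dir": f"data/raw/{project_name}",
        "speaker_id": project_name,
        "work_dir": f"checkpoints/{project_name}",
    }
    out_lines = []
    for line in config_text.splitlines():
        # Only match keys that start at column 0 (top-level entries).
        match = re.match(r"^(\w+):", line)
        if match and match.group(1) in new_values:
            key = match.group(1)
            line = f"{key}: {new_values[key]}"
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

To use it, read the config file into a string, pass it through `set_project_name`, and write the result back, after making the backup copy described in step 1.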

### **Run commands**

[Navigate](/the-beginners-guide-to-diff-svc/appendix.md#change-directory-in-windows-cmd) to the Diff-SVC folder in the command line and make sure you are [in the `diff-svc` environment](/the-beginners-guide-to-diff-svc/setting-up/setting-up-the-environment.md#using-conda). Then run:

{% tabs %}
{% tab title="Windows (cmd)" %}

```
set PYTHONPATH=.
set CUDA_VISIBLE_DEVICES=0
python preprocessing/binarize.py --config training/config_nsf.yaml
```

{% hint style="info" %}
Make sure you are using cmd (Anaconda Prompt) instead of PowerShell (Anaconda Powershell Prompt), because the `set` commands above use cmd syntax.
{% endhint %}
{% endtab %}

{% tab title="Linux" %}

```
export PYTHONPATH=.
CUDA_VISIBLE_DEVICES=0 python preprocessing/binarize.py --config training/config_nsf.yaml
```

{% endtab %}
{% endtabs %}

{% hint style="warning" %}
Remember: use `config.yaml` for the 24kHz vocoder and `config_nsf.yaml` for the 44.1kHz vocoder, and pass the matching file to `--config` in the commands above. Go back and redo the [previous step](#open-the-training-folder) if you edited the wrong config file.
{% endhint %}
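
If you would rather not depend on shell-specific syntax for environment variables, the same invocation can be sketched as a small Python launcher. The helper names `build_binarize_call` and `run_binarize` are hypothetical and not part of Diff-SVC; the script path, the `--config` flag, and the two environment variables are taken from the commands above.

```python
import os
import subprocess
import sys


def build_binarize_call(config_path: str, gpu_id: str = "0"):
    """Return (command, environment) equivalent to the shell commands above."""
    env = os.environ.copy()
    # Same effect as `set PYTHONPATH=.` (cmd) or `export PYTHONPATH=.` (Linux).
    env["PYTHONPATH"] = "."
    # Same effect as setting CUDA_VISIBLE_DEVICES=0 in the shell.
    env["CUDA_VISIBLE_DEVICES"] = gpu_id
    # sys.executable is the interpreter running this script, so launch it
    # from inside the diff-svc environment.
    command = [sys.executable, "preprocessing/binarize.py", "--config", config_path]
    return command, env


def run_binarize(config_path: str, gpu_id: str = "0") -> None:
    """Run preprocessing; the working directory must be the diff-svc folder."""
    command, env = build_binarize_call(config_path, gpu_id)
    # check=True raises CalledProcessError if binarize.py exits with an error.
    subprocess.run(command, env=env, check=True)
```

Calling `run_binarize("training/config_nsf.yaml")` from the diff-svc folder then corresponds to the 44.1kHz commands shown in the tabs; pass `"training/config.yaml"` instead for the 24kHz vocoder.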

When the program finishes successfully, you should see the preprocessed files under `data/binary/{Your_project_name}`, the `binary_data_dir` you set in the previous step.
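
A quick way to confirm that preprocessing produced output is to list the contents of the binary data folder. This is only a sketch: `list_preprocessed_files` is a hypothetical helper, and it makes no assumption about which file names Diff-SVC writes.

```python
from pathlib import Path
from typing import List


def list_preprocessed_files(diff_svc_root: str, project_name: str) -> List[str]:
    """Return the sorted file names found in data/binary/<project_name>."""
    binary_dir = Path(diff_svc_root) / "data" / "binary" / project_name
    if not binary_dir.is_dir():
        raise FileNotFoundError(
            f"{binary_dir} does not exist; preprocessing may not have run"
        )
    # Only regular files are listed; subfolders are ignored.
    return sorted(p.name for p in binary_dir.iterdir() if p.is_file())
```

An empty list, or a `FileNotFoundError`, suggests that preprocessing did not complete or that `binary_data_dir` points somewhere else.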

## Notes

> The dataset is no longer needed after this step, but keep it somewhere, since you may need to run preprocessing again later.

You need to run preprocessing again if:

* You add, remove, or modify audio files in your dataset.
* You want to switch from the 24kHz vocoder to the 44.1kHz vocoder, or vice versa.
* You want to switch from one pitch extraction algorithm to the other.
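
For the first condition, one possible way to notice that a dataset has changed since the last preprocessing run is to compare a fingerprint of its files. This is an illustrative sketch only: Diff-SVC does not provide such a check, and `dataset_fingerprint` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(dataset_dir: str) -> str:
    """Hash the relative path and contents of every file under dataset_dir."""
    root = Path(dataset_dir)
    digest = hashlib.sha256()
    # Sort the paths so the result does not depend on directory listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        # Each file is read fully into memory, one file at a time.
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

If you save the fingerprint after preprocessing and it differs the next time you compute it, a file was added, removed, renamed, or modified, so preprocessing should be run again.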

{% hint style="warning" %}
Make sure you train with the same config file you used here, since preprocessing writes some data to it.
{% endhint %}

Refer to the [documentation](https://github.com/prophesier/diff-svc/blob/main/doc/training_and_inference_EN.markdown#22-editing-hyperparameters) for more details on the adjustable parameters.

Now you can proceed to the next stage: training.

