Practical 3: Text to Speech

2025-04-05

1 Try different models

I tried models for different languages with different vocoders. Every vocoder I tried worsened the quality of the output, and many produced the following error:

rate must be specified when data is a numpy array or list of audio samples.

I could not find a way to specify the rate in the CLI. The most convincing voice I could generate came from the Ukrainian model tts_models/uk/mai/glow-tts. Using a GPU brought no speed improvement during generation, as most of the time was spent downloading the models rather than running them.
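On the rate error: the wording matches the message IPython.display.Audio raises when it receives raw samples without a rate argument, so if the output was played in a notebook, passing rate= to that call is probably the fix rather than a CLI flag. The underlying point is that a raw array of samples carries no sample rate of its own. The sketch below illustrates this with the standard library only; the 22050 Hz value and the sine tone standing in for synthesized speech are assumptions, not values taken from the model.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 22050  # assumed value; use the rate your model actually outputs


def write_wav(path, samples, rate):
    """Write float samples in [-1, 1] to a 16-bit mono WAV file at the given rate."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(rate)   # the rate has to be supplied explicitly
        wav.writeframes(frames)


def wav_duration(path):
    """Return the duration in seconds implied by the file's stored rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# One second of a 440 Hz tone, standing in for synthesized speech.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]

out_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
write_wav(out_path, tone, SAMPLE_RATE)
print(wav_duration(out_path))
```

Writing the same samples with a different rate changes the duration and pitch of the result, which is why a player refuses to guess the value.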

2 Voice Cloning

I managed to go through all the steps, but despite an hour and a half of training, the model would not produce any sound. Had I known I would be training voice generation for an English model, I would have spoken English in Common Voice, but my dataset was made of Breton recordings. Here are my loss results:

Step    Training Loss    Validation Loss
1000    0.400600         0.719295
2000    0.354900         0.730884
3000    0.341800         0.745848
4000    0.329100         0.776807
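
One way to read these numbers is to compare the two losses step by step. The sketch below uses only the values from the table; the helper names are hypothetical. A training loss that keeps falling while the validation loss keeps rising is a pattern commonly associated with overfitting, though that alone may not explain the silent output.

```python
# (step, training loss, validation loss), copied from the table above
history = [
    (1000, 0.400600, 0.719295),
    (2000, 0.354900, 0.730884),
    (3000, 0.341800, 0.745848),
    (4000, 0.329100, 0.776807),
]


def best_checkpoint(rows):
    """Return the step with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])[0]


def generalization_gaps(rows):
    """Return (step, validation loss minus training loss) for each row."""
    return [(step, val - train) for step, train, val in rows]


def is_diverging(rows):
    """True if training loss falls and validation loss rises at every step."""
    pairs = zip(rows, rows[1:])
    return all(b[1] < a[1] and b[2] > a[2] for a, b in pairs)


print("best checkpoint by validation loss:", best_checkpoint(history))
print("losses diverging:", is_diverging(history))
for step, gap in generalization_gaps(history):
    print(step, round(gap, 6))
```

If this reading is right, the checkpoint saved at the earliest logged step would be the one to evaluate first, since it has the lowest validation loss of the four.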