I modified the aggressiveness from 3 to 1. Training history can be found here for a run which used the LJSpeech mode I trained, and here for a model trained from scratch. These models are actually starting to consistently read out words. Interestingly enough, the model trained from scratch actually seems to be performing better.
Unfortunately, the model doesn’t seem to be differentiating much between punctuations
Tensorboard data can be found here. This version used even less aggressive silence filtering (set to aggressiveness value to 0 this time) as well as data fro...
Cheryl and I spent some time taking a bit of a break from training models to try and figure out what it is. One big thing could be an issue with long sentenc...
I decided to try yet another dataset, this time the Blizzard dataset. The tensorboard results can be seen here. With this dataset, it is again successful at ...
I suspect the most significant issue is the forced alignment being slightly off on two accounts:
It might sometimes just be plain wrong
It is common for...