In our recent paper, we propose VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech. Several recent end-to-end text-to-speech (TTS) models enabling single ...
The code has been tested under Python 3.7.4 with the following packages and their dependencies installed: ...
As illustrated in Figure 4c, this module deploys multiple heterogeneous predictors, each performing independent inference at its corresponding temporal scale. Subsequently, a fusion operator is ...