This page contains the following speech samples:
(1) Speech feature conversion + WORLD vocoder

| Sample | Source (Normal) | Source (Falsetto) | Target | Converted (Normal-MSE) | Converted (Normal-GAN) | Converted (Falsetto-MSE) | Converted (Falsetto-GAN) |
|---|---|---|---|---|---|---|---|
| 1 | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
| 2 | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
| 3 | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
| 4 | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
| 5 | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |

(2) Spectral differential filtering

| Sample | Source (Normal) | Source (Falsetto) | Target | Converted (Normal-MSE) | Converted (Normal-GAN) | Converted (Falsetto-MSE) | Converted (Falsetto-GAN) |
|---|---|---|---|---|---|---|---|
| 1 | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
| 2 | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
| 3 | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
| 4 | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
| 5 | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |