Yuki Saito, Ph.D.


I'm a Lecturer in the System #1 Lab. at the University of Tokyo.
My research interests include speech synthesis, voice conversion, machine learning, and machine intelligence.

My CVs are available [here (full)] and [here (short)].

Email: yuuki_saito {at} ipc.i.u-tokyo.ac.jp Twitter: @ysaito_human LinkedIn: yuki-saito-36a32a129

Publications:

Tutorials

  1. Yuki Saito, Shinnosuke Takamichi, and Wataru Nakata, "Emerging topics for speech synthesis: versatility and efficiency," APSIPA ASC 2024, Macau, China, Dec. 2024. (Slide)

Journal Papers

  1. Detai Xin*, Junfeng Jiang*, Shinnosuke Takamichi, Yuki Saito, Akiko Aizawa, and Hiroshi Saruwatari, "JVNV: A corpus of Japanese emotional speech with verbal content and nonverbal expressions," IEEE Access, Vol. 12, pp. 19752--19764, Feb. 2024. (IEEE Xplore, *: equal contribution)
  2. Yuki Saito*, Kohei Yatabe*, and Shogun, "Does controller sound contain valuable information for video game scene analysis? Case study by character identification of Super Smash Bros. Ultimate," Acoustical Science and Technology, Vol. 45, No. 2, pp. 113--116, Feb. 2024. (J-STAGE, *: equal contribution)
  3. Xuan Luo, Shinnosuke Takamichi, Yuki Saito, Tomoki Koriyama, and Hiroshi Saruwatari, "Emotion-controllable speech synthesis using emotion soft label, utterance-level prosodic factors, and word-level prominence," APSIPA Transactions on Signal and Information Processing, Vol. 13, No. 1, 30 pages, Feb. 2024. (now publishers)
  4. Satoshi Mizoguchi, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "DNN-based low-musical-noise single-channel speech enhancement based on higher-order-moments matching," IEICE Transactions on Information and Systems, Vol. E104-D, No. 11, pp. 1971--1980, Nov. 2021. (J-STAGE)
  5. Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Real-time full-band voice conversion with sub-band modeling and data-driven phase estimation of spectral differentials," IEICE Transactions on Information and Systems, Vol. E104-D, No. 7, pp. 1002--1016, Jul. 2021. (2021 IEICE Journal Paper Award), (J-STAGE)
  6. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Perceptual-similarity-aware deep speaker representation learning for multi-speaker generative modeling," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 29, pp. 1033--1048, Feb. 2021. (IEEE Xplore, Poster at ICASSP2022)
  7. Yuki Saito, Taiki Nakamura, Yusuke Ijima, Kyosuke Nishida, and Shinnosuke Takamichi, "Non-parallel and many-to-many voice conversion using variational autoencoders integrating speech recognition and speaker verification," Acoustical Science and Technology, Vol. 42, No. 1, pp. 1--11, Jan. 2021. (J-STAGE)
  8. Yuki Saito, Kei Akuzawa, and Kentaro Tachibana, "Joint adversarial training of speech recognition and synthesis models for many-to-one voice conversion using phonetic posteriorgrams," IEICE Transactions on Information and Systems, Vol. E103-D, No. 9, pp. 1978--1987, Sep. 2020. (J-STAGE)
  9. Shinnosuke Takamichi, Ryosuke Sonobe, Kentaro Mitsui, Yuki Saito, Tomoki Koriyama, Naoko Tanji, and Hiroshi Saruwatari, "JSUT and JVS: free Japanese voice corpora for accelerating speech synthesis research," Acoustical Science and Technology, Vol. 41, No. 5, pp. 761--768, 2020. (J-STAGE)
  10. Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Daichi Kitamura, and Hiroshi Saruwatari, "Phase reconstruction from amplitude spectrograms based on directional-statistics deep neural networks," Signal Processing, Vol. 169, 12 pages, Apr. 2020. (ScienceDirect)
  11. Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, and Hiroshi Saruwatari, "Generative moment matching network-based neural double-tracking for synthesized and natural singing voices," IEICE Transactions on Information and Systems, Vol. E103-D, No. 3, pp. 639--647, Mar. 2020. (J-STAGE)
  12. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra," Computer Speech and Language, Vol. 58, pp. 347--363, Nov. 2019. (ScienceDirect)
  13. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Statistical parametric speech synthesis incorporating generative adversarial networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 26, No. 1, pp. 84--96, Jan. 2018. (The 34th TELECOM System Technology Award for Students from TAF, IEEE Signal Processing Society Japan Student Journal Paper Award, 2020 IEEE SPS Young Author Best Paper Award), (IEEE Xplore)
  14. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Voice conversion using input-to-output highway networks," IEICE Transactions on Information and Systems, Vol. E100-D, No. 8, pp. 1925--1928, Aug. 2017. (J-STAGE)

International Conferences (Peer-Reviewed)

  1. Kazuki Yamauchi, Wataru Nakata, Yuki Saito, and Hiroshi Saruwatari, "Decoding strategy with perceptual rating prediction for language model-based text-to-speech synthesis," Proc. NeurIPS Audio Imagination Workshop, pp. xxxx--xxxx, Vancouver, Canada, Dec. 2024. (ACCEPTED)
  2. Wataru Nakata, Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "NecoBERT: Self-supervised learning model trained by masked language modeling on rich acoustic features derived from neural audio codec," Proc. APSIPA ASC, pp. xxxx--xxxx, Macau, China, Dec. 2024. (ACCEPTED)
  3. Yuto Ishikawa, Osamu Take, Tomohiko Nakamura, Norihiro Takamune, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Real-time noise estimation for Lombard-effect speech synthesis in human--avatar dialogue systems," Proc. APSIPA ASC, pp. xxxx--xxxx, Macau, China, Dec. 2024. (ACCEPTED)
  4. Kaito Baba, Wataru Nakata, Yuki Saito, and Hiroshi Saruwatari, "The T05 system for The VoiceMOS Challenge 2024: Transfer learning from deep image classifier to naturalness MOS prediction of high-quality synthetic speech," Proc. SLT, pp. xxxx--xxxx, Macau, China, Dec. 2024. (ACCEPTED)
  5. Kazuki Yamauchi, Yuki Saito, and Hiroshi Saruwatari, "Cross-dialect text-to-speech in pitch-accent language incorporating multi-dialect phoneme-level BERT," Proc. SLT, pp. xxxx--xxxx, Macau, China, Dec. 2024. (ACCEPTED)
  6. Dong Yang, Tomoki Koriyama, and Yuki Saito, "Frame-wise breath detection with self-training: An exploration of enhancing breath naturalness in text-to-speech," Proc. INTERSPEECH, pp. 4928--4932, Kos, Greece, Sep. 2024. (PDF, Poster) (Shortlisted for the ISCA Best Student Paper Award 2024)
  7. Takuto Igarashi, Yuki Saito, Kentaro Seki, Shinnosuke Takamichi, Ryuichi Yamamoto, Kentaro Tachibana, and Hiroshi Saruwatari, "Noise-robust voice conversion by conditional denoising training using latent variables of recording quality and environment," Proc. INTERSPEECH, pp. 2750--2754, Kos, Greece, Sep. 2024. (PDF, Poster)
  8. Yuki Saito, Takuto Igarashi, Kentaro Seki, Shinnosuke Takamichi, Ryuichi Yamamoto, Kentaro Tachibana, and Hiroshi Saruwatari, "SRC4VC: Smartphone-recorded corpus for voice conversion benchmark," Proc. INTERSPEECH, pp. 1825--1829, Kos, Greece, Sep. 2024. (PDF, Poster)
  9. Kentaro Seki, Shinnosuke Takamichi, Norihiro Takamune, Yuki Saito, Kanami Imamura, and Hiroshi Saruwatari, "Spatial voice conversion: Voice conversion preserving spatial information and non-target signals," Proc. INTERSPEECH, pp. 177--181, Kos, Greece, Sep. 2024. (PDF, Slide)
  10. Kazuki Yamauchi, Yusuke Ijima, and Yuki Saito, "StyleCap: Automatic speaking-style captioning from speech based on speech and language self-supervised learning models," Proc. ICASSP, 5 pages, Seoul, South Korea, Apr. 2024. (PDF, Poster)
  11. Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, and Hiroshi Saruwatari, "Coco-Nut: Corpus of Japanese utterances and voice characteristics description for prompt-based control," Proc. ASRU, pp. 781--788, Taipei, Taiwan, Dec. 2023. (PDF, Project page, Poster)
  12. Ryunosuke Hirai, Yuki Saito, and Hiroshi Saruwatari, "Federated learning for human-in-the-loop many-to-many voice conversion," Proc. The 12th ISCA SSW, 6 pages, Grenoble, France, Aug. 2023. (OpenReview)
  13. Yuki Saito, Eiji Iimori, Shinnosuke Takamichi, Kentaro Tachibana, and Hiroshi Saruwatari, "CALLS: Japanese empathetic dialogue speech corpus of complaint handling and attentive listening in customer center," Proc. INTERSPEECH, pp. 5561--5565, Dublin, Ireland, Aug. 2023. (Demo, Poster) (Travel Grant Award for INTERSPEECH2023)
  14. Yota Ueda, Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, and Hiroshi Saruwatari, "HumanDiffusion: diffusion model using perceptual gradients," Proc. INTERSPEECH, pp. 4264--4268, Dublin, Ireland, Aug. 2023. (Poster)
  15. Yuki Saito, Shinnosuke Takamichi, Eiji Iimori, Kentaro Tachibana, and Hiroshi Saruwatari, "ChatGPT-EDSS: empathetic dialogue speech synthesis trained from ChatGPT-derived context word embeddings," Proc. INTERSPEECH, pp. 3048--3052, Dublin, Ireland, Aug. 2023. (Demo, Slide) (Travel Grant Award for INTERSPEECH2023)
  16. Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, and Hiroshi Saruwatari, "Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech," Proc. ICASSP, 5 pages, Rhodes, Greece, Jun. 2023. (Demo)
  17. Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Detai Xin, and Hiroshi Saruwatari, "Mid-attribute speaker generation using optimal-transport-based interpolation of Gaussian mixture models," Proc. ICASSP, 5 pages, Rhodes, Greece, Jun. 2023. (Demo)
  18. Kazuki Fujii, Yuki Saito, and Hiroshi Saruwatari, "Adaptive end-to-end text-to-speech synthesis based on error correction feedback from humans," Proc. APSIPA ASC, pp. 1699--1674, Chiang Mai, Thailand, Nov. 2022. (PDF, Slide)
  19. Yusuke Nakai, Yuki Saito, Kenta Udagawa, and Hiroshi Saruwatari, "Multi-task adversarial training algorithm for multi-speaker neural text-to-speech," Proc. APSIPA ASC, pp. 744--749, Chiang Mai, Thailand, Nov. 2022. (PDF, Slide)
  20. Yuki Saito, Yuto Nishimura, Shinnosuke Takamichi, Kentaro Tachibana, and Hiroshi Saruwatari, "STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent," Proc. INTERSPEECH, pp. 5155--5159, Incheon, South Korea, Sep. 2022. (PDF, Speech samples, Poster)
  21. Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, and Hiroshi Saruwatari, "Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis," Proc. INTERSPEECH, pp. 4551--4555, Incheon, South Korea, Sep. 2022. (PDF, Speech samples, Poster)
  22. Yuto Nishimura, Yuki Saito, Shinnosuke Takamichi, Kentaro Tachibana, and Hiroshi Saruwatari, "Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History," Proc. INTERSPEECH, pp. 3373--3377, Incheon, South Korea, Sep. 2022. (Google Travel Grants for Students in East Asia) (PDF, Speech samples, Slide)
  23. Kenta Udagawa, Yuki Saito, and Hiroshi Saruwatari, "Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS," Proc. INTERSPEECH, pp. 2968--2972, Incheon, South Korea, Sep. 2022. (PDF, Speech samples, Poster)
  24. Xuan Luo, Shinnosuke Takamichi, Tomoki Koriyama, Yuki Saito, and Hiroshi Saruwatari, "Emotion-controllable speech synthesis using emotion soft labels and fine-grained prosody factors," Proc. APSIPA ASC, pp. 794--799, Tokyo, Japan, Dec. 2021. (PDF, Speech samples)
  25. Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, and Hiroshi Saruwatari, "Cross-lingual speaker adaptation using domain adaptation and speaker consistency loss for text-to-speech synthesis," Proc. INTERSPEECH, pp. 1614--1618, Brno, Czech Republic, Sep. 2021. (PDF)
  26. Yota Ueda, Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, and Hiroshi Saruwatari, "HumanACGAN: conditional generative adversarial network with human-based auxiliary classifier and its evaluation in phoneme perception," Proc. ICASSP, pp. 6468--6472, Toronto, Canada, Jun. 2021. (PDF, arXiv preprint, Poster)
  27. Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, and Hiroshi Saruwatari, "Investigating effective additional contextual factors in DNN-based spontaneous speech synthesis," Proc. INTERSPEECH, pp. 3201--3205, Shanghai, China, Oct. 2020. (PDF)
  28. Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, and Hiroshi Saruwatari, "Cross-lingual text-to-speech synthesis via domain adaptation and perceptual similarity regression in speaker space," Proc. INTERSPEECH, pp. 2947--2951, Shanghai, China, Oct. 2020. (PDF) (Speech samples)
  29. Shunsuke Goto, Kotaro Ohnishi, Yuki Saito, Kentaro Tachibana, and Koichiro Mori, "Face2Speech: towards multi-speaker text-to-speech synthesis using an embedding vector predicted from a face image," Proc. INTERSPEECH, pp. 1321--1325, Shanghai, China, Oct. 2020. (PDF) (Demo)
  30. Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Real-time, full-band, online DNN-based voice conversion system using a single CPU," Proc. INTERSPEECH, pp. 1021--1022, Shanghai, China, Oct. 2020. (PDF, Video)
  31. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "SMASH corpus: a spontaneous speech corpus recording third-person audio commentaries on gameplay," Proc. LREC, pp. 6573--6579, Marseille, France, May 2020. (PDF)
  32. Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, and Hiroshi Saruwatari, "DNN-based speech synthesis using abundant tags of spontaneous speech corpus," Proc. LREC, pp. 6440--6445, Marseille, France, May 2020. (PDF)
  33. Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, and Hiroshi Saruwatari, "HumanGAN: generative adversarial network with human-based discriminator and its evaluation in speech perception modeling," Proc. ICASSP, pp. 6239--6243, Barcelona, Spain, May 2020. (Main contribution paper for FujiSankei Business i Awards, Main contribution paper for National Institute of Technology Student Award) (PDF, arXiv preprint, Video)
  34. Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Lifter training and sub-band modeling for computationally efficient and high-quality voice conversion using spectral differentials," Proc. ICASSP, pp. 7784--7788, Barcelona, Spain, May 2020. (PDF, arXiv preprint, Video)
  35. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "DNN-based speaker embedding using subjective inter-speaker similarity for multi-speaker modeling in speech synthesis," Proc. The 10th ISCA SSW, pp. 51--56, Vienna, Austria, Sep. 2019. (PDF, arXiv preprint, Poster)
  36. Taiki Nakamura, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, and Hiroshi Saruwatari, "V2S attack: building DNN-based voice conversion from automatic speaker verification," Proc. The 10th ISCA SSW, pp. 161--165, Vienna, Austria, Sep. 2019. (PDF, arXiv preprint, Poster)
  37. Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, and Hiroshi Saruwatari, "Generative moment matching network-based random modulation post-filter for DNN-based singing voice synthesis and neural double-tracking," Proc. ICASSP, pp. 7070--7074, Brighton, United Kingdom, May 2019. (PDF, arXiv preprint, Poster, Demo)
  38. Masakazu Une, Yuki Saito, Shinnosuke Takamichi, Daichi Kitamura, Ryoichi Miyazaki, and Hiroshi Saruwatari, "Generative approach using the noise generation models for DNN-based speech synthesis trained from noisy speech," Proc. APSIPA ASC, pp. 99--103, Hawaii, U.S.A., Nov. 2018. (Invited Special Session), (PDF, Slide)
  39. Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Daichi Kitamura, and Hiroshi Saruwatari, "Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network," Proc. IWAENC, pp. 286--290, Tokyo, Japan, Sep. 2018. (PDF, Poster)
  40. Yuki Saito, Yusuke Ijima, Kyosuke Nishida, and Shinnosuke Takamichi, "Non-parallel voice conversion using variational autoencoders conditioned by phonetic posteriorgrams and d-vectors," Proc. ICASSP, pp. 5274--5278, Alberta, Canada, Apr. 2018. (Grants for Researchers Attending International Conferences from NEC C&C, Outstanding Paper Award for Young C&C Researchers) (PDF, Poster)
  41. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Text-to-speech synthesis using STFT spectra based on low-/multi-resolution generative adversarial networks," Proc. ICASSP, pp. 5299--5303, Alberta, Canada, Apr. 2018. (PDF, Poster)
  42. Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Voice conversion using sequence-to-sequence learning of context posterior probabilities," Proc. INTERSPEECH, pp. 1268--1272, Stockholm, Sweden, Aug. 2017. (PDF, Slide, Speech samples)
  43. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis," Proc. ICASSP, pp. 4900--4904, New Orleans, U.S.A., Mar. 2017. (Spoken Language Processing Student Grant of ICASSP 2017), (PDF, Slide)
  44. Yuki Saito and Hiroshi Tenmoto, "Construction of highly interpretable classification rule based on linear SVM," Proc. ISTS, Taipei, Taiwan, Nov. 2014.

Technical Reports

  1. Taisei Takano, Yuki Okamoto, and Yuki Saito, "Performance analysis on CLAP-Score for text-to-audio evaluation," YANS2024, Sep. 2024. (Poster) (YANS2024 IVRy Award)
  2. Kazuki Yamauchi, Wataru Nakata, Yuki Saito, and Hiroshi Saruwatari, "Decoding strategy with subjective speech quality prediction for discrete-token-based text-to-speech," IPSJ SIG Technical Report, 2024-SLP-152, No. 14, pp. 1--6, Jun. 2024. (in Japanese, PDF, Poster) (2024 Otogaku Symposium Best Presentation Award)
  3. Wataru Nakata*, Kazuki Yamauchi*, Dong Yang, Hiroaki Hyodo, and Yuki Saito, "UTDUSS: UTokyo-SaruLab System for Interspeech2024 Speech Processing Using Discrete Speech Unit Challenge," Technical Report for Interspeech2024 Speech Processing Using Discrete Speech Unit Challenge, 5 pages, Mar. 2024. (arXiv, *: equal contribution) (Ranked 1st in TTS (Acoustic+Vocoder) track, Leaderboard)
  4. Kazuki Yamauchi, Yuki Saito, and Hiroshi Saruwatari, "Multi-dialect text-to-speech using VQVAE-derived interpretable accent latent variables," SP2023-80, Vol. 123, No. 403, pp. 220--225, Jun. 2024. (in Japanese, Student Poster Award) (PDF, Poster)
  5. Yuki Oda, Kazuki Yamauchi, Yuki Saito, and Hiroshi Saruwatari, "Dialect adaptation of Japanese end-to-end text-to-speech based on crowdsourced dialect accent labels," SPEASIP Workshop Short Oral Presentation, Vol. 123, No. 403, Jun. 2024. (in Japanese)
  6. Yuki Saito, Takuto Igarashi, Kentaro Seki, Shinnosuke Takamichi, Ryuichi Yamamoto, Kentaro Tachibana, and Hiroshi Saruwatari, "SRC4VC: Smartphone-recorded corpus for benchmarking multi-speaker voice conversion models," SPEASIP Workshop Short Oral Presentation, Vol. 123, No. 403, Jun. 2024. (in Japanese)
  7. Takuto Igarashi, Yuki Saito, Kentaro Seki, Shinnosuke Takamichi, Ryuichi Yamamoto, Kentaro Tachibana, and Hiroshi Saruwatari, "Noise-robust voice conversion using denoising training using recording quality and environment as conditional features," SP2023-45, Vol. 123, No. 403, pp. 13--18, Mar. 2024. (in Japanese, PDF)
  8. Miyu Okamoto, Kentaro Seki, Shinnosuke Takamichi, Yuki Saito, and Takayuki Itoh, "ImTTS: Multi-speaker text-to-speech system with visualization of impression estimation," NICOGRAPH 2023, 2 pages, P-9, Dec. 2023. (in Japanese, Peer Reviewed)
  9. Yuki Saito, Shinnosuke Takamichi, Eiji Iimori, Kentaro Tachibana, and Hiroshi Saruwatari, "ChatGPT-EDSS: Acoustic modeling for empathetic dialogue speech synthesis using ChatGPT-derived context word embeddings," IPSJ SIG Technical Report, 2023-SLP-147, No. 6, pp. 1--6, Jun. 2023. (in Japanese, PDF, Poster) (2023 Otogaku Symposium Best Presentation Award)
  10. Junichi Kumada, Yuki Saito, Shinnosuke Takamichi, Aya Watanabe, Naoko Tanji, Mizuki Nagano, Yusuke Ijima, and Hiroshi Saruwatari, "Analysis and evaluation towards sleep-inducing voice synthesis," IPSJ SIG Technical Report, 2023-SLP-147, No. 5, pp. 1--5, Jun. 2023. (in Japanese, PDF, Poster)
  11. Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, and Hiroshi Saruwatari, "In-the-wild sentence data collection method towards voice characteristic control by free-form text script," IEICE Technical Report, NLC2022-29, Vol. 122, No. 449, pp. 55--60, Mar. 2023. (in Japanese, PDF)
  12. Yuki Saito, Eiji Iimori, Shinnosuke Takamichi, Kentaro Tachibana, and Hiroshi Saruwatari, "Corpus construction towards multi-domain empathetic dialogue speech synthesis," SPEASIP Workshop Short Oral Presentation, Vol. 122, No. 389, Mar. 2023. (in Japanese, Slide)
  13. Ryunosuke Hirai, Yuki Saito, and Hiroshi Saruwatari, "Fed-StarGANv2-VC: many-to-many voice conversion based on federated learning," IPSJ SIG Technical Report, 2023-SLP-146, No. 11, pp. 1--6, Mar. 2023. (in Japanese, PDF, Slide) (2023 IPSJ SIG-SLP Best Student Paper Award (Fairy Devices Award))
  14. Yuki Saito and Hiroshi Sato, "Report on Participation in Interspeech2022," IPSJ SIG Technical Report, 2022-SLP-144, No. 14, p. 1, Nov. 2022. (in Japanese, Slide)
  15. Yuto Nishimura, Yuki Saito, Shinnosuke Takamichi, Kentaro Tachibana, and Hiroshi Saruwatari, "Empathetic dialogue synthesis considering textual and prosodic information of dialogue history," IPSJ SIG Technical Report, 2022-SLP-140, No. 16, pp. 1--6, Mar. 2022. (in Japanese, PDF, Speech samples, Slide)
  16. Yusuke Nakai, Kenta Udagawa, Yuki Saito, and Hiroshi Saruwatari, "Training algorithm for multi-speaker TTS considering adversarial regularizer," IEICE Technical Report, SP2021-57, Vol. 121, No. 385, pp. 50--55, Mar. 2022. (in Japanese, PDF, Speech samples, Slide)
  17. Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, and Hiroshi Saruwatari, "Multi-speaker audiobook speech synthesis using discrete character acting styles acquired by VQVAE," IEICE Technical Report, SP2021-47, Vol. 121, No. 282, pp. 42--47, Dec. 2021. (in Japanese, PDF, Slide, Speech samples)
  18. Kazuki Fujii, Yuki Saito, and Hiroshi Saruwatari, "Japanese non-autoregressive end-to-end text-to-speech synthesis conditioned by prosodic information," IPSJ SIG Technical Report, 2021-SLP-138, No. 16, pp. 1--6, Oct. 2021. (in Japanese, PDF, Slide)
  19. Kenta Udagawa, Yuki Saito, and Hiroshi Saruwatari, "Speech synthesis adaptation based on human speech perception feedback," IEICE Technical Report, SP2021-33, Vol. 121, No. 202, pp. 46--51, Oct. 2021. (in Japanese, PDF, Slide, Speech samples)
  20. Masaki Kurata, Shinnosuke Takamichi, Takaaki Saeki, Riku Arakawa, Yuki Saito, Keita Higuchi, and Hiroshi Saruwatari, "A method for obtaining speaking characteristics based on real-time DNN-based voice conversion feedback," IPSJ SIG Technical Report, 2021-SLP-136, No. 31, pp. 1--6, Mar. 2021. (in Japanese, PDF, Slide)
  21. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Active learning for DNN-based speaker embedding considering subjective inter-speaker similarity," IPSJ SIG Technical Report, 2021-SLP-136, No. 30, pp. 1--6, Mar. 2021. (in Japanese, 2021 IPSJ SIG-SLP Best Student Paper Award (Yahoo! JAPAN Award)) (PDF, Slide)
  22. Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, and Hiroshi Saruwatari, "HumanGAN: generative adversarial networks based on human perception evaluation and its application in speech naturalness modeling," IEICE Technical Report, SP2020-06, Vol. 120, No. 57, pp. 15--20, Jun. 2020. (in Japanese, Student Poster Award) (PDF)
  23. Satoshi Naitou, Yuki Saito, Shinnosuke Takamichi, Yasuyuki Saito, and Hiroshi Saruwatari, "Automatic estimation of breath position for singing VOCALOID song," IPSJ SIG Technical Report, 2020-MUS-127, No. 33, pp. 1--6, Jun. 2020. (in Japanese, PDF)
  24. Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, and Hiroshi Saruwatari, "The effectiveness of additional context in DNN-based spontaneous speech synthesis," IEICE Technical Report, SP2019-61, Vol. 119, No. 441, pp. 65--70, Mar. 2020. (in Japanese, PDF)
  25. Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Lifter training and sub-band modeling for DNN-based voice conversion using spectral differentials," IPSJ SIG Technical Report, 2020-SLP-131, No. 2, pp. 1--6, Feb. 2020. (in Japanese, PDF, Slide)
  26. Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, and Hiroshi Saruwatari, "HumanGAN: generative adversarial networks trained with human perception evaluation," Information-based Induction Sciences (IBIS) Workshop 2019, 2-037, Nov. 2019. (in Japanese, Poster)
  27. Taiki Nakamura, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, and Hiroshi Saruwatari, "Speaker V2S attack: statistical voice conversion built from speaker verification and its evaluation on speaker spoofing attack," Computer Security Symposium (CSS) 2019, 2E1-2, pp. 697--703, Oct. 2019. (in Japanese, PDF, Slide)
  28. Shinnosuke Takamichi, Kentaro Mitsui, Yuki Saito, Tomoki Koriyama, Naoko Tanji, and Hiroshi Saruwatari, "JVS corpus: online available Japanese versatile speech corpus," IPSJ SIG Technical Report, 2019-SLP-129, No. 1, pp. 1--6, Oct. 2019. (in Japanese, PDF, Slide)
  29. Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, and Hiroshi Saruwatari, "Generative moment matching network-based random modulation post-filter for singing voices synthesized using DNNs and its application to neural double-tracking," IPSJ SIG Technical Report, 2018-SLP-125, No. 1, pp. 1--6, Dec. 2018. (in Japanese, PDF, Slide)
  30. Satoshi Mizoguchi, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Evaluation of DNN-based low-musical-noise speech enhancement using kurtosis matching," IEICE Technical Report, EA2018-66, Vol. 118, No. 312, pp. 19--24, Nov. 2018. (in Japanese, PDF, Poster)
  31. Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Daichi Kitamura, and Hiroshi Saruwatari, "Phase reconstruction from amplitude spectra based on von Mises distribution DNN," IPSJ SIG Technical Report, 2018-SLP-122, No. 1, pp. 1--6, Jun. 2018. (in Japanese, 2018 Otogaku Symposium Best Presentation Award, IPSJ Yamashita SIG Research Award) (PDF, Poster)
  32. Yuki Saito, Yusuke Ijima, Kyosuke Nishida, and Shinnosuke Takamichi, "Non-parallel and many-to-many voice conversion using variational autoencoder conditioned by phonetic posteriorgrams and d-vectors," IEICE Technical Report, SP2017-88, Vol. 117, No. 517, pp. 21--26, Mar. 2018. (in Japanese, 2017 IEICE ISS Young Researcher's Award in Speech Field) (PDF, Slide)
  33. Masakazu Une, Yuki Saito, Shinnosuke Takamichi, Daichi Kitamura, Ryoichi Miyazaki, and Hiroshi Saruwatari, "Generative adversarial training of the noise generation model for speech synthesis using speech in noise," IPSJ SIG Technical Report, 2017-SLP-118, No. 1, pp. 1--6, Oct. 2017. (in Japanese, PDF, Slide)
  34. Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Voice conversion using sequence-to-sequence learning of context posterior probabilities and evaluation of the dual learning," IEICE Technical Report, SP2017-16, Vol. 117, No. 160, pp. 9--14, Jul. 2017. (in Japanese, PDF, Slide)
  35. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Training algorithm to deceive anti-spoofing verification for DNN-based text-to-speech synthesis," IPSJ SIG Technical Report, 2017-SLP-115, No. 1, pp. 1--6, Feb. 2017. (in Japanese, PDF, Slide)
  36. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Evaluation of DNN-based voice conversion deceiving anti-spoofing verification," IEICE Technical Report, SP2016-69, Vol. 116, No. 414, pp. 29--34, Jan. 2017. (in Japanese, Student Poster Award) (PDF, Poster)

Domestic Conferences

  1. Yuki Okamoto, Ryotaro Nagase, Nami Okamoto, Yuki Saito, Takahiro Fukumori, and Yoichi Yamashita, "Construction and analysis of impression caption datasets for environmental sounds," Proc. ASJ, Autumn meeting, 3-Q-23, pp. xxx--xxx, Sep. 2024. (in Japanese, PDF, Poster)
  2. Kaito Baba, Wataru Nakata, Yuki Saito, and Hiroshi Saruwatari, "UTMOSv2: Integrating spectrogram and SSL features for naturalness MOS prediction," Proc. ASJ, Autumn meeting, 3-6-4, pp. xxx--xxx, Sep. 2024. (in Japanese, PDF, Slide)
  3. Wataru Nakata, Kentaro Seki, Hitomi Yanaka, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Japanese large scale dialogue corpus for human-AI talks," Proc. ASJ, Autumn meeting, 3-6-3, pp. xxx--xxx, Sep. 2024. (in Japanese, PDF, Slide)
  4. Yuto Ishikawa, Osamu Take, Tomohiko Nakamura, Norihiro Takamune, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Lombard-effect-mimicking processing for speech synthesis using real-time diffuse noise estimation in human-avatar dialogue systems," Proc. ASJ, Autumn meeting, 1-11-10, pp. xxx--xxx, Sep. 2024. (in Japanese, PDF, Slide)
  5. Ryo Ogawa, Yuki Yonekura, Nobutaka Ito, Norihiro Takamune, Kouei Yamaoka, Yuki Saito, and Hiroshi Saruwatari, "Monaural speech enhancement based on semi-supervised deep learning using positive-negative-unlabeled machine learning," Proc. ASJ, Autumn meeting, 1-11-7, pp. xxx--xxx, Sep. 2024. (in Japanese, PDF, Slide)
  6. Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, and Hiroshi Saruwatari, "Latent representation for pairs of speech and speech characteristics description trained by contrastive learning," Proc. ASJ, Spring meeting, 2-P-11, pp. 973--976, Mar. 2024. (in Japanese, PDF, Poster)
  7. Kazuki Yamauchi, Yusuke Ijima, and Yuki Saito, "StyleCap: Automatic speaking-style captioning from speech using speech and language self-supervised learning models," Proc. ASJ, Spring meeting, 3-2-14, pp. 843--846, Mar. 2024. (in Japanese, Student Presentation Award) (PDF, Slide)
  8. Wataru Nakata, Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "NecoBERT: Self-supervised learning of speech representation for speech synthesis," Proc. ASJ, Spring meeting, 1-Q-37, pp. 927--930, Mar. 2024. (in Japanese, PDF, Poster)
  9. Xuan Luo, Shinnosuke Takamichi, Yuki Saito, and Hiroshi Saruwatari, "Emotion-controllable speech synthesis using emotion soft label and word-level prominence," Proc. ASJ, Spring meeting, 1-2-8, pp. 777--780, Mar. 2024. (in Japanese, PDF, Slide)
  10. Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Detai Xin, and Hiroshi Saruwatari, "Coco-Nut: Corpus of connecting Nihongo utterance and text toward prompt-based control of voice characteristics," Proc. ASJ, Autumn meeting, 3-9-3, pp. 1133--1136, Sep. 2023. (in Japanese, Student Presentation Award) (PDF, Poster)
  11. Kazuki Yamauchi, Yuki Saito, and Hiroshi Saruwatari, "Investigation of dialect speech synthesis using a TTS model that can predict and control accent latent variable," Proc. ASJ, Autumn meeting, 2-Q-30, pp. 1255--1256, Sep. 2023. (in Japanese, PDF, Poster)
  12. Kota Iura, Yuki Saito, and Hiroshi Saruwatari, "Analysis and synthesis of commentary audio for competitive game videos," Proc. ASJ, Autumn meeting, 2-Q-30, pp. 1247--1248, Sep. 2023. (in Japanese, PDF, Poster)
  13. Wataru Nakata, Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "An empirical study of self-supervised learning model features for speech waveform reconstruction," Proc. ASJ, Autumn meeting, 2-Q-29, pp. 1243--1246, Sep. 2023. (in Japanese, PDF, Poster)
  14. Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Detai Xin, and Hiroshi Saruwatari, "Generation of mid-attribute non-existent speakers by Gaussian mixture model interpolation based on optimal-transport," Proc. ASJ, Spring meeting, 2-3Q-8, pp. 899--902, Mar. 2023. (in Japanese, PDF)
  15. Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, and Hiroshi Saruwatari, "More differentiated pause insertion for phoneme-based multi-speaker TTS model," Proc. ASJ, Spring meeting, 2-3P-9, pp. 867--870, Mar. 2023. (PDF)
  16. Sora Harada, Wataru Nakata, Shinnosuke Takamichi, Yuki Saito, Yasuyuki Saito, and Hiroshi Saruwatari, "Analysis of urgency of evacuation announcement speech and its application to text-to-speech," Proc. ASJ, Autumn meeting, 2-Q-41, pp. 1283--1286, Sep. 2022. (in Japanese, PDF, Poster)
  17. Kazuki Fujii, Yuki Saito, and Hiroshi Saruwatari, "Investigation of adaptive end-to-end text-to-speech synthesis based on error correction feedback from humans," Proc. ASJ, Autumn meeting, 2-Q-36, pp. 1265--1268, Sep. 2022. (in Japanese, PDF, Poster)
  18. Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, and Hiroshi Saruwatari, "Audiobook speech synthesis based on character embedding for distinguishable character acting," Proc. ASJ, Spring meeting, 3-3-1, pp. 965--968, Mar. 2022. (in Japanese, PDF)
  19. Yuki Saito, Yuto Nishimura, Shinnosuke Takamichi, Kentaro Tachibana, and Hiroshi Saruwatari, "STUDIES: A Japanese empathetic dialogue speech corpus towards expressive speech synthesis," Proc. ASJ, Spring meeting, 2-3P-15, pp. 1133--1136, Mar. 2022. (in Japanese, PDF, Speech samples)
  20. Yuki Saito, Kohei Yatabe, and Shogun, "Classification of SSBU characters from sounds generated by a controller while playing," Proc. ASJ, Spring meeting, 1-1Q-8, pp. 351--352, Mar. 2022. (in Japanese, PDF)
  21. Kenta Udagawa, Yuki Saito, and Hiroshi Saruwatari, "Investigating on search method for speaker adaptation of speech synthesis using human perceptual evaluation as feedback," Proc. ASJ, Spring meeting, 1-3-16, pp. 927--930, Mar. 2022. (in Japanese, PDF)
  22. Xuan Luo, Shinnosuke Takamichi, Tomoki Koriyama, Yuki Saito, and Hiroshi Saruwatari, "Controllable text-to-speech synthesis using prosodic features and emotion soft-label," Proc. ASJ, Autumn meeting, 3-3-21, pp. 985--988, Sep. 2021. (PDF)
  23. Yuki Saito and Hiroshi Saruwatari, "Investigation of effects caused by catastrophic forgetting in continual learning of end-to-end text-to-speech synthesis," Proc. ASJ, Autumn meeting, 1-3Q-8, pp. 1069--1072, Sep. 2021. (in Japanese, PDF)
  24. Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, and Hiroshi Saruwatari, "Cross-lingual speaker adaptation using domain adaptation and speaker consistency loss for text-to-speech synthesis," Proc. ASJ, Autumn meeting, 1-3Q-8, pp. 1049--1052, Sep. 2021. (PDF)
  25. Yota Ueda, Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, and Hiroshi Saruwatari, "HumanACGAN: conditional generative adversarial network using human-based auxiliary classifier and its evaluation in representing conditional distribution of phoneme perception," Proc. ASJ, Spring meeting, 1-2-14, pp. 819--822, Mar. 2021. (in Japanese, PDF)
  26. Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Implementation and evaluation of real-time full-band DNN-based Voice Conversion based on sub-band filtering," Proc. ASJ, Autumn meeting, 1-2-11, pp. 715--718, Sep. 2020. (in Japanese, Slide, PDF)
  27. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "DNN-based speaker embedding using graph embedding of subjective inter-speaker similarity," Proc. ASJ, Autumn meeting, 1-2-4, pp. 697--698, Sep. 2020. (in Japanese, Awaya Prize Young Researcher Award of ASJ) (PDF, Slide)
  28. Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, and Hiroshi Saruwatari, "HumanGAN: generative adversarial network with human-based discriminator and its evaluation in naturalness modeling of speech," Proc. ASJ, Spring meeting, 3-P-40, pp. 1181--1184, Mar. 2020. (in Japanese, PDF)
  29. Shinnosuke Takamichi, Kai Onuma, Taku Kaneda, Takashi Kaneda, Yuki Saito, Tomoki Koriyama, and Hiroshi Saruwatari, "Crowdsourcing-based parameter optimization for frequency warping-based speaker anonymization," Proc. ASJ, Spring meeting, 3-P-31, pp. 1159--1162, Mar. 2020. (in Japanese, PDF)
  30. Shunsuke Goto, Kotaro Ohnishi, Yuki Saito, Kentaro Tachibana, and Koichiro Mori, "Multi-speaker text-to-speech synthesis using an embedding vector based on a face image," Proc. ASJ, Spring meeting, 2-Q-49, pp. 1141--1144, Mar. 2020. (in Japanese, PDF, Poster)
  31. Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Sub-band lifter-training method for full-band voice conversion using spectral differentials," Proc. ASJ, Spring meeting, 2-2-5, pp. 1085--1088, Mar. 2020. (in Japanese, PDF)
  32. Shinnosuke Takamichi, Yuki Saito, Tomohiko Nakamura, Tomoki Koriyama, and Hiroshi Saruwatari, "manga2voice: speech analysis towards audio synthesis from comic image," Proc. ASJ, Spring meeting, 1-2-15, pp. 1065--1068, Mar. 2020. (in Japanese, PDF)
  33. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "SMASH corpus: spontaneous speech corpus of audio commentaries on gameplay," Proc. ASJ, Spring meeting, 1-2-14, pp. 1061--1064, Mar. 2020. (in Japanese, PDF, Slide)
  34. Satoshi Naitou, Yuki Saito, Shinnosuke Takamichi, Yasuyuki Saito, and Hiroshi Saruwatari, "Estimation of breath position for VOCALOID song sung by user," Proc. ASJ, Spring meeting, 1-2-12, pp. 1057--1058, Mar. 2020. (in Japanese, PDF)
  35. Yuki Saito, Kei Akuzawa, and Kentaro Tachibana, "Joint adversarial training algorithm of speech recognition and synthesis models for many-to-one voice conversion using phonetic posteriorgrams," Proc. ASJ, Autumn meeting, 2-4-2, pp. 963--966, Sep. 2019. (in Japanese, PDF, Slide)
  36. Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Filter estimation for computational complexity reduction of DNN-based voice conversion using spectral differentials," Proc. ASJ, Autumn meeting, 2-4-1, pp. 961--962, Sep. 2019. (in Japanese, PDF, Slide)
  37. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Evaluation of DNN-based multi-speaker speech synthesis using DNN-based speaker embedding considering subjective inter-speaker similarity," Proc. ASJ, Autumn meeting, 1-P-18, pp. 999--1002, Sep. 2019. (in Japanese, PDF, Poster)
  38. Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, and Hiroshi Saruwatari, "Neural double-tracking based on generative moment matching network for users' singing," Proc. ASJ, Autumn meeting, 1-4-2, pp. 935--938, Sep. 2019. (in Japanese, PDF, Slide)
  39. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "DNN-based speaker embedding considering subjective inter-speaker similarity towards DNN-based speech synthesis," Proc. ASJ, Spring meeting, 3-10-7, pp. 1067--1070, Mar. 2019. (in Japanese, PDF, Slide)
  40. Taiki Nakamura, Yuki Saito, Kyosuke Nishida, Yusuke Ijima, and Shinnosuke Takamichi, "Evaluation of VAE-based non-parallel and many-to-many voice conversion conditioned by phonetic posteriorgrams and d-vectors in terms of training data and dimensionality of d-vectors," Proc. ASJ, Spring meeting, 2-P-30, pp. 1149--1150, Mar. 2019. (in Japanese, PDF, Poster)
  41. Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, and Hiroshi Saruwatari, "Generative moment matching network-based random modulation post-filter for singing voices and its application to double-tracking," Proc. ASJ, Spring meeting, 2-10-5, pp. 1035--1038, Mar. 2019. (in Japanese, IEEE Signal Processing Society Tokyo Joint Chapter Student Award) (PDF)
  42. Satoshi Mizoguchi, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Low-musical-noise DNN-based speech enhancement applied to noise with various kurtosis," Proc. ASJ, Spring meeting, 1-6-6, pp. 185--188, Mar. 2019. (in Japanese, PDF)
  43. Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Daichi Kitamura, and Hiroshi Saruwatari, "Phase reconstruction from amplitude spectrograms based on directional-statistics DNNs," Proc. ASJ, Autumn meeting, 2-4-2, pp. 1127--1130, Sep. 2018. (in Japanese, PDF, Slide)
  44. Satoshi Mizoguchi, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Low-musical-noise speech enhancement based on DNNs and kurtosis matching," Proc. ASJ, Autumn meeting, 2-1-7, pp. 177--180, Sep. 2018. (in Japanese, Student Presentation Award) (PDF, Slide)
  45. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Adversarial DNN-based speech synthesis using multi-frequency resolution STFT spectra," Proc. ASJ, Spring meeting, 3-8-14, pp. 259--262, Mar. 2018. (in Japanese, PDF, Slide)
  46. Masakazu Une, Yuki Saito, Shinnosuke Takamichi, Daichi Kitamura, Ryoichi Miyazaki, and Hiroshi Saruwatari, "Generative approach using the noise generation models for DNN-based speech synthesis trained from noisy speech," Proc. ASJ, Spring meeting, 3-8-8, pp. 243--244, Mar. 2018. (in Japanese, PDF, Slide)
  47. Shinnosuke Takamichi, Tomoki Koriyama, Yuki Saito, and Hiroshi Saruwatari, "Evaluation of inter-utterance variation in speech synthesis based on moment-matching networks," Proc. ASJ, Autumn meeting, 1-8-9, pp. 195--196, Sep. 2017. (in Japanese, PDF, Slide)
  48. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Experimental investigation of divergences in adversarial DNN-based speech synthesis," Proc. ASJ, Autumn meeting, 1-8-7, pp. 189--192, Sep. 2017. (in Japanese, PDF, Slide)
  49. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "F0 contour and duration generation for adversarial DNN-based speech synthesis," Proc. ASJ, Spring meeting, 2-6-6, pp. 257--258, Mar. 2017. (in Japanese, PDF, Slide)
  50. Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Voice conversion using sequence-to-sequence learning of context posterior probabilities," Proc. ASJ, Spring meeting, 1-6-15, pp. 237--238, Mar. 2017. (in Japanese, PDF, Slide)
  51. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Adversarial DNN-based voice conversion based on spectral differentials using highway networks," Proc. ASJ, Spring meeting, 1-6-14, pp. 235--236, Mar. 2017. (in Japanese, IEEE Signal Processing Society Tokyo Joint Chapter Student Award) (PDF, Slide)
  52. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Training algorithm considering anti-spoofing verification for DNN-based speech synthesis," Proc. ASJ, Autumn meeting, 3-5-1, pp. 149--150, Sep. 2016. (in Japanese, Student Presentation Award) (PDF, Slide)

Dissertations

  1. Yuki Saito (adviser: Professor Hiroshi Saruwatari), "Statistical speech synthesis based on human's speech information processing abilities," Ph.D. Thesis, Graduate School of Information Science and Technology, the University of Tokyo, 2021. (Dean's Award) (PDF, Slide)
  2. Yuki Saito (supervisor: Professor Hiroshi Saruwatari), "High-quality statistical parametric speech synthesis using generative adversarial networks," M.S. Thesis, Graduate School of Information Science and Technology, the University of Tokyo, 2018. (PDF, Slide)


Competitive Funds:

  1. Google-initiated Research Grant, 30,000 USD, Nov. 2023--Oct. 2024. (Representative: Yuki Saito)
  2. Japan Science and Technology Agency, ACT-X. 4,500,000 JPY, Oct. 2023--Mar. 2026. (Representative: Yuki Saito)
  3. Travel Grant Award for INTERSPEECH2023, 750 EUR, Aug. 2023.
  4. Research Grant (S) from Tateisi Science and Technology Foundation, 30,000,000 JPY, Apr. 2023--Mar. 2026. (Representative: Hiroshi Saruwatari)
  5. Grant-in-Aid for Young Scientists, Japan Society for the Promotion of Science (JSPS), 3,600,000 JPY, Apr. 2022--Mar. 2025. (Representative: Yuki Saito)
  6. Research Grant (A) from Tateisi Science and Technology Foundation, 2,200,000 JPY, Apr. 2022--Mar. 2023. (Representative: Yuki Saito)
  7. Grant-in-Aid for Research Activity Start-up, Japan Society for the Promotion of Science (JSPS), 2,400,000 JPY, Sep. 2021--Mar. 2023. (Representative: Yuki Saito)
  8. KIOXIA Incentive Research, 1,000,000 JPY, Jun. 2021--Mar. 2022. (Representative: Yuki Saito)
  9. Grant-in-Aid for JSPS Fellows, Japan Society for the Promotion of Science (JSPS), 2,500,000 JPY, May 2018--Mar. 2021. (Representative: Yuki Saito)
  10. Grants for Researchers Attending International Conferences from NEC C&C, 250,000 JPY, Apr. 2018.

Awards:

  1. Winners of The INTERSPEECH2024 Discrete Speech Challenge (TTS Track), Sep. 2024.
  2. 2024 IPSJ Yamashita SIG Research Award, Jul. 2024.
  3. The 40th Inoue Research Award for Young Scientists, Feb. 2024.
  4. Travel Grant Award for INTERSPEECH2023, Aug. 2023.
  5. 2023 Otogaku Symposium Best Presentation Award, Jun. 2023.
  6. The 22nd Funai Information Technology Award for Young Researchers, May 2023.
  7. 2021 IEICE Journal Paper Award, Jun. 2022.
  8. 2021 IPSJ SIG-SLP Best Student Paper Award (Yahoo! JAPAN Award), Mar. 2022.
  9. 2020 IEEE SPS Young Author Best Paper Award, Jun. 2021.
  10. Dean's Award, Graduate School of Information Science and Technology, The University of Tokyo, Mar. 2021.
  11. The 49th Awaya Prize Young Researcher Award of ASJ, Mar. 2021.
  12. Outstanding Paper Award for Young C&C Researchers, Jan. 2019.
  13. The 12th IEEE Signal Processing Society Japan Student Journal Paper Award, Nov. 2018.
  14. 2017 IEICE ISS Young Researcher's Award in Speech Field, Aug. 2018.
  15. Partial Exemption from Repayment of Scholarship Loan for Students with Outstanding Results, Japan Student Services Organization (JASSO), May 2018.
  16. The 34th TELECOM System Technology Award for Students from TAF, Mar. 2018.
  17. The 1st IEEE Signal Processing Society Tokyo Joint Chapter Student Award, Nov. 2017.
  18. Spoken Language Processing Student Grant of ICASSP, Mar. 2017.
  19. 2017 IEICE ISS Student Poster Award, Jan. 2017.
  20. The 14th Best Student Presentation Award of ASJ, Mar. 2017.
  21. Graduation Research Award, Advanced Course of Electronic and Information Systems Engineering, National Institute of Technology, Kushiro College, Feb. 2016.
  22. Dean's Award, Department of Information Engineering, National Institute of Technology, Kushiro College, Mar. 2014.

Co-author's Awards:

  1. YANS2024 IVRy Award, Sep. 2024. (Awardee: Taisei Takano)
  2. The 28th Best Student Presentation Award of ASJ, Sep. 2024. (Awardee: Kazuki Yamauchi)
  3. Shortlisted for the ISCA Best Student Paper Award 2024, Aug. 2024. (Awardee: Dong Yang)
  4. 2024 Otogaku Symposium Best Presentation Award, Jun. 2024. (Awardee: Kazuki Yamauchi)
  5. 2024 IEICE ISS Student Poster Award, Mar. 2024. (Awardee: Kazuki Yamauchi)
  6. 2023 IPSJ SIG-SLP Best Student Paper Award (Fairy Devices Award), Mar. 2024. (Awardee: Ryunosuke Hirai)
  7. The 27th Best Student Presentation Award of ASJ, Mar. 2024. (Awardee: Aya Watanabe)
  8. Google Travel Grants for Students in East Asia, Jul. 2022. (Awardee: Yuto Nishimura)
  9. National Institute of Technology Student Award, Mar. 2021. (Awardee: Kazuki Fujii)
  10. IPSJ SIG-MUS/SLP Student Poster Award, Jun. 2020. (Awardee: Kazuki Fujii)
  11. FujiSankei Business i Awards, Jun. 2020. (Awardee: Kazuki Fujii)
  12. IPSJ Yamashita SIG Research Award, Mar. 2020. (Awardee: Shinnosuke Takamichi)
  13. The 3rd IEEE Signal Processing Society Tokyo Joint Chapter Student Award, Dec. 2019. (Awardee: Hiroki Tamaru)
  14. The 18th Best Student Presentation Award of ASJ, Mar. 2019. (Awardee: Satoshi Mizoguchi)
  15. 2018 Otogaku Symposium Best Presentation Award, Jun. 2018. (Awardee: Shinnosuke Takamichi)

Reviews:

  1. Paper Reviews for Information Fusion (from 2024)
  2. Paper Reviews for Acoustical Science and Technology (from 2024)
  3. Paper Reviews for Computer Speech and Language (from 2023)
  4. Paper Reviews for Journal of the Audio Engineering Society (from 2022)
  5. Paper Reviews for IEICE Transactions on Information and Systems (from 2022)
  6. Paper Reviews for Journal of Information Processing (from 2022)
  7. Paper Reviews for APSIPA Transactions on Signal and Information Processing (from 2021)
  8. Paper Reviews for EURASIP Journal on Audio, Speech, and Music Processing (from 2021)
  9. Paper Reviews for INTERSPEECH (from 2021)
  10. Paper Reviews for IEEE Access (from 2021)
  11. Paper Reviews for IEEE/ACM Transactions on Audio, Speech, and Language Processing (from 2020)
  12. Paper Reviews for IEEE MLSP (from 2019)
  13. Paper Reviews for IEEE Signal Processing Letters (from 2018)
  14. Paper Reviews for IEEE ICASSP (from 2018)

Research and Work Experiences:

  1. Lecturer of The University of Tokyo, Japan. Apr. 1, 2024--XX. (Lab. page)
  2. Assistant Professor of The University of Tokyo, Japan. Apr. 1, 2023--Mar. 31, 2024. (Lab. page)
  3. Project Research Associate of The University of Tokyo, Japan. Apr. 1, 2021--Mar. 31, 2023. ("Research and Development on Acoustic Information Processing and Voice Conversion," Moonshot Research & Development Program of Japan Science and Technology Agency, Representative: Hiroshi Saruwatari) (Project)
  4. Research assistant of The University of Tokyo, Japan. Apr. 1, 2019--Mar. 31, 2021. ("Stress-free, real-time, and full-band voice conversion based on perceptual models," executed under the Commissioned Research of MIC SCOPE 182103104, Representative: Shinnosuke Takamichi) (Project)
  5. Short-term researcher at DeNA Co., Ltd., Japan, Oct. 1, 2018--Mar. 31, 2019 & Jun. 1, 2019--Mar. 31, 2020. (Mentor: Kentaro Tachibana)
  6. Research fellow (DC1) of Japan Society for the Promotion of Science, Japan, Apr. 1, 2018--Mar. 31, 2021. ("Active speech synthesis based on listener perceptual modeling," JSPS KAKENHI 18J22090, Representative: Yuki Saito) (KAKEN) (Project)
  7. Short-term researcher at NTT Media Intelligence Laboratories, NTT Corporation, Japan, Aug. 30, 2017--Oct. 31, 2017. (Mentor: Yusuke Ijima)
  8. Short-term researcher at NTT Communication Science Laboratories, NTT Corporation, Japan, Aug. 8, 2016--Sep. 9, 2016. (Mentor: Hirokazu Kameoka)

Academic Activities:

  1. Session Chair for INTERSPEECH (from 2024)
  2. Session Vice-Chair for ASJ Meeting (from 2023)
  3. Board member of IEICE Technical Committee on Speech (SP) (from Apr. 2024 to Mar. 2026)
  4. Board member of IPSJ SIG-SLP Committee (from Apr. 2024 to Mar. 2026)
  5. Acoustical Society of Japan (ASJ) Students and Young Researchers Forum, Organizing member (from Mar. 2017) and Vice President (from Apr. 2019 to Mar. 2022)

Speech Corpora:

  1. Yuki Saito, Takuto Igarashi, Kentaro Seki, Shinnosuke Takamichi, Ryuichi Yamamoto, Kentaro Tachibana, and Hiroshi Saruwatari, "SRC4VC: Smartphone-Recorded Corpus for Voice Conversion," Jun. 2024. (URL)
  2. Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Detai Xin, and Hiroshi Saruwatari, "Coco-Nut: Corpus of Japanese utterance and voice characteristics description for prompt-based control," Nov. 2023. (URL)
  3. Detai Xin, Junfeng Jiang, Shinnosuke Takamichi, Yuki Saito, Akiko Aizawa, and Hiroshi Saruwatari, "JVNV: a Japanese emotional speech corpus with both verbal content and nonverbal vocalizations," Oct. 2023. (URL)
  4. Yuki Saito, Eiji Iimori, Shinnosuke Takamichi, Kentaro Tachibana, and Hiroshi Saruwatari, "STUDIES 2 (CALLS) Corpus: Complaint handling and Attentive Listening Lines Speech," Mar. 2023. (URL)
  5. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "SMASH Corpus: A spontaneous speech corpus recording third-person audio commentaries on gameplay," Jul. 2022. (URL)
  6. Yuki Saito, Yuto Nishimura, Shinnosuke Takamichi, Kentaro Tachibana, and Hiroshi Saruwatari, "STUDIES Corpus: Japanese empathetic dialogue speech corpus," Mar. 2022. (URL, arXiv preprint)
  7. Shinnosuke Takamichi, Kentaro Mitsui, Yuki Saito, Tomoki Koriyama, Naoko Tanji, and Hiroshi Saruwatari, "JVS corpus: free Japanese multi-speaker voice corpus," Aug. 2019. (URL, arXiv preprint)

Invited / Visiting Talks:

  1. Yuki Saito, "Towards human-in-the-loop DNN-based speech synthesis technologies," Seminar by IEEE NZ Signal Processing / Information Theory Joint Chapter and Acoustics Research Center, the University of Auckland, Dec. 2022.
  2. Yuki Saito, "Towards human-in-the-loop speech synthesis technologies," Seminar by IEEE Systems, Man and Cybernetics Singapore Chapter, Chinese and Oriental Languages Information Processing Society Teochew Doctorate Society, Singapore, and Human Language Technology Lab., National University of Singapore, Aug. 2022.

Patents:

  1. Kentaro Tachibana, Yuki Saito, and Kei Akuzawa, "SPEECH PROCESSING APPARATUS AND SPEECH PROCESSING PROGRAM," JP2020190605, Filed in May 21.
  2. Shinnosuke Takamichi, Yuki Saito, Takaaki Saeki, and Hiroshi Saruwatari, "VOICE CONVERSION DEVICE, VOICE CONVERSION METHOD, AND VOICE CONVERSION PROGRAM," JP2021032940, Filed on Aug. 19, 2019.
  3. Shinnosuke Takamichi, Yuki Saito, Takaaki Saeki, and Hiroshi Saruwatari, "VOICE CONVERSION DEVICE, VOICE CONVERSION METHOD, AND VOICE CONVERSION PROGRAM," PCT/JP2020/031122, Filed on Aug. 18, 2020.
  4. Shinnosuke Takamichi, Yuki Saito, Takaaki Saeki, and Hiroshi Saruwatari, "VOICE CONVERSION DEVICE, VOICE CONVERSION METHOD, AND VOICE CONVERSION PROGRAM," PCT/JP2021/004367, Filed on Feb. 5, 2021.


Lectures:

Education:

Misc.: