
International Conferences & Workshops / 国際会議

Peer-Reviewed

  1. Shinnosuke Takamichi, Tomohiko Nakamura, Hitoshi Suda, Satoru Fukayama, and Jun Ogata, “MangaVox: Dataset of acted voices aligned with manga images towards computer understanding of audio comics,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2026.
    bib
  2. Karl Schrader, Shoichi Koyama, Tomohiko Nakamura, and Mirco Pezzoli, “Phase-retrieval-based physics-informed neural networks for acoustic magnitude field reconstruction,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2026.
    bib arXiv
  3. Kanami Imamura, Tomohiko Nakamura, Kohei Yatabe, and Hiroshi Saruwatari, “Dissecting performance degradation in audio source separation under sampling frequency mismatch,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2026.
    bib arXiv
  4. Go Nishikawa, Wataru Nakata, Yuki Saito, Kanami Imamura, Hiroshi Saruwatari, and Tomohiko Nakamura, “Multi-sampling-frequency naturalness MOS prediction using self-supervised learning model with sampling-frequency-independent layer,” in Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop, Dec. 2025. (First and second authors contributed equally.)
    bib arXiv code
  5. Rinka Nobukawa, Makito Kitamura, Tomohiko Nakamura, Shinnosuke Takamichi, and Hiroshi Saruwatari, “Drum-to-vocal percussion sound conversion and its evaluation methodology,” in Proceedings of Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Oct. 2025, pp. 198–203.
    bib arXiv
  6. Ryan Niu, Shoichi Koyama, and Tomohiko Nakamura, “Head-related transfer function individualization using anthropometric features and spatially independent latent representations,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Oct. 2025.
    bib arXiv code
  7. Hitoshi Suda, Junya Koguchi, Shunsuke Yoshida, Tomohiko Nakamura, Satoru Fukayama, and Jun Ogata, “IdolSongsJp corpus: A multi-singer song corpus in the style of Japanese idol groups,” in Proceedings of International Society for Music Information Retrieval Conference, Sept. 2025, pp. 647–654.
    bib arXiv
  8. Kanami Imamura, Tomohiko Nakamura, Norihiro Takamune, Kohei Yatabe, and Hiroshi Saruwatari, “Local equivariance error-based metrics for evaluating sampling-frequency-independent property of neural network,” in Proceedings of European Signal Processing Conference, Sept. 2025, pp. 276–280.
    bib arXiv
  9. Aogu Wada, Tomohiko Nakamura, and Hiroshi Saruwatari, “Hyperbolic embeddings for order-aware classification of audio effect chains,” in Proceedings of International Conference on Digital Audio Effects, Sept. 2025, pp. 396–402.
    bib arXiv
  10. Tomohiko Nakamura, Kwanghee Choi, Keigo Hojo, Yoshiaki Bando, Satoru Fukayama, and Shinji Watanabe, “Discrete speech unit extraction via independent component analysis,” in Proceedings of SALMA: Speech and Audio Language Models - Architectures, Data Sources, and Training Paradigms, IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, Apr. 2025.
    bib arXiv poster code
  11. Yuto Ishikawa, Osamu Take, Tomohiko Nakamura, Norihiro Takamune, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, “Real-time noise estimation for Lombard-effect speech synthesis in human–avatar dialogue systems,” in Proceedings of Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Dec. 2024.
    bib paper
  12. Hiroaki Hyodo, Shinnosuke Takamichi, Tomohiko Nakamura, Junya Koguchi, and Hiroshi Saruwatari, “DNN-based ensemble singing voice synthesis with interactions between singers,” in Proceedings of IEEE Spoken Language Technology Workshop, Dec. 2024, pp. 660–667.
    bib arXiv code
  13. Hitoshi Suda, Shunsuke Yoshida, Tomohiko Nakamura, Satoru Fukayama, and Jun Ogata, “FruitsMusic: A real-world corpus of Japanese idol-group songs,” in Proceedings of International Society for Music Information Retrieval Conference, Nov. 2024.
    bib arXiv dataset
  14. Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, and Shinji Watanabe, “Self-supervised speech representations are more phonetic than semantic,” in Proceedings of INTERSPEECH, Sept. 2024, pp. 4578–4582.
    bib arXiv code
  15. Yoshiaki Bando, Tomohiko Nakamura, and Shinji Watanabe, “Neural blind source separation and diarization for distant speech recognition,” in Proceedings of INTERSPEECH, Sept. 2024, pp. 722–726.
    bib arXiv demo code
  16. Yuto Ishikawa, Kohei Konaka, Tomohiko Nakamura, Norihiro Takamune, and Hiroshi Saruwatari, “Real-time speech extraction using spatially regularized independent low-rank matrix analysis and rank-constrained spatial covariance matrix estimation,” in Proceedings of Hands-Free Speech Communication and Microphone Arrays, IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, Apr. 2024, pp. 730–734.
    bib arXiv
  17. Kanami Imamura, Tomohiko Nakamura, Norihiro Takamune, Kohei Yatabe, and Hiroshi Saruwatari, “Algorithms of sampling-frequency-independent layers for non-integer strides,” in Proceedings of European Signal Processing Conference, Sept. 2023, pp. 326–330.
    bib arXiv
  18. Joonyong Park, Shinnosuke Takamichi, Tomohiko Nakamura, Kentaro Seki, Detai Xin, and Hiroshi Saruwatari, “How generative spoken language model encodes noisy speech: Investigation from phonetics to syntactics,” in Proceedings of INTERSPEECH, Aug. 2023, pp. 1085–1089.
    bib arXiv
  19. Tomohiko Nakamura, Shinnosuke Takamichi, Naoko Tanji, Satoru Fukayama, and Hiroshi Saruwatari, “jaCappella corpus: A Japanese a cappella vocal ensemble corpus,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, June 2023.
    bib arXiv demo code dataset
  20. Kota Arai, Yutaro Hirao, Takuji Narumi, Tomohiko Nakamura, Shinnosuke Takamichi, and Shigeo Yoshida, “TimToShape: Supporting practice of musical instruments by visualizing timbre with 2D shapes based on crossmodal correspondences,” in Proceedings of ACM Conference on Intelligent User Interfaces, Mar. 2023, pp. 850–865.
    bib blog
  21. Futa Nakashima, Tomohiko Nakamura, Norihiro Takamune, Satoru Fukayama, and Hiroshi Saruwatari, “Hyperbolic timbre embedding for musical instrument sound synthesis based on variational autoencoders,” in Proceedings of Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Nov. 2022, pp. 736–743.
    bib arXiv
  22. Yuki Ito, Tomohiko Nakamura, Shoichi Koyama, and Hiroshi Saruwatari, “Head-related transfer function interpolation from spatially sparse measurements using autoencoder with source position conditioning,” in Proceedings of International Workshop on Acoustic Signal Enhancement, Sept. 2022.
    [Finalist of Best Student Paper Award of IWAENC 2022 (Yuki Ito)]
    bib arXiv slides demo code
  23. Kazuhide Shigemi, Shoichi Koyama, Tomohiko Nakamura, and Hiroshi Saruwatari, “Physics-informed convolutional neural network with bicubic spline interpolation for sound field estimation,” in Proceedings of International Workshop on Acoustic Signal Enhancement, Sept. 2022.
    bib arXiv
  24. Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, and Hiroshi Saruwatari, “SelfRemaster: Self-supervised speech restoration with analysis-by-synthesis approach using channel modeling,” in Proceedings of INTERSPEECH, Sept. 2022, pp. 4406–4410.
    bib arXiv demo code
  25. Masaya Kawamura, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, and Kazunobu Kondo, “Differentiable digital signal processing mixture model for synthesis parameter extraction from mixture of harmonic sounds,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2022, pp. 941–945.
    [IEEE Signal Processing Society Japan Student Conference Paper Award (Awardee: Masaya Kawamura) / 第16回 IEEE Signal Processing Society Japan Student Conference Paper Award(受賞者:川村 真也)]
    bib arXiv demo
  26. Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, and Kazunobu Kondo, “Multichannel audio source separation with independent deeply learned matrix analysis using product of source models,” in Proceedings of Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Dec. 2021, pp. 1226–1233.
    bib arXiv
  27. Sota Misawa, Norihiro Takamune, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Masakazu Une, and Shoji Makino, “Speech enhancement by noise self-supervised rank-constrained spatial covariance matrix estimation via independent deeply learned matrix analysis,” in Proceedings of Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Dec. 2021, pp. 578–584.
    bib arXiv
  28. Yusaku Mizobuchi, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Yu Takahashi, and Kazunobu Kondo, “Prior distribution design for music bleeding-sound reduction based on nonnegative matrix factorization,” in Proceedings of Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Dec. 2021, pp. 651–658.
    bib arXiv
  29. Koichi Saito, Tomohiko Nakamura, Kohei Yatabe, Yuma Koizumi, and Hiroshi Saruwatari, “Sampling-frequency-independent audio source separation using convolution layer based on impulse invariant method,” in Proceedings of European Signal Processing Conference, Aug. 2021, pp. 321–325.
    bib arXiv
  30. Naoki Narisawa, Rintaro Ikeshita, Norihiro Takamune, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, and Tomohiro Nakatani, “Independent deeply learned tensor analysis for determined audio source separation,” in Proceedings of European Signal Processing Conference, Aug. 2021, pp. 326–330.
    bib arXiv
  31. Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, and Kazunobu Kondo, “Empirical Bayesian independent deeply learned matrix analysis for multichannel audio source separation,” in Proceedings of European Signal Processing Conference, Aug. 2021, pp. 331–335.
    bib arXiv
  32. Shihori Kozuka, Tomohiko Nakamura, and Hiroshi Saruwatari, “Investigation on wavelet basis function of DNN-based time domain audio source separation inspired by multiresolution analysis,” in Proceedings of International Congress and Exposition on Noise Control Engineering, Aug. 2020, pp. 4013–4022.
    bib
  33. Tomohiko Nakamura and Hiroshi Saruwatari, “Time-domain audio source separation based on Wave-U-Net combined with discrete wavelet transform,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2020, pp. 386–390.
    bib arXiv
  34. Tomohiko Nakamura and Hirokazu Kameoka, “Shifted and convolutive source-filter non-negative matrix factorization for monaural audio source separation,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Mar. 2016, pp. 489–493.
    bib
  35. Tomohiko Nakamura and Hirokazu Kameoka, “Lp-norm non-negative matrix factorization and its application to singing voice enhancement,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Apr. 2015, pp. 2115–2119.
    bib
  36. Tomohiko Nakamura, Kotaro Shikata, Norihiro Takamune, and Hirokazu Kameoka, “Harmonic-temporal factor decomposition incorporating music prior information for informed monaural source separation,” in Proceedings of International Society for Music Information Retrieval Conference, Oct. 2014, pp. 623–628.
    [Travel Grant by the Tateishi Science and Technology Foundation]
    bib demo
  37. Tomohiko Nakamura and Hirokazu Kameoka, “Fast signal reconstruction from magnitude spectrogram of continuous wavelet transform based on spectrogram consistency,” in Proceedings of International Conference on Digital Audio Effects, Sept. 2014, pp. 129–135.
    [Travel Grant by the Hara Research Foundation]
    bib demo
  38. Takuya Higuchi, Hirofumi Takeda, Tomohiko Nakamura, and Hirokazu Kameoka, “A unified approach for underdetermined blind signal separation and source activity detection by multichannel factorial hidden Markov models,” in Proceedings of INTERSPEECH, Sept. 2014, pp. 850–854.
    bib
  39. Tomohiko Nakamura, Hirokazu Kameoka, Kazuyoshi Yoshii, and Masataka Goto, “Timbre replacement of harmonic and drum components for music audio signals,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2014, pp. 7520–7524.
    bib demo
  40. Takuya Higuchi, Norihiro Takamune, Tomohiko Nakamura, and Hirokazu Kameoka, “Underdetermined blind separation and tracking of moving sources based on DOA-HMM,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2014, pp. 3215–3219.
    bib
  41. Tomohiko Nakamura, Eita Nakamura, and Shigeki Sagayama, “Acoustic score following to musical performance with errors and arbitrary repeats and skips for automatic accompaniment,” in Proceedings of Sound and Music Computing Conference, Aug. 2013, pp. 299–304.
    [Travel Grant by the Telecommunications Advancement Foundation]
    bib demo
  42. Masahiro Nakano, Jonathan Le Roux, Hirokazu Kameoka, Tomohiko Nakamura, Nobutaka Ono, and Shigeki Sagayama, “Bayesian nonparametric spectrogram modeling based on infinite factorial infinite hidden Markov model,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Oct. 2011, pp. 325–328.
    bib
  43. Tomohiko Nakamura, Shinji Hara, and Yutaka Hori, “Local stability analysis for a class of quorum-sensing networks with cyclic gene regulatory networks,” in Proceedings of SICE Annual Conference, Sept. 2011, pp. 2111–2116.
    [SICE Annual Conference 2011 International Award and Finalist of Young Author's Award]
    bib

Presentations / Demos

  1. Kanami Imamura, Tomohiko Nakamura, Kohei Yatabe, and Hiroshi Saruwatari, “Continuous function approximation of convolutional kernels for sampling frequency adaptation of pre-trained source separation networks,” in Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan, Dec. 2025. (Abstract only)
    bib
  2. Yuto Ishikawa, Tomohiko Nakamura, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, and Kazunobu Kondo, “Low-latency real-time speech extraction based on rank-constrained spatial covariance matrix estimation using asymmetric window function,” in Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan, Dec. 2025. (Abstract only)
    bib
  3. Yuta Amezawa, Tomohiko Nakamura, Takahiro Shiina, Satoru Fukayama, Jun Ogata, Hiroki Kuroda, and Takahiko Uchide, “Automatic detection and extraction of later phase in S coda using machine learning for crustal heterogeneity exploration,” in ACES (APEC Cooperation for Earthquake Science) International Workshop, Nov. 2025. (Abstract only)
    bib
  4. Kengo Takemoto, Tomohiko Nakamura, and Hiroshi Saruwatari, “Toward score-informed music audio editing system using differentiable digital signal processing mixture model,” in Late Breaking Session, International Society for Music Information Retrieval Conference, Sept. 2025. (Demo)
    bib
  5. Rinka Nobukawa, Tomohiko Nakamura, Shinnosuke Takamichi, and Hiroshi Saruwatari, “Real-time drum-to-vocal percussion sound conversion system,” in Late Breaking Session, International Society for Music Information Retrieval Conference, Sept. 2025. (Demo)
    bib
  6. Yuto Ishikawa, Tomohiko Nakamura, Norihiro Takamune, and Hiroshi Saruwatari, “Hearing-aids system using distributed assistive device and blind speech extraction method under diffuse noise,” in International Congress on Acoustics, May 2025. (Abstract only)
    bib
  7. Yuta Amezawa, Tomohiko Nakamura, Satoru Fukayama, Takahiro Shiina, and Takahiko Uchide, “Automatic extraction and peak arrival estimation of later phase in S coda,” in International Joint Workshop on Slow-to-Fast Earthquakes 2024, Sept. 2024. (Abstract only)
    bib
  8. Yuto Ishikawa, Tomohiko Nakamura, Norihiro Takamune, and Hiroshi Saruwatari, “Real-time framework for speech extraction based on independent low-rank matrix analysis with spatial regularization and rank-constrained spatial covariance matrix estimation,” in Workshop on Spoken Dialogue Systems for Cybernetic Avatars (SDS4CA), Sept. 2024. (Presentation only)
    bib
  9. Shigeki Sagayama, Tomohiko Nakamura, Eita Nakamura, Yasuyuki Saito, Hirokazu Kameoka, and Nobutaka Ono, “Automatic music accompaniment allowing errors and arbitrary repeats and jumps,” in Proceedings of Meetings on Acoustics, Acoustical Society of America, May 2014, vol. 21, 35003.
    bib