
International Conferences & Workshops

Peer-Reviewed

  1. Tomohiko Nakamura, Wataru Nakata, Kanami Imamura, and Yuki Saito, “Neural audio codec with adjustable token temporal resolution using sampling-frequency-independent convolutional layers,” in Proceedings of International Workshop on Acoustic Signal Enhancement, Sept. 2026.
    bib
  2. Ege Erdem, Shoichi Koyama, Tomohiko Nakamura, Orchisama Das, and Zoran Cvetkovic, “SF-Flow: Sound field magnitude estimation via flow matching guided by sparse measurements,” in Proceedings of International Workshop on Acoustic Signal Enhancement, Sept. 2026.
    bib
  3. Yuto Ishikawa, Norihiro Takamune, Kouei Yamaoka, Tomohiko Nakamura, and Hiroshi Saruwatari, “Joint optimization of demixing filters and asymmetric window function for independent vector analysis,” in Proceedings of International Workshop on Acoustic Signal Enhancement, Sept. 2026.
    bib
  4. Kengo Takemoto, Tomohiko Nakamura, and Hiroshi Saruwatari, “Diffusion-based music audio editing system using differentiable digital signal processing mixture model,” in Proceedings of International Conference on Digital Audio Effects, Demo Session, Sept. 2026.
    bib
  5. Sota Koshino, Shotaro Ueji, Shinnosuke Takamichi, and Tomohiko Nakamura, “Automatic generation of audio comic from manga images,” in Proceedings of INTERSPEECH, Show & Tell Session, Sept. 2026.
    bib
  6. Woan-Shiuan Chien, Tomohiko Nakamura, Huan-Yu Chen, Satoru Fukayama, Hitoshi Suda, Jun Ogata, and Chi-Chun Lee, “Two-sided fairness transfer for gender-neutral speech emotion recognition with partially observed attributes,” in Proceedings of INTERSPEECH, Sept. 2026.
    bib
  7. Daigo Takizawa, Tomohiko Nakamura, Samuele Cornell, William Chen, Satoru Fukayama, and Shinji Watanabe, “Dissecting sensitivity to training language in self-supervised speech learning using neural audio codec tokens,” in Proceedings of INTERSPEECH, Sept. 2026.
    bib
  8. Shinnosuke Takamichi, Tomohiko Nakamura, Hitoshi Suda, Satoru Fukayama, and Jun Ogata, “MangaVox: Dataset of acted voices aligned with manga images towards computer understanding of audio comics,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2026, pp. 19467–19471.
    bib
  9. Karl Schrader, Shoichi Koyama, Tomohiko Nakamura, and Mirco Pezzoli, “Phase-retrieval-based physics-informed neural networks for acoustic magnitude field reconstruction,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2026, pp. 15162–15166.
    bib arXiv
  10. Kanami Imamura, Tomohiko Nakamura, Kohei Yatabe, and Hiroshi Saruwatari, “Dissecting performance degradation in audio source separation under sampling frequency mismatch,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2026, pp. 15832–15836.
    bib arXiv
  11. Go Nishikawa, Wataru Nakata, Yuki Saito, Kanami Imamura, Hiroshi Saruwatari, and Tomohiko Nakamura, “Multi-sampling-frequency naturalness MOS prediction using self-supervised learning model with sampling-frequency-independent layer,” in Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop, Dec. 2025. (First and second authors contributed equally.)
    bib arXiv code
  12. Rinka Nobukawa, Makito Kitamura, Tomohiko Nakamura, Shinnosuke Takamichi, and Hiroshi Saruwatari, “Drum-to-vocal percussion sound conversion and its evaluation methodology,” in Proceedings of Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Oct. 2025, pp. 198–203.
    bib arXiv
  13. Ryan Niu, Shoichi Koyama, and Tomohiko Nakamura, “Head-related transfer function individualization using anthropometric features and spatially independent latent representations,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Oct. 2025.
    bib arXiv code
  14. Hitoshi Suda, Junya Koguchi, Shunsuke Yoshida, Tomohiko Nakamura, Satoru Fukayama, and Jun Ogata, “IdolSongsJp corpus: A multi-singer song corpus in the style of Japanese idol groups,” in Proceedings of International Society for Music Information Retrieval Conference, Sept. 2025, pp. 647–654.
    bib arXiv
  15. Kanami Imamura, Tomohiko Nakamura, Norihiro Takamune, Kohei Yatabe, and Hiroshi Saruwatari, “Local equivariance error-based metrics for evaluating sampling-frequency-independent property of neural network,” in Proceedings of European Signal Processing Conference, Sept. 2025, pp. 276–280.
    bib arXiv
  16. Aogu Wada, Tomohiko Nakamura, and Hiroshi Saruwatari, “Hyperbolic embeddings for order-aware classification of audio effect chains,” in Proceedings of International Conference on Digital Audio Effects, Sept. 2025, pp. 396–402.
    bib arXiv
  17. Tomohiko Nakamura, Kwanghee Choi, Keigo Hojo, Yoshiaki Bando, Satoru Fukayama, and Shinji Watanabe, “Discrete speech unit extraction via independent component analysis,” in Proceedings of SALMA: Speech and Audio Language Models - Architectures, Data Sources, and Training Paradigms, IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, Apr. 2025.
    bib arXiv poster code
  18. Yuto Ishikawa, Osamu Take, Tomohiko Nakamura, Norihiro Takamune, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, “Real-time noise estimation for Lombard-effect speech synthesis in human–avatar dialogue systems,” in Proceedings of Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Dec. 2024.
    bib paper
  19. Hiroaki Hyodo, Shinnosuke Takamichi, Tomohiko Nakamura, Junya Koguchi, and Hiroshi Saruwatari, “DNN-based ensemble singing voice synthesis with interactions between singers,” in Proceedings of IEEE Spoken Language Technology Workshop, Dec. 2024, pp. 660–667.
    bib arXiv code
  20. Hitoshi Suda, Shunsuke Yoshida, Tomohiko Nakamura, Satoru Fukayama, and Jun Ogata, “FruitsMusic: A real-world corpus of Japanese idol-group songs,” in Proceedings of International Society for Music Information Retrieval Conference, Nov. 2024.
    bib arXiv dataset
  21. Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, and Shinji Watanabe, “Self-supervised speech representations are more phonetic than semantic,” in Proceedings of INTERSPEECH, Sept. 2024, pp. 4578–4582.
    bib arXiv code
  22. Yoshiaki Bando, Tomohiko Nakamura, and Shinji Watanabe, “Neural blind source separation and diarization for distant speech recognition,” in Proceedings of INTERSPEECH, Sept. 2024, pp. 722–726.
    bib arXiv demo code
  23. Yuto Ishikawa, Kohei Konaka, Tomohiko Nakamura, Norihiro Takamune, and Hiroshi Saruwatari, “Real-time speech extraction using spatially regularized independent low-rank matrix analysis and rank-constrained spatial covariance matrix estimation,” in Proceedings of Hands-Free Speech Communication and Microphone Arrays, IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, Apr. 2024, pp. 730–734.
    bib arXiv
  24. Kanami Imamura, Tomohiko Nakamura, Norihiro Takamune, Kohei Yatabe, and Hiroshi Saruwatari, “Algorithms of sampling-frequency-independent layers for non-integer strides,” in Proceedings of European Signal Processing Conference, Sept. 2023, pp. 326–330.
    bib arXiv
  25. Joonyong Park, Shinnosuke Takamichi, Tomohiko Nakamura, Kentaro Seki, Detai Xin, and Hiroshi Saruwatari, “How generative spoken language model encodes noisy speech: Investigation from phonetics to syntactics,” in Proceedings of INTERSPEECH, Aug. 2023, pp. 1085–1089.
    bib arXiv
  26. Tomohiko Nakamura, Shinnosuke Takamichi, Naoko Tanji, Satoru Fukayama, and Hiroshi Saruwatari, “jaCappella corpus: A Japanese a cappella vocal ensemble corpus,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, June 2023.
    bib arXiv demo code dataset
  27. Kota Arai, Yutaro Hirao, Takuji Narumi, Tomohiko Nakamura, Shinnosuke Takamichi, and Shigeo Yoshida, “TimToShape: Supporting practice of musical instruments by visualizing timbre with 2D shapes based on crossmodal correspondences,” in Proceedings of ACM Conference on Intelligent User Interfaces, Mar. 2023, pp. 850–865.
    bib blog
  28. Futa Nakashima, Tomohiko Nakamura, Norihiro Takamune, Satoru Fukayama, and Hiroshi Saruwatari, “Hyperbolic timbre embedding for musical instrument sound synthesis based on variational autoencoders,” in Proceedings of Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Nov. 2022, pp. 736–743.
    bib arXiv
  29. Yuki Ito, Tomohiko Nakamura, Shoichi Koyama, and Hiroshi Saruwatari, “Head-related transfer function interpolation from spatially sparse measurements using autoencoder with source position conditioning,” in Proceedings of International Workshop on Acoustic Signal Enhancement, Sept. 2022.
    [Finalist of Best Student Paper Award of IWAENC 2022 (Yuki Ito)]
    bib arXiv slides demo code
  30. Kazuhide Shigemi, Shoichi Koyama, Tomohiko Nakamura, and Hiroshi Saruwatari, “Physics-informed convolutional neural network with bicubic spline interpolation for sound field estimation,” in Proceedings of International Workshop on Acoustic Signal Enhancement, Sept. 2022.
    bib arXiv
  31. Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, and Hiroshi Saruwatari, “SelfRemaster: Self-supervised speech restoration with analysis-by-synthesis approach using channel modeling,” in Proceedings of INTERSPEECH, Sept. 2022, pp. 4406–4410.
    bib arXiv demo code
  32. Masaya Kawamura, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, and Kazunobu Kondo, “Differentiable digital signal processing mixture model for synthesis parameter extraction from mixture of harmonic sounds,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2022, pp. 941–945.
    [16th IEEE Signal Processing Society Japan Student Conference Paper Award (Awardee: Masaya Kawamura)]
    bib arXiv demo
  33. Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, and Kazunobu Kondo, “Multichannel audio source separation with independent deeply learned matrix analysis using product of source models,” in Proceedings of Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Dec. 2021, pp. 1226–1233.
    bib arXiv
  34. Sota Misawa, Norihiro Takamune, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Masakazu Une, and Shoji Makino, “Speech enhancement by noise self-supervised rank-constrained spatial covariance matrix estimation via independent deeply learned matrix analysis,” in Proceedings of Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Dec. 2021, pp. 578–584.
    bib arXiv
  35. Yusaku Mizobuchi, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Yu Takahashi, and Kazunobu Kondo, “Prior distribution design for music bleeding-sound reduction based on nonnegative matrix factorization,” in Proceedings of Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Dec. 2021, pp. 651–658.
    bib arXiv
  36. Koichi Saito, Tomohiko Nakamura, Kohei Yatabe, Yuma Koizumi, and Hiroshi Saruwatari, “Sampling-frequency-independent audio source separation using convolution layer based on impulse invariant method,” in Proceedings of European Signal Processing Conference, Aug. 2021, pp. 321–325.
    bib arXiv
  37. Naoki Narisawa, Rintaro Ikeshita, Norihiro Takamune, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, and Tomohiro Nakatani, “Independent deeply learned tensor analysis for determined audio source separation,” in Proceedings of European Signal Processing Conference, Aug. 2021, pp. 326–330.
    bib arXiv
  38. Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, and Kazunobu Kondo, “Empirical Bayesian independent deeply learned matrix analysis for multichannel audio source separation,” in Proceedings of European Signal Processing Conference, Aug. 2021, pp. 331–335.
    bib arXiv
  39. Shihori Kozuka, Tomohiko Nakamura, and Hiroshi Saruwatari, “Investigation on wavelet basis function of DNN-based time domain audio source separation inspired by multiresolution analysis,” in Proceedings of International Congress and Exposition on Noise Control Engineering, Aug. 2020, pp. 4013–4022.
    bib
  40. Tomohiko Nakamura and Hiroshi Saruwatari, “Time-domain audio source separation based on Wave-U-Net combined with discrete wavelet transform,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2020, pp. 386–390.
    bib arXiv
  41. Tomohiko Nakamura and Hirokazu Kameoka, “Shifted and convolutive source-filter non-negative matrix factorization for monaural audio source separation,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Mar. 2016, pp. 489–493.
    bib
  42. Tomohiko Nakamura and Hirokazu Kameoka, “Lp-norm non-negative matrix factorization and its application to singing voice enhancement,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Apr. 2015, pp. 2115–2119.
    bib
  43. Tomohiko Nakamura, Kotaro Shikata, Norihiro Takamune, and Hirokazu Kameoka, “Harmonic-temporal factor decomposition incorporating music prior information for informed monaural source separation,” in Proceedings of International Society for Music Information Retrieval Conference, Oct. 2014, pp. 623–628.
    [Travel Grant by the Tateishi Science and Technology Foundation]
    bib demo
  44. Tomohiko Nakamura and Hirokazu Kameoka, “Fast signal reconstruction from magnitude spectrogram of continuous wavelet transform based on spectrogram consistency,” in Proceedings of International Conference on Digital Audio Effects, Sept. 2014, pp. 129–135.
    [Travel Grant by the Hara Research Foundation]
    bib demo
  45. Takuya Higuchi, Hirofumi Takeda, Tomohiko Nakamura, and Hirokazu Kameoka, “A unified approach for underdetermined blind signal separation and source activity detection by multichannel factorial hidden Markov models,” in Proceedings of INTERSPEECH, Sept. 2014, pp. 850–854.
    bib
  46. Tomohiko Nakamura, Hirokazu Kameoka, Kazuyoshi Yoshii, and Masataka Goto, “Timbre replacement of harmonic and drum components for music audio signals,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2014, pp. 7520–7524.
    bib demo
  47. Takuya Higuchi, Norihiro Takamune, Tomohiko Nakamura, and Hirokazu Kameoka, “Underdetermined blind separation and tracking of moving sources based on DOA-HMM,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2014, pp. 3215–3219.
    bib
  48. Tomohiko Nakamura, Eita Nakamura, and Shigeki Sagayama, “Acoustic score following to musical performance with errors and arbitrary repeats and skips for automatic accompaniment,” in Proceedings of Sound and Music Computing Conference, Aug. 2013, pp. 299–304.
    [Travel Grant by the Telecommunications Advancement Foundation]
    bib demo
  49. Masahiro Nakano, Jonathan Le Roux, Hirokazu Kameoka, Tomohiko Nakamura, Nobutaka Ono, and Shigeki Sagayama, “Bayesian nonparametric spectrogram modeling based on infinite factorial infinite hidden Markov model,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Oct. 2011, pp. 325–328.
    bib
  50. Tomohiko Nakamura, Shinji Hara, and Yutaka Hori, “Local stability analysis for a class of quorum-sensing networks with cyclic gene regulatory networks,” in Proceedings of SICE Annual Conference, Sept. 2011, pp. 2111–2116.
    [SICE Annual Conference 2011 International Award and Finalist of Young Author's Award]
    bib

Presentations / Demos

  1. Kanami Imamura, Tomohiko Nakamura, Kohei Yatabe, and Hiroshi Saruwatari, “Continuous function approximation of convolutional kernels for sampling frequency adaptation of pre-trained source separation networks,” in Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan, Dec. 2025. (Abstract only)
    bib
  2. Yuto Ishikawa, Tomohiko Nakamura, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, and Kazunobu Kondo, “Low-latency real-time speech extraction based on rank-constrained spatial covariance matrix estimation using asymmetric window function,” in Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan, Dec. 2025. (Abstract only)
    bib
  3. Yuta Amezawa, Tomohiko Nakamura, Takahiro Shiina, Satoru Fukayama, Jun Ogata, Hiroki Kuroda, and Takahiko Uchide, “Automatic detection and extraction of later phase in S coda using machine learning for crustal heterogeneity exploration,” in ACES (APEC Cooperation for Earthquake Science) International Workshop, Nov. 2025. (Abstract only)
    bib
  4. Kengo Takemoto, Tomohiko Nakamura, and Hiroshi Saruwatari, “Toward score-informed music audio editing system using differentiable digital signal processing mixture model,” in Late Breaking Session, International Society for Music Information Retrieval Conference, Sept. 2025. (Demo)
    bib
  5. Rinka Nobukawa, Tomohiko Nakamura, Shinnosuke Takamichi, and Hiroshi Saruwatari, “Real-time drum-to-vocal percussion sound conversion system,” in Late Breaking Session, International Society for Music Information Retrieval Conference, Sept. 2025. (Demo)
    bib
  6. Yuto Ishikawa, Tomohiko Nakamura, Norihiro Takamune, and Hiroshi Saruwatari, “Hearing-aids system using distributed assistive device and blind speech extraction method under diffuse noise,” in International Congress on Acoustics, May 2025. (Abstract only)
    bib
  7. Yuta Amezawa, Tomohiko Nakamura, Satoru Fukayama, Takahiro Shiina, and Takahiko Uchide, “Automatic extraction and peak arrival estimation of later phase in S coda,” in International Joint Workshop on Slow-to-Fast Earthquakes 2024, Sept. 2024. (Abstract only)
    bib
  8. Yuto Ishikawa, Tomohiko Nakamura, Norihiro Takamune, and Hiroshi Saruwatari, “Real-time framework for speech extraction based on independent low-rank matrix analysis with spatial regularization and rank-constrained spatial covariance matrix estimation,” in Workshop on Spoken Dialogue Systems for Cybernetic Avatars (SDS4CA), Sept. 2024. (Presentation only)
    bib
  9. Shigeki Sagayama, Tomohiko Nakamura, Eita Nakamura, Yasuyuki Saito, Hirokazu Kameoka, and Nobutaka Ono, “Automatic music accompaniment allowing errors and arbitrary repeats and jumps,” in Proceedings of Meetings on Acoustics, Acoustical Society of America, May 2014, vol. 21, 35003.
    bib