
Sampling-Frequency-Independent Deep Learning

We propose a DNN-based audio source separation method that works consistently for audio signals of arbitrary (unseen) sampling frequencies, even when it is trained at a single sampling frequency. Exploiting the fact that a convolutional layer can be interpreted as a digital FIR filter, we build a sampling-frequency-independent convolutional layer whose weights (the impulse responses of the digital filters) are generated from latent analog filters using a classical DSP technique, digital filter design.
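To illustrate the idea (this is a minimal NumPy sketch under simplifying assumptions, not the paper's implementation), suppose the latent analog filter is an ideal low-pass prototype. Sampling its continuous-time impulse response at whatever sampling frequency the input uses yields FIR taps for that frequency, so the same filter parameters serve 16 kHz and 48 kHz alike. The function name `sfi_fir_taps` is hypothetical.

```python
import numpy as np

def sfi_fir_taps(analog_cutoff_hz, fs_hz, num_taps=65):
    """Sample a latent analog low-pass filter (ideal sinc prototype)
    at the target sampling frequency to obtain digital FIR taps."""
    # Tap times in seconds, centered around t = 0
    t = (np.arange(num_taps) - (num_taps - 1) / 2) / fs_hz
    # Impulse response of an ideal low-pass filter with cutoff fc:
    # h(t) = 2 * fc * sinc(2 * fc * t)
    h = 2 * analog_cutoff_hz * np.sinc(2 * analog_cutoff_hz * t)
    h *= np.hamming(num_taps)   # window to truncate the infinite response gracefully
    return h / np.sum(h)        # normalize DC gain to 1

# The same analog filter realized at two different sampling frequencies:
taps_16k = sfi_fir_taps(4000.0, 16000.0)
taps_48k = sfi_fir_taps(4000.0, 48000.0)
```

In a trainable layer, the analog filter parameters (here just the cutoff) would be learned, and the tap-generation step repeated at inference for the sampling frequency at hand.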


Empirical Bayesian Independent Deeply Learned Matrix Analysis

We propose an extension of independent deeply learned matrix analysis (IDLMA), empirical Bayesian IDLMA, that can deal with the uncertainty of the source power spectrogram estimates at each time-frequency bin via DNN-based estimation of the hyperparameters of the source prior distributions.


Independent Deeply Learned Tensor Analysis

We propose a multichannel audio source separation method, independent deeply learned tensor analysis, which extends independent deeply learned matrix analysis to deal explicitly with the inter-frequency correlations of each source.


Multiresolution Deep Layered Analysis: End-to-end Music Source Separation Inspired by Multiresolution Analysis

Focusing on the architectural resemblance between a DNN for end-to-end audio source separation and multiresolution analysis, we propose down-sampling (pooling) layers that are "reasonable" from the signal processing viewpoint, i.e., that have the perfect reconstruction property and an anti-aliasing mechanism. Using the proposed down-sampling layers, we further propose a multiresolution-analysis-inspired end-to-end audio source separation method, multiresolution deep layered analysis.
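As a minimal illustration of a down-sampling layer with the perfect reconstruction property (the paper's layers are more elaborate; the orthonormal Haar pair below is only the simplest example, and its anti-aliasing is weak), the signal is split into half-band subbands that can be recombined exactly:

```python
import numpy as np

def haar_down(x):
    """Split a length-2N signal into low/high half-band subbands (orthonormal Haar)."""
    e, o = x[0::2], x[1::2]                       # even and odd samples
    return (e + o) / np.sqrt(2), (e - o) / np.sqrt(2)

def haar_up(lo, hi):
    """Invert haar_down exactly (perfect reconstruction)."""
    x = np.empty(2 * lo.size)
    x[0::2] = (lo + hi) / np.sqrt(2)
    x[1::2] = (lo - hi) / np.sqrt(2)
    return x

x = np.random.randn(16)
lo, hi = haar_down(x)
assert np.allclose(haar_up(lo, hi), x)            # perfect reconstruction holds
```

Unlike plain strided pooling, no information is discarded at the down-sampling step: what the low band loses is retained in the high band.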


Harmonic-Temporal Factor Decomposition for Unsupervised Monaural Source Separation of Harmonic Sounds

We present an unsupervised monaural source separation method for harmonic sounds, harmonic-temporal factor decomposition, which encompasses the ideas of computational auditory scene analysis, non-negative matrix factorization, and a source-filter model.


Unsupervised Drum Timbre Replacement between Two Music Audio Recordings

We propose a system that allows users to replace the frequency characteristics of the harmonic sounds and the timbres of the drum sounds in a music audio signal with those of another music audio signal, without requiring the musical scores of either recording.


Fast Signal Reconstruction from Magnitude Spectrogram of Continuous Wavelet Transform

We propose an algorithm for reconstructing a signal from the magnitude of its continuous wavelet transform spectrogram (a.k.a. constant-Q transform spectrogram) that is about 100× faster than conventional approaches.


  • Tomohiko Nakamura and Hirokazu Kameoka, “Fast signal reconstruction from magnitude spectrogram of continuous wavelet transform based on spectrogram consistency,” in Proceedings of International Conference on Digital Audio Effects, Sep. 2014, pp. 129–135.
    paper, demo [Travel Grant by the Hara Research Foundation]
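The spectrogram-consistency idea can be illustrated with the STFT (the paper itself works with the continuous wavelet transform; the Griffin-Lim-style sketch below is only an analogy, not the proposed fast algorithm): alternate between projecting onto spectrograms that correspond to some time-domain signal and restoring the target magnitude.

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=50, nperseg=256):
    """Phase reconstruction from a magnitude STFT spectrogram by alternating
    projections: enforce consistency (the spectrogram of a real signal),
    then restore the target magnitude."""
    rng = np.random.default_rng(0)
    phase = np.exp(1j * rng.uniform(-np.pi, np.pi, mag.shape))
    spec = mag * phase                                # random initial phase
    for _ in range(n_iter):
        _, x = istft(spec, nperseg=nperseg)           # back to the time domain
        _, _, spec_c = stft(x, nperseg=nperseg)       # consistent spectrogram
        spec = mag * np.exp(1j * np.angle(spec_c))    # keep target magnitude
    _, x = istft(spec, nperseg=nperseg)
    return x
```

Each iteration reduces the distance between the current spectrogram and the set of consistent ones; the paper's contribution is making this kind of consistency-based reconstruction fast for the CWT case.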

Score Following and Automatic Accompaniment for Musical Performance During Practice

We propose real-time score following methods, with computational complexity linear in the number of notes, that can handle musical performances containing errors typical of practice: note insertion, deletion, and substitution errors, as well as arbitrary repeats and skips.
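The error types being tolerated are those of classic edit-distance alignment; a minimal sketch (not the proposed real-time methods, which use more elaborate models) aligns a performed pitch sequence to a score while allowing insertions, deletions, and substitutions:

```python
def align_cost(performed, score):
    """Edit-distance DP between a performed note sequence and the score,
    tolerating note insertions, deletions, and substitutions."""
    n, m = len(performed), len(score)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i                       # all performed notes inserted
    for j in range(1, m + 1):
        d[0][j] = j                       # all score notes deleted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (performed[i - 1] != score[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[n][m]

# Pitch sequences as MIDI note numbers; the performer skipped note 63:
print(align_cost([60, 62, 64, 65], [60, 62, 63, 64, 65]))  # one deleted note -> cost 1
```

Handling arbitrary repeats and skips in real time requires going beyond this quadratic offline DP, which is where the proposed methods' linear complexity matters.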