
Sampling-Frequency-Independent Deep Learning

We propose a DNN-based audio source separation method that works consistently for audio signals of arbitrary (unseen) sampling frequencies, even when it is trained at a single sampling frequency. Focusing on the fact that a convolutional layer can be interpreted as a digital FIR filter, we build a sampling-frequency-independent convolutional layer whose weights (the impulse responses of the digital filters) are generated from latent analog filters using a classical DSP technique, digital filter design.

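The core idea can be illustrated with a small sketch: if the latent analog filter is represented by its continuous frequency response, FIR weights for any target sampling frequency can be obtained by classical filter design. The function names below (analog_response, sfi_conv_weights) and the Gaussian band-pass prototype are illustrative assumptions, not the implementation used in the paper.

    # A minimal sketch (not the authors' implementation) of generating 1-D convolution
    # weights from a latent analog filter for a given sampling frequency.
    import numpy as np
    from scipy.signal import firwin2

    def analog_response(freq_hz):
        # Hypothetical latent analog filter: a Gaussian-shaped band-pass around 1 kHz.
        return np.exp(-0.5 * ((freq_hz - 1000.0) / 300.0) ** 2)

    def sfi_conv_weights(sample_rate, num_taps=65):
        # Sample the analog frequency response from 0 Hz up to the Nyquist frequency
        # of the target sampling rate and design a linear-phase FIR filter from it.
        grid_hz = np.linspace(0.0, sample_rate / 2, 256)
        gains = analog_response(grid_hz)
        # firwin2 expects frequencies normalized so that 1.0 is the Nyquist frequency.
        return firwin2(num_taps, grid_hz / (sample_rate / 2), gains)

    # The same latent analog filter yields mutually consistent weights for any
    # sampling frequency, e.g. 16 kHz at training time and 48 kHz at test time.
    weights_16k = sfi_conv_weights(16000)
    weights_48k = sfi_conv_weights(48000)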


Empirical Bayesian Independent Deeply Learned Matrix Analysis

We propose an extension of independent deeply learned matrix analysis (IDLMA), empirical Bayesian IDLMA, that can handle the uncertainty of the source power spectrogram estimates at each time-frequency bin through DNN-based estimation of the hyperparameters of the source prior distributions.

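One plausible way to read "DNN-based hyperparameter estimation of prior distributions" is the schematic hierarchical model below, written only as an illustration; the prior family and indexing are assumptions, not quoted from the paper.

    % Schematic hierarchical model (illustrative assumption, not the paper's exact form):
    % each separated source component follows a zero-mean complex Gaussian whose variance
    % is itself random, with hyperparameters predicted by a DNN.
    \begin{align}
      y_{ijn} \mid r_{ijn} &\sim \mathcal{N}_{\mathbb{C}}(0,\, r_{ijn}), \\
      r_{ijn} &\sim \mathrm{InverseGamma}(\alpha_{ijn},\, \beta_{ijn}),
    \end{align}
    % where (\alpha_{ijn}, \beta_{ijn}) are DNN outputs for frequency i, time j, and
    % source n; a larger \alpha_{ijn} expresses higher confidence in the corresponding
    % power estimate.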


Independent Deeply Learned Tensor Analysis

We propose a multichannel audio source separation method based on independent deeply learned matrix analysis, called independent deeply learned tensor analysis, that explicitly handles the inter-frequency correlations of each source.

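The contrast with IDLMA can be sketched as follows; the multivariate form is an illustrative assumption about how inter-frequency correlation can be modeled explicitly, not the paper's exact formulation.

    % Illustrative contrast (an assumption about the modeling, not quoted from the paper):
    % IDLMA treats each frequency bin of a source independently, whereas a tensor-based
    % model lets all frequency bins of a source share a covariance.
    \begin{align}
      \text{IDLMA:}\quad & y_{ijn} \sim \mathcal{N}_{\mathbb{C}}(0,\, r_{ijn})
        \quad \text{independently over frequency } i, \\
      \text{tensor model:}\quad & \mathbf{y}_{jn} = (y_{1jn}, \dots, y_{Ijn})^{\top}
        \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0},\, \mathbf{R}_{jn}),
    \end{align}
    % where the covariance matrix R_{jn} couples the frequency bins of source n at
    % time j and is estimated jointly with the demixing system.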


Multiresolution Deep Layered Analysis: End-to-end Music Source Separation Inspired by Multiresolution Analysis

Focusing on the architectural resemblance between a DNN for end-to-end audio source separation and multiresolution analysis, we propose down-sampling (pooling) layers that are reasonable from a signal-processing viewpoint, possessing the perfect reconstruction property and an anti-aliasing mechanism. Using the proposed down-sampling layers, we further propose a multiresolution-analysis-inspired end-to-end audio source separation method, multiresolution deep layered analysis.

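A minimal sketch of such a down-sampling layer, using the Haar filter pair for brevity, is given below; the filters actually used in the paper may differ, but the perfect reconstruction check at the end is the property the layer is designed to satisfy.

    # Down-sampling as a two-channel perfect-reconstruction filter bank (Haar case).
    # Illustrates the anti-aliased, invertible pooling idea; not the paper's exact layers.
    import numpy as np

    LOW = np.array([1.0, 1.0]) / np.sqrt(2.0)    # analysis low-pass (anti-aliasing) filter
    HIGH = np.array([1.0, -1.0]) / np.sqrt(2.0)  # analysis high-pass filter

    def downsample(x):
        # Filter, then keep every second sample (stride-2 convolution).
        pairs = x.reshape(-1, 2)
        approx = pairs @ LOW    # low-frequency (anti-aliased) branch
        detail = pairs @ HIGH   # high-frequency residual branch
        return approx, detail

    def upsample(approx, detail):
        # Synthesis bank: perfectly reconstructs the input from both branches.
        pairs = np.outer(approx, LOW) + np.outer(detail, HIGH)
        return pairs.reshape(-1)

    x = np.random.randn(16)
    a, d = downsample(x)
    assert np.allclose(upsample(a, d), x)  # perfect reconstruction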


Harmonic-Temporal Factor Decomposition for Unsupervised Monaural Source Separation of Harmonic Sounds

We present an unsupervised monaural source separation method for harmonic sounds, harmonic-temporal factor decomposition, which combines the ideas of computational auditory scene analysis, non-negative matrix factorization, and a source-filter model.

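The kind of generative spectrogram model being combined can be sketched schematically as follows; the exact parameterization in the paper differs, so this is only an illustration of how the three ideas fit together.

    % Schematic of the combined generative model (illustrative form only):
    \begin{equation}
      X(f, t) \approx \sum_{k} \underbrace{U_k(t)}_{\text{NMF-like activation}}
        \, \underbrace{F_k(f)}_{\text{spectral envelope (filter)}}
        \, \underbrace{\textstyle\sum_{m} a_{km}\, \phi\big(f - m\,\mu_k(t)\big)}_{\text{harmonic comb with pitch } \mu_k(t)}
    \end{equation}
    % \phi is a spectral peak shape; the harmonic comb follows the CASA-style harmonic
    % source model, F_k plays the role of the source-filter model's filter, and U_k(t)
    % provides NMF-like temporal factors.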


Unsupervised Drum Timbre Replacement between Two Music Audio Recordings

We propose a system that allows users to replace the frequency characteristics of the harmonic sounds and the timbres of the drum sounds of a music audio signal with those of another music audio signal, without requiring their musical scores.

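A rough sketch of the kind of pipeline involved is shown below, using off-the-shelf HPSS and a simple equalization-curve transfer; this is an illustration under strong simplifying assumptions, not the proposed system, and the function transfer_harmonic_character is hypothetical.

    # Illustration only: split each recording into harmonic and percussive parts with
    # HPSS, then transfer the long-term frequency characteristics of recording B's
    # harmonic part onto recording A's harmonic part via a frequency-wise gain curve.
    import numpy as np
    import librosa

    def average_magnitude(y, n_fft=2048):
        return np.abs(librosa.stft(y, n_fft=n_fft)).mean(axis=1)

    def transfer_harmonic_character(y_a, y_b, n_fft=2048):
        harm_a, perc_a = librosa.effects.hpss(y_a)
        harm_b, _ = librosa.effects.hpss(y_b)
        # Frequency-wise gain that maps A's average harmonic spectrum toward B's.
        eq = average_magnitude(harm_b, n_fft) / (average_magnitude(harm_a, n_fft) + 1e-8)
        S = librosa.stft(harm_a, n_fft=n_fft)
        harm_a_eq = librosa.istft(S * eq[:, None], length=len(y_a))
        # Drum (percussive) timbre replacement would follow an analogous pattern.
        return harm_a_eq + perc_a

    # y_a, _ = librosa.load("song_a.wav", sr=22050)
    # y_b, _ = librosa.load("song_b.wav", sr=22050)
    # y_out = transfer_harmonic_character(y_a, y_b)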


Fast Signal Reconstruction from Magnitude Spectrogram of Continuous Wavelet Transform

We propose an algorithm for reconstructing a signal (i.e., estimating the phase) from a magnitude continuous wavelet transform spectrogram (a.k.a. constant-Q transform spectrogram) that is roughly 100 times faster than conventional methods.


  • Tomohiko Nakamura and Hirokazu Kameoka, “Fast signal reconstruction from magnitude spectrogram of continuous wavelet transform based on spectrogram consistency,” in Proceedings of International Conference on Digital Audio Effects, Sep. 2014, pp. 129–135.
    paper, demo, [Travel Grant by the Hara Research Foundation]
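The spectrogram-consistency principle behind this work can be illustrated with a Griffin-Lim-style iteration on a constant-Q representation; the sketch below shows the principle only, not the proposed fast algorithm, and the function reconstruct_from_magnitude is hypothetical.

    # Consistency-based phase reconstruction (Griffin-Lim-style) for a constant-Q /
    # CWT-like magnitude spectrogram. Illustrates the spectrogram-consistency principle;
    # the proposed algorithm is a much faster variant of this idea.
    import numpy as np
    import librosa

    def reconstruct_from_magnitude(mag, sr, hop_length=256, bins_per_octave=24,
                                   n_iter=100):
        rng = np.random.default_rng(0)
        phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
        for _ in range(n_iter):
            # Going to the time domain and back projects onto the set of "consistent"
            # spectrograms, i.e. those realizable by some time-domain signal.
            y = librosa.icqt(mag * phase, sr=sr, hop_length=hop_length,
                             bins_per_octave=bins_per_octave)
            C = librosa.cqt(y, sr=sr, hop_length=hop_length, n_bins=mag.shape[0],
                            bins_per_octave=bins_per_octave)
            C = librosa.util.fix_length(C, size=mag.shape[1], axis=1)
            phase = np.exp(1j * np.angle(C))  # keep the phase, reimpose the magnitude
        return librosa.icqt(mag * phase, sr=sr, hop_length=hop_length,
                            bins_per_octave=bins_per_octave)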

Score Following and Automatic Accompaniment for Musical Performance During Practice

We propose score following methods that run in real time with computational complexity linear in the number of notes and that can handle performances containing errors typical of practice (note insertion, deletion, and substitution errors, as well as arbitrary repeats and skips).

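How an update over score positions can remain linear in the number of notes while still allowing arbitrary repeats and skips can be illustrated with a toy HMM-style sketch; the transition and emission probabilities below are illustrative assumptions, not the proposed model.

    # Toy O(n)-per-event update over score positions. Arbitrary repeats/skips are
    # modeled with a factorized "jump" transition, which keeps the update linear in
    # the number of notes. Names and probabilities are illustrative only.
    import numpy as np

    def update_belief(belief, observed_pitch, score_pitches,
                      p_advance=0.75, p_stay=0.15, p_jump=0.10, match_prob=0.9):
        n = len(score_pitches)
        # Transition: stay (inserted note), advance by one (normal progress), or jump
        # to an arbitrary position (repeat/skip). The jump term is a single scalar
        # p_jump * sum(belief) / n, so the whole update is O(n).
        pred = p_stay * belief
        pred[1:] += p_advance * belief[:-1]
        pred += p_jump * belief.sum() / n
        # Emission: high probability if the observed pitch matches the score note,
        # a small floor otherwise (covers substitution errors).
        like = np.where(score_pitches == observed_pitch, match_prob, 1 - match_prob)
        post = pred * like
        return post / post.sum()

    score_pitches = np.array([60, 62, 64, 65, 67, 69, 71, 72])  # C major scale (MIDI)
    belief = np.full(len(score_pitches), 1.0 / len(score_pitches))
    for pitch in [60, 62, 61, 65, 67]:  # performance with one substituted note
        belief = update_belief(belief, pitch, score_pitches)
    print(int(belief.argmax()))  # estimated current score position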