Demonstration on Automatic Musical Key Transposition Using Harmonic-Temporal Factor Decomposition

Tomohiko Nakamura (The University of Tokyo) and Hirokazu Kameoka (The University of Tokyo/Nippon Telegraph and Telephone Corporation)

This page demonstrates the effectiveness of the methods proposed in [1] on an automatic musical key transposition task.
  • Proposed methods: Harmonic-Temporal Factor Decomposition (HTFD) and its source-filter extension (SF-HTFD)
  • Conventional method: Harmonic Non-Negative Matrix Factorization (HNMF)
The automatic musical key transposition procedure is as follows:
  1. Given a music audio signal and its key, we separated the magnitude spectrogram of the signal into spectrograms of individual semitones using one of the separation methods (HTFD, SF-HTFD, or HNMF).
  2. We then transposed the separated components of specific semitones by shifting the corresponding separated spectrograms along the log-frequency axis.
  3. We summed all the magnitude spectrograms, shifted and unshifted, to obtain the pitch-transposed magnitude spectrogram.
  4. Finally, we converted the pitch-transposed magnitude spectrogram into a time-domain signal using the spectrogram inversion algorithm for the fast approximate continuous wavelet transform presented in [2].
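Steps 2 and 3 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the array layout (one log-frequency magnitude spectrogram per semitone), the helper names `shift_log_frequency` and `transpose_components`, and the `bins_per_semitone` parameter are all assumptions made for illustration. The separation (step 1) and the spectrogram inversion (step 4) are not covered.

```python
import numpy as np


def shift_log_frequency(spec, n_bins):
    """Shift a (frequency, time) spectrogram by n_bins along the
    log-frequency axis, filling the vacated bins with zeros."""
    shifted = np.zeros_like(spec)
    if n_bins > 0:
        shifted[n_bins:] = spec[:-n_bins]   # move energy upward
    elif n_bins < 0:
        shifted[:n_bins] = spec[-n_bins:]   # move energy downward
    else:
        shifted[:] = spec
    return shifted


def transpose_components(components, semitone_shifts, bins_per_semitone):
    """Shift selected per-semitone spectrograms and sum everything.

    components: array of shape (n_pitches, n_freq_bins, n_frames), assumed
        to be the output of some separation method (hypothetical layout).
    semitone_shifts: dict mapping a pitch index to its shift in semitones;
        pitches that are absent from the dict are left unchanged.
    bins_per_semitone: assumed resolution of the log-frequency axis.
    """
    mixture = np.zeros(components.shape[1:], dtype=components.dtype)
    for pitch, spec in enumerate(components):
        n_bins = semitone_shifts.get(pitch, 0) * bins_per_semitone
        mixture += shift_log_frequency(spec, n_bins)
    return mixture
```

On a log-frequency axis a pitch change is a pure translation, which is why a fixed bin offset per semitone is enough here. Energy shifted past either edge of the axis is simply discarded in this sketch.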
Several samples obtained with each method are listed below. The keys of the original musical pieces are taken from the corresponding datasets.
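Every sample on this page is a change between parallel keys (same tonic, major to natural minor or the reverse). As a rough guide to which pitches are moved, the sketch below compares the two scales on the same tonic and returns the pitch classes that differ, with their shift in semitones. It assumes that "Minor" means natural minor and that exactly the differing scale degrees are transposed. The page does not state this rule, and the names `NOTE_TO_PC` and `parallel_key_shifts` are hypothetical.

```python
# Pitch classes: C = 0, C#/Db = 1, ..., B = 11.
NOTE_TO_PC = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10,
    "Bb": 10, "B": 11,
}

# Scale degrees as semitone offsets from the tonic.
MAJOR = (0, 2, 4, 5, 7, 9, 11)
NATURAL_MINOR = (0, 2, 3, 5, 7, 8, 10)


def parallel_key_shifts(tonic, to_minor):
    """Return {pitch class: shift in semitones} for the scale degrees that
    differ between the major and natural minor scales on `tonic`."""
    root = NOTE_TO_PC[tonic]
    src, dst = (MAJOR, NATURAL_MINOR) if to_minor else (NATURAL_MINOR, MAJOR)
    return {(root + s) % 12: d - s for s, d in zip(src, dst) if d != s}


# C major -> C natural minor lowers E, A, and B by one semitone.
print(parallel_key_shifts("C", to_minor=True))   # {4: -1, 9: -1, 11: -1}
# A minor -> A major raises C, F, and G by one semitone.
print(parallel_key_shifts("A", to_minor=False))  # {0: 1, 5: 1, 7: 1}
```

The result is expressed in pitch classes, so each entry applies to that note in every octave.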

Methods compared (one column of audio samples per method): HNMF, HTFD (Proposed), SF-HTFD (Proposed)
01-AchGottundHerr from Bach10 dataset [3]
Key: C Major → C Natural Minor
Transposed
Pitches to be transposed
Unchanged pitches
02-AchLiebenChristen from Bach10 dataset [3]
Key: A Minor → A Major
Transposed
Pitches to be transposed
Unchanged pitches
03-ChristederdubistTagundLicht from Bach10 dataset [3]
Key: Bb Major → Bb Natural Minor
Transposed
Pitches to be transposed
Unchanged pitches
04-ChristeDuBeistand from Bach10 dataset [3]
Key: C Major → C Natural Minor
Transposed
Pitches to be transposed
Unchanged pitches
09-Jesus from Bach10 dataset [3]
Key: D Minor → D Major
Transposed
Pitches to be transposed
Unchanged pitches
10-NunBitten from Bach10 dataset [3]
Key: A Major → A Natural Minor
Transposed
Pitches to be transposed
Unchanged pitches
AuMix_05_Entertainer_tpt_tpt from URMP dataset [4]
Key: Eb Major → Eb Natural Minor
Transposed
Pitches to be transposed
Unchanged pitches
AuMix_34_Fugue_tpt_tpt_hn_tbn from URMP dataset [4]
Key: F Major → F Natural Minor
Transposed
Pitches to be transposed
Unchanged pitches

References

[1] Tomohiko Nakamura and Hirokazu Kameoka, “Harmonic-temporal factor decomposition for unsupervised monaural separation of harmonic sounds,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 68–82, Nov. 2020.

[2] Tomohiko Nakamura and Hirokazu Kameoka, “Fast signal reconstruction from magnitude spectrogram of continuous wavelet transform based on spectrogram consistency,” in International Conference on Digital Audio Effects, Sep. 2014, pp. 129–135.
[Travel Grant by the Hara Research Foundation]

[3] Zhiyao Duan and Bryan Pardo, “Soundprism: An online system for score-informed source separation of music audio,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 6, pp. 1205–1215, 2011.

[4] Bochen Li, Xinzhao Liu, Karthik Dinesh, Zhiyao Duan, and Gaurav Sharma, “Creating a multitrack classical music performance dataset for multimodal music analysis: Challenges, insights, and applications,” IEEE Transactions on Multimedia, vol. 21, no. 2, pp. 522–535, 2019.