This site presents audio samples accompanying the paper “On Ambisonic Source Separation with Spatially Informed Non-negative Tensor Factorization” [1].

Sound source separation

We target the challenging task of Ambisonic-to-Ambisonic reverberant sound source separation. The aim is to estimate individual reverberant source signals, as if no other sound sources were present during the recording. The problem overview is depicted in Figure 1.

Figure 1. Ambisonic sound source separation scheme. Retrieval of individual source images.

Binaural audio samples

We present audio samples of the input mixtures and of the source signals estimated with the reference methods and with one of the five proposed algorithms, namely EU-WLP. The reference methods include PWD, MWF, FastMNMF [2], and EU [3]. For more information about the reference methods, the source signal retrieval procedure, and the proposed solutions, please refer to our paper [1].

The presented audio samples include one audio file for each of the four datasets considered in our paper, namely MUSIC I, MUSIC II, SPEECH, and DSD100 [4]. For the MUSIC I, MUSIC II, and SPEECH datasets, we focus on the under-determined scenario, in which the signals of 6 sound sources are retrieved from first-order Ambisonics, i.e., 4 directional signals. Due to the limitations of the DSD100 dataset, we present samples for the determined scenario, in which 4 source signals are retrieved. More information on the aforementioned datasets can be found in our paper [1].

Table 1 presents the audio samples of the input mixtures for the example files, while Tables 2-5 present the source signals separated with the reference methods and the proposed EU-WLP for the MUSIC I, MUSIC II, SPEECH, and DSD100 datasets, respectively.

All audio samples were decoded from the Ambisonic format to binaural with the magLS method [5], using the spaudiopy library; please use headphones for the best experience.
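For illustration, the sketch below shows the core operation of such spherical-harmonic-domain binaural rendering: each Ambisonic channel is filtered with the corresponding left/right binaural filter and the results are summed per ear. The filter array h_nm, the function name render_binaural, and the use of plain numpy/scipy are assumptions made for this example; it does not reproduce the spaudiopy API.

```python
# Minimal sketch of SH-domain binaural rendering, assuming that magLS binaural
# filters h_nm (shape: 2 ears x (N+1)^2 SH channels x taps) have already been
# designed, e.g. with spaudiopy; the exact spaudiopy calls are not shown here.
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(ambi_sig, h_nm):
    """Render an Ambisonic (SH-domain) signal to binaural.

    ambi_sig : (n_sh, n_samples) array, Ambisonic signal
    h_nm     : (2, n_sh, n_taps) array, left/right SH-domain binaural filters
    """
    n_sh, n_samples = ambi_sig.shape
    assert h_nm.shape[1] == n_sh, "filter set must match the Ambisonic order"
    out = np.zeros((2, n_samples + h_nm.shape[2] - 1))
    for ear in range(2):
        for n in range(n_sh):
            # filter each SH channel with its binaural filter and accumulate
            out[ear] += fftconvolve(ambi_sig[n], h_nm[ear, n])
    return out  # (2, n_samples + n_taps - 1) binaural signal
```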

Note that the estimated source signals were assigned to their target counterparts by selecting the assignment with the best overall Signal-to-Distortion Ratio (SDR). Combined with the recurring inability of the reference algorithms to properly separate the source signals, this can result in a somewhat unintuitive estimate-to-target mapping.
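For clarity, the sketch below illustrates such an assignment: all permutations of the estimates are scored and the one with the highest average SDR is kept. The plain energy-ratio SDR and the function names simple_sdr / assign_estimates are assumptions made for this example and do not reproduce the BSS Eval-based evaluation used in the paper.

```python
# Minimal sketch of SDR-based estimate-to-target assignment; a plain
# energy-ratio SDR is used as a stand-in for the BSS Eval metrics.
import itertools
import numpy as np

def simple_sdr(target, estimate):
    """10*log10(||s||^2 / ||s - s_hat||^2), a simplified single-channel SDR."""
    err = target - estimate
    return 10.0 * np.log10(np.sum(target**2) / (np.sum(err**2) + 1e-12))

def assign_estimates(targets, estimates):
    """Find the estimate-to-target permutation with the best average SDR.

    targets, estimates : (n_src, n_samples) arrays with matching n_src
    """
    n_src = targets.shape[0]
    best_perm, best_score = None, -np.inf
    for perm in itertools.permutations(range(n_src)):
        # perm[i] is the index of the estimate assigned to target i
        score = np.mean([simple_sdr(targets[i], estimates[p])
                         for i, p in enumerate(perm)])
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score
```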


Table 1. Input mixtures for the example audio files.

MUSIC I | MUSIC II | SPEECH | DSD100

Table 2. Separated source signals for the example file from the MUSIC I dataset.

Target | PWD | MWF | FastMNMF [2] | EU [3] | Proposed EU-WLP

Table 3. Separated source signals for the example file from the MUSIC II dataset.

Target | PWD | MWF | FastMNMF [2] | EU [3] | Proposed EU-WLP

Table 4. Separated source signals for the example file from the SPEECH dataset.

Target | PWD | MWF | FastMNMF [2] | EU [3] | Proposed EU-WLP

Table 5. Separated source signals for the example file from the DSD100 dataset.

Target | PWD | MWF | FastMNMF [2] | EU [3] | Proposed EU-WLP

References

[1] M. Guzik and K. Kowalczyk, “On Ambisonic Source Separation with Spatially Informed Non-negative Tensor Factorization”, in IEEE/ACM Transactions on Audio, Speech, and Language Processing, doi: 10.1109/TASLP.2024.3399618.

[2] K. Sekiguchi, A. A. Nugraha, Y. Bando, and K. Yoshii, “Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices”, in European Signal Processing Conference (EUSIPCO), 2019, doi: 10.23919/EUSIPCO.2019.8902557.

[3] J. Nikunen and A. Politis, “Multichannel NMF for source separation with ambisonic signals”, in International Workshop on Acoustic Signal Enhancement (IWAENC), 2018, doi: 10.1109/IWAENC.2018.8521344.

[4] A. Liutkus, F. R. Stöter, Z. Rafii, D. Kitamura, B. Rivet, N. Ito, N. Ono, and J. Fontecave, “The 2016 Signal Separation Evaluation Campaign”, in International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), 2017, doi: 10.1007/978-3-319-53547-0_31.

[5] C. Schörkhuber, M. Zaunschirm, and R. Höldrich, “Binaural rendering of ambisonic signals via magnitude least squares”, in Fortschritte der Akustik (DAGA), 2018.