Project Page of Time-Varying Neural FCA (TV Neural FCA)

Joint Separation and Localization of Moving Sound Sources Based on Neural Full-Rank Spatial Covariance Analysis

Hokuto Munakata, Yoshiaki Bando, Ryu Takeda, Kazunori Komatani, Masaki Onishi

Fig. 1: Overview of our time-varying neural FCA.

Abstract: This paper presents an unsupervised multichannel method that can separate moving sound sources based on an amortized variational inference (AVI) of joint separation and localization. A recently proposed blind source separation (BSS) method called neural full-rank spatial covariance analysis (FCA) trains a neural separation model based on a nonlinear generative model of multichannel mixtures and can precisely separate unseen mixture signals. This method, however, assumes that the sound sources hardly move, and thus its performance is easily degraded by the source movements. In this paper, we solve this problem by introducing time-varying spatial covariance matrices and directions of arrival of sources into the nonlinear generative model of the neural FCA. This generative model is used for training a neural network to jointly separate and localize moving sources by using only multichannel mixture signals and array geometries. The training objective is derived as a lower bound on the log-marginal posterior probability in the framework of AVI. Experimental results obtained with mixture signals of moving sources show that our method outperformed an existing joint separation and localization method and standard BSS methods.
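To make the role of time-varying spatial covariance matrices (SCMs) concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that per-source power spectral densities and time-varying SCMs have already been estimated, and applies a multichannel Wiener filter, the standard way to extract source images in full-rank spatial covariance models. The function name `tv_multichannel_wiener_filter`, the array layouts, and the regularizer `eps` are illustrative assumptions; the random inputs stand in for the outputs of a trained model.

```python
import numpy as np


def tv_multichannel_wiener_filter(x, psd, scm, eps=1e-6):
    """Estimate source images with a multichannel Wiener filter.

    Hypothetical helper for illustration (shapes are assumptions):
      x:   (F, T, M) complex mixture STFT
      psd: (N, F, T) nonnegative source power spectral densities
      scm: (N, F, T, M, M) time-varying SCMs, one per source, bin, and frame
    Returns an array of shape (N, F, T, M) with the separated source images.
    """
    n_mics = x.shape[-1]
    # Covariance of each source image at each time-frequency bin.
    src_cov = psd[..., None, None] * scm
    # Mixture covariance is the sum over sources; eps keeps it invertible.
    mix_cov = src_cov.sum(axis=0) + eps * np.eye(n_mics)
    # Solve mix_cov @ w = x for every time-frequency bin.
    w = np.linalg.solve(mix_cov, x[..., None])
    # Source image n is src_cov[n] @ mix_cov^{-1} @ x.
    return (src_cov @ w)[..., 0]


# Toy inputs: random values standing in for estimated model parameters.
rng = np.random.default_rng(0)
N, F, T, M = 2, 4, 5, 3
x = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))
psd = rng.uniform(0.5, 1.5, size=(N, F, T))
a = rng.standard_normal((N, F, T, M, M)) + 1j * rng.standard_normal((N, F, T, M, M))
# Hermitian positive-definite SCMs that differ from frame to frame.
scm = a @ a.conj().swapaxes(-1, -2) + np.eye(M)

y = tv_multichannel_wiener_filter(x, psd, scm)
```

Because the SCMs carry a frame index, the filter can change from frame to frame, which is what lets such a model follow a moving source. A time-invariant model would share a single SCM per source and frequency bin across all frames.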

Separation results for real recordings

We separated 6-channel mixture signals recorded in our experimental room with the neural network evaluated in the paper. The mixture signals were dereverberated by the weighted prediction error (WPE) method in advance.

Result 1: Separation of two static sources
Input
Dereverberation result
Src. 1 (static): This is a demonstration of time-varying neural FCA.
Src. 1: cACGMM
Src. 1: FCA
Src. 1: FastMNMF2
Src. 1: DoA-HMM-based clustering
Src. 1: Neural FCA
Src. 1: TV Neural FCA (proposed)
Src. 2 (static): これは時変深層フルランク空間相関分析のデモ動画です. (Kore wa jihen sinsou furu-ranku kukan-soukan bunseki no demo douga desu; in English: "This is a demo video of time-varying deep full-rank spatial correlation analysis.")
Src. 2: cACGMM
Src. 2: FCA
Src. 2: FastMNMF2
Src. 2: DoA-HMM-based clustering
Src. 2: Neural FCA
Src. 2: TV Neural FCA (proposed)
Result 2: Separation of one static source and one moving source
Input
Dereverberation result
Src. 1 (moving): This is a demonstration of time-varying neural FCA.
Src. 1: cACGMM
Src. 1: FCA
Src. 1: FastMNMF2
Src. 1: DoA-HMM-based clustering
Src. 1: Neural FCA
Src. 1: TV Neural FCA (proposed)
Src. 2 (static): これは時変深層フルランク空間相関分析のデモ動画です. (Kore wa jihen sinsou furu-ranku kukan-soukan bunseki no demo douga desu; in English: "This is a demo video of time-varying deep full-rank spatial correlation analysis.")
Src. 2: cACGMM
Src. 2: FCA
Src. 2: FastMNMF2
Src. 2: DoA-HMM-based clustering
Src. 2: Neural FCA
Src. 2: TV Neural FCA (proposed)
Result 3: Separation of two moving sources
Input
Dereverberation result
Src. 1 (moving): This is a demonstration of time-varying neural FCA.
Src. 1: cACGMM
Src. 1: FCA
Src. 1: FastMNMF2
Src. 1: DoA-HMM-based clustering
Src. 1: Neural FCA
Src. 1: TV Neural FCA (proposed)
Src. 2 (moving): これは時変深層フルランク空間相関分析のデモ動画です. (Kore wa jihen sinsou furu-ranku kukan-soukan bunseki no demo douga desu; in English: "This is a demo video of time-varying deep full-rank spatial correlation analysis.")
Src. 2: cACGMM
Src. 2: FCA
Src. 2: FastMNMF2
Src. 2: DoA-HMM-based clustering
Src. 2: Neural FCA
Src. 2: TV Neural FCA (proposed)

Separation results for simulated mixtures

Input
Src. 1: cACGMM
Src. 1: FCA
Src. 1: FastMNMF2
Src. 1: DoA-HMM-based clustering
Src. 1: Neural FCA
Src. 1: TV Neural FCA (proposed)
Src. 2: cACGMM
Src. 2: FCA
Src. 2: FastMNMF2
Src. 2: DoA-HMM-based clustering
Src. 2: Neural FCA
Src. 2: TV Neural FCA (proposed)
Input
Src. 1: cACGMM
Src. 1: FCA
Src. 1: FastMNMF2
Src. 1: DoA-HMM-based clustering
Src. 1: Neural FCA
Src. 1: TV Neural FCA (proposed)
Src. 2: cACGMM
Src. 2: FCA
Src. 2: FastMNMF2
Src. 2: DoA-HMM-based clustering
Src. 2: Neural FCA
Src. 2: TV Neural FCA (proposed)
Input
Src. 1: cACGMM
Src. 1: FCA
Src. 1: FastMNMF2
Src. 1: DoA-HMM-based clustering
Src. 1: Neural FCA
Src. 1: TV Neural FCA (proposed)
Src. 2: cACGMM
Src. 2: FCA
Src. 2: FastMNMF2
Src. 2: DoA-HMM-based clustering
Src. 2: Neural FCA
Src. 2: TV Neural FCA (proposed)