Research Projects

Input-Adaptive Spectral Feature Compression ↗ ↖

We propose an efficient encoder-decoder architecture for neural source separation, called spectral feature compression (SFC), which compresses the input using a single sequence modeling module, making it both input-adaptive and parameter-efficient.

Formula-SED: A Formula-Driven SED Framework ↗ ↖

A novel framework for pre-training environmental sound analysis models by utilizing parametrically synthesized acoustic signals using formula-driven methods.

Onset-and-Offset-Aware Sound Event Detection via Differentiable Frame-to-Event Mapping

2 mins

SED-HSMM couples a neural acoustic model with a hidden semi-Markov model to deliver event-wise sound detection with precise boundaries.

Neural FCASA for Conversation Analysis

2 mins

Neural blind source separation and diarization for distant speech recognition that works with weak supervision.

Time-Domain Audio Source Separation Based on Gaussian Processes with Deep Kernel Learning ↗ ↖

This paper revisits single-channel audio source separation based on a probabilistic generative model of a mixture signal defined in the continuous time domain.

Time-Varying Neural FCA

3 mins

This paper presents an unsupervised multichannel method that can separate moving sound sources based on an amortized variational inference (AVI) of joint separation and localization.

Neural Full-Rank Spatial Covariance Analysis

2 mins

This paper describes a neural blind source separation (BSS) method based on amortized variational inference (AVI) of a non-linear generative model of mixture signals.

Speech Enhancement Based on Robust NTF

2 mins

This paper presents a blind multichannel speech enhancement method that can deal with the time-varying layout of microphones and sound sources.