Unsupervised speech enhancement with deep dynamical generative speech and noise models - Institut d'Electronique et de Télécommunications de Rennes
Conference paper, Year: 2023

Unsupervised speech enhancement with deep dynamical generative speech and noise models

Abstract

This work builds on previous work on unsupervised speech enhancement that used a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model. We propose to replace the NMF noise model with a deep dynamical generative model (DDGM) conditioned either on the DVAE latent variables, on the noisy observations, or on both. This DDGM can be trained in three configurations: noise-agnostic, noise-dependent, and noise adaptation after noise-dependent training. Experimental results show that the proposed method achieves competitive performance compared to state-of-the-art unsupervised speech enhancement methods, while the noise-dependent training configuration yields a much more time-efficient inference process.
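The NMF noise model that the proposed DDGM replaces approximates the nonnegative noise power spectrogram as the product of two low-rank nonnegative factors. As a generic illustration only (not the authors' code, and with an arbitrary toy matrix), here is a minimal sketch of Euclidean NMF with Lee-Seung multiplicative updates:

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-10):
    """Factor a nonnegative matrix V (freq x frames) as W @ H.

    Uses Lee-Seung multiplicative updates for the Euclidean loss
    ||V - W H||^2; eps guards against division by zero.
    """
    rng = np.random.default_rng(0)
    F, N = V.shape
    W = rng.random((F, rank)) + eps   # spectral templates
    H = rng.random((rank, N)) + eps   # temporal activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "noise power spectrogram" with exact rank-2 structure
# (hypothetical data, just to exercise the factorization).
rng = np.random.default_rng(1)
V = rng.random((16, 2)) @ rng.random((2, 32))
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In the speech enhancement setting described above, `W @ H` plays the role of the time-varying noise variance; the paper's contribution is to replace this rank-constrained model with a learned deep dynamical generative model.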

Dates and versions

hal-04132312 , version 1 (19-06-2023)

License

Attribution (CC BY)

Identifiers

Cite

Xiaoyu Lin, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda. Unsupervised speech enhancement with deep dynamical generative speech and noise models. Interspeech 2023 - 24th Annual Conference of the International Speech Communication Association, ISCA, Aug 2023, Dublin, Ireland. pp.1-5. ⟨hal-04132312⟩

