Supervised group NMF¶
In [1], we adapted a supervised matrix factorization model known as Task-driven Dictionary Learning (TDL) [3] to the ASC task. We introduced a variant of the TDL model in its nonnegative formulation [4], with a modified algorithm in which a nonnegative dictionary is learned jointly with a multi-class classifier. In [2], we proposed a new formulation of the group nonnegative matrix factorisation (GNMF) method: using the Euclidean distance as the divergence for the GNMF problem, the GNMF-based dictionary learning is integrated in a supervised framework inspired by TDL [3].
GNMF with speaker and session similarity¶
Details and source code for the GNMF method are presented in [2].
Task-driven NMF-based dictionary learning¶
TDL has recently been applied with nonnegativity constraints [4] to perform speech enhancement [5] or acoustic scene classification, where temporally integrated projections are classified with multinomial logistic regression [1]. In [2], we extended the latter approach to the GNMF case.
Task-driven NMF¶
The general idea of nonnegative TDL, or task-driven NMF (TNMF), is to unite the dictionary learning with NMF and the training of the classifier in a joint optimization problem [1], [5]. Influenced by the classifier, the basis vectors are encouraged to explain the discriminative information in the data while keeping a low reconstruction cost. The TNMF model first considers the optimal projections \(\textbf{h}^{\star}(\textbf{v},\textbf{W})\) of the data points \(\textbf{v}\) on the dictionary \(\textbf{W}\), which are defined as solutions of the nonnegative elastic-net problem [6], expressed as:

\(\textbf{h}^{\star}(\textbf{v},\textbf{W}) = \underset{\textbf{h}\geq 0}{\arg\min}\;\frac{1}{2}\left\|\textbf{v}-\textbf{W}\textbf{h}\right\|_{2}^{2}+\lambda_{1}\left\|\textbf{h}\right\|_{1}+\frac{\lambda_{2}}{2}\left\|\textbf{h}\right\|_{2}^{2},\qquad(1)\)
where \(\lambda_{1}\) and \(\lambda_{2}\) are nonnegative regularization parameters. Given each data segment \(\textbf{V}^{(l)}\) of length \(M\) frames, associated with a label \(y\) in a fixed set of labels \(\mathcal{Y}\), we want to classify the mean of the projections of the data points \(\textbf{v}^{(l)}\) belonging to the segment \(l\), such that \(\textbf{V}^{(l)}=[\textbf{v}_{0}^{(l)},...,\textbf{v}_{M-1}^{(l)}]\). We define \(\hat{\textbf{h}}^{(l)}\) as the averaged projection of \(\textbf{V}^{(l)}\) on the dictionary, where \(\hat{\textbf{h}}^{(l)}=\frac{1}{M}\sum_{m=0}^{M-1}\textbf{h}^{\star}(\textbf{v}_{m}^{(l)},\textbf{W})\). The corresponding classification loss (here using multinomial logistic regression) is defined as \(l_{s}(y,\textbf{A},\hat{\textbf{h}}^{(l)})\), where \(\textbf{A}\in \mathcal{A}\) are the parameters of the classifier. The TNMF problem is then expressed as a joint minimization of the expected classification loss over \(\textbf{W}\) and \(\textbf{A}\):
\(\min_{\textbf{W}\in\mathcal{W},\,\textbf{A}\in\mathcal{A}}\;f(\textbf{W},\textbf{A})+\frac{\nu}{2}\left\|\textbf{A}\right\|_{2}^{2},\qquad(2)\)

with

\(f(\textbf{W},\textbf{A})=\mathbb{E}_{y,\textbf{V}^{(l)}}\left[l_{s}(y,\textbf{A},\hat{\textbf{h}}^{(l)})\right].\)
Here, \(\mathcal{W}\) is defined as the set of nonnegative dictionaries containing unit \(l_{2}\)-norm basis vectors and \(\nu\) is a regularization parameter on the classifier parameters, meant to prevent over-fitting. The problem in equation (2) is optimized with mini-batch stochastic gradient descent as described by Bisot et al. [1].
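For illustration, the following minimal NumPy sketch implements one stochastic gradient step of TNMF along the lines described above. It is not the repository's implementation: the function names and hyper-parameter values are assumptions, scikit-learn's ElasticNet (with positive=True) stands in as the solver for the nonnegative elastic-net of equation (1), and the dictionary gradient follows the generic TDL chain rule of [3]::

    import numpy as np
    from sklearn.linear_model import ElasticNet

    def project(v, W, lam1, lam2):
        """Nonnegative elastic-net projection h*(v, W) of equation (1)."""
        F = W.shape[0]
        # ElasticNet minimizes 1/(2F)||v - Wh||^2 + alpha*l1_ratio*||h||_1
        # + 0.5*alpha*(1 - l1_ratio)*||h||_2^2, hence the rescaling by F.
        enet = ElasticNet(alpha=(lam1 + lam2) / F,
                          l1_ratio=lam1 / (lam1 + lam2),
                          positive=True, fit_intercept=False, max_iter=2000)
        enet.fit(W, v)
        return enet.coef_

    def tnmf_step(V, y, W, A, lam1=0.1, lam2=0.1, nu=1e-3, rho=1e-2):
        """One stochastic gradient step on a labelled segment V (F x M)."""
        M = V.shape[1]
        H = np.column_stack([project(V[:, m], W, lam1, lam2)
                             for m in range(M)])
        h_hat = H.mean(axis=1)                    # averaged projection
        z = A @ h_hat                             # classifier scores
        p = np.exp(z - z.max())
        p /= p.sum()                              # multinomial logistic posterior
        g = p.copy()
        g[y] -= 1.0                               # gradient of l_s w.r.t. z
        grad_h = (A.T @ g) / M                    # gradient w.r.t. each frame's h
        A -= rho * (np.outer(g, h_hat) + nu * A)  # classifier update
        grad_W = np.zeros_like(W)
        for m in range(M):
            h = H[:, m]
            idx = h > 0                           # active set of the elastic net
            if idx.any():
                beta = np.zeros_like(h)
                WL = W[:, idx]
                beta[idx] = np.linalg.solve(
                    WL.T @ WL + lam2 * np.eye(idx.sum()), grad_h[idx])
                # TDL gradient of the loss w.r.t. the dictionary [3]
                grad_W += (-W @ np.outer(beta, h)
                           + np.outer(V[:, m] - W @ h, beta))
        W -= rho * grad_W
        W = np.maximum(W, 0.0)                    # back onto nonnegative set
        W /= np.maximum(np.linalg.norm(W, axis=0, keepdims=True), 1e-12)
        return W, A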
Task-driven GNMF¶
In task-driven GNMF (TGNMF), we propose to jointly perform the dictionary learning based on GNMF [2] and the training of a multinomial logistic regression. The dictionary \(\textbf{W}\) is then the concatenation of all the sub-dictionaries \(\textbf{W}^{(cs)}\), and the optimal projections \(\textbf{h}^{\star}(\textbf{v},\textbf{W})\) are the solutions of (1).
Including the similarity constraints, TGNMF is thus expressed as the minimization over \(\textbf{W}\in\mathcal{W}\) and \(\textbf{A}\in\mathcal{A}\) of \(f(\textbf{W},\textbf{A})\) as defined above, augmented with the speaker and session similarity penalties on the sub-dictionaries \(\textbf{W}^{(cs)}\) [2]. The problem is again optimized with mini-batch stochastic gradient descent. However, as opposed to the previous algorithm, for each data point \(\textbf{v}\) belonging to a particular \(\textbf{V}^{(cs)}\), only the corresponding sub-dictionaries \(\textbf{W}^{(cs)}\) are updated, while the other sub-dictionaries are left unchanged in order to match the GNMF adaptation scheme [2].
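To make the selective update concrete, here is a minimal NumPy sketch (the storage scheme and function names are assumptions for illustration, not the repository's API) of a dictionary stored as a concatenation of sub-dictionaries \(\textbf{W}^{(cs)}\), where a gradient step only touches the columns associated with the group \((c,s)\) of the current data point::

    import numpy as np

    def build_dictionary(sub_dicts):
        """Concatenate the sub-dictionaries W^(cs); remember their columns."""
        slices, start = {}, 0
        for key, Wcs in sub_dicts.items():        # key = (class c, session s)
            slices[key] = slice(start, start + Wcs.shape[1])
            start += Wcs.shape[1]
        W = np.concatenate(list(sub_dicts.values()), axis=1)
        return W, slices

    def tgnmf_dict_step(W, slices, key, grad_W, rho=1e-2):
        """Apply the dictionary gradient only to the active sub-dictionary."""
        cols = slices[key]
        W[:, cols] -= rho * grad_W[:, cols]       # other sub-dictionaries stay
        W[:, cols] = np.maximum(W[:, cols], 0.0)  # nonnegativity
        norms = np.linalg.norm(W[:, cols], axis=0, keepdims=True)
        W[:, cols] /= np.maximum(norms, 1e-12)    # unit l2-norm basis vectors
        return W

Here grad_W would be computed as in the TNMF sketch above; restricting the update to the active columns is what distinguishes the TGNMF dictionary step.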
Download¶
Source code available at https://github.com/rserizel/TGNMF
Getting Started¶
A short example is available at https://github.com/rserizel/TGNMF/blob/master/TGNMF_howto.ipynb
Citation¶
If you are using this source code, please consider citing one of the following papers:
References
[1] V. Bisot, R. Serizel, S. Essid, and G. Richard, "Feature learning with matrix factorization applied to acoustic scene classification", IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017.
[2] R. Serizel, V. Bisot, S. Essid, and G. Richard, "Supervised group nonnegative matrix factorisation with similarity constraints and applications to speaker identification", in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
@article{bisot2017TASLP,
title={Feature Learning with Matrix Factorization Applied to Acoustic Scene Classification},
author={Bisot, Victor and Serizel, Romain and Essid, Slim and Richard, Ga{\"e}l},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
pages={14},
year={2017},
publisher={IEEE}
}
@inproceedings{serizel2017ICASSP,
title={Supervised group nonnegative matrix factorisation with similarity constraints and applications to speaker identification},
author={Serizel, Romain and Bisot, Victor and Essid, Slim and Richard, Ga{\"e}l},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={5},
year={2017},
organization={IEEE}
}
References¶
[3] J. Mairal, F. Bach, and J. Ponce, "Task-driven dictionary learning", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 4, pp. 791-804, 2012.
[4]
[5]
[6] H. Zou and T. Hastie, "Regularization and variable selection via the elastic net", Journal of the Royal Statistical Society: Series B, vol. 67, no. 2, pp. 301-320, 2005.