NMF with :math:`\beta`-divergence and multiplicative updates
--------------------------------------------------------------

Consider the (nonnegative) matrix :math:`\textbf{V}\in\mathbb{R}_+^{F\times N}`. The goal of NMF [1]_ is to find a factorisation for :math:`\textbf{V}` of the form:

.. math::

    \textbf{V} \approx \textbf{W}\textbf{H}

where :math:`\textbf{W}\in\mathbb{R}_+^{F\times K}`, :math:`\textbf{H}\in\mathbb{R}_+^{K\times N}` and :math:`K` is the number of components in the decomposition. Estimating the NMF model is usually cast as solving the following optimisation problem:

.. math::

    \min_{\textbf{W},\textbf{H}}D(\textbf{V} | \textbf{W}\textbf{H})\quad\mathrm{s.t.}\quad\textbf{W}\geq 0,\ \textbf{H}\geq 0\mathrm{,}

where :math:`D` is a separable divergence:

.. math::

    D(\textbf{V}|\textbf{W}\textbf{H}) = \sum\limits_{f=1}^{F}\sum\limits_{n=1}^{N}d([\textbf{V}]_{fn}|[\textbf{WH}]_{fn})\textrm{,}

where :math:`[.]_{fn}` denotes the element in row :math:`f` and column :math:`n` of a matrix and :math:`d` is a scalar cost function. A common choice for the cost function is the :math:`\beta`-divergence [2]_ [3]_ [4]_ defined as follows:

.. math::

    d_{\beta}(x|y)\triangleq
    \begin{cases}
    \frac{1}{\beta(\beta -1)} \left(x^{\beta} + (\beta -1)y^{\beta} - \beta xy^{\beta-1}\right)&\beta\in\mathbb{R}\backslash\{0,1\}\\
    x\log\frac{x}{y} - x + y&\beta=1\\
    \frac{x}{y} -\log\frac{x}{y} - 1&\beta=0\textrm{.}
    \end{cases}

Popular cost functions such as the Euclidean distance, the generalised KL divergence [5]_ and the IS divergence [6]_ are all particular cases of the :math:`\beta`-divergence (obtained for :math:`\beta=2`, :math:`1` and :math:`0`, respectively). The use of the :math:`\beta`-divergence for NMF has been studied extensively by Févotte and Idier [7]_.

In most cases, the NMF problem is solved with a two-block coordinate descent approach in which the factors :math:`\textbf{W}` and :math:`\textbf{H}` are optimised alternately. The sub-problem in one factor can then be treated as a nonnegative least squares (NNLS) problem [8]_. The approach implemented here to solve these NNLS problems relies on the multiplicative update rules introduced by Lee and Seung [1]_ for the Euclidean distance and later generalised to the :math:`\beta`-divergence [7]_:

.. math::

    \begin{align}
    \textbf{H}&\gets \textbf{H}\odot\frac{\textbf{W}^T\left[(\textbf{W}\textbf{H})^{\beta-2}\odot\textbf{V}\right]}{\textbf{W}^T(\textbf{W}\textbf{H})^{\beta-1}}\\
    \textbf{W}&\gets \textbf{W}\odot\frac{\left[(\textbf{W}\textbf{H})^{\beta-2}\odot\textbf{V}\right]\textbf{H}^T}{(\textbf{W}\textbf{H})^{\beta-1}\textbf{H}^T}\mathrm{,}
    \end{align}

where :math:`\odot` denotes the element-wise (Hadamard) product, and division and exponentiation are also applied element-wise. The matrices :math:`\textbf{H}` and :math:`\textbf{W}` are then updated alternately according to this scheme.
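To make these update rules concrete, here is a minimal NumPy sketch of :math:`\beta`-divergence NMF with multiplicative updates. It only illustrates the equations above and is not the implementation shipped in this repository; the function names (``beta_divergence``, ``beta_nmf_mu``), the random initialisation and the small constant ``eps`` are choices made for this sketch.

.. code:: python

    import numpy as np


    def beta_divergence(V, V_hat, beta):
        """Total beta-divergence D(V | V_hat), summed over all matrix entries."""
        eps = 1e-12  # avoid log(0) and division by zero on exact zeros
        V = V + eps
        V_hat = V_hat + eps
        if beta == 0:  # Itakura-Saito divergence
            return np.sum(V / V_hat - np.log(V / V_hat) - 1)
        if beta == 1:  # generalised Kullback-Leibler divergence
            return np.sum(V * np.log(V / V_hat) - V + V_hat)
        # general case; beta = 2 gives (half the squared) Euclidean distance
        return np.sum((V ** beta + (beta - 1) * V_hat ** beta
                       - beta * V * V_hat ** (beta - 1)) / (beta * (beta - 1)))


    def beta_nmf_mu(V, K, beta=2.0, n_iter=100, seed=0):
        """Factorise V (F x N) into W (F x K) and H (K x N) with multiplicative updates."""
        rng = np.random.default_rng(seed)
        F, N = V.shape
        W = rng.random((F, K)) + 1e-3  # nonnegative random initialisation
        H = rng.random((K, N)) + 1e-3
        eps = 1e-12

        for _ in range(n_iter):
            # H <- H * (W^T [(WH)^(beta-2) * V]) / (W^T (WH)^(beta-1))
            WH = W @ H + eps
            H *= (W.T @ (WH ** (beta - 2) * V)) / (W.T @ WH ** (beta - 1) + eps)

            # W <- W * ([(WH)^(beta-2) * V] H^T) / ((WH)^(beta-1) H^T)
            WH = W @ H + eps
            W *= ((WH ** (beta - 2) * V) @ H.T) / (WH ** (beta - 1) @ H.T + eps)

        return W, H


    if __name__ == "__main__":
        rng = np.random.default_rng(42)
        V = rng.random((20, 30))              # toy nonnegative data matrix
        W, H = beta_nmf_mu(V, K=5, beta=1.0)  # beta = 1: generalised KL divergence
        print("final divergence:", beta_divergence(V, W @ H, beta=1.0))

With ``eps`` set to zero the loop is exactly the update rules above; it is kept strictly positive here only to avoid divisions by zero on the toy example.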
Download
++++++++

Source code available at https://github.com/rserizel/beta_nmf

Getting Started
+++++++++++++++

A short example is available at https://github.com/rserizel/beta_nmf/blob/master/BetaNMF_howto.ipynb

Citation
++++++++

If you are using this source code, please consider citing the following paper:

.. topic:: Reference

    R. Serizel, S. Essid, and G. Richard. “Mini-batch stochastic approaches for accelerated multiplicative updates in nonnegative matrix factorisation with beta-divergence”. Accepted for publication in *Proc. of MLSP*, p. 5, 2016.

.. topic:: Bibtex

    ::

        @inproceedings{serizel2016batch,
          title={Mini-batch stochastic approaches for accelerated multiplicative updates in nonnegative matrix factorisation with beta-divergence},
          author={Serizel, Romain and Essid, Slim and Richard, Ga{\"e}l},
          booktitle={IEEE International Workshop on Machine Learning for Signal Processing (MLSP)},
          pages={5},
          year={2016},
          organization={IEEE}
        }

References
++++++++++

.. [1] D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization,” *Nature*, vol. 401, no. 6755, pp. 788–791, 1999.

.. [2] A. Basu, I. R. Harris, N. L. Hjort, and M. C. Jones, “Robust and efficient estimation by minimising a density power divergence,” *Biometrika*, vol. 85, no. 3, pp. 549–559, 1998.

.. [3] A. Cichocki and S. Amari, “Families of alpha-beta-and gamma-divergences: Flexible and robust measures of similarities,” *Entropy*, vol. 12, no. 6, pp. 1532–1568, 2010.

.. [4] S. Eguchi and Y. Kano, “Robustifing maximum likelihood estimation,” Research Memo 802, Institute of Statistical Mathematics, June 2001.

.. [5] S. Kullback and R. A. Leibler, “On information and sufficiency,” *The Annals of Mathematical Statistics*, pp. 79–86, 1951.

.. [6] F. Itakura, “Minimum prediction residual principle applied to speech recognition,” *IEEE Transactions on Acoustics, Speech and Signal Processing*, vol. 23, no. 1, pp. 67–72, 1975.

.. [7] C. Févotte and J. Idier, “Algorithms for nonnegative matrix factorization with the beta-divergence,” *Neural Computation*, vol. 23, no. 9, pp. 2421–2456, 2011.

.. [8] N. Gillis, “The why and how of nonnegative matrix factorization,” in *Regularization, Optimization, Kernels, and Support Vector Machines*, M. Signoretto, J. A. K. Suykens, and A. Argyriou, Eds., Machine Learning and Pattern Recognition Series, pp. 257–291. Chapman & Hall/CRC, 2014.