Documentation

beta_nmf_minibatch.py

Contents

The beta_nmf_minibatch module includes the BetaNMF class, its fit function, and the Theano functions used to compute the updates and the cost.

class beta_nmf_minibatch.BetaNMF(data_shape, n_components=50, beta=2, n_iter=50, fixed_factors=None, cache1_size=0, batch_size=100, verbose=0, init_mode='random', W=None, H=None, solver='mu_batch', nb_batch_w=1, sag_memory=0)[source]

BetaNMF class

Performs nonnegative matrix factorization with mini-batch multiplicative updates. GPGPU implementation based on Theano.

Parameters:

data_shape : tuple composed of integers

The shape of the data to approximate

n_components : positive integer

The number of latent components for the NMF model

beta : arbitrary float (default 2)

The beta-divergence to consider. Particular cases of interest are
  • beta=2 : Euclidean distance
  • beta=1 : Kullback-Leibler
  • beta=0 : Itakura-Saito

n_iter : positive integer

Number of iterations

fixed_factors : array of integers

Indexes of the factors that are kept fixed during the updates:
  • [0] : corresponds to fixed H
  • [1] : corresponds to fixed W

cache1_size : integer

Size (in frames) of the first data cache. The size is reduced to the closest multiple of batch_size. If set to zero, the algorithm tries to fit all the data in the cache.

batch_size : integer

Size (in frames) of the batches for batch processing. The batch size has an impact on the parallelisation and on the memory needed to store partial gradients (see Schmidt et al. [2]).

verbose : integer

The number of iterations to wait between two computations and printings of the cost

init_mode : string (default ‘random’)

  • random : initialise the factors randomly
  • custom : initialise the factors with custom values

W : array (optional)

Initial value for the factor W when custom initialisation is used

H : array (optional)

Initial value for the factor H when custom initialisation is used

solver : string (default ‘mu_batch’)

  • mu_batch : mini-batch version of the MU updates (fully equivalent to standard NMF with MU)
  • asg_mu : Asymmetric stochastic gradient for MU [1]
  • gsg_mu : Greedy stochastic gradient for MU [1]
  • asag_mu : Asymmetric stochastic average gradient [2] for MU [1]
  • gsag_mu : Greedy stochastic average gradient [2] for MU [1]

nb_batch_w : integer (default 1)

Number of batches on which the W update is computed:
  • 1 : greedy approaches [1]

sag_memory : integer (default 0)

Number of batches used to compute the average gradient:
  • 0 : SG approaches
  • nb_batches : SAG approaches
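
For illustration, a minimal usage sketch of the constructor (the import path and the random data below are assumptions made for the example, not part of the package):

    import numpy as np
    from beta_nmf_minibatch import BetaNMF  # adjust to your installation

    # Hypothetical nonnegative data: 1000 frames x 257 feature bins.
    data = np.abs(np.random.randn(1000, 257))

    # Kullback-Leibler divergence (beta=1), asymmetric stochastic gradient
    # MU solver, batches of 100 frames, cost printed every 10 iterations.
    nmf = BetaNMF(data.shape, n_components=50, beta=1, n_iter=100,
                  batch_size=100, solver='asg_mu', verbose=10)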

References

[1] R. Serizel, S. Essid, and G. Richard. “Mini-batch stochastic approaches for accelerated multiplicative updates in nonnegative matrix factorisation with beta-divergence”. Accepted for publication in Proc. of MLSP, 2016.
[2] M. Schmidt, N. Le Roux, and F. Bach. “Minimizing finite sums with the stochastic average gradient”. 2013. https://hal.inria.fr/hal-00860051/PDF/sag_journal.pdf

Attributes

nb_cache1 (integer) number of caches needed to cover the full data
forget_factor (float) forgetting factor for SAG
scores (array) reconstruction cost and iteration time for each iteration
factors_ (list of arrays) The estimated factors
w (theano tensor) factor W
h_cache1 (theano tensor) part of the factor H in cache1
x_cache1 (theano tensor) data cache

Methods

check_shape() Check that all the matrices have consistent shapes
fit(data[, cyclic, warm_start]) Learn the NMF model
get_div_function() Compile the Theano-based divergence function
get_gradient_mu_batch() Compile the Theano-based gradient functions for mu
get_gradient_mu_sag() Compile the Theano-based gradient functions for the mu_sag algorithms
get_gradient_mu_sg() Compile the Theano-based gradient functions for the mu_sg algorithms
get_updates() Compile the Theano-based update functions
init() Initialise the Theano variables to store the gradients
prepare_batch([randomize]) Arrange data for batches
prepare_cache1([randomize]) Arrange data to fill cache1
set_factors(data[, W, H, fixed_factors]) Re-set the Theano-based parameters according to the object attributes.
transform(data[, warm_start]) Project the data X onto the basis W
update_mu_batch_h(batch_ind, update_func, ...) Update h for the current batch with standard MU
update_mu_batch_w(update_func) Update W with standard MU
update_mu_sag(batch_ind, update_func, grad_func) Update the current batch with SAG-based algorithms
check_shape()[source]

Check that all the matrices have consistent shapes

fit(data, cyclic=False, warm_start=False)[source]

Learn the NMF model

Parameters:

data : ndarray with nonnegative entries

The input array

cyclic : Boolean (default False)

pick the samples cyclically

warm_start : Boolean (default False)

start from previous values
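
Continuing the constructor sketch above (the data array is hypothetical):

    # Learn the factors; batches are drawn in random order by default.
    nmf.fit(data)

    # Visit the batches cyclically and restart from the previous factors.
    nmf.fit(data, cyclic=True, warm_start=True)

    # scores holds the reconstruction cost and iteration time per iteration.
    print(nmf.scores)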

get_div_function()[source]

Compile the Theano-based divergence function

get_gradient_mu_batch()[source]

Compile the Theano-based gradient functions for mu

get_gradient_mu_sag()[source]

Compile the Theano-based gradient functions for the mu_sag algorithms

get_gradient_mu_sg()[source]

Compile the Theano-based gradient functions for the mu_sg algorithms

get_updates()[source]

Compile the Theano-based update functions

init()[source]

Initialise the Theano variables to store the gradients

prepare_batch(randomize=True)[source]

Arrange data for batches

Parameters:

randomize : boolean (default True)

Randomise the data (time-wise) before preparing batch indexes

prepare_cache1(randomize=True)[source]

Arrange data to fill cache1

Parameters:

randomize : boolean (default True)

Randomise the data (time-wise) before preparing the cache indexes

set_factors(data, W=None, H=None, fixed_factors=None)[source]

Re-set the Theano-based parameters according to the object attributes.

Parameters:

W : array (optional)

Value for the factor W when custom initialisation is used

H : array (optional)

Value for the factor H when custom initialisation is used

fixed_factors : array (default None)

List of factors that are not updated, e.g.:

fixed_factors = [0] -> H is not updated

fixed_factors = [1] -> W is not updated

transform(data, warm_start=False)[source]

Project the data X onto the basis W

Parameters:

data : array

The input data

warm_start : Boolean (default False)

start from previous values

Returns:

H : array

Activations
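
A usage sketch, continuing the example above (new_data is a hypothetical array with the same feature dimension as the training data):

    # Estimate activations for unseen data on the learned bases W.
    H_new = nmf.transform(new_data)

    # Resume from the previously estimated activations.
    H_new = nmf.transform(new_data, warm_start=True)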

update_mu_batch_h(batch_ind, update_func, grad_func)[source]

Update h for the current batch with standard MU

Parameters:

batch_ind : array with 2 elements

batch_ind[0] : batch start
batch_ind[1] : batch end

update_func : Theano compiled function

Update function

grad_func : Theano compiled function

Gradient function

update_mu_batch_w(update_func)[source]

Update W with standard MU

Parameters:

update_func : Theano compiled function

Update function

update_mu_sag(batch_ind, update_func, grad_func)[source]

Update the current batch with SAG-based algorithms

Parameters:

batch_ind : array with 2 elements

batch_ind[0] : batch start
batch_ind[1] : batch end

update_func : Theano compiled function

Update function

grad_func : Theano compiled function

Gradient function

base.py

Contents

The base module includes basic functions to load data and annotations, normalise matrices, and generate nonnegative random matrices.

base.get_norm_col(w)[source]

Return the norm of a column vector

Parameters:

w : 1-dimensional array

vector to be normalised

Returns:

norm: scalar

norm-2 of w
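
A minimal NumPy sketch of what this helper computes (illustrative only, not the package code):

    import numpy as np

    w = np.array([3.0, 4.0])
    norm = np.sqrt(np.sum(w ** 2))  # norm-2 of w -> 5.0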

base.load_all_data(f_name, scale=True, rnd=False)[source]

Get data from all sets stored in an HDF5 file.

Parameters:

f_name : String

file name

scale : Boolean (default True)

scale data to unit variance (scikit-learn function)

rnd : Boolean (default False)

randomise the data along the time axis

Returns:

data_dic : Dictionary

dictionary containing the data:

x_train : numpy array

train data matrix

x_test : numpy array

test data matrix

x_dev : numpy array

dev data matrix

base.load_all_data_labels(f_name, scale=True, rnd=False)[source]

Get data with labels for all sets.

Parameters:

f_name : String

file name

scale : Boolean (default True)

scale data to unit variance (scikit-learn function)

rnd : Boolean (default False)

randomise the data along the time axis

Returns:

data_dic : Dictionary

dictionary containing the data:

x_train : numpy array

train data matrix

x_test : numpy array

test data matrix

x_dev : numpy array

dev data matrix

y_train : numpy array

train labels vector

y_test : numpy array

test labels vector

y_dev : numpy array

dev labels vector

base.load_all_data_labels_fids(f_name, scale=True, rnd=False)[source]

Get data with labels and file ids for all sets.

Parameters:

f_name : String

file name

scale : Boolean (default True)

scale data to unit variance (scikit-learn function)

rnd : Boolean (default False)

randomise the data along the time axis

Returns:

data_dic : Dictionary

dictionary containing the data:

x_train : numpy array

train data matrix

x_test : numpy array

test data matrix

x_dev : numpy array

dev data matrix

y_train : numpy array

train labels vector

y_test : numpy array

test labels vector

y_dev : numpy array

dev labels vector

f_train : numpy array

train file ids vector

f_test : numpy array

test file ids vector

f_dev : numpy array

dev file ids vector

base.load_all_fids(f_name)[source]

Get file ids for all sets.

Parameters:

f_name : String

file name

Returns:

fids_dic : Dictionary

dictionary containing the file ids:

f_train : numpy array

train file ids vector

f_test : numpy array

test file ids vector

f_dev : numpy array

dev file ids vector

base.load_all_labels(f_name)[source]

Get labels for all sets.

Parameters:

f_name : String

file name

Returns:

lbl_dic : Dictionary

dictionary containing the labels:

y_train : numpy array

train labels vector

y_test : numpy array

test labels vector

y_dev : numpy array

dev labels vector

base.load_data(f_name, dataset, scale=True, rnd=False)[source]

Get data from a specific set stored in an HDF5 file.

Parameters:

f_name : String

file name

dataset : String

name of the set to load (e.g., train, dev, test)

scale : Boolean (default True)

scale data to unit variance (scikit-learn function)

rnd : Boolean (default False)

randomise the data along the time axis

Returns:

data_dic : Dictionary

dictionary containing the data:

data : numpy array

data matrix
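
An illustrative call (the file name is hypothetical; the returned dictionary key is the one documented above):

    import base  # adjust the import to your installation

    # Load the 'train' set from a hypothetical HDF5 feature file.
    data_dic = base.load_data('features.h5', 'train', scale=True, rnd=False)
    x_train = data_dic['data']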

base.load_data_labels(f_name, dataset, scale=True, rnd=False)[source]

Get data with labels for a particular set.

Parameters:

f_name : String

file name

dataset : String

name of the set to load (e.g., train, dev, test)

scale : Boolean (default True)

scale data to unit variance (scikit-learn function)

rnd : Boolean (default False)

randomise the data along the time axis

Returns:

data_dic : Dictionary

dictionary containing the data:

x : numpy array

data matrix

y : numpy array

labels vector

base.load_data_labels_fids(f_name, dataset, scale=True, rnd=False)[source]

Get data with labels and file ids for a specific set.

Parameters:

f_name : String

file name

dataset : String

name of the set to load (e.g., train, dev, test)

scale : Boolean (default True)

scale data to unit variance (scikit-learn function)

rnd : Boolean (default False)

randomise the data along the time axis

Returns:

data_dic : Dictionary

dictionary containing the data:

x : numpy array

data matrix

y : numpy array

labels vector

f : numpy array

file ids vector

base.load_fids(f_name, dataset)[source]

Get file ids for a specific set.

Parameters:

f_name : String

file name

dataset : String

name of the set to load (e.g., train, dev, test)

Returns:

fids_dic : Dictionary

dictionary containing the file ids:

file_ids : numpy array

file ids vector

base.load_labels(f_name, dataset)[source]

Get labels for a specific set.

Parameters:

f_name : String

file name

dataset : String

name of the set to load (e.g., train, dev, test)

Returns:

lbl_dic : Dictionary

dictionary containing the labels:

labels : numpy array

labels vector

base.nnrandn(shape)[source]

Randomly generate a nonnegative ndarray of the given shape

Parameters:

shape : tuple

The shape of the array to generate

Returns:

out : array of the given shape

The nonnegative random numbers
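
A plausible one-line NumPy sketch of such a generator (the actual implementation may differ):

    import numpy as np

    def nnrandn_sketch(shape):
        """Nonnegative random array: absolute values of standard normal draws."""
        return np.abs(np.random.randn(*shape))

    W0 = nnrandn_sketch((257, 50))  # e.g. initialise a hypothetical W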

base.norm_col(w, h)[source]

Normalise the column vector w (Theano function) and apply the inverse normalisation to h so that the product w.h does not change

Parameters:

w : Theano vector

vector to be normalised

h : Theano vector

vector to be scaled by the inverse normalisation

Returns:

w : Theano vector with the same shape as w

normalised vector (w/norm)

h : Theano vector with the same shape as h

h*norm
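
The invariance can be checked with a NumPy sketch (illustrative; the actual function operates on Theano vectors):

    import numpy as np

    w = np.array([3.0, 4.0])          # column of W
    h = np.array([0.5, 2.0])          # corresponding row of H
    norm = np.sqrt(np.sum(w ** 2))
    w_n, h_n = w / norm, h * norm
    # The contribution of this component to W.H is unchanged:
    assert np.allclose(np.outer(w, h), np.outer(w_n, h_n))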

cost.py

Contents

The cost module contains the cost functions used for the group NMF

costs.beta_div(X, W, H, beta)[source]

Compute beta divergence D(X|WH)

Parameters:

X : Theano tensor

data

W : Theano tensor

Bases

H : Theano tensor

activation matrix

beta : Theano scalar

Coefficient beta for the beta-divergence

Returns:

div : Theano scalar

beta divergence D(X|WH)
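
As a reference, a NumPy sketch of the standard element-wise beta-divergence (not the package's Theano code; entries of X and WH are assumed strictly positive where logarithms or negative powers are taken):

    import numpy as np

    def beta_div_sketch(X, W, H, beta):
        """Standard beta-divergence D(X|WH), summed over all entries."""
        Y = W.dot(H)
        if beta == 2:   # Euclidean distance (half squared error)
            return 0.5 * np.sum((X - Y) ** 2)
        if beta == 1:   # Kullback-Leibler
            return np.sum(X * np.log(X / Y) - X + Y)
        if beta == 0:   # Itakura-Saito
            return np.sum(X / Y - np.log(X / Y) - 1)
        return np.sum((X ** beta + (beta - 1) * Y ** beta
                       - beta * X * Y ** (beta - 1)) / (beta * (beta - 1)))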

updates.py

Contents

The updates module contains the update functions used for the mini-batch NMF

updates.gradient_h(X, W, H, beta)[source]

Compute the gradient of the beta-divergence with respect to the factor H

Parameters:

X: theano tensor

Data matrix to be decomposed

W: theano tensor

Factor matrix containing the bases of the decomposition

H: theano tensor

Factor matrix containing the activations of the decomposition

beta: theano scalar

Coefficient beta for the beta-divergence. Special cases:
  • beta = 0 : Itakura-Saito
  • beta = 1 : Kullback-Leibler
  • beta = 2 : Euclidean distance

Returns:

grad_h: theano matrix

Gradient of the local beta-divergence with respect to H

updates.gradient_h_mu(X, W, H, beta)[source]

Compute the gradient of the beta-divergence with respect to the factor H and return the positive and negative contributions, e.g. for multiplicative updates

Parameters:

X: theano tensor

Data matrix to be decomposed

W: theano tensor

Factor matrix containing the bases of the decomposition

H: theano tensor

Factor matrix containing the activations of the decomposition

beta: theano scalar

Coefficient beta for the beta-divergence. Special cases:
  • beta = 0 : Itakura-Saito
  • beta = 1 : Kullback-Leibler
  • beta = 2 : Euclidean distance

Returns:

grad_h : theano matrix (T.stack(grad_h_pos, grad_h_neg))

grad_h_pos : positive term of the gradient of the local beta-divergence with respect to H

grad_h_neg : negative term of the gradient of the local beta-divergence with respect to H
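
For reference, the standard positive/negative split of the beta-divergence gradient with respect to H, as a NumPy sketch (the package stacks the two terms with T.stack):

    import numpy as np

    def gradient_h_mu_sketch(X, W, H, beta):
        """Positive and negative terms of the gradient w.r.t. H."""
        Y = W.dot(H)
        grad_h_pos = W.T.dot(Y ** (beta - 1))
        grad_h_neg = W.T.dot((Y ** (beta - 2)) * X)
        return grad_h_pos, grad_h_neg
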
updates.gradient_w(X, W, H, beta)[source]

Compute the gradient of the beta-divergence with respect to the factor W

Parameters:

X: theano tensor

Data matrix to be decomposed

W: theano tensor

Factor matrix containing the bases of the decomposition

H: theano tensor

Factor matrix containing the activations of the decomposition

beta: theano scalar

Coefficient beta for the beta-divergence. Special cases:
  • beta = 0 : Itakura-Saito
  • beta = 1 : Kullback-Leibler
  • beta = 2 : Euclidean distance

Returns:

grad_w: theano matrix

Gradient of the local beta-divergence with respect to W

updates.gradient_w_mu(X, W, H, beta)[source]

Compute the gradient of the beta-divergence with respect to the factor W and return the positive and negative contributions, e.g. for multiplicative updates

Parameters:

X: theano tensor

Data matrix to be decomposed

W: theano tensor

Factor matrix containing the bases of the decomposition

H: theano tensor

Factor matrix containing the activations of the decomposition

beta: theano scalar

Coefficient beta for the beta-divergence. Special cases:
  • beta = 0 : Itakura-Saito
  • beta = 1 : Kullback-Leibler
  • beta = 2 : Euclidean distance

Returns:

grad_w : theano matrix (T.stack(grad_w_pos, grad_w_neg))

grad_w_pos : positive term of the gradient of the local beta-divergence with respect to W

grad_w_neg : negative term of the gradient of the local beta-divergence with respect to W

updates.mu_update(factor, gradient_pos, gradient_neg)[source]

Update the factor based on multiplicative rules

Parameters:

factor: theano tensor

The factor to be updated

gradient_pos: theano tensor

Positive part of the gradient with respect to the factor

gradient_neg: theano tensor

Negative part of the gradient with respect to the factor

Returns:

factor: theano matrix

New value of the factor after the multiplicative update
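
A sketch of the rule, assuming the convention that the full gradient is gradient_pos - gradient_neg (the ratio keeps a nonnegative factor nonnegative):

    def mu_update_sketch(factor, gradient_pos, gradient_neg):
        """Multiplicative update: factor <- factor * grad_neg / grad_pos."""
        return factor * gradient_neg / gradient_pos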

updates.mu_update_h(X, W, H, beta)[source]

Compute the gradient of the beta-divergence with respect to the factor H and update H with multiplicative rules

Parameters:

X: theano tensor

Data matrix to be decomposed

W: theano tensor

Factor matrix containing the bases of the decomposition

H: theano tensor

Factor matrix containing the activations of the decomposition

beta: theano scalar

Coefficient beta for the beta-divergence. Special cases:
  • beta = 0 : Itakura-Saito
  • beta = 1 : Kullback-Leibler
  • beta = 2 : Euclidean distance

Returns:

H: theano matrix

New value of H after the multiplicative update
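
Combining the sketches above (gradient split for H, then the multiplicative ratio), an illustrative composition:

    def mu_update_h_sketch(X, W, H, beta):
        """Gradient split for H followed by the multiplicative update."""
        grad_h_pos, grad_h_neg = gradient_h_mu_sketch(X, W, H, beta)
        return H * grad_h_neg / grad_h_pos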

updates.update_grad_w(grad, grad_old, grad_new)[source]

Update the global gradient for W

Parameters:

grad: theano tensor

The global gradient

grad_old: theano tensor

The previous value of the local gradient

grad_new: theano tensor

The new version of the local gradient

Returns:

grad: theano tensor

New value of the global gradient
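
The SAG bookkeeping amounts to swapping the old local contribution for the new one inside the stored sum; a sketch (standard SAG accumulation, assumed to match the function above):

    def update_grad_w_sketch(grad, grad_old, grad_new):
        """Replace the old local gradient by the new one in the global sum."""
        return grad - grad_old + grad_new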