Transform

class implicitmf.transform.Transformer(user_item_dict, full_matrix=True)

Transforms fetched results into a sparse matrix.

Parameters:

user_item_dict (dict) – a dictionary of lists of tuples containing the distinct pairs of ids, the distinct user ids, and the distinct item ids
full_matrix (bool) – defaults to True; determines whether the matrix will be an “out matrix” or an “in matrix”.

Variables:

user_item_score (list) – list of tuples of length three, where the first element of each tuple is the user_id, the second is the item_id, and the third is the score
user_mapper (dict) – keys are user_ids and values are indices along the user axis of the user-item matrix
item_mapper (dict) – keys are item_ids and values are indices along the item axis of the user-item matrix
user_inv_mapper (dict) – keys are indices along the user axis of the user-item matrix and values are user_ids
item_inv_mapper (dict) – keys are indices along the item axis of the user-item matrix and values are item_ids
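
The relationship between the four mapper variables can be illustrated with a small sketch. It assumes user_item_score is a list of (user_id, item_id, score) tuples, that ids are sortable, and that indices are assigned in sorted id order; the real Transformer may order indices differently, and build_mappers is a hypothetical helper, not part of implicitmf.

```python
def build_mappers(user_item_score):
    """Hypothetical helper: map raw ids to matrix indices and back.

    Assumes user_item_score is a list of (user_id, item_id, score) tuples
    and that ids are sortable; indices follow sorted id order.
    """
    user_ids = sorted({user for user, _, _ in user_item_score})
    item_ids = sorted({item for _, item, _ in user_item_score})

    # Forward mappers: raw id -> position along the user or item axis.
    user_mapper = {uid: idx for idx, uid in enumerate(user_ids)}
    item_mapper = {iid: idx for idx, iid in enumerate(item_ids)}

    # Inverse mappers: position along the axis -> raw id.
    user_inv_mapper = {idx: uid for uid, idx in user_mapper.items()}
    item_inv_mapper = {idx: iid for iid, idx in item_mapper.items()}
    return user_mapper, item_mapper, user_inv_mapper, item_inv_mapper


triples = [("alice", "c1", 1.0), ("bob", "c2", 3.0), ("alice", "c3", 2.0)]
user_mapper, item_mapper, user_inv_mapper, item_inv_mapper = build_mappers(triples)
print(user_mapper)         # {'alice': 0, 'bob': 1}
print(item_inv_mapper[2])  # c3
```

Because each inverse mapper is built by swapping keys and values, looking up an id in a forward mapper and then in the matching inverse mapper always returns the original id.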

Examples

>>> from implicitmf.transform import Transformer
>>> user_item_dict, _ = gen_fetched_data()
>>> t = Transformer(user_item_dict)
>>> X = t.to_sparse_array(arr_type='csr_matrix')
>>> X.shape  # (number of distinct users, number of distinct items)
(u, i)

to_sparse_array(arr_type='csr_matrix')

Transforms the provided data into a SciPy sparse array.

Parameters:	arr_type (str) – a string indicating the type of sparse array returned (only supports csr_matrix)
Returns:	utility matrix of shape (u,i) where u represents number of distinct users and i represents number of distinct items
Return type:	scipy.sparse.csr_matrix
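
One way to picture what to_sparse_array produces is the sketch below, which turns (user_id, item_id, score) triples into a CSR utility matrix of shape (u, i). It is an illustration under assumptions, not the library's code: triples_to_csr is a hypothetical helper, it uses np.unique to assign contiguous indices in sorted id order, and it relies on SciPy summing any duplicate (row, col) pairs.

```python
import numpy as np
from scipy.sparse import csr_matrix


def triples_to_csr(user_item_score):
    """Hypothetical helper: build a (users x items) CSR utility matrix."""
    users, items, scores = zip(*user_item_score)

    # np.unique returns the sorted distinct ids plus, via return_inverse,
    # the index of each original entry within those sorted ids.
    user_ids, rows = np.unique(users, return_inverse=True)
    item_ids, cols = np.unique(items, return_inverse=True)

    # The (data, (row, col)) constructor sums duplicate (row, col) pairs.
    X = csr_matrix(
        (np.asarray(scores, dtype=np.float64), (rows, cols)),
        shape=(len(user_ids), len(item_ids)),
    )
    return X, user_ids, item_ids


triples = [("alice", "c1", 1.0), ("bob", "c2", 3.0), ("alice", "c3", 2.0)]
X, user_ids, item_ids = triples_to_csr(triples)
print(X.shape)  # (2, 3)
print(X.toarray())
```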

Pre-process

implicitmf.preprocess.normalize_X(X, norm_type)

Normalizes the X matrix using either TF-IDF or BM25 weighting. This is a wrapper for the tfidf_weight and bm25_weight functions from the implicit.nearest_neighbours module.

Parameters:

X (scipy.sparse.csr_matrix) – sparse matrix of shape (n_users, n_collections)
norm_type (str) – either “bm25” or “tfidf”
Returns:	Normalized sparse csr matrix
Return type:	scipy.sparse.csr_matrix
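
To show roughly what the two weightings compute, the sketch below re-implements them with NumPy and SciPy. It is not the implicit library's code: the helper names are hypothetical, the K1=100 and B=0.8 BM25 defaults are assumptions, and the IDF term assumes X has no duplicate or explicitly stored zero entries. Results may differ from tfidf_weight and bm25_weight in detail.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix


def _idf(X):
    # Inverse document frequency per column (item): rarer items weigh more.
    # Counts stored entries, so X must not hold duplicates or explicit zeros.
    df = np.bincount(X.col, minlength=X.shape[1])
    return np.log(float(X.shape[0])) - np.log1p(df)


def tfidf_weight_sketch(X):
    X = coo_matrix(X, dtype=np.float64)
    X.data = np.sqrt(X.data) * _idf(X)[X.col]
    return X.tocsr()


def bm25_weight_sketch(X, K1=100.0, B=0.8):
    X = coo_matrix(X, dtype=np.float64)
    idf = _idf(X)
    # Length normalisation: rows with many interactions are damped.
    row_sums = np.ravel(X.sum(axis=1))
    length_norm = (1.0 - B) + B * row_sums / row_sums.mean()
    X.data = X.data * (K1 + 1.0) / (K1 * length_norm[X.row] + X.data) * idf[X.col]
    return X.tocsr()


def normalize_X_sketch(X, norm_type):
    """Hypothetical stand-in for normalize_X, dispatching on norm_type."""
    if norm_type == "bm25":
        return bm25_weight_sketch(X)
    if norm_type == "tfidf":
        return tfidf_weight_sketch(X)
    raise ValueError("norm_type must be 'bm25' or 'tfidf'")


X = csr_matrix(np.array([[1, 0, 2], [0, 3, 0], [4, 0, 0]], dtype=np.float64))
print(normalize_X_sketch(X, "tfidf").toarray())
print(normalize_X_sketch(X, "bm25").toarray())
```

Note that an item present in every row gets a non-positive IDF under this formula, so very popular collections contribute little or nothing after weighting.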

References

[1]	bm25 and tfidf explanation: https://www.benfrederickson.com/distance-metrics/

[2]	https://github.com/benfred/implicit/blob/master/implicit/evaluation.pyx

Validation

To validate the performance of a recommender system, we must first split the dataset, X, into X_train and X_validate. The traditional train_test_split approach splits X either by rows or by columns, which produces a training set and a validation set of different dimensions. In recommender systems, we instead “mask” a proportion of user-collection interactions during the training phase, so that X_train keeps the same shape as X, and then calculate precision@k by comparing the recommendations predicted from X_train against the original X matrix.
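
The masking idea and the precision@k comparison can be sketched as follows. This is one possible implementation under stated assumptions, not the implicitmf code: mask_interactions and precision_at_k are hypothetical helpers, scores is assumed to be a dense (users x items) array of predictions from a model fitted on X_train, items already used for training are excluded from the ranking, and users without masked interactions are skipped.

```python
import numpy as np
from scipy.sparse import csr_matrix


def mask_interactions(X, frac=0.2, seed=None):
    """Hypothetical helper: hide a random fraction of the non-zero entries."""
    X = csr_matrix(X, dtype=np.float64, copy=True)
    X.eliminate_zeros()
    rng = np.random.default_rng(seed)
    hidden = np.zeros(X.nnz, dtype=bool)
    hidden[rng.choice(X.nnz, size=int(round(frac * X.nnz)), replace=False)] = True

    # Both matrices keep the full (users x items) shape of X.
    X_train = X.copy()
    X_train.data[hidden] = 0.0   # masked out during training
    X_train.eliminate_zeros()
    X_test = X.copy()
    X_test.data[~hidden] = 0.0   # only the masked interactions remain
    X_test.eliminate_zeros()
    return X_train, X_test


def precision_at_k(scores, X_train, X, k=5):
    """Mean precision@k over users with at least one masked interaction."""
    scores = np.array(scores, dtype=np.float64)  # copy before masking
    seen = X_train.toarray() > 0
    held_out = (X.toarray() > 0) & ~seen
    scores[seen] = -np.inf  # never recommend items already used for training
    precisions = []
    for u in range(scores.shape[0]):
        if not held_out[u].any():
            continue
        top_k = np.argsort(-scores[u])[:k]
        precisions.append(held_out[u, top_k].sum() / k)
    return float(np.mean(precisions)) if precisions else 0.0


X = csr_matrix(np.array([[1, 1, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1]], dtype=np.float64))
X_train, X_test = mask_interactions(X, frac=0.5, seed=0)
# An "oracle" that scores every true interaction highest reaches 1.0 at k=1.
print(precision_at_k(X.toarray(), X_train, X, k=1))  # 1.0
```

Dense conversion keeps the sketch short; for large matrices the ranking would normally be computed per user from sparse rows instead.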

implicitmf.validation.cross_val_folds() and implicitmf.validation.gridsearchCV() both use the “masked-out” approach to split data.

implicitmf.validation.cross_val_folds(X, n_folds, seed=None)

Generates cross-validation folds from the provided utility matrix.

Parameters:

X (scipy.sparse.csr_matrix) – utility matrix of shape (u, i) where u is the number of users and i is the number of items
n_folds (int) – number of folds to create
seed (int) – random seed for use by np.random.choice
Returns:	dictionary of length n_folds, keyed by fold number, where each value is a dictionary holding a 'train' and a 'test' matrix
Return type:	dict

Example

>>> from implicitmf.validation import cross_val_folds
>>> output = cross_val_folds(X, n_folds=3, seed=42)
>>> print(output)
{0: {'train': X_train, 'test': X_test},
1: {'train': X_train, 'test': X_test},
2: {'train': X_train, 'test': X_test}}
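
A plausible way to build folds with this output layout is to partition the stored interactions into n_folds disjoint groups and let fold f mask group f. The sketch below is an assumption about the approach, not the library's implementation: make_masked_folds is hypothetical, and it draws randomness from np.random.default_rng rather than np.random.choice, so its folds will not match cross_val_folds for the same seed.

```python
import numpy as np
from scipy.sparse import csr_matrix


def make_masked_folds(X, n_folds, seed=None):
    """Hypothetical sketch: split the non-zero entries of X into n_folds
    disjoint groups; fold f masks group f out of its training matrix."""
    X = csr_matrix(X, dtype=np.float64, copy=True)
    X.eliminate_zeros()
    rng = np.random.default_rng(seed)
    order = rng.permutation(X.nnz)  # shuffled positions into X.data

    folds = {}
    for fold, held in enumerate(np.array_split(order, n_folds)):
        in_test = np.zeros(X.nnz, dtype=bool)
        in_test[held] = True
        X_train = X.copy()
        X_train.data[in_test] = 0.0    # hide this fold's interactions
        X_train.eliminate_zeros()
        X_test = X.copy()
        X_test.data[~in_test] = 0.0    # keep only this fold's interactions
        X_test.eliminate_zeros()
        folds[fold] = {"train": X_train, "test": X_test}
    return folds


X = csr_matrix(np.array([[1, 1, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1]], dtype=np.float64))
folds = make_masked_folds(X, n_folds=3, seed=42)
print(sorted(folds))                          # [0, 1, 2]
print([folds[f]["test"].nnz for f in folds])  # [3, 3, 2]
```

Since the groups are disjoint, every interaction is masked in exactly one fold, and each training matrix keeps the full shape of X.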

implicitmf.validation.gridsearchCV(base_model, X, n_folds, hyperparams)

Performs an exhaustive grid-search cross-validation to identify the optimal hyperparameters of a model.

Parameters:

base_model (model object) – model whose hyperparameters are searched
X (scipy.sparse.csr_matrix) – utility matrix of shape (u, i)
n_folds (int) – number of folds for cross-validation
hyperparams (dict) – hyperparameter values of interest
Returns:	dataframe with mean_score, max_score, and min_score for each combination of hyperparameter values
Return type:	pandas.DataFrame
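
The exhaustive search can be sketched with itertools.product. In this illustration, grid_search_sketch is hypothetical, hyperparams is assumed to map each name to a list of candidate values, folds is assumed to follow the cross_val_folds layout, and score_fn(params, train, test) stands in for fitting a model and scoring it; the toy score function and the hyperparameter names factors and regularization exist only to make the example run.

```python
import statistics
from itertools import product


def grid_search_sketch(hyperparams, folds, score_fn):
    """Hypothetical sketch of exhaustive grid search over masked folds."""
    names = sorted(hyperparams)
    rows = []
    # product() enumerates every combination of candidate values.
    for values in product(*(hyperparams[name] for name in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, f["train"], f["test"]) for f in folds.values()]
        rows.append({
            **params,
            "mean_score": statistics.mean(scores),
            "max_score": max(scores),
            "min_score": min(scores),
        })
    return rows


def toy_score(params, train, test):
    # Stand-in for "fit on train, compute precision@k on test".
    return params["factors"] * 0.01 + params["regularization"]


toy_folds = {0: {"train": None, "test": None}, 1: {"train": None, "test": None}}
rows = grid_search_sketch(
    {"factors": [10, 20], "regularization": [0.01, 0.1]}, toy_folds, toy_score
)
best = max(rows, key=lambda row: row["mean_score"])
print(best["factors"], best["regularization"])  # 20 0.1
```

Passing the rows to pandas.DataFrame(rows) gives one row per hyperparameter combination, matching the column layout described for gridsearchCV's return value.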

References

[1]	scikit-learn’s GridSearchCV: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/model_selection/_search.py

Post-process

implicitmf.postprocess.remove_subscribed_items(rec_dict, user_sub_dict, unwanted_items=None)

Filters already-subscribed collections out of each user's list of recommendations.

Parameters:

rec_dict (dict) – dictionary with user id as the key and list of recommended items as the value
user_sub_dict (dict) – dictionary with user id as the key and list of item subscriptions as the value
unwanted_items (list) – list of additional items to remove from the recommendation list
Returns:	dictionary with recommended items that users have not subscribed to
Return type:	dict
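
The filtering described above amounts to a per-user set difference that preserves the recommendation order. The sketch below is a hypothetical re-implementation, not the library function; in particular, keeping all recommendations (minus unwanted_items) for users absent from user_sub_dict is an assumption.

```python
def remove_subscribed_items_sketch(rec_dict, user_sub_dict, unwanted_items=None):
    """Hypothetical re-implementation of the subscription filter."""
    unwanted = set(unwanted_items or [])
    filtered = {}
    for user_id, recs in rec_dict.items():
        # Exclude the user's own subscriptions plus the global unwanted items;
        # users missing from user_sub_dict only lose the unwanted items.
        exclude = set(user_sub_dict.get(user_id, [])) | unwanted
        filtered[user_id] = [item for item in recs if item not in exclude]
    return filtered


recs = {"u1": ["a", "b", "c"], "u2": ["b", "d"]}
subs = {"u1": ["b"]}
print(remove_subscribed_items_sketch(recs, subs, unwanted_items=["d"]))
# {'u1': ['a', 'c'], 'u2': ['b']}
```

Converting each exclusion list to a set keeps the membership test constant-time, which matters when users have many subscriptions.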