Next: Methods for Predicting Movie
Up: Using Co-clustering for Predicting
Previous: Using Co-clustering for Predicting
Prediction of missing values has recently found many practical applications in sales forecasting and recommendation systems. It is a main task in information filtering where the goal is to identify the relevance of an item such as a movie or a book to a given user based on the user profile and/or known preferences of the other users. We can also view these input data as a matrix where each row represents for the ratings of one user and one column corresponds to a movie. The task now is to impute missing entries of the matrix.
The recent study on Bregman co-clustering [1][3] has proposed a method for simultaneously partitioning rows and columns of such a data matrix and then using clustering results to predict those missing entries. By incorporating both row and column clustering information, a kind of statistical regularization technique, co-clustering can yield better quality clusters even only single-sided clustering results are needed. In addition, co-clustering is more scalable than traditional single-sided clustering [1].
In this project, we explore the combination of co-clustering and other methods for missing value prediction on a particular subset of the original Netflix dataset. The original is a huge dataset which contains over 100 million of ratings from 480 thousands Netflix users over 17 thousand movies. Each rating is on a scale from 1 to 5. Since this dataset is too large, a subset of it is sampled for the pilot study. In this project based on the subset dataset we would like to initially answer two following questions. The first one is how well the Bregman co-clustering performs on the Netflix dataset? The second one is whether we can use the co-clustering as an intermediate step for partitioning the data into blocks, then applying some expensive but more accurate methods such as using SVD on co-clusters of data to achieve higher performance? The intuition of this idea is that the co-clustering method can capture the favorite relations between groups of users and groups of movies. Therefore, this approach does not only make the dataset scalable to SVD, but it may also help to improve the predicting performance.
Next: Methods for Predicting Movie
Up: Using Co-clustering for Predicting
Previous: Using Co-clustering for Predicting
Tuyen Huynh
2007-05-09