next up previous
Next: Data Analysis Up: Methods for Predicting Movie Previous: Missing Value Prediction Using

Missing value prediction based on Singular Value Decomposition (SVD)

SVD is a popular matrix factorization technique which has been used a lot in data mining, especially in dimensionality reduction. Given a $ m \times n$ matrix $ R$, SVD decomposes that matrix into:

$\displaystyle R = U \cdot S \cdot V^{\prime}
$

where U and V are two orthogonal matrices of size $ m \times m$ and $ n \times n$ respectively, and $ S$ is a diagonal matrix of size $ m \times n$. All diagonal entries of $ S$ are non-negative and called the singular values of matrix $ R$. The diagonal entries of $ S$ are often sorted in a decreasing order. The most important property of SVD is that we can take the first $ k$ largest singular values and construct an approximated matrix of rank-$ k$ of $ R$ as following:

$\displaystyle R_k = U_k \cdot S_k \cdot V_k^{\prime}
$

where $ U_k$, $ S_k$, and $ V_k$ are reduced matrices of $ U$, $ S$, $ V$ respectively. It has been shown that $ R_k$ is the best rank-k approximation of the matrix $ R$ in case of squared error loss. In other words, $ R_k$ is the one that minimizes the Frobenius norm $ \vert\vert R - R_k\vert\vert$ = $ \sum_i\sum_j{[R(i,j) - R_k(i,j)]^2}$. So considering the matrix $ R$ as our rating matrix, we can use the values of $ R_k$ to approximate the values of $ R$. The assumption of this approach is that there are latent relationships between users and movies which affect the rating of a user for a given movie. Specifically, we assume there are a set of $ k$ factors which decide how a user rates a movie, and these factors can be captured by the rank-$ k$ SVD. In [6],[7], the authors has shown that the performance of this approach is comparable to other collaborative filtering methods. However, one problem with this approach is that it is computationally expensive and is not scalable to large dataset. So instead of performing this approach on the original rating matrix, we first run the co-clustering algorithm to partition the original matrix into blocks and apply this approach on each block. Another problem with this approach is that the original rating matrix $ R$ has a lot of missing entries which make it difficult to compute the rank-$ k$ SVD of $ R$. One way to solve this problem is to fill in those missing values with some reasonable values. In [6],[7], the authors use the average ratings of movies to fill in those missing values. We have tried that approach but the performance is very bad. So we propose to use the predicting values based on co-clustering as the guessing values for those missing values in the original rating matrix $ R$. Then we compute the rank-$ k$ approximation $ R_k$ of $ R$ based on SVD, and use it for predicting the final values of those missing entries in $ R$. In summary, the following steps are performed on each cocluster of the original rating matrix:
next up previous
Next: Data Analysis Up: Methods for Predicting Movie Previous: Missing Value Prediction Using
Tuyen Huynh 2007-05-09