To put the performance of our methods in context, we also compute the RMSE and MAE of a baseline approach that predicts every missing value as the average of all known ratings in the training set. The average rating of the training set is 3.5997, and the RMSE and MAE of the baseline are 1.0834 and 0.90881, respectively. From all the graphs we find that the relative performance of the methods does not depend much on the evaluation metric, i.e., we can use either one to compare different methods. In theory, RMSE is the stricter metric, since it penalizes large rating errors more heavily than MAE does. For the following discussion, therefore, we use RMSE, the evaluation metric of the Netflix contest, to compare the different schemes and methods.
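The baseline computation described above can be sketched in a few lines. This is only an illustration: the function name `baseline_errors` and the toy ratings are assumptions, not the Netflix data or the code used for the reported numbers.

```python
import math

def baseline_errors(train_ratings, test_ratings):
    """Return (global_mean, rmse, mae) for the predict-the-mean baseline."""
    # Every missing rating is predicted as the mean of the known training ratings.
    global_mean = sum(train_ratings) / len(train_ratings)
    errors = [r - global_mean for r in test_ratings]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return global_mean, rmse, mae

# Toy ratings (hypothetical), used only to exercise the function.
train = [4, 3, 5, 2, 4]
test = [5, 3, 1]
mean, rmse, mae = baseline_errors(train, test)
print(f"mean={mean:.4f} rmse={rmse:.4f} mae={mae:.4f}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions, which is consistent with the baseline figures reported above (1.0834 versus 0.90881).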
The first observation is that squared Euclidean distance and I-divergence give essentially the same performance: there is no significant change between Figure 7 and Figure 9.
The second observation concerns scheme 6, in which a missing value is predicted from a combination of the average rating per row within a column cluster, the average rating per column within a row cluster, and the cocluster average. Its performance is best when the numbers of row and column clusters are small. The reason is that as we increase the numbers of row and column clusters, many coclusters become so sparse that they contain empty rows and columns. In that case, we use the global average rating as the average rating for those rows and columns in that cocluster. This is why scheme 6 performs so poorly on the 25x25 setting: its RMSE is 1.0450, which is very close to the baseline performance. The performance of scheme 6 on the 2x2 setting is quite interesting, since it is comparable to the best performance of schemes 3 and 5, which require a large number of row and column clusters.
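To make the fallback behaviour concrete, here is a minimal sketch of a scheme-6-style predictor. The text only says the three averages are "combined"; the additive form used here (row average plus column average minus cocluster average) is an assumption, as are the names `scheme6_predict` and `_mean_or` and the toy matrix.

```python
import numpy as np

def _mean_or(values, fallback):
    """Mean of the values, or the fallback when there are none."""
    return values.mean() if values.size else fallback

def scheme6_predict(R, known, row_cl, col_cl, i, j):
    """Predict R[i, j] from cocluster statistics (assumed additive form)."""
    global_mean = R[known].mean()
    rows_g = row_cl == row_cl[i]   # rows in the same row cluster as i
    cols_h = col_cl == col_cl[j]   # columns in the same column cluster as j
    # Average of row i over the known ratings in column cluster h.
    row_avg = _mean_or(R[i, cols_h & known[i]], global_mean)
    # Average of column j over the known ratings in row cluster g.
    col_avg = _mean_or(R[rows_g & known[:, j], j], global_mean)
    # Average of all known ratings in cocluster (g, h).
    block = known & (rows_g[:, None] & cols_h[None, :])
    block_avg = _mean_or(R[block], global_mean)
    return row_avg + col_avg - block_avg

# Toy 4x4 rating matrix (hypothetical); 0 marks a missing rating.
R = np.array([[5, 4, 0, 0],
              [4, 0, 0, 0],
              [0, 1, 5, 4],
              [0, 0, 4, 5]], dtype=float)
known = R > 0
row_cl = np.array([0, 0, 1, 1])
col_cl = np.array([0, 0, 1, 1])

dense = scheme6_predict(R, known, row_cl, col_cl, 1, 1)   # populated cocluster
sparse = scheme6_predict(R, known, row_cl, col_cl, 0, 3)  # empty cocluster
print(dense, sparse)
```

In this toy example the cocluster containing entry (0, 3) has no known ratings, so all three averages fall back to the global mean and the prediction coincides with the baseline. This is the effect that pushes the RMSE toward the baseline when coclusters become sparse.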
Among the other schemes, scheme 3 is best when the coclusters are small, i.e., when the numbers of row and column clusters are high: 15x15, 20x20, and 25x25. Scheme 5, on the other hand, is best when the coclusters are larger: 2x2, 5x5, and 10x10. Another observation is that all methods work better as we increase the number of row and column clusters. However, this improvement stops once the cocluster setting exceeds 20x20.
Figures 11 and 12 also indicate that initializing co-clustering with the clustering results from Graclus does not yield any significant improvement. We also used the Graclus results directly to predict missing values, but the performance is worse than that of the co-clustering algorithm.
Here we list three of the resulting clusters for scheme 3 on the 20x20 setting (Tables 3, 4, and 5). The clusters show that the co-clustering method can detect relevant groups of movies of the same genre. Cluster 13 contains all parts of 'Lord of the Rings', while cluster 3 captures many gangster movies. However, the result is not perfect, since not all Star Wars parts are included in cluster 13.