Having the presentation out of the way feels like a weight off my shoulders, but I'm still not quite satisfied with my project.
I spent a lot of time throughout the semester trying to research the "right" way to create a recommender, and not enough time actually creating one. I learned quickly during the second half of the semester that simply diving in is the best way to find opportunities to optimize the system.
I now have a functional infrastructure: user actions in the interface trigger database operations and calls to the recommendation service, and the recommendation service returns meaningful data about movies the user should try out. That was really my base goal. But now I'd really like to optimize the recommender.
When a user's rating predictions are calculated, there tend to be about 2,000-4,000 predictions made, with a lot of ties for high ratings. That's a lot of potential choices, so there's no way it's as accurate as a user would like.
The first way I'm going to optimize is through dimension reduction. This method uses a threshold to determine whether an item belongs in the pool of recommendable items. Essentially, if an item does not have about half of the total number of possible ratings, it is removed from the item-rating matrix before similarities are calculated. This reduces time cost and also surfaces movies that are more relevant.
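The idea is simple enough to sketch. Here's a minimal version of the filter, assuming the ratings live in a NumPy matrix with rows as users, columns as items, and 0 meaning "not rated" (the function name and threshold parameter are my own placeholders, not part of the actual codebase):

```python
import numpy as np

def filter_sparse_items(ratings, min_fraction=0.5):
    """Drop items rated by fewer than min_fraction of users.

    ratings: 2-D array, rows = users, columns = items; 0 means "not rated".
    Returns the filtered matrix and the indices of the kept columns.
    """
    n_users = ratings.shape[0]
    counts = (ratings > 0).sum(axis=0)              # ratings per item
    keep = np.where(counts >= min_fraction * n_users)[0]
    return ratings[:, keep], keep

# 4 users x 3 movies; movie 2 has only one rating, so it gets dropped
R = np.array([[5, 0, 0],
              [4, 3, 0],
              [0, 4, 2],
              [5, 5, 0]])
dense, kept = filter_sparse_items(R)
```

Running the similarity computation on `dense` instead of `R` both shrinks the matrix and keeps the long tail of barely-rated movies out of the candidate pool.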
The second way I'm going to optimize is by adding a layer of comparison to the nearest-neighbors portion of the prediction process. I can take genre into account when providing the K top predictions: rank the genres for a given user, then use that ranking as a tiebreaker when selecting the top K.
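One way that tiebreaker could look, assuming one genre per movie and treating a rating of 4 or above as "liked" (both simplifications of mine, and all the names here are hypothetical):

```python
from collections import Counter

def genre_aware_top_k(predictions, movie_genres, user_history, k=5):
    """Pick top-K predictions, breaking rating ties by the user's genre ranking.

    predictions:  {movie_id: predicted_rating}
    movie_genres: {movie_id: genre}
    user_history: list of (movie_id, rating) pairs the user has already rated
    """
    # Rank genres by how often the user rated them highly (>= 4)
    liked = Counter(movie_genres[m] for m, r in user_history if r >= 4)
    genre_rank = {g: i for i, (g, _) in enumerate(liked.most_common())}

    # Sort by predicted rating first, then by the user's genre preference;
    # unseen genres sort last
    def key(movie_id):
        return (-predictions[movie_id],
                genre_rank.get(movie_genres[movie_id], len(genre_rank)))

    return sorted(predictions, key=key)[:k]

preds = {"A": 5.0, "B": 5.0, "C": 4.0}
genres = {"A": "horror", "B": "comedy", "C": "comedy", "D": "comedy"}
history = [("C", 5), ("D", 4)]          # this user clearly likes comedies
top = genre_aware_top_k(preds, genres, history, k=2)
```

With the two 5.0 predictions tied, the comedy ("B") wins the tiebreak over the horror movie ("A"), which is exactly the behavior I'm after: the flood of tied high predictions gets ordered by something the user has actually demonstrated a preference for.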
I think that implementing both of these ideas will result in a clear improvement over the current iteration. I'd also like to use some type of accuracy function that compares predicted ratings against ratings the user has already given - but I'm not sure I'll have that done before my final defense.
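If I get to it, the accuracy function could be as simple as root-mean-square error over the movies where a prediction and a real rating overlap. A sketch, assuming both sides come as `{movie_id: rating}` dicts (my assumed shape, not the system's actual interface):

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and known ratings.

    predicted, actual: dicts of {movie_id: rating}; only movies
    present in both are scored.
    """
    common = predicted.keys() & actual.keys()
    if not common:
        raise ValueError("no overlapping ratings to score")
    squared_error = sum((predicted[m] - actual[m]) ** 2 for m in common)
    return math.sqrt(squared_error / len(common))
```

Holding out a slice of each user's existing ratings, predicting them, and reporting the RMSE would give me a single number to watch while tuning the threshold and the genre tiebreaker.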