Since I learned about matrix factorization techniques, I have been excited by the power of recommender systems. It's amazing that ratings data alone can reveal which movies are similar, yet there are few ways to explore the movies most similar to a given one. In practice, recommender systems are black boxes. Moreover, what happens when you want to watch a movie with a friend or partner? Netflix doesn't offer a way to explore recommendations starting from a movie you both agree on. With these concerns in mind, I built a simple recommender system using Dash. Please note that the app is slow to load because it runs on Heroku's free tier.
To compute similarities between movies, I used the Surprise Python package to build a similarity matrix capturing how similar each movie is to every other movie. Because this matrix is very large (n² entries for n movies), I applied principal component analysis (PCA) with 10 components to reduce it to a more manageable dataset in which each movie is represented by a 10-dimensional vector. I found that 10 components struck a good balance between complexity and predictive power. When a user selects a movie from the dropdown, the app displays the movies whose vector representations (from the PCA of the similarity matrix) are most similar. Since much of the impetus for this project was identifying recommendations based on multiple starting movies, I thought a lot about how to go from two or more movies to a list of similar movies, but ultimately settled on simply averaging the vectors of the selected movies.
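The pipeline above (similarity matrix, PCA, nearest vectors, averaging for multiple picks) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the app's actual code. In Surprise, an item-item similarity matrix can be obtained by fitting a KNN algorithm such as `KNNBasic` with `sim_options={"user_based": False}` and reading its `sim` attribute, but the post doesn't say which algorithm or similarity measure was used, so the sketch skips that step and uses a hypothetical toy matrix of six movies in two clusters. PCA is done directly with `np.linalg.svd` rather than a library class, and "most similar" is assumed to mean highest cosine similarity. The helper names `pca_embed` and `recommend` are hypothetical.

```python
import numpy as np


def pca_embed(sim, n_components=10):
    """Project each movie's row of the similarity matrix onto the top principal components."""
    centered = sim - sim.mean(axis=0)  # center each column (feature)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T  # shape: (n_movies, n_components)


def recommend(embeddings, titles, selected, k=5):
    """Average the vectors of the selected movies and return the k closest other movies.

    Closeness is cosine similarity here (an assumption, not stated in the post).
    """
    idx = [titles.index(t) for t in selected]
    query = embeddings[idx].mean(axis=0)  # one vector representing all selected movies
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
    scores = embeddings @ query / np.where(norms == 0, 1, norms)
    scores[idx] = -np.inf  # never recommend a movie the user already picked
    order = np.argsort(scores)[::-1]  # highest similarity first
    return [titles[i] for i in order[:k]]


# Hypothetical toy data: movies A1-A3 are similar to each other, as are B1-B3.
titles = ["A1", "A2", "A3", "B1", "B2", "B3"]
block = np.kron(np.eye(2), np.ones((3, 3)))  # 1 within a cluster, 0 across clusters
sim = 0.1 + 0.8 * block + 0.1 * np.eye(6)  # 1.0 on the diagonal, 0.9 within, 0.1 across

emb = pca_embed(sim, n_components=2)  # 2 components because the toy matrix is tiny
print(recommend(emb, titles, ["A1"], k=2))  # the other two "A" movies
print(recommend(emb, titles, ["A1", "A2"], k=1))  # ['A3']
```

Averaging keeps the query in the same space as the individual movies, so the same nearest-neighbor lookup serves one or many selections; the trade-off is that averaging two very different movies can land between clusters, close to neither.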