Introduction
Movies are among the most popular forms of art and entertainment today. People watch them in the cinema or buy them on Amazon, and then leave reviews and ratings. For movie makers, however, movies are also a business. A few blockbusters such as “Avatar” have earned billions of dollars, while most other movies fall far short of the highest-grossing titles. This raises the question of what makes a hit movie, and we try to answer it with data analysis. Using the metadata, reviews, and ratings of movies, we look for the factors and features that make a movie a best seller on Amazon or earn high revenue at the box office.
Dataset
For the project, we analyse the following datasets:
- The Amazon reviews dataset contains 142.8 million product reviews (ratings, text, votes) and related metadata from Amazon. From this large collection, we mainly use the ‘Movies and TV’ part to implement our ideas.
- The TMDB movie dataset, which we found on Kaggle, contains metadata on thousands of movies, including plot, cast, crew, budget, votes, and revenue, among other fields.
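As a rough illustration of how the review data might be prepared, the sketch below parses a few review records with pandas and aggregates ratings per product. It assumes a JSON-lines layout (one review per line) and the field names `asin`, `overall`, and `unixReviewTime`; these names, the sample values, and the helper `load_reviews` are assumptions made for the example, not a confirmed description of the dump.

```python
import io

import pandas as pd

# Hypothetical sample in JSON-lines form (one review per line).
# The field names are assumptions about the Amazon dump, and the
# values are made up for illustration only.
SAMPLE = """\
{"asin": "B000001", "overall": 5.0, "unixReviewTime": 1104537600}
{"asin": "B000001", "overall": 3.0, "unixReviewTime": 1262304000}
{"asin": "B000002", "overall": 4.0, "unixReviewTime": 1262304000}
"""


def load_reviews(source):
    """Read JSON-lines reviews and add a calendar-year column."""
    # convert_dates=False keeps the timestamp as a plain integer,
    # so the conversion below is explicit.
    df = pd.read_json(source, lines=True, convert_dates=False)
    df["year"] = pd.to_datetime(df["unixReviewTime"], unit="s").dt.year
    return df


reviews = load_reviews(io.StringIO(SAMPLE))

# Mean rating and number of reviews for each product.
per_product = reviews.groupby("asin")["overall"].agg(["mean", "count"])
print(per_product)
```

With the real data, `source` would be the path to the ‘Movies and TV’ review file instead of the in-memory sample; given the size of the dataset, reading it in chunks would probably be necessary.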
Research questions
We would like to answer the following questions about movies and Amazon reviews:
- How have movie ratings on Amazon evolved over time?
- How do TMDB votes correlate with Amazon reviews?
- Do good reviews and ratings lead to high revenues?
- What are the factors and features that affect the ratings and sales of movies?
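The correlation question above could be approached along the following lines. This is a minimal sketch on made-up numbers: it assumes the two sources can be joined on a shared `title` column and that TMDB exposes a `vote_average` column, neither of which is confirmed here. In practice, matching Amazon products to TMDB movies will likely need a more careful linking step.

```python
import pandas as pd

# Toy TMDB-like table; column names and values are illustrative assumptions.
tmdb = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "vote_average": [8.0, 6.0, 7.0, 5.0],
})

# Toy per-movie Amazon summary, e.g. the mean star rating of all reviews.
# Movie "E" has no TMDB counterpart and is dropped by the inner join.
amazon = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "mean_rating": [4.6, 3.9, 4.1, 3.2, 4.8],
})

# Keep only the movies that appear in both sources.
merged = tmdb.merge(amazon, on="title", how="inner")

# Pearson correlation between the TMDB vote and the mean Amazon rating.
r = merged["vote_average"].corr(merged["mean_rating"], method="pearson")
print(f"{len(merged)} matched movies, Pearson r = {r:.3f}")
```

Pearson correlation only captures a linear relationship; since the two rating scales differ, a rank-based measure such as Spearman correlation may be a useful complement. The same merged table could be extended with a revenue column to explore whether good ratings go together with high revenue.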