Based on the ratings given by the users for the movies, compute the similarities between all users [closed]
I have two datasets: ratings and movies.
Rating:-
UserID MovieID Rating Timestamp
0 1 122 5.0 838985046
1 10 185 3.0 838983525
2 2 231 1.0 838983392
3 8 292 5.0 838983421
4 1 316 4.0 838983392
5 5 329 3.0 838983392
6 3 355 2.0 838984474
7 7 356 1.0 838983653
8 6 362 5.0 838984885
9 4 364 2.5 838983707
Movie:-
MovieID Title Genres
0 1 Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy
1 2 Jumanji (1995) Adventure|Children|Fantasy
2 3 Grumpier Old Men (1995) Comedy|Romance
3 4 Waiting to Exhale (1995) Comedy|Drama|Romance
4 5 Father of the Bride Part II (1995) Comedy
5 6 Heat (1995) Action|Crime|Thriller
6 7 Sabrina (1995) Comedy|Romance
7 8 Tom and Huck (1995) Adventure|Children
8 9 Sudden Death (1995) Action
9 10 GoldenEye (1995) Action|Adventure|Thriller
Now I need to find the similarities between all the users, based on the ratings they have given.
Below is what I have done so far:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# Load the ratings; a multi-character separator such as '::' needs the Python parser engine
rating = "ratings.dat"
names_r = ['UserID', 'MovieID', 'Rating', 'Timestamp']
ratings = pd.read_csv(rating, names=names_r, sep='::', engine='python')
# Load the movies the same way
movie = "movies.dat"
names_m = ['MovieID', 'Title', 'Genres']
movies = pd.read_csv(movie, names=names_m, sep='::', engine='python')
# Attach title and genres to each rating, then drop the unused timestamp
merged_df = ratings.merge(movies, on='MovieID')
merged_df = merged_df.drop('Timestamp', axis=1)
After that, I am not sure how to calculate the similarities between all the users.
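One possible direction, as a rough sketch rather than a definitive answer: pivot the ratings into a user-by-movie matrix and pass it to cosine_similarity, which is already imported above. The toy values below are made up for illustration and only reuse the column names from the question. Filling unrated movies with 0 is an assumption, and the names user_movie and user_sim are hypothetical.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy ratings with the same columns as in the question (values are illustrative only).
ratings = pd.DataFrame({
    "UserID":  [1, 1, 2, 2, 3, 3],
    "MovieID": [122, 316, 122, 231, 231, 316],
    "Rating":  [5.0, 4.0, 5.0, 1.0, 2.0, 4.0],
})

# One row per user, one column per movie.
# Assumption: a movie the user has not rated is treated as a rating of 0.
user_movie = ratings.pivot_table(
    index="UserID", columns="MovieID", values="Rating", fill_value=0
)

# Pairwise cosine similarity between the user rows.
sim = cosine_similarity(user_movie.to_numpy())

# Wrap the result so rows and columns are labelled by UserID.
user_sim = pd.DataFrame(sim, index=user_movie.index, columns=user_movie.index)
print(user_sim.round(3))
```

Here user_sim.loc[a, b] is the similarity between users a and b. On the full ratings file the pivoted matrix is mostly zeros, so a sparse representation may be worth considering. Whether zero-filling is appropriate depends on how missing ratings should be interpreted.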