2021-11-29

Based on the ratings given by the users for the movies, compute the similarities between all users [closed]

I have two datasets: rating and movie.

Rating:-

   UserID  MovieID  Rating  Timestamp
0       1      122     5.0  838985046
1      10      185     3.0  838983525
2       2      231     1.0  838983392
3       8      292     5.0  838983421
4       1      316     4.0  838983392
5       5      329     3.0  838983392
6       3      355     2.0  838984474
7       7      356     1.0  838983653
8       6      362     5.0  838984885
9       4      364     2.5  838983707

Movie:-

   MovieID  Title                               Genres
0        1  Toy Story (1995)                    Adventure|Animation|Children|Comedy|Fantasy
1        2  Jumanji (1995)                      Adventure|Children|Fantasy
2        3  Grumpier Old Men (1995)             Comedy|Romance
3        4  Waiting to Exhale (1995)            Comedy|Drama|Romance
4        5  Father of the Bride Part II (1995)  Comedy
5        6  Heat (1995)                         Action|Crime|Thriller
6        7  Sabrina (1995)                      Comedy|Romance
7        8  Tom and Huck (1995)                 Adventure|Children
8        9  Sudden Death (1995)                 Action
9       10  GoldenEye (1995)                    Action|Adventure|Thriller

Now I need to find the similarities between all the users based on the ratings they have given.
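A minimal sketch of one common approach (an assumption, not something stated in the question): pivot the ratings into a user-by-movie matrix and pass its rows to scikit-learn's `cosine_similarity`, which the question already imports. It uses only the ten sample rows from the Rating table above; the names `user_movie` and `user_sim` are illustrative.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# The ten sample rows from the Rating table (Timestamp omitted).
ratings = pd.DataFrame({
    "UserID":  [1, 10, 2, 8, 1, 5, 3, 7, 6, 4],
    "MovieID": [122, 185, 231, 292, 316, 329, 355, 356, 362, 364],
    "Rating":  [5.0, 3.0, 1.0, 5.0, 4.0, 3.0, 2.0, 1.0, 5.0, 2.5],
})

# One row per user, one column per movie; movies a user has not rated become 0,
# so they add nothing to the dot products or to the row norms.
user_movie = ratings.pivot_table(index="UserID", columns="MovieID",
                                 values="Rating", fill_value=0)

# Pairwise cosine similarity between rows (users), labelled by UserID.
sim = cosine_similarity(user_movie.values)
user_sim = pd.DataFrame(sim, index=user_movie.index, columns=user_movie.index)
print(user_sim.round(2))
```

In these ten sample rows no two users rated the same movie, so every off-diagonal value is 0; overlaps only appear on the full data. Cosine on raw ratings does not account for users who rate everything high or low; subtracting each user's mean rating before computing the cosine is a common variant.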

Below is what I have done so far:-

import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# '::' is a multi-character separator, so pandas needs the python parser engine
rating = "ratings.dat"
names_r = ['UserID', 'MovieID', 'Rating', 'Timestamp']
ratings = pd.read_csv(rating, names=names_r, sep='::', engine='python')

movie = "movies.dat"
names_m = ['MovieID', 'Title', 'Genres']
movies = pd.read_csv(movie, names=names_m, sep='::', engine='python')

# attach the title and genres of each movie to every rating
merged_df = ratings.merge(movies, on='MovieID')

# the timestamp is not needed for rating-based similarity
merged_df.drop('Timestamp', axis=1, inplace=True)

After that, I am confused about how to calculate the similarities between all the users.
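One possible continuation, sketched under assumptions: titles and genres are not needed for rating-based similarity, so the matrix can be built from `ratings` directly instead of `merged_df`. Because a full user-by-user matrix for every user in `ratings.dat` may not fit in memory, this sketch stores the ratings in a SciPy sparse matrix and compares one user against all others. The helper names `user_movie_sparse` and `similar_users` are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity


def user_movie_sparse(ratings: pd.DataFrame):
    """Build a sparse user x movie matrix from UserID, MovieID, Rating columns."""
    # factorize maps each distinct ID to a consecutive integer position.
    user_codes, user_ids = pd.factorize(ratings["UserID"])
    movie_codes, movie_ids = pd.factorize(ratings["MovieID"])
    # Assumes each user rated each movie at most once: csr_matrix sums duplicates.
    matrix = csr_matrix(
        (ratings["Rating"].to_numpy(dtype=float), (user_codes, movie_codes)),
        shape=(len(user_ids), len(movie_ids)),
    )
    return matrix, pd.Index(user_ids)


def similar_users(matrix, user_ids, user_id, k=5):
    """Return the k users with the highest cosine similarity to user_id."""
    row = user_ids.get_loc(user_id)
    # One user against everyone: a 1 x n_users dense result.
    sims = cosine_similarity(matrix[row], matrix).ravel()
    sims[row] = -np.inf  # exclude the user's similarity to itself
    top = np.argsort(sims)[::-1][:k]
    return pd.Series(sims[top], index=user_ids[top])
```

With `ratings` loaded as in the question, `matrix, user_ids = user_movie_sparse(ratings)` followed by `similar_users(matrix, user_ids, 1)` would list the five users most similar to user 1.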



from Recent Questions - Stack Overflow https://ift.tt/3l7ukce
https://ift.tt/eA8V8J
