2021-01-28

Why do np.corrcoef(x) and df.corr() give different results?

Why are the NumPy and pandas correlation coefficient matrices different when using np.corrcoef(x) and df.corr()?

import numpy as np
import pandas as pd

x = np.array([[0, 2, 7], [1, 1, 9], [2, 0, 13]]).T
x_df = pd.DataFrame(x)
print("matrix:")
print(x)
print()
print("df:")
print(x_df)
print()

print("np correlation matrix: ")
print(np.corrcoef(x))
print()
print("pd correlation matrix: ")

print(x_df.corr())
print()

This gives me the output:

matrix:
[[ 0  1  2]
 [ 2  1  0]
 [ 7  9 13]]

df:
   0  1   2
0  0  1   2
1  2  1   0
2  7  9  13

np correlation matrix: 
[[ 1.         -1.          0.98198051]
 [-1.          1.         -0.98198051]
 [ 0.98198051 -0.98198051  1.        ]]

pd correlation matrix: 
          0         1         2
0  1.000000  0.960769  0.911293
1  0.960769  1.000000  0.989743
2  0.911293  0.989743  1.000000

I'm guessing they are different types of correlation coefficients?
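
One way to test that guess: both functions compute the Pearson coefficient by default, but np.corrcoef treats each row as a variable (rowvar=True is its default), while DataFrame.corr correlates the columns. The sketch below reuses the same x and checks whether the mismatch is only a matter of orientation. The names by_rows, by_cols and pd_corr are introduced here for illustration.

```python
import numpy as np
import pandas as pd

x = np.array([[0, 2, 7], [1, 1, 9], [2, 0, 13]]).T
x_df = pd.DataFrame(x)

# Default: each ROW of x is one variable (rowvar=True).
by_rows = np.corrcoef(x)

# rowvar=False: each COLUMN of x is one variable, as in DataFrame.corr().
by_cols = np.corrcoef(x, rowvar=False)

# DataFrame.corr() uses method="pearson" by default and works column-wise.
pd_corr = x_df.corr().to_numpy()

print(np.allclose(by_cols, pd_corr))                    # column-wise vs pandas
print(np.allclose(by_rows, x_df.T.corr().to_numpy()))   # row-wise vs pandas on the transpose
```

If both comparisons print True, the two libraries agree on the coefficient itself and differ only in which axis they treat as the variables. Either np.corrcoef(x, rowvar=False) or x_df.T.corr() would then bring the two results in line.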



from Recent Questions - Stack Overflow https://ift.tt/2KYnk2O
https://ift.tt/eA8V8J
