2020-10-29

How to run linear regression on a masked array

I am trying to run a linear regression on two masked arrays. Unfortunately, LinearRegression ignores the masks and regresses over every value. My data contains some -9999 values where our instrument did not record any data, and these -9999 values produce a fitted line that does not fit the data at all.

My code is this:

from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt

x = np.array([2.019, 1.908, 1.902, 1.924, 1.891, 1.882, 1.873, 1.875, 1.904,
              1.886, 1.891, 2.0, 1.902, 1.947, 2.0280, 1.95, 2.342, 2.029,
              2.086, 2.132, 2.365, 2.169, 2.121, 2.192, 2.23, -9999, -9999,
              -9999, -9999, 1.888, 1.882, 2.367]).reshape((-1, 1))
y = np.array([0.221, 0.377, 0.367, 0.375, 0.258, 0.16, 0.2, 0.811,
              0.330, 0.407, 0.421, -9999, 0.605, 0.509, 1.126, 0.821,
              0.759, 0.812, 0.686, 0.666, 1.035, 0.436, 0.753, 0.611,
              0.657, 0.335, 0.231, 0.185, 0.219, 0.268, 0.332, 0.729])

model = LinearRegression().fit(x, y)

r_sq = model.score(x, y)

print('coefficient of determination:', r_sq)
print('intercept:', model.intercept_)
print('slope:', model.coef_)

x_line = np.linspace(x.min(), x.max(), 11000)
y_line = (model.coef_ * x_line) + model.intercept_

fig, ax1 = plt.subplots(figsize=(10, 10))
plt.scatter(x, y)
plt.plot(x_line, y_line)
plt.show()

This gives us the scatter plot with the regression line plotted. Note: most of the values are in the upper right-hand corner; they're too close together to differentiate.

Is there a way to run the regression while ignoring the masked -9999 values?
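
A minimal sketch of one possible approach, continuing from the x and y arrays above and assuming -9999 is the only flag for missing data (the names valid, x_valid and y_valid are just illustrative): build a single boolean mask that is True only where both x and y hold real measurements and fit on those rows, since scikit-learn's LinearRegression works on the raw values and does not look at numpy.ma masks.

# keep only the rows where neither x nor y holds the -9999 sentinel
valid = (x.ravel() != -9999) & (y != -9999)

x_valid = x[valid]   # boolean row selection keeps the (n, 1) column shape
y_valid = y[valid]

model = LinearRegression().fit(x_valid, y_valid)

print('coefficient of determination:', model.score(x_valid, y_valid))
print('intercept:', model.intercept_)
print('slope:', model.coef_)

If the data really needs to stay as masked arrays, np.ma.masked_values(x, -9999) would mark the sentinels, but the underlying values would still have to be filtered out (for example with the combined mask above) before calling fit; compressing x and y separately could misalign the rows when they are masked in different places, which is why a single combined mask seems safer.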



from Recent Questions - Stack Overflow https://ift.tt/31TeQPI
