2022-05-24

Machine Learning Question on missing values in training and test data

I'm training a text classifier for binary classification. Both my training .csv file and my test .csv file contain null values in the text column. I have loaded both files into pandas DataFrames. The rows with null text are a small share of the overall data (less than 0.01).

Knowing this, is it better to replace the null text fields with an empty string or to leave them as null? And if the answer is to replace them with an empty string, is it "acceptable" to do the same for the test .csv file before running it against the model?
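For concreteness, here is a minimal sketch of the "replace with empty string" option, applied identically to both files. The column names (`text`, `label`, `id`), the inline sample rows, and the helper `fill_missing_text` are all hypothetical stand-ins, not my real data. The step is a fixed rule that does not look at labels or at statistics of the test set, so it is the same transformation on both sides.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the real training and test .csv files.
train_csv = io.StringIO("text,label\ngood product,1\n,0\nbad service,0\n")
test_csv = io.StringIO("id,text\n1,great value\n2,\n")


def fill_missing_text(df, column="text"):
    """Return a copy of df with nulls in `column` replaced by an empty string."""
    out = df.copy()
    out[column] = out[column].fillna("")
    return out


# Apply the same preprocessing to both DataFrames.
train = fill_missing_text(pd.read_csv(train_csv))
test = fill_missing_text(pd.read_csv(test_csv))

# Count the nulls that remain in the text column of each DataFrame.
print(train["text"].isna().sum(), test["text"].isna().sum())
```

The alternative for the training file would be `dropna(subset=["text"])`, but that is usually not an option for the test file if every row needs a prediction.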


