2021-12-27

Xgboost multiclass classification map the probabilities to the labels

I am using the xgboost multiclass classifier as outlined in the example below. For each row in the X_test dataframe the model outputs a list with the list elements being the probability corresponding to each category 'a','b','c' or 'd' e.g. [0.44767836 0.2043365 0.15775423 0.19023092].

How can I tell which element in the list corresponds to which class / cateogry (a,b,c or d)? My goal is to create 4 extra columns on the dataframe a,b,c,d with the matching probability as the row value in each column.

import numpy as np
import pandas as pd
import xgboost as xgb
import random
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

#Create Example Data
np.random.seed(312)
data = np.random.random((10000, 3))
y = [random.choice('abcd') for _ in range(data.shape[0])]

features = ["x1", "x2", "x3"]
df = pd.DataFrame(data=data, columns=features)
df['y']=y

#Encode target variable
labelencoder = preprocessing.LabelEncoder()
df['y_target'] = labelencoder.fit_transform(df['y'])
    
#Train Test Split    
X_train, X_test, y_train, y_test = train_test_split(df[features], df['y_target'], test_size=0.2, random_state=42, stratify=y)

#Train Model
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

param = {        'objective':'multi:softprob',
                 'random_state': 20,
                 'tree_method': 'gpu_hist',
                 'num_class':4
                }

xgb_model = xgb.train(param, dtrain, 100)

predictions=xgb_model.predict(dtest)

print(predictions)


from Recent Questions - Stack Overflow https://ift.tt/3H9mtDD
https://ift.tt/eA8V8J

No comments:

Post a Comment