XGBoost multiclass classification: mapping the probabilities to the labels

By Ritesh Sahu - December 27, 2021

I am using the XGBoost multiclass classifier as outlined in the example below. For each row in the X_test dataframe, the model outputs a list of probabilities, one for each category 'a', 'b', 'c' or 'd', e.g. [0.44767836 0.2043365 0.15775423 0.19023092].

How can I tell which element in the list corresponds to which class / category (a, b, c or d)? My goal is to add four extra columns a, b, c and d to the dataframe, each holding the matching probability for that row.

import numpy as np
import pandas as pd
import xgboost as xgb
import random
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Create example data
np.random.seed(312)
random.seed(312)  # y is drawn with the random module, which needs its own seed
data = np.random.random((10000, 3))
y = [random.choice('abcd') for _ in range(data.shape[0])]

features = ["x1", "x2", "x3"]
df = pd.DataFrame(data=data, columns=features)
df['y'] = y

# Encode target variable
labelencoder = preprocessing.LabelEncoder()
df['y_target'] = labelencoder.fit_transform(df['y'])

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(df[features], df['y_target'], test_size=0.2, random_state=42, stratify=y)

# Train model
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

param = {
    'objective': 'multi:softprob',
    'random_state': 20,
    'tree_method': 'gpu_hist',  # requires a GPU-enabled XGBoost build
    'num_class': 4
}

xgb_model = xgb.train(param, dtrain, 100)

# One row per test sample, one probability per class
predictions = xgb_model.predict(dtest)

print(predictions)
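
One possible way to do the mapping, offered as a minimal sketch rather than a definitive answer: with 'multi:softprob', column i of the prediction array should correspond to integer label i, and LabelEncoder assigns those integers in the sorted order stored in labelencoder.classes_. The sketch below assumes a small hand-written predictions array and a two-row X_test as stand-ins for the real model output, so it runs without XGBoost or a GPU. The "predicted" column is an illustrative extra, not part of the original goal.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Fit the encoder on the raw labels, as in the question.
# classes_ holds the original labels in the order of their integer codes.
labelencoder = LabelEncoder()
labelencoder.fit(["b", "d", "a", "c", "a"])

# Stand-in (assumption) for xgb_model.predict(dtest):
# one row per sample, one column per encoded class 0..3.
predictions = np.array([
    [0.44767836, 0.2043365, 0.15775423, 0.19023092],
    [0.10, 0.20, 0.30, 0.40],
])

# Stand-in (assumption) for the X_test produced by train_test_split,
# which keeps the shuffled original index.
X_test = pd.DataFrame(
    {"x1": [0.1, 0.2], "x2": [0.3, 0.4], "x3": [0.5, 0.6]},
    index=[7, 3],
)

# Name the probability columns after the original labels and reuse
# X_test's index so the rows line up when joined.
proba = pd.DataFrame(predictions, columns=labelencoder.classes_, index=X_test.index)
result = X_test.join(proba)

# Illustrative extra: the most likely class, decoded back to its label.
result["predicted"] = labelencoder.inverse_transform(predictions.argmax(axis=1))

print(result)
```

Passing index=X_test.index matters because train_test_split shuffles the rows; building the probability frame with a default 0..n-1 index would misalign the join.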

from Recent Questions - Stack Overflow https://ift.tt/3H9mtDD