2023-05-29

Training a BARTForSequenceClassification returns data with ununiform dimentsions

I am trying to fine-tune a BART-base model on a dataset that I have. The dataset looks like this: It has columns "id", "text", "label" and "dataset_id". The "text" column is what I want to use as inputs to the model, and it is plain text. "label" is a value of either 0 or 1.

I've already written the code for Training, using transfomers==4.28.0.

This is the code for the dataset class:

class TextDataset(Dataset):
    def __init__(self, encodings):
        self.encodings = encodings

    def __getitem__(self, idx):
        return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}

    def __len__(self):
        return len(self.encodings['input_ids'])

This is the code for loading and encoding of the data:

def load_data(directory):
    files = os.listdir(directory)
    dfs = []
    for file in files:
        if file.endswith('train.csv'):
            df = pd.read_csv(os.path.join(directory, file))
            dfs.append(df)
    return pd.concat(dfs, ignore_index=True)
print(len(load_data("splitted_data/gender-bias")))

def encode_data(tokenizer, text, labels):
    inputs = tokenizer(text, padding="max_length", truncation=True, max_length=128, return_tensors="pt")
    inputs['labels'] = torch.tensor(labels)
    return inputs

This is the code for the metrics for evaluation. I use the f1_score function from scikit.

def compute_metrics(eval_pred):
    logits = eval_pred.predictions
    labels = eval_pred.label_ids
    predictions = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, predictions)}

This is the training function:

def train_model(train_dataset, eval_dataset):
    # Define the training arguments
    training_args = TrainingArguments(
        output_dir='./baseline/results',           # output directory
        num_train_epochs=5,               # total number of training epochs
        per_device_train_batch_size=32,   # batch size per device during training
        per_device_eval_batch_size=64,    # batch size for evaluation
        warmup_steps=500,                 # number of warmup steps for learning rate scheduler
        weight_decay=0.01,                # strength of weight decay
        evaluation_strategy="steps",      # evaluation is done at each training step
        eval_steps=50,                    # number of training steps between evaluations
        load_best_model_at_end=True,      # load the best model when finished training (defaults to `False`)
        save_strategy='steps',            # save the model after each training step
        save_steps=500,                   # number of training steps between saves
        metric_for_best_model='f1',       # metric to use to compare models
        greater_is_better=True            # whether a larger metric value is better
    )

    # Define the trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics
    )

    # Train the model
    trainer.train()

    return trainer

This is how I defined the model and etc.

model = BartForSequenceClassification.from_pretrained('facebook/bart-base', num_labels=2)
tokenizer = BartTokenizer.from_pretrained('facebook/bart-base')

train_df = load_data("splitted_data/gender-bias")
train_encodings = encode_data(tokenizer, train_df['text'].tolist(), train_df['label'].tolist())

# For simplicity, let's split our training data to create a pseudo-evaluation set
train_size = int(0.9 * len(train_encodings['input_ids']))  # 90% for training
train_dataset = {k: v[:train_size] for k, v in train_encodings.items()}
print(train_dataset)
print(len(train_dataset))

eval_dataset = {k: v[train_size:] for k, v in train_encodings.items()}  # 10% for evaluation

# Convert the dictionary data to PyTorch Dataset
train_dataset = TextDataset(train_dataset)
eval_dataset = TextDataset(eval_dataset)

trainer = train_model(train_dataset, eval_dataset)

The training looks just fine. However, when it comes to evaluation during training, an error is raised from my compute_metrics function, which takes a parameter as the output of the model. The model should be a binary classification model, returning the probabilistic of each label in its output I believe.

np.argmax(np.array(logits), axis=-1) 21 
ValueError: could not broadcast input array from shape (3208,2) into shape (3208,)

I've tried to output the type of the logits, and it turns out that type(logits) return Tuple. Considering that this might be caused the fact that evaluation dataset might be split into batches, and the returned Tuple is a number of separate numpy arrays, I've also tried to concatenate the tuple.

def compute_metrics(eval_pred):
    logits = eval_pred.predictions
    labels = eval_pred.label_ids
    logits = np.concatenate(logits, axis=0)
    predictions = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, predictions)}

But this raised a new error:

packages/numpy/core/overrides.py in concatenate(*args, **kwargs) 

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 3 dimension(s)

How can I solve this issue?



No comments:

Post a Comment