2023-11-11

CUDA issue with NER (Named Entity Recognition) for ML predictions

I'm attempting to use NamedEntityRecognition (NER) (https://github.com/dotnet/machinelearning/issues/630) to predict categories for words and phrases within a large body of text.

I'm currently using three NuGet packages to try to get this working:

Microsoft.ML (3.0.0-preview.23511.1)

Microsoft.ML.TorchSharp (0.21.0-preview.23511.1)

TorchSharp-cpu (0.101.1)

At the point of training the model [estimator.Fit(dataView)], I get the following error:

Field not found: 'TorchSharp.torch.CUDA'.

I may have misunderstood something here, but I should be processing on the CPU with the TorchSharp-cpu package, and I'm not sure where the CUDA reference is coming from. The error also seems to point at a package reference rather than a field.

using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.TorchSharp;
using System;
using System.Collections.Generic;
using System.Windows.Forms;

namespace NerTester
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        // One training row: a sentence and one label per word.
        private class TestSingleSentenceData
        {
            public string Sentence;
            public string[] Label;
        }

        // Holds one entry of the label vocabulary used as key data.
        private class Label
        {
            public string Key { get; set; }
        }

        private void startButton_Click(object sender, EventArgs e)
        {
            try
            {
                // Attempt to force CPU execution instead of GPU (CUDA).
                var context = new MLContext();
                context.FallbackToCpu = true;
                context.GpuDeviceId = null;

                var labels = context.Data.LoadFromEnumerable(
                    new[]
                    {
                        new Label { Key = "PERSON" },
                        new Label { Key = "CITY" },
                        new Label { Key = "COUNTRY" }
                    });

                var dataView = context.Data.LoadFromEnumerable(
                    new List<TestSingleSentenceData>(new TestSingleSentenceData[]
                    {
                        new TestSingleSentenceData()
                        {
                            // Short sample sentence: seven words, seven labels.
                            Sentence = "Alice and Bob live in the USA",
                            Label = new string[] { "PERSON", "0", "PERSON", "0", "0", "0", "COUNTRY" }
                        },
                        new TestSingleSentenceData()
                        {
                            Sentence = "Alice and Bob live in the USA",
                            Label = new string[] { "PERSON", "0", "PERSON", "0", "0", "0", "COUNTRY" }
                        },
                    }));

                var chain = new EstimatorChain<ITransformer>();
                var estimator = chain.Append(context.Transforms.Conversion.MapValueToKey("Label", keyData: labels))
                    .Append(context.MulticlassClassification.Trainers.NameEntityRecognition(outputColumnName: "outputColumn"))
                    .Append(context.Transforms.Conversion.MapKeyToValue("outputColumn"));

                // The exception is thrown here, when training starts.
                var transformer = estimator.Fit(dataView);
                transformer.Dispose();

                MessageBox.Show("Success!");
            }
            catch (Exception ex)
            {
                MessageBox.Show($"Error: {ex.Message}");
            }
        }
    }
}

The application is running on x64, and the documentation for NER appears to be limited.

Any help would be greatly appreciated.

What I have tried so far:

I tried changing the NuGet packages I'm referencing, including the use of libtorch packages.

I attempted running the application in both x86 and x64 configurations.

I added code to try to force CPU usage rather than GPU (CUDA).
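
A "Field not found" message is the text of a MissingFieldException, which generally means one assembly was compiled against a different version of a dependency than the one loaded at run time. As a possible diagnostic, the sketch below compares the TorchSharp version that Microsoft.ML.TorchSharp references with the TorchSharp version that actually loads. It assumes the assembly simple names are "Microsoft.ML.TorchSharp" and "TorchSharp"; the class and method names are hypothetical, and assembly versions may not match NuGet package versions exactly.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper, not part of ML.NET or TorchSharp.
public static class AssemblyVersionCheck
{
    // Returns the version of `dependencyName` that `consumer` was compiled
    // against, or null if `consumer` does not reference that assembly.
    public static Version ReferencedVersion(Assembly consumer, string dependencyName)
    {
        return consumer.GetReferencedAssemblies()
            .FirstOrDefault(r => r.Name == dependencyName)?.Version;
    }

    // Assumption: both assemblies can be loaded by their simple names
    // from the application's output directory.
    public static string Describe()
    {
        Assembly consumer = Assembly.Load("Microsoft.ML.TorchSharp");
        Assembly loaded = Assembly.Load("TorchSharp");

        Version expected = ReferencedVersion(consumer, "TorchSharp");
        Version actual = loaded.GetName().Version;

        return $"Microsoft.ML.TorchSharp references TorchSharp {expected}; loaded TorchSharp is {actual}.";
    }
}
```

Calling `MessageBox.Show(AssemblyVersionCheck.Describe());` before `estimator.Fit(dataView)` would show both versions. If they differ, a version mismatch between the two packages is a plausible cause, though this check alone does not prove it.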
