2023-03-23

Metrics to compare two sets of 1D points

I'm training an AI model to predict where to place train stations along a train track.

I want to feed the model some information about the train track and have it generate a series of points A, B, C, ... marking where stations should be placed along that track.

A prediction P could look like:

=====A========================B===============================C==========

I also have a ground truth T for my training examples. For instance, that could look like:

=========A====================B=========C=============================D==

What I need now is a metric to evaluate the model's predictions against the ground truth.

I've been thinking about possible solutions, but none of the usual candidates seem to fit this problem. Some ideas I've considered are:

Creating a custom F1 score

For each point in P, take the distance to the closest point in T (a precision-like term). Likewise, for each point in T, take the distance to the closest point in P (a recall-like term). A point may be matched more than once (for instance, T(B) and T(C) would both be matched against P(B)).
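A minimal Python sketch of the two directed terms, assuming stations are given as numeric positions along the track. The function names and the coordinates are made up for illustration, and the sketch stops at mean distances, because turning them into a bounded F1-style score would need a distance-to-score conversion that is not specified here.

```python
from bisect import bisect_left

def nearest_distances(sources, targets):
    """For each point in `sources`, the distance to the closest point in `targets`."""
    if not targets:
        raise ValueError("targets must contain at least one point")
    ordered = sorted(targets)
    distances = []
    for x in sources:
        i = bisect_left(ordered, x)
        # The closest target is one of the two neighbours around the insertion index.
        candidates = ordered[max(i - 1, 0):i + 1]
        distances.append(min(abs(x - t) for t in candidates))
    return distances

def directed_means(pred, truth):
    """Return (mean P->T distance, mean T->P distance); lower is better for both."""
    p_to_t = nearest_distances(pred, truth)   # precision-like term
    t_to_p = nearest_distances(truth, pred)   # recall-like term
    return sum(p_to_t) / len(p_to_t), sum(t_to_p) / len(t_to_p)

# Hypothetical coordinates, loosely modelled on the diagrams above.
P = [5, 30, 62]
T = [9, 30, 40, 70]
print(*directed_means(P, T))  # 4.0 5.5
```

Duplicates are allowed here by construction: both 30 and 40 in T are matched against 30 in P.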

This suffers from a small problem, however. Consider an alternative P:

=====A====================B=C=D=E=F===========================G==========

In this case, precision and recall would remain more or less the same. However, I would like the score to reflect the fact that many more points than expected have been placed along the track.
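To check this on the diagrams themselves, the rough Python sketch below reads station positions off the ASCII drawings and prints both directed mean distances next to the difference in point counts. Treating one character as one unit of track is an assumption, and the helper names are hypothetical.

```python
# Diagrams copied from the post; one character = one unit of track (assumed).
TRUTH_DIAGRAM = "=========A====================B=========C=============================D=="
PRED_DIAGRAM = "=====A========================B===============================C=========="
ALT_PRED_DIAGRAM = "=====A====================B=C=D=E=F===========================G=========="

def positions(diagram):
    """Indices of the station letters in an ASCII track diagram."""
    return [i for i, ch in enumerate(diagram) if ch != "="]

def mean_nearest(sources, targets):
    """Mean distance from each source point to its closest target point."""
    return sum(min(abs(s - t) for t in targets) for s in sources) / len(sources)

truth = positions(TRUTH_DIAGRAM)
for name, diagram in [("P", PRED_DIAGRAM), ("alternative P", ALT_PRED_DIAGRAM)]:
    pred = positions(diagram)
    print(name,
          "| P->T:", round(mean_nearest(pred, truth), 2),
          "| T->P:", round(mean_nearest(truth, pred), 2),
          "| extra points:", len(pred) - len(truth))
```

Both distance terms are averages, so neither carries the point count; the last column is the part a combined score would still need to account for.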

Metrics for Time Series

While researching metrics, I also looked at the time series domain in case it had something for a similar problem, but I found nothing that closely resembles this scenario.

Things that the score should reflect

  1. T has more points than P.
  2. P has more points than T.
  3. The distances from predicted points to ground-truth points, and vice versa.

I don't want to reinvent the wheel, and this seems like a (possibly) common scenario.

Is there anything I might be missing?


