Gremlin: Shortest logistical route between A and B while respecting schedules + other constraints
Preface
- I'm new to Gremlin and working through Kelvin Lawrence's awesome eBook on the topic in order to solve a specific use-case.
- Due to the sheer amount to learn, I'm asking this question to get recommendations on how I might approach the challenge so that, as I read the eBook, I'll better know the sections to which to pay extra attention.
- I intend to use AWS Neptune in the pursuit of solving this, so I tagged that topic as well.
Question
Respecting departure/arrival times of legs + other constraints, can the shortest path (the real-world, logistical meaning of "path") between origin and destination be "queried" (i.e., can I use the Gremlin console with a single statement)? Or is the use-case of such complexity that I will effectively need to write a program to accomplish it?
Use-Case / Detail
I hope to answer the question:
Starting at
ORIGIN
onDAY
, can I get toDESTINATION
while respecting[CONDITIONS]
?
The good news is that I only need a true/false response (so limit(1)
?) and a lack of a result (e.g., []
) suffices for "no".
What are the conditions?
- Flight schedules need to be respected. Instead of simple flight routes (i.e., a connection exists between BOSton and DALlas), I have actual flight schedules (i.e., on Wednesday, 9 Nov 2022 at 08:40, flight XYZ will depart BOSton and then arrive DALlas at 13:15) ... consequently, if/when there are connections, I need to respect arrival and departure times + some sort of buffer (i.e., a path for which a Traveler would arrive at 13:05 and depart on another leg at 13:06 isn't actually a valid path);
- Aggregate travel time / cost limits. The answer to the question needs to be "No" if a path's aggregate travel time or aggregate cost exceeds specified limits. (Here, I believe I'll need to use
sack()
to track the cost - financial and time - of each leg and bail out of therepeat() until
loop when either is hit?)
I apologize b/c I know this isn't a good StackOverflow question, since it's not technically specific -- my hope is that, at least, some specific technical recommendations might result.
The use-case seems like the varsity / pro version of the flight routes example presented in the eBook, which is perfect for someone brand-new to Gremlin ... 😅
Updates / Follow-up Questions
UPDATE 1
Thank you very much, @Kelvin Lawrence, for responding A few questions as I think the data model through:
- Do you happen to know what is meant by
ORD_FLIGHT
in the edges betweenAirportDay
andFlight
vertices in the second blog post? - Since the aggregate travel time (i.e., flight duration(s) + layover(s)) is an important, limiting condition, would it be more efficient to add
Duration
as a property of aFlight
(i.e., calculate it during data ingestion)? Or is it better to use theDeparts
andArrives
properties, at query-time, to calculateDuration
? - Along the lines of Q2, what are your thoughts on adding the best-case (i.e., fastest) duration to the "first" graph (i.e., Flight Routes)? Thinking: could apply a "naive travel time limit" to the first traversal, disqualifying candidate
DESTINATION
s that are clearly unreachable time-wise -- winnowing down the set of paths for the second traversal.
UPDATE 2
- Here's my first attempt at the data model (notes continue after the break):
- First, I've generalized Airports ==> Ports and Flights ==> Voyages, because my use-case could grow to encompass more than flights (i.e., rail and sea transport could be a thing).
- Second (and more importantly), reflecting on maxdemarzi's blog post, I believe his model creates a challenge that is never addressed. Specifically, his proposed model effectively changes the motivating question from:
I want to depart
Airport
A onDay
D and arriveAirport
B -- what are the best routing options?
to:
I want to depart on
AirportDay
A -- what are the best routing options toAirportDay
B?
The issue with this approach is that it's invalid to assume the Day
component of the AirportDay
vertex identifier will be the same -- doing so will constrain the search to same-day paths.
To illustrate: How would that model discover a path that includes the gain (or loss) of a day due to crossing the International Date Line -or- "crossing midnight" (e.g., depart Boston 5 Jan @ 2030, arrive Barcelona 6 Jan @ 0900)?
- I'm not sure my solution is implementable (so please validate, as this is very important), but I propose a
dayChange
property on theVoyage
node. This will usually be 0, occasionally -1 or +1, and (for multi-day duration Voyages) could be up to +N. My thinking is that this could be used during traversal to dynamically determine "target" vertices:
Example: I want to depart BOS on 5-Jan and get to BCN -- what are the best routing options?
- Start traversal at
AirportDay
node id: BOS-5JAN - Get outbound edges
- One
Voyage
node hasdayChange
== +1 - Determine if the
Voyage
is connected to BCN-6JAN (BCN == known destination, 6JAN is calculated by adding the +1 to the known start date) - If BCN-6JAN has an inbound edge, then one path has been found
The ability to dynamically build the target vertex (BCN
+ 6JAN
) is the crux of the proposed solution.
Comments
Post a Comment