Gremlin: Shortest logistical route between A and B while respecting schedules + other constraints
Preface
- I'm new to Gremlin and working through Kelvin Lawrence's awesome eBook on the topic in order to solve a specific use-case.
- Due to the sheer amount to learn, I'm asking this question to get recommendations on how I might approach the challenge so that, as I read the eBook, I'll better know the sections to which to pay extra attention.
- I intend to use AWS Neptune in the pursuit of solving this, so I tagged that topic as well.
Question
Respecting departure/arrival times of legs + other constraints, can the shortest path (the real-world, logistical meaning of "path") between origin and destination be "queried" (i.e., can I use the Gremlin console with a single statement)? Or is the use-case of such complexity that I will effectively need to write a program to accomplish it?
Use-Case / Detail
I hope to answer the question:
Starting at
ORIGINonDAY, can I get toDESTINATIONwhile respecting[CONDITIONS]?
The good news is that I only need a true/false response (so limit(1)?) and a lack of a result (e.g., []) suffices for "no".
What are the conditions?
- Flight schedules need to be respected. Instead of simple flight routes (i.e., a connection exists between BOSton and DALlas), I have actual flight schedules (i.e., on Wednesday, 9 Nov 2022 at 08:40, flight XYZ will depart BOSton and then arrive DALlas at 13:15) ... consequently, if/when there are connections, I need to respect arrival and departure times + some sort of buffer (i.e., a path for which a Traveler would arrive at 13:05 and depart on another leg at 13:06 isn't actually a valid path);
- Aggregate travel time / cost limits. The answer to the question needs to be "No" if a path's aggregate travel time or aggregate cost exceeds specified limits. (Here, I believe I'll need to use
sack()to track the cost - financial and time - of each leg and bail out of therepeat() untilloop when either is hit?)
I apologize b/c I know this isn't a good StackOverflow question, since it's not technically specific -- my hope is that, at least, some specific technical recommendations might result.
The use-case seems like the varsity / pro version of the flight routes example presented in the eBook, which is perfect for someone brand-new to Gremlin ... 😅
Updates / Follow-up Questions
UPDATE 1
Thank you very much, @Kelvin Lawrence, for responding A few questions as I think the data model through:
- Do you happen to know what is meant by
ORD_FLIGHTin the edges betweenAirportDayandFlightvertices in the second blog post? - Since the aggregate travel time (i.e., flight duration(s) + layover(s)) is an important, limiting condition, would it be more efficient to add
Durationas a property of aFlight(i.e., calculate it during data ingestion)? Or is it better to use theDepartsandArrivesproperties, at query-time, to calculateDuration? - Along the lines of Q2, what are your thoughts on adding the best-case (i.e., fastest) duration to the "first" graph (i.e., Flight Routes)? Thinking: could apply a "naive travel time limit" to the first traversal, disqualifying candidate
DESTINATIONs that are clearly unreachable time-wise -- winnowing down the set of paths for the second traversal.
UPDATE 2
- Here's my first attempt at the data model (notes continue after the break):
- First, I've generalized Airports ==> Ports and Flights ==> Voyages, because my use-case could grow to encompass more than flights (i.e., rail and sea transport could be a thing).
- Second (and more importantly), reflecting on maxdemarzi's blog post, I believe his model creates a challenge that is never addressed. Specifically, his proposed model effectively changes the motivating question from:
I want to depart
AirportA onDayD and arriveAirportB -- what are the best routing options?
to:
I want to depart on
AirportDayA -- what are the best routing options toAirportDayB?
The issue with this approach is that it's invalid to assume the Day component of the AirportDay vertex identifier will be the same -- doing so will constrain the search to same-day paths.
To illustrate: How would that model discover a path that includes the gain (or loss) of a day due to crossing the International Date Line -or- "crossing midnight" (e.g., depart Boston 5 Jan @ 2030, arrive Barcelona 6 Jan @ 0900)?
- I'm not sure my solution is implementable (so please validate, as this is very important), but I propose a
dayChangeproperty on theVoyagenode. This will usually be 0, occasionally -1 or +1, and (for multi-day duration Voyages) could be up to +N. My thinking is that this could be used during traversal to dynamically determine "target" vertices:
Example: I want to depart BOS on 5-Jan and get to BCN -- what are the best routing options?
- Start traversal at
AirportDaynode id: BOS-5JAN - Get outbound edges
- One
Voyagenode hasdayChange== +1 - Determine if the
Voyageis connected to BCN-6JAN (BCN == known destination, 6JAN is calculated by adding the +1 to the known start date) - If BCN-6JAN has an inbound edge, then one path has been found
The ability to dynamically build the target vertex (BCN + 6JAN) is the crux of the proposed solution.

Comments
Post a Comment