How to get last non null values and aggregated non null values by timestamp from event sourcing JSONB column in PostgreSQL?
I'm working with event sourced data where all of the important fields are combined to a JSONB column and many of the keys are missing from most of the database rows.
I would want to get:
- Aggregated combined values of included arrays in the JSONB field (see ingredients in the example)
- Latest non-null value according to timestamp
I tried this on my own and I was able to produce a example which generates exactly what I would want to achieve here but it looks rather ugly and I'm wondering how to make following query better.
Schema (PostgreSQL v15)
CREATE TABLE events (
id SERIAL PRIMARY KEY,
identifier VARCHAR(255),
timestamp TIMESTAMP WITH TIME ZONE,
event_data JSONB
);
INSERT INTO events (identifier, timestamp, event_data)
VALUES
('12345', '2019-01-01T00:00:00.000Z', '{"target": "99999999"}'),
('12345', '2019-01-01T12:00:00.000Z', '{"ingredients": ["Banana", "Strawberry"]}'),
('12345', '2019-01-03T00:00:00.000Z', '{"target": "12345678", "user": "peterpan"}'),
('12345', '2019-01-04T00:00:00.000Z', '{"ingredients": ["Melon"], "user": "robinhood"}'),
('67890', '2019-01-03T00:00:00.000Z', '{"target": "0000", "user": "mickeymouse"}'),
('67890', '2019-01-04T00:00:00.000Z', '{"ingredients": ["Potato"]}');
Query #1
WITH events_flattened AS (
SELECT
identifier,
timestamp,
event_data->>'target' AS target,
event_data->>'user' AS user,
elem.part
FROM events
LEFT JOIN LATERAL jsonb_array_elements(event_data->'ingredients') elem(part) ON true
ORDER BY timestamp DESC
)
SELECT
identifier,
(ARRAY_REMOVE(ARRAY_AGG(e.target),NULL))[1] as target,
(ARRAY_REMOVE(ARRAY_AGG(e.user),NULL))[1] as user,
ARRAY_REMOVE(ARRAY_AGG(part),NULL) as ingredients,
MAX(timestamp) as latest_update
FROM events_flattened e
GROUP BY identifier;
For the answer to be helpful it should produce exactly same result as seen on this table:
identifier | target | user | ingredients | latest_update |
---|---|---|---|---|
12345 | 12345678 | robinhood | Melon,Strawberry,Banana | 2019-01-04T00:00:00.000Z |
67890 | 0000 | mickeymouse | Potato | 2019-01-04T00:00:00.000Z |
I'm trying to determine which kind of query and indexes would be beneficial to get exactly this kind of data out from this table?
Comments
Post a Comment