2022-03-26

Interacting regressors in a BQ ML Linear Regression Model

I'm trying to work out how to get two regressors to interact in a BigQuery ML (BQML) linear regression model.

In the example below (apologies for the rough example data!), I'm trying to predict total_hire_duration from trip_count and the month of the year. BQML treats the month (a STRING column, which it one-hot encodes) as an additive constant in the linear regression equation, but I actually want its effect to grow with trip_count. For my real dataset I can't just supply the timestamp, as BQML seems to over-parameterise.

I should add that if I supply month as a numeric value, I just get a single coefficient, which doesn't work for my dataset (the patterns follow parts of the academic year rather than the calendar).

If the month part is a constant, then as trip_count gets very large, the constant b in the equation y = ax + b becomes inconsequential. What I want is something like y = ax + bx + c, where x is trip_count, a is its coefficient, and b is a coefficient whose value depends on the month.
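To make the difference concrete, write x for trip_count, m for the month of a given day, and b_m for one coefficient per month (β_m below is just shorthand introduced here, not BQML output). The model BQML fits with a STRING month, and the model I'm after, are:

$$
\begin{aligned}
\text{current (additive month):}\quad & y = a\,x + b_{m} + c \\
\text{wanted (interaction):}\quad & y = a\,x + b_{m}\,x + c = (a + b_{m})\,x + c = \beta_{m}\,x + c
\end{aligned}
$$

Since a and every b_m can't all be estimated separately (only their sums a + b_m are identifiable), the wanted model amounts to one slope β_m per month plus a shared intercept c, which is what R's tripCount:month term gives when there is no separate tripCount main effect.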

This is quite easy to do in R; I'd just run glm(bike$totalHireDuration ~ bike$tripCount:bike$month).

Here's a reproducible example using the public San Francisco bikeshare data:

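-- my_dataset is a placeholder; BQML models live inside a dataset
CREATE OR REPLACE MODEL
  my_dataset.my_model_name OPTIONS (model_type='linear_reg',
    input_label_cols=['total_hire_duration']) AS (
  SELECT
    -- month as a STRING, so BQML one-hot encodes it (additive effect only)
    CAST(EXTRACT(MONTH FROM trip_date) AS STRING) month,
    trip_count,
    total_hire_duration
  FROM (
    -- aggregate to one row per day; the table has no `date` column,
    -- so group on DATE(start_date) instead
    SELECT
      DATE(start_date) trip_date,
      COUNT(*) trip_count,
      SUM(duration_sec) total_hire_duration
    FROM
      `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
    GROUP BY
      trip_date))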
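One possible workaround, sketched here rather than tested against my real data: build the interaction columns by hand, the way R's model matrix does for tripCount:month. Each column equals trip_count on days in that month and 0 otherwise, so BQML has to learn a separate slope per month, with its usual intercept acting as c. The model name my_dataset.my_interaction_model and the trips_* column names are hypothetical placeholders.

```sql
-- Sketch: one trip_count slope per month, mimicking R's tripCount:month.
-- my_dataset.my_interaction_model and trips_* are placeholder names.
CREATE OR REPLACE MODEL
  my_dataset.my_interaction_model OPTIONS (model_type='linear_reg',
    input_label_cols=['total_hire_duration']) AS (
  SELECT
    total_hire_duration,
    -- each column is trip_count in its own month and 0 in every other month
    IF(month_num = 1, trip_count, 0) trips_jan,
    IF(month_num = 2, trip_count, 0) trips_feb,
    IF(month_num = 3, trip_count, 0) trips_mar,
    IF(month_num = 4, trip_count, 0) trips_apr,
    IF(month_num = 5, trip_count, 0) trips_may,
    IF(month_num = 6, trip_count, 0) trips_jun,
    IF(month_num = 7, trip_count, 0) trips_jul,
    IF(month_num = 8, trip_count, 0) trips_aug,
    IF(month_num = 9, trip_count, 0) trips_sep,
    IF(month_num = 10, trip_count, 0) trips_oct,
    IF(month_num = 11, trip_count, 0) trips_nov,
    IF(month_num = 12, trip_count, 0) trips_dec
  FROM (
    -- one row per day, with its month kept as an integer for the IFs above
    SELECT
      EXTRACT(MONTH FROM DATE(start_date)) month_num,
      COUNT(*) trip_count,
      SUM(duration_sec) total_hire_duration
    FROM
      `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
    GROUP BY
      DATE(start_date), month_num))
```

After training, SELECT * FROM ML.WEIGHTS(MODEL my_dataset.my_interaction_model) should list one weight per trips_* column (that month's slope on trip_count) plus an intercept. If the real pattern follows the academic year, the same trick should work with fewer columns, e.g. one IF per term, with the term boundaries defined to suit the data.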

Any help would be greatly appreciated!
