Interacting regressors in a BQ ML Linear Regression Model
I'm trying to work out how to get two regressors to interact when using BigQuery ML.
In the example below (apologies for the rough fake data!), I'm trying to predict total_hire_duration using trip_count as well as the month of the year. BQ tends to treat the month part as a constant added on to the linear regression equation, but I actually want its effect to grow with trip_count. For my real dataset I can't just supply the timestamp, as BQML seems to over-parameterise.
I should add that if I supply month as a numeric value, I just get a single coefficient, which doesn't really work for my dataset (patterns form around parts of the academic year rather than the calendar year).
If the month part is a constant, then as trip_count gets very large, the constant b in the equation y = ax + b becomes inconsequential. It's almost as if I want something like y = ax + bx + c, where x is trip_count, a is its coefficient, and b is a coefficient that depends on the value of month.
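Written out a bit more carefully, with x standing for trip_count (the subscripted b_m is just my own label for the month-specific part, not anything BQML produces), what I'm after is roughly:

```latex
y = c + (a + b_{m})\,x, \qquad m = \text{month} \in \{1, \dots, 12\}
```

So the slope on trip_count would differ by month, rather than month only shifting the intercept c.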
This is quite easy to do in R: I'd just run glm(bike$totalHireDuration ~ bike$tripCount:bike$month)
Here's some fake data to reproduce:
CREATE OR REPLACE MODEL
  my_model_name OPTIONS (model_type='linear_reg',
    input_label_cols=['total_hire_duration']) AS (
  SELECT
    -- Month as a STRING so BQML treats it as a categorical feature
    CAST(EXTRACT(MONTH FROM DATE(start_date)) AS STRING) AS month,
    COUNT(*) AS trip_count,
    SUM(duration_sec) AS total_hire_duration
  FROM
    `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
  GROUP BY
    -- One row per day
    DATE(start_date), month)
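For what it's worth, the closest thing I can think of is building the interaction columns by hand, one per month, each equal to trip_count in that month and 0 otherwise. This is only a sketch of the idea on the same fake data: the model name my_interaction_model and the trip_count_mNN column names are made up for illustration, and I don't know whether this is the idiomatic way to do it in BQML.

```sql
-- Sketch only: hand-built month x trip_count interaction columns.
-- my_interaction_model and the trip_count_mNN names are hypothetical.
CREATE OR REPLACE MODEL
  my_interaction_model OPTIONS (model_type='linear_reg',
    input_label_cols=['total_hire_duration']) AS (
  SELECT
    total_hire_duration,
    -- Each column carries trip_count for one month and 0 for all others,
    -- so each month gets its own slope on trip_count.
    IF(month = 1, trip_count, 0) AS trip_count_m01,
    IF(month = 2, trip_count, 0) AS trip_count_m02,
    IF(month = 3, trip_count, 0) AS trip_count_m03,
    IF(month = 4, trip_count, 0) AS trip_count_m04,
    IF(month = 5, trip_count, 0) AS trip_count_m05,
    IF(month = 6, trip_count, 0) AS trip_count_m06,
    IF(month = 7, trip_count, 0) AS trip_count_m07,
    IF(month = 8, trip_count, 0) AS trip_count_m08,
    IF(month = 9, trip_count, 0) AS trip_count_m09,
    IF(month = 10, trip_count, 0) AS trip_count_m10,
    IF(month = 11, trip_count, 0) AS trip_count_m11,
    IF(month = 12, trip_count, 0) AS trip_count_m12
  FROM (
    -- Daily aggregation; month is kept numeric here because it is only
    -- used to build the columns above and is not itself a feature.
    SELECT
      EXTRACT(MONTH FROM DATE(start_date)) AS month,
      COUNT(*) AS trip_count,
      SUM(duration_sec) AS total_hire_duration
    FROM
      `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
    GROUP BY
      DATE(start_date), month))
```

That should correspond to the tripCount:month formula in R (an intercept plus one slope per month), but it is clunky to write out and I'd prefer something built in if it exists.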
Any help would be greatly appreciated!