Iterate through column names to apply different summary functions by week in an R data frame using dplyr
I am trying to iterate through global epidemic data from a database that contains daily cases, cumulative cases, daily deaths, and cumulative deaths (as well as some other covariates that aren't relevant here). The table is structured as follows: for each country (identified by name, region, and ID) and each date (though not all dates are present for all countries*), the daily and cumulative cases, deaths, etc. are listed.
The data looks something like this:
# A tibble: 40 x 7
   iso_code continent location    date       total_cases new_cases week
   <chr>    <chr>     <chr>       <date>           <dbl>     <dbl> <chr>
 1 AFG      Asia      Afghanistan 2020-02-24           5         5 2020-08
 2 AFG      Asia      Afghanistan 2020-02-25           5         0 2020-08
 3 AFG      Asia      Afghanistan 2020-02-26           5         0 2020-08
 4 AFG      Asia      Afghanistan 2020-02-27           5         0 2020-08
 5 AFG      Asia      Afghanistan 2020-02-28           5         0 2020-08
 6 AFG      Asia      Afghanistan 2020-02-29           5         0 2020-08
 7 AFG      Asia      Afghanistan 2020-03-01           5         0 2020-09
 8 AFG      Asia      Afghanistan 2020-03-02           5         0 2020-09
 9 AFG      Asia      Afghanistan 2020-03-03           5         0 2020-09
10 AFG      Asia      Afghanistan 2020-03-04           5         0 2020-09
# ... with 30 more rows
I need to summarize the daily data into weekly data. This is no problem for a single column: using the methods described here, I should be able to aggregate the data for each week and each country as follows:
library(dplyr)

sumByColumn <- function(df, colName) {
  # the method for daily (cases/deaths)/(cases/deaths) smoothed
  df %>%
    group_by(location, week) %>%
    summarize(colName = sum(!! sym(colName)))
}

idByColumn <- function(df, colName) {
  # the method for cumulative (cases/deaths)
  df %>%
    group_by(location, week) %>%
    summarize(colName = identity(!! sym(colName)))
}
(Note that daily case/death columns should be summed, whereas cumulative case/death columns should simply be passed through the identity function as given. The cumulative columns, among the column names of df, are denoted id_cols.)
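For comparison, here is a minimal sketch of doing both kinds of summary in a single pass with across(). The toy data and the sum_cols vector are assumptions made up for illustration. Taking last() as the weekly value of a cumulative column is also an assumption, swapped in for identity(), which would return one value per day rather than one per week. It relies on the rows being ordered by date within each country.

```r
library(dplyr)

# Toy data shaped like the tibble above (values are invented for illustration)
df <- tibble(
  location    = rep(c("Afghanistan", "Albania"), each = 4),
  week        = rep(c("2020-08", "2020-08", "2020-09", "2020-09"), times = 2),
  new_cases   = c(5, 0, 2, 3, 1, 1, 0, 4),
  total_cases = c(5, 5, 7, 10, 1, 2, 2, 6)
)

# Assumption: id_cols names the cumulative columns, sum_cols the daily ones
id_cols  <- c("total_cases")
sum_cols <- c("new_cases")

df_weekly <- df %>%
  group_by(location, week) %>%
  summarise(
    across(all_of(sum_cols), ~ sum(.x, na.rm = TRUE)),  # daily counts: add up the week
    across(all_of(id_cols), ~ last(.x)),                # cumulative: value at end of week
    .groups = "drop"
  )

print(df_weekly)
```

This avoids looping over column names altogether, and each output column keeps its original name.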
However, when I try to run the sumByColumn()/idByColumn() loop along the entire dataframe df, I run into this error:
for (col in 1:ncol(df)) {
  colName = colnames(df)[col]
  if (col %in% id_cols) {
    df_weekly = idByColumn(df_weekly, colName)
  } else {
    df_weekly = sumByColumn(df_weekly, colName)
  }
}
I get:
Error in !sym(colName) : invalid argument type
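One possible reading of this message: `!!` only has its unquoting meaning inside tidy-evaluation functions, and anywhere else R parses it as two logical negations, which fail on a symbol. The sketch below sidesteps `!!` on the right-hand side by using the `.data` pronoun, and uses `:=` so the output column takes the name stored in colName. The helper summariseByColumn(), the value_cols vector, and the toy data are hypothetical names introduced here. Each column is summarised from the original df and the pieces are joined, because feeding a summarised result back into the loop would drop the remaining columns.

```r
library(dplyr)

# Toy data (invented for illustration)
df <- tibble(
  location    = rep(c("Afghanistan", "Albania"), each = 4),
  week        = rep(c("2020-08", "2020-08", "2020-09", "2020-09"), times = 2),
  new_cases   = c(5, 0, 2, 3, 1, 1, 0, 4),
  total_cases = c(5, 5, 7, 10, 1, 2, 2, 6)
)

# Hypothetical helper: summarise one column, named by a string, with fn
summariseByColumn <- function(df, colName, fn) {
  df %>%
    group_by(location, week) %>%
    summarise(!!colName := fn(.data[[colName]]), .groups = "drop")
}

# Assumption: only the numeric value columns are looped over,
# not the grouping columns such as location or week
value_cols <- c("new_cases", "total_cases")
id_cols    <- c("total_cases")

pieces <- lapply(value_cols, function(cn) {
  # cumulative columns keep the end-of-week value, daily columns are summed
  fn <- if (cn %in% id_cols) dplyr::last else sum
  summariseByColumn(df, cn, fn)
})

# Stitch the per-column results back together on the grouping keys
df_weekly <- Reduce(
  function(x, y) inner_join(x, y, by = c("location", "week")),
  pieces
)

print(df_weekly)
```

Note that the membership test here compares column names with id_cols, whereas the loop above compares the numeric index col.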
Note: I have computed how many times each country appears in the dataframe, which corresponds to the number of days the disease was tracked there. Is there a way to account for this, e.g. when going through the weeks, if there is no data for a given week, or if the number of countries reporting differs from week to week, to ignore it and not return NA?
Number of rows per country: 916, 916, 910, 892, 884, 899, 971, 938, 899, 946
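As a rough illustration of the uneven-coverage concern, the sketch below uses made-up data in which one country reports a full week and another only a few days. It assumes that missing days show up either as absent rows or as NA values. The days_reported column is a hypothetical addition that records how many days actually contributed to each weekly figure.

```r
library(dplyr)

# Toy data: Afghanistan has 7 rows in week 2020-09 (one of them NA),
# Albania has 2 rows in 2020-09 and 1 row in 2020-10
df <- tibble(
  location  = c(rep("Afghanistan", 7), rep("Albania", 3)),
  week      = c(rep("2020-09", 7), rep("2020-09", 2), "2020-10"),
  new_cases = c(1, 0, NA, 2, 0, 1, 1, 3, NA, 4)
)

df_weekly <- df %>%
  group_by(location, week) %>%
  summarise(
    # computed first, while new_cases still refers to the daily values
    days_reported = sum(!is.na(new_cases)),
    # na.rm = TRUE skips missing days instead of returning NA
    new_cases = sum(new_cases, na.rm = TRUE),
    .groups = "drop"
  )

print(df_weekly)
```

A country-week with no rows at all simply does not appear in the result, so no NA row is produced for it. Weeks with too few reporting days could then be dropped with something like filter(days_reported >= 4), where the threshold is arbitrary.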