How to code multiple time intervals by a set of dates?

Searched SO for duplicate questions and a couple came close, but not quite to my problem. I tried implementing their solutions and the data are still not quite right.

  1. Summary: I have a very long data set with multiple entries per subject ID, and they take place in sets of intervals over the course of sets of three days. See data below for exact detail.
  • A subject might have 26 "intervals" over the course of 3 dates, and then has an outcome variable I need counted then scaled according to a "perfect" interval = 30. I.e. whatever their count of X over the interval needs to be scaled as if it were done in 30 intervals over the course of the three consecutive dates.

  • Importantly, I need to tally the number of "S" and "U" to create their own separate counts as primary outcomes - this is the number that is getting "scaled" to a full 30 intervals per 3 days.

  1. I tried (and failed) with sapply, for loops and some brute Excel stuff, but there are so many thousands of rows that doing it in Excel would take days.

  2. Some dummy data to work with:

> head(diary_df1, 30)
        id site      date count    type      interval
1  1280003   51 11/6/2013     0    None 06:00 - 08:00
2  1280003   51 11/6/2013     0    None 00:00 - 06:00
3  1280003   51 11/6/2013     1       S 08:00 - 10:00
4  1280003   51 11/6/2013     1       S 10:00 - 12:00
5  1280003   51 11/6/2013     0    None 12:00 - 14:00
6  1280003   51 11/6/2013     2     S S 14:00 - 16:00
7  1280003   51 11/6/2013     2     S S 16:00 - 18:00
8  1280003   51 11/6/2013     1       S 18:00 - 20:00
9  1280003   51 11/7/2013     0    None 00:00 - 06:00
10 1280003   51 11/7/2013     0    None 06:00 - 08:00
11 1280003   51 11/7/2013     1       S 08:00 - 10:00
12 1280003   51 11/7/2013     0    None 10:00 - 12:00
13 1280003   51 11/7/2013     2     S S 12:00 - 14:00
14 1280003   51 11/7/2013     1       S 14:00 - 16:00
15 1280003   51 11/7/2013     0    None 16:00 - 18:00
16 1280003   51 11/7/2013     2     S S 18:00 - 20:00
17 1280003   51 11/8/2013     0    None 00:00 - 06:00
18 1280003   51 11/8/2013     2     S S 06:00 - 08:00
19 1280003   51 11/8/2013     0    None 08:00 - 10:00
20 1280003   51 11/8/2013     0    None 10:00 - 12:00
21 1280003   51 11/8/2013     1       S 12:00 - 14:00
22 1280003   51 11/8/2013     0    None 14:00 - 16:00
23 1280003   51 11/8/2013     1       S 20:00 - 22:00
24 1280003   51 11/8/2013     1       S 22:00 - 00:00
25 1280003   51 11/8/2013     2     S S 16:00 - 18:00
26 1280003   51 11/8/2013     0    None 18:00 - 20:00
27 1280003   51  4/1/2014     0    None 00:00 - 06:00
28 1280003   51  4/1/2014     0    None 06:00 - 08:00
29 1280003   51  4/1/2014     4 S S S S 08:00 - 10:00
30 1280003   51  4/1/2014     3   S S S 10:00 - 12:00

so this is 30 intervals, but as you can see in the 3 day span, the ppt only has 26 valid intervals before the date changes

So, I would like to have additional columns that

  1. spit out the count and total intervals over the time span, (int_count), (int_tot)
  2. report that as a % of 30, then
  3. count the number of S and U (separately) in the type variable. (x_3day_total)

DESIRED OUTCOME:

> head(df_stack, 30)
        id site      date count    type      interval int_count int_tot x_3day_total
1  1280003   51 11/6/2013     0    None 06:00 - 08:00         1      26           20
2  1280003   51 11/6/2013     0    None 00:00 - 06:00         2      26           20
3  1280003   51 11/6/2013     1       S 08:00 - 10:00         3      26           20
4  1280003   51 11/6/2013     1       S 10:00 - 12:00         4      26           20
5  1280003   51 11/6/2013     0    None 12:00 - 14:00         5      26           20
6  1280003   51 11/6/2013     2     S S 14:00 - 16:00         6      26           20
7  1280003   51 11/6/2013     2     S S 16:00 - 18:00         7      26           20
8  1280003   51 11/6/2013     1       S 18:00 - 20:00         8      26           20
9  1280003   51 11/7/2013     0    None 00:00 - 06:00         9      26           20
10 1280003   51 11/7/2013     0    None 06:00 - 08:00        10      26           20
11 1280003   51 11/7/2013     1       S 08:00 - 10:00        11      26           20
12 1280003   51 11/7/2013     0    None 10:00 - 12:00        12      26           20
13 1280003   51 11/7/2013     2     S S 12:00 - 14:00        13      26           20
14 1280003   51 11/7/2013     1       S 14:00 - 16:00        14      26           20
15 1280003   51 11/7/2013     0    None 16:00 - 18:00        15      26           20
16 1280003   51 11/7/2013     2     S S 18:00 - 20:00        16      26           20
17 1280003   51 11/8/2013     0    None 00:00 - 06:00        17      26           20
18 1280003   51 11/8/2013     2     S S 06:00 - 08:00        18      26           20
19 1280003   51 11/8/2013     0    None 08:00 - 10:00        19      26           20
20 1280003   51 11/8/2013     0    None 10:00 - 12:00        20      26           20
21 1280003   51 11/8/2013     1       S 12:00 - 14:00        21      26           20
22 1280003   51 11/8/2013     0    None 14:00 - 16:00        22      26           20
23 1280003   51 11/8/2013     1       S 20:00 - 22:00        23      26           20
24 1280003   51 11/8/2013     1       S 22:00 - 00:00        24      26           20
25 1280003   51 11/8/2013     2     S S 16:00 - 18:00        25      26           20
26 1280003   51 11/8/2013     0    None 18:00 - 20:00        26      26           20

Note that this is only for a slice (26 rows) so it will need to apply per subject per 3 day set of dates (which can be longer or shorter than 26)

Data (dput)

structure(list(id = c(1280003, 1280003, 1280003, 1280003, 1280003, 
1280003, 1280003, 1280003, 1280003, 1280003, 1280003, 1280003, 
1280003, 1280003, 1280003, 1280003, 1280003, 1280003, 1280003, 
1280003, 1280003, 1280003, 1280003, 1280003, 1280003, 1280003, 
1280003, 1280003, 1280003, 1280003), site = c(51, 51, 51, 51, 
51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 
51, 51, 51, 51, 51, 51, 51, 51, 51, 51), date = c("11/6/2013", 
"11/6/2013", "11/6/2013", "11/6/2013", "11/6/2013", "11/6/2013", 
"11/6/2013", "11/6/2013", "11/7/2013", "11/7/2013", "11/7/2013", 
"11/7/2013", "11/7/2013", "11/7/2013", "11/7/2013", "11/7/2013", 
"11/8/2013", "11/8/2013", "11/8/2013", "11/8/2013", "11/8/2013", 
"11/8/2013", "11/8/2013", "11/8/2013", "11/8/2013", "11/8/2013", 
"4/1/2014", "4/1/2014", "4/1/2014", "4/1/2014"), count = c(0, 
0, 1, 1, 0, 2, 2, 1, 0, 0, 1, 0, 2, 1, 0, 2, 0, 2, 0, 0, 1, 0, 
1, 1, 2, 0, 0, 0, 4, 3), type = c("None", "None", "S", "S", "None", 
"S S", "S S", "S", "None", "None", "S", "None", "S S", "S", "None", 
"S S", "None", "S S", "None", "None", "S", "None", "S", "S", 
"S S", "None", "None", "None", "S S S S", "S S S"), interval = c("06:00 - 08:00", 
"00:00 - 06:00", "08:00 - 10:00", "10:00 - 12:00", "12:00 - 14:00", 
"14:00 - 16:00", "16:00 - 18:00", "18:00 - 20:00", "00:00 - 06:00", 
"06:00 - 08:00", "08:00 - 10:00", "10:00 - 12:00", "12:00 - 14:00", 
"14:00 - 16:00", "16:00 - 18:00", "18:00 - 20:00", "00:00 - 06:00", 
"06:00 - 08:00", "08:00 - 10:00", "10:00 - 12:00", "12:00 - 14:00", 
"14:00 - 16:00", "20:00 - 22:00", "22:00 - 00:00", "16:00 - 18:00", 
"18:00 - 20:00", "00:00 - 06:00", "06:00 - 08:00", "08:00 - 10:00", 
"10:00 - 12:00")), row.names = c(NA, -30L), class = "data.frame")


Comments

Popular posts from this blog

Today Walkin 14th-Sept

Spring Elasticsearch Operations

Hibernate Search - Elasticsearch with JSON manipulation