Synchronizing three data frames in time

Question

There are three data frames of slightly different lengths, because the observations in each start at a different time.

How can they be synchronized in time so that only the observations present in all three frames remain, and those that occur in only some of the frames are dropped?

Here are the data frames themselves:

    > head(sec1)
            date  time   open   high    low  close vol
    1 2016.09.06 08:45 3081.5 3082.5 3080.5 3080.5   6
    2 2016.09.06 08:50 3081.5 3081.5 3079.5 3080.5   6
    3 2016.09.06 08:55 3081.5 3082.5 3081.5 3082.5  19
    4 2016.09.06 09:00 3083.5 3083.5 3081.5 3082.5  19
    5 2016.09.06 09:05 3083.5 3085.5 3082.5 3085.5   8
    6 2016.09.06 09:10 3086.5 3086.5 3084.5 3086.5  15
    > head(sec2)
            date  time  open  high   low close vol
    1 2016.09.13 13:00 95.34 95.40 95.33 95.39  36
    2 2016.09.13 13:05 95.40 95.43 95.39 95.41  40
    3 2016.09.13 13:10 95.42 95.44 95.40 95.42  37
    4 2016.09.13 13:15 95.41 95.42 95.39 95.39  25
    5 2016.09.13 13:20 95.40 95.41 95.38 95.38  21
    6 2016.09.13 13:25 95.39 95.42 95.38 95.42  32
    > head(sec3)
            date  time    open    high     low   close vol
    1 2016.09.14 18:10 1.12433 1.12456 1.12431 1.12450 137
    2 2016.09.14 18:15 1.12444 1.12459 1.12424 1.12455 139
    3 2016.09.14 18:20 1.12454 1.12477 1.12446 1.12469 148
    4 2016.09.14 18:25 1.12468 1.12474 1.12442 1.12453 120
    5 2016.09.14 18:30 1.12452 1.12483 1.12442 1.12482 156
    6 2016.09.14 18:35 1.12481 1.12499 1.12472 1.12474 126

The output should be three data frames of the same length (nrow), and each row should have the same date and time in all three data frames.

Accepted Answer · 2016-09-25T11:06:43

If I understood the task correctly, we need to determine the overlapping intervals of dates and times and keep only the observations that fall within these intervals. Note that the data given as an example do not overlap in dates.

Define boundaries for dates:

    min_date <- list(df1, df2, df3) %>%
      sapply(. %>% .subset2("date") %>% as.Date(format = "%Y.%m.%d") %>% min()) %>%
      max()
    max_date <- list(df1, df2, df3) %>%
      sapply(. %>% .subset2("date") %>% as.Date(format = "%Y.%m.%d") %>% max()) %>%
      min()

Now the same for the time:

    min_time <- list(df1, df2, df3) %>%
      sapply(. %>% .subset2("time") %>% as.POSIXct(format = "%H:%M") %>% min()) %>%
      max()
    # the upper bound is the earliest of the per-frame maximum times
    max_time <- list(df1, df2, df3) %>%
      sapply(. %>% .subset2("time") %>% as.POSIXct(format = "%H:%M") %>% max()) %>%
      min()

Now you can filter the observations (shown for df1; the same applies to df2 and df3):

    df1 <- df1 %>%
      mutate(date = as.Date(date, format = "%Y.%m.%d")) %>%
      filter(date >= min_date & date <= max_date) %>%
      mutate(time = as.POSIXct(time, format = "%H:%M")) %>%
      filter(time >= min_time & time <= max_time)

For the code to work, you need to install and load the dplyr package.
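
The answer above bounds the date and the time of day separately. As a possible variant, here is a minimal base R sketch that first merges date and time into a single timestamp and then trims every frame to the common window. The toy data, the helper names `make_frame` and `to_stamp`, and the UTC time zone are assumptions made for illustration; they do not come from the original question.

```r
# Toy data standing in for the real frames (values are made up).
make_frame <- function(times) {
  data.frame(
    date  = "2016.09.14",
    time  = times,
    close = seq_along(times),
    stringsAsFactors = FALSE
  )
}
df1 <- make_frame(c("18:00", "18:05", "18:10", "18:15"))
df2 <- make_frame(c("18:05", "18:10", "18:15", "18:20"))
df3 <- make_frame(c("18:10", "18:15", "18:20", "18:25"))

# Combine date and time into one POSIXct timestamp (UTC is an assumption).
to_stamp <- function(df) {
  as.POSIXct(paste(df$date, df$time), format = "%Y.%m.%d %H:%M", tz = "UTC")
}

frames <- list(df1, df2, df3)
starts <- sapply(frames, function(df) as.numeric(min(to_stamp(df))))
ends   <- sapply(frames, function(df) as.numeric(max(to_stamp(df))))

# Common window: from the latest start to the earliest end.
window_start <- max(starts)
window_end   <- min(ends)

# Keep only the rows of each frame that fall inside the common window.
trimmed <- lapply(frames, function(df) {
  stamp <- as.numeric(to_stamp(df))
  df[stamp >= window_start & stamp <= window_end, ]
})
```

Trimming to a common window does not remove timestamps that are missing from one frame inside that window, so the frames can still differ in length if the series have gaps.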

But it is not clear what we get as output. The code runs without errors, but when I try to print df1 after running it, I get "[1] date time open high low close vol <0 rows> (or 0-length row.names)".
The tables from your example do not overlap in time and dates, so you get empty tables as output.
No, they do intersect; I just double-checked, and in my illustration above just the

Answer 2 · 2016-11-18T13:18:20

As far as I understand, the task is to keep in each dataset only those observations whose date and time also appear in the other two datasets.

The simplest solution I see is this:

1. Merge all three datasets together.
2. Group the observations by date and time.
3. Count the number of observations in each group.
4. Keep only those combinations of date and time that occur 3 times.
5. Filter each source dataset by the dataset of intersecting observations.

Code (untested: I was too lazy to generate source datasets; let me know if I missed something and it doesn't work):

    library(tidyverse)

    df_cross <- bind_rows(df1, df2, df3) %>%
      group_by(date, time) %>%
      summarise(occurance = n()) %>%
      ungroup() %>%
      filter(occurance == 3) %>%
      select(-occurance)

    df1_refined <- left_join(df_cross, df1, by = c('date', 'time'))

UPD: it turned out to be even simpler:

    # dplyr's intersect() compares two tables at a time, so chain the calls
    df_cross <- df1 %>% select(date, time) %>%
      intersect(df2 %>% select(date, time)) %>%
      intersect(df3 %>% select(date, time))

    df1_refined <- left_join(df_cross, df1, by = c('date', 'time'))
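
To check the idea on something concrete, here is a small self-contained sketch in base R (no dplyr) that intersects the date and time keys of all three frames and filters each frame by the result. The toy values and the helper name `make_key` are made up for illustration, and the sketch assumes each date and time pair occurs at most once per frame.

```r
# Toy data (made-up values); df2 deliberately has a gap at 18:15.
df1 <- data.frame(date = "2016.09.14",
                  time = c("18:05", "18:10", "18:15", "18:20"),
                  close = c(1, 2, 3, 4), stringsAsFactors = FALSE)
df2 <- data.frame(date = "2016.09.14",
                  time = c("18:10", "18:20", "18:25"),
                  close = c(10, 20, 30), stringsAsFactors = FALSE)
df3 <- data.frame(date = "2016.09.14",
                  time = c("18:10", "18:15", "18:20"),
                  close = c(100, 200, 300), stringsAsFactors = FALSE)

# Build one "date time" key per row.
make_key <- function(df) paste(df$date, df$time)

frames <- list(df1, df2, df3)

# Keys present in every frame: fold base::intersect over the key vectors.
common_keys <- Reduce(base::intersect, lapply(frames, make_key))

# Keep only the rows whose key is common to all frames.
synced <- lapply(frames, function(df) df[make_key(df) %in% common_keys, ])
```

After this, `synced[[1]]`, `synced[[2]]` and `synced[[3]]` have the same number of rows and, provided each input is sorted by date and time, the same date and time on every row. Unlike trimming to a common interval, this also drops timestamps that are missing from one frame in the middle of the series.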
