I have three data frames of slightly different lengths, because the observations in each start at a different time.

How can they be synchronized in time so that only the observations present in all three frames remain, and those that occur in only some of the frames are dropped?

Here are the data frames themselves:

    > head(sec1)
            date  time   open   high    low  close vol
    1 2016.09.06 08:45 3081.5 3082.5 3080.5 3080.5   6
    2 2016.09.06 08:50 3081.5 3081.5 3079.5 3080.5   6
    3 2016.09.06 08:55 3081.5 3082.5 3081.5 3082.5  19
    4 2016.09.06 09:00 3083.5 3083.5 3081.5 3082.5  19
    5 2016.09.06 09:05 3083.5 3085.5 3082.5 3085.5   8
    6 2016.09.06 09:10 3086.5 3086.5 3084.5 3086.5  15
    > head(sec2)
            date  time  open  high   low close vol
    1 2016.09.13 13:00 95.34 95.40 95.33 95.39  36
    2 2016.09.13 13:05 95.40 95.43 95.39 95.41  40
    3 2016.09.13 13:10 95.42 95.44 95.40 95.42  37
    4 2016.09.13 13:15 95.41 95.42 95.39 95.39  25
    5 2016.09.13 13:20 95.40 95.41 95.38 95.38  21
    6 2016.09.13 13:25 95.39 95.42 95.38 95.42  32
    > head(sec3)
            date  time    open    high     low   close vol
    1 2016.09.14 18:10 1.12433 1.12456 1.12431 1.12450 137
    2 2016.09.14 18:15 1.12444 1.12459 1.12424 1.12455 139
    3 2016.09.14 18:20 1.12454 1.12477 1.12446 1.12469 148
    4 2016.09.14 18:25 1.12468 1.12474 1.12442 1.12453 120
    5 2016.09.14 18:30 1.12452 1.12483 1.12442 1.12482 156
    6 2016.09.14 18:35 1.12481 1.12499 1.12472 1.12474 126

That is, the output should be three data frames of the same length (nrow), and corresponding rows of the three data frames should have the same date and time.

    2 answers

    If I understood the task correctly, you need to determine the overlapping intervals of dates and times and keep only the observations that fall within those intervals. Note that the data given as an example do not overlap in dates.

    Define boundaries for dates:

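     library(dplyr)  # provides %>%, mutate() and filter()
     # Latest start date across the three frames
     min_date <- list(df1, df2, df3) %>%
       sapply(. %>% .subset2("date") %>% as.Date(format = "%Y.%m.%d") %>% min()) %>%
       max()
     # Earliest end date across the three frames
     max_date <- list(df1, df2, df3) %>%
       sapply(. %>% .subset2("date") %>% as.Date(format = "%Y.%m.%d") %>% max()) %>%
       min()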

    Now the same for the time:

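     # Latest start time across the three frames
     min_time <- list(df1, df2, df3) %>%
       sapply(. %>% .subset2("time") %>% as.POSIXct(format = "%H:%M") %>% min()) %>%
       max()
     # Earliest end time across the three frames (inner call must be max(), not min())
     max_time <- list(df1, df2, df3) %>%
       sapply(. %>% .subset2("time") %>% as.POSIXct(format = "%H:%M") %>% max()) %>%
       min()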

    Now you can filter the observations:

     # Repeat the same steps for df2 and df3
     df1 <- df1 %>%
       mutate(date = as.Date(date, format = "%Y.%m.%d")) %>%
       filter(date >= min_date & date <= max_date) %>%
       mutate(time = as.POSIXct(time, format = "%H:%M")) %>%
       filter(time >= min_time & time <= max_time)

    For the code to work, you need to install and load the dplyr package.
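    As a variation on the filtering above, the overlap window can also be computed on a single combined timestamp instead of on dates and times separately. The following is only a sketch in base R under stated assumptions: the toy values are made up, and `to_stamp`, `frames`, `win_start` and `win_end` are hypothetical names that do not appear in the original code.

```r
# Toy data: column names mirror the question, the values are invented.
df1 <- data.frame(date = c("2016.09.13", "2016.09.14", "2016.09.15"),
                  time = c("13:00", "18:10", "09:00"),
                  close = c(1, 2, 3))
df2 <- data.frame(date = c("2016.09.14", "2016.09.15", "2016.09.16"),
                  time = c("18:10", "09:00", "10:00"),
                  close = c(4, 5, 6))
df3 <- data.frame(date = c("2016.09.14", "2016.09.14", "2016.09.15"),
                  time = c("18:10", "18:15", "09:00"),
                  close = c(7, 8, 9))

# Hypothetical helper: parse date and time into seconds since the epoch.
# The time zone is an assumption; use the one your data was recorded in.
to_stamp <- function(df) {
  as.numeric(as.POSIXct(paste(df$date, df$time),
                        format = "%Y.%m.%d %H:%M", tz = "UTC"))
}

frames <- list(df1, df2, df3)
stamps <- lapply(frames, to_stamp)

# Overlap window: latest start and earliest end across all frames.
win_start <- max(sapply(stamps, min))
win_end   <- min(sapply(stamps, max))

# Keep only the rows of each frame that fall inside the window.
trimmed <- Map(function(df, s) df[s >= win_start & s <= win_end, , drop = FALSE],
               frames, stamps)
```

    A window filter like this only trims the ends. It does not guarantee that the frames end up with identical timestamps: in this toy data the third frame keeps a row (18:15) that the other two do not have.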

    • Thanks! But it is not clear what the output is... The code runs without errors, but when I print df1 after running it, I get "[1] date time open high low close vol <0 rows> (or 0-length row.names)". Screenshot: prntscr.com/cm43sn - mr.T
    • The tables in your example do not overlap in time and dates, so you get empty tables as output. - Artem Klevtsov
    • No, they do overlap, I have just double-checked. The illustration above shows only the heads (head()) of the data; each df actually has more than 2000 rows. - mr.T
    • && needed to be replaced with &. I have corrected the answer. - Artem Klevtsov

    As far as I understand, the task is to keep in each dataset only those observations for which the other two datasets contain observations with the same date and time.


    The simplest solution I see is this:

    • bind all three datasets together
    • group the observations by date and time
    • count the number of observations in each group
    • keep only those combinations of date and time that occur 3 times
    • filter the source datasets by the dataset of intersecting observations

    Code (untested, I was too lazy to generate source datasets; let me know if I missed something and it doesn't work):

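     library(tidyverse)
     # Date-time combinations present in all three datasets
     # (assumes each combination occurs at most once per dataset)
     df_cross <- bind_rows(df1, df2, df3) %>%
       group_by(date, time) %>%
       summarise(occurrence = n()) %>%
       ungroup() %>%
       filter(occurrence == 3) %>%
       select(-occurrence)
     # Keep only the matching rows; repeat for df2 and df3
     df1_refined <- left_join(df_cross, df1, by = c('date', 'time'))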

    UPD: it turned out to be even simpler

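     # intersect() compares two tables at a time, so the calls are chained
     df_cross <- df1 %>% select(date, time) %>%
       intersect(df2 %>% select(date, time)) %>%
       intersect(df3 %>% select(date, time))
     # Repeat the join for df2 and df3
     df1_refined <- left_join(df_cross, df1, by = c('date', 'time'))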
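    For anyone who wants to try the intersection idea without extra packages, here is a minimal base R sketch on made-up toy data. It assumes that date and time are stored as zero-padded character strings, as in the question, and that each date-time pair occurs at most once per frame. `make_key`, `frames`, `common` and `synced` are hypothetical names introduced only for this illustration.

```r
# Toy frames: column names follow the question, the values are invented.
sec1 <- data.frame(date = rep("2016.09.14", 3),
                   time = c("18:05", "18:10", "18:15"),
                   close = c(3080.5, 3082.5, 3085.5))
sec2 <- data.frame(date = rep("2016.09.14", 3),
                   time = c("18:10", "18:15", "18:20"),
                   close = c(95.39, 95.41, 95.42))
sec3 <- data.frame(date = rep("2016.09.14", 3),
                   time = c("18:00", "18:10", "18:15"),
                   close = c(1.12450, 1.12455, 1.12469))

# Hypothetical helper: one string key per row, e.g. "2016.09.14 18:10".
make_key <- function(df) paste(df$date, df$time)

frames <- list(sec1 = sec1, sec2 = sec2, sec3 = sec3)

# Keys that are present in every frame.
common <- Reduce(intersect, lapply(frames, make_key))

# Keep only the rows whose key is common to all frames, sorted by date and time.
synced <- lapply(frames, function(df) {
  out <- df[make_key(df) %in% common, , drop = FALSE]
  out[order(out$date, out$time), , drop = FALSE]
})

# Row counts per frame after synchronization.
sapply(synced, nrow)
```

    Because every frame is filtered by the same set of keys and then sorted, the resulting frames have the same number of rows and line up row by row, provided there are no duplicate date-time pairs within a frame.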