How to bring two tables to the same size in R

Question

Please help me find a solution.

There are two tables with data.

One table has 30 thousand rows, the other has 32 thousand rows.

The data are two time series. I would like to identify the extra rows and remove them from the sample.

In other words, each row of one table has to be compared with the rows of the other table, and if the row is not found in the other table, it should be deleted.
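For two time series, "the same row" usually means "the same timestamp". A minimal base R sketch, assuming each table has a date column (hypothetically called `date` here) that identifies an observation and that dates are unique within each table: keep only the rows whose date also occurs in the other table.

```r
# Hypothetical toy data: two daily series that only partly overlap
ts1 <- data.frame(date = as.Date("2016-01-01") + 0:4, x = c(1, 2, 3, 4, 5))
ts2 <- data.frame(date = as.Date("2016-01-03") + 0:4, y = c(10, 20, 30, 40, 50))

# Keep only the rows whose date also appears in the other table
ts1_common <- ts1[ts1$date %in% ts2$date, ]
ts2_common <- ts2[ts2$date %in% ts1$date, ]

nrow(ts1_common)  # 3
nrow(ts2_common)  # 3
```

After this filter both tables have the same number of rows, as long as no date is repeated inside either table.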

Artem Klevtsov · Answer 1 · 2016-03-17T04:16:08

Since you did not provide sample data, this solution is untested:

```r
dataset <- rbind(dataset1, dataset2)       # stack both tables (same columns required)
dataset <- dataset[duplicated(dataset), ]  # keep rows that already occurred earlier
```

That is, we keep only the duplicated rows. You can also specify which columns to use when looking for duplicates:

```r
duplicated(dataset[, cols])  # cols: names of the columns to compare
```
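A small self-contained check of the `rbind()` + `duplicated()` approach on hypothetical toy data. It assumes neither table contains repeated rows of its own: a row duplicated inside `dataset1` would also survive the filter, so this sketch calls `unique()` on each table first.

```r
# Hypothetical toy tables with identical column names (rbind() requires this)
dataset1 <- data.frame(time = 1:5, value = c(1.5, 2.5, 3.5, 4.5, 5.5))
dataset2 <- data.frame(time = 3:7, value = c(3.5, 4.5, 5.5, 6.5, 7.5))

# Remove within-table duplicates first, so only cross-table matches remain
dataset <- rbind(unique(dataset1), unique(dataset2))

# A row is flagged as duplicated when it already occurred earlier (in dataset1),
# so what remains are the rows common to both tables
common <- dataset[duplicated(dataset), ]
common$time  # 3 4 5
```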

ikashnitsky · Answer 2 · 2016-07-26T08:25:28

This sounds like a task for `merge()` or `dplyr::inner_join()`.

Example

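```r
library(dplyr)

# Two tables, each with 20 distinct random letters as ids
one <- data.frame(id = sample(letters, size = 20, replace = FALSE), value = rnorm(20))
two <- data.frame(id = sample(letters, size = 20, replace = FALSE), value = rnorm(20))

joined <- inner_join(one, two, by = 'id')  # keep only ids present in both
```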
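The base R `merge()` mentioned above performs the same kind of inner join by default (`all = FALSE`). A minimal sketch on hypothetical toy data, joining on an `id` column:

```r
# Hypothetical toy tables sharing an id column
one <- data.frame(id = c("a", "b", "c", "d"), value = 1:4)
two <- data.frame(id = c("c", "d", "e"), value = c(30, 40, 50))

# Inner join on id; the two value columns get the suffixes .x and .y
joined <- merge(one, two, by = "id")
joined  # two rows, for ids "c" and "d"
```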

If you do not need to combine the two datasets but only to filter them, you can join one dataset with just the id column of the other. For example, to keep only the rows of `two` whose id also occurs in `one`:

```r
two_filtered <- inner_join(two, one %>% select(id), by = 'id')
```
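A filtering join expresses this intent directly: `dplyr::semi_join()` keeps the rows of the first table that have a match in the second, adds no columns, and does not repeat a row when its id occurs several times in the other table. A sketch on hypothetical toy data (requires the dplyr package):

```r
library(dplyr)

# Hypothetical toy data; "c" appears twice in one
one <- data.frame(id = c("a", "c", "c", "d"), value = 1:4)
two <- data.frame(id = c("b", "c", "d", "e"), value = c(20, 30, 40, 50))

# inner_join() repeats the "c" row of two once per match in one
nrow(inner_join(two, one %>% select(id), by = "id"))  # 3

# semi_join() only filters: each row of two is kept at most once
two_filtered <- semi_join(two, one, by = "id")
two_filtered$id  # "c" "d"
```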

ikashnitsky · Answer 3 · 2016-11-18T20:27:39

An even better solution is `dplyr::intersect()`:

```r
library(dplyr)
joined <- intersect(one, two)
```
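`dplyr::intersect()` compares whole rows, so both tables need the same columns and a row is kept only if every value matches. With the randomly generated `one` and `two` from Answer 2, the `value` columns will practically never coincide, so the result there would be empty; restricting the comparison to the key column avoids this. A sketch on hypothetical toy data (requires dplyr):

```r
library(dplyr)

# Hypothetical toy tables with the same columns
one <- data.frame(time = 1:5, value = c(10, 20, 30, 40, 50))
two <- data.frame(time = 3:7, value = c(30, 40, 99, 60, 70))

# Rows present in both tables, comparing all columns
both <- dplyr::intersect(one, two)
both$time  # rows with time 3 and 4

# Comparing only the time column
common_time <- dplyr::intersect(select(one, time), select(two, time))
common_time$time  # times 3, 4 and 5
```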
