I have a data set in which some of the column names are numbers (years). When I try to automate the creation of pivot tables, I get an error.

df <- read.table(text = "value class 2000 2001 2002
123 class1 subclass1 subclass3 subclass1
564 class1 subclass1 subclass3 subclass2
564 class1 subclass1 subclass3 subclass3
213 class2 subclass1 subclass4 subclass4
856 class2 subclass1 subclass5 subclass4
22 class3 subclass6 subclass6 subclass4
5 class4 subclass1 subclass3 subclass4", header = TRUE)
names(df) <- c("value", "class", "2000", "2001", "2002")

Building the table for each year separately works without problems:

 data.frame(with(df, tapply(value, INDEX = list(class, `2000`), FUN = sum)))
 data.frame(with(df, tapply(value, INDEX = list(class, `2001`), FUN = sum)))
 data.frame(with(df, tapply(value, INDEX = list(class, `2002`), FUN = sum)))

But doing the same thing in a loop produces an error:

 years <- c("2000", "2001", "2002")
 for (i in 1:3) {
   data.frame(with(df, tapply(value, INDEX = list(class, years[i]), FUN = sum)))
 }

 Error in tapply(value, INDEX = list(class, years[i]), FUN = sum) :
   arguments must have same length

Can someone tell me what my mistake is and how to fix it?

    1 answer

    The error occurs because years[i] evaluates to a character string of length one (for example "2000"), not to the column with that name, so tapply() receives index vectors of different lengths. In your case, you can simply drop the with() construction and look the columns up by name. This call works without errors:

     tapply(df[["value"]], INDEX = list(df[["class"]], df[[years[1]]]), FUN = sum) 
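    Applied to the loop from the question, the same idea might look like the sketch below. The list name pivots is only illustrative. The results are collected in a list because a bare expression inside a for loop is not printed automatically.

```r
# Rebuild the data from the question so the sketch is self-contained.
df <- read.table(text = "value class 2000 2001 2002
123 class1 subclass1 subclass3 subclass1
564 class1 subclass1 subclass3 subclass2
564 class1 subclass1 subclass3 subclass3
213 class2 subclass1 subclass4 subclass4
856 class2 subclass1 subclass5 subclass4
22 class3 subclass6 subclass6 subclass4
5 class4 subclass1 subclass3 subclass4", header = TRUE)
names(df) <- c("value", "class", "2000", "2001", "2002")

years <- c("2000", "2001", "2002")

# One pivot table per year; df[[y]] fetches the column whose name is the string y.
pivots <- lapply(years, function(y) {
  tapply(df[["value"]], INDEX = list(df[["class"]], df[[y]]), FUN = sum)
})
names(pivots) <- years  # "pivots" is a hypothetical name used for illustration

# Inspect a single table, e.g. the one for 2000.
pivots[["2000"]]
```

    Each element of the list is a matrix with classes in rows and the subclasses of that year in columns; wrap it in data.frame() if a data frame is needed.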

    Alternatively, the aggregate() function accepts a formula as an argument, and the formula itself can be built from a text string, which lets you substitute the column name:

     f <- as.formula(sprintf("value ~ class + `%s`", years[1]))
     # The formula is passed positionally: the name of this argument changed in R 4.2.0.
     aggregate(f, data = df, FUN = sum)
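    To cover every year, the formula can be rebuilt inside a loop. This is a minimal sketch; the list name aggs is hypothetical. The formula is passed to aggregate() positionally, because the name of its first argument changed from formula to x in R 4.2.0.

```r
# Rebuild the data from the question so the sketch is self-contained.
df <- read.table(text = "value class 2000 2001 2002
123 class1 subclass1 subclass3 subclass1
564 class1 subclass1 subclass3 subclass2
564 class1 subclass1 subclass3 subclass3
213 class2 subclass1 subclass4 subclass4
856 class2 subclass1 subclass5 subclass4
22 class3 subclass6 subclass6 subclass4
5 class4 subclass1 subclass3 subclass4", header = TRUE)
names(df) <- c("value", "class", "2000", "2001", "2002")

years <- c("2000", "2001", "2002")

# Build one formula per year and aggregate with it.
# Backticks are required because "2000" is not a syntactic R name.
aggs <- lapply(years, function(y) {
  f <- as.formula(sprintf("value ~ class + `%s`", y))
  aggregate(f, data = df, FUN = sum)
})
names(aggs) <- years  # "aggs" is a hypothetical name used for illustration

# Inspect the result for a single year.
aggs[["2000"]]
```

    Unlike tapply(), this returns a long data frame with one row per class and subclass combination that actually occurs, so there are no NA cells for empty combinations.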

    To aggregate over several columns at once, it is convenient to use the grouping facilities of the dplyr package. Before grouping, the data must be converted from wide format to long format.

     library(tidyr)
     library(dplyr)

     df %>%
       gather(year, subclass, -value, -class) %>%
       group_by(class, subclass, year) %>%
       summarise(sum = sum(value))

    Result:

     Source: local data frame [15 x 4]
     Groups: class, subclass [?]

         class  subclass  year   sum
        (fctr)     (chr) (chr) (int)
     1  class1 subclass1  2000  1251
     2  class1 subclass1  2002   123
     3  class1 subclass2  2002   564
     4  class1 subclass3  2001  1251
     5  class1 subclass3  2002   564
     6  class2 subclass1  2000  1069
     7  class2 subclass4  2001   213
     8  class2 subclass4  2002  1069
     9  class2 subclass5  2001   856
     10 class3 subclass4  2002    22
     11 class3 subclass6  2000    22
     12 class3 subclass6  2001    22
     13 class4 subclass1  2000     5
     14 class4 subclass3  2001     5
     15 class4 subclass4  2002     5
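    If installing extra packages is not an option, the same wide-to-long conversion and grouping can be approximated in base R. This is a sketch under the assumption that only the three year columns need to be stacked; the names long and res are hypothetical.

```r
# Rebuild the data from the question so the sketch is self-contained.
df <- read.table(text = "value class 2000 2001 2002
123 class1 subclass1 subclass3 subclass1
564 class1 subclass1 subclass3 subclass2
564 class1 subclass1 subclass3 subclass3
213 class2 subclass1 subclass4 subclass4
856 class2 subclass1 subclass5 subclass4
22 class3 subclass6 subclass6 subclass4
5 class4 subclass1 subclass3 subclass4", header = TRUE)
names(df) <- c("value", "class", "2000", "2001", "2002")

years <- c("2000", "2001", "2002")

# Wide -> long: stack the year columns on top of each other,
# repeating value and class once per year.
long <- data.frame(
  value    = rep(df[["value"]], times = length(years)),
  class    = rep(as.character(df[["class"]]), times = length(years)),
  year     = rep(years, each = nrow(df)),
  subclass = unlist(lapply(years, function(y) as.character(df[[y]])))
)

# Sum value within each class / subclass / year combination.
res <- aggregate(value ~ class + subclass + year, data = long, FUN = sum)
res
```

    The row order differs from the dplyr output, but the set of groups and their sums should be the same.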
    • Thank you! This is exactly what I needed. - makbuk