I am doing a cluster analysis using k-means. I have generated 720 datasets and stored them in one list, and I have a separate dataset that holds the number of centers (k) for each of the 720 datasets. I want to run the cluster analysis on every dataset in the list in a single step with lapply, using the number of centers that corresponds to each dataset. The problem is that I do not know how to make lapply step through the numbers of centers in parallel with the datasets.
Example:
    # generate 5 datasets
    set.seed(199)
    df1 <- data.frame(replicate(4, sample(1:100, 40, replace = TRUE)))
    df2 <- data.frame(replicate(3, sample(1:100, 30, replace = TRUE)))
    df3 <- data.frame(replicate(5, sample(1:100, 20, replace = TRUE)))
    df4 <- data.frame(replicate(6, sample(1:100, 40, replace = TRUE)))
    df5 <- data.frame(replicate(3, sample(1:100, 50, replace = TRUE)))

    # put them into a named list
    list_df <- list(df1, df2, df3, df4, df5)
    list_names <- c("df1", "df2", "df3", "df4", "df5")
    list_df <- setNames(list_df, list_names)

    # create a dataset with the number of centers for each dataset
    df_centers <- data.frame(centers = c(3, 4, 2, 6, 8))

    # attempt to use lapply (this code is wrong)
    km.clust <- lapply(list_df, kmeans, centers = df_centers$centers)

I tried it this way, but I don't know how to make kmeans use the entry of df_centers that corresponds to each element of list_df. How can this be implemented?
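
One possible direction, offered as a sketch rather than a definitive answer: base R's Map() (a thin wrapper around mapply()) iterates over several lists or vectors in parallel, so each dataset can be paired with its own k. The example below rebuilds the five example datasets in a compact form so that it runs on its own; the sizes are the same as above, but the names `dims`, `centers`, `d`, and `k` are chosen here for illustration only.

```r
set.seed(199)

# Rebuild five example datasets: (columns, rows) per dataset (illustrative setup)
dims <- list(df1 = c(4, 40), df2 = c(3, 30), df3 = c(5, 20),
             df4 = c(6, 40), df5 = c(3, 50))
list_df <- lapply(dims, function(d) {
  data.frame(replicate(d[1], sample(1:100, d[2], replace = TRUE)))
})

# One k per dataset, in the same order as list_df
centers <- c(3, 4, 2, 6, 8)

# Map() passes the i-th dataset and the i-th k to each kmeans() call
km.clust <- Map(function(d, k) kmeans(d, centers = k), list_df, centers)

# Number of clusters actually fitted for each dataset
sapply(km.clust, function(m) nrow(m$centers))
```

Map() keeps the names of list_df, so an individual fit is available as, for example, km.clust$df3. This relies on the centers vector being in the same order as the list and having the same length; with 720 datasets it may be worth checking length(list_df) == length(centers) before running.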