I have a binary classification problem with many predictors, some of them qualitative (categorical). I am applying decision tree algorithms, a logit model, and an SVM with a linear kernel. The question: for each of these algorithms, do I need to create indicator variables for the qualitative predictors with one level excluded? I know that in linear regression this is necessary, because otherwise the matrix cannot be inverted. But what should be done here?
For example, when I estimate a logit model on all variables in sklearn, I get a coefficient for every level of each qualitative variable, whereas R drops one level of each qualitative variable from the equation.
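To make the invertibility concern concrete, here is a minimal sketch, assuming pandas and NumPy are available. The `color` column and its values are hypothetical toy data, and `with_intercept` is a helper invented for this illustration. It compares the rank of the design matrix with all levels encoded against the rank with one level dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one qualitative predictor with three levels.
df = pd.DataFrame({"color": ["red", "green", "blue", "green", "red", "blue"]})

# One indicator column per level (blue, green, red).
full = pd.get_dummies(df["color"], dtype=float)
# Same encoding with the first level (blue) dropped.
reduced = pd.get_dummies(df["color"], drop_first=True, dtype=float)

def with_intercept(dummies):
    """Prepend a column of ones, as a model with an intercept would."""
    return np.column_stack([np.ones(len(dummies)), dummies.to_numpy()])

X_full = with_intercept(full)
X_reduced = with_intercept(reduced)

rank_full = np.linalg.matrix_rank(X_full)
rank_reduced = np.linalg.matrix_rank(X_reduced)

# Full coding: 4 columns but rank 3, because the indicators sum to the intercept.
print(X_full.shape, rank_full)
# Reduced coding: 3 columns and rank 3, so the columns are linearly independent.
print(X_reduced.shape, rank_reduced)
```

With the full coding the indicator columns add up to the intercept column, so the matrix loses one rank. This is the situation in which an unpenalized fit has no unique solution.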

  • Code examples would help. So far it looks as if something is being done wrong in sklearn. In R, when a model is fitted, qualitative variables are either converted to indicators on the fly or used as they are (when the method allows it). For trees, you can use indicators without removing a level. - Ogurtsov
  • In R there is the concept of a "base level": the effect of every other level on the dependent variable is measured relative to it. This base level is not shown in the summary output. - Artem Klevtsov
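The difference between the two outputs can be reproduced in sklearn alone. The sketch below is an illustration under stated assumptions, not a statement of how either model in the question was fitted: the data are hypothetical, and `OneHotEncoder(drop="first")` is used here to imitate a base level. Note that `LogisticRegression` applies an L2 penalty by default, which is one reason it can return a coefficient for every level without complaining about collinearity.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one qualitative predictor and a binary target.
colors = np.array(["red", "green", "blue", "green", "red", "blue",
                   "red", "green", "blue", "blue", "red", "green"]).reshape(-1, 1)
y = np.array([1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1])

# Full coding: one indicator per level.
enc_full = OneHotEncoder()
# Reference coding: the first level (alphabetically) acts as the base level.
enc_ref = OneHotEncoder(drop="first")

X_full = enc_full.fit_transform(colors)
X_ref = enc_ref.fit_transform(colors)

# Default settings, i.e. with the default L2 penalty.
model_full = LogisticRegression().fit(X_full, y)
model_ref = LogisticRegression().fit(X_ref, y)

levels = list(enc_full.categories_[0])
print(levels, model_full.coef_.shape)      # a coefficient for every level
print(levels[1:], model_ref.coef_.shape)   # base level absorbed by the intercept
```

In the second model the coefficients are read relative to the dropped level, which matches the base-level interpretation described for R. For tree-based models the full coding is usually harmless, since no matrix inversion is involved.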
