I recently read Sebastian Raschka's book Python Machine Learning. It has a section on regression analysis, and after reading it I got the impression that regression analysis is an alternative to classification algorithms in machine learning. I don't quite understand why we need regression analysis for classification problems if classification algorithms already exist. What is its advantage? I apologize in advance for my limited knowledge; I have only just begun studying machine learning.

  • maybe you mean "logistic regression"? - MaxU
  • No, not quite. For example, the same book mentions the LinearRegression class from the sklearn package for Python, and there are also RANSACRegressor and DecisionTreeRegressor, which build more complex regression models. Are these models used for classification in place of the classification algorithms? - Alexey Navalny

1 answer

In theory, classification can be treated as a special case of the problem of constructing a regression. In real-world problems, however, each method has a clearly defined range of tasks to which it should be applied.

In practice, regression models are applied when all the data are measured on numerical scales. Accordingly, you can specify arbitrary values of the independent variables and obtain the value of the dependent variable, whose range of possible values runs from minus infinity to plus infinity.
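To make the point about numerical scales concrete, here is a minimal sketch of a one-feature least-squares regression in plain Python. The data and the names fit_line and predict are made up for illustration; in sklearn the LinearRegression class mentioned in the question plays a comparable role.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Evaluate the fitted line at an arbitrary numeric input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data (assumption): one numeric indicator and a numeric target.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

model = fit_line(xs, ys)

# Any numeric input is allowed, and the output is an unbounded real number.
inside = predict(model, 2.5)
outside = predict(model, -10.0)
print(inside, outside)
```

The second prediction is negative, which shows that nothing in the model restricts the output to a fixed set of values.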

Classification models are used when at least the dependent variable, and possibly some (or even all) of the independent variables, are measured on weak scales, i.e. dichotomous, nominal and/or rank (ordinal) scales.

Accordingly, in regression tasks the forecasting problem is posed like this: "what will the value of the dependent variable be if the values of the independent variables are known?" (for example, "here is a set of laboratory indicators, estimate how long the patient's recovery will take"). In classification tasks it is posed like this: "which value from a predetermined, finite set of possible values will the dependent variable take if the values of the independent variables are known?" (for example, "here is a set of laboratory indicators, specify the disease prognosis").
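The difference between the two formulations can be sketched with the same features and two kinds of target. This is an illustration in plain Python that uses a simple k-nearest-neighbours rule, not the method from the book; the data, the labels and the function names are assumptions made up for the example.

```python
from collections import Counter


def nearest(train_X, x, k):
    """Indices of the k training rows closest to x (squared Euclidean distance)."""
    def dist(row):
        return sum((a - b) ** 2 for a, b in zip(row, x))
    return sorted(range(len(train_X)), key=lambda i: dist(train_X[i]))[:k]


def knn_regress(train_X, targets, x, k=3):
    """Regression: average the numeric targets of the k nearest rows."""
    idx = nearest(train_X, x, k)
    return sum(targets[i] for i in idx) / k


def knn_classify(train_X, labels, x, k=3):
    """Classification: majority vote over the labels of the k nearest rows."""
    idx = nearest(train_X, x, k)
    return Counter(labels[i] for i in idx).most_common(1)[0][0]


# Hypothetical laboratory indicators (two per patient).
X = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
     [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]]

# Numeric target: recovery duration in days (regression).
days = [5.0, 6.0, 7.0, 20.0, 22.0, 24.0]

# Categorical target from a finite set: prognosis (classification).
prognosis = ["favorable"] * 3 + ["unfavorable"] * 3

patient = [1.5, 1.5]
est_days = knn_regress(X, days, patient)
label = knn_classify(X, prognosis, patient)
print(est_days, label)
```

The inputs are identical in both calls; only the type of the answer changes, a number in the first case and one of a fixed set of labels in the second. In sklearn the same split shows up as paired estimators such as DecisionTreeRegressor and DecisionTreeClassifier.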