This may be a silly question, so please don't judge too harshly: I'm a beginner. Suppose we have some data and need to predict a target variable. Whether that variable is binary or numeric doesn't matter. What matters is this: the same object, whose property the target variable describes, can appear in the training set several times.
Suppose we have the following training set, where the "Athlete" column is the dependent (target) variable:
+---------+-----------------+-----------------+---------+
| Name    | Morning run, km | Evening run, km | Athlete |
+---------+-----------------+-----------------+---------+
| Ivanov  |              10 |              15 | Yes     |
| Ivanov  |               5 |              13 | Yes     |
| Petrov  |               3 |               7 | No      |
| Petrov  |               4 |               2 | No      |
+---------+-----------------+-----------------+---------+

And the following prediction set:
+---------+-----------------+-----------------+---------+
| Name    | Morning run, km | Evening run, km | Athlete |
+---------+-----------------+-----------------+---------+
| Sidorov |              12 |              14 | ?       |
| Sidorov |              11 |               6 | ?       |
+---------+-----------------+-----------------+---------+

In both sets, several rows refer to the same object but are stored as separate lines (say, because they were recorded on different dates). Suppose we have trained some model that makes predictions. How can the model take into account that predictions must agree for the same object (Sidorov cannot be an athlete at one time and not an athlete at another)? Should I reduce the training set to one row per object, or is there some parameter that creates a hard dependency on the object (name) column?
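The two options mentioned in the question can be compared with a small, stdlib-only Python sketch. The nearest-centroid "model" and every function name here (`fit_centroids`, `athlete_score`, `aggregate`, `predict_aggregated`, `predict_per_row_then_average`) are hypothetical stand-ins for whatever classifier is actually used; the only assumption is that the model can return a score per row. Option A collapses each person into one row (column means) before training and predicting. Option B keeps the rows, scores each one, then averages the scores per person and thresholds once, so each person gets exactly one label.

```python
from collections import defaultdict
from statistics import mean

# Training rows from the question: (name, morning_km, evening_km, is_athlete)
TRAIN = [
    ("Ivanov", 10, 15, True),
    ("Ivanov", 5, 13, True),
    ("Petrov", 3, 7, False),
    ("Petrov", 4, 2, False),
]
# Rows to predict: (name, morning_km, evening_km)
PREDICT = [("Sidorov", 12, 14), ("Sidorov", 11, 6)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(features, labels):
    """Hypothetical stand-in model: the mean feature vector of each class."""
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in by_class.items()}


def athlete_score(centroids, x):
    """Score in [0, 1]; higher means x lies closer to the 'athlete' centroid."""
    d_yes = sq_dist(x, centroids[True])
    d_no = sq_dist(x, centroids[False])
    return d_no / (d_yes + d_no)


def aggregate(rows):
    """Collapse all rows of one object into a single row of column means."""
    groups = defaultdict(list)
    for name, *features in rows:
        groups[name].append(features)
    return {name: [mean(col) for col in zip(*fs)] for name, fs in groups.items()}


def predict_aggregated(train, predict, threshold=0.5):
    """Option A: one row per object for both training and prediction."""
    labels = {}
    for name, *_, y in train:
        # The question assumes the target never changes for a given object.
        assert labels.setdefault(name, y) == y, f"conflicting labels for {name}"
    train_x = aggregate([row[:-1] for row in train])
    names = list(train_x)
    model = fit_centroids([train_x[n] for n in names], [labels[n] for n in names])
    return {name: athlete_score(model, x) >= threshold
            for name, x in aggregate(predict).items()}


def predict_per_row_then_average(train, predict, threshold=0.5):
    """Option B: train and score per row, then make one decision per object."""
    model = fit_centroids([row[1:-1] for row in train], [row[-1] for row in train])
    scores = defaultdict(list)
    for name, *features in predict:
        scores[name].append(athlete_score(model, features))
    # Average the row scores of each object, then threshold once.
    return {name: mean(s) >= threshold for name, s in scores.items()}


if __name__ == "__main__":
    row_model = fit_centroids([r[1:-1] for r in TRAIN], [r[-1] for r in TRAIN])
    print([athlete_score(row_model, r[1:]) >= 0.5 for r in PREDICT])  # [True, False]
    print(predict_aggregated(TRAIN, PREDICT))            # {'Sidorov': True}
    print(predict_per_row_then_average(TRAIN, PREDICT))  # {'Sidorov': True}
```

With this toy model, thresholding Sidorov's two rows independently gives `[True, False]`, which is exactly the contradiction described in the question, while both per-object variants return a single label. Column means are just one choice of aggregation; depending on the feature, a sum, maximum, or row count per object might be more informative.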