Good afternoon. I have the following problem: a classification task where the training set has 50,000 rows and Y has 60 labels. The data is imbalanced: one class has 35,000 rows, while the other 59 classes share the remaining 15,000, and some of those have only 30 rows. For example, with X (column_1, column_2, column_3) and Y:

    colum_1  colum_2  colum_2  Y
    0.5      1        2        1
    0.5      1.1      2        1
    0.55     0.95     3        1
    0.1      1        2        2
    2        0.9      3        3

I need to add "noisy" data to remove the imbalance, i.e. roughly so that every class ends up with the same number of rows:

    colum_1  colum_2  colum_2  Y
    0.5      1        2        1
    0.5      1.1      2        1
    0.55     0.95     3        1
    0.1      1        2        2
    0.15     0.99     2        2
    0.05     1.01     2        2
    2        0.9      3        3
    1.95     0.95     3        3
    2.05     0.85     3        3

This is only a toy example; the real data has many more values. Thanks.
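One reading of the request is random oversampling with jitter: for every minority class, draw existing rows of that class with replacement and add small Gaussian noise to the continuous columns until each class is as large as the biggest one. The sketch below makes several assumptions not in the question: the third column is renamed `colum_3` so the frame has unique column names, only `colum_1` and `colum_2` are perturbed (the third column stays fixed, as in the desired table), and the noise scale of 5% of each column's standard deviation (`rel_scale`, a hypothetical parameter) is an arbitrary choice.

```python
import numpy as np
import pandas as pd


def oversample_with_noise(df, label_col, noise_cols, rel_scale=0.05, seed=0):
    """Balance classes by resampling minority rows and jittering noise_cols.

    rel_scale is an assumed knob: the noise std for each column is
    rel_scale times that column's std over the whole frame.
    Columns not listed in noise_cols are copied unchanged.
    """
    rng = np.random.default_rng(seed)
    counts = df[label_col].value_counts()
    target = counts.max()  # size of the largest class
    scale = rel_scale * df[noise_cols].std(ddof=0).to_numpy()
    parts = [df]  # keep all original rows
    for label, n in counts.items():
        missing = int(target - n)
        if missing == 0:
            continue
        cls_rows = df[df[label_col] == label]
        # pick existing rows of this class at random, with replacement
        synth = cls_rows.iloc[rng.integers(0, len(cls_rows), size=missing)].copy()
        # add zero-mean Gaussian noise only to the selected columns
        noise = rng.normal(0.0, 1.0, size=(missing, len(noise_cols))) * scale
        synth[noise_cols] = synth[noise_cols].to_numpy() + noise
        parts.append(synth)
    return pd.concat(parts, ignore_index=True)


# Toy data from the question; the duplicated header is renamed to colum_3 here.
df = pd.DataFrame({
    "colum_1": [0.5, 0.5, 0.55, 0.1, 2.0],
    "colum_2": [1.0, 1.1, 0.95, 1.0, 0.9],
    "colum_3": [2, 2, 3, 2, 3],
    "Y": [1, 1, 1, 2, 3],
})
balanced = oversample_with_noise(df, "Y", ["colum_1", "colum_2"])
print(balanced["Y"].value_counts().sort_index())  # 3 rows per class
```

Pushing a 30-row class up to 35,000 rows this way mostly produces near-copies of those 30 rows, so combining it with undersampling the majority class, or with class weights in the classifier, may be worth comparing.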
Look into "data augmentation". Regarding the "noise" data: what are the limits of the noise? Why does colum_2 not change? SVM is the name of an algorithm, not a module/library... Please post the relevant part of your code (How to create a minimal, self-contained and reproducible example). - MaxU
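One way to make the noise bounded by construction, instead of picking a noise scale by hand, is the interpolation idea behind SMOTE: each synthetic row lies on the segment between two existing rows of the same class, so its values never leave that class's observed range. The sketch below is a simplified variant under stated assumptions: it pairs random rows of the same class rather than k nearest neighbours (the full algorithm is available as `SMOTE` in the imbalanced-learn package), `num_cols` is a hypothetical parameter listing the columns to interpolate, and other columns are copied from the first row of each pair.

```python
import numpy as np
import pandas as pd


def oversample_by_interpolation(df, label_col, num_cols, seed=0):
    """Simplified SMOTE-like oversampling (random same-class pairs, not k-NN).

    Values in num_cols are a + u * (b - a) with u in [0, 1), so they stay
    between two existing rows of the same class. Other columns come from a.
    """
    rng = np.random.default_rng(seed)
    counts = df[label_col].value_counts()
    target = counts.max()  # size of the largest class
    parts = [df]  # keep all original rows
    for label, n in counts.items():
        missing = int(target - n)
        if missing == 0:
            continue
        cls_rows = df[df[label_col] == label]
        a = cls_rows.iloc[rng.integers(0, len(cls_rows), size=missing)]
        b = cls_rows.iloc[rng.integers(0, len(cls_rows), size=missing)]
        u = rng.random((missing, 1))  # one interpolation weight per synthetic row
        synth = a.copy()
        a_vals = a[num_cols].to_numpy()
        synth[num_cols] = a_vals + u * (b[num_cols].to_numpy() - a_vals)
        parts.append(synth)
    return pd.concat(parts, ignore_index=True)


# Toy data from the question; the duplicated header is renamed to colum_3 here.
df = pd.DataFrame({
    "colum_1": [0.5, 0.5, 0.55, 0.1, 2.0],
    "colum_2": [1.0, 1.1, 0.95, 1.0, 0.9],
    "colum_3": [2, 2, 3, 2, 3],
    "Y": [1, 1, 1, 2, 3],
})
balanced = oversample_by_interpolation(df, "Y", ["colum_1", "colum_2"])
print(balanced["Y"].value_counts().sort_index())  # 3 rows per class
```

With the toy table, classes 2 and 3 have a single row each, so every pair is (row, same row) and the synthetic rows are exact duplicates; interpolation only adds variety once a class has at least two distinct rows, and a small jitter term could be added on top for the rest.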