Given an array of numbers, say [2,1,2,2,1,2,..2,1,1], how can it be continued? Which machine learning methods are best suited for this?

  • Can the array contain only ones and twos? That is, is this a classification or a regression task? Some additional information would also help: what is this series, and what does it represent? Is it a “time series”? - MaxU
  • @MaxU, the numbers can be either 1 and 2, or from 1 to 8 (inclusive), and this is not a time series - dmitry klemenkov
  • Are the values from 1 to 8 discrete (quantized), or can they be any real number in that range? Does your series depend on anything? Note the last paragraph of the answer from @passant and clarify your question. With the question worded like this, it is difficult to advise anything ... - MaxU
  • the sequence does not depend on anything; we can assume the numbers are random. All values are integers, either only 1 and 2, or from 1 to 8 - dmitry klemenkov
  • then all you need is: np.random.choice(np.arange(1, 9), N), where N is the number of elements ... ;) - MaxU

1 answer

If the series consists only of "1" and "2" and there is no additional information about it, the only thing you can do is estimate the probability of each value from its observed share, and then use those shares to generate random values.
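A minimal sketch of this idea, using only the Python standard library. The function names and the sample data are illustrative assumptions, not something from the question; the approach simply treats every value as an independent draw.

```python
import random
from collections import Counter

def estimate_shares(series):
    """Return {value: relative frequency} for the observed series."""
    counts = Counter(series)
    total = len(series)
    return {value: count / total for value, count in counts.items()}

def continue_series(series, n_new, seed=None):
    """Draw n_new values independently, weighted by the observed shares."""
    shares = estimate_shares(series)
    rng = random.Random(seed)
    values = list(shares)
    weights = [shares[v] for v in values]
    return rng.choices(values, weights=weights, k=n_new)

# Illustrative data only (the real series is not given in full).
observed = [2, 1, 2, 2, 1, 2, 2, 1, 1]
shares = estimate_shares(observed)          # "2" has share 5/9, "1" has share 4/9
continuation = continue_series(observed, n_new=5, seed=0)
```

The same code works unchanged for values from 1 to 8, since it only counts whatever values actually occur.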

You can go a little further and try to estimate the frequency of occurrence of pairs, triples, quadruples, etc. of consecutive values.
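One way to read this, sketched below under my own assumptions, is to count runs of consecutive values and then sample the next value conditioned on the last few (a simple Markov-chain-style model). The names `ngram_counts`, `next_value_counts`, and `sample_next` are hypothetical helpers, and `order` is assumed to be at least 1.

```python
import random
from collections import Counter, defaultdict

def ngram_counts(series, n):
    """Count every run of n consecutive values in the series."""
    return Counter(tuple(series[i:i + n]) for i in range(len(series) - n + 1))

def next_value_counts(series, order):
    """Map each context of `order` values to a Counter of the values that followed it."""
    table = defaultdict(Counter)
    for i in range(len(series) - order):
        context = tuple(series[i:i + order])
        table[context][series[i + order]] += 1
    return table

def sample_next(series, order, rng):
    """Sample the next value given the last `order` values (order >= 1).

    Falls back to the overall value counts if the current context
    was never followed by anything in the observed series.
    """
    table = next_value_counts(series, order)
    counts = table.get(tuple(series[-order:])) or Counter(series)
    values = list(counts)
    return rng.choices(values, weights=[counts[v] for v in values], k=1)[0]

# Illustrative data only.
observed = [2, 1, 2, 2, 1, 2, 2, 1, 1]
pairs = ngram_counts(observed, 2)           # frequencies of consecutive pairs
after = next_value_counts(observed, 1)      # what followed "1" and what followed "2"
predicted = sample_next(observed, order=1, rng=random.Random(0))
```

If the series really is random, as stated in the comments, the conditional counts should stay close to the plain shares, and longer contexts will mostly add noise.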

If there is (or you can extract) some additional information, for example that the probabilities of "1" and "2" change over time, i.e. there is a trend or seasonality, then you can try to detect that trend or seasonality using appropriate methods.
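As a rough first check for a trend, you could look at how the share of one value changes across a sliding window. This is only a crude sketch, not a statistical test; the helper name `rolling_share`, the window length, and the data are all assumptions made for illustration.

```python
def rolling_share(series, value, window):
    """Share of `value` in each sliding window of the given length."""
    hits = [1 if x == value else 0 for x in series]
    return [
        sum(hits[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

# Illustrative data in which "2" becomes more frequent towards the end.
series = [1, 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2]
share_of_two = rolling_share(series, value=2, window=4)

# Compare the average share in the first and second half of the windows.
half = len(share_of_two) // 2
early = sum(share_of_two[:half]) / half
late = sum(share_of_two[half:]) / (len(share_of_two) - half)
```

A clearly larger `late` than `early` hints that the probabilities are drifting, in which case recent data should probably count for more than old data when generating the continuation.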

If the appearance of "1" and "2" is driven by some other factors, you can try to build a classifier in which those factors are the independent variables and "1" and "2" are the class labels.

In any case, the main rule of Data Science is that data does not exist in isolation. Meaningful data analysis is possible only when you understand the semantic context of the available data. Good luck.

  • The last paragraph is straight to the point! +1 - MaxU
  • My problem is that I do not know the data in advance, and I need to predict it blindly. - Dmitry Klemenkov
  • "Blindly" you can only guess. If you want to predict, you have to study the data somehow. By the way, you do not need to know it "in advance": you can study it as it becomes available, including with the methods mentioned above. - passant