Here is the task. I need to store 6-byte numbers, split across 100 files of 2.5 billion numbers each. Later, given a number, I must determine which file it belongs to. Stored naively, 6 bytes × 2.5 billion comes to about 14 GB per file, 1.4 TB in total. I came up with a solution: 3 bytes serve as an index and 3 bytes as data, which almost halved the size of each file. Lookup is very fast. I then found a way to compress the data blocks and got down to 6 GB per file. 100 files is the required minimum, so that is 600 GB.
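
One way to read the "3 bytes index, 3 bytes data" idea is to group the numbers by their high 3 bytes and store only the low 3 bytes inside each group, which is roughly what halves the file. A minimal Python sketch of that interpretation (the exact layout is my assumption, not necessarily the author's format):

```python
def split_number(n: int) -> tuple[int, int]:
    """Split a 6-byte number into a 3-byte index (high bytes)
    and a 3-byte payload (low bytes)."""
    assert 0 <= n <= 0xFFFFFFFFFFFF
    index = n >> 24          # high 3 bytes
    payload = n & 0xFFFFFF   # low 3 bytes
    return index, payload

def join_number(index: int, payload: int) -> int:
    """Reassemble the original 6-byte number from index and payload."""
    return (index << 24) | payload

# If numbers are kept in buckets keyed by the 3-byte index,
# only the 3-byte payload has to be written to disk.
n = 0x123456789ABC
idx, data = split_number(n)
assert (idx, data) == (0x123456, 0x789ABC)
assert join_number(idx, data) == n
```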

What about applying neural networks? I am not well acquainted with them. I understand that networks work on patterns: they learn on one set of data and then work on another. My data are random, so the network would have to memorize everything. Feed 48 bits to the input and get a number from 1 to 100 at the output.
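
For concreteness, the classifier being described (48 input bits mapped to one of 100 classes) could look like the following minimal sketch; PyTorch and the layer sizes are my assumptions, and whether such a net can do anything better than memorizing random data is exactly what questions 1-3 below ask:

```python
import torch
import torch.nn as nn

# A 6-byte number is presented to the network as 48 individual bits.
def to_bits(n: int) -> torch.Tensor:
    return torch.tensor([(n >> i) & 1 for i in range(48)], dtype=torch.float32)

# Minimal MLP: 48 bits in, logits for 100 classes (file numbers 1-100) out.
model = nn.Sequential(
    nn.Linear(48, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 100),
)

x = to_bits(0x123456789ABC).unsqueeze(0)          # batch of one
logits = model(x)
predicted_file = logits.argmax(dim=1).item() + 1  # 1-based file number
print(predicted_file)
```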

1) Will such a lookup work at all?

2) If so, will it reduce storage?

3) How long would training take? I suspect this is the bottleneck.

  • Is the task to determine which of the 100 files a number belongs to, or to save the number into one of the files? A neural network will not help reduce the storage of the numbers... - MaxU
  • @MaxU, it is to determine which number (out of 100) it corresponds to - Dmitry
  • Can you describe the task in more detail? By what criterion are the numbers split up? Have you considered hash partitioning? - MaxU
  • @MaxU, there are the numbers 1 - 100. Each of them corresponds to a set of 6-byte numbers in the range 0x000000000000 to 0xffffffffffff. The task: given an arbitrary 6-byte number, determine the corresponding number from 1 to 100 - Dmitry
  • One trivial idea: use the remainder of dividing the number by 100 as the file number, both when writing and when reading (sketch below) - MaxU
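
A minimal sketch of that modulo partitioning idea (function name is illustrative; it only applies if you are free to choose which file each number is written to):

```python
def file_for(number: int, num_files: int = 100) -> int:
    """Hash-partition by remainder: the same formula is used when
    writing a number and later when looking it up."""
    return number % num_files + 1   # file numbers 1..100

assert file_for(0x000000000000) == 1
assert file_for(0xFFFFFFFFFFFF) == (0xFFFFFFFFFFFF % 100) + 1
```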
