Greetings. I write with random scales and I need to solve the following problem. At the entrance I have, for example, such an array A = [1,3,7,10] and the number n ∈ [min(A); max(A)) n ∈ [min(A); max(A)) . We assume that [1; 3) is the first half-interval, [3, 7) is the second, and so on. Is it possible to determine in which interval (first, second, ...) is a given number n in constant time? It is clear that this is easily implemented by the cycle.

  • Binary search for O (log k) (k - the length of the array). Faster, I think it is impossible. PS: what is “random with weights”? - VladD
  • If you use only your array A, then nothing. What are the restrictions on numbers? Is it possible to create in advance an additional data structure? - Zergatul
  • @VladD it will still need to be sorted then ... Although I also thought about binary search. Random with scales - so that you can set the probability of loss of certain elements. I build an array of segments in terms of probabilities (the larger segment corresponds to the highest probability), and then I determine which segment the number fell into. Is this generally like a normal circuit? - Nik
  • @Zergatul in this abstract problem has no limitations. In my application, the sum of the elements of the array = 1; each element is> 0. - Nik
  • one
    If the accuracy of the elements is limited (for example, 6 decimal places), you can create an array of 1 million length ( B ) and fill it with segment numbers. Now to perform the search, you will write: int x = B[n * 1000000] . This will be O(1) . - Zergatul

1 answer 1

Because I did not receive accurate information from the author about the limitations in the problem, and as a result I solved the problem with my limitations. I decided for a sorted sequence of numbers (because I assumed that the author builds something like a probability distribution table, cumulative probability and wants to find it quickly), also for whole numbers (for real ones it is easy to generalize), my solution will work best for more or less evenly distributed numbers (otherwise it will be slower to work). The essence of my solution is that I used a precomputed table to speed up finding the right interval, for this I broke the entire interval [min, max] into c_table_size buckets ("bucket", or intervals) of equal size (rounded up), the table indexed by the bucket number and the values ​​of the table are indices of the first numbers of numbers that fall into this bucket, it turns out that for a given bucket, the interval of indices in the original sequence is [table[bucket], table[bucket + 1]) , i.e. thanks to the table with a uniform distribution of numbers, we reduce the search interval by c_table_size (table size) times, and the table is very compact - c_table_size 32-bit numbers, you can make it at any suitable size, how many is not a pity. I do a search through the interval predicted by the table through a linear cycle. Of course, a binary search can and should be done, but I’m already leaving it to the author for revision. Time measurements showed that a table, for example, with a size of 1024 elements, speeds up about 1000 times the search.

Here is the full C ++ source code, you can run online :

 #include <random> #include <cstdint> #include <algorithm> #include <functional> #include <vector> #include <iostream> #include <chrono> #include <string> using namespace std; typedef uint32_t u32; typedef uint32_t s32; typedef uint64_t u64; enum { c_nums_min = 1 << 10, c_nums_max = 1 << 20, c_nums_cnt = 1 << 16, c_table_size = 1 << 10, c_num_tests = 1 << 16, }; class TimeMeasure { public: TimeMeasure(string const & name) { name_ = name; start_time_ = std::chrono::high_resolution_clock::now(); end_time_ = start_time_; } u64 TimeElapsedNS() { end_time_ = std::chrono::high_resolution_clock::now(); u64 diff = chrono::duration_cast<chrono::nanoseconds>(end_time_ - start_time_).count(); return diff; } ~TimeMeasure() { cout << "Time elapsed for [" << name_ << "] = " << dec << TimeElapsedNS() / 1000000 << " ms." << endl; } private: string name_; chrono::time_point<chrono::high_resolution_clock> start_time_, end_time_; }; int main() { // Initialize random number generator. std::default_random_engine generator; std::uniform_int_distribution<u32> distribution(c_nums_min, c_nums_max); auto rng = std::bind(distribution, generator); // Generate random numbers. vector<u32> nums(c_nums_cnt); for (u32 i = 0; i < nums.size(); ++i) { nums[i] = rng(); } nums.push_back(c_nums_min); nums.push_back(c_nums_max + 1); // Sort numbers. sort(nums.begin(), nums.end()); // Create lookup table. vector<u32> table(c_table_size + 1); u32 const c_bucket_size = (c_nums_max - c_nums_min + c_table_size) / c_table_size; for (u32 i = 0, i_nums = 0; i < table.size() - 1; ++i) { u32 const bucket_end = c_nums_min + (i + 1) * c_bucket_size; table[i] = i_nums; while (i_nums < nums.size() - 1 && nums[i_nums] < bucket_end) ++i_nums; } table.back() = nums.size() - 1; // Generate random nums for queries. vector<u32> query_nums(c_num_tests); for (u32 i = 0; i < query_nums.size(); ++i) { query_nums[i] = rng(); } // Do lookups. { TimeMeasure time_measure("AllTests"); for (u32 i = 0; i < query_nums.size(); ++i) { u32 n = query_nums[i]; if (n < c_nums_min || n > c_nums_max) { cout << "Wrong n: " << n << endl; return -1; } u32 table_index = (n - c_nums_min) / c_bucket_size; u32 found_index = max(u32(1), table[table_index]) - 1; for (u32 j = table[table_index]; j < table[table_index + 1]; ++j) { if (n < nums[j + 1]) { if (nums[j] <= n) { found_index = j; } break; } } if (!(nums[found_index] <= n && n < nums[found_index + 1])) { cout << "Incorrect algorithm!" << endl << n << " [" << nums[found_index] << ", " << nums[found_index + 1] << ")" << endl; return -1; } } } return 0; }