std::bitset cannot be used, that is the condition!

Given a contiguous array of bytes of sufficient size, two operations need to be defined: read the Nth bit and set the Nth bit.

Conditions:

  • it is necessary to account for the bit order (big/little endian); on some architectures it is even switchable, so, as you can see, the question is not that simple
  • you need to support a number of bits per byte other than eight
  • if you decide to implement this with #defines rather than inline functions, provide normal "safety" for a fellow programmer (e.g., an argument with a unary operation applied to it being passed in, protection against clashes with identifiers that already exist in the program); one sketch of the inline-function route is given right after this list
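A minimal sketch of how the inline-function route might look (the names BitOrder and BitArray are my own, and treating the bit order and the bits-per-byte count as compile-time parameters is just one possible design, not the only one):

    #include <cstddef>

    // Bit numbering inside a byte: LSB-first or MSB-first.
    enum class BitOrder { LSBFirst, MSBFirst };

    // Compile-time parameters: bit order and bits per byte (not necessarily 8).
    template <BitOrder Order = BitOrder::LSBFirst, std::size_t BitsPerByte = 8>
    struct BitArray {
        unsigned char* data;

        // Position of bit n inside its byte, honoring the chosen bit order.
        static std::size_t shift(std::size_t n) {
            std::size_t pos = n % BitsPerByte;
            return Order == BitOrder::LSBFirst ? pos : BitsPerByte - 1 - pos;
        }

        bool get(std::size_t n) const {
            return (data[n / BitsPerByte] >> shift(n)) & 1u;
        }

        void set(std::size_t n) {
            data[n / BitsPerByte] |= static_cast<unsigned char>(1u << shift(n));
        }
    };

    // Usage: unlike a macro, the arguments are evaluated exactly once.
    // unsigned char buf[16] = {};
    // BitArray<BitOrder::MSBFirst> bits{buf};
    // bits.set(10);
    // bool b = bits.get(10);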

To unify the measurements, I propose measuring execution time with this uniform C++ code:

    #include <chrono>
    // ...
    auto start = std::chrono::high_resolution_clock::now();

    // code being measured <----------------------------

    auto stop = std::chrono::high_resolution_clock::now();
    std::cout << "Elapsed: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
              << " ms" << std::endl;
    // ...

3 Answers

Since the question is about the fastest implementation, it of course makes sense to try the simplest approach: allocating a whole int for each bit. Let me remind you that the question said nothing about the implementation being economical.

I used @avp's code as the basis for the speed comparison. MSVC 2015, Release mode, x64.

Code:

 #include "stdafx.h" #include <stdio.h> #include <stdlib.h> #include <stdint.h> #include <string.h> #include <limits.h> #include <time.h> #include <iostream> #include <bitset> using namespace std; int main(int ac, char *av[]) { const int n = 1000000; bitset<1000000> bcp; printf("bcp %d\n", (int)sizeof(bcp)); const unsigned long ebits = (sizeof(unsigned long) * CHAR_BIT); const unsigned long sz = (n + sizeof(unsigned long) * CHAR_BIT - 1) / ebits; unsigned long bc[sz], bysz = sizeof(bc); printf("bc sz = %d ebits = %d bysz = %d\n", (int)(sz), (int)ebits, (int)bysz); int* crude_array = new int[n]; printf("crude array size %d\n", n); clock_t s, e; int nn = 0; s = clock(); for (int r = 0; r < 100; r++) { srand(9); bcp.reset(); for (int i = 0; i < n; i++) bcp.set(rand() % n); for (int i = 0; i < n; i++) if (bcp.test(i)) nn++; } e = clock(); cout << "nn = " << nn << " t = " << e - s << '\n'; nn = 0; s = clock(); for (int r = 0; r < 100; r++) { srand(9); memset(bc, 0, bysz); for (int i = 0; i < n; i++) { int bi = rand() % n; bc[bi / ebits] |= (1UL << (bi % ebits)); } for (int i = 0; i < n; i++) { if (bc[i / ebits] & (1UL << (i % ebits))) nn++; } } e = clock(); cout << "nn = " << nn << " t = " << e - s << '\n'; nn = 0; s = clock(); for (int r = 0; r < 100; r++) { srand(9); memset(crude_array, 0, sizeof(crude_array)); for (int i = 0; i < n; i++) { int bi = rand() % n; crude_array[bi] = 1; } for (int i = 0; i < n; i++) { if (crude_array[i]) nn++; } } e = clock(); cout << "nn = " << nn << " t = " << e - s << '\n'; delete[] crude_array; } 

Result:

    bcp 125000
    bc sz = 31250 ebits = 32 bysz = 125000
    crude array size 1000000
    nn = 3276800 t = 2158
    nn = 3276800 t = 2067
    nn = 3276800 t = 1979

Repeated test runs show consistent results.


I note that it is not at all self-evident that the "naive" implementation would turn out to be the fastest. Indeed, it suffers from poor cache locality, so it could well have been slower than the others. The advice, therefore, is in any case to profile, profile, and once more profile your application code on real data. Guessing the winner by sheer reasoning is pretty much hopeless (except in very trivial cases).

    Just an example of measurements: unsigned long vs. bitset

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <string.h>
    #include <limits.h>
    #include <time.h>
    #include <iostream>
    #include <bitset>

    using namespace std;

    int main (int ac, char *av[])
    {
        int n = 1000000;
        bitset<1000000> bcp;
        printf("bcp %d\n", (int)sizeof(bcp));

        unsigned long ebits,
            sz = (n + sizeof(unsigned long) * CHAR_BIT - 1) /
                 (ebits = (sizeof(unsigned long) * CHAR_BIT)),
            bc[sz], bysz = sizeof(bc);
        printf("bc sz = %d ebits = %d bysz = %d\n", (int)(sz), (int)ebits, (int)bysz);

        clock_t s, e;
        int nn = 0;

        s = clock();
        for (int r = 0; r < 100; r++) {
            srandom(9);
            bcp.reset();
            for (int i = 0; i < n; i++)
                bcp.set(random() % n);
            for (int i = 0; i < n; i++)
                if (bcp.test(i))
                    nn++;
        }
        e = clock();
        cout << "nn = " << nn << " t = " << e - s << '\n';

        nn = 0;
        s = clock();
        for (int r = 0; r < 100; r++) {
            srandom(9);
            memset(bc, 0, bysz);
            for (int i = 0; i < n; i++) {
                int bi = random() % n;
                bc[bi / ebits] |= (1UL << (bi % ebits));
            }
            for (int i = 0; i < n; i++) {
                if (bc[i / ebits] & (1UL << (i % ebits)))
                    nn++;
            }
        }
        e = clock();
        cout << "nn = " << nn << " t = " << e - s << '\n';
    }

    Pull the test and set operations out into #defines yourself. Similarly, you will need a clear operation as well (the names are provisional).

    Just in case, here are the macros

    #define B_SET(a, i) ({__typeof__((a)[0]) *_a = (a); __typeof__(i) _i = (i); \
        _a[_i / (sizeof(_a[0]) * CHAR_BIT)] |= (1UL << (_i % (sizeof(_a[0]) * CHAR_BIT)));})

    #define B_CLR(a, i) ({__typeof__((a)[0]) *_a = (a); __typeof__(i) _i = (i); \
        _a[_i / (sizeof(_a[0]) * CHAR_BIT)] &= ~(1UL << (_i % (sizeof(_a[0]) * CHAR_BIT)));})

    #define B_TEST(a, i) ({__typeof__((a)[0]) *_a = (a); __typeof__(i) _i = (i); \
        _a[_i / (sizeof(_a[0]) * CHAR_BIT)] & (1UL << (_i % (sizeof(_a[0]) * CHAR_BIT)));})

    Checked with g++.real (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0.
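    Just in case, a small usage sketch as well (this assumes gcc or clang, since the macros rely on the GNU statement-expression and __typeof__ extensions):

        #include <limits.h>
        #include <stdio.h>
        #include <string.h>

        /* B_SET, B_CLR, B_TEST as defined above */

        int main(void)
        {
            unsigned long bits[4];
            memset(bits, 0, sizeof(bits));    /* all bits cleared */

            B_SET(bits, 42);
            printf("bit 42 is %s\n", B_TEST(bits, 42) ? "set" : "clear");

            B_CLR(bits, 42);
            printf("bit 42 is %s\n", B_TEST(bits, 42) ? "set" : "clear");
            return 0;
        }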

    • Continuing the theme =) My output: nn = 63223897 t = 2760, nn = 63223897 t = 2804. I ran it on Windows: replaced srandom() with srand() and replaced random() with int myrand() { return (rand() << 16) + rand(); }. In Debug mode, bitset loses by 30 percent. - pavel
    • Here is a runnable example: ideone.com/uO4VPN - pavel
    • @pavel, I looked at the ideone output; so bitset is a bit faster on Windows (a different distribution? or processor?). I tried unsigned int (instead of long), i.e. 32 bits: virtually no difference. By the way, vector<bool> lags far behind my bitset (i.e., it runs longer). I also tried char cb[n], a byte per bit. Surprisingly, it turned out not to be faster: in 4 runs it was slightly faster once and slightly slower 3 times than the unsigned int bit version (apparently due to using 8 times more memory, although I have a 6 MB cache). - avp
    • @avp: A char per bit, strangely enough, is not that fast. Though for me it does come out faster. (Is int still the native word size? Or a different compiler?) - VladD
    • @VladD, I do not know. It probably depends on the CPU-compiler combination. In general, I was surprised that bitset, with a compile-time constant in its template, is not always the fastest. Apparently it does bounds checking. - avp
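    For what it's worth, the bounds-checking guess is easy to make concrete: std::bitset::test() is specified to throw std::out_of_range for an invalid position, unlike the unchecked operator[], so every test() call in the benchmark pays for a range check. A tiny illustration:

        #include <bitset>
        #include <iostream>
        #include <stdexcept>

        int main()
        {
            std::bitset<8> b;
            b[3] = true;                        // operator[]: unchecked access
            std::cout << b.test(3) << '\n';     // test(): checked access

            try {
                b.test(100);                    // invalid position
            } catch (const std::out_of_range&) {
                std::cout << "test() range-checks its argument\n";
            }
        }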

    @PashaPash, OK... let's continue. I will offer some options myself!

    1) First

     #define GETBIT(Array,Num) ((*(Array+(Num/CHAR_BIT))&(1<<(Num%CHAR_BIT)))!=0) 

    2) Second

     bool Getbit(void *Array, size_t Num)
     {
         const int char_bits = std::numeric_limits<unsigned char>::digits;
         return (reinterpret_cast<unsigned char*>(Array)[Num / char_bits]
                 & 1 << Num % char_bits) != 0;
     }

    3) Third

     bool Getbit_other(void *Array, size_t Num)
     {
         const int char_bits = std::numeric_limits<unsigned char>::digits;
         div_t tmp = div(Num, char_bits);
         return (reinterpret_cast<unsigned char*>(Array)[tmp.quot] & 1 << tmp.rem) != 0;
     }

    Criticize. Surely there are flaws with the bit order somewhere. Offer your own variants / "patches"... (One concrete hazard of the macro variant is sketched below.)
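    To make the question's "safety" condition concrete for variant 1: GETBIT expands its Num argument twice and without parentheses, so an argument with a side effect is evaluated twice, and a compound expression binds incorrectly. The helper next_index() below is purely illustrative:

        #include <limits.h>
        #include <stdio.h>

        /* variant 1's macro, repeated verbatim for self-containedness */
        #define GETBIT(Array,Num) ((*(Array+(Num/CHAR_BIT))&(1<<(Num%CHAR_BIT)))!=0)

        static int calls = 0;
        static int next_index(void) { ++calls; return 0; }

        int main(void)
        {
            unsigned char buf[16] = { 0x01 };

            /* Num appears twice in the expansion, so the function runs twice. */
            int bit = GETBIT(buf, next_index());
            printf("bit = %d, calls = %d\n", bit, calls);   /* calls == 2 */

            /* Unparenthesized Num: with Num = x + 1 the expansion contains
               x + 1/CHAR_BIT, i.e. x + 0, so the wrong byte is addressed. */
            int x = 7;
            int wrong = GETBIT(buf, x + 1);
            (void)wrong;
            return 0;
        }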

    • Added macros to the answer - avp
      Your benchmark code was added to the question, and the answer was deleted. All is well, don't worry, make yourself at home) - Nick Volynkin