[Bugged] Dynamic Bitsets in GCC

#	User	Rating
1	tourist	3880
2	jiangly	3669
3	ecnerwala	3654
4	Benq	3627
5	orzdevinwang	3612
6	Geothermal	3569
6	cnnfls_csy	3569
8	jqdai0815	3532
9	Radewoosh	3522
10	gyh20	3447

#	User	Contrib.
1	awoo	161
1	maomao90	161
3	adamant	156
4	maroonrk	153
5	-is-this-fft-	148
5	atcoder_official	148
5	SecondThread	148
8	Petr	147
9	nor	144
10	TheScrasse	142

UPD1: The bug has been fixed. I'll update the blog again when the fix is on Codeforces.

UPD: Unfortunately, left / right shift is bugged and doesn't clear bits properly. It should be fine for other operations, but use it at your own risk.

Original Blog

Hello sirs, you might know that Boost has a dynamically sized bitset class, but did you know that GCC also has one? Turns out it's included in the tr2 folder. You can simply include the <tr2/dynamic_bitset> header, and use it as std::tr2::dynamic_bitset. Yes, it works on Codeforces.

Here's a code example to count triangles in a graph (abc258_g):

Code

#include <bits/stdc++.h> I wonder if it's possible to get GCC to fix it...
using namespace std;

#include <tr2/dynamic_bitset>
using namespace tr2;
signed main() {
    cin.tie(0)->sync_with_stdio(0); cin.exceptions(cin.failbit);
    int n; cin >> n;
    vector<dynamic_bitset<>> adj(n);
    for(int i = 0; i < n; i++){
        adj[i].resize(n);
        for(int j = 0; j < n; j++){
            char c; cin >> c;
            adj[i][j] = c-'0';
        }
    }
    int64_t ans = 0;
    for(int i = 0; i < n; i++) for(int j = i+1; j < n; j++) if(adj[i][j]){
        ans += (adj[i]&adj[j]).count();
    }
    cout << ans/3 << "\n";
}

In some problems, we might not be able to use a constant sized bitset. For example, on 1856E2 - PermuTree (hard version). Here's a submission where we replaced neal's custom_bitset template with using custom_bitset = tr2::dynamic_bitset<>;, and got accepted with little performance difference (260853192, original 217308610).

The implementation of the bitset seems identical to a version of Boost's dynamic bitset, so you can read the docs here. You can view the source here.

Comparing to std::bitset, here are some notable things I saw:

They include more set operations, such as set difference, and checking if a bitset is a subset of another bitset.
They include lexicographical comparisons between bitsets.
You can append single bits, as well as whole blocks (i.e. integers)
You can also resize it like a vector. (For some reason there's no pop_back though...)
Find first and find next are also supported, without weird function names like in std::bitset. Unfortunately, there's still no find previous :(
You can specify the word type and allocator as template arguments.

Of course, it also has all the normal std::bitset functionality. However, I'm not sure how fast this is compared to std::bitset; you can let me know in the comments. Interestingly, it seems to be using 64 bit integers on Codeforces.

If you enjoyed this blog, please make sure to like and subscribe for more bitset content.

Thanks qmk for helping with the blog.

#include <bits/stdc++.h> #include <tr2/dynamic_bitset> using namespace std; using namespace tr2; int32_t main() { dynamic_bitset<> test(91); test[50] = true; cout << test << "\n"; cout << (test << 78) << "\n"; cout << "\n"; bitset<91> test2; test2[50] = true; cout << test2 << "\n"; cout << (test2 << 78) << "\n"; return 0; }

0000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

int32_t main() { bset test(91 + 64); test[50 + 64] = true; cout << test << "\n"; cout << (test << (78 + 64)) << "\n"; cout << "\n"; bitset<91 + 64> test2; test2[50 + 64] = true; cout << test2 << "\n"; cout << (test2 << (78 + 64)) << "\n"; return 0; }

Comments (21)

Write comment?

bitset

2 months ago, # |

+45

Note: they added more features to dynamic_bitset in a newer Boost version, but they haven't been added to GCC yet. So that's why I linked an older version of Boost's docs.

→ Reply

qmk

← Rev. 2 →

+81

Fun fact, you can also write dynamic_bitset<__uint128_t> which is technically $$$O(\frac{n}{128})$$$! 260859954

_Kee

2 months ago, # ^ |

+69

I doubt it will be faster, because __uint128_t is not a native type (the CPU can only handle integers up to 64 bits). Basically the compiler thinks __uint128_t is two uint64_ts, and operations involving __uint128_ts are translated to multiple instructions handling uint64_ts.

Experiment

This code:

#include <cstdint>
std::uint64_t add64(std::uint64_t a, std::uint64_t b) { return a + b; }
__uint128_t add128(__uint128_t a, __uint128_t b) { return a + b; }

is compiled to:

0000000000000000 <_Z5add64mm>:
   0:	48 8d 04 37          	lea    rax,[rdi+rsi*1]
   4:	c3                   	ret
   5:	66 66 2e 0f 1f 84 00 	data16 cs nop WORD PTR [rax+rax*1+0x0]
   c:	00 00 00 00 

0000000000000010 <_Z6add128oo>:
  10:	48 89 f0             	mov    rax,rsi
  13:	49 89 f8             	mov    r8,rdi
  16:	48 89 c7             	mov    rdi,rax
  19:	48 89 d0             	mov    rax,rdx
  1c:	48 89 ca             	mov    rdx,rcx
  1f:	4c 01 c0             	add    rax,r8
  22:	48 11 fa             	adc    rdx,rdi
  25:	c3                   	ret

You can see two adding instructions (add and adc) in the 128-bit version, while the 64-bit version just has one (lea; the compiler prefers it over add).

Xellos

+17

128-bit (and higher) integers are indeed native types, though not in basic x86_64 and not equally general — they're supported in particular operations. There are two ways CPUs handle higher-precision operations to achieve faster performance:

one or more operands (can also include the output) are represented by 2 registers like you say here, leading to 2x precision — example is 32-bit ARM's SMULL which multiplies 2 32-bit integers to get a 64-bit result in 2 registers using only one operation
SIMD instructions that use their own registers — this is the case relevant here, as a __uint128_t variable can use a register %xmm0 with SSE instruction set; bitwise operations are typically provided in earliest versions of SIMD instruction sets as they're very simple, and performance overhead of loading to SIMD registers isn't a big deal as memory still needs to be accessed in new cache lines

The performance difference depends on hardware and software, which is part of why $$$O(n/128)$$$ isn't valid $$$O$$$-notation.

← Rev. 3 →

I don't think there are SIMD instructions that can be usable to directly compute 128-bit values, at least on x86-64. For example, there's no instruction that compute "add with carry" which is necessary to compute 128-bit additions.

As I've shown in my experiment, GCC actually generates two adding instructions for a 128-bit addition (on x86-64). I tried various compile options but the compiler did not use SIMD at all. Similarly, a 128-bit multiplication expands to 3 multiplying instructions and 2 adding instructions, and a 128-bit division even becomes a library function call. It might be possible to use SIMD on bitwise operations, but GCC doesn't use SIMD for them, either. See this assembly output in Compiler Explorer.

True, I was thinking specifically for bitwise operations, where the fact that SIMD is packed multiple data doesn't matter. There's no bitset addition in principle.

Note that even some seemingly "elementary" bitset operations must be split into multiple assembly operations. Typical example: shifts, a massive pain in the ass to implement correctly.

larush

I wonder if using this can help pass some problems in $$$O(n^{2})$$$. Is this the case?

Abito

Yes. Bitsets in general help optimize $$$O(n^2)$$$ solutions

avighnakc

+40

Username checks out

+25

More bitset blogs coming soon.

VitalyKo

7 weeks ago, # |

+37

Can someone explain what shifting logic does it use? It's different from default bitset:

Output:

Milan

7 weeks ago, # ^ |

+14

It's interesting. I found this piece of code on gcc.gnu.org :

do_left_shift

template<typename _WordT, typename _Alloc>
    void
    __dynamic_bitset_base<_WordT, _Alloc>::_M_do_left_shift(size_t __shift)
    {
      if (__builtin_expect(__shift != 0, 1))
	{
	  const size_t __wshift = __shift / _S_bits_per_block;
	  const size_t __offset = __shift % _S_bits_per_block;

	  if (__offset == 0)
	    for (size_t __n = this->_M_w.size() - 1; __n >= __wshift; --__n)
	      this->_M_w[__n] = this->_M_w[__n - __wshift];
	  else
	    {
	      const size_t sub_offset = _S_bits_per_block - __offset;
	      for (size_t n = _M_w.size() - 1; __n > __wshift; --__n)
		this->_M_w[__n] = ((this->_M_w[__n - __wshift] << __offset)
			     | (this->_M_w[__n - __wshift - 1] >> __sub_offset));
	      this->_M_w[__wshift] = this->_M_w[0] << __offset;
	    }

	  //// std::fill(this->_M_w.begin(), this->_M_w.begin() + __wshift,
	  ////          static_cast<_WordT>(0));
	}
    }

It looks like they commented out the part of the code that was responsible for clearing the rightmost blocks... Indeed, you can duplicate bits. Try changing test[50] to test[0] in your code and you will see that the bitset is shifted, but the first 64 bit block is not cleared.

The takeaway is not to use dynamic_bitsets, I guess.

That's funny

I wonder if we can submit a bug fix to GCC...

adamant reported it ! (thanks)

6 weeks ago, # ^ |

+10

This bug has been fixed on GCC trunk: gcc-mirror/gcc@bd3a312

You can view the correct output here! https://godbolt.org/z/KMYMj8qMh

devvikram22

LEFT SHIFT works in ur testcase

#define static_assert( ... )
#include <tr2/dynamic_bitset>
using bset = tr2::dynamic_bitset<__uint128_t>;

Doesn't.

Auto comment: topic has been updated by bitset (previous revision, new revision, compare).

bgopc

I think it's time u write ur own dynamic bitset

TrendBattles

find_previous can be implemented the same as find_next I think, however, it's best to customize your own bitsets.

bitset's blog