In this comment, it's mentioned that the complexity of `__builtin__popcount`

for any integer *j* with *j* = *O*(2^{N}) is *O*(*N*) (i.e ) instead of *O*(1). So to count the number of one in a large binary string of length *n* with *n* > > 64, if I split *n* into substrings (with *N* = 64 / 32 / 16) and apply builtin popcount to each of the substrings and add them up, then the total time complexity should be instead of .

But in page 101 of Competitive programmers handbook on the topic Counting Subgrids, based on the time taken to compute the results, the time should be same no matter if *N* = 64 for *N* = 32. But it turns out that they're different as "the bit optimized version only took 3.1 seconds with N = 32 (int numbers) and 1.7 seconds with N = 64 (long long numbers)".

Why *N* = 64 takes less time ?