memset or just a loop?? - Codeforces

→ Pay attention

Before contest
Helvetic Coding Contest 2024 online mirror (teams allowed, unrated)
26:55:20
Register now »

→ Top rated

#	User	Rating
1	tourist	3690
2	jiangly	3647
3	Benq	3581
4	orzdevinwang	3570
5	Geothermal	3569
5	cnnfls_csy	3569
7	Radewoosh	3509
8	ecnerwala	3486
9	jqdai0815	3474
10	gyh20	3447

Countries | Cities | Organizations

→ Top contributors

#	User	Contrib.
1	maomao90	174
2	awoo	165
3	adamant	161
4	TheScrasse	160
5	nor	158
6	maroonrk	156
7	-is-this-fft-	152
8	orz	146
9	SecondThread	145
9	pajenegod	145

View all →

→ Find user

→ Recent actions

Detailed →

aza-zil's blog

memset or just a loop??

By aza-zil, history, 5 years ago, In English

In English

Hey! In the last contest I was getting TLE on a problem, I thought my solution was not optimal, now I tried to solve it but using a loop to clear all elements of the array instead of "memset" -which I was using- and I got AC!! So which one is better ??

UPD: found that the TLE then using "memset" was because there was a mistake with the size of the array (which I don't know why I got TLE not RTE or something like that!!). But anyway, the time for the solution when using a "for" loop is like " 156 ms ", while it is " 1294 ms " when using "memset !!!

0

aza-zil
5 years ago
9

Comments

Comments (6)

Show archived | Write comment?

»

5 years ago, # |

Vote: I like it

+9

Vote: I do not like it

your loop resets values in range [0, n], but memset one resets the whole array which is very large.

→ Reply

»

5 years ago, # |

Vote: I like it

+3

Vote: I do not like it

Auto comment: topic has been updated by aza-zil (previous revision, new revision, compare).

→ Reply

»

5 years ago, # |

Vote: I like it

+21

Vote: I do not like it

Change

memset ( ctrN , 0 , sizeof ctrN ) ;
memset ( ctrM , 0 , sizeof ctrM ) ;

To:

memset ( ctrN , 0 , n * sizeof(ll) ) ;
memset ( ctrM , 0 , m * sizeof(ll) ) ;

→ Reply

»

»

5 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

Is there any significant difference between i) memset ( ctrN , 0 , sizeof ctrN ) ; and ii) memset ( ctrN , 0 , n * sizeof(ll) ) ; // assuming n is size of the array ctrN

→ Reply

»

»

»

5 years ago, # ^ |

← Rev. 4 →

Vote: I like it

0

Vote: I do not like it

In ii) you need to know that variable n is a size of array and ll is a type of element. In i) you can know nothing. No difference in runtime because third parameter just number, equal to num of bytes we need to set.

I'm always using std::fill(all(arr), 0); in code with #define all(x) std::begin(x), std::end(x). For this example, with Ofast runtime is same, but std::fill more generic and can be applied for std::vector with same runtime and for std::array with same runtime.

For array on stack there is a bug on codeforces, because stack limited by 256 MB even for problems with memory limit 512 MB, 768 MB, 1024 MB, but vector allocates memory on heap and works well.

Examples of bug with stack limit

This code in codeforces custom invocation gets RE with stack overflow:

#include <bits/stdc++.h>
int main() {
    char arr[511 * 1024 * 1024];
    std::cout << arr[5];
    return 0;
}

This code works well:

#include <bits/stdc++.h>
int main() {
    std::vector<char> arr(511 * 1024 * 1024);
    std::cout << arr[5];
    return 0;
}

→ Reply

»

5 years ago, # |

← Rev. 3 →

Vote: I like it

+3

Vote: I do not like it

FYI:

The time consumed by memset on $$$n$$$ bytes is $$$a_1 n + c_1$$$, where $$$c_1$$$ is the overhead of calling the function and maybe performing some additional checks and calling internal functions, which is kinda high, and $$$a_1$$$ is kinda low, since it's optimised as far as possible. One big optimisation is that on 64-bit architectures, you can set 64 bits (8 bytes) at once, or 4 bytes for 32-bit architectures.
The time consumed by a for loop is $$$a_2 n + c_2$$$, where $$$c_2$$$ is just the overhead of loop initialisation and $$$a_2$$$ is a normal for loop constant. This is already pretty fast, but if you're setting chars (just one byte at once), it will be several times slower. There's no significant difference if you're setting longs.
The third alternative, only usable when you're setting a fixed number of bytes, is a for loop inside an inline templated function, where $$$n$$$ and possibly the value you're setting (but this isn't very important) are known at compile time, so it can be massively optimised, unnecessary variables like loop index and movement between registers is eliminated, loop unrolling is done and there's no function call overhead. For $$$n$$$ divisible by $$$8$$$ (detected during compilation) on 64-bit machines, it should take something like $$$2+3n/8$$$ or $$$2+n/2$$$ processor cycles if the only function argument is the starting pointer. It's just copy starting pointer from register r1 to register r2, add n to r2, check if r1 == r2, if not: store the 64-bit value (e.g. 0) we're setting to position [r1] in memory, add 8 to r1, jump to "check".

Basically, use memset if you're calling it on big arrays a few times and you're setting the same value for each byte. Use a for loop if you're not setting the same value for each byte (don't use memset to set an int to 0xFF00FF00) or if you're using it very often on smaller arrays. Use a templated for loop if you can afford a fixed nice value of $$$n$$$.

→ Reply