generic_placeholder_name's blog

By generic_placeholder_name, history, 4 months ago,

In this contest, while I was browsing through FST submissions, I encountered many TLEs on test 8 in B. Here are some examples: 142848311, 142836654. The solutions appeared to be completely correct. What happened here?

The problem was that these contestants used memset before every test. If you didn't know, memset(a, 0, sizeof(a)) is linear in the size of the array a.

This is why these contestant TLEd. Every testcase, they did $1000000$ unnecessary operations.

The lesson here? Stop using memset to reset stuff when there are many testcases. Resetting by hand works just fine. Or know how memset actually works, so that you don't TLE.

• +197

 » 4 months ago, # |   +106 Just avoid global variables like the plague in problems with multiple test cases.
•  » » 4 months ago, # ^ |   +104 Just avoid global variables like the plague in problems with multiple test cases.
•  » » » 4 months ago, # ^ |   +302 Just avoid global variables like the plague in problems with multiple test cases.
•  » » » » 4 months ago, # ^ | ← Rev. 8 →   -45 .
•  » » » » 4 months ago, # ^ |   +17 Problems with multiple test cases are actually very nice, they make pretests stronger and queues shorter. Also, if you write a local brute tester, you can run your code against a brute force faster.
•  » » » 4 months ago, # ^ |   +68 I am not going to write vectors of vectors of vectors of vectors to solve 4-dimensional DPs...
•  » » » » 4 months ago, # ^ |   0 Can you explain this 4 dp? I did not understand.
•  » » » » 4 months ago, # ^ |   +11 Btw, did you know that you can write vector dp(n, vector(m, 0))? Still $O(d)$ words "vector", but much better than $O(d^2)$. However, it is still slower than plain arrays because of multiple pointer jumps.
•  » » » » 4 months ago, # ^ |   +79 I am not going to write vectors of vectors of vectors of vectors to solve 4-dimensional DPs...
•  » » » » 4 months ago, # ^ |   0 Use something like this tensor class by ecnerwala. Now you can completely avoid global variables!
•  » » 4 months ago, # ^ |   -6 Covid deniers would definitely not follow your advice
•  » » » 4 months ago, # ^ |   0 we weren't even TALKING about covid bro
 » 4 months ago, # |   0 Thank you for the advice.Can you please let me know what you mean by resetting by hand?I mean like if i reset the values using a for loop then i am also going to take linear time.Is there a better way?
•  » » 4 months ago, # ^ |   +16 Do something like this: for(int i = 0; i < n; i++) a[i] = 0; //reset a[i] This is linear in n and not in the array size.
•  » » » 4 months ago, # ^ |   +6 You can also memset for definite size Like memset(a,0,n * sizeof(int));
•  » » » » 4 months ago, # ^ |   +8 Or memset(a,0,n * sizeof(a[0])) for generic types of a[].
•  » » » » » 4 months ago, # ^ |   0 memset(a,0,(n + 100) * sizeof(a[0])) And allocate the array of size MAXN + 500.
•  » » » 4 months ago, # ^ |   0 Thank you.I also had the same thing in mind but thought it would be better to ask.
 » 4 months ago, # |   0 Also, if you use functions like void done() to clear your static arrays, don't forget to call it every time after printing the answer. I made 3 submissions with this mistake in problem E. Don't do that.
 » 4 months ago, # |   +29 Another simple mistake I often see is not having #define int long long in every submission
•  » » 4 months ago, # ^ |   +33 This sometimes give tle.
•  » » 4 months ago, # ^ |   +56 Another simple mistake I often see is not having #define int long long in every submission
 » 4 months ago, # |   +14 On a more serious note, memset is not the problem here, you can just use memset(a, 0, n * sizeof(int)). It's just having ANY global variables. >99% of problems with multiple test cases don't require having data that is shared between testcases and is supposed to be in some part cleaned. I always have a struct that encompasses everything I need for a particular test case and that is a hard barrier preventing me from getting any kind of WAs resulting from not cleaning the data between test cases. The downside is that I need to resize every container with appropriate size at the beginning of every testcase, but that is not bigger effort than cleaning it. One can of course forget about resizing it appropriately what is a mistake comparable to forgetting about cleaning it, but if you don't resize then you get runtime error locally on sample test, which is fixed in 10 seconds (cause with proper local flags you are explicitly told that the runtime was caused by this particular container being empty), while forgetting about cleaning it leads to WAs on pretests/systests (and it may not be obvious not cleaning the data is its reason)
 » 4 months ago, # |   +56 Embrace lambdas so you never have to use a global variable ever again!Example of a submission to a multitest graph problem with no global scope used.
•  » » 4 months ago, # ^ |   +15 You can also get rid of that annoying auto &self parameter, by just typing the type explicitly rather than using auto: Spoiler function dfs = [&](int u, int c) { if (color[u] != -1) return color[u] == c; color[u] = c; for (auto [v, w] : adj[u]) if (!dfs(v, c ^ w)) return false; return true; }; instead of:  auto dfs = [&] (auto &self, int u, int c) -> bool { if (color[u] != -1) return color[u] == c; color[u] = c; for (auto [v, w] : adj[u]) if (!self(self, v, c ^ w)) return false; return true; }; 
•  » » » 4 months ago, # ^ |   +46 So what you're using here is actually something different, std::function, and there's surprisingly a performance difference between the two. std::function tends to have significantly more overhead. Try running this simple example in custom invocation for example: Code#include using namespace std; int main() { int n; cin >> n; vector> adj(n); for (int i=0; i int { int ret = 1; for (int v : adj[u]) if (v != p) ret += self(self, v, u); return ret; }; function dfs2 = [&] (int u, int p) -> int { int ret = 1; for (int v : adj[u]) if (v != p) ret += dfs2(v, u); return ret; }; cout << fixed << setprecision(10); auto start = clock(); int ret1 = dfs1(dfs1, 0, -1); cout << "Passing self: " << (double) (clock() - start) / CLOCKS_PER_SEC << endl; start = clock(); int ret2 = dfs2(0, -1); cout << "std::function: " << (double) (clock() - start) / CLOCKS_PER_SEC << endl; assert(ret1 == ret2); return 0; }  Outputn = 1e5 Passing self: 0.0000000000 std::function: 0.0180000000 n = 1e6 Passing self: 0.0310000000 std::function: 0.0780000000 n = 2e6 Passing self: 0.0620000000 std::function: 0.1750000000 n = 3e6 Passing self: 0.1090000000 Runtime error: exit code is 2147483647 These results were generated using C++20 compiler, though I got similar results using other compilers.As an extra note, I've benchmarked all the different styles of recursion before in the past (normal recursion, std::function, passing self, using Y combinators). Normal recursion is of course still the fastest, but passing self has surprisingly competitive performance in exchange for minimal additional code.
•  » » » » 4 months ago, # ^ | ← Rev. 6 →   0 A bit off topic,But why both the ways gets a runtime error for n=3e6 in custom invocation when I used C++ 17 (64 bit).Is there some optimization in C++ 20 for recursion ??UPD1 : Actually the self type recursion gives runtime error only for C++17 (64 bit) and works fine on other compilers.UPD2 : I think the reason may be that C++ 17 (64 bit) default recursion depth is set very low which doesn't allow this much depth in recursion. Then my question is how to increase recursion depth limit.UPD3 : I guess I will have to stop using C++17 (64 bit) because there is no way we can increase stack space for OJ, Sorry for pinging you.
•  » » » » » 4 months ago, # ^ | ← Rev. 2 →   0 Yeah sorry I don't know either. All I know is that passing self has less overhead, but I'm not very knowledgeable about the intricacies between different compilers.About increasing recursion depth, all C++ compilers on Codeforces, such as C++20 and C++17 compile with 256 mb of stack size, so there shouldn't be a significant difference between max recursion depth in the two.
•  » » » » » » 4 months ago, # ^ |   0 Yeah that's what I didn't understandThat even if the stack size for all compilers are same why C++ 17 (64 bit) give Runtime Error.
•  » » » » » » 4 months ago, # ^ |   +40 The reason behind less overhead with lambdas is the following: std::function is a type-erased implementation of function objects (and hence it is useless for competitive programming unless you want to do something like make vectors of "lambdas", which is impossible, so you need to use something like std::function instead), and it requires heap allocations and extraneous pointer indirections, which makes it slower. Lambdas are implemented as anonymous structs with operator(), i.e., they are callable (they're called functors). Using auto in the parameter list makes them a generic lambda, which is analogous to structs with templated operator(). When you call stuff like dfs(dfs, root, color), type deduction can be done at compile-time, and due to the lack of indirections, the compiler is able to optimize it much better.
•  » » » » 4 months ago, # ^ |   +22 Ok, didn't know about this, thanks! I have two questions: What is behind auto then, if it's not std::function? Does this higher overhead of std::function really matter in practice? (i.e. can it actually cause my solution to get TLE in a contest problem?)
•  » » » » » 4 months ago, # ^ |   0 To my knowledge, it's a unique unnamed class type so you can only declare it with auto. Probably not, but I don't like writing function parameters in both function and lambda.
•  » » » » » 4 months ago, # ^ |   +10 I guess it might matter in problems like this where they ask you to run DFS up to $10^6$ recursion depth. Though it shouldn't be a huge difference as long as problem authors are reasonable with time limits.
•  » » » » » » 4 months ago, # ^ |   +13 It worked 143040776 XD
•  » » » » » 4 months ago, # ^ |   +21 As far as overhead is concerned, using std::function got me TLE once, so I stay away from it as much as possible. Similar things happen when you use std::function in segment tree implementations and so on.
•  » » » » » » 4 months ago, # ^ |   +12 So is it safe to use it for simple stuff that only makes $O(n)$ recursive calls?
•  » » » » » » » 4 months ago, # ^ |   +10 Personally speaking, I don't use std::function for competitive programming purposes at all, but your mileage may vary. If $n$ is not too large, $O(n)$ recursive calls might be fine.
•  » » » » » » » » 4 months ago, # ^ |   -10 I used to use std::function, but because of some reasons (like not being able to set default arguments), I was told to use y_combinator (Although we need to have some extra code to use it). What are your views on it? Should we use it?
•  » » » » » » » » » 4 months ago, # ^ |   +1 y_combinator is just a wrapper for another way of getting something similar to the "recursive" lambda trick, so it should be fine. In most cases it will be as fast as the recursive lambdas (the only issue I know of is that sometimes those functions don't get inlined, but I have never heard of anyone getting TLE using that).
 » 4 months ago, # | ← Rev. 2 →   0 I think memset is faster than the iterative assign: memset(a, 0, n * sizeof *a);
•  » » 4 months ago, # ^ |   +55 Yes but if that's the difference between AC and TLE and your whole solution is O(N) then you can just curse the problem setter and move on :)
•  » » 4 months ago, # ^ |   +37 Using std::fill is much better than using memset btw.
•  » » 4 months ago, # ^ |   +23 The compiler will usually optimize the iterative assign to a memset anyway, unless there is some reason it isn't correct to do so. So it really doesn't matter.
 » 4 months ago, # |   0 I made a similar mistake here..Here