doubtforces's blog

By doubtforces, history, 4 years ago, In English

Hi!

I was coding the Dynamic Segment Tree in C++ by storing all the data of a node in a struct. While coding, I found that the following small piece of code generated a very large binary file on compilation (32 MB). Even, the compilation time was longer than usual.

Code generating 32 MB binary file

However, while doing a slight change in the above code, the size of the binary file reduced to 20 KB (that's around 1600 times smaller).

Code generating 20 KB binary file

I use the following command to compile: g++ -o out file.cpp, and the size of the out file (Binary File) is mentioned above.

If I initialize those variables using a constructor or even if I add a default constructor (which does nothing), then the size of the Binary File reduces to 20 KB.

Code with default constructor: 20 KB binary file
Code with default constructor which does nothing: 20 KB binary file

The size of all the binary files mentioned is based on my machine. It may vary on other machines.

But why should someone worry about this?

Well, one should worry because the Online Judges have a size limit for the binary file generated. I just submitted the same code on a random problem on codeforces and got COMPILATION_ERROR. Submission: 86340669

Checker comment:

Can't compile file: Compiled file is too large [34411089 bytes], but maximal allowed size is 33554432 bytes [CompileRequest {id='program.cpp', description='', file='program.cpp', resources='', type='cpp.g++17'}]

The Binary File may even become as large as 1 GB

If I change N to 2e6, add four variables in the struct, initializing all of them to 1, and then compile the file — the binary file can become as large as 1 GB (it will definitely take a long time to compile, may freeze your machine too).

Sample Code
How to initialize variables then?

In my opinion, the best way would be to initialize all the variables using a default constructor, rather than initializing their values in the declaration itself. It will work either way, but using a default constructor is the most appropriate way of initializing variables in a struct, and it is safe too.

About why this happens, a probable guess is, it depends on the way how compiler allocates memory. If 'all' the variables are initialized simultaneously in a struct, then the compiler allocates memory differently, as compared to when 'some' of the variables are initialized.

I'm not sure, but this is just a guess.

So, anyone who can explain why this happens, in a better way?
  • Adding 5 variables, but leaving one of them uninitialized — 20 KB.
  • Initializing all the variables to 0 — 20 KB.
  • Using a default constructor to initialize values — 20 KB.
  • Using a default constructor that does nothing — 20 KB.
  • Initializing all the variables, such that at least one of them has a non-zero value — 32 MB.
  • And why is the size of the file TOO LARGE? (1600 times larger)
  • Vote: I like it
  • +42
  • Vote: I do not like it

| Write comment?
»
4 years ago, # |
  Vote: I like it -14 Vote: I do not like it

Just don't use unnecessary memory.

segtree[4*N] is enough.

  • »
    »
    4 years ago, # ^ |
    Rev. 2   Vote: I like it +3 Vote: I do not like it

    Well, I said "Dynamic" Segment Tree, which I create without using pointers and therefore have to allocate the memory before. In the case of a normal Segment Tree, 4 * N memory would be OK, but for a Dynamic Segment Tree, we may need to allocate at max Q * log(10^9) space. If for every query, we need to update two positions of the array, then twice the memory i.e. 2 * Q * log(10^9) shall be allocated — which is approximately 40 * Q or 40 * N as I mentioned above.

    You'll ask why I don't use pointers. It is because:

    • I hate pointers (they are very hard to debug, I always commit errors when I use pointers).
    • Allocating all memory at once is slightly faster than creating node every time using pointers.

    Anyway, the question here was "why", or how to overcome this even if I use that much amount of memory.

»
4 years ago, # |
  Vote: I like it -6 Vote: I do not like it

Initializing the member variables(non-const and non-static) of a class is allowed in C++ but logically wrong because you are trying to initialize a member variable which has not given memory yet. It's in general considered a bad practice. This might be causing the problem.

  • »
    »
    4 years ago, # ^ |
    Rev. 2   Vote: I like it +11 Vote: I do not like it

    Writing

    struct X {
      int a = 5;
    };
    

    does not mean that I initialize the instance variable named a, but means that it will be initialized to 5 when an object of type X is created. This feature is a syntax sugar and nothing more.

»
4 years ago, # |
Rev. 3   Vote: I like it +9 Vote: I do not like it

I have also encountered this problem. Fortunately, it wasn't during a competition. My IDE even refused to compile because of the enormous size of binary file. I think if you initialize anything at the compile time, it will be stored in the binary, so the best way to prevent this is to avoid initializing a lot at the compile time. In case of structs, instead use the default constructor for struct which will assign values at the runtime, not at the compile time.

Here you can see the comparison made in godbolt. It seems like initializing to 0 doesn't cause any problems.

»
4 years ago, # |
  Vote: I like it +60 Vote: I do not like it

TL;DR Writing initializes of fields is same as writing constructor, but it will be constexpr by default. This enforces compiler to initialize full structure in compile time and put it to binary, because of rules how program is initialized.

The question requires more deep analysis of how C++ programs are initialized. Details can be found here.

Long story short initialization is done in several steps.

  1. Global variables for which their initial value can be computed in compile time are initialized by this value
  2. All other global variables are initialized by filling theirs' memory layout with zeros
  3. Constructors are called for all global variables in some order (which is not fully specified, but for codeforces case you can think that this order is same that order of definitions in file).

The key fact is "for which their initial value can be computed in compile time". To do this, compiler must put this computed value in binary, because it can't compute anything on this step, only copy precomputed value (well, it not will be copied, but just would be mapped to correct address by dynamic linker, but whatever). So problem is initial memory layout of "8 million ints equal to one" of your array must be put in binary, because compiler can compute it in compile time and must do it.

What changes if you add your constructor? Well, almost nothing. You just forgot to say, that your constructor is constexpr, which means can be computed in compile time. It's hard to understand it automatically, so compiler assumes it can't. And when this constructor is auto-generated compiler can deduce it itself.

What changes if you use zeros? In this case compiler just use same mechanism as for zero-initializing all other variables. It's still placed in binary but it put something like "map 8 megabytes of zeros here", instead of putting it in binary directly. Only zeros have such an option.

So how to fix it? There are several options. First you can disable constexpr constructor, by writing your own (or even by writing Node() = default;), This can probably break some optimizations, but probably they are not really important. Also some other hacks exists, but I don't want to stop on them to much.

Other way is to use std::vector, instead of raw array. Overhead for accessing elements is near-zero for one-dimentional vector (as opposite to vector<vector> which is slower than two-dimensional array), and one allocation at beginning will not cost too much. Also, this will make your code faster on local testing, and may help to catch bugs of accessing to elements which should not exists by default. It's well-known trick to make arrays a bit bigger then needed to avoid runtime error if you have some +-1 errors, but in fact, this trick leads to hiding bugs you have increasing debug time dramatically in a lot of cases. Also, this can help with reducing scope of visibility of variables, which can also reduce number of typos. I would definitely recommend this solution, because it is a solution, not a workaround.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thanks! That was what I wanted :)

    You seem to be having a deep knowledge of core C++, did you get it from https://cppreference.com, or from reading the source code of C++? Or something else (books, websites, etc.)?

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Well, writing C++ for work for several years sometimes requires going deeper. I didn't done something special for that, just googling/reading articles/watching videos on cases I met for quite a long time.

  • »
    »
    4 years ago, # ^ |
    Rev. 4   Vote: I like it 0 Vote: I do not like it

    I think that if GCC assumes constexpr if it can guarantee that the expression is a compile-time constant. For example in this (86342114), none of the compilers ICC, Clang tries to evaluate the expression at compile time.

    Interesting thing in the above code is that I am trying to declare a stack-based local without marking the constructor constexpr and also the stack local is not marked as a constexpr variable. Marking constructor constexpr causes compile-time MLE.

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Well, this code https://godbolt.org/z/v45nqr fails for latest version of gcc, clang, icc. Only msvc succeeds, but it's well known to be not very standard-conforming :)

      I not sure if compiler is allowed to add constexpr to function if it can prove something, but anyway it doesn't have to do it, and as we see don't do in that case.

»
4 years ago, # |
  Vote: I like it -9 Vote: I do not like it

I do not have time to write detailed comment, sorry, just look at https://en.wikipedia.org/wiki/.bss

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Long time ago, in VS far far older: https://shuffle-c.livejournal.com/1102.html