Fast and furious C++ I/O

For a long time I've been upset with C++ standard input/output. First of all, I heard that fread/fwrite are much faster than everything else, and that it's impossible to get good times on problems with huge input or output without them. Secondly, it's really annoying to write format strings and ampersands in scanf, especially with many variables to read. Thirdly, the only way to extend I/O to custom types is by overloading the << and >> operators on streams, but streams are the slowest option. I tried to tackle all these issues in my implementation. Keep in mind that it targets the common use case in programming contests, so it's not as flexible as one might wish.

The code is here. It doesn't compile with MSVS.

I apologize in advance to everyone who will be scrolling through these 500 lines trying to read my solutions. Also, people without broad experience with C++ are not advised to try to understand all of it (dangerous for your mental health). There are 3 major components.

1. File management

This is done by the classes InputFile and OutputFile. Indeed, I use fread/fwrite with a buffer of size 2¹², but there is one catch: fread doesn't work for interactive problems, because input there is line buffered rather than fully buffered. I use fgets in this case, so make sure to call the correct constructor. Pick one line for input and one for output from the options below (input and output are smart pointers):

#ifdef ONLINE_JUDGE
  input.reset(new InputFile(stdin, false));    // usual problem
  input.reset(new InputFile());                // interactive problem
  output.reset(new OutputFile());
#else
  input.reset(new InputFile());                // local testing using a console
  input.reset(new InputFile("input.txt"));     // local testing using a file
  output.reset(new OutputFile());              // output to console
  output.reset(new OutputFile("output.txt"));  // output to a file
#endif
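
To illustrate the buffering idea described above, here is a minimal sketch of a reader that pulls data from a FILE* in chunks of 2¹² bytes with fread. BufferedReader and its get method are hypothetical names chosen for this example; they are not the InputFile class from the linked code. Note that fread keeps waiting until the whole chunk is filled or the input ends, which is why this approach alone does not suit interactive problems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not the article's InputFile): serves bytes from a
// FILE* through a 4096-byte buffer that is refilled with fread.
class BufferedReader {
 public:
  explicit BufferedReader(std::FILE* file) : file_(file) {
    assert(file != nullptr);
  }

  // Returns the next byte as a non-negative value, or EOF when input ends.
  int get() {
    if (pos_ == size_) {
      // Buffer exhausted: one fread call fetches up to 4096 bytes.
      size_ = std::fread(buffer_, 1, sizeof(buffer_), file_);
      pos_ = 0;
      if (size_ == 0) return EOF;
    }
    return static_cast<unsigned char>(buffer_[pos_++]);
  }

 private:
  std::FILE* file_;
  char buffer_[1 << 12];
  std::size_t pos_ = 0;
  std::size_t size_ = 0;
};
```

A parser built on top of such a reader would call get() once per character and touch the underlying file only once per 4096 bytes.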

It's also possible to read from or write to a string using the InputString and OutputString classes. They are used similarly:

string s;
input.reset(new InputString(s));
output.reset(new OutputString(s));

2. String parsing

The next step is to convert standard types to and from a buffer of characters. To my deepest disappointment, even this problem isn't solved efficiently in the standard C++ library. Just look at these benchmarks (the situation with atoi and atof is similar):

https://github.com/miloyip/itoa-benchmark
https://github.com/miloyip/dtoa-benchmark

I didn't want to go too deep, so I used the simplest code to read/write integer types, similar to this one: 16792284. Then I cheated with double, treating it as two integers separated by a decimal point. This approach won't work for huge numbers or for a precision of more than 18 digits after the decimal point, but I've never seen such cases in programming contests. All related methods are located inside the InputDevice and OutputDevice classes.
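
The "two integers separated by a decimal point" trick for output can be sketched roughly as follows. formatFixed is a hypothetical function written for this illustration, not the method from the linked code; it uses std::to_string for the integer parts for brevity, whereas the real implementation uses its own integer conversion. The carry step covers values just below a whole number, where rounding the fractional part spills into the integer part.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sketch: prints `value` with `precision` digits after the
// decimal point by handling the integer and fractional parts as integers.
std::string formatFixed(double value, int precision) {
  assert(precision >= 0 && precision <= 18);  // 10^18 still fits in 64 bits
  std::string result;
  if (value < 0) {
    result += '-';
    value = -value;
  }
  assert(value < 9e18);  // the integer part must fit in 64 bits

  std::uint64_t scale = 1;
  for (int i = 0; i < precision; ++i) scale *= 10;

  std::uint64_t whole = static_cast<std::uint64_t>(value);
  double fraction = value - static_cast<double>(whole);
  std::uint64_t frac =
      static_cast<std::uint64_t>(std::llround(fraction * scale));
  if (frac >= scale) {  // rounding carried over, e.g. 0.999 with 2 digits
    frac -= scale;
    ++whole;
  }

  result += std::to_string(whole);
  if (precision > 0) {
    std::string digits = std::to_string(frac);
    result += '.';
    // Left-pad the fractional digits with zeros up to `precision`.
    result.append(static_cast<std::size_t>(precision) - digits.size(), '0');
    result += digits;
  }
  return result;
}
```

The two assertions encode the limitations stated above: no more than 18 fractional digits and no integer part beyond 64 bits.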

3. User interface

Everything above was about being efficient. But the main goal during a contest is to code fast, so I wrapped all the efficiency into the functions read and write, similar to scanf/printf. Of course, they offer much more. You don't need to write format strings. Range input and output are supported. You can read a string until a character satisfies some termination criterion. write inserts delimiters for you automatically and can be configured with setPrecision, setFill and other modifiers similar to the ones in the iomanip header. The full list of supported parameters is given in the comment after the code. Here's an example of reading and writing n, m, then an n by m character grid (without whitespace), then an array a of length n, then an array b of length m:

const int N = 1001;
int n, m;
char s[N][N];
int a[N], b[N];

read(n, m, s, n, a, n, b, m);
writeln(n, m, '\n', setDelimiter("\n"), s, n);
writeln(setDelimiter(", "), a, n, '\n', b, m);

Another feature is extensibility. For example, you can create your own Point class, implement read and write methods for it, and then use it together with all the other features: for example, read an array of points and then 2 more points.

template <class T>
struct Point {
  T x, y;

  bool read(InputDevice& input) { return input.read(x, y); }
  int write(OutputDevice& output) const { return output.write(x, ',', y); }
};

// later in the code
const int N = 1001;
int n;
Point<int> p[N], s, t;

read(n, p, n, s, t);
writeln(n, setDelimiter("\n"), p, n, s, setDelimiter(), t);

Speed

I spent a lot of time optimizing the code. To prove it, I included my methods in this benchmark. Below are the results with the clang compiler (my methods are named fast).

benchmark results

int, printf         1.34   1.43   1.33
int, cout           7.22   6.78   6.72
int, custom/out     3.41   3.46   3.53
int, fast/out       0.28   0.27   0.28
int, scanf          1.20   1.21   1.14
int, cin           24.93  25.29  26.27
int, custom/in      3.08   3.04   3.02
int, fast/in        0.14   0.14   0.15
double, printf      3.17   3.20   3.16
double, cout       12.60  12.67  12.76
double, fast/out    0.60   0.61   0.59
double, scanf       2.06   2.17   2.00
double, cin        46.38  43.31  45.82
double, fast/in     0.40   0.39   0.36
char, putchar       4.28   4.16   4.24
char, cout          6.37   6.57   6.68
char, fast/out      0.36   0.37   0.37
char, getchar       4.08   4.02   3.83
char, cin           4.95   5.03   5.21
char, fast/in       0.29   0.29   0.28
char *, printf      0.82   0.80   0.80
char *, puts        0.81   0.81   0.81
char *, cout       11.76  11.97  10.93
char *, fast/out    0.22   0.22   0.23
char *, scanf       0.89   0.89   0.93
char *, fgets       0.53   0.53   0.53
char *, cin        23.91  23.44  22.71
char *, fast/in     0.25   0.24   0.25

Update 01/29/2017

Custom types must implement member functions read and write to be supported.
read now returns bool success flag instead of number of arguments read.
Floating point write bug was fixed. It caused errors on numbers like 1 - ε because of rounding.
is_iterator trait was implemented and SFINAE was improved. This fixed ambiguous overload resolution that happened in some cases.
write and read methods were moved inside classes to improve flexibility. Out of class methods now simply forward the parameters.
Oversight was fixed which made calls like write(vector.begin(), vector.size()) impossible, because only int size was supported.
complex and string are now fully supported.
Some erroneous calls could compile with unexpected results instead of giving CE. This was fixed.

Comments (25)


Grzegorz

8 years ago, # |


I think (if it's not implemented yet) that buffering the whole output, rather than using a fixed-size buffer like your 1<<12, could be useful for offline problems where the answer is output all at once. At least it works for me: it reduces the number of fwrite calls. Like:

// many writes
while (/* ... */)
    write(/* ... */); // = append data to the buffer

// output
flush(); // = fwrite(buffer), output to stdout/file
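
A minimal sketch of this whole-output buffering, assuming a hypothetical WholeOutput class (it is not part of the code from the post): every write only appends to an in-memory string, and flush hands the accumulated data to fwrite in a single call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical sketch: collects all output in memory and emits it with a
// single fwrite call. Memory use grows with the total output size.
class WholeOutput {
 public:
  explicit WholeOutput(std::FILE* file) : file_(file) {
    assert(file != nullptr);
  }

  // Appends data to the in-memory buffer; no file I/O happens here.
  void write(const std::string& text) { buffer_ += text; }

  // Writes the whole buffer at once and returns the number of bytes written.
  std::size_t flush() {
    std::size_t written =
        std::fwrite(buffer_.data(), 1, buffer_.size(), file_);
    buffer_.clear();
    std::fflush(file_);
    return written;
  }

 private:
  std::FILE* file_;
  std::string buffer_;
};
```

The trade-off is the one discussed in this thread: fewer fwrite calls in exchange for holding the entire output in memory.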


Al.Cash

8 years ago, # ^ |

fwrite calls at the current buffer size are already heavily dominated by conversion procedures. Reducing the number of calls should improve speed a bit, but at the cost of more memory consumption. In my opinion, the gain is too small to be worth it.

adamant

+10

What do you think about things like __getchar_nolock or getchar_unlocked?

I tested getc_unlocked and putc_unlocked, they are 10% faster than my methods to read/write files character by character. This is to be expected, because I have some overhead.

However, they become slower when we need more than one character to parse a value, like integer or any other type. Since this is the dominant case, in my opinion, it's not worth using unlocked functions.

Also they look like a hack. Did you know that gcc also has fread_unlocked and fwrite_unlocked? I would consider using them, but clang doesn't have them. This convinces me even more that they are a hack.

BekzhanKassenov

I tried to call fread_unlocked and fwrite_unlocked but did not manage to do that.

Also I found somewhere that in MinGW they're called _fread_nolock and _fwrite_nolock, but then linking (!) fails with the following message:

Message

C:\Users\User\AppData\Local\Temp\ccBQi4aM.o:fastio.cpp:(.text$_ZN10OutputFile13writeToDeviceEj[_ZN10OutputFile13writeToDeviceEj]+0x1d): undefined reference to `__imp__fwrite_nolock'
C:\Users\User\AppData\Local\Temp\ccBQi4aM.o:fastio.cpp:(.text$_ZN9InputFile9fillInputEv[_ZN9InputFile9fillInputEv]+0x2f): undefined reference to `__imp__fread_nolock'
C:\Users\User\AppData\Local\Temp\ccBQi4aM.o:fastio.cpp:(.text$_ZN10OutputFileD1Ev[_ZN10OutputFileD1Ev]+0x32): undefined reference to `__imp__fwrite_nolock'
C:\Users\User\AppData\Local\Temp\ccBQi4aM.o:fastio.cpp:(.text$_ZN10OutputFileD0Ev[_ZN10OutputFileD0Ev]+0x32): undefined reference to `__imp__fwrite_nolock'
collect2.exe: error: ld returned 1 exit status

andreyv

These functions are well-defined and standardized: http://pubs.opengroup.org/onlinepubs/9699919799/functions/getc_unlocked.html.

getchar() is required to be thread-safe. If you call this function many times in a row, the added cost of locking/unlocking on each call accumulates. To remove the unnecessary overhead, you can lock the stream once with flockfile() and use the fast getchar_unlocked() instead.
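
The pattern described here (lock the stream once, then read with the unlocked variant) might look like the sketch below. sumUnsigned is a hypothetical example function; flockfile, funlockfile and getc_unlocked are POSIX functions declared in <stdio.h>, so this is assumed to be built on a POSIX system and will not compile as is with MSVC.

```cpp
#include <cassert>
#include <cstdio>  // on POSIX systems also declares flockfile, getc_unlocked

// Hypothetical example: sums all non-negative decimal integers in a stream,
// taking the stream lock once instead of once per character.
unsigned long long sumUnsigned(std::FILE* file) {
  assert(file != nullptr);
  unsigned long long sum = 0;
  unsigned long long current = 0;

  flockfile(file);  // acquire the stream lock a single time
  for (int c = getc_unlocked(file); c != EOF; c = getc_unlocked(file)) {
    if (c >= '0' && c <= '9') {
      current = current * 10 + static_cast<unsigned>(c - '0');
    } else {
      sum += current;  // any non-digit ends the current number
      current = 0;
    }
  }
  funlockfile(file);  // release the lock

  return sum + current;  // include a number that ends exactly at EOF
}
```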

al13n

+34

Nice results!

But IMHO there's not much practical need for this nowadays.

For the last few years I've been using only cin/cout for contests, and it was always fast enough. Not even once I had to resort to printf/scanf/gets/whatever. (For floating-point numbers too.) The following well-known tweaks at the top of main() are sufficient:

ios_base::sync_with_stdio(false);cin.tie(0);cout.precision(20);

_index

For this problem, cin/cout doesn't pass even with these optimisations.

Alex.F

-8

It does, tested it now. 22392545


+18

There wasn't c++ 14 5 months ago at codeforces.

ho-jo-bo-ro-lo

Recently saw a blog about C++14 being slow for scanf/printf. Didn't expect it to be faster in the case of cin/cout!!
And about C++11: adding cin.tie(0), cout.tie(0) passes some more cases, but it still gets TL at test 29.

Swistakk

+26

You are a happy man. I definitely prefer iostream to cstdio, but once every few months it happens that I need to rewrite my IO. Locally there doesn't seem to be any difference, but on Codeforces iostream is slower, and that saddens me. Not by a very large factor, but when dealing with really big inputs it becomes significant (the problem linked by _index was one of those where I needed to rewrite it). (ofc I use all the optimizations you mentioned)

yeputons

+41

Few comments about code style and correctness. And, yes, I think all of them make sense in competitive programming as well. I'm sorry if I missed something or my point is invalid — that may happen.

1. Why write your own code instead of using isdigit/isalpha from cctype? Is it because you do not want problems with non-ASCII characters?
2. static inline bool isDigit(char c) { return static_cast<unsigned char>(c - '0') < 10; } I believe you have undefined behavior right there, because you are subtracting two chars, which may be signed on some platforms (like, all that I know :), so a potential overflow is undefined behavior. Convert c or '0' to unsigned char beforehand. UPD: riadwaw pointed out that the two expressions in the subtraction are subject to promotion to int first, so no signed overflow happens as long as int and char have different sizes.
3. I assume that you use C++11, as you use explicit constructors. No need to write constructors for POD objects (e.g. everything in the Detail namespace): you can use bracket-list initialization if there are no constructors specified. E.g. return { 1 } or return Width({ 1 })
4. readFloatingPoint and writeFloatingPoint do not work with big numbers (which do not fit in 64 bits). That can easily happen.
5. Is __builtin_expect really necessary? Does it really help? Have you measured it? It clutters the code and makes it harder to read and debug.
6. *--last = i["fnI"]; Are you kidding me? Is there any rational reason for writing this instead of "fnI"[i]?
7. return read(arg.first) & read(arg.second); I believe that the order of evaluation is not guaranteed for & or overloaded &&, is it? That can read the second element of the pair first, which would be very fun to debug.
8. Please do not make your readChar and readString functions return int when they actually return boolean values of 0 or 1. It's misleading and can make people think that they return the number of characters read, like in libc.
9. Is there a reason you use static_cast everywhere instead of a C-style cast? You do not cast anything else, only numbers, no pointers or objects, and you can't do anything other than static_cast with numbers.
10. Do you really need input and output to be pointers instead of static variables which can be statically initialized?

Thank you for the valuable input. Now the answers.

1. I compared them on the clang compiler, and my isSpace was 10 times faster; the other methods were 3-5 times faster.
2. char is an integer type, and integer arithmetic is well defined by the standard.
3. Good point.
4. Yes, I wrote about that. However, during contests I've seen huge real numbers only as a result of a bug. I'll try to find a way to write generic versions, but for output it will probably require too much code (see this).
5. I measured it only in a couple of the most critical places, and it did improve performance by more than 10%. Another reason to leave it is to educate people who didn't know such a thing exists.
6. Just a little code obfuscation I couldn't resist :)
7. Yes, this is a huge oversight, and not only for the & operator, but for + as well. Fixed.
8. If they returned bool, I would need to convert it to int later. These methods aren't expected to be called directly anyway.
9. Just general reasons to prefer static_cast.
10. This way I can switch input sources, although it's usually not needed for contests. Another reason is that all the construction can be done inside main, not somewhere in the middle of template code.
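
The evaluation-order issue from point 7 can be shown with a small sketch. TokenSource and readPair are hypothetical stand-ins invented for this example, not the classes from the post. The operands of the built-in & may be evaluated in either order, so the safe version places the two reads in separate statements.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an input device: hands out queued integers.
struct TokenSource {
  std::vector<int> tokens;
  std::size_t next = 0;

  bool read(int& value) {
    assert(next <= tokens.size());
    if (next == tokens.size()) return false;
    value = tokens[next++];
    return true;
  }
};

// Risky form: the compiler may evaluate either operand of & first, so
// p.second could receive the first token.
//   return source.read(p.first) & source.read(p.second);

// Safe form: statements are executed in order, so p.first is read first.
bool readPair(TokenSource& source, std::pair<int, int>& p) {
  if (!source.read(p.first)) return false;
  return source.read(p.second);
}
```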

pranet

+23

2) Isn't this true only for unsigned integer type?


+21

2) It's well-defined for unsigned overflows only. Say, (1 << 31) + (1 << 31) is undefined behavior if we have 32-bit ints. See here.

8) Conversion from bool to int is implicit, isn't it? Is it correct that you don't want it because you don't like implicit conversions and will have to throw in some static_cast<int>(bool) later in code (in read functions only, as I believe?). Well, right now only two functions return actual length — it's readUnsignedIntGeneral and readUnsignedInt, which is very misleading and looks inconsistent at the first glance. Later I understood that it's because these are only functions used in readFloatingPoint, but that's still very weird-looking for me.

10) Got it. Just in case you'll use C++14 in the future, = make_unique(...) is preferable over .reset(new T(...)) because it offers exception safety.

riadwaw

+15

2) it's subject to promotion first

I missed that. But even in this case I think they get promoted to int, not unsigned int..?

If we forget about the case when sizeof int = sizeof char, it won't overflow int

Oh, yes. Absolutely, thanks. Because we had two chars in the beginning, promoted them to int and now we have two very small ints.
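
The conclusion of this exchange can be written down as a short sketch. The isDigit body is the expression quoted earlier in the thread; the static_assert and the comments are additions made for this illustration.

```cpp
#include <cassert>

// Same expression as the isDigit quoted in the thread.
static inline bool isDigit(char c) {
  // The argument above relies on int being wider than char.
  static_assert(sizeof(int) > sizeof(char), "int must be wider than char");
  // Both c and '0' are promoted to int before the subtraction, so the
  // difference is a small int and cannot overflow. The conversion to
  // unsigned char is modular and well defined, so any negative difference
  // becomes a value of at least 10 and is rejected by the comparison.
  return static_cast<unsigned char>(c - '0') < 10;
}
```

For instance, '/' gives 47 - 48 = -1, which converts to 255 and is rejected, while '9' gives 9 and is accepted (assuming an ASCII execution character set and 8-bit char).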

Today I got a message from timus online judge that my solution has been rejudged, and it got TLE. I opened that problem and changed scanf/printf to FastIo. And my solution got accepted in 100 ms, while scanf/printf worked > 2500ms.
UPD Submitting solution without changes in msvc gets accepted in 1500 ms.

xsc

7 years ago, # ^ |

deleted

I_love_natalia

7 years ago, # |

  OutputFile(FILE* file = stdout) : file(file), owner(false) {}
  OutputFile(FILE* &&file) : file(file), owner(true) {}

Never checked this but I think this works

TheEnglishMajor

6 years ago, # |

It's been a very long time but I want to ask, how do you read/write 1D char array ? 2D char array works, but 1D char array gives:

In instantiation of 'bool read(Ts&& ...) [with Ts = {char (&)[4098], int&}]':| |488|error: call of overloaded 'read(char [4098], int&)' is ambiguous|

What I'm trying to do is read bytes from a binary file and then write bytes to a binary file. My code is: char buff[N]; ... read(buff,N); write(buff,N);

Tieway59

5 years ago, # |

It's been a very long time, but I want to ask: how do you handle input like __int128? I tried this code in a school online judge and everything worked out fine, but __int128 problems result in INTEGER_DIVIDE_BY_ZERO.