Codeforces celebrates 10 years! We are pleased to announce the crowdfunding-campaign. Congratulate us by the link https://codeforces.com/10years. ×

### z4120's blog

By z4120, history, 3 weeks ago, , Note: This technique had been used before at https://codeforces.com/blog/entry/60851 (editorial code of problem F) and https://codeforces.com/problemset/submission/1017/41357847 .

While this solution is faster than using int64_t (because Codeforces machines are 32-bit), the time limit should be loose enough for solution that does not use this trick to pass. However, this trick may be useful if you want to be very fast or implement an unintended/suboptimal brute force solution. (the same thing can be said about fast input/output methods. On Codeforces, cin/cout is usually fast enough)

This is definitely faster than the usual return (long long)a * b % mod, but it might be slower than Montgomery multiplication.

Given a positive integer md, and two positive integers a and b in range [0, md-1], the following function will compute (int) ((long long) a * b % md): (the quotient of the division is stored in variable d)

int mul(int a, int b) {
unsigned long long x = (long long) a * b;
unsigned xh = (unsigned) (x >> 32), xl = (unsigned) x, d, m;
asm(
"divl %4; \n\t"
: "=a" (d), "=d" (m)
: "d" (xh), "a" (xl), "r" (md)
);
return m;
}


x86 assembly has an instruction to divide a 64-bit integer by a 32-bit integer, provided that both the quotient and the remainder fits in 32 bits. (More details)

However, it's not a compiler bug that GCC does not optimize code like this

uint32_t f(uint64_t a, uint32_t b) { return a % b; }


to use that assembly instruction, because that would not work well (as required by the C++ standard) when the quotient exceeds 2^32. See also 64308 – Missed optimization: 64-bit divide used when 32-bit divide would work.

Unfortunately, I think there isn't any intrinsic function of GCC that provides the functionality, and it's necessary to use asm. (source)

Benchmark code: (you can copy-paste that to https://codeforces.com/problemset/customtest )

Code

Because Codeforces caches the result, you may need to change the input or some whitespaces in the code to re-run the code. When I run it the result is ~= 0.45 vs 0.65, which is a 30% performance gain. By z4120, 4 weeks ago, , Note: Depends on your coding convention, you may not encounter this bug at all.

I have had a bug in a code similar to this:

void process(long long x) { /* ... */ }
for (auto i = a.size(); i-->0; ) {
for (auto j = b.size(); j-->0; ) {
process(i - j);
}
}


Can you see the bug?

Both i and j are unsigned, therefore the result will be computed modulo 2^32 or 2^64, depends on the computer architecture. However, because process function takes a long long as input, the input is converted to long long. On 64-bit machines, the result ends up being correct, but on 32-bit machine, if i - j == -1 then the received x value is 2^32-1.

In this particular case, there's no undefined behavior, so turning on -ftrapv -fsanitize=undefined -D_GLIBCXX_DEBUG etc. does not help. But compiling the code as 32-bit, enabling -Wsign-compare, or never using unsigned integer types would work.

How to compile for 32-bit machine with 64-bit compiler?

Add the -m32 compiler flag. (Note that if your compiler fails with some message about link failed, you may need to install 32-bit libraries. The instruction is dependent on your operating system — see 1 2)

Another use of 32-bit compiling: Benchmark with Codeforces speed.

Because Codeforces runs your program in 32-bit, benchmarking in 64-bit mode may not reflect the performance on Codeforces server.

In particular, 64-bit integer multiplication and division on 32-bit machines are much slower on 32-bit machines than on 64-bit machines, so replacing a 64-bit integer division with a 32-bit one may make the program significantly faster on Codeforces, but the performance is almost the same on a 64-bit machine.

By z4120, history, 6 weeks ago, , Note: All of the information in this blog relies on implementation-defined behavior. Do not use this in production code. In Codeforces contests, it's usually preferred to simply use hand-implemented bitset, because you've infinite time before the contest; or bitset (however bitset is printed differently from vector<bool> in GDB, and it doesn't work if you need dynamic size).

It's mentioned in a previous comment that bitset internal can be accessed by casting it to int32_t*. What about vector<bool>?

Method 1: cast its content to uint32_t* (or uint64_t*), but it'll only work when _GLIBCXX_DEBUG is not defined.

	vector<bool> a {1,0,0,1,0,1};
cout << * (uint32_t*&&) a << '\n';
cout << * (uint32_t*&&) a.begin() << '\n';


Both prints 41.

Method 2: Access _M_p and _M_offset members of the iterators.

In normal mode it works fine, but in debug mode it will throw this error

error: ‘std::__cxx1998::_Bit_type* std::__cxx1998::_Bit_iterator_base::_M_p’ is inaccessible within this context


In this case, just cast it to the base. If you want something that works with both lvalue and rvalue, you can use

#ifdef _GLIBCXX_DEBUG
__cxx1998::_Bit_iterator_base& PP(auto& x){ return (__cxx1998::_Bit_iterator_base&) x; }
__cxx1998::_Bit_iterator_base&& PP(auto&& x){ return (__cxx1998::_Bit_iterator_base&&) x; }
#else
template<class T>
auto&& /* decltype(auto) ? */ PP(T&& x){ return std::forward<T>(x); }
#endif


(any way to simplify the template?)

#define private public may work, but I don't recommend using it.

Sample code: computing bitwise AND of two vectors:

	vector<bool> a {1,1,0,1,0,1};
vector<bool> b {0,1,1,1,1,0};
transform(
PP(a.begin())._M_p,
PP(a.end())._M_p + (PP(a.end())._M_offset != 0),
PP(b.begin())._M_p,
PP(a.begin())._M_p,
[](auto x, auto y){ return x&y; });
for(int x : a) cout << x;


Don't forget + (T(a.end())._M_offset != 0) when necessary.

Find the first set bit:

	vector<bool> a(100);
a = 1;

auto iter = begin(a);
PP(iter)._M_p = find_if(PP(iter)._M_p, PP(end(a))._M_p, [](auto x){ return x; });
iter = find(iter, end(a), 1);
cout << iter - begin(a) << '\n';


Note: While it's possible to change unused bits like this

	vector<bool> a {1,1,0,1,0,1};
*PP(a.begin())._M_p = 0b111111111;
a.resize(7);
for(int x : a) cout << x;


and it would still work fine, I'm not sure if it's always the case. It would simplify the implementation of some algorithm.

By z4120, history, 7 weeks ago, , Try clicking this link (the domain is codeforces.com)

How I discovered this

UPD: The bug is fixed now, however there's another (see the comment below) By z4120, history, 7 weeks ago, , You can verify the bug yourself by visiting https://codeforces.com/problemset/status.

Screenshot: (not an animated gif)  By z4120, history, 2 months ago, , Somebody complained that there's no way to know the problem scores (they're not visible on the standings table during virtual contest). This is a user script to solve the issue.

Known problems:

• The score table is displayed in both real contest mode, practice mode and virtual contest mode. (didn't test in real contest mode, but the scoring table will probably be duplicated...)
• The time is always 0:00. Edit: It may not be always possible to compute the penalty correctly, because the rule changes over time (for contests longer than 120 minutes)
• Only support English.

Screenshot:  By z4120, history, 2 months ago, , An extension of the blog Efficient and easy segment trees.

## Single element modification + find nearest previous element smaller than value

For recursive implementation, see this comment: https://codeforces.com/blog/entry/55083#comment-389949.

Non-recursive implementation: First it's necessary to find any parent of the target node. There are two methods.

• Method 1 (similar to the query range function): divide the range [0..x+1[ into $\log n$ ranges, find the rightmost range with minimum value smaller than val.
• Method 2: just traverse the tree to the left starting from node x. For simplicity, assume the last element of the array is $-\infty$. (only required in method 2)

(t[x] is the minimum value in node x.)

#define METHOD_1 /* 1 or 0 */
int previous_less_than(int x, int val) {
x += n;

#if METHOD_1
int found_node = -1;
for (int l = n, r = n + x + 1; l < r; l >>= 1, r >>= 1) {
if (l & 1) {
if (t[l] < val) found_node = l;
++l;
}
if (r & 1) {
--r;
if (t[r] < val) {
found_node = r;
break;
}
}
}
if (found_node < 0) return -1;
x = found_node;
#else
while (t[x] >= val) { // repeatedly go to the left node until reach a parent of a valid node
if (x & 1)
x >>= 1;
else
--x;
}
#endif

while (x < n) { // find the rightmost leaf with value less than val
x = x * 2;
if (t[x + 1] < val) ++x;
}

#if not METHOD_1
if (x == n * 2 - 1) return -1;
#endif
return x - n;
}


## Lazy propagation + find nearest previous element smaller than value

• Method 1:
• Remember to push before find the ancestor of the target node.
• The first step is the same.
• In the second step (find the rightmost leaf with value less than val), it's necessary to push before going to a child node.
• Method 2: It's no longer possible to just go to the nearest left node, because in this image Assumes the starting node is 26, and all lazy values of ancestor nodes of 26 are applied. The nearest left node is 25, however the lazy value of its parent, node 12, is not applied.

Instead, it's necessary to go to the left child (12) of the nearest ancestor (6) for which the current node is in the right branch; or node 1 in case there's no such node (when the current node is in the leftmost path — 1, 2, 4, 8, 16). The complete code is:

(assumes that push is only for the current node and push_all is for the current node and all its ancestors)

int previous_less_than(int x, int val) {
x += n;

#if METHOD_1
int l = n, r = n + x + 1;
push_all(l); push_all(r - 1);

int found_node = -1;
for (; l < r; l >>= 1, r >>= 1) {
if (l & 1) {
if (t[l] < val) found_node = l;
++l;
}
if (r & 1) {
--r;
if (t[r] < val) {
found_node = r;
break;
}
}
}
if (found_node < 0) return -1;
x = found_node;
#else
push_all(x);

while (t[x] >= val) { // repeatedly go to the left node until reach a parent of a valid node
if (x & 1)
x >>= 1;
else {
x >>= __builtin_ctz(x);
if (x > 1) --x;
}
}
#endif

while (x < n) { // find the rightmost leaf with value less than val
push(x);
x = x * 2;
if (t[x + 1] < val) ++x;
}

if (x == n * 2 - 1) return -1;
return x - n;
}


You can prove that, at any time t[x] is accessed, then all ancestors of x (excluding x) have lazy value pushed.

## 2D segment tree

Assumes the array size is n*m.

1. If n*m fits in memory, then it's possible to make the segment tree array (2*n)*(2*m) and turn the for loop into two of them (briefly mentioned in this comment
2. Otherwise, if offline processing is available, then just prepare the list of touched element for each 2*n node, build a segment tree for each of them, then process normally.
3. Otherwise, it's necessary to use pointers (dynamic node creation) anyway and non-recursive implementation is not possible.

## Segment tree with correct node order

Used when it's necessary for node 1 to be the combined value of all nodes and the combiner function is non-commutative. (usually it's not worse asymptotically to use query(0, n))

• Method 1. Just extend the array to the nearest power of 2.
int floor_pow_2(int x) {
return 1 << (31 ^ __builtin_clz(x));
};
int ceil_pow_2(int x) {
return floor_pow_2(2 * x - 1);
}

• Method 2. Store the elements in position $x, x+1, ..., 2n-1, n, n+1, ..., x-1$ where $x$ is the largest power of 2 $\leq 2n$. In that case in order to get the node index from the array index it's not just a simple += n but it's necessary to do this:
vector<int> t(2 * n);
int const offset = floor_pow_2(2 * n); // or ceil_pow_2(n)

int index_to_node(int x) { // note: index_to_node(0) == index_to_node(n) unless n is a power of 2
x += offset;
if (x >= 2 * n) x -= n;
return x;
}


Also, because the node l and r may be on different "layer" it's necessary to modify the range update/query function: (the single index update/query function can stay the same, with p += n replaced with p = index_to_node(p))

void add_range(int l, int r, int val) {
if (l == r) return; // required otherwise add_range(0, 0) will be the same as add_range(0, n) for n not a power of 2
l = index_to_node(l);
r = index_to_node(r);

auto const processl = [&] {
if (l & 1) t[l++] += val;
l >>= 1;
};
auto const processr = [&] {
if (r & 1) t[--r] += val;
r >>= 1;
};
if (l >= offset and r <= offset) processl();

while (l < r) {
processl();
processr();
}
} By z4120, history, 2 months ago, , Consider a harder variation of Codeforces Global Round 6 — Decreasing Debts (1266D) where the first way to consolidate is changed to: (_first suggested in this comment_)

Let $d(a,b) > 0$ and $d(b,d) > 0$ such that $a \neq b$ or $b \neq d$. We can decrease the $d(a,b)$ and $d(b,d)$ by $z$ and increase $d(a,d)$ by $z$, where $0 < z \leq \min(d(a,b),d(b,d))$.

(replace $c$ with $b$)

It can be proven that any valid submission to this problem is also a valid submission to the easier problem, however the reversion is not true. It appears that some people wrongly assumes that the problem is this harder variation and try to solve it with a "repeated shrinking" algorithm:

• Loop through the vertices in some order.
• For each vertex, rewrite all the edges adjacent to that vertex such that it's in-degree or out-degree is 0.

If the order is fixed, then it's possible to hack (sansen's hack looks like this, which works with the order 1, 2, ..., n: )

There are some other possible orders, see hacked submissions for details. (if you submitted some code which uses the shrinking algorithm described here, consider leaving a comment so people can try hacking it)

However, if the order is random shuffled (like in this submission), is it possible to hack the submission? If it isn't possible, then can you prove that the expected number of touched edges is $O(n)$? (Or $O(n log n)$)

(There's only one day hour left for uphacking! Please be quick.) By z4120, history, 2 months ago, , tl;dr:

If you want to try to figure it out, read below and don't open this

Recently I was writing some code that looks roughly like this:

double value= /* something */;
for(double step=1024;step>1e-8;step/=2)if(value-step>0){
value-=step;
// long long code to check condition satisfied
if(not ok)
value+=step;
}


While it gives correct answer locally on all sample test case, the result is wrong on Codeforces machine.

I had no idea what's going on, then decide to add a debug statement to the begin of the // long long code block. However, when I add anything there (even cout<<"";) the code gives the correct answer.

Initially I thought there's some undefined behavior, but I compiled with sanitization turned on, and there's no error reported.

Switching to another compiler works well on that particular test case.

Can you guess what could be the cause? (It's a bug in the code, not a compiler bug.)

(Of course if You're given the complete code and debug it for a while, you'll be able to find out the bug as well. The point here is that, can you guess what's the problem?)

Спойлер

Even if you knew about the behavior already (like me while debugging the code), I hope this blog can be useful to you when your code happen to have such a hard-to-explain behavior.

By z4120, history, 2 months ago, , UPD: New version: Add problem title to the dashboard. Screenshot. Now you can tell easily what the problem is, without clicking through the link or add tags to describe the problem (although if you have existing tags, it's okay as well)

A port of the chrome extension to Greasemonkey. I don't use Chrome, and so I decide to spend some time convert the chrome extension to greasemonkey userscript. I know that Chrome is the most popular browser, but I hope this can be useful to someone else. (there was somebody who asked for a Firefox extension)

For now, the user script only works on Codeforces, and it uses localStorage. Also, because it "merges" a bunch of files into one, the resulting user script is not really maintainable. Anyway, as far as I've tested the works identically to the chrome extension (except that the "badge text" does not work)

Because it's not possible for user scripts to add button on the taskbar the button is placed like this instead. Original post:

If you visit many different programming sites and come across problems you'd like to solve later, and forget about them later, consider giving my chrome extension : Upsolve tracker a try.

See snapshots here and here.

Upsolve tracker lets you

mark problems for solving later across various sites

also lets you add tags to the saved problems

reminds you the number of pending problems via a badge on the extension icon.

More features such as activity graph and alarm reminder are in the pipeline.

Suggestions/ Pull requests at github repo are welcome! By z4120, history, 3 months ago, , After reading Yet Another Blog About Subtasks, I feel that I should write a blog to try to reason whether subtasks are inherently evil. Even the author of Are subtasks evil? — Codeforces says that "I don't think that all problems with subtasks are bad".

Because the former blog is just a rant and doesn't provide any arguments, I'll focus on the reasons provided in the other blog. Before reading, note that when you solve the harder subtask, you'll automatically get the point for the easier subtask.

1. The difficulties are almost always ruined.

Codeforces Round #602(Div 1). Both problems A and B1 in this round have difficulty 500. However, I believe that B1 is easier than A.

Then just decrease the value of B1. (and increase the value of B2 if necessary)

Codeforces Round #601(Div 1) Problem B1 is worth 500 points, B2 is worth 750 points, while the only step that B2 makes with respect to B1 is that you can iterate only through prime divisors, not through all divisors!

Given that B1+B2 has less value than C, you can consider that they're two separate problems B1 = 500 and B = 1250. B has less points than C and is easier than C. No problem.

Codeforces Round #584(Div 1), where both E2 and G1 were worth 1500 points, which is nonsense.

E2 is actually 2500 and have rating 2400. G1 is 1500 and have rating 2000 (easier).

2. The strategy of solving the easy subtask before trying hard gives more points than just solving hard from scratch too often.

Again, look at Codeforces Round #584(Div 1) with G1 worth 1500 and G2 worth 2250. Here implementing G1 before even trying G2 is the best strategy.

The same is true about B1 and B2 in Codeforces Round #602(Div 1) (for not too high rated users), about G1 and G2 in Codeforces Round #568(Div 2), and so on.

Note the "for not too high rated users" part. In other words, that would only possibly a problem if that problem is (approximately) the last one the user solve, because solving each one separately costs time and it would affect the submission time of other problems too.

Now consider those users. The problem here is that the point value of the easier subtasks must not be too small (it must be larger than the point values of easier problems), however if it's too large then the point value of the harder subtask must be very large (otherwise the penalty of solving easier subtask would be significant with respect to the little penalty reduction on the harder subtask), which makes the point value of harder problems larger, etc.

In this case, I propose a change. (described below)

3. The subtasks should have some weight as problems even on their own, but often they are dead problems.
4. Often, the solution for subtask doesn't have anything in common with the solution of the original problem

The same thing happens with too-easy separate problems. When this problem only appears as a subtask, better users can "skip" that problem entirely, which is not too bad.

Possible change: transfer some of the penalty value from the easier subtask to the harder one.

Denote $P_A$, $p_A$ and $t_A$ the point value, the penalty value and the (relative) expected time required to solve a problem respectively. (currently $p_A = P_A / 250$) Then:

• Total value when solve task 2 immediately = $P_{A1} + P_{A2} - t_{A2} \times (p_{A1} + p_{A2})$
• Total value when solve task 1 then task 2 = $P_{A1} + P_{A2} - t_{A1} \times p_{A1} - (t_{A1} + t_{A2} - t_{common}) \times p_{A2}$

If we want to make the latter value smaller and keep the value of the former value, we can decrease $p_{A1}$ by $\delta$ and increase $p_{A2}$ by $\delta$ for some positive $\delta$ value. Then:

• Users who use approach 1 will have total value unchanged.
• Users who use approach 2 will have total value decreased by $(t_{A2} - t_{common}) \times \delta$.
• Users who can solve only subtask 1 and subtask 2 will have total value slightly increased (by $\sum_{X\text{ is solved before or = }A1} p_X \times \delta$) but that should be still smaller than users who solve both subtasks. Perhaps this would make it beneficial to solve easy subtasks than separate problems, but that can be "fixed" (todo: more analysis) by increasing the $p$ value of other problems too.

TODO: try to apply this change on old contests, see how the standings table would change. By z4120, history, 3 months ago, , Userscript: https://greasyfork.org/en/scripts/393292-cf-view-hacked-code (tested on Chrome+Tampermonkey). Click at the hack ID to view the list of possible submission.

Known limitations:

• Have to make a HTTP request for each defender.
• Currently the script does not filter to a single submission, even when it's possible. When the user is logged in, the submission page contains information about the hack ID, so it's possible to make additional HTTP requests to get the submission pages.
• May take a long time (and some CPU) to replaces all the hack IDs with links. Fixed in version 0.0.3.

Currently, the hacks page has a lot of columns, but none of them links to the actual submission. The easiest way (I can find) to view the code is to go to the defender's submissions page then find the submissions of that problem. By z4120, history, 3 months ago, , It's not uncommon for people here to try to improve their typing speed, or to complain that programming requires typing some hard-to-type characters frequently. There are a few solutions to that:

• Buy an ergonomic keyboard model. May work, but will not be discussed here.
• Just practice (on typeracer, etc.) This may help with the speed, but typing special symbol is still hard.
• Use alternate keyboard layout: may not always be available in on-site programming contests. In particular, Programmer Dvorak layout is not available on Windows by default.
• Key chording: discussed below.

While there are just so many keys on the keyboard, you can "make" more by chording the key (taking the idea from stenography). Then you can bind those to commonly typed keys/sequences.

1. How to bind keys?

Vim supports key binds natively. Just use noremap! (or just map!) I don't know if you don't use Vim.

2. What key bindings do you use?

ws = <bs>, uj = (, ik = ), UJ = [, IK = ], df = 0, un = unsigned, jp = .push_back(, jk = <<, as; = sa; = <esc>, fj = jf = <cr>, and some other ones.

3. Isn't that a lot? Would it be very slow to type them during programming competitions?

map! jp .push_back( is shorter than #define pb push_back. For brackets/special characters, it makes typing faster, so it's worth it.

4. Would typing two/three keys be slower than one key?

Typing shift+some key is not faster than typing two keys. Also, for key mappings that uses the same finger (such as uj or ik) you only need to press once. (I use some key cap to make it possible to press multiple keys with a single finger)

5. Does this really matter? My typing speed is fast already.

I think this is better ergonomically.

6. What if I get the order of the key wrong?

For chords typed with one finger that's not a problem. For others, a workaround is to use some plug-in (vim-arpeggio, for example) (not available in on-site contests), or to bind all permutations (not really feasible for combinations longer than 2 characters)

7. Can they cause conflict with something else?

It depends. Sometimes I get conflicts, but you can analyze your existing code to see if a combination is okay, or rename the variables (instead of rows you can use rowz for example)

8. What about #define?

Then you'll need to type parentheses anyway. By z4120, history, 4 months ago, ,  The bug can be seen at page https://codeforces.com/contest/1190/standings/participant/26510188/page/2 (accessible by visiting the user submission page, turn on "show unofficial", click any # sign, then click on a different page)

The bug is more apparent in another contest where the user solved the last problem. bug,
By z4120, history, 4 months ago, , (For codeforces global round 5.)

There are some users who submit the same code for C1 and C2, but get main test failed for C1 and passed for C2.

This JavaScript code shows that there are 12 users who passed C1 and failed C2: (run this in the developer console on a Codeforces page)

Code

Output: (list of submission ID for problem C2. View them at https://codeforces.com/contest/1237/submission/<id>)

> await main()
[62717661, 62729922, 62715435, 62717690, 62726047, 62727939, 62714189, 62716789, 62721112, 62730345, 62719766, 62719548]


I didn't write code to check how many of those submit the same code, but you can check manually that there's at least one by user Simon (rank 242) (side note: that user receives a T-shirt.) By z4120, history, 5 months ago, , Upd3: Move to greasyfork (originally some dollar signs are converted to Latex), and add some known bugs.

Usage:

• Just install the userscript. It will only one when you're taking part in a virtual contest.
• Currently only works on the "My submissions" page (/contest/ID/my)
• For testing, you can append ?always_show=1 to the URL.
• Appending ?mock_pretest_count=1 will pretend that all problems have between 10 and 20 pretests.
• When you register (virtual) for a contest, the script will prompt you to download the pretest.

How it works:

• Fetch every submissions
• Check if there's any skipped/hacked solution such that there's no rejected test cases (like this) (then the number of pretest is the number of passed test of that submission)
• Otherwise estimate the number by getting number of passed tests of solutions that fail on pretests/main tests of contestants.
• After the number of pretest is calculated, it's easy to compute the pretest verdict. The number of pretest is stored for future usage, so it's necessary to download the data only once.

Known bugs:

• Sometimes logging out fails for unknown reason. You can log out manually and then force-get the data.
• It's not possible if there's not enough submission for that problem. (For instance, running the code says that there are at least 1 and at most 138 pretests for problem 566E)
• Fetching the pretest may infall invalidate the csrf token of the current page. In that case a reload would help.
• When the user gets TLE, the "time taken" displayed will reveal that.
• Even if the user definitely passes, the "???" may appear when the number of pretests is not definite on "running on test x".
• If the user is taking part in a virtual contest, it's not possible to view the test cases of a submission. In that case it's necessary to log out the user.
• Fetching every submissions just to get the number of pretests of the problems is slow. (This can be avoided by fetching only solutions in contest, given that there are currently ~20000 submissions but only 3585 in-contest ones for #566) 