### ifsmirnov's blog

By ifsmirnov, history, 3 years ago, translation, ,

Two years ago I came up with an idea (Russian only) of making a generic library for creating testcases. Months have passed, thoughts circulated in my mind over and over, and finally I sculpted something in code. Far many things are yet not implemented, but I already frequently use ones which are.

Jngen works with arrays, graphs and trees. It also can generate some strings and geometry and provides command-line options parser and cool SVG drawer. Here are some working examples.

cout << Array::id(10).shuffled().add1() << endl; // permutation of elements from 1 to 10

cout << Tree::random(100000, 20) << endl; // "long" tree with elongation 20

pair<string, string> test = rnds.antiHash({{mod1, base1}, {mod2, base2}}, "a-z", 10000); // should be self-describing :)

cout << rndg.convexPolygon(1000, 1e9) << endl;

cout << Graph::random(100000, 200000).connected() << endl;

cout << rndm.randomPrime(1e18, 1e18 - 10000000) << endl;


Almost all code is documented, there are some real-world examples.

Getting started is very easy. If you work with testlib.h and use in your generators only registerGen and rnd.next, replace #include "testlib.h" with #include "jngen.h" and you'll see no difference. Compilation will last a bit longer, but there is a workaround which makes it compile even faster than testlib.

I'll be happy if you try it and share your feelings and feedback. Everyone finds his code and interfaces perfect until shows them to the community.

Currently the library has much more "basic" things and primitives than "real content" – I mean there are more bricks than practical testcases. We're working on it: soon several "test suites" will be added, and you'll be able to create reasonable tests for your, say, LCA-like-queries problem, in several lines of code.

I'd like to thank zemen, Endagorion and Errichto for useful discussions and advice, Endagorion, GlebsHP and CherryTree for their libraries from which I could borrow code learn upon and MikeMirzayanov for testlib which was a massive source of inspiration on early stages.

• +389

By ifsmirnov, 4 years ago, ,

Several months ago I've seen a comment on Codeforces which explained a method of finding a tangent from a point to a polygon. This problem is rather hard and if you try the first implementation which comes to your mind, it probably will not work. That comment explained a tricky binary search where you should store not two endpoints of a segment, but three, and which was easy to implement.

However, now I cannot neither recall the idea nor find the comment itself. Could anyone help me with either?

• +46

By ifsmirnov, history, 4 years ago, ,

It is known (well, as long as Tarjan's proof is correct, and it is believed to be) that DSU with path compression and rank heuristics works in O(α(n)), where α(n) is an inverse Ackermann function. However, there is a rumor in the community that if you do not use the rank heuristic than the DSU runs as fast, if not faster. For years it was a controversial topic. I remember some holywars on CF with ones arguing for a random heuristic and others claiming they have a counter-test where DSU runs in independently of rand() calls.

Recently I was looking through the proceedings of SODA (a conference on Computer Science) and found a paper by Tarjan et al. The abstract states:

Recent experiments suggest that in practice, a naïve linking method works just as well if not better than linking by rank, in spite of being theoretically inferior. How can this be? We prove that randomized linking is asymptotically as efficient as linking by rank. This result provides theory that matches the experiments, which implicitly do randomized linking as a result of the way the input instances are generated.

This paper is relatively old (2014), though I haven't yet heard of it and decided to share with you.

A whole paper can be found here.

• +139

By ifsmirnov, history, 4 years ago, ,

Topcoder Google calendar available from here is awesome. It includes all SRMs, marathon matches and everything you'll possibly ever need.

But I don't need these long marathon matches which occupy my Google calendar and mark the whole week as busy. I'd like to have a calendar where only SRMs and TCO events are included. Maybe, the beginning of new marathon match would be helpful too.

Do you know any of such calendars? Maybe it is possible to tweak the standard one? Or should I make this thing myself?

• +59

By ifsmirnov, history, 4 years ago, ,

I'm now participating in my first TC Marathon Match. I decided to run my solution in an infinite loop and stop when TL is almost reached. But it turned out that %%clock()%% function (C++) doesn't return correct time in the system. I printed total time elapsed after each iteration, On some testcases the last entry printed was 210 ms, however, the solution got a TLE verdict (with 10 seconds time limit!). On others it managed to finish on time, but the time printed by tester was significantly larger than what my program printed to the stderr.

Is it possible to do time-based pruning on TC marathons? Which functions should I use for it? I've heard about some clocks from std::chrono, but a limit of one test submission per hour isn't really convenient to test it all.

Oh. And as you're already here, I've got one more question. Where can I find information about how often I can make a judged submission? I googled a bit but found contradictory information (72 hours, 4 hours, some else...) In the ongoing match (TCO Marathon R3) I could make several submissions in about a half an hour, and the next day I had to wait for an hour until the next submission.

I use a web form for submitting, if it matters.

Thank you for helping me climbing the learning curve :)

• +29

By ifsmirnov, history, 4 years ago, ,

Take a look at this snippet.

#include <bits/stdc++.h>
using namespace std;

void f(int x, int y) {
if (x > y) swap(x, y);
printf("%d %d\n", x, y);
}

void caller1(int i) {
f(i-1, i);
}

void caller2(int i) {
f(i+1, i);
}

int main() {
caller1(1);
caller2(1);
}


Have you noticed something strange? Or maybe redundant? Neither did I. But g++ did:

\$ g++-4.8 -Wall -O2 a.cpp
a.cpp: In function ‘void caller1(int)’:
a.cpp:5:5: warning: assuming signed overflow does not occur when assuming that (X - c) > X is always false [-Wstrict-overflow]
if (x > y) swap(x, y);
^


Wait, what the hell? This if is used not only in caller1 but also in caller2, and in both cases the flow continues to the different branch. It seems that the optimizer examines only caller1 and doesn't even think that there could be some other users of that "redundant" line of code.

What is more strange, this error does not reproduce if only caller2 is present (in that case if condition always evaluates to true).

Hopefully, optimizer doesn't completely cut out the body of the conditional and the code works as expected albeit the warning is shown.

The bug reproduces with all g++ versions I have up to 5.0 and doesn't reproduce with clang of any version.

• +58

By ifsmirnov, history, 4 years ago, translation, ,

667A - Pouring Rain

To know how much water you consume per second you should divide consumed volume, v, by the area of the cup, . Then you should compare thisit with e. If your speed of drinking is greater, then you'll drink all the water in seconds. Otherwise you would never do it.

667B - Coat of Anticubism

It is possible to make a convex polygon with given side lengths if and only if a generalized triangle inequality holds: the length of the longest side is less than the sum of lengths of other sides. It is impossible to make a convex polygon from a given set, so there is a side which is longest than (or equals to) than sum of others. Assume it is greater by k; then it is sufficient to add a rod of length k + 1. More, it is clear that adding any shorter length wouldn't satisfy the inequality. Thus the answer for the problem is .

This problem is solved with dynamic programming. We can select an arbitrary root of any length (at least five). Let's reverse the string. A boolean value dp 2, 3[n] denotes if we could split a prefix of length n to a strings of length 2 and 3 so that the last string has a corresponding length. Transitions: . Similarly, . If any of dp k[n] is true we add the corresponding string to the set of answers.

You are given the oriented graph, find four distinct vertices a, b, c, d such that each vertex if reachable from previous and the sum of shortest paths between consequitive vertices is as large as possible. First let's run a BFS from each vertex and find three most distant vertices over given graph and its reverse. Afterwards loop through each possible b and c. Having them fixed, loop through a among three most distant vertices from b in the reversed graph and through d among three most distant vertices from c in tie initial graph. This is sufficient, because if we've fixed b and c and d is not one of three furthest from c then we could replace it with one of them and improve the answer.

666C - Codeword

The first thing to notice: string itself does not matter, only its length does. In each sequence of length n containing a fixed subsequence s we can select s's lexicographically minimal occurance, let it be p 1, ..., p |s|. No character s k + 1 may occur between p k and p k + 1 - 1, because otherwise the occurence is not lex-min. On the other hand, if there is an occurence which satsfies this criterion than it is lex-min.

Given this definition we can count number of strings containing given string as a subsequence. At first select positions of the lex-min occurance; there are ways to do it. Next, you can use any of Σ - 1 characters at first s intervals between p i, and any of Σ at the end of the string. (Here, Σ denotes alphabet size).

Looping through p |s| — the position of last character in s in the lex-min occurence, we can count that there are exactly strings containing s as a subsequence. So, having |s| fixed, answer for each n could be computed in linear time.

A final detail: input strings can have at most different lengths. Thus simply applying the overmentioned formula we get a solution, which was the expected one.

You are given four points on the plain. You should move each of them up, down, left, or right, such that the new configuration is a square with positive integer sides parallel to coordinate axes.

Choose some d, length of the square side, and (x, y), the position of the bottom-left point. A set from where to choose will be constructed later. Then fix one of 24 permutations of initial points: first goes to the bottom-left point of the square, second goes to the bottom-right, etc. Having it fixed it is easy to check if this configuration is valid and relax the answer if needed. The only question is where to look for d, x and y.

First we deal with d. You can see that there are always two points which move among parallel but not the same lines. In this case d is the distance between these lines, i.e. one of |x i - x j| or |y i - y j|. This is the set from where d will be chosen.

Now fix d from the overmentioned set and look at two cases.

1. There are two points moving by perpendicular lines. Then there is a square vertex in the point of these lines intersection. In each coordinate this point either equals to bottom-left point of the square or differs by exactly d. Thus if bottom-left point has coordinates (x 0, y 0), than x 0 must be some of x i, x i + d, x i - d, similarly for y 0. Add all this values to the set.
2. All points are moving among parallel lines; WLOG horisontal. Let's fix a permutation of points (yes, once again) and shift each point in such a way that after moving it would equal the bottom-left vertex of the square. Second point should be shifted by ( - d, 0), third by (0,  - d), fourth by ( - d,  - d). All four points must be on the same line now; otherwise this case is not possible with this permutation. Now the problem is: given four points on a line, move it to the same point minimizing the maximal path. Clearly, one should move points to the position (maxX - minX) / 2; rounding is irrelevant because of the constraint of integer answer. So, having looked through each permutations, we have another set for bottom-left vertex possible coordinates.

By the way, the 10th test (and the first test after pretests) was case 2; it was intentionally not included in pretests.

Why does it work fast? Let D and X be the sizes of corresponding lookup sets. . Now for each d there are 4·3 = 12 coordinates in the first case and no more than 4! = 24 in the second. Thus X ≤ 12·(12 + 24) = 432. The main part works in ; however, it is impossible to build a test where all permutations in the second case give a valid position, so you can reduce this number. Actually, the model solution solves 50 testcases in 10ms without any kind of pruning.

666E - Forensic Examination

You are given string s and m strings ti. Queries of type find the string ti with number from [l;r] which has the largest amount of occurrences of substring s[a,  b]'' approaching. Let's build segment tree over the texts t_i. In each vertex of segment tree let's build suffix automaton for the concatenation of texts from corresponding segment with delimeters like a#b. Also for each state in this automaton we should found the number of text in which this state state occurs most over all texts from the segment. Also for each state v in this automaton we should find such states in children of current segment tree vertex that include the set of string from v. If you maintain only such states that do not contain strings like a#b, it is obvious that either wanted state exists or there is no occurrences of strings from v in the texts from child's segment at all. Thus, to answer the query, firstly we find in root vertex the state containing string s[a;b], and after this we go down the segment tree keeping the correct state via links calculated from the previous state.

Please refer to the code if something is unclear.

• +68

By ifsmirnov, history, 4 years ago, ,

Consider a well-known problem: given a static array of size n, answer m queries of kind "how many numbers on [l, r] have value less than x". The standard solution is to build a segment tree where in every node we store a sorted vector. To answer a query we do a binary search in every corresponding node, achieving per query time complexity.

There is a method to reduce time complexity to per query, called fractional cascading. Instead of doing binary search in each node we do it only in the root, and then "push" its result to children in O(1).

For years I thought that the second approach is blazingly faster than the first one. And today I've made a test. I implemented both approaches in a pretty straightforward way and tested them on a random data. The results were quite surprising.

Top-down implementation: 670 ms

Bottom-up implementation: 520 ms

The first one is , others are ! Time is averaged over several consecutive runs. Test data is generated randomly with n = 100000, m = 200000.

Why doesn't fractional cascading give any improvements? Am I implementing it in an improper way? Anyway, this might be worth taking a look.

• +182

By ifsmirnov, history, 5 years ago, translation, ,

Many of us have successfully pursued a degree in computer science and defended a thesis. Probably, some theses were related to sport programming and could be interesting to the community. There are two well-known papers: dynamic 2-connectivity by burunduk1 and recent eertree (palindromic tree) by MikhailRubinchik. There are definitely some others. Let's share!

• +125

By ifsmirnov, history, 5 years ago, ,

TL;DR: is it possible to construct an Euler tour tree which supports LCA, subtree sum and rerooting?

Euler tour tree (ETT) is a method for representing a rooted undirected tree as a number sequence. There are several common ways to build this representation. Usually only the first is called the Euler tour; however, I don't know any specific names for others and will call them Euler tours too. All of them have some pros and cons. I want to discuss if we can build an algorithm which contains benefits from each one. Maybe this post also could be a tutorial for beginners.

• +97

By ifsmirnov, 6 years ago, translation, ,

One more opencup stage has just finished, let's discuss problems here.

What is the solution for D?

• +42

By ifsmirnov, 7 years ago, translation, ,

Next SRM will be held on Saturday, 20:00 MSK (12:00 EDT).

• +65

By ifsmirnov, 8 years ago, translation, ,

Hello, dear Codeforces community!

I'm glad to present you the next Codeforces Round #121. The contest is brought to you by Ivan Smirnov (ifsmirnov) and Aleksandr Timin (AlTimin), the students of Moscow Institute of Physics and Technology. In our first round we will offer you a good time in Berland: you will hold a demonstration, prevent the demonstration, deal with the most important problems of Berland (with both of them!) and much more.

We thank Gerald a lot for the invaluable help he gave as during the contest preparation. He is also the author of one task in the contest. We also thank delinur for the English version of statements, Aksenov239 for proofreading the statements and MikeMirzayanov for an opportunity to hold a contest on this wonderful site.

As usual, the contest will be held in both divisions and the problemsets will overlap. The scoring will be published later.

We wish you good luck and hope that you enjoy the contest!

UPD1: The scoring is classic in both divisions — 500-1000-1500-2000-2500.

UPD2: The contest authors apologize to the contesters for a possibility of misunderstanding of problem B div 1 (D div 2). The statement was fixed soon enough and the incorrect understanding of the problem didn't pass pretests, so the round is rated.

• +107

By ifsmirnov, 8 years ago, translation, ,

Inspired by 163B - Lemmings.

Let us have an unknown rational number presented as an irreducible fraction 0 ≤ p / q ≤ 1, where denominator does not exceed Q (in this task — 109). We have a function which determines whether given rational number is less than, greater than or equal to that unknown number ( intf (p, q) → {1, 0,  - 1}). The problem is to quess an unknown number using only f(). It is impossible to call f(x, y) where either x or y exceeds Q. Floating point arithmetic is also prohibited.