Chmel_Tolstiy's blog

By Chmel_Tolstiy, history, 3 years ago, translation, In English

Yandex.Algorithm ML track started! Hope you will participate it)

The task has been prepared by the Yandex.Alice team and personally Slava Alipov eik0u.

Building an open-domain conversational assistant that is clever and fun is one the most exciting applications and research frontiers of artificial intelligence nowadays. The core and very challenging task of such an assistant is providing coherent and entertaining replies to the users at all steps of their conversation with the assistant.

Participating in our competition you can to feel closer to the creation of the conversational assistant able to talk with millions of users like Alice does every day.

Submissions should be sent until 10:00 23rd April (MSK). Top 128 participants will receive cool t-shirts. The best three will receive cash prizes.

One can share approaches and discuss the contest with other participants in #proj_algo_alice Open Data Science community channel (registration link

Read more »

  • Vote: I like it
  • +18
  • Vote: I do not like it

By Chmel_Tolstiy, 4 years ago, translation, In English

Problemset was prepared by Yandex employees from Minsk.

Problem А. Task Management

Consider a solution with the complexity . Use a variable to track the last closed task and go through the list of tasks from the beginning to the end. Increase the counter when passing through the element corresponding to the next task. The constrains are quite large, so this solution doesn't fit the timelimit.

What is the case when we can not find any suitable tasks up to the end of the list? There is only one case: the task number (x + 1) is closer to the beggining of the list then the closed task number x. Therefore, to solve the problem we can determine position of each task in the task list and count the number of distinct numbers x such that the position of task number (x + 1) is less than the position of task number x.

Don't forget to consider the first pass through the task list. The final complexity of the solution is .

Problem B. Cross-City Communication

Problem ''Cross-City Communication`` does not contain any special algorithmic ideas, so we have to implement behavior described in the problem statement.

For each hour from 0 to 23 we need to check the free conference rooms in the reqired cities.

Note that for larger constrains determining of free conference room for each pair (city, hour) can help to answer queries faster. This information allows to answer a query in time proportional to the number of cities instead of the total number of conference rooms.

Problem C. Test Invocation

There are several different approaches to solve the problem, consider two of them.

The first solution is based on counting the number of times the test number x will be run. In the original system each test runs once, and in the new system it runs 2d(x) times, where d(x) is the distance from the root test to test number x. This can be easily proved by the induction (this fact can be guessed from the description of the samples). d(x) can be calculated by any traversal of the tree.

The second approach is based on dynamic programming. Let T(x) be equal to the total running time of the test number x, i.e. sum of the running timeS of the tests it depends and the checking time ax. So we have:

To compute the answer T(1) we run a tree traversal and calculate T(x) values using the recurrent formula above.

The complexity of solution is .

Problem D. Artihmetic Mean Encoding

Сonsider next problem. If n = b1 + b2 + ... + bc and all bi are powers of two with non-negative integer exponents, find all possible values of c. We will show that c can be equal to any integer from popcount(n) to n (where popcount(n) is the number of set bits in binary representation of n).

It is obvious that we cannot use less than popcount(n) numbers and more than n. If we have a sequence bi of length x (x < n) it always contains an element 2t with a positive value t. We can replace it with the two elements 2t - 1, obtaining a sequence of length (x + 1). The sequence bi of length popcount(n) can be build from the binary representation of the number n.

To solve the problem we can start from k equals to 1 and check existence of the sequence ai. As shown above, we need to find the minimum k that popcount(kn) ≤ k and construct the answer.

For problem constraints k does not exceed 60 (60·(250 - 1) < 260 - 1 = 20 + 21 + ... + 259).

Problem E. Cluster Connection

Graph should not contain brigdes, so the degree of each vertice is not less than two. Since the sum of degree of all vertices is equal to n + 2 we have two possible cases:

  • there is one vertex of degree 4;
  • there are two vertices of degree 3.

Consider the first case. Graph is an union of two cycles of length (m + 1) and (n - m) with a common vertex. In this case, we need to choose which vertices will be in the first cycle (other will go to the second), and the order in which vertices will be on these cycles. The number of ways to construct such graph is

Consider the second case. In the graph there are two vertices of degree three connected by chains of edges. All the other (n - 2) vertices are located on this chains. Note that there can be only one connection with zero additional vertices (an edge). Let the chains have a, b and c vertices are (a ≤ b ≤ c). Consider a few cases:

  • 0 ≤ a < b < c, then the number of graphs is equal to
  • 0 < a = b < c, then the number of graphs is equal to
  • 0 ≤ a < b = c, then the number of graphs is equal to
  • 0 < a = b = c, then the number of graphs is equal to

Combine all formulas together and obtain the final result.

Problem F. Repetitions

To solve this problem, refer to the string terminology. A border of a string s is some string y that x starts with and ends with y.

For each prefix p of the text we need to find the maximum length of the border, which has at least one additional occurrence in this prefix. To do this, we can find all borders of p, go through this list in descending order of lengths to find the first with additional occurence.

To search for all borders we can use the prefix-function. Additionally, remembering that π(i) is the maximal border of the prefix with length i, and length of all the borders form sequence π(i), π(π(i)), π(π(π(i))), ... etc.

Interesting fact: the border of length π(π(i)) always has at least three occurrences in the prefix with length i (the first starts in the first position, the second ends in the position i and the third ends in position π(i)).

To find the answer of the problem we need the way to check whether the additional entry of the maximum length border (of length π(i)) exists. If such occurrence exists then text has a position j (j < i), where π(j) ≥ π(i). For a quick check of this inequality, it is sufficient to compute the prefix maximum values of the prefix-function.

The complexity of algorithm is .

Read more »

  • Vote: I like it
  • +61
  • Vote: I do not like it

By Chmel_Tolstiy, 5 years ago, translation, In English

Thanks to all the participants for successful submissions!

This set has been prepared by Romka, Chmel_Tolstiy and Ixanezis. Special thanks to Gassa for his help in the preparation of statements and analysis.

Problem А. Orthography

The task has rather small constraints, so that it can be solved easily in O(N2L) complexity, where N is the number of words in the set and L is the length of a word.

In order to achieve such complexity, it is enough to check what happens if we choose each word as ''correct'' (a word with no errors). So, for each word in the set, we iterate over the set of words again and calculate the number of positions in which our chosen word differs from each other word. If all the numbers do not exceed 1, the chosen word perfectly suits us as being ''correct''.

There also exists a better solution in O(NL) complexity, but it was not necessary to apply it here.

Problem B. Voice Alerts

For each recorded phrase, we can calculate the corresponding distance Di using the formula Di = Xi + V·(ti + Tphrase). Then we must find the minimal Di and its index (also minimal with such distance in case of multiple answers).

Problem C. Table Tennis Tournament

To solve this problem, the following observation is necessary. The problem statement implies that the participants scored N - 2, N - 3, ..., N - 1 points. And it follows that the winner won all matches, the runner-up won all matches except the match against the winner, and so on. The participant who took the last place lost all matches.

If we fix the order of participants in the tournament results, we can find the probability of such final outcome. For this, just multiply the necessary probabilities from the probability matrix pij. To generate all permutations, you can use the method std::next_permutation (C++) or, for example, implement a recursive search.

The original problem can be solved with the described algorithm in time O(n2·n!). Using the ideas of dynamic programming, this approach can be sped up, and one can obtain an algorithm running in O(n·2n).

Problem D. Numeric Compression

Let us solve the problem using dynamic programming. For this problem, there are several different approaches, we will describe only one of them.

Note that the number of bit sequences of length L with K alternating groups of bits starting with a group with a zero bit are exactly C(L - 1, K - 1). Indeed, each of the groups is not empty, and among L - 1 of the boundaries between adjacent elements of a bit sequence, we need to choose K - 1 (which will separate the adjacent groups with different bits).

Consider the function F(L, K): the sum of bit compression results of all numerical sequences of length L with K alternating groups of bits (as we already know, there are C(L - 1, K - 1) such sequences).

To compute the value F(L, K) one can use the following approach. Iterate the size of the right group t (the lowest binary digit of the compression result). It is easy to obtain the following relationship:
F(L, K) = sum(t = 1..L - 1) 2·F(L - t, K - 1) + is_even(KC(L - t - 1, K - 2) for L > 1; for the base case, F(1, K) = 0.

For the solution of the original problem, find the sum of the numerical compression results for all numbers X smaller than N + 1. All these numbers can be divided into several groups. Let the group Gi consist of all the numbers X such that the highest bit where it differs from N + 1 is at position i. Note that this automatically implies that i-th bit of X is equal to 0, and i-th bit of N + 1 is equal to 1. To calculate the result of summing the numbers from one group, we will use the results of numerical compression of the prefix of N + 1 up to position i and the function F(L, K).

The function calculates the result of compression for the prefix of the number N + 1, but if the last group contains bit 0, it is excluded from the sum. We consciously included this group in F(L, K).

The solution described above has complexity , but you can speed it up to . Maybe even faster?

Problem E. Bicycle Construction

Let us notice that condition of 4-gears set satisfiability can be rewritten as follows: we can complete a bike using given 4 gears if and only if we can split them into pairs such that products of diameters in each pair are equal.

For each gear diameter D, we can calculate the number of gears with this diameter CD, and for each possible P, we can calculate number CntP of pairs of gears that give us product of diameters P. It can be done in O(N2) time using HashMap / unordered_map or similar container, or in time with help of Quicksort or other sorting algorithm that is fast enough. Now let us consider all pairs of gears. For each pair with diameters D1 and D2 with product P = DD2, we can increase answer by CntP - 1, thinking of it as of ``the number of other pairs with such product of diameters''. Unfortunately, it is possible that we added some pairs which included one of the gears we are currently considering. In order to prevent this, we need to decrease the answer by CD1 + CD2 - 2 if current gears have different diameters, and by (CD1 - 2)·2 if they are equal.

Now we had taken into account all needed 4-gears sets, but each of them was considered several times. It's easy to notice that every set of the form D D D D (4 gears with equal diameters) is considered 6 times, every set of form D E D E is considered 4 times and every set of form D E F G or D E E F is considered 2 times. Thus, to obtain the final answer, we need to decrease our answer by for each gear diameter D, then decrease it by CD·(CD - 1)·CE·(CE - 1)·2 for each pair of different diameters D E, and finally divide the answer by 2.

Problem F. Radars

First, let us consider the case when we have a single control point. This is trivial: the easiest way is to place the radar into the same point as the single control point. We get efficiency of exactly 1 in such case.

Now let us consider a case when we have two or more control points. Suppose we have found an answer: radar placement with maximum total efficiency. After that, we can move the radar around such point until at least two control points are at integer distances from the radar. Further movement in any direction, which leads some of these points to cross the integer border of that radius around the radar, does not increase the total efficiency (otherwise, our initial assumption about the initial optimal placement is incorrect).

So we will try to find precisely such radar placement that at least two control points lie at an integer distance from it. In order to achieve it, let us iterate over all pairs of control points and over distances r1 and r2, which are the distances from these points to the radar. Eventually, we will find points on the plane for the radar itself: these are the points of circle intersections with radii r1 and r2 and with centers in chosen control points.

We can have numerous such points:
- 0, if circles do not intersect at all,
- 1, if circles touch each other (including the inner touching case),
- 2, if circles intersect.

For each of the circles' intersection points, we calculate the total radar efficiency by definition, and choose the global maximum of all such placements.

The final algorithm complexity is O(N3·R2) where N is the number of control points and R is the maximum considered radius. Considering that points are located in a square with coordinates from 0 to 50, the highest total distance that makes sense to examine is .

P.S. If you find a typo, let me know with a private message.

Read more »

  • Vote: I like it
  • +22
  • Vote: I do not like it

By Chmel_Tolstiy, 5 years ago, translation, In English

Yesterday ACM ICPC World Finals finished. We congratulate all winners, medalists and special prize winners!

We want to remind you that tomorrow, May 21, at 00:00 (UTC+3) the qualification round of Yandex.Algorithm will start. For participation in qualifications you should register and start a virtual contest at any time prior to May 23 (UTC+3).

As usual 6 problems of varying difficulty will be provided at Yandex.Algorithm rounds. In order to qualify for the elimination stage rounds and compete for a trip to the finals in Minsk and other prizes you should solve at least one problem.

Also Yandex.Algorithm offers students to participate in the internship track. For more information see the rules of the championship.

Good luck! See you at Yandex.Algorithm!

P.S. The qualification round has started. Problems prepared by Romka, Ixanezis and Chmel_Tolstiy.

EDIT. Analysis has been posted.

Read more »

  • Vote: I like it
  • +138
  • Vote: I do not like it

By Chmel_Tolstiy, 5 years ago, translation, In English

Thanks to all the participants for successful submissions!

This set has been prepared by snarknews, Chmel_Tolstiy, Gassa and Zlobober. The analysis was prepared by snarknews and Gassa.

Problem А. Alphabetical E-mail

Note that sets of vowels and consonants does not intersect, which means that, when comparing two subsequences lexicographically, first goes the one with the lesser first letter.

So read the characters one by one and remember first consonant and first vowel. When we meet both of them, immediately compare and print the answer. One of the subsequences can be empty.

Problem B. Bytaler

Read the first exchange rate and set and to its value. Then read all other exchange rates; if the next rate is greater than , then set to it, and if it is less than , set to it.

Answer will be .

Problem C. Chesskers

Because black checker can make no more than 7 turns before it will reach the 1st rank (or be removed by a white knight), and the white knight can also make no more than 7 turns (even if white move first, after seventh turn, the black checker reaches rank 1 and wins). Note that, at each turn, the checker has no more than 2 possibilities to move, and knight has not more than 8 possibilities, so we will have not more than 27·87 = 228 operations, which is small enough to just solve the problem recursively.

Let us implement two functions:

  • Checker's turn function which checks winning/losing condition, and if they are not reached, call the functions of knight's turn for all possible checker's moves.

  • Knight's turn function which checks winning/losing condition, and if they are not reached, call the functions of checker's turn for all (no more than eight) possible knight's moves.

So, the main procedure reads the input data, and then calls the appropriate function depending on who starts the game.

Problem D. Desert City

Consider the diagonal AC of the convex pentagon. Two vertices (let us call them D and E) are lying at one side of it, and one (let us call it B) on another, so we have two diagonals BD and BE intersecting AC in inner points P and Q. Diagonals AD and CE are diagonals of a convex quadrilateral ACDE, so they intersect inside this quadrilateral. Diagonals BE and AD intersect inside the ACDE too (otherwise, we have segment EQ which connects vertice E of a convex quadrilateral ACDE with point Q on AC and does not intersect diagonal AD, which is impossible). The same is true for BD and CE. So, the three other intersection points are lying at one side of the line AC, connecting points P and Q. This means that intersections of diagonals of the convex pentagon form another convex pentagon PQRST with parts of diagonals as its edges.

In this case, vertices of the pentagon ABCDE can be reconstructed as the intersections of the lines which contain the pairs of sides which don't share a vertex (let us say these are PQ and RS). First, PQ and RS cannot be parallel. Also, because ABCDE is convex and edges of PQRST are parts of its diagonals, intersection must lie at another side of line QR than the other vertices of PQRST. So, if this is true for all five pairs of lines, then the convex pentagon ABCDE can be reconstructed correctly.

Taking all the above into account, the following solution can be considered We check if all five given points are vertices of their convex hull, then reorder them so that the pentagon A0A1A2A3A4 is convex, then check for each i if AiAi + 1 and Ai + 2Ai + 3 intersect at some point Qi which lies at the other side of Ai + 1Ai + 2 than Ai (here, the sum in indices is calculated modulo 5). If any of the checks failed, the answer is No. Otherwise, the answer is Yes.

Problem E. Encoding

If it is impossible to select such a pair of p and q, then f(x) is permutation polynomial modulo 2m, which means that it transforms the sequence of consecutive integers from 0 to 2m - 1 into their permutation. Below, we show certain properties of permutation polynomials.

It is easy to see that for m = 1 and for any permutation polynomial, the sum a1 + ... + an must be odd: f(0) = a0 and f(1) = a0 + (a1 + ... + an); so if a1 + ... + an is even, then f(0) = f(1) modulo 2.

If m > 1, then a1 for permutation polynomial must be odd. Otherwise, f(2m - 1) - f(0) = a0 + a1·2m - 1 + 22m - 2·R - a0, where R is an integer. Now, if a1 is even, then f(2m - 1) - f(0) = 2m·(a1 / 2 + 2m - 2·R), so f(2m - 1) - f(0) is divisible by 2m.

After some more experiments you may notice the following rule: if a polynomial is a permutation polynomial modulo 2m, this polynomial also must have even sum a3 + a5 + ... and even sum a2 + a4 + .... Proof of this theorem can be found, for example, in this paper by Ronald L. Rivest.

So, all you need is to check that the sum a1 + a2 + a3 + ... + an is even, and if m > 1, additionally check that a1 is odd and the sum a2 + a4 + ... is even. If this is true, the answer is No (the input contains a permutation polynomial). Otherwise, the answer is Yes.

Problem F. Future Playlists

At first step, we should find a set K of vertices that belong to a single strongly connected component such that there is no edge that starts in a vertex from this set and ends in a vertex outside it.

Due to restrictions from the, problem statement it is possible to find such set, this set is unique, and the vertex corresponding to Byteasar's favorite track belongs to this set.

To find this set, we can run a depth-first search from vertex 1.

It is obvious that at 102016-th track, the probabilities for playing tracks not in our set can be considered as zeroes; moreover, 102016 in this problem can be considered as infinity.

After an infinite number of rounds, the vector p of probabilities must conform to linear equation system pW = p, where W is the adjacency matrix of the subgraph built from vertices of the set K.

So we have a homoheneous system of linear equations (WT - I)pT = 0, and an additional equation . It is easy to see that system is consistent and it can be solved, for example, by Gaussian elimination.

The value p1 is the answer.

P.S. If you find a typo, let me know with a private message.

Read more »

  • Vote: I like it
  • +39
  • Vote: I do not like it

By Chmel_Tolstiy, 5 years ago, translation, In English

Flex your programming muscles in Yandex.Algorithm, an annual international programming championship organised by Yandex, one of the largest internet companies in Europe and Russia's leading search provider.

Anyone over 18 – regardless of education, profession or programming style – has a chance of reaching the finals and winning up to the equivalent of $4,500, while participation in the preliminary rounds is also open to younger competitors from age 6. The top 512 participants according to the cumulative result of all three elimination stage rounds will receive a contest T-shirt.

The warm-up round takes place online on May 10, starting at 21:00 Moscow time (UTC+3). It is followed by the qualification round, which takes place online on May 21 starting at 00:00 Moscow time (UTC+3), and running for 48 hours. To qualify for the elimination stage rounds you need to solve at least one problem in the warm-up or qualification rounds. The top 25 competitors after the elimination rounds advance to the finals in Minsk, Belarus.

See the schedule of all rounds here. Read more about Yandex.Algorithm and register for participation no later than May 22.

Take a look at 2015 problems and results here.

Good luck!

UPDATE 01. The warm-up round takes place tonight at 21:00 (UTC+3). You can qualify for the elimination in this round.

UPDATE 02. Analysis of the warm-up round.

Read more »

  • Vote: I like it
  • +149
  • Vote: I do not like it