Michael's blog

By Michael, history, 8 years ago, translation, In English

I wrote in May about the launch of Data Structures and Algorithms Specialization at Coursera. In September, the Advanced Algorithms and Complexity Course of this Specialization was launched, and I'd like to tell you more about it.

Course topics:

  1. Network Flows (Ford-Fulkerson and Edmonds-Karp algorithms).
  2. Linear Programming (the Simplex Method).
  3. NP-completeness (theory, reductions, solving NP-complete problems using SAT-solvers).
  4. Coping with NP-completeness (bruteforce optimizations, solvable cases, approximation algorithms).
  5. Streaming Algorithms.

The problems for this course were prepared by ifsmirnov, ilyakor, Michael, Perlik, romanandreev, Zlobober and Paul Melnichuk.

Network flows and the algorithms covered in the Specialization can sound as not very advanced topic for many Codeforcers. On the other hand, it is definitely a popular topic: there is a problem on network flows application virtually in every contest nowadays, and knowing the basic algorithms for finding flows is a must. One of the problems in the Programming Assignment of the Network Flows module supports that: Google Code Jam officially allowed us to use the statement of the problem Stock Charts from GCJ 2009 R2 (ilyakor has prepared the tests and the solutions for our version). Also, it often turns out that you should solve competitive programming problems on flows using Ford-Fulkerson algorithm, because it can be implemented extremely fasta, and it works fast in practice, but for the rare cases when the authors prepared specialized test cases breaking Ford-Fulkerson which is non-trivial.

Linear Programming has never been a competitive programming topic...right until the ACM ICPC World Finals 2016 where problem I just screamed "implement Dijkstra and Simplex Method". In the end, only one team from Stanford University solved this problem, and only two more teams tried. Stanford got lucky: they had the simplex method in their team notebook. However, according to what I observed during the finals as an ICPC analyst, they got unlucky too: it seems their implementation had a bug) As a result, they've spent a lot of time on fixing it and got the problem accepted only during the last hour, although they've started trying much earlier. I might have confused Stanford team's troubles with Wroclaw university's, though — sorry if that's the case. Anyway, so many teams have missed their chance for a much better result! One didn't have to solve anything, just needed to implement two algorithms, one of which many ACM ICPC finalists can implement without opening their eyes at night. romanandreev who prepared the problems on linear programming made sure that his solution passes on the problem I from World Finals 2016. Forums on the Coursera are flooded with discussions of accuracy problems and other subtleties of Simplex Method. Some of the students even complained that they have passed a full separate course just about Linear Programming, and have passed this problem there, but can't pass it in our course :) So you can test your implementation for the team notebook pretty well)

As opposed to competitive programming, in the real life Linear Programming finds lots of applications from optimizing advertising budgets, investment portfolios, diets, transportation, manufacturing, telecommunications to approximation algorithms for NP-hard problems, including the Travelling Salesman Problem.

It might sound surprising, but NP-completeness and coping with it could be seen as the most practical part of the course. You'd argue that this is all about Complexity, but in practice most of the problems are NP-complete and it is much more efficient to see the problem is NP-complete at once and think of ways to cope with that (that's what we talk about in the two modules) instead of trying to come up with some clever data structure, essentially trying to prove P!=NP wrong, without knowing it. Some of the problems use SAT-solver as a checker, because the problem is to reduce the initial problem to a SAT instance, and then it turns out modern SAT-solvers based on the latest research advances can solve huge instances in practice.

Streaming Algorithms are on the rise now, because often in Big Data processing you need to compute some statistic (number of different keys, the most frequent key, estimate the size of intersection of two sets of keys) based on terabytes of data using very limited amount of memory and reading the data just once or maybe twice. We've invited Michael Kapralov to cover this topic. We are going to add a problem on application of streaming algorithms later. It is challenging to separate the Streaming Algorithms from the regular ones under the typical time limits, but we already have a prototype, now it's only left to package what we got into a problem.

Full text and comments »

  • Vote: I like it
  • +131
  • Vote: I do not like it

By Michael, history, 8 years ago, In English
  • Vote: I like it
  • +103
  • Vote: I do not like it

By Michael, history, 8 years ago, translation, In English

Last year we've won in Request for Proposals from Coursera, and this year we've launched the Data Structures and Algorithms Specialization at Coursera. It is now the main option for studying algorithms and data structures on the platform. Specialization is a series of courses ending with a Capston Project which enables to learn the subject much deeper than it is usually possible in the scope of a massive online course.

The Specialization is launched by University of California, San Diego (Computer Science program ranked 11-th in the world) and the Computer Science Department of Higher School of Economics:

  1. Daniel Kane — Professor at UCSD, Harvard graduate, PhD from MIT, four times Putnam fellow (US mathematical olympiad for university students), and there is even a wikipedia article about him.
  2. Pavel Pevzner — Professor at UCSD, last 12 years teaching Algorithms and Bioinformatics there, one of the authors of the Bioinformatics Specialization at Coursera.
  3. Neil Rhodes — Lecturer at UCSD, former Staff Software Engineer at Google, has been teaching for the last 10 years, developed educational programs for Apple.
  4. Alexander Kulikov — visiting Professor at UCSD, director of Computer Science Center and coordinator of Computer Science club in Saint Petersburg.
  5. Michael Levin — Chief Data Scientist at Yandex Data Factory, has been teaching algorithms for 8 years at Yandex School of Data Analysis.

One of the main features of the Specialization is large number of problems which enable the learners to really understand algorithms: you all know that it only seems you've solved the problem until you start implementing and submitting it. The same thing is true about specific algorithms and data structures. There are around 70 algorithmic problems in the Specialization. Many of those were prepared by Burunduk1, Gassa, GlebsHP, ifsmirnov, ilyakor, nk.karpov, Perlik, romanandreev, tourist, Zlobober, Paul Melnichuk and Ruslan Savchenko.

There are two options for the Capstone Project: Finding Shortest Paths in Road Networks and Social Networks using algorithms which are thousands of times faster than the classic ones or Bioinformatics Algorithms that are used to assemble a genome out of millions of small pieces.

If you're red or very yellow here, you probably won't learn lots of new things. However, I'll just post a few of the reviews from our learners regarding competitive programming:

"Amazing Course. I have been looking for this kind of course for months. Must for anyone who wants to be good in Competitive Programming and Algorithms"

"An excellent course. Though I have 10 years of experience in software engineering and I've participated in programming contests in my undergraduate years, this course gave me a much clearer vision on solutions for typical programming problems."

"Very good course on algorithms,particularly useful for competitive programming."

UPD. If you don't want to submit assignments and get a certificate, to see the videos and readings for free, you can go to a particular course, e.g. Algorithmic Toolbox, and select the option to "Audit only". The second course of Specialization is Data Structures, it has been launched in April. The other three courses are not launched yet, the next one — Algorithms on Graphs — will be available in June, next — Algorithms on Strings — in July, last — Advanced Algorithms — in August.

UPD.2 You can submit problems in one of the following programming languages: C, C++, Java, Python2, Python3, C#, Haskell, Javascript, Ruby, Scala.

UPD.3 Algorithms on Graphs course starts on June 6, and you can still enroll.

UPD.4 Algorithms on Strings course starts on July 25, you can already enroll.

UPD.5 Advanced Algorithms and Complexity course has started, and I'm going to write a separate about it.

UPD.6 The culminating project on Genome Assembly has launched, and the project on Advanced Shortest Paths launched as the last optional module of the Algorithms on Graphs course.

Greetings from ACM ICPC World Finals in Thailand!

Full text and comments »

  • Vote: I like it
  • +229
  • Vote: I do not like it

By Michael, 11 years ago, translation, In English

Online Round 3 and its analysis were prepared by Chmel_Tolstiy, aropan, Romka, SobolS, Ra16bit, snarknews and Gassa.

During the third round of “Yandex.Algorithm” 265 participants have made at least one attempt, 177 of them solved at least one problem. There were 559 attempts for A (137 successful), 127 attempts for B (47 successful), 111 attempts for C (64 successful), 64 attempts for D (15 successful), 45 attempts for E (20 successful) and 606 attempts for F (129 successful).

Problems of the Round 3 proved to be easier on average than the problems of the first two rounds, and it became apparent during the contest that the winner would probably solve all problems. Indeed, Petr has solved all problems by the 72nd minute, although all of them were submitted “open”. Five minutes till the end of the contest KADR takes 2nd place with 6 problems, he has also solved all problems “open”. But almost immediately nkurtov (Nikolay Kurtov, Switzerland) solves his 6th problem and takes 1st place, as he has solved three problems “blind”. Two minutes before the end of the contest winger takes place with two problems submitted “blind” in the beginning of the round and all others submitted “open”. And at the last minute s.quark (s-quark) from China submits problem E “blind” and takes the first place: he has submitted only B “ open”. Overall five participants have solved all problems before the system test: s.quark is 1st, nkurtov — 2nd, winger — 3rd, Petr — 4th and KADR — 5th.

After the system tests it turns out that s.quark's F which he submitted “blind” on the 25th minute failed. Also nkurtov's С and D failed. Both winger's “blind” solutions passed, so he took the final 1st place in the Online Round 3. Petr took the 2nd place, but he had already qualified to the finals before. KADR stayed at the 3rd place, ainu77 (ainu7) from South Korea moved to 4th, and s.quark still took 5th place even after failing a problem and qualified to the final round.

Problem A (Candy Mood)

Let us count the number of candies M that are not tasty. If M = 0, the answer is . Otherwise, consider the expected increase in the mood braught by i-th candy if it is tasty. It can be equally probable in M + 1 positions relative to the not tasty candies, and only in one case you will eat the tasty one. So, the expected increase in the mood braught by one tasty candy Ai is equal to . For candies that are not tasty the expected change in the mood is , as each of the not tasty candies can become the first not tasty candy. Overall, the expectation of the mood change is equal to .

Problem B (Marbles)

Without loss of generality we will assume N ≤ M.

Let us consider the case N = 1. Denote the number of marbles on the photo by k. Note that we can keep any k marbles. But we cannot get all possible relative positions, because we cannot change their relative order. We can choose any k positions and place marbles on them, though. So, there are different possible photos.

Let N ≥ 2. Then if there are k = N·M marbles, we can get only one such photo, because we didn't make any move (the first move necessarily removes one marble from the field). If k = N·M - 1, then after the first move we can only move to the free cell on the field — it is equivalent to the well-known brainteaser “15 puzzle”; in this case, exactly half of the permutations are reachable (see the link above). If k < N·M - 1, let us consider any two free cells, first get a configuration where they are adjacent. Then the “15 puzzle” is solvable for one of them (it depends on the sum of the number inversions in the permutation and the numbers of row and column of the free cell, and the sum of numbers of row and column differ by one for adjacent cells). Then, choosing the right free cell, we only have to solve the “15 puzzle”.

Thus, the overall number of photos is .

Problem C (Fan Placement)

The easiest way of solving this problem was to generate all possible permutations of the fans. Let us fix an order of fans, for example, from bottom to top, and let i-th fan have initial number pi. Obviously, if for each fan the fan immediately before him does not block the view, then everybody sees the field comfortably. Let us consider all pairs of adjacent fans. If hpi ≤ hpi + 1, then this pair does not create any limitations on the placement: for any placement of the fan pi + 1 fan pi will not block his view because pi is lower. If hpi > hpi + 1, there appear the following restrictions: fan pi + 1 cannot take the next hpi - hpi + 1 rows after the fan pi. So, these hpi - hpi + 1 rows should be removed from the consideration for this fixed order of the fans. After performing this operation for all adjacent pairs of fans, we should add the number of combinations of K out of the number of rows left, where K is the total number of fans. To make this operation fast, we precompute the table of values Cnk for n and k not exceeding 1000.

Time complexity: O(n2 + kk).

There is a harder way using dynamic programming where the state is triple “how many rows already considered, mask of seated fans, number of the last fan”. Transition in this case consists of iterating through the “pre-last seated fan”. For this solution to work fast we have to store a table of cumulative sums parallel to the dynamic programming table to avoid inner loop through the rows.

Problem D (Interesting Language)

Let us first describe the bruteforce solution. Find all pairs of words such that one of them is a prefix of another. Remove the shorter word from the longer one, leaving only the suffix. If we compute for each suffix how many ways are there to get it using this operation and denote the number of ways by cntsuff, the answer for the problem is , where the sum is over all suffixes.

To compute for each suffix the number of such pairs let us use a trie. Let us build two tries based on the input array of words: the first one will contain all the input words, and the second one will contain all the reversed words. Go through the nodes of the trie with reversed words. Each time we enter a terminal node, go up from this node to the root while in parallel going down using the same symbols in the trie with the input words. If we enter a terminal node in the trie with input words during this process, we should increase by one the counter for the suffix where we are at the moment in the trie with reversed words. If we've left the trie with input words during the process, we can stop.

After such pass through the trie there will be the correct value cntsuff in each node of the trie, and we just have to compute the answer using the formula above.

Time complexity: O(sumL), where sumL is the sum of lengths of all words.

Problem E (Taxes and Roads)

Let us see what paying the tax in some city gives us. After paying the tax in city A we can visit some other cities free of charge. Those are cities that are reachable from A and from which A is also reachable. If we get from the cities to graph theory, we can say that we can now visit free of charge all the vertices from the same strongly connected component of A. So, if we find the strongly connected components in the initial graph, we can pay the tax only when entering each component. As entering a strongly connected component “opens” for free of charge travel all the component and all taxes are non-negative, it will never be profitable to pay the tax in any other city of the same component.

Let us build a new graph based on the initial one. Its vertices will correspond to the strongly connected components of the initial graph. The weights of the edges will be set the following way: for each pair of strongly connected components that have an edge between them in the initial graph, create an edge in the new graph with the weight equal to the minimum of the taxes in all the cities which are the endpoints of the edges connecting this pair of components in the initial graph. Of course, we should account for the orientation of the edges.

After the new graph is built, we need to find the shortest path between the vertices that correspond to the strongly connected components which contain vertices 1 and n of the initial graph. It can be done either using the Dijkstra algorithm or by dynamic programming taking into account that the new graph is a directed acyclic graph. We also have to consider the start of the trip separately, as one can only pay the tax when he enters a city.

Problem F (Place for Capital)

If the angle between the radii is more than two radians, the minimum distance is the sum of radii. Otherwise, it is the difference of radii plus the length of the arc passing through the point with smaller radius. It follows from the fact that in the “curvilinear trapezoid” made out of two radial edges and two arcs the shortest path between the opposite nodes is the path through a radial edge and the arc of smaller radius.

We are left to compute everything accurately. Usage of the atan2() function allows to surpass most of the pitfalls. Without using atan2() there are problems with precision in case of two close points with big radii; in this case it helps to use asin instead of acos to avoid loss of precision.

Full text and comments »

  • Vote: I like it
  • +33
  • Vote: I do not like it

By Michael, 11 years ago, In English

Second online round and its analysis were prepared by rng_58, snuke, snarknews and Gassa.

In the second round of “Yandex.Algorithm”, 495 participants have made at least one attempt, 425 of them have solved at least one problem. There were 503 attempts for problem A with 73 successful, 11 attempts for problem B with 2 successful, 472 attempts for problem C with 76 successful, 846 attempts for problem D with 386 successful, 42 attempts for problem E with 16 successful and 624 attempts for problem F with 182 successful.

The results of internal testing made us hope that the number of participants who solve 5 problems will be higher than in the first round, and that the winner will solve all 6 problems. But it turned out too optimistic. By the middle of the round two leaders appeared: tourist and eatmore who submitted all the problems “blind”. There also was a whole group of pursuers each of which solved no more than two problems “blind”. The first one to solve 5 problems was cgy4ever from China who had problem C submitted “blind”. On the 75th minute three more participants solved 5 problems: maciej.klimek (maciejk) from Poland, Jedi_Knight (ivan.popelyshev) and burunduk3. On the 77th minute tourist submitted his 5th problem and took first place with 5 problems “blind”, and also eatmore; submits “blind” problem B which was not solved by anybody else by then. Still, tourist stays in the first place. One minute before the end of the contest natalia submits problem E which becomes 5th for her and with four problems “blind” out of 5 takes 3rd place.

After the system tests it turned out that only tourist stayed in the top-3 thanks to all 5 “blind” submissions accepted. However, he had already passed into the finals from the first round, so the 5th place became a way to the finals. For eatmore, problem E which he submitted next to last — failed; for natalia, problem F which she also submitted next to last — failed. On the other hand, the last submitted problems — B which was solved only by two people (other than eatmore it was s-quark from China) for eatmore and E, submitted at 1:39, for natalia — passed the system tests.

In the end vepifanov took 2nd place, yeputons took 3rd place. They both submitted only problem D “blind” on the 5th minute. All the following participants submitted all problems “open”. Jedi_Knight (ivan.popelyshev) took the 4th place, and peter50216 from Taiwan took the 5th place and also got to the finals.

Problem A (Degree Sequence)

First, if the sum of di is not 2(N - 1), it can't be a degree sequence of a tree, so output “None”. Otherwise, it's easy to prove that it can be a degree sequence of some tree.

Let c be the number of i such that d[i] ≥ 2. There are several cases:

  1. c = 0. In this case d = {1, 1}. Clearly, the answer is “Unique”.
  2. c = 1. The tree must look like a star. The answer is “Unique” again.
  3. c = 2. There are two vertices with degree  ≥ 2. Let us call them s and t. There must be an edge that connects s and t, and all other edges connect one of {s, t} and a leaf. Therefore, the answer is “Unique” again.
  4. c = 3. There are three vertices with degree  ≥ 2. Let us call them s, t and u. Consider three pairs of vertices {s, t}, {t, u} and {u, s}. Exactly two of them must be connected by an edge (if all of them are connected by an edge, the edges form a cycle; if one or less are connected by an edge, we can't make the graph connected), so there are three ways to connect them. If the degrees of s, t, u are all the same, all of those three trees are isomorphic, so return “Unique”. Otherwise return “Multiple”.
  5. c ≥ 4. If the degree sequence looks like {1, 1, 2, ..., 2}, the tree must be a chain, so return “Unique”. Otherwise we have at least one vertex with degree  ≥ 3. We can construct two non-isomorphic trees using this vertex: one possible way is to construct a chain with vertices with degree >= 2 and add leaves to it, the other is to construct three chains with one common vertex. In this case return “Multiple”.

Problem B (Frogs)

Let f(t) be the position of Frog k at time t. We are asked to compute minimal t such that t - f(t) ≥ d. Note that t - f(t) is a non-decreasing function because for any t, f(t + 1) - f(t) is 0 or 1. Since t - f(t) is monotonous, we can use binary search. The main part of the solution is computing f(t).

For simplicity let us assume that k = 2 (otherwise we can repeat a similar algorithm k - 1 times). Let us call n good if the sum of beauty between cell 1 and cell n is a multiple of m. It has period m(p - 1), i.e. n is good if and only if is good. For each n ≤ m(p - 1), precalculate whether n is good. It turns out that f(t) is (the number of good numbers between 1 and t), so we can calculate it quickly with precalculated table.

Problem C (Green Triangle)

We want to compute the sum of signed areas pqr of all clockwise triplets (p, q, r). If we fix points p and q, the signed area of pqr is a linear function of coordinates of r, so we just want to know the sum of x-coordinates (or y-coordinates) of all points in the halfplane.

Fix a point p. Sort other points by the angle from point p in clockwise order (let us call them p0, ..., pn - 2). Then we should use a two-pointer algorithm: we should keep two indices i and j. Here, j is the biggest index that satisfies the condition “three points p, pi, pj are clockwise in this order”. We increment i from 0 to n - 2 and change j properly. If we increment i, j never decreases, so the total number of changes of i and j is O(n). Overall the algorithm works in O(n^2 log n).

There are some problems with precision in this problem under the given limits on the coordinates: double precision is not enough in case of triangles with small area and large perimeter. This is especially troublesome for Java where there is no long double type and using BigInteger leads to Time Limit Exceeded.

One of the variants to cope with precision problems is to store the sum of areas of triangles with the same base both in double and in a 64-bit integer. In this case double will store the highest 52 bits of the result and 64-bit integer will store the lowest 64 bits.

Problem D (MathWorlds)

This is the easiest problem. You should check all operators, and count the number of operators that satisfy the equation “x [operator] y = z”. You should carefully consider the case when y = 0: for example, for y = 0 don't check the division operator at all. Solutions that transform division into multiplication don't work for test 0 0 1 which became a problem for a lot of participants.

Problem E (Small Cycltes)

The properties in the statement are equivalent to the following:

  1. G has a spanning tree. Fix one spanning tree T, and let us call edges contained in T “tree edges”, and call other edges “non-tree edges”.
  2. If e = {u, v} is a non-tree edge, the distance between u and v in T is exactly two.
  3. For each non-tree edge e = {u, v}, color the path between u and v in T. No edge is colored more than once.

The original problem can be reduced to the following problem: You are given a tree, and some edges are already colored. In each operation, you must choose a path of length 2 and color it (you can't color an edge if the edge is already colored). How many operations can you perform?

This task can be solved greedily. See the tree as a rooted tree, and find the deepest uncolored edge e. If e is not adjacent to other uncolored edges, you can't color this edge, so ignore it. If e is adjacent to one of its sibling edges (let's call it e2), you should color e and e2 at the same time (otherwise you can't color both). If e is adjacent only to its parent edge, you should color e and its parent at the same time. After either e is colored or is ignored, find the deepest uncolored edge once again and repeat the process.

Problem F (Swap Balls)

If the answer of (p, q, r, s) is “Yes”, then the answer of (p + 2, q, r, s) is also “Yes”. This is because if you perform the same operation twice in a row, nothing changes. It is intuitive that if p is quite big, the answers to (p, q, r, s) and (p + 2, q, r, s) are the same. In fact we can prove that if p ≥ 48, this is true. With these observations, you can assume that p, q, and r are small, so we can use dynamic programming to solve the task (dp[p][q][r][s]: is it possible to change s to “RGB” using the operations p, q, r times respectively?).

Here is the formal proof. Construct a graph with 48 vertices. Each vertex corresponds to a tuple: (the parity of p, the parity of q, the parity of r, s). A sequence of operations corresponds to a path in this graph. If this path is very long, the path contains cycles. If the cycle contains p', q', and r' times of each operation, these numbers are even, and we can delete this cycle and prove that the answer of (p - p', q - q', r - r', s) is “Yes”. We can repeat this process until p, q and r satisfy  ≤ 48.

It turns out that 48 can be replaced with 3 (if you experiment using a computer).

Full text and comments »

  • Vote: I like it
  • +48
  • Vote: I do not like it

By Michael, 11 years ago, In English

In the first online round of Yandex.Algorithm competition 609 participants submitted at least one solution and 396 submitted at least one correct solution. Problems Non-Squares and Kingdom Division were fairly easy, both resulted in approximately 300 correct submissions. Problem Stacks of Coins was moderately easy and was solved by 145 participants. The three other problems proved to be more difficult. Stickers could be a very challenging string problem, but small input size allowed for dynamic programming solution, which 22 contestants got right. Assistants was solved by 13 participants; it required binary search, greedy, and some basic data structures to get it under time limit.During the contest, nobody was able to solve State Roads, which was a neat graph-theoretical problem with a simple to code, but a quite tricky solution.

Congratulations to Kenny_HORROR who was the only one to solve five problems, and to Petr, tourist, and eatmore who solved four problems and with negative penalty times also advanced to the final round.

The problem set of this round was brought to you by the Polish team: Tomasz Idziaszek ( Kingdom Division ), Jakub Łącki ( State Roads ), Marcin Pilipczuk ( Assistants ), and Jakub Radoszewski ( Non-Squares, Stacks of Coins ). Special thanks goes to Professor Costas Iliopoulos for suggesting the idea of the problem Stickers.

Solutions, test data and analysis were prepared by marek.cygan, monsoon, and jakubr.

Problem A (Assistants)

In this problem we need to find the minimum number of days in which n research tasks can be completed. The i-th task can be executed by the professor using pi consecutive days or can be delegated to an assistant who will execute it using ai consecutive days. Every engaged assistant can execute only one task, and the professor needs one day to find such an assistant.

First we show how to check whether the project can be completed in k days. Then it is only matter of using binary search to obtain the answer.

It is easy to see that there exists an optimal schedule in which the professor first searches for assistants, and then he executes unassigned tasks. Moreover, it means that we do not have to worry whether the professor executes his tasks during consecutive days. Formally speaking, if there is an optimal schedule in which the professor assigns tasks to assistants and he executes the remaining tasks (but not necessarily in consecutive days), then there exists an optimal schedule in which the professor delegates the tasks from T during the first #T days, and later he executes the remaining tasks (each task in consecutive days). Therefore, if the professor delegates the tasks from T, then he will finish the remaining tasks before the deadline if and only if .

Now we show an algorithm which will check whether such T exists. We consider days d = k, k - 1, ..., 1, and during some days we will be delegating tasks to assistants. Assume that we are in day d and we already delegated tasks during days after d. Let be the set of tasks which can be assigned in day d such that an assistant will complete them before the deadline. The main observation is as follows: if is not empty, then we can in day d greedily assign task which maximizes the value of pi.

Now we prove it. Suppose that there exists a schedule S, which in the days after d is the same as our plan. If the schedule S assigns task i in day d, then we are done. Otherwise in the schedule S in day d the professor assigns task j ≠ i or executes task j himself (in this case possibly j = i).

In the first case, the professor assigns task j ≠ i. Then (since assistant will finish before the deadline) and (we assume that S is the same as our plan in the following days). From our greedy choice we have that pi ≥ pj. If in S the professor assigns task i in day d', then d' ≤ d (since , and S is the same as our plan in the following days) and we can swap in S the days in which we assign tasks i and j. Otherwise, if in S the professor executes task i himself, then we can alter S as follows: we delegate task i in day d, and execute task j during these days in which the professor have executed task i (we can do that since pi ≥ pj).

In the second case, the professor executes task j in day d. If j = i, then we can also delegate this task in day d (since ). Otherwise, if j ≠ i, we can delegate task i in day d, and reclaim the missing day to execute task j in the moment in which the professor was working on task i (delegating or executing it) in the schedule S.

In all the cases we get a correct optimal schedule in which the professor delegates task i in day d, therefore the key observation is proved.

How fast can we implement the above algorithm? Choosing maximum from the set can be implemented using a binary heap ordered by pi. When moving from day d + 1 to day d we add the elements from to the heap (i.e. tasks with ai = k - d; we need to sort the tasks by ai to do this), and we remove the maximal element (and we add it to T).

Observe, that we do not have to iterate over all days, but we can start from the day d = min(k, n), since there is no need to assign tasks after the n-th day. Thus the whole solution (with the fixed k) works in time .

We can improve it by iterating not over the days, but over the tasks sorted non-increasingly by pi. Every time we try to assign task i in the latest day from {1, 2, ..., di} where di = min(k - ai, n) in which no task is yet assigned. It is not hard to see that the set T we get will be the same as in the previous algorithm. This assignment can be performed in almost-linear time, using the structure for disjoint sets. Each set contains the free day d and all the days after it in which some task are assigned. Now assigning the task boils down to finding the set which contains di, obtaining the minimal element d' in this set, assigning task i during day d', and joining the set with the set containing element d' - 1.

We can now estimate the final time complexity. Since assigning all the tasks to assistants is a valid solution, thus the answer will always be smaller than n + A, where . Adding binary search, we get the solution which works in time .

Problem B (Kingdom Division)

In this problem we are to divide an n × m grid into the maximum number of parts of distinct sizes formed by connected sets of unit squares. We need to present an example of such a division using characters from to denote the resulting parts. A solution to this problem can be obtained in three natural steps.

In the first step we forget for a moment about the requirement of connectivity and ask what sizes of parts would imply the maximum result. The answer is simple, the sizes should be equal to 1, 2, 3, ... If the last part is smaller than required, we join it with the next-to-last part.

In the second step we note that the division proposed in the previous step is actually possible to obtain. We can cover the whole grid with a snake-like pattern and then cut the snake at appropriate positions to form the respective parts which will certainly be connected sets, see the figure to the left.

The third step is to label parts with the letters from so that no two adjacent parts receive the same label. A tempting idea would be to label them in the order they appear in the snake. However, such a labeling does not work in some cases.

One possible correct solution is to label each part with the smallest letter that does not occur among its neighbours (and perform the labeling in the snake-order). Another idea is to label the parts in the first row using the letters A and B, in the second row use the letters C and D, in the third row use the letters E and F, in the fourth row again A and B and so on (see the figure to the right). Starting from the point when each part covers at least one row, we can stick to just two letters, A and B. In the latter labeling we use only 6 distinct letters.

An interesting question is to find what is the minimum number of letters sufficient to solve every test case.

Of course the above algorithm can be easily implemented in O(nm) time.

Problem C (Non-Squares)

In this problem we need to check if a given positive integer n can be represented as a product of k positive integers all of which are not integer squares. Let us call such a representation a non-square representation.

A generic approach could be dynamic programming over the set of divisors of n. All divisors of n can be found in time. Let D(n) be the number of divisors of n. It turns out that for n ≤ 109 we have D(n) ≤ 1344.

We compute a 2-dimensional Boolean array , where is true if and only if the divisor d can be represented as a product of j non-squares. The computation uses the following Boolean formula that works for j ≥ 1 and :

The time complexity of this solution is due to a map-like data structure required to keep track of the divisors of n. The memory complexity is O(D(n)k).

One could also improve the running time of this solution by noting that the parameter k is actually bounded by , which in our case is 29. Indeed, for the answer is always “NO” (an argument for this can be inferred from the next solution).

There is also a far simpler solution that examines the decomposition of n into prime factors. Let n = p1α1p2α2... pmαm where pi are distinct prime numbers. Then the solution is the following:

  1. If α1 + α2 + ... + αm < k then output “NO”.
  2. Otherwise, if m > 1 then output “YES”.
  3. Otherwise we have m = 1. If k - α1 is even, then output “YES”, otherwise output “NO”.

Let us provide some justification for this solution. Point (1) follows from the fact that any non-square divisor must have a prime factor. For m = 1, we simply need to check if α1 can be represented as a sum of k odd integers, which is possible if and only if the parity of k and α1 are the same. This yields point (3) of the algorithm.

Finally, consider point (2). We need to show that if α1 + α2 + ... + αm ≥ k and m > 1 then there exists a non-square representation of n. Let us take as the first k - 1 non-square divisors the prime divisors of n: first α1 times the divisor p1, then α2 times the divisor p2 and so on. As the last divisor, d, we take the product of all the remaining prime factors. If d is not an integer square, we have the requested representation. Otherwise we change the last divisor to d / pm and the first divisor to p1 pm, thus obtaining the representation.

This solution only requires to find the prime decomposition of n and works in time.

Problem D (Stacks of Coins)

In this problem m gold and silver coins are arranged into n stacks of different heights. We are to rearrange the coins so that there is a maximum number of unicolor stacks (a unicolor stack has only gold or only silver coins). We are allowed to perform any number of operations of swapping two coins that are located at the same height of different stacks.

Let k denote the result, that is, the maximum number of unicolor stacks that can be formed. Let us do a binary search for k. From now on we are left with a decision problem: given k, check if it is valid. Clearly it is optimal to try to make k smallest of the stacks unicolor.

Since the number of operations is not bounded, our problem can be reformulated as follows. Let h be the height of the k-th smallest stack. For each height level 1, ..., h we know the total number of gold and silver coins at this level. We need to check if these numbers are sufficient to create k unicolor stacks. This task will reduce to computing an intersection of O(h) intervals.

First we consider each height level separately. Assume there are gi gold and si silver coins available at this level and the k smallest stacks contain in total ti coins at this level. Using these numbers, we can compute an interval [ai, bi] such that if and only if one can take x gold coins and ti - x silver coins to form the i-th level of the selected stacks. Note that such an interval implies the interval [ti - bi, ti - ai] for the number of silver coins.

Finally we need to make sure that the intervals for different levels allow to find a valid solution. We iterate from the bottommost to the topmost level. For a given level i, we need to intersect the interval [ai, bi] with [ai - 1, bi - 1] and similarly the intervals for the silver coins. Thus we obtain a single updated interval [ai, bi] which we then use to update the intervals for higher levels.

The whole solution works in time.

This problem has also a linear-time solution. Let gi be the total number of gold coins which are located on the i-th level above the ground. Let g'i be the maximum number of stacks which can be formed such that they have all gold coins on the i-th level and below. It can be easily computed using the following recurrence:

Similarly, we define si and s'i for silver coins.

Let hi be the height of the i-th stack, and suppose that stacks are sorted non-increasingly by heights, i.e. h1 ≥ h2 ≥ ... ≥ hn. We iterate over stacks and “color” them (making them unicolor) when possible. Suppose we consider stack i and we already colored stacks numbered 1, 2, ..., i - 1 in such a way that we got G gold stacks and S silver stacks.

The i-th stack can be colored gold if and only if g'hi > G. In such case we show that there is an optimal solution in which this stack is colored gold and stacks 1, 2, ..., i - 1 are colored the same as in our solution. Consider any optimal solution in which stacks 1, 2, ..., i - 1 are colored the same as in our solution. Let j ≥ i be the highest gold stack in this optimal solution (it must exist, otherwise we can add stack i and get a better solution). Now we can swap all the coins from the stack j to the stack i and, since g'hj + 1 ≥ g'hi > G, we also have sufficiently many gold coins to swap into the stack i above level hj.

Similarly, if we cannot color the i-th stack gold, but we can color it silver (i.e. s'hi > S), there is an optimal solution in which we can do this.

Otherwise we have g'hi + s'hi ≤ G + S, which means that we cannot color more than G + S stacks of heights not less that hi.

Since we can sort the heights of the stacks using counting sort, the above algorithm runs in O(n + m) time.

Problem E (State Roads)

In this problem we are given a graph in a dynamic setting. The set of nodes V is fixed, but the edges of the graph are added and deleted over time. We are to answer queries of the following type: given a set of nodes and a moment of time, check whether these nodes form one or more whole connected components of the graph in this moment of time.

We present a randomized solution to this task, which is very simple to code, but also quite tricky. For each edge that is added to the graph we pick at random an integer weight. For each node we store in the -product of weights of all edges that are adjacent to this node. Note that such values can be updated in O(1) time upon insertion and deletion of edges.

Now the crucial observation is that {u1, u2, ..., uk} forms a number of whole connected components if and only if each edge in the graph is adjacent to exactly zero or two of the nodes ui. Hence, to answer a query, we compute:

where denotes the -operation. If the above value is non-zero, we know that the answer is “NO”. Otherwise with high probability the answer is “YES”. The probability of an incorrect positive answer is where M is the maximum weight of an edge. This probability turns out sufficiently small for M = 232 or M = 264.

The whole solution works in linear time with respect to the size of the input.

Problem F (Stickers)

In this problem we are given n kinds of stickers s1, ..., sn, where each si is a word of length d. We are asked what is the minimum number of stickers needed to paste a given word t of length m.

We present a simple dynamic programming solution to this problem. Denote by dp[i] the minimum number of stickers needed to paste exactly the suffix t[i..m] and by the same number, but when we allow stickers to overflow and cover positions from t[i - d..i - 1] (but not necessarily with correct letters). The answer to the problem is dp[1].

Let us see how to compute dp[i]. The i-th position of t must be at some point covered by a sticker pasted in this position. Since all the stickers have the same length, we can restrict ourselves to pasting this sticker as the last one or as the first one. This gives us two cases:

  1. For some j we have sj = t[i..i + d - 1]. Thus we can paste the suffix t[i + d..m] possibly covering letters from t[i..i + d - 1] and after that paste sj at position i. In this case we use stickers.
  2. We have sj[1..d'] = t[i..i + d' - 1] for some j and some d' ≤ d. Thus we can paste sj at position i and then paste t[i + d'..m] with restriction that we cannot modify the letters before position i + d'. In this case we use 1 + dp[i + d'] stickers.

If for every position i we know the maximum l[i] such that t[i..i + l[i] - 1] is a prefix of some sticker, then the case (1) is possible if d = l[i], and the outcome from the case (2) is equal to min1 ≤ j ≤ l[i](1 + dp[i + j]).

We compute similarly, but now instead of the set of all stickers we consider the set of all the suffixes of stickers. Thus we compute l[i] as the maximum number such that t[i..i + l[i] - 1] is an infix of some sticker.

The above solution can be implemented in the time complexity of O(mnd2). Surprisingly, this problem can be solved in linear time with respect to the size of the input, even with stickers of different lengths, but such solution is a lot more complicated.

Full text and comments »

  • Vote: I like it
  • +81
  • Vote: I do not like it

By Michael, 11 years ago, translation, In English

Qualification Round and its analysis were prepared by Chmel_Tolstiy, aropan, Romka, SobolS, Ra16bit, snarknews and Gassa.

1732 participants have started in the Qualification Round of Yandex.Algorithm, 1606 of them have made at least one attempt, 1558 have solved at least one problem. Interestingly, the number of registered participants was 3397 which is almost twice as much as the number of participants who have started the round. We don't know why is that.

As the qualification round was a virtual contest, participants could choose the most convenient time to compete. The maximum number of participants online (around 280) was at the start, the minimum number — between 3 a.m. and 4 a.m. Moscow time (around 20 participants). First one chronologically who solved all 6 problems was Scott.Ai from China; all problems were submitted "open". Soon i.metelsky (Ivan Metelsky, Belarus) who started around 10 p.m. MSK solved problems A-D "blind", E "open" on the second attempt, F "open" on the first attempt and took the 1st place with a very strong result. However, vepifanov (Vladislav Epifanov, Russia) who started 4 hours later solved A, B, C, F "blind" and D, E "open" on the first attempt, and he took the 1st place thanks to smaller penalty time. On Tuesday morning, first place was taken by world champions from ITMO: first eatmore (Evgeny Kapun, ACM ICPC world champion 2009 and 2012) who started at 11 a.m. solves all 6 problems "blind"... but he was the leader only for several hours: with the same strategy but better penalty time, current world champion tourist (Gennady Korotkevich, Belarus) bypasses him. 24 hours after the start of the Qualification Round the top-3 is Korotkevich — Kapun — Epifanov. In the Test Round, Gennady Korotkevich was second, Vladislav Epifanov was third. The Test Round winner — Антон Лунёв (Anton Lunyov, Ukraine) — started close to the end of the Qualification Round. He solved A, B, F, D "blind", then solved E "open" on the second attempt, then solved C "blind" and took the final 3rd place from vepifanov who moved to the 4th place. Thus all three winners of the Test Round got into top-4, but in a different order. Evgeny Kapun has also advanced to the online rounds from the Test Round, so the best participant who had not advanced before was Ivan Metelsky who finished 5th. Overall 34 participants solved all 6 problems, and the best result of a participant who solved all problems "open" was 9-10 place.

We also noticed that some participants had unexpected problems because of the fact that stdin and stdout were specified as input and output files in the problem statements. Some participants interpreted them as the filenames. Jury decided to rejudge such solutions specifying corresponding filenames; in the future system will use phrases "Standard input" and "Standard output" instead.

Problem A (Ancient Basketball)

In this problem one just had to implement what was described in the problem statement. It is sufficient to create two variables to count points of each team, and after reading records about throws, increase the corresponding variable by the specified number of points. To choose which variable to increase, operator if should be used.

Problem B (Prime Problem)

Let us move 1 to the left-hand side — we get k2 - 1 = p1·p2 or after factorization (k - 1)(k + 1) = p1·p2. As k > 2, both multipliers in the left-hand side are not equal to 1, thus (k - 1) = p1 и (k + 1) = p2 (without loss of generality we can assume p2 > p1). Indeed, left-hand side must be divisible by p1 and p2, so if k - 1 = a·p1 and k + 1 = b·p2, then (k - 1)(k + 1) = ab·p1·p2 = p1·p2, and as a and b are positive integers, a = b = 1. If k - 1 was divisible by p2, it would turn out that k - 1 = p2, k + 1 = p1, but it is a contradiction with p2 > p1.

So, the problem is now reduced to searching for such k that both k - 1 and k + 1 are prime (so-called "twin primes"; interestingly, it is not yet known whether the set of such primes is finite or infinite).

Under the given constraints this search can be implemented in any way, even searching through all (not only even) k and while checking primality searching through all possible divisors up to the number itself. Given that n ≥ 4, "empty" answer is impossible as k = 4 satisfies the equation.

Another way is to factorize k2 - 1 and check that the factorization is just a product of two primes.

Problem C (Board Game)

Answer for the first question is very easy to compute. It is the total number of atoms in all the groups. It is equal to .

To answer the second question, let us first reformulate it. It is known from game theory that position in such game is losing if of numbers of atoms in all groups is equal to 0. So, we have to find the number of first moves such that after each of them of all group sizes will be 0. Let us denote . There is a fast method to compute x. To do it we can use the fact that for any non-negative integer k . Thus .

If we fix group number i to make the first move in it, then the number of atoms that must be left in it after the move is determined unambiguously. It is equal to of all other group sizes which is . So, if we are able to make the first move from group i it must be true that . It is only possible if at position p where x has the highest set bit number i also has set bit. We have to compute the number of such i up to N. Among the numbers less than exactly half are such i. If N has set bit at position p, should be added to the answer.

Problem D (Infinite Sum)

Let us make several transformations of the expression:

If we denote , we get

Now we only have to compute the recurrence and output the answer.

Problem E (Homework)

Cases with X = 0 or K = 0 are trivial and can be solved separately. It is advantageous if Gennady solves the maximum number of problems he can each day and the other students can be distributed on the topics, and they can adapt to Gennady's work. If there still exists a topic with at least X problems to solve, Gennady works at maximum this day and solves X problems. If there is no such topic, he takes the topic with maximum number of problems to solve and solves all of them. Thus, it is easy to compute how many problems will be left after Gennady if he works for k days. Then we can use binary search on the number of days and check whether other students will be able to solve all the problems left after Gennady in time.

Time complexity of such solution is . We should also note that one has to initially sort topics by value of to search for topics with the maximum number of problems to solve when Gennady can't solve X problems. Using the fact that and properties of we get complexity .

An alternative solution is to consider the optimal strategy of other students. When we know the optimal strategy for Gennady, we can determine how the other students should work: first they have to solve problem in the topic with the minimum value of and solve this topic until becomes equal to 0. When such topics are all gone and there are still problems to solve, they can choose any topic and solve problems in it until all problems in the topic are solved. So, we have to find out what happens first: Gennady has no more topics with Ai ≥ X, or students have no more topics with .

Time complexity: .

Problem F (Atoms: There and Back Again)

Number of groups doubles after each division. Thus if N is not a power of 2, the given state cannot be reached, and the answer is "NO".

Consider the last division, that is, the division of state right before the current state. Group of size X was created from each group. So, at least half of the groups are of size X. Then we can discard those groups and get the same problem with N / 2 groups. If at any step there is no such size that at least half of the groups have this size, the answer is "NO".

To solve the problem we can cluster the groups by size. Then simulate the process described above using heap (priority queue). We can also greedily separate each cluster of same-size groups into parts of sizes that are powers of 2, so that each power appears exactly once, except for the 0-th power (there must be two parts with 0-th power).

Time complexity is .

Full text and comments »

  • Vote: I like it
  • +75
  • Vote: I do not like it

By Michael, 11 years ago, translation, In English

Test Round and its analysis were prepared by snarknews and Gassa. Problems of the Test Round have already been used before in different competitions. The following rounds will consist of new original problems, and you can still register and compete in the Qualification Round.

During the round a lot of participants tried to make "blind" submits, from time to time even going to the top places, but as the system tests showed with only one problem solved in fact.

By the end of the round tourist was at the first place with all problems but B submitted "blind". vepifanov was at the second place with A, C and D submitted "blind", but B and F — submitted "open" (both with a wrong attempt), Anton_Lunyov was at the third place with only C submitted "blind". All other participants had less than 5 problems solved. After the system tests it turned out that tourist had a wrong solution for A, and vepifanov had a wrong solution for D. So, the only participant who solved 5 problems was Anton_Lunyov. tourist got second place, vepifanov was third.

It is a bit strange that problems B and E, although their solutions don't require special knowledge, remained mostly "unclaimed": all participants who solved these problems were in the top-3 with tourist solving E, Anton_Lunyov and vepifanov solving B.

It also turned out that file input and output can be an obstacle for the participants: 90% of questions were concerned with this. So we decided not to use file input and output in the future rounds of Yandex.Algorithm.

Problem A (Circles)

This problem is an advanced variant of a problem that was used in the 0th (pilot) season of OpenCup which happened in May of 2004. It can also be considered an easier version of a problem used in the Moscow Grand Prix of OpenCup in 2005.

Using the formula for area of triangle S = abc / 4R, we obtain R = abc / 4S, where a, b and c are lengths of the triangle's sides. Taking into account that the vertices of triangle have integer coordinates, we conclude that squares of lengths of the sides and doubled triangle area are integers. So, R2 can be computed exactly as an integer fraction .

As the coordinates do not exceed 1400, numerator doesn't exceed , so it fits into unsigned int64. To compare such fractions one has to either use long arithmetics or use modular arithmetics or compute GCD or store high-order number part separately...

We point out that type double is insufficient even without considering the errors of computation: there exists an example of two triangles with circumscribed circle radiuses differing by less than 2 - 52. For example, triangles with vertices (0, 0), (1312, 164), (134, 1372) and (0, 0), (1351, 169), (106, 1317).

The example above was foundf using the following approach: fix one of the vertices at (0, 0) and the other two at (1400, 0) and (0, 1400). Bruteforce the coordinates of the last two points in some neighborhood, compute the circumference radiuses, sort them and find the minimal difference between two adjacent values.

As we expected, there were a lot of failed submissions for this problem which tried to solve problem using doubles and "search for espilon". Most participants who thought of exact solution right away didn't have problems with implementation.

Problem B (Three Fractions)

This problem was first used in the Moscow Grand Prix of OpenCup in 2005. That version was simpler: it didn't have multitest.

The easiest is to solve this problem using precalc.

Under the given restrictions on n and on the queries number it is reasonable to precalculate the full table of answers. If n is composite number and the problem is solved for all prime numbers less than n, then represent n as product n = a·p where p is prime. Then . So it is sufficient to precalculate answers for all prime n and for n = 4 and then compute the answers for composite n.

Let's make a very rough estimate on c and b. As , then . As a > b > c, then , so and . Using the same logic and . So, let us bruteforce through all prime n, all b and c in the ranges we found above and check whether 4bc - nb - nc divides nbc. If it does, the answer can be found as .

Such precalc takes less than a minute even using a very slow CoreDuo 1.6Ghz. One can just store b and c in the code. That was code size won't exceed 50k, which is less than source limit.

Another way is to bruteforce c and try to represent difference as a sum of two "egyptian" fractions using Fibonacci's algorithm. This solution passes the Time Limit without precalc.

This problem is a particular case of famous mathematical problem — Erdos-Straus conjecture. It is not solved in the general case, but there are solutions for all n ≤ 1014.

For those who appreciate constructive solutions — some formulas.

  • If n = 3k + 2, we get ,
  • If n = 4k, we get ,
  • If n = 4k + 2 and k > 1, then ,
  • If n = 4k + 3, we get ,
  • If n = 8k + 5, we get .

So, there is no explicit formula only for n = 24k + 1.

Problem C (Select The Digits)

This problem was first used at a school informatics circle in Saint-Petersburg Anichkov Palace in 2006.

This problem is rather simple and allows for many different ways of solving. Let us name a few. Below, we assume that s is the given string and res is the result we seek.

Solution 1

We can simply look over all quadruples of positions we select. Here is a sample of code in Pascal:

res := 9999;
for i := 1 to 5 do
  for j := i + 1 to 6 do
    for k := j + 1 to 7 do
      for l := k + 1 to 8 do
        res := Min (res, StrToInt (s[i] + s[j] + s[k] + s[l]));

Solution 2

In a high-level programming language, there could be tools which do all the dirty job related to searching by themselves. Here is a solution in Python looking over all combinations of four elements:

res = "".join (min (itertools.combinations (s, 4)))

Solution 3

The search could be organized recursively. Here is a function in Java taking two parameters: position p in the string s and number of positions q we have yet to take. The function is called from the main program like recur (8 - 1, 4) and looks over the positions from back to front.

int recur (int p, int q) {
  if (q < 0 || p + 1 < q) return 9999;
  if (p < 0) return 0;
  return Math.min (recur (p - 1, q),
    recur (p - 1, q - 1) * 10 + s.charAt (p) - '0');
}

Solution 4

The problem can be solved by dynamic programming. The two parameters are number of visited positions i and number of selected positions j. The value of target function f(i, j) is the minimal number which can be obtained. There are two transitions from each state: first, we either select or drop the current position, and then we move to the next position in string s. Below is a program in C/C++ demonstrating the approach. The answer is contained in cell f[8][4].

memset (f, 0x3F, sizeof (f));
f[0][0] = 0;
for (i = 0; i < 8; i++) {
  for (j = 0; j <= 4; j++) {
    f[i + 1][j] = min (f[i + 1][j], f[i][j]);
    f[i + 1][j + 1] = min (f[i + 1][j + 1],
      f[i][j] * 10 + (s[i] - '0'));
  }
}

Problem D (Product of Different)

This problem was first used at a school informatics circle in Saint-Petersburg Anichkov Palace in 2009.

Let us start by presenting a test which had the number 15: n = 112 500 = 22·32·55. Its complexity lies in the fact that we can not choose divisor 2·3 = 6: in that case, the total number of divisors will be no more than six. Instead, we should choose the following factorization into seven factors: 112 500 = 1·2·3·5·10·15·25.

This is the minimal test which breaks some simple yet wrong solutions. One of them is to greedily choose the divisors, starting with the smallest possible, as long as after we add the next divisor d, the remaining number is strictly greater than d. On this test, it finds divisors 1·2·3·5·6, and then tries to factor 625 = 54, but neither 5 nor 25 = 52 could be taken.

The author's solution is an exhaustive search. A trivial recursive brute-force search which considers divisors up to in increasing order manages to pass in under a second. The following function in C/C++ must be run as recur (n, 1, 0). It writes the number of different divisors into res and the divisors themselves into the array best. The parameters have the following meaning: n is the number remaining to be factorized, k is the lowest possible value of the next divisor, and cur is the number of divisors already selected.

void recur (int n, int k, int cur) {
  if (res < cur + 1) {
    res = cur + 1;
    now[cur] = n;
    memmove (best, now, sizeof (best));
  }
  for (int d = k; d * (d + 1) <= n; d++)
    if (n % d == 0) {
      now[cur] = d;
      recur (n / d, d + 1, cur + 1);
    }
}

How does one estimate how fast is such a program? We can note that the answer is rather small: as the product of the first 13 positive integers is greater than 109, the answer is not greater than 12. This means that the first if will be triggered no more than 12 times. Another useful fact is that the numbers up to 109 have no more than 1344 different divisors each (http://oeis.org/A066150).

The easiest way is to write a program to give a crude upper bound on the number of divisions and or remainder calculations (example in C/C++). Here, n is the number remaining to be factorized, and k is the lowest possible value of the next divisor.

int count (int n, int k) {
  int res = 0;
  for (int d = k; d * (d + 1) <= n; d++)
    res += 1 + count (n / d, d + 1);
  return res;
}

This program differs from the actual solution: here, we step into recursion on every iteration of the cycle by d, and in the actual solution, we call recur only when n is evenly divisible by d. On the other hand, it allows us to use the maximal possible value of n to get an estimate (albeit crude) of the worst case, instead of searching for some large number with many divisors which make the real program run as slow as possible.

If we call count (1000000000, 1) the result will be 151 631 365. If we account for the fact that division by unity can be treated separately and start our search from divisor 2, the upper bound will be the result of calling count (1000000000, 2), it is 75 815 682 (half of the above). In fact, we know that the square root of about 109 is about 30 000, and the number of divisors less than the square root is no more than 1344 / 2 = 672, it is obvious that the actual number of divisions will be a few times less. The maximal number of divisions and/or remainder calculations in all jury tests turned out to be 15 395 811.

There are faster ways to do an exhaustive search as well. For example, instead of looking over all numbers up to the square root of n, we can just look over all divisors of n which can be found in advance in time. Also we can first decompose n into prime factors and then search for divisors as tuples of powers for each of these prime factors.

Problem E (Table of Championship)

This problem is an version of problem from the Moscow Grand Prix of OpenCup 2005 adapted for memory limit=64 MiB. In the initial version memory limit was 3Mb and lengths of team names didn't exceed 15 symbols.

There are two cases: when all three lost numbers are in the same line and when at least one of them concerns a different team (or less than 3 numbers were lost).

Let's denote the number of matches played by i-th team by Pi, number of wins by Wi, number of draws by Di, number of losses by Li and number of points by Si.

Then the following equations hold: Wi + Di + Li = Pi and 3Wi + Di = Si. If not all 3 numbers lost concern the same team, we have to solve several systems of linear equations with not more than 2 variables.

In the case when all 3 lost numbers concern the same team, we get one more equation from the fact that the total number of wins of all teams equals the total number of losses of all teams. Wi + W - i = Li + L - i where W - i is the total number of wins of all teams but i-th and L - i is the total number of losses of all teams but i-th.

So the unique restoration of numbers is always possible.

The main difficulty in this problem is that the whole table doesn't fit into memory.

So one has to read input file twice: first to build the system of equations (while reading, we sum Wi and Li to build the system of equations in the second case), second to find the line which holds the answer.

Problem F (Taxis)

This problem was first set by polish authors and was first used at the first stage of All-Poland school olympiad 2012-2013.

Obviously, either one can go the path from the taxi base to the destination without switching taxis or it is impossible to get to the destination (as the taxi that will arrive at destination has to go through the whole segment from the taxi base to the destination). So let's "reserve" a taxi with the minimum fuel sufficient to go from the taxi base to the destination. Also let's construct the farthest point from the destination from which this car can get a passenger and drive him to the destination. If the taxi base is at the destination, no "reservation" is made.

Then let's sort cars which are not "reserved" by descending amount of fuel and each time choose the car with the maximum fuel. We will now prove that this algorithm gives a solution which is not worse than any other possible solution. Suppose we first use a taxi with fuel X1 and then a taxi with fuel X2, and the passenger starts at distance a before the taxi base. Then the passenger will go (X1 - a) + (X2 - (2a - X1)) = 2X1 + X2 - 3a kilometers. If X2 > X1, then if we change X2 and X1 we'll see that the distance travelled increased. If the car with the largest fuel left can't go already (or there are no cars left that are not "reserved" before the passenger gets past Z), then it is impossible to get to the destination.

If passenger is at point Z or closer to the destination at some point, he can already arrive at the destination using the "reserved" car. We also have to check if passenger can get to the destination right away, without using the "reserved" car.

Full text and comments »

  • Vote: I like it
  • +63
  • Vote: I do not like it

By Michael, 11 years ago, translation, In English

Hi there!

This is the first post in the ACM ICPC World Finals official blog. This year Yandex is the local partner of the World Finals host — Saint Petersburg National Research University ITMO. We have a lot of former ICPC finalists working at Yandex, and some of them including me will work as analysts during the World Finals contest. Let me introduce my friend Egor Kulikov from Yandex in Saint Petersburg, who will be the ICPC blogger. I am more than sure that he will provide an expert coverage on different aspects of the World Finals, and that's why.

First and foremost, he was my teammate in the Moscow State University team for the World Finals 2007 in Tokyo, where we took bronze, so he knows everything from the inside. We had trained for five years to get there. The competition at MSU is always very strong, so in our case we basically had to win NEERC 2006 just to outrun our main rivals from the same university. They had already been to the World Finals once before and were thought to be obvious favourites, but it was like the goal of all our lives to advance to the Finals, so we did.

In high school both of us participated in mathematical competitions. We won prizes in Russian National olympiads several times and became the national team candidates for IMO. We did not qualify into top-6. However, this failure was one of our major motivators to get to the World Finals. Although we didn't know much about algorithms and data structures, we started programming, as there was no system of math olympiads for university students at that time. Participating in various contests and training camps, listening to the lectures, and learning algorithms by the word of mouth has become our main education in Computer Science since then.

Last but not least, Egor has become a World Champion twice: Google Code Jam 2010 and TopCoder Open 2012. There are only two other people on the planet, who did both, and you know one of them: it's Petr Mitrichev, our the same year student from MSU :) They are both my friends, so if you want to know more about them, you can read my answers on Quora about Egor and Petr.

Stay tuned for our ACM ICPC World Finals 2013 coverage!

P.S. If there appear some good posts in English about this year's finals, we may add them into this section of the site.

Full text and comments »

  • Vote: I like it
  • +155
  • Vote: I do not like it

By Michael, 11 years ago, translation, In English

Congratulations to the winners of the Three Quarterfinals Tournament:

  1. SPb NRU ITMO 1
  2. Petr
  3. SPb SU 4
  4. msu-st
  5. SJTU Mithril

Thanks to Oleg Hristenko (snarknews) for the idea of the tournament and for implementing it! Thanks to the jury of the subregionals for interesting problems and for being so kind to share them! Thanks to the Yandex.Contest team for the system itself and for quick fixes and updates!

And of course most of all we are greatful to all the participants of the tournament! Thank you for participation and for your feedback about the system. We hope it was an interesting and useful training, and that it was also fun to have. Good luck at your regionals and at the World Finals in Saint-Petersburg!

Full text and comments »

  • Vote: I like it
  • +3
  • Vote: I do not like it

By Michael, 11 years ago, In English

It seems that many participants may not have noticed an important update of the post about Three Quarterfinals Tournament, so I'll duplicate it here:

We've published the joined monitor of the two completed rounds. In the case of equal scores for two teams according to the Grand Prix 30 scoring system, ties are broken using the sum of IFMO ratings for all rounds as in the OpenCup. We apologize that the rules for breaking ties were not announced beforehand. We pay your attention to the fact that the join is made using the nicknames of the teams in the Yandex.Contest system. If your team participated in both rounds but didn't get the results summed up, please comment. Please set up your nickname for the third round exactly the same as the nickname of your team in the joined monitor. We're going to add the third round to the joined monitor as soon as the first submission is made. We hope that the current joined monitor will be updated regularly during the third round according to the standings. Some discrepancy may exist as the teams which participate in the official subregional may have different nicknames there, and it will take some time to join their results in.

We also inform you that we've enabled a new feature: you can cancel your registration in case you've registered wrong team or change team members before the start of the contest using the tab "Registration". You won't be able to do that after the contest starts.

Full text and comments »

  • Vote: I like it
  • +14
  • Vote: I do not like it

By Michael, 12 years ago, translation, In English

We gladly invite you to participate in the Three Quarterfinals Tournament – a team-based online programming competition parallel to the ACM ICPC Subregionals in Moscow, Minsk and Saint-Petersburg. The tournament will be organized by the team of Yandex.Contest, Oleg Hristenko (snarknews) and the jury of the Subregionals. The online rounds will be held on the Yandex.Contest system. The dates and start times of subregionals are: Moscow (registration) – October 21st, 14:00 MSK, Minsk (registration) – October 28th, 11:00 MSK, Saint-Petersburg (registration) – November 3rd, 12:00 MSK.

Teams that participate in the onsite version of one of the three quarterfinals will be able to participate in the online versions of the other two quarterfinals via Yandex.Contest. Teams that don’t participate in any of the three quarterfinals onsite will be able to participate in all of them online via Yandex.Contest. The results achieved by a team in all of the rounds, online and onsite, will be summed up, and there will be a combined scoreboard based on the Grand Prix 30 scoring system.

You can register for the online rounds at any time since the appropriate link is posted until the start of the competition. To create a team in the Yandex.Contest system, please follow the link «My teams». Each invited team member has to confirm his participation in the team. After you have successfully created your team, you should register it for the competition by specifying the team.

The rules of the online rounds are the same as the rules of the onsite quarterfinals.

You can try out the system and practice submitting problems using the test contest A + B virtual.

Several people may answer your questions and comments to this post, including Vladimir Ten (vtenity), Lev Tolmachev (lev.tolmachev), Vitaly Goldstein (goldvitaly), Oleg Hristenko (snarknews).

Check for updates!

UPD. Moscow subregional online version will start at 14:00 MSK on October 21st, so that everybody has a chance to participate in the Codeforces Round 146 and CodeChef cook-off.

UPD.2 We've added the registration links and the exact start times for all three contests :

  1. Moscow QFOctober 21st, 14:00 MSK.
  2. Minsk QFOctober 28th, 11:00 MSK.
  3. Saint-Petersburg QFNovember 3rd, 12:00 MSK.

UPD.3 Both the official results and the online round results for the ACM ICPC subregional in Moscow are available in the monitor. Teams which participated in the ACM ICPC subregional in Moscow and want to participate in the tournament should register to the other two rounds and set their team nickname in the Yandex.Contest system to be the same or almost the same as their official team name. Teams which participated in the online round of the subregional and which will participate in the ACM ICPC subregionals in Minsk or Saint-Petersburg should set their team nickname in the Yandex.Contest system to be the same as their official team name.

UPD.4 We've published the joined monitor of the two completed rounds. In the case of equal scores for two teams according to the Grand Prix 30 scoring system, ties are broken using the sum of IFMO ratings for all rounds as in the OpenCup. We apologize that the rules for breaking ties were not announced beforehand. We pay your attention to the fact that the join is made using the nicknames of the teams in the Yandex.Contest system. If your team participated in both rounds but didn't get the results summed up, please comment. Please set up your nickname for the third round exactly the same as the nickname of your team in the joined monitor. We're going to add the third round to the joined monitor as soon as the first submission is made. We hope that the current joined monitor will be updated regularly during the third round according to the standings. Some discrepancy may exist as the teams which participate in the official subregional may have different nicknames there, and it will take some time to join their results in.

Full text and comments »

  • Vote: I like it
  • +50
  • Vote: I do not like it

By Michael, 12 years ago, In English

We are pleased to invite you to take part in the Yandex.Contest Test Round 1. The Round is scheduled for Friday, August 17, 2012, 20:00 (UTC +4, Moscow time) and will last 2.5 hours. The programming languages you can use are C++, Java, FreePascal, Delphi, Python 2, Python 3.

The rules of the Test Round are experimental and may differ from the rules of any other systems you might have seen (such as ACM ICPC, TopCoder, Codeforces, IPSC etc.).

To get acquaintance with the system Yandex.Contest you can participate in the virtual test contest A+B virtual.

Your feedback is highly appreciated, you can comment here or use the feedback form. Several people besides me may answer your questions here, among them are Vladimir Ten (vtenity) and Lev Tolmachev (lev.tolmachev).

Full text and comments »

  • Vote: I like it
  • +75
  • Vote: I do not like it

By Michael, 13 years ago, translation, In English

Congratulations to Petr Mitrichev (Petr), the Yandex.Algorithm 2011 Champion!

Congratulations also to Dmitry Dzhulgakov (dzhulgakov), Makoto Soejima (rng_58), Ivan Metelsky (ivan.metelsky), Alexey Levin (levlam) and Gennady Korotkevitch (tourist) who placed 2, 3, 4, 5 and 6 respectively! They solved as many problems as the champion. Full results are available here

Problems turned out to be really difficult: nobody managed to solve more than 3 problems during the 2 hours of competition. Problem A was considered much simpler than all others by the contest authors, but less than half of participants could solve it, and even the champion lacked 30 seconds to fix the last bug. Only one coder could solve problem E: Xhark from Korea.

Problem E was suggested by Ilya Kornakov (ilyakor) who is also a developer in my group at Yandex :) Problem A about domino and a guy named Gena was powered by Ivan Popelyshev (ivan.popelyshev). Problem B — Stanislav Pak, C — Denis Yarets. All three are also our employees. Problem D was suggested by Artem Ripatti. My problem was rejected at the last minute, so it will wait for you at the Petrozavodsk training camp :)

Thanks to all finalists, authors, participants out of competition, contest organizers, MIPT for the accomodations and the main sponsor — Yandex.

Good luck to everybody who competes in the OpenCup onsite today!


A few more photos from the competition:

Petr and our foreign guests rng_58, wata and dolphinigle
ivan.metelsky, MikeMirzayanov and pieguy

Full text and comments »

  • Vote: I like it
  • +41
  • Vote: I do not like it

By Michael, 13 years ago, translation, In English

Hi everybody!

Yandex.Algorithm finalists were gathering in Dolgoprudny... Everybody in their own unique way :)

dzhulgakov, pieguy, levlam and Progger turned out to be the most organized and arrived at the start of the Summer School. They've already socialized with other participants and began solving and competing in the practical bioinformatics problem using a distributed Hadoop cluster accessible for all school participants — good for them! Two Japanese buddies rng_58 and wata landed at Sheremetyevo airport on July 13 at 17.55, and an hour later they were already receiving their badges from me. Their fellow-countryman LayCurse mixed up the dates and took the tickets for the same flight but for July 14 :) tourist with his father came in the morning on July 14th, and dolphinigle landed yesterday late in the evening and told me I was the first man in Russia he met who spoke English :) ivan.metelsky "burst" into the room of the School organizers sleeping after yesterday's party at 8 AM; he got there himself without any instructions :) e-maxx arrived at Moscow by train at 10.38 in the morning and made his way directly to the competition with our other friends from Saratov - with some adventures on the way: the electric trains to Dolgoprudny have a break from 11.10 till 13.40. ktuan decided last minute that he won't spend his last univercity vacation for the finals, and Burunduk1 seems to have been too busy with teaching high school students and didn't answer my e-mails and phone calls (tut-tut, Sergey!) — I wonder whether he will be coming last minute. Petr will drive here from Moscow just before lunch. My intern zeliboba studies at MIPT and lives in the same building where all the other participants of the Summer School, so he'll probably come to the contest floor last :)

Everybody can register and participate in the round as usual. Those who don't participate in the onsite round will be out of competition, but the round is rated for everybody in Div-1.

I remind you that the contest time is unusual today: we will start at 16.00 The problems are harder than usual in Div-1 today — looking forward to a bitter struggle!

Update: full results

A few photos from the School's life starring our finalists Progger и pieguy:

Full text and comments »

  • Vote: I like it
  • +129
  • Vote: I do not like it

By Michael, 13 years ago, translation, In English

We are glad to introduce Yandex open programming competition "Yandex.Algorithm" hosted by Codeforces. The competition starts on May 4th and will consist of two qualification rounds, two online rounds and a final onsite round. Some of the rounds are created by Yandex employees. The onsite round will be held at the Yandex Summer School in Distributed Computing in Moscow Institute of Physics and Technology.

Any registered member of Codeforces can participate. There will be 5 rounds, 2 hours each. 15 winners of the last online round will be invited to participate in the Summer School and in the final round of Yandex.Open. 70 best participants will get T-shirts.

Full text and comments »

  • Vote: I like it
  • +242
  • Vote: I do not like it