ivan100sic's blog

By ivan100sic, history, 5 weeks ago, In English,

The pattern of memory accesses plays a huge role in determining the actual running time of an algorithm. Many of you already know this fact, but do you know how big the difference can be? Take a moment to study this code:

#include <bits/stdc++.h>
using namespace std;
using namespace chrono;

vector<int> generate_random(int n) {
	mt19937 eng;
	vector<int> a(n);
	iota(a.begin(), a.end(), 0);
	shuffle(a.begin(), a.end(), eng);
	return a;
}

vector<int> generate_cycle(int n) {
	vector<int> a(n);
	iota(a.begin(), a.end(), 1);
	a[n-1] = 0;
	return a;
}

int main() {
	int n, t, q, z = 0;
	cin >> n >> t >> q;
	auto a = (t ? generate_cycle(n) : generate_random(n));

	auto start = high_resolution_clock::now();
	while (q--) {
		int x = q % n;
		for (int i=0; i<n; i++)
			x = a[x];
		z += x;
	}
	duration<double> dur = high_resolution_clock::now() - start;
	cout << "Time: " << dur.count() << '\n';
	cout << z << '\n';
}

The program performs $$$q$$$ walks of length $$$n$$$ on a permutation graph. With $$$t=0$$$, the permutation is randomly generated and with $$$t=1$$$ it's a cycle $$$0\rightarrow 1\rightarrow 2 \rightarrow \ldots \rightarrow (n-1) \rightarrow 0$$$.

Now try running this code using custom invocation with the following input: 10000000 0 10, and then 10000000 1 10. Can you guess the running time in both cases? Surely it won't take more than one second as there are only 100M memory reads... Also, you can probably guess that the second one will be faster, but exactly how much faster?

In case you're lazy and don't want to try it yourself

By the way, if I leave out printing $$$z$$$, the second part runs instantly, because the compiler correctly deduces that the entire while loop is unnecessary!

Tips to optimize your memory access patterns:

  1. Avoid using multiple parallel arrays to represent complex data structures; use an array of structs instead. In particular, this applies to segment trees where you have to keep multiple numbers per node. There are cases where the difference is insignificant, though.

  2. Use smaller data types. Using long long everywhere may slow your program down for multiple reasons; one of them is that your memory accesses become more spread out, and therefore less cache-friendly.

  3. Try switching the rows/columns of matrices. Make sure that the innermost loop runs over the last dimension of the matrix. In particular, when multiplying matrices, transposing the second matrix first significantly reduces the running time.

Feel free to add suggestions!


By ivan100sic, 4 months ago, In English,

I was solving this problem and I came up with an interesting idea which may in some cases be generalized to other DP problems. I don't think it's well known so that motivated me to write this.

Once you've read the problem, understood it and want to know what I did, continue reading...


By ivan100sic, history, 6 months ago, In English,

When I submit code, I get the error message "You cannot submit the same code twice", although this is clearly not the case. As far as I can see, I'm not the only person affected.

Hopefully this will be resolved quickly.


By ivan100sic, history, 9 months ago, In English,

Does anyone have the problems, test cases and full results from this year's International Zhautykov Olympiad?

Congratulations to the winners! Pajaraja we are proud of you!


By ivan100sic, history, 10 months ago, In English,

The list excludes problems from ICPC-style contests and other online mirrors whose statements were never added to Codeforces. The lengths are not exact, merely good estimates; for example, each formula is counted together with three dollar signs at both its beginning and its end. Sample tests and notes (everything below the samples) are not counted.

Here's the list:

  1. (4426 bytes) 39G - Inverse Function
  2. (4470 bytes) 1089J - JS Minification
  3. (4591 bytes) 89B - Widget Library
  4. (4619 bytes) 175F - Gnomes of Might and Magic
  5. (4629 bytes) 523B - Mean Requests
  6. (4911 bytes) 642A - Scheduler for Invokers
  7. (5197 bytes) 1070B - Berkomnadzor
  8. (5803 bytes) 1044B - Intersecting Subtrees
  9. (6688 bytes) 921/01 — Labyrinth-1
  10. (10637 bytes) 927A - BuberPool Taxi Optimization

I also have the list of 10 problems with the shortest statements, but all of them are from April Fools contests, so I'll publish it once I filter them out. Well, all except this one: 401E - Olympic Games.


By ivan100sic, history, 12 months ago, In English,

I've never seen anyone use this in competitive programming (or anywhere really) but it might be useful:

In C++ you can use the basic_string class template instead of vector for "simple" types [1]. It works just like vector, but also gives you a few convenient member functions and operators familiar from strings, most notably operator+ and operator+=. See the following code:

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    basic_string<int> a;

    cin >> n;
    for (int i=0; i<n; i++) {
        int x;
        cin >> x;
        a += x;
    }

    a += a;
    a = a.substr(n/2, n);

    cout << (a + a).find({1, 2, 1}) << '\n';
}

[1] Although I'm not 100% sure, "simple" means any primitive type, std::pair of simple types, and so on. Do not use this with vectors, strings, sets, maps and similar types. And for this reason, please don't blindly typedef vector as basic_string.


By ivan100sic, history, 14 months ago, In English,

Today was supposed to be my day...

I needed only a single point of rating to become a grandmaster once again. And I had the best round of my life, finishing in 6th place. Perhaps I would have skipped GM entirely and jumped straight to IGM, but to no avail. Of course, I also lost the chance to win a prize.

Sometimes, because of such events, which, unfortunately, happen all too often here on Codeforces, I'm thinking about abandoning CF for good. But I won't. I want everyone reading this to know that we all have bad days and bad rounds, and even if something as unfortunate as what happened to me today happens to you, you shouldn't give up. Your rating and material awards are not as important as your actual skill. And even if you never reach your desired rating or color or even skill level, the experience you'll have gained will last a lifetime. By participating in Codeforces rounds, you're always a winner.

-I


By ivan100sic, history, 2 years ago, In English,

Hello CodeForces!

I know this happened to everyone — you made an interesting mathematical/algorithmic/whatever discovery and you were very proud of it and later you realized it's already well known. What's your story?

I'll start:

I discovered a variant of Mo's algorithm around 4 years ago. I was solving the following problem: given a static array $$$a_1, \ldots, a_n$$$ and $$$q = O(n)$$$ queries, which you are allowed to answer offline. Each query has the form $$$(l, r)$$$ and asks: if you were to take the numbers $$$a_l, \ldots, a_r$$$, extract them into another array and sort that array, what would be the sum of all elements at odd indices?

This is how I was thinking: if all the queries could be ordered such that both their left ends and their right ends form an increasing sequence, you could answer all of them by adding/removing elements in some augmented balanced binary tree or segment tree, in $$$O(n \log n)$$$ total. The same is true when the queries can be ordered such that their left ends form an increasing and their right ends a decreasing sequence. So, why not decompose the set of queries into such sequences? If we sort the queries by their left end, this becomes equivalent to decomposing an array of numbers (the right ends) into several increasing/decreasing subsequences. Then I remembered a theorem which states that an array of $$$n^2 + 1$$$ numbers always contains an increasing or a decreasing subsequence of length $$$n + 1$$$ — this means that we can decompose any array into $$$O(\sqrt{n})$$$ such subsequences, in $$$O(n \sqrt{n} \log n)$$$ time by repeatedly extracting a longest monotone subsequence — perhaps there is a better algorithm, but this was good enough for my problem. The final complexity is $$$O(n \sqrt{n} \log n)$$$ — there are $$$O(\sqrt{n})$$$ sequences of queries and each one can be solved in $$$O(n \log n)$$$.

Also, what were your silly childhood discoveries?

I remember discovering, when I was around 7, that in some apartments you could run around in a circle through a sequence of doors while in others you can't — and I really loved those where you can! Later I found out that they were equivalent to cyclic/acyclic graphs.


By ivan100sic, 2 years ago, In English,

Do you really hate tree rotations? This blog post is for you!

Recently, I've been solving 768G - The Winds of Winter and I came up with a solution which is so ridiculously overengineered that I think it deserves its own blog post!

I don't want to go over all the details of the solution. Instead, I want to demonstrate a nice way to implement and apply Persistent Set and Persistent Array data structures and share a few ideas some may find interesting.

Basically, what I wanted to do is compute, for each node $$$x$$$ of the rooted tree, a set containing the sizes of all subtrees of $$$x$$$. In other words, if we denote by $$$sz(y)$$$ the size of the subtree rooted at $$$y$$$, I want to compute the set $$$S(x) = \{\, sz(y) \mid y \text{ is in the subtree of } x \,\}$$$ for all $$$x$$$. We also want this set to support some lower_bound and upper_bound functionality.

Obviously, just storing these sets in a normal way would require $$$O(n^2)$$$ memory, so we need another approach. Let's use the same idea that we use when building persistent arrays. A persistent array of size $$$2^k$$$, $$$k > 0$$$, just has two pointers to persistent arrays of size $$$2^{k-1}$$$. A persistent array of size 1 only stores data. Our persistent set is implemented as a persistent array of size at least $$$N$$$ whose only elements are 0 and 1: positions at which there is an element have value 1, others have 0. In addition to the two pointers, each array node also stores the sum of all values below it. Basically, our structure looks a lot like a persistent segment tree. This allows us to implement lower_bound and upper_bound in $$$O(\log N)$$$. I used class templates to save some memory — this is another interesting idea and a topic of its own.

When you want to insert a number into the set, you don't actually modify the set; instead, you create a copy of it and insert the number there. The copying creates $$$O(\log N)$$$ additional nodes and takes $$$O(\log N)$$$ time. Here's the code:

this_t* insert(int x) {
  this_t* tmp = new this_t(*this);

  if (x < H)
    tmp->left = tmp->left->insert(x);
  else
    tmp->right = tmp->right->insert(x - H);

  tmp->total = tmp->left->total + tmp->right->total;
  return tmp;
}

this_t is the alias for pa<T, order>, the data type of the class for persistent sets — basically a persistent array of length $$$2^{\text{order}}$$$. The first line just creates a shallow copy of the set we are in. H is equal to half of the size of the current set. If x is less than this number, the new value should be inserted into the left half; otherwise, insert it into the right half. Again, insertion does not change the original but returns a copy, so we store that copy into tmp->left or tmp->right. We then recompute the total and return the temporary variable. Simple as that! The stopping condition is implemented in the class pa<T, 0>, see my submission 29446581 for more details.

But this is not all! We also need to be able to quickly merge two sets in our original problem! This appears impossible to do efficiently at first, but is actually quite simple and works very fast under some circumstances.

Let's see how we can merge (find the union of) two sets A and B. If one of the sets is empty, we return the other one. Otherwise, just merge the left half of B with the left half of A and the right half of B with the right half of A. Now you're probably asking yourself, how does this even make sense? This is obviously $$$O(n)$$$ — and you're right, but, in the original problem, you will be merging small sets most of the time, and when one of the sets you merge is small (say it has $$$g$$$ elements), this operation creates at most $$$O(g \log N)$$$ new nodes and takes $$$O(g \log N)$$$ time. To understand this, think of what happens when one of the sets has 1, 2, or 3 elements.

Here's the code:

this_t* merge(this_t* arr) {
  if (arr->total == 0)
    return this;

  if (this->total == 0)
    return arr;

  this_t* tmp = new this_t(*this);
  tmp->left = tmp->left->merge(arr->left);
  tmp->right = tmp->right->merge(arr->right);

  return tmp;
}

When we calculate the sets on a tree, each set element is merged into another set at most $$$O(\log n)$$$ times, so the complexity (both time and memory) is $$$O(n \log^2 n)$$$. You now have access to all the sets, for each node.

Last, but not least, let's see how lower_bound is computed. This, of course, does not create new nodes and should take $$$O(\log N)$$$ time. We define lower_bound(x) as the smallest element $$$y$$$ in the set such that $$$y \ge x$$$. If such an element does not exist, we return the capacity of the set. Here's one possible implementation:

int lower_bound(int x) {
  if (total == 0)
    return 2*H;

  if (x < H) {
    int p = left->lower_bound(x);
    if (p == H)
      return H + right->lower_bound(0);
    else
      return p;
  } else {
    return H + right->lower_bound(x - H);
  }
}

The problem here is that we sometimes call lower_bound in both directions. Fortunately, it is not so tragic: the actual complexity really is $$$O(\log N)$$$. Why is that so? If the lower bound $$$y$$$ for $$$x$$$ does not exist, let's take $$$y = N - 1$$$ where $$$N$$$ is the capacity of the set. We only visit array vertices corresponding to array segments which contain $$$x$$$ or $$$y$$$, and there are at most $$$O(\log N)$$$ of them. We may also visit some others, but we exit immediately as their total is 0.

This concludes the tutorial for this implementation of persistent sets.

But why didn't you just use a segment tree for the problem? After all, you are given a rooted tree, you can flatten it by assigning DFS numbers...

As it turns out, the sets of sizes of subtrees of nodes are not the only thing I needed to compute. I also needed to compute , and also . Also, I wanted to try a new approach and see if it's efficient enough. And as it turns out, it is! :)

Here's the code: 29446581. Unfortunately, many comments are in Serbian and some names are meaningless.

Remark 1. It's also relatively easy to implement the order statistic query (find the kth largest element in the set).

Remark 2. You can also extend this to a multiset or a map<int, ?>.

Remark 3. You can save even more memory by delaying the creation of a subarray until it is needed, similar to the implicit segment tree. This allows us to create sets which can contain the full range of integers or even long longs, not just ones in a relatively small range like $$$10^5$$$.

Remark 4. You can accelerate finding the union or intersection of sets if you also store a hash value of each subarray. Then, only merge two sets if their hashes differ. This works particularly well if you often need to merge similar sets (those which differ in only a few elements). In the original problem this was not needed but it's a nice idea nonetheless.


By ivan100sic, 7 years ago, In English,

The contest was well balanced — scores for tasks were very close to standard scores. I hope you will find this editorial useful.

A :: k-String

Count the occurrences of each character. If any character appears a number of times not divisible by K, there is obviously no solution. Otherwise, form the solution by first building a string B with the following property: if some character ch appears q times in B, then the same character appears q·K times in the given string. Now, just print that string K times.

B :: Special Offer! Super Price 999 Bourles!

The following observation will help you code the solution:

The largest number smaller than p ending with at least k nines is p - (p mod 10^k) - 1. If the result turns out to be -1, you cannot reach a positive number with k or more nines. I will not explain the solution in detail — be careful when coding and keep the whole statement in mind.

C :: Color Stripe

There are two cases to consider: K = 2 and K > 2. For K = 2 there are only two possible solutions: the strings "ABABAB..." and "BABABA..."

For both strings, simply count the number of differences between it and the given string, and print the one with fewer differences.

For K > 2, decompose the string into contiguous single-colored sequences. Consider one such sequence. If it has an odd number of characters, say 2m+1, replace m characters with some other character in the following fashion:

AAAAA

becomes

ABABA

Observe that changing fewer than m characters cannot remove all pairs of adjacent identical characters. Similarly, for sequences of even length, say 2m characters, look at the character in the original string right after the last character of the observed sequence, and choose a replacement character different from both. Example:

AAAAAAB

becomes

ACACACB

Again, changing m characters is both sufficient and necessary.

D :: Choosing Capital for Treeland

Arbitrarily root the tree at some vertex, say vertex 1. Now, all the edges are oriented either up (towards the root) or down (away from it). We will call upwards oriented edges red, and downwards oriented edges green. Now, with a single depth-first search, for each vertex, calculate its distance from the root (in number of edges) and the number of red edges along the path to the root. Also, count the number of red edges in the entire tree.

Now comes the interesting part: observe that if vert is chosen as the capital, all edges outside the path from the root to vert should be green, and those on the path should be red.

The number of edges that need to be flipped if vert is chosen as a capital is given by:

RedEntireTree - 2*RedOnPath[vert] + RootDistance[vert]

E :: Parking Lot

Use a heap to maintain the sequences of empty parking spaces as intervals. The comparison function should prefer the interval in which a car can be placed farthest from any other car, breaking ties by choosing the leftmost such interval. When a car arrives, pop the heap, place the car at the corresponding space, and push the two new intervals onto the heap. When a car leaves, find the two intervals which end at its space, remove them from the heap and push the interval formed by merging them.
