### mouse_wireless's blog

By mouse_wireless, history, 5 months ago, Hello codeforces,

Long time no see.

I'm having a look at problem #211 on leetcode. It's a classic problem: you add strings to a collection, and you have queries which ask if a string is part of the collection, except the query strings can contain wildcards, which can match any character.

The recommended solution is a trie-based one, in which if you encounter a wildcard, you try to continue the query from each of the current node's children (similar to a normal graph traversal).

The problem with this solution, is that it has a worst case complexity of $O(Q * N * L)$, where Q is the number of search queries, N is the number of added strings, and L is the maximum length of the string. This is achieved when there are N "different enough" strings (randomly generated will do) of max length, and Q queries of L wildcards (to avoid premature termination of the traversal, you can also force the final character to be different). In this case, on each query, you have to traverse the entire trie, which has $O(N * L)$ nodes.

I've run a lot of solutions I found on leetcode against this test case (code below).

int main() {
WordDictionary D;
for (int i = 0; i < 25000; ++i) {
string word;
for (int j = 0; j < 499; ++j)
word.push_back(rand() % 26 + 'a');
word.push_back('z');
}
int cnt = 0;
string q(499, '.');
q.push_back('a');
for (int i = 0; i < 25000; ++i)
cnt += D.search(q);
cout << cnt << endl;
return 0;
}


And none of them would finish in under 1 minute (there are some solutions which use a different approach which work on this case but "fail" on others).

So I decided to take out "the big guns" and ask codeforces. Is there a better than worst-case $O(Q * N * L)$ solution? trie,
By mouse_wireless, history, 3 years ago, Prelude: this post assumes the reader already know the concepts behind Binary Indexed Trees, as I will not be explaining them here.

Let's talk about 2-dimensional Binary Indexed Trees. The implementation of 2D BIT usually goes as follows:

1. Implement a standard, 1D BIT, with update and query functions;
2. Implement 2D update and query; these look exactly the same as the 1D case, except basic operations are replaced by calls to the 1D versions of these operations.

This is OK and works well for most cases. However, there are some issues with this implementation:

1. 2D methods are usually implemented (for coding speed reasons), by copy-pasting the 1D code and changing the operations. Copy-pasting is always risky, since it's so easy to miss a change you're supposed to make.
2. It doesn't feel "clean". There's duplicate code in there that makes the source harder to read (and maybe debug). Also, if you've messed up something in one of the dimensions (for example, you wrote < instead of <=), chances are you've also messed it up in the other one; and you have to remember to correct the mistake in both places.
3. It doesn't scale well. Issues #1 and #2 are heavily amplified when more than 2 dimensions are concerned. It was a problem which required 3D BIT that motivated me to write this post, but it's not unreasonable to imagine problems with 4D or even 5D BITs.

There's a bunch of different approaches to "fixing" these issues, but I want to tell you about a very clean, performance-centered approach. It uses template programming in C++ 11 and up. We'll be able to write code like this:

BIT<N, M, K> b;
b.update(x, y, z, P);
b.query(x1, x2, y1, y2, z1, z2);


The first line declares a N x M x K (3-dimensional) BIT. The second line adds P to the point at (x, y, z). The third line tells us the sum of values inside the cube with corners (x1, y1, z1) and (x2, y2, z2). Let's now get into the code.

template <int... ArgsT> struct BIT {
int val = 0;
void update(int val) {
this->val += val;
}
int query() {
return val;
}
};


The declaration tells us that a BIT structure will be parametric in 0 or more integral arguments. As mentioned previously, these parameters will tell us the sizes of the dimensions of the BIT. The code inside the class block will be used for 0-dimension BITs. In other words, single values. It's obvious what the update and query functions should be for this structure.

Now it's time to specialize this template for the case in which we have at least one parameter (in other words, at least one dimension).

template <int N, int... Ns>
struct BIT<N, Ns...> {
BIT<Ns...> bit[N + 1];
template<typename... Args>
void update(int pos, Args... args) {
for (; pos <= N; bit[pos].update(args...), pos += lastbit(pos));
}
template<typename... Args>
int query(int l, int r, Args... args) {
int ans = 0;
for (; r >= 1; ans += bit[r].query(args...), r -= lastbit(r));
for (--l; l >= 1; ans -= bit[l].query(args...), l -= lastbit(l));
return ans;
}
};


Here we have a multi-dimensional BIT in which the first dimension is of size N. This tree contains N BITs similar to this one, but with the first dimension removed (in fact, it's N + 1 BITs, since we are indexing from 1). In other words a BIT<10, 20, 30> will contain 10 instances of BIT<20, 30>. Makes sense!

The update and query functions take multiple arguments: the first ones are used for "walking along" the current dimension (as you would in a standard, 1D BIT), the rest are passed on to the functions called from the lower-tier BITs.

And with that we are done! With only 20 lines of code, we have a fully-functional multi-dimensional BIT that will work naturally for however many dimensions we want it to have. Oh, you will also need the lastbit function, but you're probably familiar with that already if you have come this far.

inline int lastbit(int x) {
return x & (-x);
}


Why do I like this implementation? Not only is it short and clean, but it's also efficient. Template programming means that all the structures needed will be generated at compile-time. At runtime, the program will waste no operations managing the dimensions and the structure of the trees.

Closing remarks:

• Many times with 2D (and higher) BITs, due to memory constraints, you will not be able to store the array BIT<Ns...> bit[N + 1];. Rather, it is preferred to use something like std::map< int, BIT<Ns...> > bit;, since most indexes will not be accessed anyway. You can find tons of articles on the internet about 2D BITs and you can read more about that, and other optimizations you can apply.
• An example of a simple problem where you can try this out is problem 1470 on timus. By mouse_wireless, history, 3 years ago, I want to discuss a type of data structure that I (personally) haven't really seen get any attention. It doesn't really do anything special, but I still find it interesting enough that it should be at least noticed.

It is an alternative to Fenwick trees (aka binary indexed trees), in the sense that it solves the same class of problems, in a memory-efficient way (unlike segment trees or binary search trees). Although the implementation has a couple extra lines of code, (in my subjective opinion at least,) it is easier to visualize and understand conceptually when compared to Fenwick trees. In fact, my motivation for writing this is that personally I've had a hard time learning BITs and understanding them (beyond memorizing the code) and for a long time I've avoided them in favor of segment trees or (when time/memory restrictions were tight), a structure similar to the one I'll be describing.

With that out of the way, the sample problem we are trying to solve is the following: given an sequence of N numbers, execute two kinds of queries: update (increase the value at some index by some value) and query (print the sum of all numbers up to some position).

We are going to build a binary tree in which node #i holds the sum of numbers in some range centered in i. How would we build this kind of tree (conceptually) from a given sequence of numbers?

1. we want to build a tree which covers the indexes in interval [l,r) -- initially, [0,N)
2. recursively build the tree which covers [l,mid) and set it as this node's left son
3. recursively build the tree which covers [mid+1,r) and set it as this node's right son
4. the value in this node is the sum of the values in its children plus sequence[mid]

Here's a visualization of what this tree looks like for the sequence (of length 6) {1, 10, 100, 1000, 10000, 100000}: It is easy to see that such a tree has exactly N nodes and a height of ceil(log2(N)). The nice thing about this construction is that we can always represent it as an array of numbers of length N, in which the ith element corresponds to the value inside the node which holds a range centered in i. We also don't need to keep any pointers to the left and right nodes, since we can compute them on the fly.

Now let's get to solving the problem at hand.

How do you perform an update operation (i.e. increasing the value at some index by some value)? We start at the top node (which contains the entire array). When we get to a node, add the update value to this node's value. Then, if the update index is smaller than this node's index, we need to go to the left subtree. If the update index is greater, go to the right subtree. Otherwise, (if the indexes are equal), we can stop. Keep in mind that "this node's index" is given by the average of the left and right of the node's range. Sample code for this operation:

void update(int pos, int x) {
for (int l = 0, r = N; l < r;) {
int mid = (l + r) / 2;
tree[mid] += x;
if (mid == pos) return;
if (pos < mid) r = mid;
else l = mid + 1;
}
}


What about querying (i.e. asking "what is the sum up to a given position")? Again, start from the top. If the queried position is less than the current node's index, move the query to the left subtree. Otherwise, move to the right subtree, but we have to add all of the elements to the left and inside the current node. In other words, if current node is represented by range [l,r), when we make the leap to node [mid+1,r) (the right subtree), we have to add all elements inside range [l,mid]. We don't have this value explicitly (left node holds interval [l,mid), not [l,mid]), but we can obtain it by noticing that the sum [l,mid] is the sum [l,r) minus the sum [mid+1,r]. In other words, it is the value inside the current node minus the value inside the right subtree. Sample code for this operation:

int query(int pos) {
int ans = 0;
for (int l = 0, r = N; l < r;) {
int mid = (l + r) / 2;
if (pos < mid) r = mid;
else {
ans += tree[mid];
l = mid + 1;
if (l < r) ans -= tree[(l + r) / 2];
}
}
return ans;
}


You can make various modifications to this structure. For example, you can implement single-pass range queries (currently, if you want to know the sum from an interval, you have to take the difference of two queries), which in offers better performance (even when compared to Fenwick trees, in my experience). You can also easily implement range updates with lazy propagation (the same way you would do it for a segment tree).

For sample problems, any problem which uses Fenwick trees will work, really. This is an educational problem for learning Fenwick trees. The memory limit is low to prevent segment tree solutions. The page is in Romanian but google translate works wonders. By mouse_wireless, history, 3 years ago, A 256 KB input file for Custom Invocations is many times not enough to cover large tests for problems.

The whole point of custom invocation is that it allows contestants to see how their code behaves on the machine used by the judge. Particularly handy when checking for time limits, since you cannot know how fast your code will run on the target machine.

With full feedback this isn't really a problem, since you'll just submit your code and get the "time limit" verdict and you'll know you have to optimize your code. But with partial feedback, you can test your code on your local machine, see that it runs in reasonable time, submit it, pass the pretests with plenty of time to spare and get time limit on the main tests. This tends to happen particularly when problems run in ~10^8 steps with strict time bounds. I feel like when this happens, it isn't really the programmer's fault since they cannot know for sure how fast their code will run on the judging machine. This is where Custom Invocation comes in, but is heavily limited by the 256 KB input file size limit (especially since checking for time limits usually involves large cases).

I was thinking we could have input generators for custom invocations. The system is already in place for hacks, so why not add it to invocations as well? Doesn't seem like it should be much of a hassle, does it? What are your opinions on this?

TL;DR Since input generators are already in place for hacks, why not add them for Custom Invocations as well? By mouse_wireless, history, 3 years ago, If you've been doing competitive programming for a while, you're probably aware by now that dynamically allocating memory (i.e. using new or malloc) is pretty expensive in terms of computation time (both because of slow allocation and potential cache misses).

What many people do to avoid dynamically allocated memory is implement a "memory cache": a pre-allocated array of elements, from which you can fetch elements at will. In code, this might look like this:

elem elemCache[MAX_NUMBER_OF_ELEMENTS];
int nxtElemCache = 0;

elem *fetchElement() {
return &elemCache[nxtElemCache++];
}


And instead of using the new operator, you just use the fetchElement function (you can also override the new operator itself, but it's slightly less readable for the purposes of this thread, so we'll leave it at that).

The problem arises when you do not know what MAX_NUMBER_OF_ELEMENTS is. For example, let's say that you are implementing a data structure (maybe a persistent segment tree) which you are not very familiar with. You can usually approximate this number (let's take the segment tree example: you know that at every step you are creating at most two nodes and you are doing O(log N) steps so maybe 2 * N * log N is a safe upper bound; maybe increase the factor for safety), but maybe since you are not familiar with the structure you don't know of a good upper bound. Maybe you just don't want to waste time thinking about it or maybe you are simply afraid that your calculations are wrong and don't want to take the chance and have wasted submission.

An elegant solution is to use std::deque. An example of how this would work in code:

deque<elem> elemCache;

elem *fetchElement() {
elemCache.push_back(elem());
return &elemCache.back();
}


Simply add an element to the back of your deque cache and return the address of this element.

You might be asking what makes std::deque such a good candidate for this job. Well, there are quite a few reasons:

• pushing to the end of std::deque takes amortized constant time (similar to std::vector)

• unlike something like std::vector, references to elements of a deque are never invalidated (when pushing at the front or back of the structure); this is guaranteed by the standard; this means that once you have allocated an element, it will remain allocated at that address (which is the desired behavior: you don't want the pointers to be invalidated);

• unlike something like std::list, it uses very little memory overhead;

• cache efficiency is maintained (again, unlike something like std::list) since internally internally std:deque holds contiguous chunks of memory.

Here are some submissions for last contest's problem F using persistent segment tree:

As a later edit, I will also add that this method also supports deletion (and later re-using the "deleted" memory), with the help of an additional stack. The idea is simple: instead of physically deleting a node, just throw it in a "recycle bin". When wanting to fetch a new node, first check if you have anything in the recycle bin instead. Implementation below:

stack<elem*> recycleBin;
deque<elem> elemCache;

elem *fetchElement() {
if (!recycleBin.empty()) {
elem *temp = recycleBin.top();
recycleBin.pop();
return temp;
}
elemCache.push_back(elem());
return &elemCache.back();
}

inline void deleteElement(elem *e) {
recycleBin.push(e);
} By mouse_wireless, history, 3 years ago, During one of my problem solving sessions, I thought about this problem somewhat related to what I was solving:

Say you have a set of numbers a_1, a_2, ..., a_n and some values x_1, x_2, ..., x_n.

On this set, you can make the following operation:

a_1, a_2, ..., a_n ------> a_1 + x_1, a_2 + x_2, ..., a_n + x_n

(in other words, you add to each number its corresponding x value).

You have to repeat this operation many times and after each time answer what is the minimum element in the set. By many I mean (for example) that the number of elements and the number of queries have the same order.

If all x values are equal, it's easy to solve with something like segment tree. However, the way the problem is, I have tried different approaches (including some algebraic ones) but I can't seem to come up with a solution that is better than quadratic. It seems like it should be a simple problem, is there a simple solution I am not seeing?

UPDATE: reading random (completely unrelated) posts on codeforces, I have by chance found something called "Convex hull trick" which can be used to determine the minimum element from a set of linear functions evaluated at a certain point. It seems like it could be applied in this situation, since the value of a_i after k steps can be expressed a linear function in k: f_i(k) = a_i + k * x_i. This method would provide a linearithmic solution. I will check it out tomorrow. By mouse_wireless, history, 4 years ago, I've recently come across the following problem in a real life scenario:

You are given a regular expression with two special characters: * matches any sequence of characters (including the empty sequence) and ? matches any one character.

For example, a?b matches acb, but it does not match abc, accb or ab. a*?b however does match accb. abc*f?h matches abcdefgh.

You have to write a program that checks if a pattern and a string match.

Obviously, we can write an O(n^2) algorithm which looks like this:

bool matches(const char *pat, const char *s) {
if (*pat == 0)
return (*s == 0);
if ((*pat == '?' || *pat == *s) && (*s != 0))
return matches(pat + 1, s + 1);
if (*pat == '*')
return ((*s != 0) && matches(pat, s + 1)) || matches(pat + 1, s);
return false;
}


For the scenario I have encountered, O(n^2) is more than enough, however I've been wondering for a while if a linear time algorithm (or anyway something better than quadratic) exists. I've been giving it some thought and can't seem to come up with anything. It looks like so it seems like some linear algorithm should exist, right? By mouse_wireless, history, 4 years ago, I feel like there should be a comprehensive list somewhere with all the rules (about judging, submitting, hacking etc), something that describes the entire process of a contest with all cases that can occur. A "codeforces handbook" if you will.

Story time: I've had a situation in the last contest I've taken part in, in which I have solved the first 4 problems in the first hour of the contest. For the last minutes of the contest I was starting to get bored (I had absolutely no idea how to solve the last problem and I've already been through all the submissions in my room for hacks). I started to modify my code for the 4th problem with different seeds and parameters (it was a randomized, heuristic solution, as was the official one) to see how it fared. I wanted to see how the new parameters would fare on the official tests, so I started to search google and codeforces blogs to see what happens if you re-submit after you've already passed the pretests and no matter what search terms I used I couldn't find anything on the topic. Ultimately, I decided to submit since in pretty much every contest I've taken part in if you've got "Accepted" on one solution, the submissions after the "Accepted" do not matter towards the score. But lo and behold, there was a penalty! My first solution got "Skipped" and only the last one was evaluated, adding more than 1 hour of submission time and losing me 300-400 points. I was stunned, left to wonder why in the world this isn't documented anywhere on codeforces. Only after the fact did I managed to find something about it, by googling "Codeforces skipped submissions". The whole situations seems kinda silly to me.

To avoid situations like that in the future, maybe it wouldn't be bad to have a centralized handbook with all the rules (or maybe there is one and I am missing it?). I remember how confused I was about the whole "hacking" system as well when I started out (nowhere was it clearly explained what a "hack" was or the concept of rooms or locking problems). By mouse_wireless, history, 5 years ago, 