BessieTheCow's blog

By BessieTheCow, 13 months ago,

I remember seeing people mention that link cut trees can be used for dynamic LCA queries, but I can't find any information on how it's done. One way I can think of to find the LCA of nodes $u$ and $v$ is to access $u$ first and then $v$, then splay $u$ and take its path parent pointer. Is there a more efficient way? Also, can link cut trees be used for subtree aggregate queries?
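For anyone curious what the access-based approach looks like in code, here's a minimal sketch of a link cut tree supporting only link and LCA (no cut, no aggregates, no evert), using the common variant where the second access simply returns the last path-parent switch instead of splaying $u$ afterwards. All names are mine, not from any particular library:

```cpp
#include <cassert>
#include <vector>

// Minimal link/cut tree for a rooted forest supporting link() and lca() only.
// Node 0 is a null sentinel; real nodes are 1..n.
struct LCT {
    std::vector<int> p, ch0, ch1; // splay parent / path parent, children

    LCT(int n) : p(n + 1), ch0(n + 1), ch1(n + 1) {}

    // x is a splay-tree root iff its parent pointer is a path-parent pointer.
    bool isRoot(int x) { return ch0[p[x]] != x && ch1[p[x]] != x; }
    int& ch(int x, int d) { return d ? ch1[x] : ch0[x]; }
    int dir(int x) { return ch1[p[x]] == x; }

    void rotate(int x) {
        int q = p[x], g = p[q], d = dir(x);
        if (!isRoot(q)) ch(g, dir(q)) = x;
        p[x] = g; // if q was a splay root, this becomes x's path-parent pointer
        ch(q, d) = ch(x, d ^ 1);
        if (ch(x, d ^ 1)) p[ch(x, d ^ 1)] = q;
        ch(x, d ^ 1) = q;
        p[q] = x;
    }

    void splay(int x) {
        while (!isRoot(x)) {
            int q = p[x];
            if (!isRoot(q)) rotate(dir(x) == dir(q) ? q : x);
            rotate(x);
        }
    }

    // Make the root-to-x path preferred. Returns the root of the last splay
    // tree visited, i.e. the node where the last path-parent switch happened.
    int access(int x) {
        int last = 0;
        for (int y = x; y; y = p[y]) {
            splay(y);
            ch1[y] = last; // old preferred child keeps p[y] as a path parent
            last = y;
        }
        splay(x);
        return last;
    }

    void link(int c, int par) { // c must be the root of its own tree
        access(c);
        access(par);
        p[c] = par; // path-parent pointer
    }

    int lca(int u, int v) { // u and v must be in the same tree
        access(u);
        return access(v);
    }
};
```

After `access(u)`, the preferred path runs from the root to $u$; during `access(v)` the last splay tree whose path parent we follow is the one containing the LCA, so returning `last` gives the answer directly, which matches the splay-$u$-and-read-its-path-parent idea.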


By BessieTheCow, 13 months ago,

In CodeChef's June Lunchtime contest today, there was a problem called Gold Mining, which I'll summarise here:

There are $N$ gold mines, and the $i$th one has $G_i$ units of gold in it. Chef and Chefu can extract gold from the mines at constant rates: for the $i$th mine, Chef can extract $P_i$ units of gold per unit of time, and Chefu can extract $Q_i$ units of gold per unit of time. Note that time is continuous here, not discrete. Chef and Chefu can each extract gold from one mine at a time, and they can instantaneously move from one mine to another at any point in time. Chef always knows where Chefu is, and Chefu always knows where Chef is. Both are always trying to maximize the total amount of gold they themselves get. The task is to predict the total amount of gold which Chef will mine and the total amount of gold which Chefu will mine if both act optimally.

The solution in the editorial goes like this:

Let $X = \sum_i G_i P_i / (P_i + Q_i)$ and $Y = \sum_i G_i Q_i / (P_i + Q_i)$. $X$ and $Y$ are the amounts of gold which Chef and Chefu will get if they are always in the same mine. Clearly, the sum of the amount of gold which Chef and Chefu will mine is equal to the total amount of gold in the mines. Assume that Chef and Chefu are in different mines at some point. If Chef will get at least $X$ units of gold continuing in this state, then Chefu will get no more than $Y$ units of gold this way. Therefore, it is always optimal for Chefu to move to the mine which Chef is in so that Chefu gets $Y$ units of gold and Chef gets $X$ units of gold. Similarly, if Chef will get no more than $X$ units of gold continuing in this state, then it will be optimal for Chef to move to the mine which Chefu is in so that Chef gets $X$ units of gold and Chefu gets $Y$ units of gold. Therefore, if both Chef and Chefu act optimally, they will always be in the same mine, and the answers are $X$ and $Y$.
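For concreteness, here's how the editorial's formulas translate into code (a quick sketch; `Mine` and `editorialAnswer` are my own names, not from the problem):

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

struct Mine { double g, p, q; }; // gold, Chef's rate, Chefu's rate

// The editorial's claimed answer: if Chef and Chefu always sit in the same
// mine, mine i's gold is split between them in proportion to their rates.
std::pair<double, double> editorialAnswer(const std::vector<Mine>& mines) {
    double x = 0, y = 0; // Chef's total, Chefu's total
    for (const Mine& m : mines) {
        x += m.g * m.p / (m.p + m.q);
        y += m.g * m.q / (m.p + m.q);
    }
    return {x, y};
}
```

For example, with mines $(G, P, Q) = (10, 1, 1)$ and $(6, 2, 1)$, this gives $X = 5 + 4 = 9$ and $Y = 5 + 2 = 7$.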

I believe that this proof is invalid, and I will now attempt to demonstrate why. First of all, I'll state how I'm interpreting the editorial's proof a bit more formally:

Assume that at some time $t$, Chef and Chefu are in different mines. Let $X'$ and $Y'$ be the amount of gold which Chef and Chefu will get respectively if Chef and Chefu do not move into the same mine at time $t$. If $X' \geq X$, then $Y' \leq Y$, therefore it is optimal for Chefu to move into the mine that Chef is in, because Chefu will then be able to get $Y$ gold instead of $Y'$ gold. If $X' \leq X$, then it is optimal for Chef to move to the mine which Chefu is in so that he can get $X$ gold instead of $X'$ gold. Therefore, the optimal strategy for both Chef and Chefu is to always be in the mine which the other person is in, and they will get $X$ and $Y$ gold.

This proof assumes that if Chef and Chefu move into the same mine at time $t$, then they will get $X$ and $Y$ units of gold. The problem is that Chef and Chefu can still move to a different mine afterwards in the same instant. Say that at time $t$, $X' \leq X$, so Chef moves into the mine which Chefu is in. The proof assumes that this will result in Chef getting $X$ gold, but it does not rule out the possibility that Chefu moves to a different mine in the same instant. In fact, the claim that it is always optimal for Chefu not to move to another mine immediately after Chef moves is exactly the statement that is supposedly being proved, because the positions of Chef and Chefu at a single instant don't actually matter. A proof of a statement obviously cannot assume that the statement is true.

To think about it another way, the proof in the editorial is claiming that Chef can ensure that he is always in the same mine as Chefu, because he can move into the mine which Chefu is in at any moment. However, using the same logic we can conclude that as long as there are at least two nonempty mines, Chefu can ensure that he is always in a different mine than Chef by moving as soon as Chef moves into his mine. As far as I can tell, if Chef and Chefu are in different mines for any nonzero amount of time, the outcome would change.

So this is why I believe the proof in the editorial for this problem is invalid. I suspect that this problem is actually unsolvable, just like how it makes no sense to try to predict the outcome of a game where one player wants to make a bit 1, the other player wants to make it 0, and both players can instantaneously flip the bit at any moment. I haven't been able to find a rigorous proof of this though. I'm not sure how to even define "optimal" for this problem. What do you think about this? If anyone has a rigorous proof, please share it.

Link to same post on codechef: https://discuss.codechef.com/t/is-the-problem-golmine-invalid/70104?u=feixiang


By BessieTheCow, 14 months ago,

Today I'm going to talk about Kőnig's theorem in graph theory, which states that in a bipartite graph, the size of a maximum matching is equal to the size of a minimum vertex cover. I had a hard time understanding this theorem, since the resources I found required either logical leaps (as in the Wikipedia article and the book it references) or advanced concepts such as linear programming. Therefore, I'm going to try to explain it in a more intuitive way here, both as a review for myself and to help others learn.

Review of relevant concepts

Bipartite graph: A graph where the vertices can be partitioned into two sets such that every edge connects a vertex of one set to a vertex in the other set.

Matching: A subset of the edges in a graph such that no two edges in the subset share an endpoint. A matching with the greatest possible size in a graph is called a maximum matching. A matching is said to match a vertex if the vertex is an endpoint of some edge in the matching.

Alternating path: An alternating path of a matching is a path in the graph where the edges alternate between being in the matching and not being in the matching.

Augmenting path: An alternating path where the vertices at both ends are unmatched.

Vertex cover: A subset of the vertices in a graph which includes at least one endpoint of every edge. A vertex in a vertex cover is said to cover an edge if it is incident to the edge. A vertex cover with the minimum possible size is called a minimum vertex cover.

Proof of the theorem

The proof which I will present also provides an efficient method for finding a minimum vertex cover given a maximum matching. It is based on the Wikipedia article and pages 74-75 of a book that the Wikipedia article referenced.

Lemma 1 (Berge's lemma): Maximum matchings have no augmenting paths. Proof: Suppose we have a matching which has an augmenting path. For each edge in the path, if it is in the matching, we'll remove it from the matching, and if it isn't, we'll add it. The result is a valid matching whose size is 1 greater than what we started with, so the original matching isn't maximum.

Lemma 2: The size of a minimum vertex cover must be greater than or equal to the size of a maximum matching. Proof: Since no two edges in a matching share an endpoint, each vertex in a vertex cover can cover at most one edge of the matching. Therefore a vertex cover cannot have fewer vertices than the number of edges in a maximum matching. This means that if we can construct a vertex cover with size equal to the size of a maximum matching, it will be a minimum vertex cover.

Now I'll show how to construct a vertex cover with the same size as a maximum matching.

Some definitions: Let $V$ be the set of vertices and $E$ be the set of edges in a bipartite graph. Let our graph $G = (V, E)$ be partitioned into two sets of vertices $X$ and $Y$ such that every edge has an endpoint in $X$ and an endpoint in $Y$. Let $M$ be a maximum matching in $G$. Let $U$ be the set of unmatched vertices under $M$ in $X$. Let $Z$ be the set of vertices which can be reached from a vertex in $U$ via an alternating path under $M$ (including the vertices in $U$). All alternating paths that I talk about in this post will start from a vertex in $U$. Let $S = Z \cap X$ and $T = Z \cap Y$, the vertices in $Z$ which are in $X$ and the vertices in $Z$ which are in $Y$ respectively. Let $N(S)$ be the set of vertices adjacent to a vertex in $S$.

Lemma 3: An alternating path from a vertex in $U$ to a vertex in $S$ ends with an edge in $M$, and an alternating path from a vertex in $U$ to a vertex in $T$ ends with an edge not in $M$. Proof: The first edge of an alternating path is not in $M$, since the starting vertex in $U$ is unmatched. Paths from a vertex in $X$ to another vertex in $X$ have even length, and paths from a vertex in $X$ to a vertex in $Y$ have odd length. Since the edges alternate between being in and not being in $M$, even length paths end with an edge in $M$, and odd length paths end with an edge not in $M$.

Lemma 4: $N(S) = T$. Proof: Clearly, $N(S) \supseteq T$, since for every vertex $u \in T$, the previous vertex on an alternating path from a vertex in $U$ to $u$ is in $S$. Now consider a vertex $u \in N(S)$. If $u$ is connected to a vertex $v \in S$ by an edge not in $M$, then the alternating path to $v$ can be extended to $u$ by that edge, and therefore $u$ is in $T$. If $u$ is connected to a vertex $v \in S$ by an edge in $M$, then $u$ must already be a vertex on the alternating path to $v$. This is because the alternating path to $v$ must end with an edge in $M$ by lemma 3, and since $v$ is incident to only one such edge, $u$ must be the previous vertex on the alternating path to $v$. This means that $u$ is also in $T$ in this case. Therefore $N(S) \subseteq T$, and since $N(S) \supseteq T$, $N(S) = T$.

Let $K = (X \setminus S) \cup T$. All edges must have at least one endpoint in $K$. To prove this, suppose there is an edge $e = (u, v)$ such that $u \in X$ and $v \in Y$, and $u, v \notin K$. Then $u \in S$ and $v \notin T$, which contradicts lemma 4 since $v \in N(S)$ but $v \notin T$. Therefore, $K$ is a vertex cover. Now I'll prove that $|K| = |M|$ by proving that all vertices in $K$ are matched by $M$ and no edge in $M$ matches two vertices in $K$. All vertices in $K \cap X = X \setminus S$ are matched, because the unmatched vertices in $X$ are all in $U$ and therefore in $S$. All vertices in $K \cap Y = T$ are matched, because otherwise the alternating path to an unmatched vertex in $T$ would be an augmenting path, contradicting lemma 1. If an edge $e \in M$ had both endpoints in $K$, one endpoint $u$ would be in $K \cap X = X \setminus S$ and the other endpoint $v$ would be in $K \cap Y = T$. By lemma 3, the alternating path to $v$ ends with an edge not in $M$, which means it can be extended to $u$ via $e$. This means $u \in S$, which is a contradiction. Since all vertices in $K$ are incident to an edge in $M$ and every edge in $M$ is incident to exactly one vertex in $K$, $|K| = |M|$, and $K$ is a minimum vertex cover by lemma 2.

Now we have a method to construct a minimum vertex cover in a bipartite graph given a maximum matching. $S$ and $T$ can be found by a simple DFS, so this entire construction takes linear time.
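Here's a sketch of the whole construction, combining Kuhn's augmenting path algorithm for the maximum matching with the alternating-path DFS above. The class and function names are mine. Note that in `altDfs` we can safely iterate over all edges of $x$: the only matching edge out of $x \in S$ leads back to the $Y$ vertex through which $x$ was reached, which is already in $T$.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// X vertices are 0..nx-1, Y vertices are 0..ny-1.
struct BipartiteCover {
    int nx, ny;
    std::vector<std::vector<int>> adj; // edges from X to Y
    std::vector<int> matchX, matchY;   // M as two partial maps, -1 = unmatched
    std::vector<bool> inS, inT;        // S = Z n X, T = Z n Y

    BipartiteCover(int nx, int ny)
        : nx(nx), ny(ny), adj(nx), matchX(nx, -1), matchY(ny, -1) {}

    void addEdge(int x, int y) { adj[x].push_back(y); }

    // Kuhn's algorithm: try to find an augmenting path from x.
    bool augment(int x, std::vector<bool>& used) {
        for (int y : adj[x])
            if (!used[y]) {
                used[y] = true;
                if (matchY[y] == -1 || augment(matchY[y], used)) {
                    matchX[x] = y;
                    matchY[y] = x;
                    return true;
                }
            }
        return false;
    }

    // DFS along alternating paths from an unmatched vertex x in U.
    void altDfs(int x) {
        inS[x] = true;
        for (int y : adj[x])
            if (!inT[y]) {
                inT[y] = true; // y reached by a non-matching edge
                if (matchY[y] != -1 && !inS[matchY[y]])
                    altDfs(matchY[y]); // follow the matching edge back to X
            }
    }

    // Returns K = (X \ S) u T as (vertices of X, vertices of Y).
    std::pair<std::vector<int>, std::vector<int>> minVertexCover() {
        for (int x = 0; x < nx; x++) {
            std::vector<bool> used(ny, false);
            augment(x, used);
        }
        inS.assign(nx, false);
        inT.assign(ny, false);
        for (int x = 0; x < nx; x++)
            if (matchX[x] == -1) altDfs(x); // x is in U
        std::pair<std::vector<int>, std::vector<int>> k;
        for (int x = 0; x < nx; x++)
            if (!inS[x]) k.first.push_back(x);
        for (int y = 0; y < ny; y++)
            if (inT[y]) k.second.push_back(y);
        return k;
    }
};
```

Kuhn's algorithm here is $O(VE)$; with Hopcroft-Karp the matching step drops to $O(E \sqrt{V})$, and the cover construction itself stays linear either way.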

Practice problems

CSA Round #23: No Prime Sum, suggested by limabeans

COCI 19/20 Round 6: Skandi, suggested by -is-this-fft-

PacNW 2016 G: Maximum Islands, suggested by brandonzhang

BAPC 2017 E: Easter Eggs, suggested by brandonzhang

SWERC 2012 F: Sentry Robots, suggested by brandonzhang

Please comment below if you have any questions or if there's a part that I could have explained better. Thanks for reading!


By BessieTheCow, 15 months ago,

You might have seen neal's nice blog post on blowing up std::unordered_map with integer keys: https://codeforces.com/blog/entry/62393. It works because GCC's std::hash<int> simply returns the input value unchanged and std::unordered_map mods the hash with a prime to obtain the index in the hash table. By inserting a bunch of multiples of the prime used, it's possible to cause all the elements to be put into the same bucket which makes operations take linear time.
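As a quick sanity check of those two facts (identity hash plus prime modulus), here's a sketch. This relies on libstdc++ behavior that the standard doesn't guarantee, and `collidingKeys` is my own helper name:

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <vector>

// On libstdc++, std::hash<int> is the identity function, and an element's
// bucket is hash(key) % bucket_count(), where bucket_count() is a prime.
// Hence all multiples of bucket_count() land in bucket 0.
std::vector<int> collidingKeys(const std::unordered_map<int, int>& m, int cnt) {
    std::vector<int> keys;
    for (int i = 1; i <= cnt; i++)
        keys.push_back(static_cast<int>(i * m.bucket_count()));
    return keys;
}
```

A real hack case would of course query `bucket_count()` for the table sizes the victim's map will actually pass through as it grows, rather than a fixed one.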

std::unordered_map and std::unordered_set are commonly used with strings, so I figured out a way to do something similar by generating tons of strings which all hash to the same value under std::hash<std::string>.

The hashing algorithm

I dug into the libstdc++ source code on GitHub to see how strings are hashed. std::hash<std::string> is in <string>, which includes <bits/basic_string.h>. std::hash<std::string> is defined there to call std::_Hash_impl::hash on the string. std::_Hash_impl is defined in <bits/functional_hash.h>, and it calls std::_Hash_bytes with a fixed seed of 0xc70f6907. _Hash_bytes is defined in hash_bytes.cc, where we can finally find the actual hashing algorithm.

When compiling for 32-bit, std::_Hash_bytes implements the 32-bit MurmurHash2. Here's an equivalent implementation:

uint32_t MurmurHash2(const void* key, int len, uint32_t seed)
{
    // 'm' and 'r' are mixing constants generated offline.
    // They're not really 'magic', they just happen to work well.

    const uint32_t m = 0x5bd1e995;
    const int r = 24;

    // Initialize the hash to a 'random' value

    uint32_t h = seed ^ len;

    // Mix 4 bytes at a time into the hash

    const unsigned char* data = (const unsigned char*)key;

    while (len >= 4)
    {
        uint32_t k = data[0];
        k |= data[1] << 8;
        k |= data[2] << 16;
        k |= data[3] << 24;

        k *= m;
        k ^= k >> r;
        k *= m;

        h *= m;
        h ^= k;

        data += 4;
        len -= 4;
    }

    // Handle the last few bytes of the input array

    switch (len)
    {
    case 3: h ^= data[2] << 16;
    case 2: h ^= data[1] << 8;
    case 1: h ^= data[0];
            h *= m;
    };

    // Do a few final mixes of the hash to ensure the last few
    // bytes are well-incorporated.

    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;

    return h;
}


Notice that the data is processed in blocks of four bytes. Each block is hashed individually before being mixed into the hash of the previous blocks.

One method to generate collisions for MurmurHash is described here: https://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/. However, it generates sequences of arbitrary bytes. After an exhaustive test of all possible four byte blocks, I found that it isn't possible to use that method to generate strings which contain only lowercase letters. That attack also generates collisions which work regardless of the seed value, which is overkill here since the seed is fixed. By taking advantage of the fixed seed, I was able to devise my own attack that generates strings with only lowercase letters.

My attack

Firstly, we can ignore the part that handles strings whose lengths aren't multiples of four by only using strings whose lengths are multiples of four. Notice that the algorithm xors the hash of the last block with the hash of the previous blocks and then does some mixing to produce the final hash. The other attack that I linked to takes advantage of the fact that the process of hashing a block to generate k can be fully inverted like this:

uint32_t invert(uint32_t x)
{
    x *= 0xe59b19bd;
    x ^= x >> 24;
    x *= 0xe59b19bd;
    return x;
}


This means that we can control the value of h after the last block is mixed in (just before the final mixes) by choosing the hash of the last block using the inverse function above. We are, however, restricted to lowercase ASCII bytes, so only around 1 in 10000 of the 2^32 possible values can be produced. I wrote some code that randomly generates eight character strings, computes the partial hashes of those strings, and then checks whether the required last block contains only lowercase ASCII bytes. Using this code I can quickly generate 40000 twelve character strings which all hash to 0. Here's the collision generator code:

#include <iostream>
#include <fstream>
#include <string>
#include <random>
#include <cstdint>
#include <cstdlib>
#include <cctype>
#include <set>
using namespace std;

constexpr uint32_t m = 0x5bd1e995u;
constexpr int r = 24;

mt19937 rng((random_device()()));
uniform_int_distribution<> dist(0, 25);

uint32_t invert(uint32_t x) //inverse of process to generate k
{
    x *= 0xe59b19bd;
    x ^= x >> 24;
    x *= 0xe59b19bd;
    return x;
}

bool good(uint32_t x) //tests if a value contains only lowercase letter bytes
{
    for (int i = 0; i < 4; i++, x >>= 8)
    {
        unsigned char b = x;
        if (!islower(b))
        {
            return false;
        }
    }
    return true;
}

string genstr(int length) //randomly generate a string
{
    string s;
    s.reserve(length);
    for (int i = 0; i < length; i++)
    {
        s += dist(rng) + 'a';
    }
    return s;
}

uint32_t hsh(const string& s, uint32_t h) //computes partial hash of the string
{
    const char* data = s.data();
    int len = s.size();

    while (len >= 4)
    {
        uint32_t k = data[0];
        k |= data[1] << 8;
        k |= data[2] << 16;
        k |= data[3] << 24;

        k *= m;
        k ^= k >> r;
        k *= m;

        h *= m;
        h ^= k;

        data += 4;
        len -= 4;
    }
    return h;
}

int main(int argc, char** argv)
{
    if (argc != 4)
    {
        cout << "usage: " << argv[0] << " blocks count outputfile\n";
        return 0;
    }

    int blocks = atoi(argv[1]); //number of random blocks
    if (blocks < 1)
    {
        cout << "block count must be positive\n";
        return 0;
    }

    int len = 4 * blocks + 4; //total length
    uint32_t seed = 0xc70f6907u ^ len;

    int totalcnt = atoi(argv[2]);

    ofstream fout(argv[3]);
    if (!fout.is_open())
    {
        cout << "cannot open output file\n";
        return 0;
    }

    set<string> results; //filters out duplicates
    int prevprogress = 0; //progress when the previous string was found
    while (static_cast<int>(results.size()) < totalcnt)
    {
        string s = genstr(4 * blocks);
        uint32_t x = invert(hsh(s, seed) * m); //block value needed to make hash value constant
        if (good(x))
        {
            s.reserve(len);
            for (int i = 0; i < 4; i++, x >>= 8) //convert last block into characters
            {
                s += x & 0xffu;
            }

            if (results.insert(s).second)
            {
                //print progress
                int progress = static_cast<int>(results.size()) * 10 / totalcnt;
                if (progress != prevprogress)
                {
                    cout << progress << "0%\n";
                    prevprogress = progress;
                }
            }
        }
    }

    for (const string& s : results) //print results
    {
        fout << s << '\n';
    }
}

Sample of the output with 40000 collisions

With the 256 KB limit on the size of hack cases, we can fit around 18000 of these strings in one test. The following test program takes 1185 ms on 18000 collision strings, compared to less than 1 ms on the same amount of random strings.

#include <iostream>
#include <string>
#include <unordered_set>
using namespace std;

int main()
{
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    unordered_set<string> x;
    for (string s; cin >> s;)
    {
        x.insert(s);
    }
    cout << x.size() << '\n';
}


This isn't much, but with some more operations on the set it should be enough to cause a TLE. I've tried to come up with ways to fit more strings into one test case, but the generator isn't fast enough to be submitted directly, and the 20 KB limit on the size of the generator means that I can't compress the strings and embed them into a program. If you can think of ways to cram more strings into a test case, or you know of a problem in a recent contest which I can test this hack on, please tell me.

Extension to 64-bit GCC and MSVC

64-bit GCC uses a 64-bit version of the same hash function, so the same attack also works there. The only difference is that the fraction of blocks which contain only lowercase ASCII bytes is much smaller, at around 1 in 100 million, since blocks are eight bytes long. This means that generating 40000 collisions would take a few days, although using multiple threads to take advantage of multiple processor cores could shorten that a bit.

MSVC uses the FNV-1a hash. It xors each byte into the hash state and then multiplies the state by a prime. A similar attack can be used, where we generate a random string and then test whether it is possible to manipulate the last few characters in order to control the hash. For example, we can precompute a set of all partial hash values from which it is possible to make the final hash 0 by appending a few letters. Then we generate random strings and check if their partial hashes are in the set.
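For reference, here's a sketch of the 64-bit FNV-1a structure described above, using the standard FNV-1a offset basis and prime; I haven't verified that MSVC's implementation matches this byte-for-byte, so treat it as an illustration of the scheme rather than a drop-in replica:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// FNV-1a, 64-bit: xor each byte into the state, then multiply by the FNV
// prime. Since the prime is odd, each step is invertible mod 2^64, which is
// what makes the suffix-manipulation attack described above possible.
uint64_t fnv1a(const std::string& s) {
    uint64_t h = 14695981039346656037ULL; // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL; // FNV prime
    }
    return h;
}
```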

Protecting against this attack

The best way is probably to use another hash function with a randomized seed. Beware of similar vulnerabilities in other hash functions. The authors of the other attack that I mentioned wrote their own hash function which they claim is more secure: https://131002.net/siphash. Note that mixing a random constant into the output of std::hash<std::string> won't work, because at that point the hashes have already collided and are equal.