imachug's blog

By imachug, history, 3 months ago, In English

Unfortunately, most derivations of the chance of polynomial hashing collision are invalid, wrong, or misleading, and finding reliable public sources with proofs is incredibly difficult. This article is a formal analysis of the method. The goal of this article is to complement well-known empirical facts with theory, provide boundaries on the probability of collision, justify common choices, and advise against certain popular parameters.

There is no code in this article, and it's mostly theoretical. It's more of a summa of everything we know about polynomial hashing in integers rather than a guide for beginners. You'll probably find something of interest still. I do, however, describe some methods of cracking hashes, which I can provide code and additional explanation for if someone asks in the comments and some general guidelines in the last section of the present article.

Table of contents

  • The concept of polynomial hashing
  • Classical justification of the coprimality requirement
  • The pitfall
  • Probability of size-bounded collision
  • Probability of value-bounded collision
  • Back to basics
  • Conditions of surjectivity
  • Handling lengths
  • Side note on coprimality
  • The two scenarios
  • Reversal
  • Thue-Morse sequence
  • Attempt to reuse Thue-Morse sequence for other moduli
  • Birthday paradox
  • Lower bounds for anti-hash tests
  • Particularities
  • Key takeaways

Read more »

 
 
 
 
  • Vote: I like it
  • +308
  • Vote: I do not like it

By imachug, history, 3 months ago, In English

I'd like to share this one unpopular but awesome algorithm that I only heard a mention of once. Kapun's algorithm finds a hash collision for moduli as large as $$$10^{18}$$$. I know, we have lots of algorithms that do much better than that (and I'm writing an article on that at the moment, keep tuned), but Kapun's algorithm is really surprising in its simplicity. That its correctness is so difficult to prove is of more surprise even.

Here we go.

A polynomial hash of a string $$$s$$$ is defined as

$$$ H(s) = s_0 b^0 + s_1 b^1 + s_2 b^2 + \dots + s_{n-1} b^{n-1} \pmod M. $$$

We want to find two strings with the same polynomial hash using only two characters: $$$0$$$ and $$$1$$$.

We can reformulate this problem in another way. Let $$$a_i = b^i \mod M$$$, then we want to find two distinct subsets of $$$a$$$ with the same sum modulo $$$M$$$. Now forget about the modulus: let's just find two subsets with the same sum.

Firstly, when is this possible? There are $$$2^n$$$ possible subsets and $$$n(M - 1) - n + 1 = n (M - 2) + 1$$$ distinct sums. If $$$2^n > n (M - 2) + 1$$$, that is, $$$\frac{2^n}{n} \gg M$$$, there are certainly two subsets with the same sum. Generating $$$M$$$ or even just $$$\sqrt{M}$$$ strings (if you use birthday paradox) is impossible for large $$$M$$$, but there's still a way through.

It turns out there is a deterministic algorithm that attempts to solve this problem that is very easy to implement!

Read more »

 
 
 
 
  • Vote: I like it
  • +246
  • Vote: I do not like it

By imachug, history, 6 months ago, In English

The first day of IATI 2021 has finished about half an hour ago. Please feel free to discuss the competition here!

Problems

Results

Editorials

^ Please feel free to provide links when they are available

Read more »

 
 
 
 
  • Vote: I like it
  • +130
  • Vote: I do not like it

By imachug, 9 months ago, In English

This is not an exit.

I am looking at the post called Is Mike Mirzayanov dictator? at the moment. I'm reading through the comments, I'm looking at people's reactions, I'm reading Mike's replies, and... I am utterly disappointed.

I demand progress.

For context, I do believe Codeforces is the best website for competitive programming contests at the moment. I sincerely give credit to Mike for creating Codeforces and providing participants and contest authors a platform for interaction completely for free. I am confident that Codeforces is the most appropriate website for being the platform for sharing knowledge, exchanging tricks and methods, and collaboration between both contest participants, their authors, and coordinators.

Please consider this an open letter, for this is a will not of a single person, but of many individuals. Please consider signing it by stating so in a comment under the post, should you incline to do so.

Read more »

 
 
 
 
  • Vote: I like it
  • +527
  • Vote: I do not like it

By imachug, history, 12 months ago, translation, In English

image

Polygon is not world's greatest evil. Polygon doesn't even have a particular defect that makes it horrible. You can't say what's exactly wrong with Polygon. Polygon seems to work, it seems like every feature is supported, but if you touch it here it will fall apart, and if you touch it there a problem (pun not intended) will appear. Or not, depending on your luck. Polygon is like PHP. For those who haven't read the famous rant, I'll cite it for you:

I can't even say what's wrong with PHP, because-- okay. Imagine you have uh, a toolbox. A set of tools. Looks okay, standard stuff in there.

You pull out a screwdriver, and you see it's one of those weird tri-headed things. Okay, well, that's not very useful to you, but you guess it comes in handy sometimes.

You pull out the hammer, but to your dismay, it has the claw part on both sides. Still serviceable though, I mean, you can hit nails with the middle of the head holding it sideways.

You pull out the pliers, but they don't have those serrated surfaces; it's flat and smooth. That's less useful, but it still turns bolts well enough, so whatever.

And on you go. Everything in the box is kind of weird and quirky, but maybe not enough to make it completely worthless. And there's no clear problem with the set as a whole; it still has all the tools.

Now imagine you meet millions of carpenters using this toolbox who tell you “well hey what's the problem with these tools? They're all I've ever used and they work fine!” And the carpenters show you the houses they've built, where every room is a pentagon and the roof is upside-down. And you knock on the front door and it just collapses inwards and they all yell at you for breaking their door.

That's what's wrong with PHP.

Read more »

 
 
 
 
  • Vote: I like it
  • +180
  • Vote: I do not like it

By imachug, history, 13 months ago, In English

This is the tenth time I stumble upon a controversial blog, write a large comment that'd be useful both to the OP and other people, and when I finally post the comment Codeforces tells me the post is deleted and my giant rant doesn't get saved. I usually breath in and out and get over it, but this time I figured out I have spare contribution to post a complaint I can share my thought with the community.

It'd be great if comments were saved to drafts just like posts. This would be useful in case of network problems or power failure too, and would just improve UX overall.

Read more »

 
 
 
 
  • Vote: I like it
  • +45
  • Vote: I do not like it

By imachug, history, 13 months ago, In English

Hello, Codeforces!

A few days ago MohammadParsaElahimanesh posted a blog titled Can we find each Required node in segment tree in O(1)? Apparently what they meant was to find each node in $$$\mathcal{O}(ans)$$$, according to ecnerwala's explanation. But I was too dumb to realize that and accidentally invented a parallel node resolution method instead, which speeds up segment tree a lot.

A benchmark for you first, with 30 million RMQ on a 32-bit integer array of 17 million elements. It was run in custom test on Codeforces on Apr 6, 2021.

  • Classic implementation from cp-algorithms: 7.765 seconds, or 260 ns per query
  • Optimized classic implementation: (which I was taught) 4.452 seconds, or 150 ns per query (75% faster than classic)
  • Bottom-up implementation: 1.914 seconds, or 64 ns per query (133% faster than optimized)
  • Novel parallel implementation: 0.383 seconds, or 13 ns per query (400% faster than bottom-up, or 2000% faster than classic implementation)

Read more »

 
 
 
 
  • Vote: I like it
  • +402
  • Vote: I do not like it

By imachug, history, 15 months ago, In English

Hi, Codeforces!

I've just made a script that publishes all new posts (and their updates) on Codeforces to a Telegram channel, with formatting and stuff.

The notifications should are quite fast, they should propagate in less than 5 minutes.

Enjoy!

Read more »

 
 
 
 
  • Vote: I like it
  • +63
  • Vote: I do not like it

By imachug, history, 16 months ago, translation, In English

Hello, everyone! You might have noticed that Codeforces has changed the logo to a new one temporarily, but it seems the admins can't decide what is better. Let's vote on the logo and see what the community itself likes! Upvote the comment for the logo you like below.

Read more »

 
 
 
 
  • Vote: I like it
  • +214
  • Vote: I do not like it

By imachug, history, 16 months ago, In English

Hello, Codeforces!

The current state of affairs is that running your own a contest on Codeforces is quite difficult. First, Codeforces doesn't accept single-problem proposals, so if you don't have many CP friends and can't make 6 good problems, you're out of luck. Second, many trash problems get proposed, and the coordinators have to spend much time filtering them out and then explaining why these problems were rejected. This status quo is bad for everyone, both participants and problem setters, so I'm thinking of a way to fix the situation.

Here is my idea. People generate trash problems because, when just a single of their problems is rejected, the whole contest is in danger, so problem setters 'bruteforce' problems to plug the hole. What if we help problem setters to propose just a single problem, and then problems from different people could be merged into a contest? This would reduce the fraction of bad problems because setters won't propose them just to fill holes.

The question is about bypassing the bus factor—coordinators. I think there are many 2300+/red users who could help moderate problem proposals. There are many people from Div 1 who like competitive programming but write Codeforces rounds not very often, perhaps because of Codeforces style or timing. I think such people would be ideal moderators: they have experience, they won't cheat, they are ready to help and they're interested in looking 'behind the scenes'.

As a bonus, testers who would like to test rounds but they don't have friends—problem setters, will be able to take part in the process. And these testers aren't necessarily high-rated participants — it's quite common that neither contest authors, nor testers see a simple problem solution based on, say, greedy algorithms, or the tests are too weak, and this is only noticed by experts during the round.

In TL;DR form, the idea is to help testers, moderators and problem setters find each other to reduce the burden on Codeforces coordinators and run quality rounds more often.

What do you think?

Read more »

 
 
 
 
  • Vote: I like it
  • +66
  • Vote: I do not like it

By imachug, history, 18 months ago, In English

I'm interested in non-classic problems here on Codeforces, so I've looked through the problems with special tag. The ones with non-standard format are:

  1. Problems that can only be solved in a single special language, such as Q# or a secret language. Example: a single problem 409B - Загадочный язык or the whole Kotlin Heroes 5: ICPC Round contest.
  2. Problems with a stateful checker, which detects how many submissions one has made. Example: 409H - A + B наносит ответный удар.

I also vaguely remember a problem where one had to output his username with some limitations on the source code, but it could be just a comment on a blog.

Anyway, I can't find out how to create any similar problem on Polygon. Obviously there's a whitelist of supported languages, but what about allowing the user to add an interpreter for the language in C? Or allowing the checker to read the original code, not its output? I'm interested how this is implemented for the official contests and if I can do that in a gym.

As for the second type, it'd be useful if the checker could get meta-information such as the user handle or ID, and access a local DB or use network as a state store. I couldn't find any sane documentation so I'm asking here.

Read more »

 
 
 
 
  • Vote: I like it
  • +134
  • Vote: I do not like it