Блог пользователя Radewoosh

Автор Radewoosh, история, 3 года назад, По-английски

Hello Codeforces!

As all of you know, there are so many known algorithms named after people who invented them — from the easiest ones, like Dijkstra, to the harder ones, like Berlekamp–Massey algorithm. These algorithms were innovative when they were invented, so of course, it's good that they are named after their inventors.

But, today, I've seen a blog about the solution to the problem "compute LCS of two strings in time $$$O((n + k) \cdot \log(k))$$$, where $$$n$$$ is the sum of lengths of the strings, and $$$k$$$ is the number of pairs of matching positions in them". To be honest, it a bit pissed me off that even this algorithm is named after its creators. Is it ok to name the solution to every possible problem after its author? I won't judge it. Anyway, I want my very own Radecki algorithm, so let me give it a try. If anyone has ever heard about it — it's cool. Let me know, and it'll be just another helpful blog on Codeforces.

Let's imagine the following problem: there is a directed graph on $$$n$$$ vertices without any edges and a list of $$$m$$$ directed edges which we want to add to this graph one by one. After adding each edge, we want to have full information about the strongly connected components. Know the number of strongly connected components, be able to tell whether two vertices are in the same connected component or even make this information persistent. Critical condition: the list of edges is given in advance.

So, for a few years, I'm able to solve this problem, but people told me that it's rather boring and it's not worth to write a paper about it. Anyway, let me give it a try.

Let's use the divide and conquer technique on the list of the edges. We can also think about it like about divide and conquer over the timeline. So, let's assume that we have some function $$$rec(a, b, edges)$$$, where $$$a$$$ is the start of the time interval, $$$b$$$ is the end (we can assume that both are inclusive, doesn't matter, I'm still pissed a bit) and $$$edges$$$ is the list of edges together with the times when they're added. We start the process like $$$rec(0, m-1, everything)$$$.

Everytime we should look at the middle of the time interval, so let's compute $$$s=\lfloor \frac{a+b}{2} \rfloor$$$. Let's use the standard offline algorithm to calculate strongly connected components in the graph which consists only of the edges from $$$edges$$$ which were added before $$$s$$$ or exactly at $$$s$$$. We want to split recursively somehow, but how to do it to keep good complexity? It turns out that in the moments after $$$s$$$ all the SCC that we've just calculated will stay connected. So, we can merge whole components into single vertices and only pass to the right edges connecting vertices from different strongly connected components. Also, the edges that are now connecting different connected components were useless in the past, which means that we can pass to the left the rest of the edges. In this way, each edge will occur once at every level of the recurrence, there are $$$O(\log(m))$$$ levels, and in each level the overall complexity is linear, thus we have $$$O(m \cdot \log(m))$$$ complexity (if we implement it carefully enough to have no extra logs).

What should we take from the recurrence? The simplest idea: at each call of the $$$rec$$$ function, for each edge, push some information to some global data structure. My idea is to for each edge that connects vertices from the same strongly connected component push information "these two vertices are in the same strongly connected component at this time". This gives us something which we can interpret as a graph of undirected weighted edges. It's easy to see that we are only interested in its MST (the graph has $$$n$$$ vertices and $$$O(m \cdot \log m)$$$ edges with weights from $$$0$$$ to $$$m-1$$$, we can calculate it's MST in $$$O(m \cdot \log(m))$$$ time).

What can we do now with this information? We can answer queries like "What's the moment when vertices $$$a$$$ and $$$b$$$ start belonging to the same strongly connected component?" Yep, that's the maximum value on the path between these two vertices.

Clear? Clear.

Cool? As f**k.

Easy to implement? So so, easy to copy-paste and use. My implementation (without computing MST, just getting the list of undirected edges) has 72 lines.

New? That's a tough question. I know that it was on the prime new year contest 2018, which took place between 2018 and 2019, which tells that this problem was created in 2018. Even if the intended solution is the same as mine, here comes a twist: link. This is the problem authored by me in July 2017. I know, just a pdf is worth nothing, but it's possible to dig deeper and believe me: the solution implemented here is the same as described.

Lest thoughts: the resulting tree is really similar to Gomory-Hu tree (because of minimums/maximums on the paths).

So, do you know any existing papers about this algorithm? If no, I claim that it'll be called Radecki algorithm, Radecki's algorithm, or something, it doesn't matter, it's mine XD. The resulting tree can even be called Radectree. Let me know if you know any sources about it and if (in your opinion) it's worth writing a paper about it (I hope that this blog is enough to sue anybody if he'll write this paper before me) as I don't know much about this scientific world.

  • Проголосовать: нравится
  • +543
  • Проголосовать: не нравится

»
3 года назад, # |
  Проголосовать: нравится +85 Проголосовать: не нравится

Five minutes (not enough to read it) after posting this blog it has a score of -4. Is it very known algorithm, but it happens that I've never heard about it or what? The blog is cool and educative anyway imo, what's going on?

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится -63 Проголосовать: не нравится

    People were waiting for me to upvote first, here you go!(jking) But the blog deserves upvote since this is indeed educational for me.

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +354 Проголосовать: не нравится

    I looked on your post for whole 5 seconds and couldn't immediately understand the core idea of the algorithms. There are also no pictures and very little fancy $$$\LaTeX$$$ formulas. Bad post.

  • »
    »
    3 года назад, # ^ |
    Rev. 8   Проголосовать: нравится -28 Проголосовать: не нравится

    Don't bother, the people downvoting are either doing it for fun or they have 100000 rating on some random website that they read and disproved your algorithm in five minutes.

»
3 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

I wont be surprised if people are again fighting over whether to name the solution to the problem on its authors or not instead of talking about other aspects of the blog

»
3 года назад, # |
Rev. 2   Проголосовать: нравится +84 Проголосовать: не нравится

Excuse the stupid question, but what does LSC stand for?

»
3 года назад, # |
  Проголосовать: нравится +427 Проголосовать: не нравится

Waiting for someone to comment "This is a well known algorithm in China since 2011."

»
3 года назад, # |
  Проголосовать: нравится +17 Проголосовать: не нравится

So, I've got an information that the mentioned problem from prime new year contest was authored by Yuhao Du. jqdai0815: was your solution the same, or did you really want people to implement the solution from mentioned paper?

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +38 Проголосовать: не нравится

    This was also an open cup problem in Grand Prix of Zhejiang in September 2018 and based on this comment, the intended solution was the same as yours.

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +85 Проголосовать: не нравится

    Same as yours. I came up with this problem individually. Actually, I didn't even read the paper I mentioned.

»
3 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

So, next step is an article on arxiv.org.

»
3 года назад, # |
  Проголосовать: нравится +31 Проголосовать: не нравится

Recursive part of the algo strongly resembles the $$$O(n \log n)$$$ offline algorithm for dynamic connectivity problem, by the way.

»
3 года назад, # |
  Проголосовать: нравится +53 Проголосовать: не нравится

It seems as though something very similar was mentioned in this paper from 2016 here.

»
3 года назад, # |
  Проголосовать: нравится +9 Проголосовать: не нравится

Somehow in a quilted way in this blog, there is:

If I solved a problem, and my solution was completely new to me and I had no prior knowledge about any idea I used in the solution, then this solution should be named after me.
  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Isn’t the LCS case just like it (+ way more boring)?

    • »
      »
      »
      3 года назад, # ^ |
        Проголосовать: нравится +10 Проголосовать: не нравится

      The difference is time. If you discovered something easy, then it better be when the topic is new to get your name on it. It seems that LCS algorithm was published in the 70s.

    • »
      »
      »
      3 года назад, # ^ |
        Проголосовать: нравится +13 Проголосовать: не нравится

      Recently, I dipped into computer graphics and was surprised to learn that prefix sums are known there by confusing names like "SAT" and "integral images" and sometimes even assumed to be discovered by Franklin Crow. This gave me two things to reflect on:

      1. Emergence of catchy names is out of anyone's control.
      2. The presumed "inventor" almost surely did something different. In case of Crow, it was about unexpected discovery of fast and easy method for texture filtering, not about prefix sums.

      (Not sure about LCS algorithm you mentioned, though.)

»
3 года назад, # |
  Проголосовать: нравится +33 Проголосовать: не нравится

I'd expect something like
This is my algorithm for offline incremental strongly connected components in O(m*log(m)). There are many others like it, but this one is mine...

»
3 года назад, # |
Rev. 2   Проголосовать: нравится +28 Проголосовать: не нравится

Here is an article from 2008 which works online for amortized $$$O(\sqrt{m})$$$ per addition. The only reason I know about it is that the following problem was in the Hello Barcelona 2018, Day 3 (link-cut contest):

statement

Our team (and probably many others, maybe it was an official solution lol, I don't remember) implemented something like you write here (if anyone is interested for some reason: code)

If I'm not mistaken, ifsmirnov (last active 6 months ago) prepared this contest. Probably it's worth asking him about the source of this problem and the history of the solution in general

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    That’s the one from the prime new year contest. I also know about the paper, but I don’t know this algorithm, so I don’t know if it’s usable on contests (also my complexity is better).

    • »
      »
      »
      3 года назад, # ^ |
        Проголосовать: нравится +10 Проголосовать: не нравится

      I'm implementing Tarjan08 now, and my impression is that the method used to elimnate the log factors are monstronous. I think the "Soft-threshold search" will be practically slower than heap. Though, I'm not done implementing it and I do it only in the free time, so not a very good position to talk about it.

»
3 года назад, # |
  Проголосовать: нравится +225 Проголосовать: не нравится

Are you drunk? What's up with all that "I'm so cool, look at me" bullshit?

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +9 Проголосовать: не нравится

    YES SLAY UM_NIK — CANCEL RADEWOOSH

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +107 Проголосовать: не нравится

    I’m not (as far as I know). To be honest, seriously, for a few years, I was wondering if I missed my opportunity to write a cool paper about something useful. From the comments, I’ve learned that it wasn’t that innovative. I’ll try with something different sometime in the future. If at the same time I’m teaching people about cool (in my opinion) tasks and tricks, then it doesn’t look that bad.

    I don’t think that I’m that cool, don’t worry. Maybe I’m kind of mixing emotions and thoughts from outside of Codeforces (like private conversations with friends involved in CP) with things here, but I’m not sure if it’s that bad. It’s this kind of bullshit about being yourself or something. I don’t want to involve in long discussions about if CP should be more or less formal, but sometimes showing some emotions doesn’t sound like a crime. The emoticons are a good example. For sure, a lot of us use some of them (like XD or :D) in private chats, while here it’s not something frequent. It’s understandable, it’s not a chat with friends, but it isn’t an email chat with your boss also. We’re mostly relatively young people, so I don’t think that we should be deadly serious all the time.

    About the way to write educative blogs: For sure (at least for me), the less formal explanation, the easier to understand it. Here I mean papers as the hardest to understand and editorials/blogs as the easiest ones. If in your opinion it’s so shitty that it cannot be understood, then I’m sorry for wasting your time.

    • »
      »
      »
      3 года назад, # ^ |
        Проголосовать: нравится +27 Проголосовать: не нравится

      I feel that the format of asking a rhetorical question every line and then giving an answer can feel annoying at times (in the second half of the blog). Maybe once or twice would be good but other than that your blog is fine.

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +75 Проголосовать: не нравится

    Why are you jealous over Radewoosh explaining his cool and useful trick?

»
3 года назад, # |
  Проголосовать: нравится +34 Проголосовать: не нравится

Why do you bother having your algorithm when there is Polish Mafia Algorithm :)

On your question, I recommend you to check out ONTAK 2009 Godzilla.

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +10 Проголосовать: не нравится

    I know this problem, but the intended solution strongly uses the fact that we are only interested in the SCC with 0 indegree. I think that it can be solved in almost linear time (possibly with DSU).

»
3 года назад, # |
  Проголосовать: нравится +8 Проголосовать: не нравится

Not the MST idea but everything about divide-and-conquer can be found in Burunduk1's thesis from 2012 here (only in Russian I guess).

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    They are different algorithms. Linked thesis does the optimization by repeated shrinking, while Radewoosh's algorithm is a variant of parallel binary search.