minimario's blog

By minimario, history, 8 years ago, In English

Here's the problem: here.

My submission got an AC, sure, but it ran in O(N^2 log(10^6)), and it took >2000 ms to run (even with optimised MOD). But there are so many submissions, which I think are the same complexity, which run so fast!

For example, just look at the first 2 here!

The first one, I don't understand, I think it uses some magic instructions (wtf, 100'000 instead of 100000 and mod_t!!?) (by ordcoder). The second one, by mmaxio is even in Java, and I'm almost completely sure it's the same O(N^2 log(10^6)) algorithm as everyone else.

So how did these people (or anyone else with some fast submission) get their runtime down so low?

Can someone elaborate?

Edit 1: Thanks to ned_chu, I've replaced the inverse modding to be linear. My new submission is here

Thanks,

minimario

  • Vote: I like it
  • +15
  • Vote: I do not like it

»
8 years ago, # |
Rev. 2   Vote: I like it +16 Vote: I do not like it

Initializing mod inverse in you solution cost , but it is linear in Java solution. Actually, my teammate did things like this 2 days ago and got TLE and get AC after change mod inverse to online version. You can calculate mod inverse online instead of initializing it. Or you can initial it in linear time like that Java solution. Considering upper bound of mod inverse usage, it is always faster to calculate inverse online on single-case OJ like CF if you have N of 105 or 106. If it's multi-case OJ, it may be better to initialize mod inverse. You can learn how to initialize mod inverse in linear time. Maybe you need this, just skip all Chinese and go to about 60% this article, there some formulas about linear time mod inverse initialization.

  • »
    »
    8 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thanks, it shaved about 100 ms. off my time. But I guess it doesn't really optimise it that much. New submission link here.

»
8 years ago, # |
  Vote: I like it +29 Vote: I do not like it

mod operations are extremely costly. Inside your bottleneck loop, you have four (!) of them; that Java solution has zero.

(Incidentally, the new fastest solution is just your code with the modulos removed. :P)

»
8 years ago, # |
  Vote: I like it +24 Vote: I do not like it

The first one, I don't understand, I think it uses some magic instructions (wtf, 100'000 instead of 100000 and mod_t!!?)

"100'000" is one of C++14 features. https://en.wikipedia.org/wiki/C%2B%2B14#Digit_separators

  • »
    »
    8 years ago, # ^ |
      Vote: I like it +21 Vote: I do not like it

    And "using mod_t = uint64_t" is a C++11 synonym for "typedef uint64_t mod_t".