No TLE for 10^9 operations solutions??

imposter_syndrome's blog

By imposter_syndrome, history, 5 years ago, in English

Hi everyone, I have been seeing this on Codeforces for some days now. In yesterday's educational round, I submitted this solution to problem D: https://codeforces.com/contest/1107/submission/49013323. Later I realized that it takes at least 5040*5040*60 operations in the worst case. After the contest, I tried hacking my own solution, but it passed. I also think test case 20 has the same worst-case input (all 0's, with only the last element of the last row set to 1), but it passes there in 732 ms. In my local environment it takes more than 4 s.

Also in CF#534, I tried hacking this O(n^2) solution, https://codeforces.com/contest/1104/submission/48743647, which does more than 10^9 operations in the worst case (an ababa....aa..bababababa string). But it passed in 640 ms. From my understanding, 1 s is enough for approximately 10^8 operations.

Has Codeforces upgraded their testing servers, or am I missing something here?

Comments (18)

eisen_shinzou

5 years ago

I don't know about your code (I didn't even open it), but I used a bitset, which divides the number of operations by 32. And btw, in the worst case it takes 5200*5200*48 (not 60).
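To illustrate the bitset point, here is a minimal sketch, not the code from any submission in this thread. It assumes the matrix rows are stored as `std::bitset` values; `MAXN`, `same_row` and `differing_cells` are hypothetical names chosen for the example. Comparing or XOR-ing two bitsets typically works on whole machine words (32 or 64 bits at a time, depending on the implementation) rather than on single cells.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed upper bound on the matrix side (hypothetical constant for this sketch).
constexpr std::size_t MAXN = 5200;

// Hypothetical helper: true if rows r1 and r2 of a 0/1 matrix are identical.
// operator== on std::bitset compares packed words, not individual bits.
bool same_row(const std::vector<std::bitset<MAXN>>& rows,
              std::size_t r1, std::size_t r2) {
    assert(r1 < rows.size() && r2 < rows.size());
    return rows[r1] == rows[r2];
}

// Hypothetical helper: number of columns in which the two rows differ.
std::size_t differing_cells(const std::vector<std::bitset<MAXN>>& rows,
                            std::size_t r1, std::size_t r2) {
    assert(r1 < rows.size() && r2 < rows.size());
    return (rows[r1] ^ rows[r2]).count();
}
```

A per-cell loop over the same rows would perform up to 5200 comparisons per row pair, while the packed version touches a few hundred words at most.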

imposter_syndrome

5 years ago

+16

5040 has 60 divisors, so it takes 1524096000 operations (+ some other computations) :) Meanwhile 5200 has 30 divisors and takes 811200000 operations. The worst case is 5040, from my understanding.
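The arithmetic above can be checked with a short sketch. It assumes the cost model stated in the post, one pass over the n x n matrix per divisor of n; `count_divisors` and `estimated_ops` are hypothetical helper names, not taken from the submission.

```cpp
#include <cassert>
#include <cstdint>

// Number of divisors of n by trial division (cheap for n around 5000).
int count_divisors(int n) {
    assert(n > 0);
    int cnt = 0;
    for (int d = 1; d * d <= n; ++d) {
        if (n % d == 0) {
            // d and n / d are two divisors unless they coincide.
            cnt += (d * d == n) ? 1 : 2;
        }
    }
    return cnt;
}

// Assumed cost model: n * n cell visits for each divisor of n.
std::int64_t estimated_ops(int n) {
    return static_cast<std::int64_t>(n) * n * count_divisors(n);
}
```

Under this model `estimated_ops(5040)` is 1524096000 and `estimated_ops(5200)` is 811200000, matching the figures quoted in the comment.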

eisen_shinzou

5 years ago

-70

OK. Anyway, you don't need to post a blog for such things!!!

imposter_syndrome

5 years ago

+26

Ohh, sorry, I didn't know the right way to ask this. I thought other people might also be seeing this and that it would be good to ask whether I was missing something. Sorry for any inconvenience this caused you.

eisen_shinzou

5 years ago

-53

I don't care, but as far as I know they'll downvote; maybe a comment somewhere on the tutorial or announcement would be enough...

eisen_shinzou

5 years ago

-27

Seems they just downvote me... Why always me?? What have I done to you?? OK downvote, I love downvotes!!!

l_Love_Thuy_Linh

5 years ago

"Also I think testcase 20 is having same worst case input (all 0's and last rows last element if say 1), but it passes there in 732 ms." You are using fast input; without it: 49042714 (732 ms -> 1528 ms).
"In my local env it takes more than 4s for this to pass." It depends on the power of your computer.
"From my understanding it takes 1s for approx 10^8 operations." The time limit is 2.5 s, so I think 10^9 is fine after all?

ankeshgupta007

5 years ago

+16

Just to add to his answer: 10⁸ is a heuristic bound and must in no way be understood as a hard cutoff. Sometimes, if the work per loop iteration is heavy, even 10⁷ may be tricky to squeeze in, whereas 10⁹ may pass if the per-iteration work is simple.

Also, the cache plays a big role when memory accesses are linear, one after the other, and can reduce the running time by a large constant factor. I believe that in the problem above, the array accesses were quite linear and the operations were just simple comparisons.
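The effect of linear memory access can be tried out with a rough sketch like the one below. It is an illustration under assumptions, not a benchmark of any solution from this thread: the function names and the `time_ms` helper are hypothetical, and the measured gap depends on the machine, the compiler flags and the matrix size. Both functions perform the same number of additions; only the order of memory accesses differs.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum an n x n matrix stored row by row, walking memory sequentially.
std::int64_t sum_row_major(const std::vector<int>& a, std::size_t n) {
    assert(a.size() == n * n);
    std::int64_t s = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            s += a[i * n + j];
    return s;
}

// Same sum, but consecutive accesses are n elements apart in memory.
std::int64_t sum_column_major(const std::vector<int>& a, std::size_t n) {
    assert(a.size() == n * n);
    std::int64_t s = 0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            s += a[i * n + j];
    return s;
}

// Hypothetical timing helper: wall-clock milliseconds taken by one call of f.
template <class F>
double time_ms(F f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

To compare the two, one could fill a matrix with a side of a few thousand, call `time_ms` on each function, and print the returned sums so that the compiler cannot discard the loops.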

imposter_syndrome

5 years ago

Yes, it is correct that it's only a heuristic bound, but the thing is that most of the time when I see these solutions passing, there is a better solution that fits into the limit of approximately 10^8 operations per second. I have seen a comment from Tanya_Romanova and many other people about this 10^8 operations rule, but I am not able to find it now. One more reference: https://www.quora.com/How-many-instructions-can-be-executed-in-Codeforces-and-TopCoder-judge-PCs-in-one-second. There is also a blog on TC: https://www.topcoder.com/community/competitive-programming/tutorials/computational-complexity-section-1/.

My other concern is whether these solutions should really pass for the respective problems. It is hard to understand what exactly is expected from a solution when such solutions pass.

LLI_E_P_JI_O_K

5 years ago

+19

I'm also interested in this question.

What has happened to the C++ STL string?

Why do solutions pass in 1 second on Codeforces when the string length is about 300 000 characters and they erase a small piece (2 characters) from the center of the string until it is empty?

Aren't there about 3*1.5*10^10 operations? Or is a string no longer a contiguous memory block, but some amazing "list" with indexing of elements?

Examples from 2 different problems, with solutions that I couldn't hack and that even the final tests didn't fail:

1) https://codeforces.com/contest/990/submission/39115847 (passed in 1 second for a 300 000 string!)

2) https://codeforces.com/contest/1104/submission/48742260 (passed in 0.078 second for a 100 000 string!!!)

Are you kidding me? :))

What has happened?

dmkz

5 years ago

+25

You forgot to divide the number of operations by the constant sizeof(YMM register) / sizeof(char) for a string. Erasing from std::string takes n / (sizeof(YMM register) / sizeof(char)) = n / (256 / 8) = n / 32 operations, counting a YMM register as 256 bits and a char as 8 bits. The estimate is n*(n+1)/2 / 32 ≈ 300000^2 / 64 = 1.4 * 10^9 operations. The most costly operation in this case is loading and storing data to and from registers. On Codeforces, with sequential access and cache prefetching, 10^9 such operations take about 1 s when the data is aligned to the cache line size and about 2 s when it is unaligned.
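The estimate above can be explored with a small sketch. It assumes that `std::string::erase` shifts the tail of the string with a memmove-like copy and that this copy moves 32 bytes per vector operation; both are modelling assumptions, and the function names are hypothetical. The code counts only the bytes that must be shifted when a pair of characters is repeatedly erased from the center.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Total number of bytes shifted while erasing 2 characters from the center
// of a string of length n until fewer than 2 characters remain.
std::uint64_t bytes_moved_erasing_center(std::size_t n) {
    std::string s(n, 'a');
    std::uint64_t moved = 0;
    while (s.size() >= 2) {
        std::size_t pos = s.size() / 2 - 1;  // start of the central pair
        moved += s.size() - (pos + 2);       // tail that has to move left
        s.erase(pos, 2);
    }
    return moved;
}

// Assumed model: one 256-bit (32-byte) register load/store per 32 bytes moved.
std::uint64_t vector_ops_estimate(std::uint64_t bytes_moved) {
    assert(bytes_moved <= UINT64_MAX - 31);
    return (bytes_moved + 31) / 32;
}
```

For even n the counted total is (n/2)(n/2 - 1)/2, roughly n^2/8 bytes, so under this model erasing from the center stays below the n*(n+1)/2 figure used above, which behaves as an upper bound on the bytes moved.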

LLI_E_P_JI_O_K

5 years ago

So, memmove() can process 256 bits in one processor clock cycle?

dmkz

5 years ago

+26

Yes, it can.

In addition, you can see AVX, AVX2 and FMA in my solution for gym problem G. Underpalindromity. With n = 200000 and k = 100000 it is n * k / 2 / (sizeof(YMM register) / sizeof(int)) = 200000 * 100000 / 2 / 8 = 1.25 * 10^9 operations, I think.

If the submission is not available to you, here is the source code.

LLI_E_P_JI_O_K

5 years ago

OMG. Problem setters, please set the limits on string/array length to 1e6 or 1e7, so that N^2 solutions with architecture-level optimizations do not pass.

dmkz

5 years ago

For example, you can't increase n and k in problem G. Underpalindromity, because there are solutions using the correct O(n log(n)) algorithm that run in 0.7-1 s.

Learning algorithms and data structures is one path; learning architecture-level optimizations is another. When you use an array instead of a list and iterate over it, you are already using architecture-level optimizations. Both kinds of knowledge can be combined: for example, you can use sqrt decomposition with such optimizations and a sqrt-list, and it will be faster than a segment tree and a Cartesian tree.

MrDindows

5 years ago

A bit simpler and faster solution: 49057315

dmkz

5 years ago

Fantastic, a 2x speedup, and gcc can vectorize even with addition to int64_t!
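For readers unfamiliar with auto-vectorization, here is the general shape of such a loop, as a minimal sketch that is not taken from submission 49057315. The function name is hypothetical. Whether gcc actually vectorizes it depends on the compiler version, the optimization level (for example -O3) and the target flags (for example -mavx2), so treat that as an assumption to verify on your own setup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain counted loop over contiguous ints with a 64-bit accumulator.
// There are no branches or cross-iteration dependencies other than the sum,
// which is the kind of loop a compiler may turn into SIMD code.
std::int64_t sum_into_int64(const std::vector<int>& a) {
    assert(a.size() <= static_cast<std::size_t>(INT32_MAX));
    std::int64_t total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        total += a[i];  // each int is widened to int64_t before the addition
    }
    return total;
}
```

With gcc, the -fopt-info-vec option reports which loops were vectorized, which is a convenient way to check the assumption.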

If you are interested, there is another problem, 101341I - Matrix God, with 1000 x 1000 matrix multiplication modulo 10^9+7 and a fantastic solution, 45798437, by Constantine Drozdov that runs in 374 ms.

dalex

5 years ago

+30

MikeMirzayanov, consider dmkz to be the tester of CF rounds!
