Why are there so many n~10^5 problems here?

#	User	Rating
1	ecnerwala	3649
2	Benq	3581
3	jiangly	3578
4	orzdevinwang	3570
5	Geothermal	3569
5	cnnfls_csy	3569
7	tourist	3565
8	maroonrk	3531
9	Radewoosh	3521
10	Um_nik	3482

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	161
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	152
8	SecondThread	146
8	orz	146
10	pajenegod	145

I feel like a majority of the Div 1 problems on Codeforces have $$$n \leq 10^5$$$ or $$$n \leq 10^6$$$ here, which basically means that the solutions have to be between $$$O(n)$$$ and $$$O(n \log n)$$$ (maybe in rare cases $$$O(n \sqrt n)$$$ or $$$O(n \log^2 n)$$$). However, this doesn't seem to be the case in other high level competitions; for example, this rarely seems to be the case in Topcoder Div 1 SRM, Google Code Jam, and ICPC.

With such a high constraint on $$$n$$$, the solution space is pretty constrained, and only certain algorithms will be viable. This might explain why certain data structures like Segment Trees/Fenwick Trees are overrepresented on Codeforces compared to other platforms. Other interesting algorithms like Max Flow, Knuth DP, Blossom Algorithm, etc pretty much never appear on Codeforces, and when there is a difficult problem (eg Div 1 C/D) with constraints that are not $$$10^5$$$/$$$10^6$$$, you can pretty much expect that the solution will be one of these algorithms that rarely appears, as opposed to other platforms where you would have to be on your toes for all problems.

I have never formally set problems before, so perhaps there is something I am missing (eg perhaps $$$n \leq 10^5$$$ are easier to test and block incorrect solutions), but is there any particular reason that there are so many problems with this constraint? I feel like contests would be filled with a more rich variety of algorithms if this was not the case.

EDIT: I suppose I wrote this in a way that seems to discount the effort of many problemsetters. I recognize that problemsetting is hard work and am pretty grateful that CF problemsetters are active enough to allow contests to be created multiple times a week; that's pretty much unheard of on any other site. I didn't intend to say that these types of problems are necessarily bad, I was just curious if there was any reason that I hadn't thought of for why the problems tend to be of this flavor.

Comments (19)

Show archived | Write comment?

Errichto

4 years ago, # |

+106

1) Topcoder system can only handle input of size up to 20,000 characters so that's a reason why you won't meet many DS problems there.
2) Arguably, it's easier to come up with problems with big $$$n$$$ and those are more standard. Big competitions like Code Jam have more resources to come up with plenty of problems and choose just a few of them that they find most interesting and unusual.

→ Reply

Kognition

4 years ago, # ^ |

+28

Interesting, I didn't know about that restriction on the Topcoder system, thanks. Your reasoning for 2) is the hypothesis I was currently working with, but I wasn't sure how true it was.

Um_nik

+95

Do you think that problemsetters work in this order:
1. What will be the limitation on $$$n$$$ in my problem?
2. With that settled, let's come up with the problem. Also, what was that mysterious $$$n$$$ thing?

Problemsetter come up with some problem, decide what solutions they want to accept and what solutions they want to reject, deciding on that they choose limitations.

I don't really understand your question.

+48

Ok, so under your framework for how problemsetters create problems, I think the appropriate and somewhat straightforward formulation of the question is: "Why is it that the problems on Codeforces happen to mostly have model solutions with $$$O(n)$$$ or $$$O(n \log n)$$$ complexity while all "naive" solutions happen to be $$$O(n^2)$$$, when this does not seem to be the case for problems in other platforms/contests?"

Is that question more clear?

Swistakk

+39

Well, it doesn't have to be exactly such process, but some people may definitely think like "hm, what kind of interesting queries on interval I could perform?" or "I am curious whether I am able to push suffix automaton to its limits" and then you are bound to have not very interesting problem with $$$n \le 10^5$$$

pllk

+67

In Google Code Jam, one reason may be that the contest used to have input files that the participants downloaded — this restricted input sizes.

Are you sure about ICPC? I checked the problems from ICPC World Finals 2019 and 7/11 of them had large constraints.

Still, I think you have a good point: it would be great to have more problems where the interesting part is how to solve the problem in polynomial time, instead of how to improve the time complexity from O(n^2) to O(n log n).

← Rev. 2 →

+45

I'll admit that to verify my initial statement I only checked the most recent Codeforces, GCJ and Topcoder contests, not ICPC. I could be wrong and the same could be true of ICPC, I'd still need to more sampling of CF and old WF sets to see if there's a difference. I think the point you make about old GCJ format is pretty relevant; that makes sense. Another factor I guess is that GCJ allows Python so it can't push $$$n$$$ up exorbitantly high.

But yeah, you seemed to get the gist of my post, which was asking why a lot of the problems here focus specifically on $$$O(n^2)$$$ -> $$$O(n)$$$/$$$O(n \log n)$$$ optimizations. I remember a Topcoder problem a while back where the interesting optimization was $$$O(2^n)$$$->$$$O(n^2)$$$, and you could even go down to $$$O(n)$$$ using linear bridge finding algorithm but that was obvious and not really the point of the problem.

I also don't like habit of pushing up the constraints unnecessarily just to get to the ~$$$10^5$$$ number, since it prematurely can give away the complexity for the answer. For example I think today's problem E could have just had $$$n \leq 50$$$, and it would still be fundamentally the same problem, but with $$$n \leq 500$$$ it's obvious that you only have to think about straightforward linear construction algorithms (ie linear in the total number of elements). Again I could be wrong here and missing some detail.

-16

Another factor I guess is that GCJ allows Python so it can't push n up exorbitantly high.

The same is true for problems with quadratic intended solution, so I don't see your point.

Enchom

+90

I find that Um_nik shares my opinion — your logic seems to imply that authors do this purposefully. They most definitely don't.

However, I can't deny that thinking about it the correlation you've observed is somewhat true. My opinion is that this is due to the format of the competition. We have short time for a decent amount of problems. What seems to happen is that "difficult" problems are often ones where some clever data structure or a twist on a known approach yields some speedup (often from $$$O(N^2)$$$ down to linear or quasilinear as you pointed out).

Problems with smaller constraints are often (not always) more about coming up with a decent (e.g. polynomial) solution to what's seemingly an intractable problem, rather than optimising something with structures and tweaks. These problems can be interesting but I personally think they're more fit for longer competitions (IOI style). If the application of the appropriate algorithm is not concealed in a neat and smart way, it's just a bad problem.

Consider your examples. It's not easy to come up with an original max flow problem and its implementation is not particularly interesting (most people will have a ready copy of it anyway). Knuth optimization of DP is cool, but again, pretty much all problems using it that I've seen share very similar features. The Blossom Algorithm, in my opinion, should never be given on a competition.

I suppose my theory boils down to the fact that taking something and optimizing it in a smart way is a much straightforward way of creating a problem than coming up with a completely new problem that has a neat solution. The former approach will yield problems such as those you've described, the latter is harder to do and I believe is more fit for longer competitions.

Note that all of this is very general talk, of course there are plenty of problems that don't follow this logic.

+13

"I suppose my theory boils down to the fact that taking something and optimizing it in a smart way is a much straightforward way of creating a problem than coming up with a completely new problem that has a neat solution. The former approach will yield problems such as those you've described, the latter is harder to do and I believe is more fit for longer competitions."

Cool, I think this makes sense. Out of curiosity, is there a reason why you think Blossom should never appear in a contest? I remember doing a Google foo.bar problem that involved that algorithm a while back, and it seemed like a reasonable enough problem.

+26

I find Blossom algorithm's implementation to be nasty and the only time I've used it was a copy-paste from a public implementation (it was allowed). For onsite competitions it just sounds mean.

There are many arguments for which algorithms should be allowed on competitions and which shouldn't. Everybody has a different threshold, but for everyone there exist some algorithms which are too nasty and shouldn't be given on competitions. I'm sure for a lot of people Blossom is not one of them, but for me it is.

ko_osaga

+31

With such a high constraint on n, the solution space is pretty constrained, and only certain algorithms will be viable

It's not, why do you think so?

I don't think problems are "standard" or whatsoever just because $$$N$$$ is large, it's an inherently silly demarcation, and there is no backing evidence (my personal experience disagrees with that). Bad problems are just bad for their own reason.

+10

Sorry, I don't mean to say that such problems would be "standard". But is it not necessarily true that it reduces the amount of algorithms that may end up being involved (ie any algorithm with over O(n) complexity)? Perhaps this is exclusion is irrelevant, since competition problems are less about the algorithms themselves and more about the observations/reductions leading up to using the algorithms.

I also don't mean to say that these problems are necessarily bad, of course "bad problem" has an entirely separate meaning.

Thanks for the clarification.

But is it not necessarily true that it reduces the amount of algorithms that may end up being involved

This is true. But I claim this doesn't necessary means "such problem is easy to think of a solution" (aka boring):

Obviously, such problems are harder because only excellent solutions survive. This is the most recent commented case where there are a bunch of easy $$$O(NQ)$$$ solution but only one of them have a nice property that is optimizable. So they pose a different kind of challenge compared to exponential-polytime game, and personally I enjoy these kind of problems more.
As you mentioned, if you are solving a non-standard problem, then even with the right toolkit it shouldn't be obvious.
Starting to think from any polynomial-time solution is a valid strategy regardless of the size of $$$n$$$. Reducing the problem into multi-state DP or flows could be helpful, since these are much familiar concepts and so is easy to manipulate.

On the other hand, if it's trivial or only technical to optimize the algorithm from any poly-time to linear ones, I don't see a reason to give it that way either. But it's up to the setter's decision. Sometimes, there is an occasion where the setter increases the $$$N$$$ to make problem easier.

Xellos

An easy way to get an $$$O(N^2)$$$ naive solution: for each of $$$N$$$ obvious parameter choices, run an easy $$$O(N)$$$ algorithm. Alternatively, answering queries that can be trivially answered in $$$O(N)$$$ — most tree problems are like this. These are common and often boring problems.

+110

You may be right that there are a lot (maybe even too many) problems of kind "given array of length $$$10^5$$$ and $$$10^5$$$ queries of $$$10^5$$$ different types, do some shit". You know, ko_osaga-type. But it is kinda its own genre of problems which has rights to exist.

I don't agree on the part "there are a lot of algorithms with polynomial but non-linear complexity". You have mentioned most of them, really. There are flows/matchings, Floyd/Ford-Bellman, some linear algebra stuff. I guess you can count DP optimizations, but I think DP complexities are out of discussion here, because they depend on what is the DP state, which changes from problem to problem.

On the other hand, there are A LOT of linear or quasi-linear algorithms.
- Most basic graph stuff: DFS (+bridges+articulation_points+topological_sort+SCC+euler_tour), BFS, Dijkstra, Kruskal
- A lot of things on trees can be done in $$$O(log^p)$$$ after precalc in quasi-linear time: binary lifting, heavy-light decomposition, centroid decomposition, small-to-large
- Most string algorithms: Z-function, Prefix-function, hashing, Aho-Korasik, suffix structures
- Some geometry stuff: convex hull, two closest points, halfplane intersection
- Some combinatorics, since we can precalculate factorials
- FFT
- and yes, data structures. a lot of them

Then we have a lot of greedy and ad-hoc problems which requires only sorting, binary searches, maybe some two-pointers or scanlines. All of which is quasi-linear. Sometimes these problems will remain the same if lower $$$n$$$ to 100. But is it necessary?

On the point of exponential/polynomial vs $$$O(n^2)$$$/$$$O(n \log n)$$$: well, yes, it is rewarding to come up with polynomial solution to a problem that looks NP-hard. But the problem is that if you take a random problem that looks NP-hard, it is with high probability NP-hard. Sometimes you can make interesting transitions from $$$O(n^n)$$$ to $$$O(n!)$$$ to $$$O(3^n)$$$ to $$$O(n2^n)$$$ and it is interesting to make a problem out of it. But most of the time you just cannot do anything. While speeding-up existing polynomial solutions can be done with various standard technics.
What I want to say is that we are trying hard to come up with problems that look impossible to solve in polynomial time. But most of the time it remains unsolved.

Summary:
1. We have a lot more tools to optimize polynomial solutions to quasi-linear times than to solve impossible problems in polynomial time;
2. Some topics (not only data structures) almost purely consist of quasi-linear algorithms;
3. Not all of the problem with $$$n \le 10^{5}$$$ are just optimizing $$$O(n^2)$$$;
4. Yes, in some sense it is easier to come up with problems with $$$n \le 10^5$$$. But it is not that bad. Not in all cases, at least.

I think we should analyze this case by case. In the next comment I'll try to overview some recent CF problems.

+70

Round 631

Global Round 7

Ozon Tech Challenge

Now I understand that I just wasted hour of my life and that was completely useless. I enjoy analyzing things and calculating statistics though.

+142

300iq, let's stream solving some spooky data structure problem together. Um_nik is bullying me T.T

+14

Sorry :|

Kognition's blog