#	User	Rating
1	tourist	3690
2	jiangly	3647
3	Benq	3581
4	orzdevinwang	3570
5	Geothermal	3569
5	cnnfls_csy	3569
7	Radewoosh	3509
8	ecnerwala	3486
9	jqdai0815	3474
10	gyh20	3447

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	162
4	TheScrasse	160
5	nor	158
6	maroonrk	156
7	-is-this-fft-	152
8	orz	146
9	pajenegod	145
9	SecondThread	145

errorgorn's blog

Confession: I sent problem H to goodbye

By errorgorn, 4 months ago, In English

Going to be 100% honest here, I sent problem H to goodbye. I feel really bad to authors of Goodbye 2023 for ruining their round, it seems the round would have been better by just removing H. As much as the community is out here roasting 74TrAkToR, I also had a part to play in this contest. I wish to be as transparent as I can here without saying any classified information (apologies to KAN and 74TrAkToR if anything here shouldn't be disclosed).

To give some context, here is what went down in goodbye round from my perspective:

13 days before contest: KAN asks me if I can source a 1E for goodbye. I asked traktor to send me the testing gym so that I could gauge how hard the 1E should be. He told me he will send me after he translates the statements. (UPD: Note that the IMO 1986 problem was already removed before this stage which is why a new 1E was needed, I never saw this IMO problem)
11 days before contest: traktor send it to me the testing gym after translating the statements to english
10 days before contest: I look at the problems and tell traktor that old G already appeared before (UPD: in APIO 2021, which I participated, so I did not google it)
9 days before contest: I confirmed with traktor that he still needed a problem to append to the contest as I thought old G was appended recently. I knew codeton 5 had 2 extra problems that were 1F difficulty so I asked if goodbye could take one of them. I assumed authors already ensured that problem was not googleable cause they only found the problem with $$$O(n^2)$$$ on some chinese OJ. I also implemented A-F and gave comments on them.
7 days before contest: H prepared, traktor sent me a buffed version of old G that I couldn't solve
6 days before contest: Endagorion tested and comments "H — from a theoretical point of view, this is a super standard exercise From a practical point of view, the only unclear place is what to do when p has a small order modulo 998" . In a discord group chat with Endagorion and authors of H, we resolved the p small order issue
4 days before contest: I asked for clarification for buffed version of old G cause it actually seemed impossible, ghosted by traktor
2 days before contest: Current G was added to the gym and traktor asked me to try and I misread the statement
1 day before contest: I told traktor my current ideas for the misread version of G (I couldn't implement it cause I was overseas)
day of contest: I realized I misread G a few hours before contest so basically my feedback on G was kind of bullshit

Well, I'm sorry to break the illusion it turns up I was the bad coordinator here. But here are my thoughts on some things organisational wise that all coordinators should do to prevent such things from happening again.

Make a discord server for rounds

Simply, you are just removing the coordinator as the middle man of being the message forwarder. This creates many problems. A trivial one is that if I'm not awake (or just lazy) to forward messages, stuff will be delayed to be sent. But the more serious one is that you won't have transparency between authors and testers. And it may be just a personal feeling, but I feel more compelled as a tester to help out when there is an active server of people rather than just my dms to a coordinator (maybe Endagorion can give his opinion of testing pinely 3 vs testing goodbye).

However, some cons of this is that not all countries allow their citizens to use discord (such as China). If anyone has suggestions for an alternative social media platform, I would be happy to hear it so that I don't have to use wechat.

Spoiler

As an example:

This was my feedback for G, I somehow misread the problem wrongly in 2 places

I thought the weights are on vertices instead of on the edges
I thought we wanted to find the sum of values instead of the max value

I'm not sure how this wasn't caught by 74TrAkToR or the authors which I would assume he sent these messages too since defining the function $$$f$$$ is complete bullshit for getting max value. I suppose that if this discussion happened in a testing server with all testers, someone would quickly point out that I'm being stupid and I would actually think of solving the true problem. Funny story, only when I was talking with htetgm about the problems (which is technically breaking the testing rules XD) then I realized that I misread G XDDD.

Ensure that round is ready before even scheduling the contest

I think it becomes apparent that many of the issues of this round was made due to changing many problems in the span of a week before the contest is held. This can be easily negated by testing the round sufficiently early. Personally, I do not schedule contests until I am sure that I do not expect any more changes to the problem set, so in the time period between scheduling and the contest being held, the only thing that authors need to care about is making sure preparation is good, not making even more problems. As such, if you message me or authors about wanting to test the round after the announcement blog is posted, most likely you will be ignored as there is really no point in testing only a few days before the contest. Because realistically we can't make any changes based on your feedback.

As much as I would like to think that I'm a "good" coordinator, stuff like me putting problems that already exists in a contest happens more often than you think. The only reason you (almost) don't see them in contests I coordiante is cause of the testing phase where testers (mostly from China lol) tell me that the problem is already well-known. Of course, this problem being so easily googleable was still a massive oversight and I still take responsibility for this. Yes, I was aware of Endagorion finding H to be standard and I agree with him that it is standard (but I wasn't aware that it was googleable). Under normal circumstances where we had more time, I definitely would have changed it to another problem.

However, it seems that these days there are simply too little div 1 rounds to go around for the number of needed rounds (recently it is pinely 3, goodbye, hello and I predict that there might not be div 1 rounds for a while after these contests...). As such, I had to schedule pinely 3 before I was confident in the problem set and there we had many problem changes in the few weeks to the contest (https://codeforces.com/blog/entry/123449 did not help). But thankfully TheScrasse had a stock of already-prepared problems to switch around and is really efficient at preparing problems (orz).

WE NEED MORE COORDINATORS

The main issue here that there are simply there are too little div 1 coordinators. I would like to appeal to MikeMirzayanov to officially make the process for non-Russian speaking coordinators to be more open. To be completely transparent, I am only a coordinator now because I message MikeMirzayanov that I would like to be a coordinator (of course, I hope people don't start spamming mike now about wanting to be a coordinator). I recall that I had conviction to become a coordinator after testing that div 2 round with barely legible statements and I thought "damn bruh, I can probably do better". Well, for the people who cares about CP and wants to see more great rounds on codeforces, hopefully MikeMirzayanov will make this "application to become a coordinator" more official. Personally, I am not sure how long I am able to keep coordinating at this pace as I am kind of getting burnt up (sorry to all authors who I've ghosted 💀) and being in the military full-time doesn't really help. That being said, even if people are willing to coordinator, how many people have the luxury of having too much time to coordinate? I only have this luxury cause during my highschool days I pretty much didn't care about school and now in military most of my day is spent doing nothing.

Full text and comments »

+815

errorgorn
4 months ago
41

Invitation to TLX Regular Open Contest #30

By errorgorn, 16 months ago, In English

Selamat pagi Codeforces!

We are excited to invite you to TLX Regular Open Contest #30!

Key details:

Contest URL: TLX
Time: 7th January 2023, 14:05 UTC
Writer: minum susu (errorgorn, Pyqe, rama_pang, yz_)
Duration: 150 minutes
Problems: 8
Scoring distribution: 100 — 250 — 250 — 300 — 550 — 550 — 800 — 800

Many thanks to:

Pyqe, steven.novaryo for coordinating the contest and translating the statements.
dorijanlendvaj, gansixeneh, nandonathaniel, and AMnu for testing the contest.
fushar🔥 for the amazing TLX platform.

Please register for the contest or else rama_pang will drink all your milk 😋🥛.

UPD: the contest is over!

Congratulations to the top 5:

Benq (perfect score)
maroonrk
hos.lyric
hitonanode
nuip

Congratulations to our first solvers:

Also, congratulations to Phantom_Performer for getting the 400000-th submission!

You can upsolve the problem here. Editorial is available in the upsolve link!

Thank you for participating and see you on the next contest!

Full text and comments »

tlx, troc, toki, indonesia

+200

errorgorn
16 months ago
17

What were your Favourite Problems/Blogs of 2022?

By errorgorn, 16 months ago, In English

Now that Goodbye 2022 has recently concluded (I hope you liked it). Let's share the best problems from 2022 instead of only before 3 weeks ago (I am looking at you kozliklekarsky).

Btw, I have noticed a trend in these blog posts where many problems come from the last few months of the year. So I hope that we take a good look through the contests of 2022 before posting.

Also, I think it would be a good idea to share your favourite blogs of 2022 as well in case we missed out some genius blog!

Full text and comments »

+120

errorgorn
16 months ago
37

Facebook Account Used to Compete in Meta Hacker Cup Has Been Suspended 2 Times

By errorgorn, history, 21 month(s) ago, In English

I have created a facebook account to compete in Meta Hacker Cup (I did not compete in previous editions for some reason). However, it has been suspended 2 times recently. I did not post anything in account and therefore I don't think there is any reason to suspend my account? I even uploaded a picture of my face to facebook when they wanted evidence that the account belongs to an actual person so I don't understand the reason for the second suspension. Also, this is my second facebook account because I did not appeal when my first one got suspended (again, I did not do anything on facebook and also did not see the email that my account got suspended).

Here is screenshot of my email

Is anyone else facing similar issues?

SecondThread is there any way that I can compete without a facebook account? It is really annoying to appeal to these false suspensions.

Full text and comments »

facebook, fbhc

+143

errorgorn
21 month(s) ago
8

[Challenge] Optimizing JOISC19 Minerals (I'll give up to 50USD prizes)

By errorgorn, 21 month(s) ago, In English

Note: I am not sponsored by the JOISC committee, if the committee is not ok with prizes, I will remove them.

I have managed to solve this problem with $$$n=45000$$$. As far as I know, the person with the highest $$$n$$$ other than me is jqdai0815 who has achieved $$$n=44000$$$ in this comment.

I think this problem is quite interesting to constant optimize, so I am offering prizes for people who can improve the solution. The prizes work as follow:

For every increase of $$$\min(\lfloor \frac{n}{100} \rfloor,460)$$$ by $$$1$$$, I will give the person 5USD (sorry, I'm still a broke student).
You must link a submission of your code on oj.uz that is AC.
You must state the value of $$$n$$$, rounded down to nearest $$$100$$$, that your code works in under $$$1$$$ million queries.
As a bonus, you can explain what your optimizations are (well, I would like to know how you were able to optimize this problem).
If this limits of $$$n$$$ turns out to be too hard, I might make the conditions more lenient.

I'll start first.

This is my solution: https://oj.uz/submission/619999. It can barely solve $$$n=45000$$$. The maximum number of queries used in $$$10$$$ random tests is $$$999920$$$.

These are my optimizations

Here is the test case generator, if you want

#include <bits/stdc++.h>
using namespace std;
mt19937 rng(chrono::system_clock::now().time_since_epoch().count());

int n=50000;
int arr[100005];

signed main(){
	for (int x=0;x<2*n;x++) arr[x]=x+1;
	shuffle(arr,arr+2*n,rng);
	
	cout<<n<<endl;
	for (int x=0;x<n;x++) cout<<arr[2*x]<<" "<<arr[2*x+1]<<endl;
}

Good luck!

UPD: The maximum value of $$$n$$$ has been capped to $$$46000$$$, I think $$$50000$$$ was probably too unrealistic of a goal.

Full text and comments »

+219

errorgorn
21 month(s) ago
1

[Tutorial] Theoretically Faster HLD and Centroid Decomposition

By errorgorn, 22 months ago, In English

This blog was going to be translation of $$$[1]$$$ (which is in Chinese) but I think I ended up deviating too much from the source material that it would not be appropriate to call this a translation. Anyways, I am honestly amazed by how ahead of its time this paper is. In its abstract, it starts by saying that the state of art for tree path queries is $$$O(n \log n + q \sqrt n \log n)$$$, which I honestly can't fathom how one would arrive, then proceeds to claim that it has found an $$$O((n+q) \log n)$$$ algorithm. All the more, in 2007.

While I was in the process of writing this blog, smax sniped me and posted $$$[9]$$$. However, we will focus on static trees only.

Thanks to oolimry, Everule and timreizin for proofreading.

Note to reader: I am expecting the reader to be familiar with what a splay tree and HLD is. You do not really need to know why splay trees are $$$O(\log n)$$$ amortised but you should know what the zig, zig-zig and zig-zag operations are. Also, when I say some algorithm has some complexity, I usually mean that it has that complexity amortised. You can probably tell from context which ones are amortised complexity. Also, there is basically $$$0$$$ reason to use the techniques described in this blog in practice (except maybe in interactive problems?), the constant factor is too high. You can scroll to the bottom to see the benchmarks. Then again, my implementation of normal HLD uses a really fast segment tree and my implementation $$$O((n+q) \log n)$$$ algorithm might have a really big constant factor.

Splay Trees

First, we want to talk about splay trees $$$[2]$$$, specifically why splay trees have amortised $$$O(\log n)$$$ complexity for splaying some vertex $$$v$$$.

Define:

$$$s(v)$$$ as the size of the subtree of $$$v$$$
$$$r(v)=\log s(v)$$$, the rank function

There are $$$3$$$ types of operations we can perform on a vertex when we are splaying it to the root — zig, zig-zig, zig-zag. Let $$$r$$$ and $$$r'$$$ be the rank functions before and after we perform the operation. The main results we want to show is that:

zig has a time complexity of $$$O(r'(x)-r(x)+1)$$$ amortised
zig-zig and zig-zag has a time complexity of $$$O(r'(x)-r(x))$$$ amortised

I will not prove these as you can find the proof in $$$[2]$$$. But the point is that through some black magic, the only operation that has a cost with a constant term is a zig operation, which we only perform once. Somehow, we manage to remove all constant terms from our zig-zig and zig-zag operations.

Let $$$r_i$$$ be the rank function after performing the $$$i$$$-th operation. Suppose that we have performed $$$k$$$ operations on to splay some vertex, then the total time complexity is $$$O((r_{k}(x)-r_{k-1}(x))+(r_{k-1}(x)-r_{k-2}(x))+\ldots+(r_{1}(x)-r_{0}(x))+1)=O(r_k(x)-r_0(x)+1)$$$. The black magic from removing all constant terms from our zig-zig and zig-zag operation is that no matter how many zig-zig and zig-zag operations we perform, they don't have any effect on the final complexity since we cancel out all the terms in a telescoping sum manner (which also explains why performing zigs only would make the complexity blow up, as we should expect). For the case of splay trees, $$$r_k(x)=\log n$$$, since $$$x$$$ becomes the root of the tree, so we obtain the complexity of $$$O(\log n)$$$.

Link-Cut Trees

Now, let's discuss why link cut trees $$$[3]$$$ have a complexity of $$$O(\log n)$$$. Here is an image of link cut trees from $$$[4]$$$.

We split the tree into disjoint chains. Internally, each chain is a splay tree storing elements in the order of depth (lowest depth to the left and highest depth to the right). This way, when we splay $$$x$$$ in its splay tree, the vertices to its left are those with lower depth and the vertices to its right are those with higher depth. $$$2$$$ disjoint chains which are connected by an edge in the original tree would be connected by a path-parent edge in the new tree. If chains $$$a$$$ and $$$b$$$ were connected, with $$$a$$$ being the parent of $$$b$$$, then there would be a path-parent edge from the root of the splay tree containing $$$b$$$ to $$$a$$$. That is for each splay tree, only its root will have a path-parent edge. In the image, the tree on the left is the initial tree which is stored internally by the structure of splay trees on the right. It is good to mention now that we can actually view these disjoint splay trees as a single giant splay tree by not caring about which edge types join the vertices internally (and not caring about the fact that vertices can only have at most 2 children).

An important operation on link-cut trees is the $$$access(x)$$$ operation, which given some $$$x$$$ that we have chosen, changes the chains in such a way that the chain containing the root of the original tree also contains the vertex $$$x$$$. This is done by repeatedly ascending into the highest vertex in each chain, and switching which child is connected to its parent. So the cost of this operation is related to the number of distinct chains on the path from $$$x$$$ to the root of the whole tree. Also, we will shorten the chain of $$$x$$$ to have the deepest vertex be $$$x$$$ for reasons that will be apparent later. In the above image, we have performed $$$access(l)$$$ to get the state being the middle tree.

In pseudo-code, the access function would look something like this:

1. access(x):
2.     splay(x)
3.     if (right(x)!=NULL):                  //disconnect the current child
4.         path_parent(right(x))=x
5.         right(x)=NULL
6.     while (path_parent(x)!=NULL):        //it is now the root
7.         par=path_parent(x)
8.         splay(par)
9.         if (right(par)!=NULL):            //change both child chain's path_parent
10.            path_parent(right(par))=par      
11.        path_parent(x)=NULL
12.        right(par)=x                      //change child chain to new one
13.        x=par

$$$O(\log^2 n)$$$ Time Complexity Bound

First, let us show that the time complexity of access is bounded by $$$O(\log^2 n)$$$ We will show this by showing that the while loop (lines 6-13) will only loop $$$O(\log n)$$$ times. We will use ideas from HLD for this. Label each edge as either light or heavy. The idea is that it is rare that light edges will be changed to be in a chain.

Let the potential be the number of heavy edges that are not in a chain. Here is how the potential changes for an operation in our access function.

If we change the edge in the chain to a heavy edge, our potential decreases by $$$1$$$.
If we change the edge in the chain to a light edge, our potential increases by at most $$$1$$$ (at most because it could be the case that the edge was in another light edge or does not exist).

The number of times we change an edge in the change to a light edge is $$$O(\log n)$$$. The potential can only increase by at most $$$O(\log n)$$$ in each access, so the number of times the while look is run is $$$O(\log n)$$$. Each time the while loop is run, the most time-consuming part is line 8, which we have show earlier to have an time complexity of $$$O(\log n)$$$, so the entire access function has a complexity of $$$O(\log^2 n)$$$.

$$$O(\log n)$$$ Time Complexity Bound

Here, we want to show that over all time lines 6-13 is called, the total time complexity is bounded by $$$O(\log n)$$$. Let's stop thinking about the splay trees as individual splay trees but imagine the entire structure is a giant splay tree instead. Recall our rank function from earlier. Since we want to regard the entire structure as a giant splay tree, $$$s(x)$$$ does not only refer to the size of the subtree when only considering the splay tree describing the chain that contains $$$x$$$, but all nodes that are in the subtree of $$$x$$$ when considering path-parent edges too.

We have established earlier that if we perform a splay operation on $$$x$$$, we have the time complexity of $$$O(r_{final}(x)-r_{initial}(x)+1)$$$. Suppose our while loop runs for $$$k$$$ times and on the $$$i$$$-th time the variable par is vertex $$$par^k$$$ (for convenience, we let $$$par^0=x$$$). Our time complexity would look something like $$$O(r_{final}(par^k)-r_{initial}(par^k)+1+\ldots+r_{final}(par^1)-r_{initial}(par^1)+1+r_{final}(par^0)-r_{initial}(par^0)+1)$$$. Notice that $$$r_{initial}(par^{x+1}) \geq r_{final}(par^x)$$$, because when we begin splaying $$$par^{x+1}$$$, the final position of $$$par^x$$$ would have been in the subtree of $$$par^{x+1}$$$, so $$$par^{x+1}$$$ obviously has a bigger subtree size than $$$par^x$$$. Therefore, we are able to cancel quite a few terms from the time complexity giving us a time complexity of $$$O(r_{final}(par^k)-r_{initial}(par^0)+k)$$$. We know that $$$r_{final}(par^k)=\log n$$$ since $$$par^k$$$ becomes the root of the tree and from the previous analysis on the $$$O(\log^2 n)$$$ time complexity bound, $$$k = \log n$$$. Therefore, $$$O(r_{final}(par^k)-r_{initial}(par^0)+k)=O(\log n)$$$.

Tree Path Queries

HLD + Segment Tree

This the solution that I think everyone and their mother knows. Perform the HLD on the tree and make a segment tree on each heavy chain. But there are a few things I want to talk about here.

Worst Case for HLD

There are $$$2$$$ styles of implementing this "segment tree on each heavy chain" part. One way, which is more popular because it is easy to code, is to have a single segment tree that stores all heavy chains. Another way is to construct a new segment tree for each heavy chain.

Now, let's think about how we can actually get HLD to its worst case of $$$O(n+q \log^2 n)$$$ for both implementations. For the case of balanced binary trees, can easily figure that the balanced binary tree easily forces the implementation of having a single segment tree to go into its worse case of $$$O(n+q\log^2 n)$$$, but for the implementation where we have a new segment tree for each heavy path, it only forces it to have $$$O(n+q \log n)$$$ complexity. To force the worst case $$$O(n+q \log^2 n)$$$ complexity, we will have to do modification to the binary tree. The problem with the balanced binary tree is that we do not make our segment tree use $$$O(\log n)$$$ time since the chains are so short. Ok, so let's make the chains long instead. Specifically, create a balanced binary tree of $$$\sqrt n$$$ vertices, then replace each vertex with a chain of size $$$\sqrt n$$$.

It seems that our complexity would be something like $$$O(n+q \log(\sqrt n) \log(\sqrt n))$$$, which has quite low constant compared to the implementation using a single segment tree on a normal balanced binary tree since $$$\log^2(\sqrt n) = \frac{1}{4} \log^2(n)$$$. Is there a tree that can make the constant higher?

Really Fast Segment Trees

When looking for fast segment tree implementations, most people would probably direct you to the AtCoder Library $$$[5]$$$ where they have implemented a fast lazy segment tree template with which maintains an array with elements in the monoid $$$S$$$ and is able to apply operation $$$F$$$ acting on $$$S \to S$$$ on an interval in the array. Although their code is very fast, we can speed it up by assuming that the functions are commutative. That is for $$$f,g \in F, x \in S, f(g(x))=g(f(x))$$$, which is usually true for the uses of segment trees in competitive programming.

There is actually a way to handle lazy tags in a way that does not change implementation of iterative segment tree too much. The idea is pretty similar to what is mentioned in $$$[6]$$$. We do not propagate lazy tags. Instead, the true value of a node in the segment tree is $$$val_u + \sum\limits_{v \text{ is ancestor of } u} lazy_v$$$ and we apply the lazy tags in a bottom-up fashion when doing queries.

Below is code for segment tree that handles range increment updates and range max queries.

Code

struct node{
	int BUF=1000005;
	int val[2000010],lazy[2000010];
	
	void get(int i){ val[i]=max(val[i<<1],val[i<<1|1])+lazy[i]; }
	
	void update(int i,int j,int k){ //[i,j]
		i+=BUF-1,j+=BUF+1; //(i,j)
		while (i|j){
			if (j-i>1){
				if (~i&1) val[i^1]+=k,lazy[i^1]+=k;
				if (j&1) val[j^1]+=k,lazy[j^1]+=k;
			}
			i>>=1,j>>=1;
			get(i),get(j);
		}
	}
	
	int query(int i,int j){ //[i,j]
		i+=BUF-1,j+=BUF+1; //(i,j)
		int resl=-1e18,resr=-1e18;
		while (i|j){
			if (j-i>1){
				if (~i&1) resl=max(resl,val[i^1]);
				if (j&1) resr=max(resr,val[j^1]);
			}
			i>>=1,j>>=1;
			resl+=lazy[i],resr+=lazy[j];
		}
		return max(resl,resr);
	}
};

Link-Cut Tree

Let's just maintain a link-cut tree with some modifications to the underlying splay tree. Specifically, we additionally store a value which is the composition of values in the subtree of the splay tree (we only consider those vertices in the same chain). To perform a query on $$$u$$$ to $$$v$$$ (WLOG $$$u$$$ has lower depth than $$$v$$$), perform $$$access(u)$$$ then $$$access(v)$$$. Let $$$w=lca(u,v)$$$. It is possible that $$$w=u$$$. The below image shows the state of the tree after performing $$$access(u)$$$ and $$$access(v)$$$ respectively.

After performing both accesses, we would have $$$w$$$ being the root of the entire splay tree since we have accessed $$$u$$$ before $$$v$$$, $$$w$$$ would have already been in the same chain as the root before we start to access $$$v$$$. At the same time, we must splay $$$w$$$ when we are accessing $$$v$$$. So $$$w$$$ would be the last thing we splay, making it the root of the entire splay tree. Now, it is actually easy to perform path queries from $$$u$$$ to $$$v$$$. If $$$u=w$$$, then we simply query $$$w$$$ and the right child of $$$w$$$ (which is the path to $$$v$$$). If $$$u \neq w$$$, then we have to additionally include the path from $$$u$$$ to $$$w$$$ (but not including $$$w$$$). Luckily, due to the way we have accessed the vertices, this path would be exactly the chain containing $$$u$$$.

HLD + Splay Tree

Let's replace the segment tree in the HLD solution with a splay tree with the same modification to the underlying splay tree as the earlier section. If we need to query the prefix of a splay tree, just splay the vertex then additionally query the left child. If we need to query the subarray of $$$[l,r]$$$ on the splay tree, splay $$$l$$$ to the root and $$$r$$$ to just below root, then we only have to additionally query the right child of $$$r$$$. Remember for a HLD query, we do some prefix queries and at most $$$1$$$ subarray query.

What would be the complexity? It seems like it would be the same $$$O(n + q \log^2 n)$$$. But no, it can be shown that it is actually $$$O(n+q\log n)$$$. Yes, it is not a typo. I still don't believe it myself even though the justification is below.

In the same way we showed that access works in $$$O(\log n)$$$ in link-cut trees, we will do the same here by imagining that there are fake edges from the root of each chain of the splay tree to its the parent of the chain so when we define the rank function and count the number of vertices in a subtree of $$$u$$$, we also take into account those vertices connected via fake edges. It is not too hard to see that the time complexity for the path query between $$$a$$$ and $$$b$$$ would be a similar telescoping sum resulting in $$$O(r_{final}(a)+r_{final}(b)-r_{initial}(a)-r_{initial}(b)+k_a+k_b)=O(\log n)$$$.

Balanced HLD

Although the previous $$$2$$$ algorithms have $$$O(n+q\log n)$$$ complexity. They are extremely unpractical as splay trees (or any dynamic tree) carry a super huge constant. Is it possible to transfer the essence of what splay tree is doing into a static data structure?

What was so special about our HLD+splay tree solution that it is able to cut one log when compared to HLD+any other data structure? It's splaying! That is if a vertex was recently accessed, it would be pushed near to the root the tree even though it may have many light edges on its path to the root. This property of caring about the entire tree structure is unique to splay trees and isn't found in any other method mentioned here as other data structures treat each of the heavy chains independently.

So, we need to create a data structure that is similar to HLD+segment tree but instead of having a structure based on dividing based on the sum of weights of unweighted nodes (which is a segment tree), maybe let's give each node a weight and split the based on those weight. Wait, isn't that centroid decomposition?

Indeed, let us first do HLD then take the heavy chain containing the root of the tree. For each vertex in this heavy chain, give each vertex a weight which is the number of vertices in its connected components after removing all edges in the heavy chain. And that's pretty much the entire idea. Divide things based on the weighted case. More specifically, given a heavy chain with total weight $$$W$$$, choose a vertex $$$x$$$ such that the total weight of the left and rights sides of the chain (excluding $$$x$$$) have weight $$$\leq \frac{W}{2}$$$. Such a vertex always exists (it's the centroid). We set $$$x$$$ as the root of the binary tree and recurse on the left and right childs. For the subtrees that are children of the heavy chain, repeat this process and we have constructed our desired tree.

Now, we need to show that queries here are indeed $$$O(\log n)$$$. First we need to think about how we actually perform queries in our HLD structure. We know from HLD+segment tree our query loop is for querying the path between $$$a$$$ and $$$b$$$ is this:

if $$$a$$$ and $$$b$$$ are in the same heavy chain, query $$$in[a]$$$ to $$$in[b]$$$
if $$$a$$$ is deeper than $$$b$$$, query $$$in[hpar[a]]$$$ to $$$in[a]$$$
if $$$b$$$ is deeper than $$$a$$$, query $$$in[hpar[b]]$$$ to $$$in[b]$$$

Of these $$$3$$$ queries, only the first query type is a sub-array query on the heavy chain, the rest of them are queries on prefixes of the heavy chain. Furthermore, the first query type is only done once. Now, what is the time complexity for a prefix query on a binary tree? It may be tempting to say it is "just $$$O(\log n)$$$ duh", but we can improve it.

For example, if we want to query the values on the prefix where the last vertex is the one labelled $$$x$$$, we simply perform a walk from vertex $$$x$$$ to the root of the tree and add the costs of those vertices and their left children where appropriate (if we have walked up from the left child, don't add stuff). Walking from vertex $$$x$$$ to the root of the tree takes $$$O(d_x-d_{root}+1)$$$ time. For the case of sub-array queries on $$$[x,y]$$$ we can see that it is walking from $$$x$$$ and $$$y$$$ respectively to the root of the tree which will take $$$O(d_x+d_y-d_{root}-d_{root}+1+1)$$$.

Let's change the definition of $$$d_a$$$ slightly to be the depth when the consider the entire structure of our tree (so we consider light edges too when calculating the depth). Let $$$x_{root}$$$ be the root of the heavy chain containing vertex $$$x$$$. The time complexity for prefix or suffix queries becomes $$$O(d_x-d_{x_{root}}+1)$$$ and for sub-array queries it becomes $$$O(d_x+d_y-d_{x_{root}}-d_{y_{root}}+2)$$$. Then we can see that the time complexity is some telescoping sum here too, since when we traverse a light edge, the depth of the would decrease. Actually we don't need the telescoping sum justification here as we can just prove it by saying the querying the simple path from $$$x$$$ to $$$y$$$ only requires us to move $$$x$$$ and $$$y$$$ upwards (and never downwards). In the end, the time complexity only depends on the depth of the endpoints of our queries. So, the only thing we need to prove is that the depth of the entire tree is bounded by $$$O(\log n)$$$.

But the proof of that is exactly the proof that centroid decomposition gives us a centroid tree of depth at most $$$O(\log n)$$$. Maybe I should elaborate more. Let's consider the new tree structure we have built. There are $$$2$$$ types of edges, heavy edges and light edges. When we traverse down a heavy edge, the size of the subtree would be at least halved due to how we have chosen to split the heavy tree so there are most $$$O(\log n)$$$ heavy edges on the path from some vertex to the root on our constructed tree. However, when we traverse down a light edge, there is no guarantee about what happens to the size of the subtree, except it can decrease by $$$1$$$, which is pretty useless. Luckily for us, we know that for every vertex, it has at most $$$O(\log n)$$$ light edges on the path to the root, because that's how HLD works. So we can determine that the depth of the tree is $$$O(\log n)+O(\log n)=O(\log n)$$$. We have shown that our complexity for querying is $$$O(\log n)$$$. Also, is not too hard show that our complexity of construction of this structure is $$$O(n \log n)$$$ since constructing the tree from a single heavy chain is literally centroid decomposition.

The depth of the tree is $$$O(\log n)$$$ but what is the constant. The number of heavy and light edges are both $$$\log n$$$ so our analysis from earlier gives us a $$$2 \log n$$$ bound on the height of the tree. Can we do better? It turns out no, this bound is sharp (up to $$$\pm 1$$$ maybe). Here is how we can construct a tree that forces the depth to be $$$2 \log n$$$ by our construction.

Let $$$T(s)$$$ be our constructed tree that has $$$s$$$ vertices. It is defined recursively by the below image.

Let $$$d(s)$$$ denote the depth of $$$T(s)$$$. As a base case, $$$d(0)=0$$$. We also have $$$d(s)=2+d(\frac{s-2}{2})$$$ since the heavy chain of $$$T(s)$$$ would be on the right side of the tree, so $$$a$$$ would be connected to its left child by a light edge in the original tree. We can have the root of the heavy chain be vertex $$$b$$$ in our constructed tree (it can be vertex $$$a$$$ but we want to assume the worst case) so that in our construction tree, $$$a$$$ would have a depth of $$$2$$$, requiring us to traverse $$$2$$$ edges to get to $$$T(\frac{s-2}{2})$$$. Therefore, it is easy to see that we can make $$$T(s)$$$ have a height of about $$$2 \log n$$$.

Path and Subtree Queries

With our normal HLD+segment tree query, we can easily handle both path and subtree queries $$$[7]$$$.

Can we do it for our new structure? Yes.

Firstly, one of the problems of subtree operations is that if the number of children is very large, it will be hard to compute the aggregate values of children. This is the reason for the difficulty of $$$[9]$$$. But we are not doing operations on a dynamic tree, we can simply augment our tree to make the number of children small for our case.

As in $$$[9]$$$, the idea is to binarize the tree, however since we do not have to care about the depth of the augmented tree, we can simply augment it into a caterpillar graph.

Subtree operations are the same on the original tree and the main tree. The only case we have to handle differently is path operations. For example, the path $$$A \to B$$$ passes through $$$X$$$ in the original tree but not in the augmented tree. However, we can solve this by checking if the lca of $$$A$$$ and $$$B$$$ is a fake vertex. If so, we separately process the actual lca in the original tree.

The original lazy tag we used when we only had path queries only applies the value to the children in the real tree. However with subtree queries, we need a new lazy tag that applies the value to all children, which includes children connected via light edges. Modifying the code to add another lazy tag is not hard, just very tedious.

More specifically, we have $$$2$$$ lazy tags, $$$lazy1$$$ and $$$lazy2$$$. $$$lazy1$$$ is applied to the children in the same heavy chain while $$$lazy2$$$ is applied to all children regardless of whether or not they are in the same heavy chain.

Then, the true value of a vertex $$$u$$$ in the balanced binary tree is $$$val_u + \sum\limits_{\substack{v \text{ is ancestor of } u \\ v \text{ and } u \text{ same heavy chain}} }lazy1_v + \sum\limits_{v \text{ is ancestor of } u }lazy2_v$$$.

Modifying these changes to the original algorithm is not difficult, just very tedious.

Benchmarks

Here are the benchmarks for the various implementations of the tree path queries so that you have a better ideas of the practical performance of the things I will describe so you will realize that the algorithm is practically pretty useless (except, maybe some interactives which are similar to $$$[8]$$$).

The problem is given a tree with $$$n=10^6$$$ vertices where all vertices initially of weight $$$0$$$, handle the following $$$q=5 \cdot 10^6$$$ operations of $$$4$$$ types:

1 u v w increase the weights of all vertices on the simple path from $$$u$$$ to $$$v$$$ by $$$w$$$
2 u v find the maximum weight of any vertex on the simple path from $$$u$$$ to $$$v$$$
3 u w increase the weights of all vertices on the subtree of $$$u$$$ by $$$w$$$
4 u find the maximum weight of any vertex on the subtree of $$$u$$$

It is bench-marked on my desktop with Ryzen 7 3700X CPU with compilation command g++ xxx.cpp -O2.

Note that the difference between balanced HLD 1 and 2 is that balanced HLD 2 is able to handle all types of queries while balanaced HLD 1 is only able to handle the first $$$2$$$ queries.

Benchmarks when there are only query types $$$1$$$ and $$$2$$$.

	HLD + segment tree (single segment tree)	HLD + segment tree (many segment tree)	HLD + segment tree (many segment tree, ACL)	link-cut tree	HLD + splay tree	Balanced HLD 1	Balanced HLD 2
Time Complexity	$$$O(n + q \log ^2 n)$$$	$$$O(n + q \log ^2 n)$$$	$$$O(n + q \log ^2 n)$$$	$$$O(n+q \log n)$$$	$$$O(n + q \log n)$$$	$$$O((n + q) \log n)$$$	$$$O((n+q) \log n)$$$
Random ($$$wn=0$$$)	$$$14.31~s$$$	$$$8.91~s$$$	$$$10.89~s$$$	$$$8.66~s$$$	$$$9.77~s$$$	$$$7.90~s$$$	$$$13.38~s$$$
Random ($$$wn=-10$$$)	$$$10.84~s$$$	$$$6.65~s$$$	$$$6.97~s$$$	$$$5.78~s$$$	$$$5.18~s$$$	$$$4.54~s$$$	$$$11.17~s$$$
Random ($$$wn=10$$$)	$$$15.14~s$$$	$$$10.73~s$$$	$$$12.62~s$$$	$$$10.69~s$$$	$$$13.25~s$$$	$$$10.04~s$$$	$$$13.74~s$$$
Binary Tree ($$$k=1$$$)	$$$21.45~s$$$	$$$13.09~s$$$	$$$17.49~s$$$	$$$12.40~s$$$	$$$13.62~s$$$	$$$10.59~s$$$	$$$13.96~s$$$
Binary tree ($$$k=5$$$)	$$$20.48~s$$$	$$$14.64~s$$$	$$$18.12~s$$$	$$$11.82~s$$$	$$$15.16~s$$$	$$$11.80~s$$$	$$$14.90~s$$$

Benchmarks when there are all $$$4$$$ query types.

	HLD + segment tree (single segment tree)	Balanced HLD 2
Time Complexity	$$$O(n + q \log ^2 n)$$$	$$$O((n+q) \log n)$$$
Random ($$$wn=0$$$)	$$$9.06~s$$$	$$$10.61~s$$$
Random ($$$wn=-10$$$)	$$$7.12~s$$$	$$$8.86~s$$$
Random ($$$wn=10$$$)	$$$9.92~s$$$	$$$11.15~s$$$
Binary Tree ($$$k=1$$$)	$$$13.47~s$$$	$$$11.80~s$$$
Binary tree ($$$k=5$$$)	$$$11.67~s$$$	$$$11.63~s$$$

I am unsure why a value of $$$k$$$ closer to $$$\sqrt n$$$ made the first $$$3$$$ codes all faster. Maybe there is something wrong with my generator or is the segment tree just too fast? Also, IO takes about $$$1~s$$$.

Here are my codes (generators + solutions). They have not been stress tested and are not guaranteed to be correct. They are only here for reference.

gen_random.cpp

#include "testlib.h"

#include <bits/stdc++.h>
using namespace std;

int main(int argc, char* argv[]){
    registerGen(argc, argv, 1);
	
	int n=atoi(argv[1]),q=atoi(argv[2]),wn=atoi(argv[3]),qt=atoi(argv[4]);
	
	vector<pair<int,int> > edges;
	for (int x=1;x<n;x++) edges.push_back({rnd.wnext(x,wn),x});
	
	shuffle(edges.begin(),edges.end());
	vector<int> perm(n);
	iota(perm.begin(),perm.end(),1);
	shuffle(perm.begin(),perm.end());
	
	cout<<n<<" "<<q<<endl;
	for (auto [a,b]:edges){
		if (rnd.next(2)) cout<<perm[a]<<" "<<perm[b]<<endl;
		else cout<<perm[b]<<" "<<perm[a]<<endl;
	}
	
	for (int x=0;x<q;x++){
		int a=rnd.next(1,qt);
		if (a==1){
			int u=rnd.next(1,n),v=rnd.next(1,n),w=rnd.next(1,1000000000);
			cout<<a<<" "<<u<<" "<<v<<" "<<w<<endl;
		}
		else if (a==2){
			int u=rnd.next(1,n),v=rnd.next(1,n);
			cout<<a<<" "<<u<<" "<<v<<endl;
		}
		else if (a==3){
			int u=rnd.next(1,n),w=rnd.next(1,1000000000);
			cout<<a<<" "<<u<<" "<<w<<endl;
		}
		else{
			int u=rnd.next(1,n);
			cout<<a<<" "<<u<<endl;
		}
	}
}

gen_binary.cpp

#include "testlib.h"

#include <bits/stdc++.h>
using namespace std;

int main(int argc, char* argv[]){
    registerGen(argc, argv, 1);
	
	int n=atoi(argv[1]),q=atoi(argv[2]),k=atoi(argv[3]),qt=atoi(argv[4]);
	
	vector<pair<int,int> > edges;
	queue<int> qu;
	qu.push(0),qu.push(0);
	int IDX=0;
	
	while (IDX<n-1){
		int u=qu.front();
		qu.pop();
		
		for (int x=0;x<k;x++){
			edges.push_back({u,++IDX});
			u=IDX;
			if (IDX==n-1) break;
		}
		qu.push(IDX),qu.push(IDX);
	}
	
	shuffle(edges.begin(),edges.end());
	vector<int> perm(n);
	iota(perm.begin(),perm.end(),1);
	shuffle(perm.begin()+1,perm.end()); //keep 1 as the root
	
	cout<<n<<" "<<q<<endl;
	for (auto [a,b]:edges){
		if (rnd.next(2)) cout<<perm[a]<<" "<<perm[b]<<endl;
		else cout<<perm[b]<<" "<<perm[a]<<endl;
	}
	
	for (int x=0;x<q;x++){
		int a=rnd.next(1,qt);
		if (a==1){
			int u=rnd.next(1,n),v=rnd.next(1,n),w=rnd.next(1,1000000000);
			cout<<a<<" "<<u<<" "<<v<<" "<<w<<endl;
		}
		else if (a==2){
			int u=rnd.next(1,n),v=rnd.next(1,n);
			cout<<a<<" "<<u<<" "<<v<<endl;
		}
		else if (a==3){
			int u=rnd.next(1,n),w=rnd.next(1,1000000000);
			cout<<a<<" "<<u<<" "<<w<<endl;
		}
		else{
			int u=rnd.next(1,n);
			cout<<a<<" "<<u<<endl;
		}
	}
}

hld_single.cpp

#include <bits/stdc++.h>
using namespace std;

#define int long long
#define ii pair<int,int>
#define fi first
#define se second
#define endl '\n'

#define pub push_back
#define pob pop_back

#define rep(x,start,end) for(int x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

struct node{
	int BUF=1000005;
	int val[2000010],lazy[2000010];
	
	void get(int i){ val[i]=max(val[i<<1],val[i<<1|1])+lazy[i]; }
	
	void update(int i,int j,int k){ //[i,j]
		i+=BUF-1,j+=BUF+1; //(i,j)
		while (i|j){
			if (j-i>1){
				if (~i&1) val[i^1]+=k,lazy[i^1]+=k;
				if (j&1) val[j^1]+=k,lazy[j^1]+=k;
			}
			i>>=1,j>>=1;
			get(i),get(j);
		}
	}
	
	int query(int i,int j){ //[i,j]
		i+=BUF-1,j+=BUF+1; //(i,j)
		int resl=-1e18,resr=-1e18;
		while (i|j){
			if (j-i>1){
				if (~i&1) resl=max(resl,val[i^1]);
				if (j&1) resr=max(resr,val[j^1]);
			}
			i>>=1,j>>=1;
			resl+=lazy[i],resr+=lazy[j];
		}
		return max(resl,resr);
	}
} root;

int n,q;
vector<int> al[1000005];

int ss[1000005];
int in[1000005];
int out[1000005];
int d[1000005];
int pp[1000005];
int hp[1000005];

int _TIME=0;

void dfs_ss(int i,int p){
	ss[i]=1;
	for (auto &it:al[i]){
		if (it==p) continue;
		dfs_ss(it,i);
		ss[i]+=ss[it];
		if (al[i][0]==p || ss[al[i][0]]<ss[it]) swap(al[i][0],it);
	}
}

void dfs_hld(int i,int p){
	in[i]=++_TIME;
	
	for (auto it:al[i]){
		if (it==p) continue;
		
		d[it]=d[i]+1;
		if (it==al[i][0]) hp[it]=hp[i];
		else hp[it]=it;
		pp[it]=i;
		
		dfs_hld(it,i);
	}
	
	out[i]=_TIME;
}

signed main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	cin>>n>>q;
	
	int a,b,c;
	rep(x,1,n){
		cin>>a>>b;
		al[a].pub(b);
		al[b].pub(a);
	}
	
	dfs_ss(1,-1);
	hp[1]=1;
	dfs_hld(1,-1);
	
	while (q--){
		cin>>a;
		
		if (a==1){
			cin>>a>>b>>c;
			while (hp[a]!=hp[b]){
				if (d[hp[a]]<d[hp[b]]) swap(a,b);
				root.update(in[hp[a]],in[a],c);
				a=pp[hp[a]];
			}
			if (d[a]>d[b]) swap(a,b);
			root.update(in[a],in[b],c);
		}
		else if (a==2){
			cin>>a>>b;
			int res=-1e18;
			while (hp[a]!=hp[b]){
				if (d[hp[a]]<d[hp[b]]) swap(a,b);
				res=max(res,root.query(in[hp[a]],in[a]));
				a=pp[hp[a]];
			}
			if (d[a]>d[b]) swap(a,b);
			res=max(res,root.query(in[a],in[b]));
			
			cout<<res<<endl;
		}
		else if (a==3){
			cin>>a>>c;
			root.update(in[a],out[a],c);
		}
		else{
			cin>>a;
			cout<<root.query(in[a],out[a])<<endl;
		}
	}
}

hld_many.cpp

#include <bits/stdc++.h>
using namespace std;

#define int long long
#define ii pair<int,int>
#define fi first
#define se second
#define endl '\n'

#define pub push_back
#define pob pop_back

#define rep(x,start,end) for(int x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

int __dat[8000005];
int *ptr=__dat;

struct node{
	int BUF;
	int *val,*lazy;
	
	node(){}
	node(int n){
		n++;
		
		BUF=n;
		val=ptr; ptr+=n<<1;
		lazy=ptr; ptr+=n<<1;
	}
	
	void get(int i){ val[i]=max(val[i<<1],val[i<<1|1])+lazy[i]; }
	
	void update(int i,int j,int k){
		i+=BUF-1,j+=BUF+1; //(i,j)
		while (i|j){
			if (j-i>1){
				if (~i&1) val[i^1]+=k,lazy[i^1]+=k;
				if (j&1) val[j^1]+=k,lazy[j^1]+=k;
			}
			i>>=1,j>>=1;
			get(i),get(j);
		}
	}
	
	int query(int i,int j){
		i+=BUF-1,j+=BUF+1; //(i,j)
		int resl=-1e18,resr=-1e18;
		while (i|j){
			if (j-i>1){
				if (~i&1) resl=max(resl,val[i^1]);
				if (j&1) resr=max(resr,val[j^1]);
			}
			i>>=1,j>>=1;
			resl+=lazy[i],resr+=lazy[j];
		}
		return max(resl,resr);
	}
};

int n,q;
vector<int> al[1000005];

int ss[1000005];
int in[1000005];
int d[1000005];
int pp[1000005];
int hp[1000005];
int num[1000005];

void dfs_ss(int i,int p){
	ss[i]=1;
	for (auto &it:al[i]){
		if (it==p) continue;
		dfs_ss(it,i);
		ss[i]+=ss[it];
		if (al[i][0]==p || ss[al[i][0]]<ss[it]) swap(al[i][0],it);
	}
}

void dfs_hld(int i,int p){
	num[hp[i]]++;
	
	for (auto it:al[i]){
		if (it==p) continue;
		
		d[it]=d[i]+1;
		if (it==al[i][0]) hp[it]=hp[i],in[it]=in[i]+1;
		else hp[it]=it;
		pp[it]=i;
		
		dfs_hld(it,i);
	}
}

node root[1000005];

signed main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	cin>>n>>q;
	
	int a,b,c;
	rep(x,1,n){
		cin>>a>>b;
		al[a].pub(b);
		al[b].pub(a);
	}
	
	dfs_ss(1,-1);
	hp[1]=1;
	dfs_hld(1,-1);
	
	rep(x,1,n+1) if (hp[x]==x) root[x]=node(num[x]);
	
	while (q--){
		cin>>a;
		
		if (a==1){
			cin>>a>>b>>c;
			while (hp[a]!=hp[b]){
				if (d[hp[a]]<d[hp[b]]) swap(a,b);
				root[hp[a]].update(0,in[a],c);
				a=pp[hp[a]];
			}
			if (d[a]>d[b]) swap(a,b);
			root[hp[a]].update(in[a],in[b],c);
		}
		else{
			cin>>a>>b;
			int res=-1e18;
			while (hp[a]!=hp[b]){
				if (d[hp[a]]<d[hp[b]]) swap(a,b);
				res=max(res,root[hp[a]].query(0,in[a]));
				a=pp[hp[a]];
			}
			if (d[a]>d[b]) swap(a,b);
			res=max(res,root[hp[a]].query(in[a],in[b]));
			
			cout<<res<<endl;
		}
	}
}

hld_many_ACL.cpp

#include <bits/stdc++.h>
#include <atcoder/lazysegtree>
using namespace std;
using namespace atcoder;

#define int long long
#define ii pair<int,int>
#define fi first
#define se second
#define endl '\n'

#define pub push_back
#define pob pop_back

#define rep(x,start,end) for(int x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

using S=int;
using F=int;

S op(S l,S r){ return max(l,r); }
S e(){ return 0; }
S mapping(F l,S r){ return l+r; }
F composition(F l,F r){ return l+r; }
F id(){ return 0; }
using segtree=lazy_segtree<S, op, e, F, mapping, composition, id>;

int n,q;
vector<int> al[1000005];

int ss[1000005];
int in[1000005];
int d[1000005];
int pp[1000005];
int hp[1000005];
int num[1000005];

void dfs_ss(int i,int p){
	ss[i]=1;
	for (auto &it:al[i]){
		if (it==p) continue;
		dfs_ss(it,i);
		ss[i]+=ss[it];
		if (al[i][0]==p || ss[al[i][0]]<ss[it]) swap(al[i][0],it);
	}
}

void dfs_hld(int i,int p){
	num[hp[i]]++;
	
	for (auto it:al[i]){
		if (it==p) continue;
		
		d[it]=d[i]+1;
		if (it==al[i][0]) hp[it]=hp[i],in[it]=in[i]+1;
		else hp[it]=it;
		pp[it]=i;
		
		dfs_hld(it,i);
	}
}

segtree root[1000005];

signed main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	cin>>n>>q;
	
	int a,b,c;
	rep(x,1,n){
		cin>>a>>b;
		al[a].pub(b);
		al[b].pub(a);
	}
	
	dfs_ss(1,-1);
	hp[1]=1;
	dfs_hld(1,-1);
	
	rep(x,1,n+1) if (hp[x]==x) root[x]=segtree(num[x]);
	
	while (q--){
		cin>>a;
		
		if (a==1){
			cin>>a>>b>>c;
			while (hp[a]!=hp[b]){
				if (d[hp[a]]<d[hp[b]]) swap(a,b);
				root[hp[a]].apply(0,in[a]+1,c);
				a=pp[hp[a]];
			}
			if (d[a]>d[b]) swap(a,b);
			root[hp[a]].apply(in[a],in[b]+1,c);
		}
		else{
			cin>>a>>b;
			int res=-1e18;
			while (hp[a]!=hp[b]){
				if (d[hp[a]]<d[hp[b]]) swap(a,b);
				res=max(res,root[hp[a]].prod(0,in[a]+1));
				a=pp[hp[a]];
			}
			if (d[a]>d[b]) swap(a,b);
			res=max(res,root[hp[a]].prod(in[a],in[b]+1));
			
			cout<<res<<endl;
		}
	}
}

linkcut.cpp

#include <bits/stdc++.h>
using namespace std;

#define int long long
#define ii pair<int,int>
#define fi first
#define se second
#define endl '\n'

#define pub push_back
#define pob pop_back

#define rep(x,start,end) for(int x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

struct node{
	node *p=nullptr; bool head=true;
	node *l=nullptr,*r=nullptr; //only store real children
	int val=0,mx=0,lazy=0;
	
	node (node *_p){
		p=_p;
	}
	
	bool dir(){
		return this==p->l;
	}
	
	void propo(){
		if (lazy==0) return;
		
		val+=lazy;
		mx+=lazy;
		if (l) l->lazy+=lazy;
		if (r) r->lazy+=lazy;
		lazy=0;
	}
	
	void upd(){
		mx=max({
			val,
			l==nullptr?0:l->mx+l->lazy,
			r==nullptr?0:r->mx+r->lazy
		});
	}
};

void rotate(node *u){
	node *par=u->p;
	bool dir=u->dir();
	
	par->propo();
	u->propo();
	
	u->p=par->p;
	if (!par->head){
		if (par->dir()) par->p->l=u;
		else par->p->r=u;
	}
	swap(u->head,par->head);
	
	if (dir){
		node *child=u->r;
		par->l=child;
		if (child) child->p=par;
		
		u->r=par;
		par->p=u;
	}
	else{
		node *child=u->l;
		par->r=child;
		if (child) child->p=par;
		u->l=par;
		par->p=u;
	}
	
	par->upd();
	u->upd();
}

void splay(node *u){
	while (!u->head){
		if (u->p->head){ //almost root
			rotate(u);
		}
		else if (u->dir()==u->p->dir()){
			rotate(u->p);
			rotate(u);
		}
		else{
			rotate(u);
			rotate(u);
		}
	}
	u->propo();
}

node* access(node *u){
	splay(u);
	if (u->r!=nullptr){
		u->r->head=true;
		u->r=nullptr;
		u->upd();
	}
	while (u->p){
		node *par=u->p;
		splay(par);
		if (par->r) par->r->head=true;
		u->head=false;
		par->r=u;
		par->upd();
		u=par;
	}
	
	return u;
}

int n,q;
vector<int> al[1000005];
int d[1000005];
node *root[1000005];

void dfs(int i,int p){
	root[i]=new node(p==-1?nullptr:root[p]);
	for (auto it:al[i]){
		if (it==p) continue;
		d[it]=d[i]+1;
		dfs(it,i);
	}
}

signed main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	cin>>n>>q;
	
	int a,b,c;
	rep(x,1,n){
		cin>>a>>b;
		al[a].pub(b);
		al[b].pub(a);
	}
	
	dfs(1,-1);
	
	while (q--){
		cin>>a;
		
		if (a==1){
			cin>>a>>b>>c;
			
			if (d[a]>d[b]) swap(a,b);
			access(root[a]);
			node *temp=access(root[b]);
			splay(root[a]);
			
			if (root[a]->p) root[a]->lazy+=c;
			temp->val+=c;
			if (temp->r) temp->r->lazy+=c;
			temp->upd();
		}
		else{
			cin>>a>>b;
			
			if (d[a]>d[b]) swap(a,b);
			access(root[a]);
			node *temp=access(root[b]);
			splay(root[a]);
			
			int ans=max({
				root[a]->p?(root[a]->mx)+(root[a]->lazy):0,
				temp->val+temp->lazy,
				temp->r?temp->r->mx+temp->r->lazy+temp->lazy:0
			});
			
			cout<<ans<<endl;
		}
	}
}

hld_splay.cpp

#include <bits/stdc++.h>
using namespace std;

#define int long long
#define ii pair<int,int>
#define fi first
#define se second
#define endl '\n'

#define pub push_back
#define pob pop_back

#define rep(x,start,end) for(int x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

struct node{
	node *p=nullptr;
	node *l=nullptr,*r=nullptr; //only store real children
	int val=0,mx=0,lazy=0;
	
	node (node *_p){
		p=_p;
	}
	
	bool dir(){
		return this==p->l;
	}
	
	void propo(){
		if (lazy==0) return;
		
		val+=lazy;
		mx+=lazy;
		if (l) l->lazy+=lazy;
		if (r) r->lazy+=lazy;
		lazy=0;
	}
	
	void upd(){
		mx=max({
			val,
			l==nullptr?0:l->mx+l->lazy,
			r==nullptr?0:r->mx+r->lazy
		});
	}
};

void rotate(node *u){
	node *par=u->p;
	bool dir=u->dir();
	
	par->propo();
	u->propo();
	
	u->p=par->p;
	if (par->p){
		if (par->dir()) par->p->l=u;
		else par->p->r=u;
	}
	
	if (dir){
		node *child=u->r;
		par->l=child;
		if (child) child->p=par;
		
		u->r=par;
		par->p=u;
	}
	else{
		node *child=u->l;
		par->r=child;
		if (child) child->p=par;
		u->l=par;
		par->p=u;
	}
	
	par->upd();
	u->upd();
}

void splay(node *u){
	while (u->p){
		if (u->p->p==nullptr){ //almost root
			rotate(u);
		}
		else if (u->dir()==u->p->dir()){
			rotate(u->p);
			rotate(u);
		}
		else{
			rotate(u);
			rotate(u);
		}
	}
	u->propo();
}

int n,q;
vector<int> al[1000005];

int ss[1000005];
int d[1000005];
int pp[1000005];
int hp[1000005];
int num[1000005];
node* root[1000005];

void dfs_ss(int i,int p){
	ss[i]=1;
	for (auto &it:al[i]){
		if (it==p) continue;
		dfs_ss(it,i);
		ss[i]+=ss[it];
		if (al[i][0]==p || ss[al[i][0]]<ss[it]) swap(al[i][0],it);
	}
}

void dfs_hld(int i,int p){
	num[hp[i]]++;
	
	if (hp[i]==i) root[i]=new node(nullptr);
	else root[i]=new node(root[p]);
	
	for (auto it:al[i]){
		if (it==p) continue;
		
		d[it]=d[i]+1;
		if (it==al[i][0]) hp[it]=hp[i];
		else hp[it]=it;
		pp[it]=i;
		
		dfs_hld(it,i);
	}
}

signed main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	cin>>n>>q;
	
	int a,b,c;
	rep(x,1,n){
		cin>>a>>b;
		al[a].pub(b);
		al[b].pub(a);
	}
	
	dfs_ss(1,-1);
	hp[1]=1;
	dfs_hld(1,-1);
	
	while (q--){
		cin>>a;
		
		if (a==1){
			cin>>a>>b>>c;
			while (hp[a]!=hp[b]){
				if (d[hp[a]]<d[hp[b]]) swap(a,b);
				splay(root[a]);
				root[a]->val+=c;
				if (root[a]->l) root[a]->l->lazy+=c;
				root[a]->upd();
				a=pp[hp[a]];
			}
			if (d[a]>d[b]) swap(a,b);
			if (a==b){
				splay(root[a]);
				root[a]->val+=c;
				root[a]->upd();
			}
			else{
				splay(root[b]);
				splay(root[a]);
				if (root[b]->p!=root[a]) rotate(root[b]);
				
				if (root[b]->l) root[b]->l->lazy+=c;
				root[a]->val+=c;
				root[b]->val+=c;
				
				root[b]->upd();
				root[a]->upd();
			}
		}
		else{
			cin>>a>>b;
			int res=-1e18;
			
			while (hp[a]!=hp[b]){
				if (d[hp[a]]<d[hp[b]]) swap(a,b);
				splay(root[a]);
				res=max(res,root[a]->val+root[a]->lazy);
				if (root[a]->l) res=max(res,root[a]->l->mx+root[a]->l->lazy+root[a]->lazy);
				a=pp[hp[a]];
			}
			if (d[a]>d[b]) swap(a,b);
			if (a==b){
				splay(root[a]);
				res=max(res,root[a]->val+root[a]->lazy);
			}
			else{
				splay(root[b]);
				splay(root[a]);
				if (root[b]->p!=root[a]) rotate(root[b]);
				res=max(res,root[a]->val+root[a]->lazy);
				res=max(res,root[b]->val+root[a]->lazy+root[b]->lazy);
				if (root[b]->l) res=max(res,root[b]->l->mx+root[a]->lazy+root[b]->lazy+root[b]->l->lazy);
			}
			
			cout<<res<<endl;
		}
	}
}

binary_tree.cpp

#include <bits/stdc++.h>
using namespace std;

#define int long long
#define ii pair<int,int>
#define fi first
#define se second
#define endl '\n'

#define pub push_back
#define pob pop_back

#define rep(x,start,end) for(int x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

int n,q;
vector<int> al[1000005];

int ss[1000005];
int pp[1000005];
int hp[1000005];

int ss2[1000005]; //size of other childs
vector<int> chain[1000005];

void dfs_ss(int i,int p){
	ss[i]=1;
	for (auto &it:al[i]){
		if (it==p) continue;
		dfs_ss(it,i);
		ss[i]+=ss[it];
		if (al[i][0]==p || ss[al[i][0]]<ss[it]) swap(al[i][0],it);
	}
}

int TIME=0;
int in[1000005];

void dfs_hld(int i,int p){
	in[i]=TIME++;
	
	ss2[i]=1;
	chain[hp[i]].pub(i);
	
	for (auto it:al[i]){
		if (it==p) continue;
		
		if (it==al[i][0]) hp[it]=hp[i];
		else hp[it]=it;
		pp[it]=i;
		
		dfs_hld(it,i);
		if (hp[it]!=hp[i]) ss2[i]+=ss[it];
	}
}

int d[1000005];
int hd[1000005];
int par[1000005];
int childs[1000005][2];

int dfs_chain(int i,int l,int r,int p,int depth,int hdepth){
	int tot=0;
	rep(x,l,r+1) tot+=ss2[chain[i][x]];
	
	int curr=0;
	rep(x,l,r+1){
		curr+=ss2[chain[i][x]];
		if (curr>=(tot+1)/2){ //centroid
			int u=chain[i][x];
			d[u]=depth;
			hd[u]=hdepth;
			par[u]=p;
			if (l!=x) childs[u][0]=dfs_chain(i,l,x-1,u,depth+1,hdepth);
			if (r!=x) childs[u][1]=dfs_chain(i,x+1,r,u,depth+1,hdepth);
			for (auto it:al[u]) if (hp[it]!=hp[u] && it!=pp[u]){
				dfs_chain(it,0,sz(chain[it])-1,u,depth+1,depth+1);
			}
			return u;
		}
	}
}

int lazy[1000005];
int val[1000005];
int mx[1000005];

void upd(int i){
	mx[i]=val[i];
	rep(z,0,2) if (childs[i][z]!=-1) mx[i]=max(mx[i],mx[childs[i][z]]);
	mx[i]+=lazy[i];
}

signed main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	cin>>n>>q;
	
	int a,b,c;
	rep(x,1,n){
		cin>>a>>b;
		al[a].pub(b);
		al[b].pub(a);
	}
	
	dfs_ss(1,-1);
	hp[1]=1;
	dfs_hld(1,-1);
	
	memset(par,-1,sizeof(par));
	memset(childs,-1,sizeof(childs));
	dfs_chain(1,0,sz(chain[1])-1,-1,0,0);
	
	while (q--){
		cin>>a;
		
		if (a==1){
			cin>>a>>b>>c;
			
			while (hp[a]!=hp[b]){
				if (hd[a]<hd[b]) swap(a,b);
				
				int oa=a;
				while (hp[a]==hp[oa]){
					if (in[a]<=in[oa]){
						val[a]+=c;
						if (childs[a][0]!=-1){
							lazy[childs[a][0]]+=c;
							mx[childs[a][0]]+=c;
						}
					}
					upd(a);
					a=par[a];
				}
			}
			
			if (in[a]>in[b]) swap(a,b);
			int oa=a,ob=b;
			while (a!=b){
				int md=max(d[a],d[b]);
				if (d[a]>=md){
					if (in[oa]<=in[a]){
						val[a]+=c;
						if (childs[a][1]!=-1){
							lazy[childs[a][1]]+=c;
							mx[childs[a][1]]+=c;
						}
					}
					upd(a);
					a=par[a];
				}
				if (d[b]>=md){
					if (in[b]<=in[ob]){
						val[b]+=c;
						if (childs[b][0]!=-1){
							lazy[childs[b][0]]+=c;
							mx[childs[b][0]]+=c;
						}
					}
					upd(b);
					b=par[b];
				}
			}
			
			val[a]+=c;
			while (a!=-1 && hp[oa]==hp[a]){
				upd(a);
				a=par[a];
			}
		}
		else{
			cin>>a>>b;
			
			int ans=0;
			while (hp[a]!=hp[b]){
				if (hd[a]<hd[b]) swap(a,b);
				
				int curr=0;
				int oa=a;
				while (hp[a]==hp[oa]){
					if (in[a]<=in[oa]){
						curr=max(curr,val[a]);
						if (childs[a][0]!=-1) curr=max(curr,mx[childs[a][0]]);
					}
					curr+=lazy[a];
					a=par[a];
				}
				ans=max(ans,curr);
			}
			
			if (in[a]>in[b]) swap(a,b);
			int oa=a,ob=b;
			int currl=0,currr=0;
			while (a!=b){
				int md=max(d[a],d[b]);
				if (d[a]>=md){
					if (in[oa]<=in[a]){
						currl=max(currl,val[a]);
						if (childs[a][1]!=-1) currl=max(currl,mx[childs[a][1]]);
					}
					currl+=lazy[a];
					a=par[a];
				}
				if (d[b]>=md){
					if (in[b]<=in[ob]){
						currr=max(currr,val[b]);
						if (childs[b][0]!=-1) currr=max(currr,mx[childs[b][0]]);
					}
					currr+=lazy[b];
					b=par[b];
				}
			}
			
			currl=max({currl,currr,val[a]});
			
			while (a!=-1 && hp[oa]==hp[a]){
				currl+=lazy[a];
				a=par[a];
			}
			
			ans=max(ans,currl);
			cout<<ans<<endl;
		}
	}
}

binary_tree2.cpp

#include <bits/stdc++.h>
using namespace std;

#define int long long
#define ii pair<int,int>
#define fi first
#define se second
#define endl '\n'

#define pub push_back
#define pob pop_back

#define rep(x,start,end) for(int x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

int n,q;
vector<int> al[2000005];

int ss[2000005];
int hp[2000005];

int ss2[2000005]; //size of other childs
vector<int> chain[2000005];

int FAKE_IDX;
int rp[2000005]; //real vertex for fake vertices

void dfs_ss(int i,int p){
	if (p!=-1 && i<=n){
		for (auto &it:al[i]) if (it==p || it==rp[p]) swap(it,al[i].back());
		al[i].pob();
	}
	
	int curr=i;
	while (sz(al[curr])>2){
		swap(al[curr],al[FAKE_IDX]);
		al[curr]={al[FAKE_IDX].back(),FAKE_IDX};
		al[FAKE_IDX].pob();
		rp[FAKE_IDX]=i;
		curr=FAKE_IDX;
		FAKE_IDX++;
	}
	
	ss[i]=1;
	for (auto &it:al[i]){
		dfs_ss(it,i);
		ss[i]+=ss[it];
		if (ss[al[i][0]]<ss[it]) swap(al[i][0],it);
	}
}

int TIME=0;
int in[2000005];

void dfs_hld(int i,int p){
	in[i]=TIME++;
	
	ss2[i]=1;
	chain[hp[i]].pub(i);
	
	for (auto it:al[i]){
		if (it==al[i][0]) hp[it]=hp[i];
		else hp[it]=it;
		
		dfs_hld(it,i);
		if (hp[it]!=hp[i]) ss2[i]+=ss[it];
	}
}

int d[2000005];
int hd[2000005];
int par[2000005];
int childs[2000005][3]; //(left,right,fake)

int dfs_chain(int i,int l,int r,int p,int depth,int hdepth){
	int tot=0;
	rep(x,l,r+1) tot+=ss2[chain[i][x]];
	
	int curr=0;
	rep(x,l,r+1){
		curr+=ss2[chain[i][x]];
		if (curr>=(tot+1)/2){ //centroid
			int u=chain[i][x];
			d[u]=depth;
			hd[u]=hdepth;
			par[u]=p;
			if (l!=x) childs[u][0]=dfs_chain(i,l,x-1,u,depth+1,hdepth);
			if (r!=x) childs[u][1]=dfs_chain(i,x+1,r,u,depth+1,hdepth);
			for (auto it:al[u]) if (hp[it]!=hp[u]){
				childs[u][2]=dfs_chain(it,0,sz(chain[it])-1,u,depth+1,depth+1);
			}
			return u;
		}
	}
}

int lazy1[2000005]; //only propogate on real tree
int lazy2[2000005]; //propogate on both real and fake tree
int val[2000005];
ii mx[2000005]; //(real,fake)

void upd(int i){
	mx[i]={val[i],0};
	rep(z,0,3){
		if (childs[i][z]!=-1){
			if (z!=2) mx[i].fi=max(mx[i].fi,mx[childs[i][z]].fi);
			else mx[i].se=max(mx[i].se,mx[childs[i][z]].fi);
			mx[i].se=max(mx[i].se,mx[childs[i][z]].se);
		}
	}
	mx[i].fi+=lazy1[i]+lazy2[i]; mx[i].se+=lazy2[i];
}

signed main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	cin>>n>>q;
	
	int a,b,c;
	rep(x,1,n){
		cin>>a>>b;
		al[a].pub(b);
		al[b].pub(a);
	}
	
	memset(rp,-1,sizeof(rp));
	FAKE_IDX=n+1;
	dfs_ss(1,-1);
	hp[1]=1;
	dfs_hld(1,-1);
	
	memset(par,-1,sizeof(par));
	memset(childs,-1,sizeof(childs));
	dfs_chain(1,0,sz(chain[1])-1,-1,0,0);
	
	// rep(x,1,FAKE_IDX) cout<<hp[x]<<" "; cout<<endl;
	// rep(x,1,FAKE_IDX) cout<<childs[x][0]<<" "; cout<<endl;
	// rep(x,1,FAKE_IDX) cout<<childs[x][1]<<" "; cout<<endl;
	// rep(x,1,FAKE_IDX) cout<<childs[x][2]<<" "; cout<<endl;
	
	while (q--){
		cin>>a;
		
		if (a==1){
			cin>>a>>b>>c;
			
			while (hp[a]!=hp[b]){
				if (hd[a]<hd[b]) swap(a,b);
				
				int oa=a;
				while (hp[a]==hp[oa]){
					if (in[a]<=in[oa]){
						val[a]+=c;
						if (childs[a][0]!=-1){
							lazy1[childs[a][0]]+=c;
							mx[childs[a][0]].fi+=c;
						}
					}
					upd(a);
					a=par[a];
				}
			}
			
			if (in[a]>in[b]) swap(a,b);
			
			if (rp[a]!=-1){ //lca is a fake vertex
				int g=rp[a];
				val[g]+=c;
				while (g!=-1){
					upd(g);
					g=par[g];
				}
			}
			
			int oa=a,ob=b;
			while (a!=b){
				int md=max(d[a],d[b]);
				if (d[a]>=md){
					if (in[oa]<=in[a]){
						val[a]+=c;
						if (childs[a][1]!=-1){
							lazy1[childs[a][1]]+=c;
							mx[childs[a][1]].fi+=c;
						}
					}
					upd(a);
					a=par[a];
				}
				if (d[b]>=md){
					if (in[b]<=in[ob]){
						val[b]+=c;
						if (childs[b][0]!=-1){
							lazy1[childs[b][0]]+=c;
							mx[childs[b][0]].fi+=c;
						}
					}
					upd(b);
					b=par[b];
				}
			}
			
			val[a]+=c;
			while (a!=-1){
				upd(a);
				a=par[a];
			}
		}
		else if (a==2){
			cin>>a>>b;
			
			int ans=0,ansl=0,ansr=0;
			while (hp[a]!=hp[b]){
				if (hd[a]<hd[b]) swap(a,b),swap(ansl,ansr);
				
				int curr=0;
				int oa=a;
				while (hp[a]==hp[oa]){
					if (in[a]<=in[oa]){
						curr=max(curr,val[a]);
						if (childs[a][0]!=-1) curr=max(curr,mx[childs[a][0]].fi);
					}
					curr+=lazy1[a]+lazy2[a];
					ansl+=lazy2[a];
					a=par[a];
				}
				ansl=max(ansl,curr);
			}
			
			if (in[a]>in[b]) swap(a,b),swap(ansl,ansr);
			
			if (rp[a]!=-1){ //lca is a fake vertex
				int g=rp[a],curr=val[rp[a]];
				while (g!=-1){
					if (hp[g]==hp[rp[a]]) curr+=lazy1[g];
					curr+=lazy2[g];
					g=par[g];
				}
				ans=max(ans,curr);
			}
			
			int oa=a,ob=b;
			int currl=0,currr=0;
			while (a!=b){
				int md=max(d[a],d[b]);
				if (d[a]>=md){
					if (in[oa]<=in[a]){
						currl=max(currl,val[a]);
						if (childs[a][1]!=-1) currl=max(currl,mx[childs[a][1]].fi);
					}
					currl+=lazy1[a]+lazy2[a];
					ansl+=lazy2[a];
					a=par[a];
				}
				if (d[b]>=md){
					if (in[b]<=in[ob]){
						currr=max(currr,val[b]);
						if (childs[b][0]!=-1) currr=max(currr,mx[childs[b][0]].fi);
					}
					currr+=lazy1[b]+lazy2[b];
					ansr+=lazy2[b];
					b=par[b];
				}
			}
			
			currl=max({currl,currr,val[a]});
			ansl=max(ansl,ansr);
			
			while (a!=-1){
				if (hp[a]==hp[oa]) currl+=lazy1[a];
				currl+=lazy2[a];
				ansl+=lazy2[a];
				a=par[a];
			}
			
			cout<<max({ans,currl,ansl})<<endl;
		}
		else if (a==3){
			cin>>a>>c;
			
			int oa=a;
			while (a!=-1){
				if (in[oa]<=in[a]){
					val[a]+=c;
					rep(z,1,3) if (childs[a][z]!=-1){ //update both right and light children
						lazy2[childs[a][z]]+=c;
						mx[childs[a][z]].fi+=c;
						mx[childs[a][z]].se+=c;
					}
				}
				upd(a);
				a=par[a];
			}
		}
		else{
			cin>>a;
			
			int oa=a,ans=0,curr=0;
			while (a!=-1){
				if (in[oa]<=in[a]){
					curr=max(curr,val[a]);
					rep(z,1,3) if (childs[a][z]!=-1){
						if (z==1) curr=max(curr,mx[childs[a][z]].fi);
						else ans=max(ans,mx[childs[a][z]].fi);
						ans=max(ans,mx[childs[a][z]].se);
					}
				}
				if (hp[a]==hp[oa]) curr+=lazy1[a];
				curr+=lazy2[a];
				ans+=lazy2[a];
				
				a=par[a];
			}
			
			cout<<max(ans,curr)<<endl;
		}
	}
}

script.py

import subprocess

#compile everything
subprocess.run("g++ gen_random.cpp -o bin/gen_random -O2 -std=c++17",shell=True)
subprocess.run("g++ gen_binary.cpp -o bin/gen_binary -O2 -std=c++17",shell=True)

tests=[
	"gen_random 1000000 5000000 0",
	"gen_random 1000000 5000000 -10",
	"gen_random 1000000 5000000 10",
	"gen_binary 1000000 5000000 1",
	"gen_binary 1000000 5000000 5"
]

solutions=[
	"hld_single",
	"hld_many",
	"hld_many_ACL",
	"linkcut",
	"hld_splay",
	"binary_tree",
	"binary_tree2"
]

for i in solutions:
	subprocess.run("g++ {}.cpp -o bin/{} -O2 -std=c++17".format(i,i),shell=True)

print("compilation done")

for test in tests:
	print(test)
	subprocess.run("./bin/{} 2 > in".format(test),shell=True)
	
	for x in range(7):
		print(solutions[x],subprocess.run("time ./bin/{} < in > out".format(solutions[x]),shell=True,capture_output=True).stderr)
	print()

for test in tests:
	print(test)
	subprocess.run("./bin/{} 4 > in".format(test),shell=True)
	
	for x in [0,6]:
		print(solutions[x],subprocess.run("time ./bin/{} < in > out".format(solutions[x]),shell=True,capture_output=True).stderr)
	print()

1/3 Centroid Decomposition

When I was writing this blog, I was wondering whether we could cut the $$$log$$$ factor from some centroid decomposition problems. Thanks to balbit for telling me about this technique.

Consider the following problem: You are given a weighted tree of size $$$n$$$ whose edges may have negative weights. Each vertex may either be toggled "on" or "off". Handle the following $$$q$$$ operations:

Toggle vertex $$$u$$$.
Given vertex $$$u$$$, find the maximum value of $$$d(u,v)$$$ over all vertices $$$v$$$ that is toggled "on". It is guaranteed that at least one such $$$v$$$ exists.

It is well-know how to solve this in $$$O(n \log n + q \log^2 n)$$$ by using centroid decomposition + segment trees. But can we do better?

The reason segment trees have to be used is because when we query for the longest path ending at some centroid parent to perform our queries, we have to ignore the contribution of our own subtree. An obvious way to solve this is to try to decompose the centroid tree in such a way that each vertex has at most $$$2$$$ children. Unfortunately, I do not know a way to do this such that the depth of the tree bounded by $$$\log_2 n$$$, but there is a way to make the depth of the tree bounded by $$$log_{\frac{3}{2}} n$$$.

Instead of thinking of doing centroid decomposition on vertices, let us consider doing it on edges. Top image is the usual cetroid decomposition, while bottom image is the one we want to use here. That is, the centroid gets passed down to its children when recursing.

Ok, so now we want to figure out what is the largest possible number of edges of the smaller partition. Remember, we want to make this value as large as possible to get a split that is as even as possible.

Firstly, we have a lower bound $$$\frac{1}{3}m$$$, whre $$$m$$$ is the number of edges. This is obtained when the tree is a star graph with $$$3$$$ children. The number of edges in each subtree are $$$[1,1,1]$$$, it is clear that the best way to partition the subtrees is $$$[1]$$$ and $$$[1,1]$$$, which gives us our desired lower bound.

Now, we will show that this bound is obtainable. Let $$$A$$$ be an array containing elements in the interval $$$[0,\frac{1}{2}]$$$ such that $$$\sum A=1$$$. This array $$$A$$$ describes the ratio between the number of edges in each subtree against the total number of edges. The elements are bounded above by $$$\frac{1}{2}$$$ due to the properties of the centroid.

Then, the following algorithm surprisingly finds a subset $$$S$$$ whose sum lies in the range $$$[\frac{1}{3},\frac{2}{3}]$$$.

1. tot=0
2. S={}
3. for x in [1,n]:
4.     if (tot+A[x]<=2/3):
5.         tot+=A[x]
6.         S.insert(x)

It is clear that $$$\sum\limits_{s \in S} A[s] \leq \frac{2}{3}$$$, so it suffices to show that $$$\sum\limits_{s \in S} A[s] \geq \frac{1}{3}$$$.

Let $$$P[x]=A[1]+A[2]+\ldots+A[x]$$$, that is $$$P$$$ is the prefix sum array.

Consider the first index $$$x$$$ such that $$$P[x]>\frac{2}{3}$$$. We will split into $$$2$$$ cases.

$$$A[x]<\frac{1}{3}$$$: when we have completed iteration $$$x-1$$$, the $$$\sum\limits_{s \in S} A[s] = P[x-1] = P[x]-A[x] > \frac{1}{3}$$$.
$$$A[x] \geq \frac{1}{3}$$$: it is easy to see that the final $$$S=[1,n] \setminus \{x\}$$$. So $$$\sum\limits_{s \in S} A[s] = 1-A[x] \geq \frac{1}{2}$$$.

Therefore, we are able to obtain a centroid tree which is a binary tree and has depth at most $$$\log_{\frac{3}{2}} n$$$.

Returning back to the original problem, we are able to solve it in $$$O(n \log n + q_1 \log^2 n + q_2 \log n)$$$ where $$$q_1$$$ and $$$q_2$$$ are the number of queries of type $$$1$$$ and $$$2$$$ respectively.

References

[1] https://github.com/dawxy/ACM-CODER/blob/master/%E3%80%90%E8%AE%BA%E6%96%87%26%26%E6%95%99%E7%A8%8B%E3%80%91/QTREE%E8%A7%A3%E6%B3%95%E7%9A%84%E4%B8%80%E4%BA%9B%E7%A0%94%E7%A9%B6.pdf
[2] https://ocw.mit.edu/courses/6-854j-advanced-algorithms-fall-2008/921232cb9a69015c50002ff5ea6a9824_lec6.pdf
[3] https://courses.csail.mit.edu/6.851/spring12/scribe/L19.pdf
[4] https://en.wikipedia.org/wiki/Link/cut_tree
[5] https://codeforces.com/blog/entry/82400
[6] https://codeforces.com/blog/entry/72626
[7] https://codeforces.com/blog/entry/53170
[8] https://oj.uz/problem/view/JOI14_secret
[9] https://codeforces.com/blog/entry/103726

Full text and comments »

tutorial, hld, centroid decomposition

+245

errorgorn
22 months ago
10

[Tutorial] Berlakamp's Algorithm for Polynomial Factorization in a Field (1698G buffed)

By errorgorn, 22 months ago, In English

Hi, this is going to be a solution to a "harder" version of the last problem of the recent div 2 round that I coordinated. We can actually solve the problem for $$$|s| \leq 60$$$ but we decided to lower the constraint as forcing this solution is not interesting and also way too hard for the last problem of a div 2. I would like to thank the authors for allowing me to write this blog and shamelessly farm contribution from their contest.

Firstly, I would like to share a solution (credits to Ari) which solves $$$|s| \leq 60$$$ without explictly factoring the entire polynomial.

spoiler

Now, let's get into the algorithm to factorize polynomials. We will use Berlakamp's algorithm, which is an algorithm to factorise polynomials where the coefficients of polynomials are elements of the field $$$GF(q)$$$, where $$$q=p^k$$$. I am writing this blog because the original paper was pretty hard to follow for me. So, I hope writing this blog would make understanding this algorithm from a competitive programming standpoint easy.

Before we get to the algorithm we have to prove $$$2$$$ things first:

$$$f(x)^q-f(x)=\prod\limits_{u \in GF(q)} (f(x)+u)$$$

proof

$$$f(x)^q=f(x^q)$$$

proof

The actual algorithm

Let's say we found some polynomial $$$h(x)^q-h(x) = 0 \pmod {f(x)}$$$ where $$$1 \leq \deg h<\deg f$$$ (ignore how we have done this for now).

We know that $$$h(x)^q-h(x)= \prod\limits_{u \in GF(q)} (h(x)+u)$$$ which is a multiple of $$$f(x)$$$.

$$$\gcd$$$ is a multiplicative function in the sense that for coprime $$$a,b$$$, we have $$$\gcd(a,c) \cdot \gcd(b,c) = \gcd(a \cdot b,c)$$$. It is obvious that $$$h(x)+a$$$ is coprime from $$$h(x)+b$$$ when $$$a \neq b$$$. So $$$\prod\limits_{u \in GF(q)} \gcd(h(x)+u,f(x)) = f(x)$$$.

We are almost done, we just need to show that the product above gives us a useful factorization of $$$f(x)$$$, that is it doesn't tell us that $$$f(x)=f(x) \cdot 1 \cdot 1 \ldots$$$. But we have $$$\deg h<\deg f$$$, so it is impossible that $$$\gcd(h(x)+a,f(x))=f(x)$$$.

Finding $$$h$$$

We have $$$h(x)^q-h(x) = h(x^q)-h(x) = 0 \pmod{f(x)}$$$. Therefore, we can reduce this to finding $$$h_0,h_1,\ldots$$$ such that $$$h_0(x^{0q}-x^0)+h_1(x^{1q}-x^1)+\ldots = 0 \pmod{f(x)}$$$. This is some linear algebra as we want to find the null-space of polynomial of $$$x^{iq}-x^i \mod f(x)$$$. This can be done using XOR basis Since we are finding $$$h(x)$$$ under modulo $$$f(x)$$$, we can reduce any non-trivial solution to $$$h(x)$$$ to one with $$$1 \leq \deg h < \deg f$$$.

Now, the only problem is figuring out when such $$$h$$$ exists. Suppose that $$$f(x)=f_0(x)f_1(x)$$$ for some coprime non-constant polynoimials $$$f_0(x)$$$ and $$$f_1(x)$$$.

By Chinese remainder theorem, we know that for any $$$(c_0(x),c_1(x))$$$ with $$$\deg c_i < \deg f_i$$$, we can find a $$$h(x)$$$ that satisfies $$$h(x) \pmod{f_i(x)}=c_i(x)$$$.

Let us look at the particular case of $$$(c_0(x),c_1(x))=(1,1)$$$.

We have $$$h(x) = 1 = 1^q = h(x)^q \pmod{f_i(x)}$$$.

Since we have $$$h(x)=h(x)^q \pmod{f_i(x)}$$$, it follows that $$$h(x)=h(x)^q \pmod{f(x)}$$$.

However, a shortcoming of this process is that $$$h(x)$$$ only exists when we can find appropriate $$$f_0(x)$$$ and $$$f_1(x)$$$. In particular, we will have to seperately handle the case where $$$f(x)=g(x)^k$$$ where $$$g(x)$$$ is an irreducible polynomial and $$$k>1$$$.

Handling Repeated Powers

Notice that $$$\gcd(f(x),f'(x)) = \gcd(g(x)^k,k g(x)^{k-1})$$$.

If $$$k$$$ is a multiple of $$$p$$$, then we can use the identity of $$$f(x^p)=f(x)^p$$$ to recurse to a smaller polynomial.
Otherwise, $$$\gcd(f(x),f'(x))=g(x)^{k-1}$$$ and we immediately have $$$g(x)=\frac{f(x)}{\gcd(f(x),f'(x))}$$$.

Time Complexity

Here is my implementation: 162171925. Note that it does not actually solve $$$|s| \leq 60$$$ quickly because it does not do factorize on integers quickly.

Let the degree of the polynomial be want to factorize be $$$d$$$. The each call of factor requires us to build a xor basis and also find the $$$\gcd$$$ of $$$q$$$ polynomials. They respectively take $$$O(d^2 \cdot A(d))$$$ and $$$O(q \cdot d \cdot A(d))$$$ time, where $$$O(A(d))$$$ is the time complexity required to add two polynomials. factor is called at most $$$d$$$ times.

In $$$GF(2)$$$, we can assume that $$$A(d)=1$$$ because it is literally XOR, so the time complexity is $$$O(d^3)$$$.

In $$$GF(q)$$$, it does not make sense to assume that $$$A(d)=1$$$, so we will use $$$A(d)=d$$$. The time complexity balloons to $$$O(d^4 + q d^2)$$$.

Full text and comments »

tutorial, polynomials, factorization

errorgorn
22 months ago
8

Catalan Numbers and Generating Uniform Balanced Bracket Sequences

By errorgorn, 23 months ago, In English

Hi everyone! Today nor sir and I would like to talk about generating uniform bracket sequences. Over the past few years when preparing test cases, I have had to generate a uniform bracket sequence a few times. Unfortunately I could not figure out how, so I would like to write a blog about this now. Hopefully this blog would be useful to future problem setters :) Scroll down to the end of the blog to copy our generators.

First, let's define some things. A bracket sequence is a string with $$$n$$$ $$$\texttt{(}$$$ s and $$$n$$$ $$$\texttt{)}$$$ s. A balanced brackets sequence is a bracket sequence with the additional constraint that it becomes a valid arithmetic expression when we add some $$$\texttt{1}$$$ s and $$$\texttt{+}$$$ s. Let $$$p$$$ be the prefix sum array of a bracket sequence (we assign $$$1$$$ to $$$\texttt{(}$$$ and $$$-1$$$ to $$$\texttt{)}$$$ and take the prefix sum). For example, if our bracket sequence is $$$\texttt{())()((())}$$$, then $$$p=[1,0,-1,0,-1,0,1,2,1,0]$$$.

An alternate way to think about the prefix sum is to imagine the bracket sequence as a sequence of increasing and decreasing lines.

We can see that the necessary and sufficient for a bracket sequence to be balanced is that all elements of the prefix sum is non-negative.

Let's define the badness of a bracket sequence as the number of lines in the above diagram that is below the $$$0$$$-line. So, the badness of our bracket sequence is $$$4$$$.

Let's call a bracket sequence irreducible if the line touches the $$$0$$$-line exactly twice (once at the start and at the end). So, $$$\texttt{(())}$$$ and $$$\texttt{))()((}$$$ are both irreducible but not $$$\texttt{()()}$$$. It is natural to also define a irreducible decomposition of a bracket sequence. $$$\texttt{())()((())}$$$ gets decomposed into $$$\texttt{()}$$$,$$$\texttt{)(}$$$,$$$\texttt{)(}$$$,$$$\texttt{(())}$$$.

Now, we want to note that all irreducible bracket sequences are either balanced or are some sort of "inverse" of a balanced bracket sequences. By inverse, we can either reverse the entire string or flip every bracket in the string. Please note that these are different.

For example, $$$\texttt{)))(()((}$$$ can be turned into $$$\texttt{((())())}$$$ by flipping every bracket or $$$\texttt{(()(()))}$$$ by reversing the substring.

Firstly, let's start with finding a recursive formula for the Catalan numbers. One of the basic recurrences when dealing with bracket sequences is to decompose them into the form of $$$\texttt{(}X\texttt{)}Y$$$, where $$$X$$$ and $$$Y$$$ are both valid bracket sequences (they might be empty). This decomposition is actually unique since $$$\texttt{(}X\texttt{)}$$$ can only be our first term in our irreducible decomposition (for us competitive programmers, think of it as dp where our transition is removing the first irreducible substring). We immediately have the recurrence $$$C_{n+1} = \sum\limits_{i=0}^n C_i C_{n-i}$$$ for $$$n > 0$$$ and a base case of $$$C_0=1$$$ for $$$n=0$$$. After some manipulation with formal power series $$$[1,2 \text{ example } 2.3.3]$$$, you can find that we get the familiar $$$C_n = \frac{1}{n+1} \binom{2n}{n} = \frac{1}{2n+1} \binom{2n+1}{n}$$$.

Mapping 1

Let's take a look at $$$C_n = \frac{1}{2n+1} \binom{2n+1}{n}$$$ first. What other sequence can we get the $$$\binom{2n+1}{n}$$$ term from? Well, the number of binary strings with $$$n$$$ $$$\texttt{(}$$$ s and $$$n+1$$$ $$$\texttt{)}$$$ s. And nicely enough, $$$2n+1$$$ is the length of said binary string. Let's define the equivalence class $$$\sim _R$$$ where $$$s\sim_Rt$$$ when $$$s$$$ is a cyclic shift of $$$t$$$. Generally for binary strings, it is not so easy to count the number of elements in someone's equivalence class when we define it this way. But since $$$\gcd(2n+1,n)=1$$$ here, all cyclic shifts of a string are distinct and there are exactly $$$2n+1$$$ strings in any equivalence class. We are almost at a bijection to Catalan numbers already! We just need to show that there is a bijection between equivalence classes and balanced bracket sequences.

Lemma

Given a string of $$$n$$$ $$$\texttt{(}$$$s and $$$n+1$$$ $$$\texttt{)}$$$s, there exists exactly $$$1$$$ cyclic shift where the first $$$2n$$$ elements is a valid bracket sequence.

As an example, consider the bracket sequence $$$\texttt{)())(()}$$$. The only cyclic shift that gives us a prefix sum with our condition is $$$\texttt{(())())}$$$.

The proof of this is not too hard and you should prove it yourself, but for completeness, I will put it here.

proof

Let us define the special index $$$s_p$$$ as the index ($$$0$$$-indexed) of the leftmost minimum value of the prefix sum. For example, $$$\texttt{)())(()}$$$ gives us $$$p=[-1,0,-1,-2,-1,0,-1]$$$, $$$s_p=3$$$. By definition the $$$s_p$$$ is the unique index such that $$$p_i > p_{s_p}$$$ if $$$i<s_p$$$ and $$$p_{s_p} \leq p_i$$$ if $$$s_p<i$$$.

Claim: when we perform a right rotation on the string, the special index will be updated like $$$s_p \gets (s_p+1) \% (2n+1)$$$. Loosely, the special index shifts along with the string.

Let us consider that currently our prefix sum is $$$p=[p_1,p_2,\ldots,p_k]$$$. After a single right rotation, it will become $$$p'=[0+(p_k-p_{k-1}),p_1+(p_k-p_{k-1}),p_2+(p_k-p_{k-1}),\ldots,p_{k-1}+(p_k-p_{k-1})]$$$. That is, we append $$$0$$$ to the beginning of $$$p$$$, delete the last element of $$$p$$$, then add $$$p_k-p_{k-1}$$$ to all elements.

We want to show that $$$s_{p'}= (s_p+1) \% (2n+1)$$$. We will split into two cases.

Note that $$$p_k$$$ is $$$-1$$$, since there are $$$n$$$ $$$\texttt{(}$$$s and $$$n+1$$$ $$$\texttt{)}$$$s. Furthermore, we can more easily consider $$$p'=[0,p_1,p_2,\ldots,p_{k-1}]$$$ since adding a constant all elements of $$$p'$$$ does not change $$$s_{p'}$$$.

Case 1

Suppose that $$$s_p < 2n$$$, then $$$s_{p'} = s_{p}+1$$$. From our assumption that $$$s_p$$$ is the special index of $$$p$$$, we already have that $$$p'_i > p'_{s_p+1}$$$ if $$$1 \leq i<s_{p}+1$$$ and $$$p_{s_p+1} \leq p_i$$$ if $$$s_p+1<i$$$. It only remains to prove that $$$p'_0 > p'_{s_p+1}$$$.

Since $$$p_k=-1$$$, it must be true that $$$p_{s_p} \leq -1$$$. We obtain $$$p'_0=0>-1\geq p_{s_p}$$$. Therefore, $$$p'_0 > p'_{s_p+1}$$$ and $$$s_{p'}=s_p+1$$$ is the special index of $$$p'$$$.

Case 2

Suppose that $$$s_p = 2n$$$. We only have to prove that all elements of $$$p'$$$ are non-negative to claim that $$$s_{p'}=0$$$. This is easy by contradiction. Suppose that $$$p'_j < 0$$$. Then it is impossible that $$$0 > p_{j-1} > p_{2n}=-1$$$. $$$\blacksquare$$$

In fact, our claim is a modified version of cycle lemma where we have $$$m=n+1$$$ and $$$k=1$$$ (see, for instance, $$$[3]$$$ for two nice proofs -- one by arranging the sequence on a circle and removing certain segments, and the other using an idea similar to the prefix sum idea presented above).

Now, let us rotate the string until $$$s_p=2n$$$. By our above claim, there is a unique position that satisfies this. since there are $$$n$$$ $$$\texttt{(}$$$s and $$$n+1$$$ $$$\texttt{)}$$$s, we have $$$p_{2n}=p_{s_p}=-1$$$. So, $$$p_i \geq 0$$$ for all $$$0 \leq i < 2n$$$. That is, the first $$$2n$$$ characters of the string is now a valid bracket sequence.

We can see that in each equivalence class, there is exactly a single element whose first $$$2n$$$ elements are a balanced bracket sequence. Let this be the representative of the equivalence class. We can obtain balanced bracket sequences from representatives by removing the last character and vice versa by appending a $$$\texttt{)}$$$. Therefore, there is a bijection between the equivalence classes and

There is a interesting problem which is related to this $$$[4]$$$ (in Chinese).

translation

solution

Mapping 2

Thank you nor sir for writing this part and independently discovering it.

The mapping mentioned here was, unfortunately, already present in literature $$$[5]$$$ after we checked, and my old proof was quite similar to the one mentioned in the paper. It used the Chung-Feller theorem, which says that the badness of bracket sequences is uniformly distributed (i.e., there are an equal number of strings for each value of badness), and then showed that the mapping was in fact a bijection from the set of strings of a fixed badness to the set of balanced bracket sequences.

However, I came up with a much simpler proof which we will present below along with the intuition behind the construction (we could not find this proof anywhere, so if anyone finds a reference, please let us know in the comments).

The main idea behind what we will do below is the identity $$$C_n = \frac{1}{n + 1} \binom{2n}{n}$$$. We try to construct a balanced bracket sequence from a bracket sequence with $$$n$$$ opening brackets and $$$n$$$ closing brackets, and then try to show that this construction is in fact an $$$n + 1$$$ to $$$1$$$ mapping from $$$S_n$$$ (the set of bracket sequences with $$$n$$$ opening brackets and $$$n$$$ closing brackets) to $$$S^*_n$$$ (the set of balanced bracket sequences of length $$$2n$$$).

The first idea is to note that, as mentioned earlier, if a string $$$S = AB$$$, where $$$A$$$ is the irreducible prefix of $$$S$$$, then we can make $$$A$$$ an irreducible balanced bracket sequence by either flipping all brackets of $$$A$$$ (which corresponds to flipping the corresponding diagram in the $$$x$$$-axis) or by reversing the string $$$A$$$ (which corresponds to rotating the diagram $$$180^\circ$$$).

The first attempt at finding a mapping was to do this: set $$$f(S) = A'f(B)$$$, where $$$A'$$$ is the string obtained by flipping all brackets in $$$A$$$ (or just reversing $$$A$$$). However, this will not give us a uniformly random way of constructing balanced bracket sequences, since $$$\texttt{((()))}$$$ is the image of $$$2$$$ strings under this mapping, and $$$\texttt{()()()}$$$ is the image of $$$8$$$ strings under this mapping.

So here was the crucial idea: we try to offset this asymmetry by modifying how we handle the case when $$$A \ne A'$$$. In this case, $$$A =\; \texttt{)}C\texttt{(}$$$ for some $$$C$$$, i.e., $$$S = \texttt{)}C\texttt{(}B$$$. So in order to try to compensate for the imbalance in distribution, we use $$$\texttt{(}f(B)\texttt{)}C'$$$ instead (where $$$A' = (C')$$$). As it turns out, this works out perfectly.

Here's the formal construction:

Definition: $$$f: \cup_{n = 0}^\infty S_n \to \cup_{n = 0}^\infty S^*_n$$$ is defined as

$$$f(\varepsilon) = \varepsilon$$$, where $$$\varepsilon$$$ is the empty string.
$$$f(AB) = Af(B)$$$, if $$$A$$$ is irreducible and balanced.
$$$f(AB) = (f(B))C'$$$, if $$$A$$$ is irreducible and not balanced, and as a corollary, $$$A =\; )C($$$ for some $$$C$$$.

Claim: For any string $$$S^* \in S^*_n$$$, there are exactly $$$n + 1$$$ solutions to $$$f(S) = S^*$$$ for $$$S \in S_n$$$.

Proof: Let $$$S^* = A^*B^*$$$, where $$$A^*$$$ is the irreducible prefix of $$$S^*$$$. Let $$$S \in S_n$$$ be a string satisfying the equation $$$f(S) = S^*$$$. We prove this by induction. For $$$n = 0$$$, there is nothing to prove. Let's suppose $$$n > 0$$$. Then $$$S$$$ can be mapped to $$$S^*$$$ by using one of the two rules, depending on whether the irreducible prefix of $$$S$$$ is a balanced bracket sequence or not:

Let $$$S = AB$$$, where $$$A$$$ is a balanced irreducible prefix of $$$S$$$. Then we must have $$$S^* = f(S) = A f(B)$$$. Since $$$A, A^*$$$ are both balanced irreducible prefixes of $$$S^*$$$, they must be equal, and $$$B^* = f(B)$$$. The number of solutions in this case is hence equal to $$$\frac{|B^*|}{2} + 1$$$ by the induction hypothesis.
Let $$$S = AB$$$, where $$$A$$$ is not balanced, but an irreducible prefix of $$$S$$$. Then $$$A =\; \texttt{)}C\texttt{(}$$$, and $$$A' = \texttt{(}C'\texttt{)}$$$ (as mentioned above). By definition, $$$S^* = f(S) = \texttt{(}f(B)\texttt{)} C'$$$. By an analogous argument as above, since $$$f(B)$$$ is balanced, $$$A^* = \texttt{(}f(B)\texttt{)}$$$ and $$$B^* = C'$$$. There are exactly $$$\frac{|A|^*}{2}$$$ solutions as earlier.

Overall, there are $$$\frac{|A^*| + |B^*|}{2} + 1 = n + 1$$$ solutions to $$$f(S) = S^*$$$, as needed.

In fact, if, in the proof, we also keep track of the badness of the strings that generate a given balanced bracket sequence, we can see that there is one input string for each possible value of badness (and this gives an alternative proof of the Chung-Feller theorem).

Generating Balanced Bracket Sequences

It has been asked before on CodeForces $$$[6]$$$ how to generate a random bracket sequence. Judging from the comments, no one has been able to figure out how to generate it quickly (i.e. in linear time). So I will provide the code to generate a uniform balanced bracket sequence. Additionally, I will provide code that generates a random binary tree of $$$n$$$ vertices to balanced bracket sequences of size $$$2n$$$.

I know that there is also a bijection between bracket sequences and ordered trees, but there are better ways to generate a tree $$$[7]$$$. Also, there is also a bijection to full binary tree of $$$2n+1$$$ vertices, but there is a easy bijection to binary trees of $$$n$$$ vertices noting that the full binary tree of $$$2n+1$$$ vertices has $$$n+1$$$ leaf nodes. We simply remove all leaves, which leaves (pun not intended) us with a binary tree of $$$n$$$ vertices. So the binary tree is the "internal nodes" of the full binary tree.

Also, there is a (rather inelegant) method of generating balanced bracket sequences without using any of the techniques discussed earlier.

One of the first ideas that anyone attempting this task might have is to randomly place $$$\texttt{(}$$$ or $$$\texttt{)}$$$ in a string unless it is impossible to do so. However, we do not get a uniform distribution. Instead, let us vary the probability that we place $$$($$$ or $$$)$$$ in accordance to the actual probability that we will see in our required distribution.

Let $$$C_n^k$$$ denote the number of bracket sequences of size $$$2n$$$ with the first $$$k$$$ elements being $$$\texttt{(}$$$. From $$$[8]$$$, we know that $$$C_n^k = \frac{k+1}{n+1} \binom{2n-k}{n-k} = \frac{k+1}{2n-k+1} \binom{2n-k+1}{n-k}$$$. We can actually prove it using the same technique of adding a single $$$\texttt{)}$$$ and performing rotations.

proof

We are enumerating balanced bracket sequences that have length $$$2n$$$ and the first $$$k$$$ elements are $$$\texttt{(}$$$. This is equivalent to enumerating the bracket sequence which have $$$n-k$$$ $$$\texttt{(}$$$s and $$$n$$$ $$$\texttt{)}$$$s where $$$p_i \geq -k$$$.

Let us study the distinct binary strings with $$$n-k$$$ $$$\texttt{(}$$$s and $$$n+1$$$ $$$\texttt{)}$$$s. Instead of defining a special index, let us define a special set of indices $$$s_p$$$. Let $$$mn$$$ be the minimum value of the prefix sum. We will store the special leftmost index with values in the range $$$[mn,mn+k]$$$, so we are storing $$$k+1$$$ indices in our set. Note that $$$k+1$$$ is equal to the difference between the number of $$$\texttt{(}$$$ s and $$$\texttt{)}$$$ s.

Now, with a similar case work to our original solution, we can show that this set similarly rotates with our string (exercise to reader). The claim is now that if and only if $$$2n-k \in s_p$$$, then the concatenation of $$$\underbrace{\texttt{((}\ldots\texttt{(}}_{k \ \text{times}}$$$ with the first $$$2n-k$$$ characters of the string gives us a balanced bracket sequence.

Suppose that $$$2n-k \in s_p$$$. Since $$$p_{2n-k}=-k-1$$$, then we have $$$p_i \geq -k$$$ for $$$0 \leq i < 2n-k$$$. Suppose that this is not true, that is we have $$$p_j < -k$$$ for some $$$j$$$, Let us define $$$p_{-1}=0$$$ for convinience. By the definition of prefix sum, we have $$$|p_i-p_{i-1}| =1$$$ for $$$0 \leq i < 2n-k$$$. Since $$$p_{-1}=0$$$ and $$$p_j < -k$$$, by discrete intermediate value theorem, there exists $$$p_z = -k$$$ for some $$$0 \leq z < j$$$. This contradicts the fact that $$$2n-k$$$ was the leftmost index with value $$$-k-1$$$. So it is proven for this case.

Suppose that $$$2n-k \notin s_p$$$. Either $$$-k-1 \notin [mn,mn+k]$$$ or $$$2n-k$$$ is not the leftmost index with value $$$-k-1$$$. Both of these contradict our statement. So it is proven for this case too.

Now, let us study a single equivalence class defined by the same equivalence relation of $$$s \sim_R t$$$ when $$$s$$$ is a cyclic shift of $$$t$$$. But we need to note here that not all equivalence classes have the same size. Indeed, let $$$n=3$$$ and $$$k=1$$$. Then we are studying strings that are a permutation of $$$\texttt{(())))}$$$. In the equivalence class containing $$$\texttt{())())}$$$, there are only $$$3$$$ elements, while in the equivalence class containing $$$\texttt{(())))}$$$, there are $$$6$$$ elements. But this is actually fine.

Given a string $$$s$$$, choose the shortest string $$$t$$$ such that $$$s$$$ is the concatenation of some copies of $$$t$$$. Then, the number of elements in the equivalence class containing $$$s$$$ is $$$|t|$$$.

We now want to show that of those $$$|t|$$$ elements, $$$\frac{k+1}{n+1} \cdot \frac{|t|}{|s|}$$$ of them satisfy $$$2n-k \in s_p$$$.

I claim that if $$$x \in s_p$$$, then $$$(x+|t|) \% (2n-k+1) \in s_p$$$. It is not too hard to prove this by noting that $$$p_{|t|}<0$$$, since the ratio of bracket types in $$$s$$$ and $$$t$$$ are the same.

Therefore, pick any element in the equivalence class. By the property of elements of $$$s_p$$$ being rotated while we rotate the string, the number of strings that correspond to valid bracket sequences is the number of elements in $$$(2n-k-|t|,2n-k] \cap s_p$$$. Since, we know that if $$$x \in s_p$$$, then $$$(x+|t|) \% (2n-k+1) \in s_p$$$, we have $$$\sum\limits_{i=0}^\frac{|s|}{|t|} |(-1+i \cdot |t|,-1+(i+1) \cdot |t|] \cap s_p| = |s_p| = \frac{|s|}{|t|} \cdot |(2n-k-|t|,2n-k] \cap s_p|$$$.

Therefore, we conclucde that $$$|(2n-k-|t|,2n-k] \cap s_p| = (k+1) \cdot \frac{|t|}{|s|}$$$. That is to say, in each equivalence class, there is a $$$\frac{k+1}{|s|}= \frac{k+1}{2n-k+1}$$$ probability that some element corresponds to a bracket sequence.

Since there are $$$\binom{2n-k+1}{n-k}$$$ elements total, there are $$$\frac{k+1}{2n-k+1} \binom{2n-k+1}{n-k}$$$ elements that correspond to bracket sequences where the first $$$k$$$ elements are $$$\texttt{(}$$$ s.

In fact, this gives us a way to generate a uniform random balanced bracket sequence of length $$$2n$$$ where the first $$$k$$$ elements are $$$\texttt{(}$$$ s. Generate a random string with $$$n-k$$$ $$$\texttt{(}$$$ s and $$$n+1$$$ $$$\texttt{)}$$$ s. The equivalence class that this string will be in is obviously proportional to the number of elements in said equivalence class. Now, we could pick a random element of $$$(2n-k-|t|,2n-k] \cap s_p$$$, but you can simply pick a random element of $$$s_p$$$ and it will be equivalent.

Oh course, we could use the method of incrementally placing $$$\texttt{(}$$$ or $$$\texttt{)}$$$ by their correct ratio, but this is way cooler.

Ok, so if we want to make a balanced bracket sequence of length $$$2n$$$ and we currently placed $$$a$$$ $$$\texttt{(}$$$s and $$$b$$$ $$$\texttt{)}$$$s, then there are $$$C_{n-b}^{a-b}$$$ possible balanced bracket sequence that have our current string as a prefix. Out of them, $$$C_{n-b}^{a+1-b}$$$ have the next character being $$$\texttt{(}$$$. So we simply calculate the probability that we should put $$$\texttt{(}$$$ as $$$\frac{C_{n-b}^{a+1-b}}{C_{n-b}^{a-b}} = \frac{n-a}{2n-a-b} \cdot \frac{a-b+2}{a-b+1}$$$. Notice that $$$n-a$$$ and $$$n-b$$$ are the remaining numbers of $$$\texttt{(}$$$ s and $$$\texttt{)}$$$ s respectively. The $$$\frac{n-a}{2n-a-b}$$$ term nicely corresponds to the ratio of $$$\texttt{(}$$$ s in the remaining characters.

As for the bijection from balanced bracket sequences to binary trees, remember earlier when we showed the recurrence $$$C_{n+1} = \sum\limits_{i=0}^n C_i C_{n-i}$$$ by bijecting into pairs $$$(X,Y)$$$ such that string is of the form $$$\texttt{(}X\texttt{)}Y$$$. This bijection works for a recursive bijection into binary trees too $$$[\text{folklore}]$$$. That is, $$$\texttt{()}$$$ bijects to the binary tree with $$$1$$$ nodes, then for any other bracket sequence, $$$X$$$ describes the left child and $$$Y$$$ describes the right child.

Ok enough talking, here are the codes (I used testlib $$$[7]$$$). They are written by both nor sir and I.

generator

#include <bits/stdc++.h>

#include "testlib.h"

std::string balanced_bracket_sequence_v1(int n) {
    auto res = std::string(n, '(') + std::string(n + 1, ')');
    shuffle(res.begin(), res.end());
    std::vector<int> pref(2 * n + 2, 0);
    for (int x = 0; x < 2 * n + 1; x++)
        pref[x + 1] = pref[x] + (res[x] == '(' ? 1 : -1);
    int idx =
        int(std::min_element(pref.begin(), pref.end()) - pref.begin());
    std::rotate(res.begin(), res.begin() + idx, res.end());
    return res.substr(0, 2 * n);
}

std::string balanced_bracket_sequence_v2(int n) {
    auto a = std::string(n, '(') + std::string(n, ')');
    shuffle(a.begin(), a.end());
    auto output = a;
    for (int l_in = 0, l_out = 0, r_in = 2 * n; l_in < r_in;) {
        int pos = l_in;
        int balance = 0;
        for (; pos != r_in; ++pos)
            if ((balance += (a[pos] == '(' ? +1 : -1)) == 0) break;
        if (a[l_in] == '(') {
            for (; l_in <= pos; ++l_in, ++l_out) output[l_out] = a[l_in];
        } else {
            const int sz_u = r_in - pos - 1, r_out = l_out + sz_u + 1;
            output[l_out] = '(', output[r_out] = ')';
            for (int i = l_in + 1, j = r_out + 1; i != pos; ++i, ++j)
                output[j] = (a[i] == '(' ? ')' : '(');
            l_in = pos + 1;
            l_out++;
        }
    }
    return output;
}

std::string balanced_bracket_sequence_v3(int n) {
    long long a = 0, b = 0;
    std::string res;
    for (int x = 0; x < 2 * n; x++) {
        if (rnd.next((2 * n - a - b) * (a - b + 1)) <
            (n - a) * (a - b + 2)) {
            res += '(';
            a++;
        } else {
            res += ')';
            b++;
        }
    }
    return res;
}

struct tree {
    int node;
    tree *l, *r;
    tree(int node_, tree* l_, tree* r_) : node{node_}, l{l_}, r{r_} {}
};

int get_value(tree* t) {
    if (t) return t->node;
    return -1;
}

using representation_t = std::vector<std::tuple<int, int, char>>;

auto representation(tree* t) {
    representation_t repr;
    auto dfs = [&repr](auto self, tree* a) -> void {
        if (!a) return;
        repr.emplace_back(a->node, get_value(a->l), 'L');
        repr.emplace_back(a->node, get_value(a->r), 'R');
        self(self, a->l);
        self(self, a->r);
    };
    dfs(dfs, t);
    return repr;
}

auto shuffled_representation(int n, representation_t r) {
    std::vector<int> p(n);
    std::iota(p.begin(), p.end(), 0);
    shuffle(p.begin(), p.end());  // shuffle vertex labels
    for (auto& [n1, n2, dir] : r) {
        if (n1 != -1) n1 = p[n1];
        if (n2 != -1) n2 = p[n2];
    }
    shuffle(r.begin(), r.end());  // shuffle edge order
    return r;
}

tree* binary_tree_from_balanced_bracket_sequence(const std::string& s) {
    int n = int(s.size()) / 2;
    std::vector<int> nxt(2 * n, -1);
    std::vector<int> stk;
    for (int x = 0; x < 2 * n; x++) {
        if (s[x] == '(') {
            stk.push_back(x);
        } else {
            nxt[stk.back()] = x;
            stk.pop_back();
        }
    }
    int node = 0;
    std::vector<tree*> roots(2 * n, nullptr);
    for (int x = 2 * n - 1; x >= 0; x--) {
        if (s[x] == '(') {
            tree *left = nullptr, *right = nullptr;
            if (s[x + 1] == '(') left = roots[x + 1];        // left child
            if (nxt[x] + 1 < 2 * n && s[nxt[x] + 1] == '(')  // right child
                right = roots[nxt[x] + 1];
            roots[x] = new tree(node++, left, right);
        }
    }
    return roots[0];
}

int main(int argc, char* argv[]) {
    registerGen(argc, argv, 1);
    int n = opt<int>("n");
    int t = opt<int>("t");
    std::cout << t << '\n';
    for (int test = 1; test <= t; ++test) {
        std::cout << n << '\n';
        auto rep = shuffled_representation(
            n, representation(binary_tree_from_balanced_bracket_sequence(
                   balanced_bracket_sequence_v1(n))));
        for (auto [parent, child, direction] : rep)
            std::cout << parent + 1 << ' ' << child + 1 << ' ' << direction
                      << '\n';
    }
}

generator (C_n^k)

#include "testlib.h"
#include <bits/stdc++.h>
using namespace std;

string bSeq(int n,int k) {
    string res=string(n-k,'(')+string(n+1,')');
    shuffle(res.begin(),res.end());
    deque<int> dq(k+1);
    
    int mn=0;
    int curr=0;
    for (int x=0;x<=2*n-k;x++){
    	curr+=(res[x]=='('?1:-1);
    	if (curr<mn){
    		mn=curr;
    		dq.push_back(x);
    		dq.pop_front();
    	}
    }
    
	rotate(res.begin(),res.begin()+dq[rnd.next(k+1)]+1,res.end());
    return string(k,'(')+res.substr(0,2*n-k);
}

int main(int argc, char* argv[]) {
    registerGen(argc, argv, 1);
    int n=atoi(argv[1]),k=atoi(argv[2]);
    
    cout<<bSeq(n,k)<<endl;
}

A Surprisingly Fast Generation

Thanks to pajenegod for pointing out that this method might be faster than $$$O(n^2)$$$.

Let us return to the original reccurence of $$$\texttt{(}X\texttt{)}Y$$$. This gives us a natural DnC-esque recurence if we are able to determine how to uniformly choose the correct size for $$$X$$$ and $$$Y$$$. Recall that $$$C_{n+1} = \sum\limits_{i=0}^n C_i C_{n-i}$$$ and that we have the approximation $$$C_n \approx \frac{4^n}{\sqrt{\pi n^3}}$$$ $$$[1]$$$.

For some bracket sequence of size $$$2(n+1)$$$, the probability we recurse into $$$X$$$ of size $$$s$$$ is $$$P_s = \frac{C_{s}C_{n-s}}{C_{n+1}}$$$. Of course, we are not able to store $$$C_n$$$ directly, but using the earlier approximation earlier, it makes sense to store $$$C'_n = \frac{C_n}{4^n}$$$, which should be at a reasonable size to use doubles. We can calculate $$$C'_{n+1} = \frac{n+\frac{1}{2}}{n+2} \cdot C'_n$$$ in linear time. Nicely, $$$P_s = \frac{C'_{s}C'_{n-s}}{4 C'_{n+1}}$$$.

Here, we can already bound these types of reccurences by $$$O(n \log n)$$$ if we don't care about the distribution of $$$s$$$ that we recurse to. We can do this by mapping $$$P_s$$$ in the order of $$$P_0,P_n,P_1,P_{n-1},\ldots$$$. That is, we generate a random number in $$$Z=[0,1)$$$. If $$$Z < P_0$$$, we recurse with $$$s=0$$$, if $$$Z < P_0+P_n$$$, we recurse with $$$s=n$$$, etc.

With this method, our time complexity is actually $$$T(n)=O(\min(a,b))+T(a)+T(b)$$$. So, we have $$$T(n)=O(n \log n)$$$.

But maybe we can do better. Note that the true complexity is $$$T(n+1) = \sum\limits_{s=0}^{n} P_s \cdot (T(s) + T(n-s) + O(\min(s,n-s))$$$

Let's look at the distribution of $$$P_s$$$ for $$$n=8$$$.

$$$P_0$$$	$$$P_1$$$	$$$P_2$$$	$$$P_3$$$	$$$P_4$$$	$$$P_5$$$	$$$P_6$$$	$$$P_7$$$
$$$0.3$$$	$$$0.092$$$	$$$0.059$$$	$$$0.049$$$	$$$0.049$$$	$$$0.059$$$	$$$0.092$$$	$$$0.3$$$

We see that the distribution is symmetric (which isn't that important) and that larger values appear on the extreme ends (which is important). This suggests in most cases, our recursion will make one side much smaller than the other.

Let us try to find an approximation for expected value of $$$\min(s,n-s)$$$ when drawn over the probability distribution of $$$P_s$$$. Note that $$$P_s = \frac{C'_{s}C'_{n-s}}{4 C'_{n+1}} \approx \frac{\sqrt{n^3}}{4 \sqrt{\pi s^3(n-s)^3}}$$$.

When $$$s \leq \frac{1}{2} n$$$, we get $$$P_s \lessapprox \frac{\sqrt{8}}{4 \sqrt{\pi s^3}} = \frac{1}{\sqrt{2 \pi}} \cdot \frac{1}{\sqrt{s^3}}$$$.

$$$\begin{align}E(\min(a,b)) &= 0 \cdot P_0 + 0 \cdot P_n + 1 \cdot P_1 + 1 \cdot P_{n-1} + \ldots \\&= 2 \cdot (\sum_{s=0}^{\frac{n}{2}} s \cdot P_s)\\ &\lessapprox \sqrt{\frac{2}{\pi}} \cdot (\sum_{s=0}^{\frac{n}{2}} \frac{s}{\sqrt{s^3}}) \\ &< \sqrt{\frac{2}{\pi}} \cdot (\int_{0}^{\frac{n}{2}} \frac{1}{\sqrt{s}} \, ds) \\ &= \sqrt{\frac{2}{\pi}} \cdot \sqrt{2n} \\ &= 2 \sqrt{\frac{n}{\pi}} \end{align}$$$

So, I thought the reccurence was going to look like $$$T(n)=O(\sqrt n) + T(\sqrt n) + T (n - \sqrt n)$$$. Which gives us $$$T(n) = O(n \log \log n)$$$. But unfortunately, this turned out to not be true.

After empirical testing and explicitly calculating the theoretical value of $$$T(n)$$$, I have observed that the actual complexity is $$$O(n \log n)$$$. I think it is quite interesting that despite our recurrence usually cutting at values of around $$$\sqrt n$$$, we end up having a total complexity of $$$O(n \log n)$$$ anyways, but probably with a significantly large logarithm base.

I'm using a Ryzen 7 3700X CPU with the compile command g++ xxx.cpp -O2.

code (theoretical value)

#include <bits/stdc++.h>
using namespace std;

#define rep(x,start,end) for(int x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))

mt19937_64 rng(chrono::system_clock::now().time_since_epoch().count());
uniform_real_distribution<double> dist(0.0, 1.0);

double cat[10000005];
double t[10000005];

signed main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	cat[0]=1;
	rep(x,0,10000000) cat[x+1]=(double)(x+0.5)/(x+2)*cat[x];
	
	rep(x,1,100000){
		double tot=0;
		rep(y,0,x){
			double prob=cat[y]*cat[x-y-1]/(4*cat[x]);
			t[x]+=prob*(t[y]+t[x-y-1]+min(y,x-y-1));
			tot+=prob*min(y,x-y-1);
		}
	}
	
	rep(x,1,100000) cout<<x<<" "<<t[x]<<endl;
}

code (empirical value)

#include <bits/stdc++.h>
using namespace std;

#define rep(x,start,end) for(int x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))

mt19937_64 rng(chrono::system_clock::now().time_since_epoch().count());
uniform_real_distribution<double> dist(0.0, 1.0);
 
long long time(){
	return chrono::system_clock::now().time_since_epoch().count();
}
 
double cat[10000005];

void solve(int pl,int pr,string &s){
	if (pl==pr) return;
	
	double temp=dist(rng);
	int n=(pr-pl)/2;
	
	double curr=0;
	int l=0,r=n-1;
	int ss;
	
	while (true){
		{ //add l
			curr+=cat[l]*cat[n-1-l]/(4*cat[n]);
			if (temp<curr){ ss=l; break; }
			l++;
		}
		{ //add r
			curr+=cat[r]*cat[n-1-r]/(4*cat[n]);
			if (temp<curr){ ss=r; break; }
			r--;
		}
	}
	
	s[pl+1+2*ss]=')';
	solve(pl+1,pl+1+2*ss,s);
	solve(pl+2+2*ss,pr,s);
}

string gen(int n){
	cat[0]=1;
	rep(x,0,n) cat[x+1]=(double)(x+0.5)/(x+2)*cat[x];
	
	string res(2*n,'(');
	solve(0,2*n,res);
	return res;
}

signed main(int argc, char *argv[]){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	rep(x,0,24){
		long long start=time();
		long long num=0;
		while (time()-start<10000000000){
			gen(1<<x);
			num++;
		}
		
		long long end=time();
		cout<<(1<<x)<<" "<<num<<" "<<end-start<<" "<<(double)(end-start)/(num*(1<<x))<<endl;
	}
}

output (empirical testing)

1 266547240 10000000078 37.5168
2 187154707 10000000050 26.7159
4 102058691 10000000091 24.4957
8 47076531 10000000131 26.5525
16 22151581 10000000151 28.2147
32 10758623 10000000660 29.0465
64 5403745 10000009142 28.9152
128 2734025 10000000066 28.5751
256 1338297 10000004834 29.1882
512 660750 10000010888 29.5592
1024 328150 10000004632 29.7597
2048 158510 10000011967 30.8045
4096 77867 10000111333 31.3539
8192 38048 10000260495 32.0841
16384 18565 10000365194 32.8777
32768 9273 10000355466 32.9113
65536 4657 10000484304 32.7669
131072 2277 10004361811 33.521
262144 1104 10004416345 34.5687
524288 548 10006717357 34.829
1048576 270 10012533429 35.3655
2097152 130 10035361701 36.8095
4194304 65 10088588940 37.0047
8388608 33 10127937055 36.5862

References

[1] https://en.wikipedia.org/wiki/Catalan_number
[2] https://www2.math.upenn.edu/~wilf/gfologyLinked2.pdf
[3] https://www.sciencedirect.com/science/article/pii/S0195669813800534
[4] https://uoj.ac/problem/273
[5] http://www.cs.otago.ac.nz/staffpriv/mike/Papers/RandomGeneration/RandomBinaryTrees.pdf
[6] https://codeforces.com/blog/entry/45723
[7] https://codeforces.com/blog/entry/18291
[8] https://codeforces.com/blog/entry/87585

Full text and comments »

catalan numbers, generators, testlib

+205

errorgorn
23 months ago
3

Codeforces Global Round 20

By errorgorn, 2 years ago, In English

On Apr/23/2022 17:05 (Moscow time), we will host Codeforces Global Round 20.

Note the unusual timing, it is 30 minutes earlier.

This is the second round of the 2022 series of Codeforces Global Rounds. The rounds are open and rated for everybody.

The prizes for this round:

30 best participants get a t-shirt.
20 t-shirts are randomly distributed among those with ranks between 31 and 500, inclusive.

The prizes for the 6-round series in 2022:

In each round top-100 participants get points according to the table.
The final result for each participant is equal to the sum of points he gets in the four rounds he placed the highest.
The best 20 participants over all series get sweatshirts and place certificates.

Thanks to XTX, which in 2022 supported the global rounds initiaitive!

All problems were written and prepared by errorgorn, maomao90 and oolimry.

We would like to thank the following people who made this round possible:

antontrygubO_o for coordinating our round and somehow only rejecting 2 div2As.
oolimry for contributing to national defence and keeping Singapore safe.
dario2994 for being the best tester. The perfect tester does not exist but I think dario2994 comes really close to it.
tqbfjotld, gamegame,dorijanlendvaj, dario2994, kclee2172, Monogon, nvmdava, TheScrasse, jamessngg, pavement, iLoveIOI, AlperenT, icypiggy, timreizin, Myrcella, DeMen100ns, QuangBuiCPP, bWayne, chicken337, joelau for testing the round and providing useful feedback.
MikeMirzayanov for Codeforces and Polygon.

You will have 3 hours to solve 9 problems, one of which is divided into two subtasks.

The scoring distribution is 250-500-750-1000-1500-(1250+1250)-2750-3000-4000. GLHF!

Also, please upvote the blog so that I can get more contribution than antontrygubO_o.

PS: We would like to take this opportunity to congratulate rotavirus or antontrygubX_y on getting married. Wishing you lots of love and happiness.

Edit: Editorial is released. Even if you have solved a problem, we encourage you to read the editorial as you might learn something new from it.

Full text and comments »

Announcement of Codeforces Global Round 20

codeforces global rounds, global round 20

+861

errorgorn
2 years ago
161

On "PermutationForces"

By errorgorn, 2 years ago, In English

Disclaimer: This blog is entirely my own opinion, please do not get mad at the authors from round 779. If you did not enjoy that round, please do not blame the authors. Personally, I felt that the authors overall did a wonderful job (SPyofgame's div 2F was honestly one of my favourite problems in 2022 so far).

Last week round 779 was held, a common feedback that people seemed to be quite vocal about was that the round was "PermutationForces".

If we look at the actual contest, we do see that problems B, C, D, E all contain the word permutation inside, so it is natural to think that problems are all repetitive. But saying a round is "PermutationForces" is like saying a round is "ArrayForces". Like arrays, permutations are very basic objects that are one of the basic building blocks of competitive programming, claiming that a round has too many problems using permutations is pretty baseless to me.

According to the definition of permutations, a permutation is an array consisting of $$$n$$$ distinct integers from $$$1$$$ to $$$n$$$ in arbitrary order. For example, $$$[2,3,1,5,4]$$$ is a permutation, but $$$[1,2,2]$$$ is not a permutation ($$$2$$$ appears twice in the array) and $$$[1,3,4]$$$ is also not a permutation ($$$n=3$$$ but there is $$$4$$$ in the array).

Sometimes when encountering permutation problems, we only use the definitions of permutation at "face value".

1658B - Марин и не взаимно простая перестановка — a permutation of length $$$n$$$ consists of distinct integers from $$$1$$$ to $$$n$$$ in arbitrary order, so there are at most $$$\frac{n}{k}$$$ elements which are a multiple of $$$k$$$.
IZhO 2015 Day 1 A — a permutation of length $$$n$$$ consists of distinct integers from $$$1$$$ to $$$n$$$ in arbitrary order, so a straightforward approach is just to do multiset comparison using hashing since the multiset of any permutation of length $$$n$$$ is unique.

However, in most permutation problems, we do not have to explicitly care that these values are from $$$1$$$ to $$$n$$$, if we instead say that they are $$$n$$$ distinct real numbers the problem would not change. In fact, in most problems, we do not have to do any arithmetic operations on elements on the array. For example, in the context of the problem given the permutation $$$[1,2]$$$, I can completely don't care that $$$1+1=2$$$. I only need to know that $$$1=1$$$ and $$$1<2$$$. Using this, we can replace the array with something like $$$[e,\tau]$$$ and although it is false that $$$e+e=\tau$$$ (well, unless you are an engineer), it still holds true that $$$e < \tau$$$. In fact, in many problems where we only care about the relative position of elements, there is a trick known as "discretization" where we compress an array of large values into an array only containing values in $$$[1,n]$$$. I feel that for problems that clearly only care about the relative ordering of elements, problem setters should default to using values in $$$[1,n]$$$ except for special cases.

Some examples of problems to illustrate what I am talking about:

1658C - Синдзю и потерянная перестановка — power is defined using the $$$max$$$ function, which only cares about the relative ordering of elements. We could possibly have used some array $$$a$$$ of length $$$n$$$ where all elements are in $$$[1,10^9]$$$ instead of a permutation here to prevent the corner case of having only a single $$$1$$$ in the array $$$c$$$ from happening. If you think about this problem more, it is actually about min stacks (wow! data structure in div 2 C? real?)
1658E - Годзё и игра на матрице — not so obvious example that we only care about the relative orderings of the elements here. I suggested that we change the problem to $$$1 \leq V_{i,j} \leq 10^9$$$ but I think it suggested it too late and so we left $$$V$$$ as a permutation in the end. I suggested that we make it in the range $$$[1,n^2]$$$ because of my philosophy regarding not requiring people to discretize and also allowing for $$$O(n^2)$$$ solutions, but I think setting larger limits would have been better here because the fact that we only care about relative orderings is completely non-trivial. Again, permutations play a very small role here and should be treated as "an array with distinct elements".
1641D - Два массива — notice that the condition for a good pair $$$(i,j)$$$ is that $$$a_{i,1},a_{i,2},\ldots,a_{i,m},a_{j,1},a_{j,2},\ldots,a_{j,m}$$$ is distinct. That is, we only need the inequality operator — we can discretize without caring about the relative orders of elements as long as our discretization function is a bijection. If you looked at the solution of anyone with $$$O(\frac{n^2\cdot m}{w})$$$ complexity, most likely they would have discretized all values in $$$a$$$.

Another aspect of permutation is studying cycles. Since the permutation is some arbitrary arrangement of integers from $$$1$$$ to $$$n$$$, a common way to view that is to think about the graph where the edges are of the form $$$(i,p_i)$$$. This graph very nicely has the properties of each edge having in-degree and out-degree of $$$1$$$. Also, this gives us a very intuitive sense of what the inverse permutation is — we just flip the edges of the graph we are looking at. It is very common to think about such permutation when the problem asks for something like "sort this sequence and your operation is swapping $$$2$$$ elements". The general approach for such problems is that you draw the graph of the permutation and completely forget that the permutation ever existed then proceed to solve the problem on the graph--- the permutation is just a compact way to represent our graph.

1491G - Обменяйте и переверните, ARC124D- blatant applications of using graph representation of permutation.
1656G - Циклические палиндромы — According to the statement: We say that a permutation $$$\sigma$$$ is a cycle permutation if $$$1,\sigma(1),\sigma^2(1),\ldots,\sigma^{n-1}(1)$$$ are pairwise different numbers. Here $$$\sigma^{m}(1)$$$ denotes $$$\underbrace{\sigma(\sigma(\ldots \sigma}_{m \text{ times}}(1)\ldots))$$$. This definition is not very nice to work with, instead using the graph representation of the permutation and a-ha, we simply require that the graph is a single connected component, furthermore we can analyze how to "connect" cycles into a bigger connected component without ever thinking about the permutation as a list of numbers.
1647E - Мадока и шестиклассики — not a permutation, but the problem statement very nicely shows you what the graphical representation of an array looks like nice.
2018-2019 Winter Petrozavodsk Camp, Oleksandr Kulkov Contest 1 G — adamant tells me this is magically related to the graphical representation of permutations.

Finally, permutations may be used when we need to define how to "shuffle" a sequence. These problems usually have nothing related to permutations other than the fact that it is a useful way to define "shuffling" (see 1658D2 - 388535 (усложненная версия) and 1526D - Убить Антона).

I hope this blog can convince people that just because many problems in a contest references the notion of permutations, that does not mean that the round is all about a single concept. Although it is valid criticism to say a round has too much of some idea, it simply does not work when you only say "permutation". Just take a look at how many different ways we can view permutations under!

Random Sidetrack

Many times in problems, we have to use a well-known object like a permutation, bitwise operator or a subsequence, we have to paste some definition into the statement. Of course, I feel it is kinda funny that the definition of a permutation has to appear in statements $$$3$$$ times in round 779. Furthermore, it is kind of weird to put the definition of a permutation on something like a div 1C. Do you really think the people who are attempting a div 1C really need to know what a permutation is?

I suggest that we instead have a glossary on CodeForces that defines all common terms that are needed in CodeForces, then like the guide to interactive problems, we can just paste that glossary page link into all announcement blogs so beginners can read up on those page for those "technical terms" that frequently appear. We don't need to paste lengthy definitions in statements, less work for everyone!

Full text and comments »

permutations

+197

errorgorn
2 years ago
9

On Div2ABs

By errorgorn, 2 years ago, In English

2 years ago, antontrygubO_o wrote a blog about div2ABs where he expressed his opinions that d2ABs should not be about "here is a statement, please implement what is written there". Thanks to him, the quality of d2ABs (and problem quality in general) have certainly improved. However, I still believe that there still quite large differences between how coordinators/problemsetters view d2ABs and how the intended participants view them.

From the survey made by kpw29, we can see that most people agree that most people agree that we should primarily consider the target audience when proposing a task. I too think if a task is boring to div 1 contestants, we should not think of that as a reason to immediately disqualify a problem from being a d2A.

I think when people judge the interesting-ness of d2As, they try too hard to make it pleasing to div 1 contestants by having short implementation, short statement (no reading comprehension) and at some cute ideas.

While this mindset works very well for combined rounds where div 1 contestants do need to and forced to solve d2ABs and quickly get ABCD over with, I think we should have a different mindset when considering div 2 problems.

We should not make setting interesting d2ABs our top priority but instead make it setting friendly d2ABs. By friendly d2ABs, it means that the solution is easy to justify (there is no difficult constructive proof of why this random condition is sufficient) and that the statement is without much mathematical constructs, it would be best if a 5 year old can read the statement and be able to understand the problem.

Elaborating on my point about having d2ABs contain some non-trivial ideas, sometimes this ideas take the form of something that is downright unapproachable to newer contestants where for them, it feels like a "guess and check" game where they cannot prove anything they submit and just hope that their guess is good.

I believe that this makes beginners in competitive programming think that solving competitive programming is just having good guessing skills and such observation comes from nothing other than divine sources. I recognize that having a good intuition is a valid strategy to solve competitive programming problems and I do guess observations all the time when solving problems. But we should not colude not having a proof with the complete inability to justify why something should be true (I am looking at you 1672F1 - Перемешивание массива).

We should still strive observations on d2ABs more managable such that when one is able to correctly guess the observation, it would not be too hard for them to figure out why the code they submitted is correct if they wanted to. For some constructive d2ABs where the output is yes/no, it might make sense to ask for participants to print their contsruction too. Although this would make coding much more involved that it has to be, I believe this would be fine for most codeforces rounds.

Furthermore, when people set d2ABs, they sometimes use mathematical constructs that not all beginners would know of. It is not uncommon to see bitwise operations in the statement of a d2A followed by an explanation of what said bitwise operator is. If people see the need to provide a link to an explanation of what a bitwise operator is, then why not just not set a d2A that requires you to understand what a bitwise operator is to even comprehend the statement?

Imagine you are a beginner in competitive programming and you only know some basic language syntax, would you really be interested in caring about random problem where someone calculates some stupid value using some random function? There are billions of d2ABs with $$$\text{gcd}$$$, $$$\text{mex}$$$ or some bitwise functions. But how many times do you see such functions outside of d2ABs? With $$$\text{gcd}$$$ or $$$\text{mex}$$$, I think it is very rare. Because most of such problems involving these just use the "fun facts" of these functions.

Let's look at 1325A - ЕхАб И нОд as an example (which antontrygubO_o used as an example of a good d2A). The problem just asks if you know the fun fact that $$$\gcd(x,1)=1$$$ and $$$\text{lcm}(x,1)=x$$$. Despite myself belonging to the camp of formal statements, I believe that d2ABs should have as natural a statement as possible, in the sense that as much as possible, we should not have random mathematical constructs being the basis of d2ABs. I don't want to solve d2ABs where I have to recall the fun fact that $$$\gcd(x,1)=1$$$.

At the end of the day, there are many "factors" that affect the suitability of d2ABs for codeforces contests. I don't think we should sacrifice the enjoyability of d2ABs to the intended participants contestants for the sake of making them "interesting".

Because of the goal of setting d2ABs with the intended audience in mind, I decided to ask a few coordinators and users who are specialist and below to answer questions about their thoughts of d2ABs. Since I was already asking people their opinions on d2ABs, I decided to ask some additional questions that felt interesting to me somewhat related to this topic.

Thank you to Monogon, Um_nik, dario2994, antontrygubO_o, isaf27, 74TrAkToR, darkkcyan, -is-this-fft-, sus, tdpencil, kIee,SPyofgame and ak2006 for spending their time answering these questions and giving me permission to post it here.

Note: These responses were from a month ago but I procrastinated writing this blog.

What is your philosophy on d2ABs?

-is-this-fft-:

In general I believe the easiest problem should be completely trivial for the intended audience of the contest. A gift, so to speak. I have heard this principle from several seasoned contest organizers. This has several reasons.
1. Someone is much more likely to do another contest if they solved at least one problem, even if it is completely trivial to them.
2. A psychological thing: it builds engagement and commits you to the contest. If you are in a contest and can't solve the first problem relatively quickly, then you will kinda get bored and drift away. Once you solved the easiest problem you are more committed to the contest and are ready to spend more time to solve the rest.
3. On Codeforces, it warps ratings if people don't submit to the easiest problem because they don't have ideas for it.
I'm not a fan of "troll" div2ABs, that is, problems that have some weird restriction but some very simple construction. All it does is teach beginners that cp is about guessing.
1. Also, people are often misled by thinking that if a problem solution is simple, it must also be easy to solve. Solution length, unless very excessive, is no argument as to the difficulty of a problem. I'm reminded of a certain local contest where the easiest problem turned out to be pretty hard for the participants, for exactly this reason.
Easiest problems should not be annoying or tedious for the same reason as 1b.
Making the problem interesting should not come at the expense of any of the above.

Monogon: A div2AB problem should be possible for a programmer with minimal knowledge of algorithms to solve, but still presents some kind of challenge. Maybe some simple observation or a slightly nontrivial implementation will be necessary, but it shouldn't feel like you're mindlessly doing exactly what the statement says if you're a beginner.

dario2994: Good very easy problems are not so different from good hard problems, they are just easier. They should be interesting for everyone, but this is often an impossible requirement to satisfy. If one cannot find an easy problem interesting for everyone, then the priority should be to make it interesting for the target audience, i.e., newcomers in competitive programming. A good very easy problem does not require any knowledge, is not "implement what is described in the statement", is solvable in any language (python I am looking at you). I prefer to give very easy problems involving some computer science instead of something which is solved with a sequence of if-checks or in $$$O(1)$$$. I don't have any strong opinion about whether a very easy problem should be trivial to implement or not.

antontrygubO_o: They shouldn't be painful (in the sense that they shouldn't have long statements, be caseworky, or have long implementation), and shouldn't be "implement what you read". The natural statement is preferred. I prefer when there is some cute idea.

isaf27: My philosophy is:

Perfect d2a should be the problem, that will be solved by all participants of the contest. On the other hand this problem should contain some idea, maybe very very simple and obvious, but idea. I don't like the problems "just write what is asked in the statement" even for d2a.
d2b should be simple, but not obvious.

74TrAkToR: These are tasks for beginners. I think they should be on the idea. There should be a minimum implementation.

darkkcyan: I think that a round on Codeforces should be fun, and maybe educational for newcomers.

The word "difficulty" is relative. A problem can be very hard for one person, but can be trivial for the others. But when judging a problem by myself, I often measure it by the time I spent to think, as well as implementation time. I wanted that for A the idea should be found in a very short amount of time, if not instantly, at least with my level, and for B might be a bit longer, but not toooo long. But a coordinator is only a barrier, the next barrier is the testing phase, which should be more accurate if we invite a good spectrum of testers.

About the idea. First of all, I don't think div2A should be like Atcoder ABC A, because in Atcoder ABC A, you can do what is asked in the statement. Again, I wanted to round to be fun. The joy of solving puzzle is when you find the right idea, and thinking how smart you were. So a broad answer is just, not to make it too trivial and standard, and the rest is determined by the difficulty. That is the same for B, but yes, we must make B somehow harder. This is still an open question, but I think I should leave it there, because creating a problem is truly a hard task.

Why should we care about the quality of d2AB if most div 1 competitors forget them after 5 mins anyways?

antontrygubO_o: Question is weird: most Div1 users forget all problems after 5 minutes. But we should just aim for a better quality of problems.

Still, for me personally, boring/stupid D2A-D2C problems may ruin the round, as I am not expecting anything good in the harder problems. I understand that many Div1 users don't care, but many Div1 users would be happy with standard problems on the harder spots as well, so what?

darkkcyan: I think this is an odd question when asking a question about the easiest problems of a div2 round, made for div2 competitors. Well, it should be fun for div 1 competitors too. And for me, they are warm up problems. And for div 2 competitors, they should be fun, again. For newbies, there are educational values as well. In all cases, these problems are the introduction to the way of thinking and observing when doing contests, and I still think that they are doing a good job on that. I do see that nowadays, div2AB requires more thinking than the previous years, but that is also true for div2C and above. Thus, because they are the very firsts problems of a round, acting like guarding problems for the other problems, their qualities should be in the care as well.

because of the bulk of the participants only solve d2ABs and div1 participants are not the target audience

What things do you like/dislike about d2ABs on codeforces?

sus: I like the simplicity and typicallity of div2 A and Bs. Even though they only cover simple topics like math, brute force, and simulation, they allow any beginner cper to practice their problem solving skills. Some of the drawbacks of ABs is that they only cover a small amount of topics, so they might start to get repetetive/boring if a problem setter is feeling lazy and that you can't learn much from them after you get good at solving them. People get carried away trying to get 800 problems solved and only solve ABs trying to reach this goal.

Whenever I need to get ready for a contest, I get warmed up by quickly solving an AB. ABs serve as a stepping stool for beginners and thats what I like about them.

ak2006: I like the fact that div 2 ABs dont require much DSA knowledge and can be solved by someone who hasnt done much CP. I dislike that they might use too much math (number theory, combinatorics) which only math major or minors/math olympiad people may be acquainted with well enough to be able to solve them.

kIee: I do not like ABs at all, for contestants who cant really tackle harder problems/ spend much more time on them, AB is purely a speed contest. In addition, it's largely affected by internet speed, submission speed, etc, and their weightage is very high. Especially affected when trying to open up problems. (eg, solving AB slower by 1/2mins, can lose to another person even though i solve C faster by 15mins)

SPyofgame: Simple to read, simple to understand statement and testcase, interesting idea, not too focus on a specific type of problem, easy to solve but still require thinking a bit instead of doing without brain

Do you think d2ABs must necessarily be short implementation?

tdpencil: I believe d2A should. d2B should probably be more implementation based or should permit longer implementation. Just because newbies and beginners are going to try div 2 A and if they don't know how to code it up fast or they don't know how to write such long code then its going to be challenging for them.

sus: I think i dont care. There are 1 liners, there are 30 liners and there are those in between. Just another problem doesnt make a difference to me.

Some high rated users have difficulty differentiating d2As-d2Cs what would be a good rule of thumb to determine the correct position of a easy problem?

dario2994: Gauging the difficulty of problems is hard. It is hard to distinguish d2As from d2Bs as it is hard to distinguish d2Ds from d2Es. The difficulty is on a spectrum and there is not a cheap trick to distinguish. Experience helps and I don't think that high rated users find it harder to distinguish d2As from d2Bs than low rated users. But it might be that high rated users are more aware of their inability to make the distinction compared to low rated users.

antontrygubO_o: It's hard for me to imagine that someone may actually have difficulty with this, to be honest. Look at D2A-Bs of recent contests, you will most likely agree that B is harder than A in most of them.

tl;dr testers exist for a reason

Do you prefer div 2 rounds or educational rounds more? Why?

tdpencil: I prefer educational rounds. This is because i can pretty much depend on the difficulty of educational rounds and expect to learn something. Most edus are made by the same people so the quality of problems are about the same. For div2s however, a variety of people make the problems and it can be argued that some are much easier than others, meaning not all div2s are at the same difficulty (on average). I would argue that having div2s be varying difficulties is an issue but thats just my opinion.

ak2006: I prefer regular rounds since they use much less math for some reason — edu rounds arent very educational and arent much DSA based as they should be.

sus: I prefer div 2 rounds because educational rounds are the same as any other round for beginners: every round is an educational oppurtunity when you are new. The only reason I prefer div2s is becuase it is easier to gain rating in them. Sometimes I solve C in div2s which is worth a lot more than AB but in edu rounds C is worth the same.

Ratings of d2ABs

I decided to ask people to give rate some d2ABs from 1 to 10. These div 2ABs were chosen to get a hopefully diverse set of current meta of d2ABs. But take the numerical rating with a giant pinch of salt.

In the words of dario2994, "it seems to me as "rank from 1 to 10 these apples and these cars according to their color, their speed, their consistency, and your overall feeling". It would be a random number."

Anyways, here are the assigned scores in a table form. I have bolded the scores which were assigned the coordinator of the respective contest (you can see that I'm pretty biased).

	1266A	1281A	1339A	1391A	1581A	1583B	1598A	1610A	1634B	1642A
-is-this-fft-	10	8	9	3	5	3	8	2	5	6
antontrygubO_o	9	2	7	5	7	10	7	6	10	7
Um_nik	3	0	7	4	4	6	4	5	2	3
Monogon	5	7	10	6	3	5	9	7	4	6
darkkcyan	7	8	9	5	7	7.5	8	9	9	7
isaf27	10	2	6	5	8	6	9	6	10	4
errorgorn	6	1	10	4	9	7	10	5	3	4
kIee	1	8	10	2	7	3	8	4	2	9
sus	3	3	8	4	2	7	6	4	1	9
mean	6	4.33	8.44	4.22	5.78	6.06	7.67	5.33	5.11	6.11
stddev	3.28	3.35	1.51	1.20	2.39	2.21	1.80	2	3.62	2.15

From this, we can see what is pretty universally agreed to be good d2ABs (the comments are my opinion):

1339A - Заполнение ромбами — timeless masterpiece of print(input())
1598A - Компьютерная игра — although this can be solved by performing dfs and the problem is classic by all standards, it is a wonderful d2A due to the fact that the intended solution is so elegant yet easily motivable by beginners

We can also see what the "controversial" problems are:

1266A - Спортивный программист — requires quite a bit of requisite knowledge on number theory which is pretty unrelated to competitive programming in general, why am I required to recall from primary school math that a number is divisible by $$$3$$$ if its sum of digits is also divisible by $$$3$$$?
1281A - Suffix Three — textbook "implement what is in the statement"
1634B - Гадание на массиве — the odd/even parity idea is overused and here it is too artificial, making the problem feel forced

Full Comments about each Problem

Here are the full text comments for each problem if you want to read them.

1266A

1281A

1399A

1391A

1581A

1583B

1598A

1610A

1634B

1642A

Poll

I think it would be interesting to see how users from various ratings rate each of the 10 d2ABs stated earlier in the blog. So I have created a google form where you can rate these d2ABs. I will release the results after a couple of days.

Maybe I will do some data analysis on the results. I tried to do some data analysis on the scores assigned by coordinators (after normalization) and got this:

This chart looks very dumb by all standards but hopefully we can see better trends after I get more data.

Anyways, I encourage members of the community to share their thoughts in the comments about the current direction that d2ABs are heading towards, especially for people who are specialist or below since d2ABs are targeted towards them.

Also, to future problem setters, maybe this will allow you to get more d2ABs accepted by seeing the preferences of your coordinator lol

Results of Poll

You can access the results of the poll here. I have manually added the scores of those people who I personally asked to send scores and the names of people who have posted their scores in the comments.

I have done some basic data analytics and I believe that the results is quite interesting. For the data analytics part, I used mostly the same procedures as those described by my comment. The only different was that I only did normalization for everyone to have mean=0, because it does not make sense that "1 5 9" would be normalized to the same thing as "8 9 10". Also, I removed entry 5 when I did the analytics because it had an extremely low mean score. When I checked it, it turns out that the person had given $$$7$$$ problems a score of $$$1$$$, so I just removed it. You can check my jupyter notebook file here.

Anyways, here is the plot on the average scores given to each problem.

We can see that 1281A actually has a bimodal distribution — either you love it or hate it. From manually looking at the scores assigned to it, I don't think there is any clear correlation between rating and whether you would like this problem. Suggesting that the idea of d2ABs having "do what author tells you = bad problem" is not agreed upon even on most div 1 users.

Here is the plot of using PCA to squish down $$$10$$$ dimensions ($$$1$$$ dimension for each score) into $$$3$$$ dimension so that we can visualise it better. The third dimension is represented by the size of the point (I hope that this will allow you to imagine the point as being nearer or further away). The points are colored by the rank (or rating) of the user.

One of the striking thing is the fact that there is a vague correlation between rating and our correlation on this chart, going from the bottom right corner to the top right corner. I know that I do not have sufficient data from div2 users but I think we can kind of conclude that the opinions of most coordinators on d2ABs differs quite abit from the cluster of div 2 users on the middle right. It seems that -is-this-fft-'s and Monogon's philosophies on div2ABs are more suited for having d2ABs that are targetted towards their intended audience.

Thanks to everyone who submitted to the poll!

Full text and comments »

+330

errorgorn
2 years ago
16

Interesting Information-Theoretical Problem

By errorgorn, 2 years ago, In English

Recently, there was a local contest which I set a problem for (task 4 in this pdf).

The abridged problem statement is that you choose an array $$$A$$$ of length $$$N$$$ where the elements are $$$<2^X$$$. The grader will pick $$$[L,R]$$$ and return $$$A_L \oplus A_{L+1} \oplus \ldots \oplus A_R$$$, where $$$\oplus$$$ is the bitwise xor operation. You need to find any integer $$$z$$$ such that $$$L \leq z \leq R$$$.

I set this in contest with the constraints of $$$N=2^{19}$$$ and $$$X=192 \approx \frac{(\log N)^2}{2}$$$ where you also had to answer queries quickly.

Solution

During testing, oolimry and icypiggy decided it would be funny to write solutions that had $$$X \approx c \cdot \log N$$$ (of course they also had to ensure they could answer queries quickly so these are more of describing speedup techniques rather than techniques for reducing number of bits).

oolimry's solution

(this was written by oolimry)

Let $$$B = \sqrt N$$$, we split it into $$$B$$$ buckets of size $$$B$$$ each. There are $$$3$$$ layers.

We generate $$$B$$$ random $$$36$$$-bit numbers such that all $$$O(B^2)$$$ ranges have unique xors, which we can store all unique xors into a map. Let $$$R[i]$$$ be i-th the random $$$36$$$ bit number. This will be used in layer 1 and layer 2.

For layer 1 ($$$36$$$ bits), the position is blank if $$$i \text{ mod } B != 0$$$. Else if $$$i \text{ mod } B == 0$$$ (i.e the boundary between two buckets), the position has the value $$$R[i/B]$$$ for those 36 bits.

Layer 2 ($$$36$$$ bits), the position has value $$$R[i \text{ mod } B]$$$.

If the range extends across more than $$$1$$$ bucket, layer 1 can detect it. Otherwise, layer 2 will tell us that the range is of the form $$$L + kB$$$ to $$$R + kB$$$ for some $$$k$$$, where $$$k$$$ is the bucket number. Now we use layer 3 to determine which bucket.

Layer 3 ($$$38$$$ bits) is assigned to all positions and can just be assigned randomly. To find the value of $$$k$$$, just check all B buckets and verify if the range xor matches. Layer 3 serves as a random checksum.

[The values of 36 and 38 are arbitrary]

code

#include <bits/stdc++.h>
using namespace std;
#define all(x) (x).begin(), (x).end()
#define sz(x) ((int) x.size())
#define show(x) cerr << #x << " is " << x << endl;
#define show2(x, y) cerr << #x << " is " << x << ", " << #y << " is " << y << endl;
#define show3(x,y,z) cerr << #x << " is " << x << " " << #y << " is " << y << " " << #z << " is " << z << endl;
typedef long long lint;
typedef pair<lint, lint> ii;
typedef pair<lint, ii> iii;


mt19937_64 rng(time(0));
int n;

lint arr[2005];
bitset<192> arrb[2005];
vector<iii> M;

lint bitsA = 38;
lint bucket = 2000;

lint bitsD = 36;
bitset<192> brr[2005];


bitset<192> firstmask;
bitset<192> secondmask;
bitset<192> thirdmask;

vector<bitset<192> > toset;
vector<bitset<192> > prefxor;
pair<int, vector<bitset<192> > > init(){
    int n = 1<<19;
    toset.resize(n);
    prefxor.resize(n);
    
    rng.seed(1633860579000212400);
    for(int i = 0;i < bucket;i++){
        arr[i] = rng() & ((1LL<<bitsA) - 1LL);
    }
    
    for(int l = 0;l < bucket;l++){
        lint X = 0;
        for(int r = l;r < bucket;r++){
            X ^= arr[r];
            M.push_back(iii(X, ii(l,r)));
        }
    }
    
    sort(all(M));
    
    for(int i = 0;i < bucket;i++){
        bitset<192> bs(arr[i]);
        arrb[i] = bs;
    }
        
    for(int i = 0;i < n;i++){
        toset[i] |= arrb[i%bucket];
        
        if(i%bucket == 0){
            toset[i] |= (arrb[i/bucket] << bitsA);
        }
        
        bitset<192> bs(rng()&((1LL<<bitsD)-1));
        bs <<= (2*bitsA);
        toset[i] |= bs;
    }
        
    for(int i = 0;i < n;i++){
        prefxor[i] = toset[i];
        if(i != 0) prefxor[i] ^= prefxor[i-1];
    }
    
    return {n, toset};
}

int answer(bitset<192> bs){
    lint B = 0;
    for(int b = 0;b < bitsA;b++){
        if(bs[b+bitsA]) B |= (1LL<<b);
    }
    
    auto it = lower_bound(all(M), iii(B, ii(-1,-1)));
    if(it != M.end() and it->first == B){
        return (it->second.first) * bucket;
    }
    
    lint A = 0;
    for(int d = 0;d < bitsA;d++){
        if(bs[d]) A |= (1LL<<d);
    }
    auto it2 = lower_bound(all(M), iii(A, ii(-1,-1)));
        
    
    for(int b = 0;b < bucket;b++){        
        int L = b*bucket + (it2->second).first;
        int R =    b*bucket + (it2->second).second;
        
        
        bitset<192> BBS = prefxor[R];
        if(L > 0) BBS ^= prefxor[L-1];
        
        if(BBS == bs){
            return L;
        }
    }
    
    assert(false);
}

icypiggy's solution

(this was written by errorgorn so might be a gross oversimplication :P)

If we randomly assign bits such that they were periodic with period $$$T$$$ and their xor-sum of a subarray with length $$$T$$$ was $$$0$$$, then we could find several candidates for $$$(l \text{ mod } T,r \text{ mod } T)$$$. We can store these bits for several coprime values of $$$T$$$ then we have another checksum to ensure that our candidate is correct.

Do note that this solution actually recovers the original subarray!

code

#include <bits/stdc++.h>
#include "battleship.h"
using namespace std;

const int N = (1<<19);
vector<bitset<192>> V;

const int m0 = 801;
const int m[3] = {797, 787, 773};

const int bits = 21;

mt19937 gen(0); //Standard mersenne_twister_engine seeded with rd()
uniform_int_distribution<> distrib(1, (1<<bits)-1);

int crt0[m0+2][m0+2];
int crt1[m0+2][m0+2];
int crt2[m0+2][m0+2];

int ctr[(1<<bits)+1];
pair<short,short> p[m0*m0];

int next_random() {
    return distrib(gen);
}

const bitset<192> mask = bitset<192>((1<<22)-1);

vector<bitset<192>> init() {
    V.resize(N);
    for(int i=0; i<N; i++) V[i].reset();
    vector<int> v;
    v.push_back(0);
    set<int> s;
    while(v.size() <= m[0]) {
        int x;
        do {
            x = next_random();
        } while(s.find(x)!=s.end()); // make sure x does not already exist
        s.insert(x);
        v.push_back(x);
    }
    for(int i=0; i<m[0]; i++) {
        for(int j=0; j<i; j++) {
            ctr[v[i]^v[j]]++;
        }
    }

    for(int i=1; i<=(1<<bits); i++) {
        ctr[i] += ctr[i-1];
    }
    for(int i=0; i<m[0]; i++) {
        for(int j=0; j<i; j++) {
            int* pt = &(ctr[v[i]^v[j]]);
            (*pt)--;
            p[*pt] = make_pair(i,j);
        }
    }
    for(int a=0; a<3; a++) {

        v[m[a]] = 0;
        //v.push_back(0);
        for(int i=0; i<N; i++) {
            int r = i % m[a];
            int d = v[r+1] ^ v[r]; // copy the bits of d into V[i]
            V[i] |= ((bitset<192>)(d))<<(a*22);
        }
        // counting sort (i,j) based on (v[i]^v[j])


    }
    for(int i=0; i<N; i++) {
        long long tmp = (79143LL*i) ^ (79143LL*(i+1));
        V[i] |= ((bitset<192>)(tmp)) << 66;
    }

    for(int i=1; i<=N; i++) {
        crt1[i%m[0]][i%m[2]] = i;
        crt2[i%m[0]][i%m[1]] = i;
    }
    return V;
}



int answer(bitset<192> x) {
    // extract the lowest 66 bits into a number

    int a[3];
    a[0] = static_cast<int>((x&mask).to_ulong());
    a[1] = static_cast<int>(((x>>22)&mask).to_ulong());
    a[2] = static_cast<int>(((x>>44)&mask).to_ulong());
    long long z = (((x>>66)).to_ullong());
    if(a[2]==0) {
        int b = 0;
        int c = 1;
        for(int i1=ctr[a[b]]; i1 < ctr[a[b]+1]; i1++) {
            pair<short,short> p0 = p[i1];
            for(int i2=ctr[a[c]]; i2 < ctr[a[c]+1]; i2++) {
                pair<short,short> p1 = p[i2];
                int st = crt2[p0.first][p1.first];
                int en = crt2[p0.second][p1.second];
                if(st<=N && en<=N && (z ^(79143LL*st) ^ (79143LL*en))==0) {
                    return (st+en)/2;
                }
                swap(p1.first, p1.second);
                st = crt2[p0.first][p1.first];
                en = crt2[p0.second][p1.second];
                if(st<=N && en<=N && (z ^(79143LL*st) ^ (79143LL*en))==0) {
                    return (st+en)/2;
                }
            }
        }
    }
    if(a[1]==0) {
        int b = 0;
        int c = 2;
        for(int i1=ctr[a[b]]; i1 < ctr[a[b]+1]; i1++) {
            pair<short,short> p0 = p[i1];
            for(int i2=ctr[a[c]]; i2 < ctr[a[c]+1]; i2++) {
                pair<short,short> p1 = p[i2];
                int st = crt1[p0.first][p1.first];
                int en = crt1[p0.second][p1.second];
                if(st<=N && en<=N && (z ^(79143LL*st) ^ (79143LL*en))==0) {
                    return (st+en)/2;
                }
                swap(p1.first, p1.second);
                st = crt1[p0.first][p1.first];
                en = crt1[p0.second][p1.second];
                if(st<=N && en<=N && (z ^(79143LL*st) ^ (79143LL*en))==0) {
                    return (st+en)/2;
                }
            }
        }
    }
    if(true) {
        int b = 1;
        int c = 2;
        for(int i1=ctr[a[b]]; i1 < ctr[a[b]+1]; i1++) {
            pair<short,short> p0 = p[i1];
            for(int i2=ctr[a[c]]; i2 < ctr[a[c]+1]; i2++) {
                pair<short,short> p1 = p[i2];
                //int st = crt0[p0.first][p1.first];
                //int en = crt0[p0.second][p1.second];
                int st = static_cast<int>((391139LL * p1.first) + (217213LL * p0.first)) % (m[1]*m[2]);
                int en = static_cast<int>((391139LL * p1.second) + (217213LL * p0.second)) % (m[1]*m[2]);

                if(st<=N && en<=N && (z ^(79143LL*st) ^ (79143LL*en))==0) {
                    return (st+en)/2;
                }
                swap(p1.first, p1.second);

                st = static_cast<int>((391139LL * p1.first) + (217213LL * p0.first)) % (m[1]*m[2]);
                en = static_cast<int>((391139LL * p1.second) + (217213LL * p0.second)) % (m[1]*m[2]);
                if(st<=N && en<=N && (z ^(79143LL*st) ^ (79143LL*en))==0) {
                    return (st+en)/2;
                }
            }
        }
    }
    assert(false);
    return 0;
}

Anyways, this raises some interesting questions about the minimal $$$X$$$ we need if we completely do not care about answering queries quickly. A trivial lower bound is $$$X \approx \log N$$$ since we could query $$$[pos,pos]$$$ for all values of $$$pos$$$. An upper bound is randomly choosing bits. I would think that it is reasonable to assume that the xor-sum of subarrays should be independent and we should require somewhere around $$$X \approx 4 \cdot \log N$$$ bits?

Is there a better lower and upper bound $$$X$$$ in this problem?

Full text and comments »

+117

errorgorn
2 years ago
2

[Tutorial] Knapsack, Subset Sum and the (max,+) Convolution

By errorgorn, 2 years ago, In English

I would like to thank:

tfg for his helpful discussions, digging through papers and implementing SMAWK.
maomao90 for proofreading.

Prerequisites

Let us first define the classical knapsack, unbounded knapsack and subset sum problems.

Knapsack

There are $$$N$$$ items. The $$$i$$$-th item has weight $$$w_i$$$ and value $$$v_i$$$. Find a set $$$S$$$ such that $$$\sum\limits_{i \in S} w_i \leq C$$$ and $$$\sum\limits_{i \in S} v_i$$$ is maximized.

Unbounded Knapsack

There are $$$N$$$ items. The $$$i$$$-th item has weight $$$w_i$$$ and value $$$v_i$$$. Find a multiset $$$S$$$ such that $$$\sum\limits_{i \in S} w_i \leq C$$$ and $$$\sum\limits_{i \in S} v_i$$$ is maximized.

Subset Sum

There are $$$N$$$ items. The $$$i$$$-th item has weight $$$w_i$$$. Find a set $$$S$$$ such that $$$\sum\limits_{i \in S} w_i = C$$$.

You should know how to do both versions of knapsack in $$$O(NC)$$$ and subset sum in $$$O(\frac{NC}{32})$$$ before reading this blog.

In this blog post, I will just show some results that speed up the above problems for special cases.

Subset Sum Speedup 1

There are $$$N$$$ items. The $$$i$$$-th item has weight $$$w_i$$$. Let $$$\sum\limits_{i=1}^N w_i=C$$$. For each $$$k \in [1,C]$$$, can we find a set $$$S$$$ such that $$$\sum\limits_{i \in S} w_i = k$$$?

It turns out we can do this in $$$O(\frac{C \sqrt C}{32})$$$.

Let us first consider an algorithm that seems like it is $$$O(\frac{C \sqrt C \log C}{32})$$$. We will group items of equal weights together, so we will get $$$M$$$ tuples of the form $$$(w_i,occ_i)$$$ where there are $$$occ_i$$$ occurrences of $$$w_i$$$ in the original items. For each tuple, we will decompose it into powers of $$$2$$$. For example, for $$$(w_i,15)$$$, we will decompose it into $$$\{w_i,2w_i,4w_i,8w_i\}$$$, for $$$(w_i,12)$$$, we will decompose it into $$$\{w_i,2w_i,4w_i,5w_i\}$$$. If you think about these things as binary numbers, it is not too difficult to see that we will get the same answers when we use these new weights instead of the original weights.

Furthermore, it is well known that if you have some weights that sum to $$$C$$$, then there are only $$$\sqrt{C}$$$ distinct weights. So we can already determine that $$$M \leq \sqrt{C}$$$. Since $$$occ_i \leq C$$$, we can figure out that each tuple will contribute $$$\log C$$$ elements. Giving up a simple upper bound of $$$O(\frac{C \sqrt C \log C}{32})$$$.

However, this is actually bounded by $$$O(\frac{C \sqrt C}{32})$$$ $$$^{[1]}$$$. Consider the set of tuples that contribute a weight $$$k \cdot w_i$$$, we will call this set $$$S_k$$$. We want to show that $$$\sum\limits_{k \geq 1} |S_k| = O( \sqrt C)$$$. Firstly, we note that most of the new weights will be a multiple of $$$2^k$$$ of the original weight, for each tuple, it will only contribute at most $$$1$$$ new weight that is not a power of $$$2$$$. Therefore, if we can show that $$$\sum\limits_{k=2^z} |S_k| = O(f(C))$$$, then $$$\sum\limits_{k \geq 1} |S_k| = O(f(C)+\sqrt C)$$$.

It is obvious that $$$\sum\limits_{i \in S_k} w_i \leq \frac{C}{k}$$$ and we can conclude that $$$|S_k| \leq \sqrt\frac{C}{k}$$$.

Therefore, $$$\sum\limits_{k=2^z} |S_k| \leq \sum\limits_{z \geq 1} \sqrt \frac{C}{2^z} = \sqrt C \left (\sum\limits_{z \geq 1} \frac{1}{\sqrt {2^z}} \right) = O(\sqrt C)$$$.

However, there is a simpler way to do this $$$^{[2]}$$$. Consider processing the weights from smallest to largest, with an initially empty list $$$W'$$$ as the list of our new weights. Suppose there are $$$occ$$$ occurrences of the smallest weight $$$w$$$. If $$$occ$$$ is odd, we add $$$\frac{occ-1}{2}$$$ occurrences of $$$2w$$$ to the original weights and $$$1$$$ occurrence of $$$w$$$ to $$$W'$$$. The case if $$$occ$$$ is even is similar, we add $$$\frac{occ-2}{2}$$$ occurrences of $$$2w$$$ to the original weights and $$$2$$$ occurrence of $$$w$$$ to $$$W'$$$.

It is not hard to see that $$$|W'| \leq 2 \sqrt C = O(\sqrt C)$$$, since we only add at most $$$2$$$ occurences of some weight to $$$W'$$$.

Problems:

CC MINFUND (testcases are super weak)
CF755F
CC TREUPS↵

Subset Sum Speedup 2

There are $$$N$$$ items. The $$$i$$$-th item has weight $$$w_i \leq D$$$. Can we find a set $$$S$$$ such that $$$\sum\limits_{i \in S} w_i = C$$$?

We will solve this in $$$O(ND)$$$ $$$^{[3]}$$$. Firstly, if $$$\sum\limits w_i<C$$$, the answer is obviously no, so we will ignore that case.

Let us find the maximal $$$k$$$ such that $$$\sum\limits_{i=1}^k w_i < C$$$. The basic idea is that we initially have an answer of $$$w_1+w_2+\ldots+w_k$$$, then we can either subtract $$$w_i$$$ for $$$i \leq k$$$ or add $$$w_i$$$ for $$$i>k$$$ in some order such that the cost of our items is in the range $$$[C-D,C+D]$$$, which is not too hard to show constructively. The proof sketch is just: if the current weight is more than $$$C$$$, remove something, otherwise add something.

Let us define $$$can(total,l,r)$$$ as a dp function that returns true iff there exists $$$\lambda_l,\lambda_{l+1}, \ldots, \lambda_r \in [0,1]$$$ such that $$$\sum\limits_{i=1}^{l-1}w_i + \sum\limits_{i=l}^{r} \lambda_i w_i = total$$$, where $$$total \in [C-D,C+D]$$$.

Notice that $$$can(total,l,r) = true$$$ implies that $$$can(total,l-1,r)=true$$$, this means that $$$can$$$ is monotone on the dimension $$$l$$$. Therefore, let us define a new dp function $$$dp(total,r)$$$ that stores the maximal $$$l$$$ such that $$$can(total,l,r)=true$$$.

Furthermore, $$$can$$$ is monotone on the dimension $$$r$$$, so $$$dp(total,r) \leq dp(total,r+1)$$$.

Let us consider the transitions.

From $$$dp(total,r)=l$$$, we can extend $$$r$$$, transitioning to $$$dp(total+w_{r+1},r+1)$$$ or $$$dp(total,r+1)$$$. We can also extend $$$l$$$ and transition to $$$dp(total-w_{l'},r)=l'$$$ for $$$l'<l$$$. However, this transition would be $$$O(N)$$$ per state, which is quite bad.

However, it would only make sense to transition to $$$dp(total-w_{l'},r)=l'$$$ for $$$dp(total,r-1) \leq l' < dp(total,r)=l$$$, otherwise this case would have been covered by another state and there would be no need for this transition. Since $$$dp(total,r) \leq dp(total,r+1)$$$, the total number of transitions by extending $$$l$$$ is actually bounded by $$$O(ND)$$$ using amortized analysis.

Therefore, our dp will have $$$O(ND)$$$ states with a total of $$$O(ND)$$$ transitions.

Actually there is a scammy way to do this. Similarly, we find the maximal $$$k$$$ such that $$$\sum\limits_{i=1}^k w_i < C$$$, then we want to solve the subset sum problem with the weights $$$\{-w_1,-w_2,\ldots,-w_k,w_{k+1},\ldots,w_N\}$$$ and sum $$$C- \sum\limits_{i=1}^k w_i$$$.

Let us randomly shuffle the list of weights and pretend $$$C- \sum\limits_{i=1}^k w_i$$$ is very small. Then this can be viewed as a random walk with $$$0$$$ sum. The length of each step is bounded by $$$D$$$, so we can expect $$$O(D \sqrt N)$$$ deviation from the origin (my analysis is very crude). Using bitsets, we can obtain a complexity of $$$O(\frac{ND \sqrt N}{32})$$$ which is probably good enough.

The paper also mentions a way to generalize this technique to knapsack where weights are bounded by $$$D$$$ and values are bounded by $$$V$$$ in $$$O(NDV)$$$ but I think it is not a significant speedup compared to the usual knapsack to be used in competitive programming.

Problems:

Knapsack Speedup 1

Consider SG NOI 2018 Prelim knapsack.

The editorial solution is to only consider $$$O(S \log S)$$$ items, since only $$$\frac{S}{w}$$$ items for a certain weight $$$w$$$ can be used, to get a complexity of $$$O(S^2 \log S)$$$. But I will discuss a ~~slower~~ $$$O(NS)$$$ solution.

Let $$$dp(i,j)$$$ be the maximum value of items if we only consider the first $$$i$$$ types using a weight of $$$j$$$.

Then our transition would be $$$dp(i,j)=\max\limits_{0 \leq k \leq K_i}(dp(i-1,j-k\cdot W_i)+k\cdot V_i)$$$. If we split each $$$dp(i,j)$$$ into the residue classes of $$$j \% W_i$$$, it is easy to see how to speed this up using the sliding deque trick or using an RMQ.

Now, let us talk about a more general speedup. But first, we will have to introduce a convolution that appears frequently in dp.

(max,+) convolution

The problem statement is this: Given $$$2$$$ arrays $$$A$$$ and $$$B$$$ of length $$$N$$$, find an array $$$C$$$ of length $$$2N-1$$$ such that $$$C_i = \max\limits_{j+k=i}(A_j+B_k)$$$.

Doing this in $$$O(N^2)$$$ is easy. Can we do better? It seems that doing this faster is a really hard problem and so far the fastest known algorithm is a very scary $$$O(\frac{n^2}{\log n})$$$ or $$$O(\frac{n^2 (\log \log n)^3}{(\log n)^2})$$$, which does not seem practical for competitive programming.

Note that $$$(\max,+)$$$ and $$$(\min,+)$$$ convolution are not too different, we can just flip the sign. In the rest of this section, I will use $$$(\min,+)$$$ convolution to explain things.

First, we note that if both arrays $$$A$$$ and $$$B$$$ are concave, then we can do this convolution in $$$O(N)$$$ time $$$^{[5],[6]}$$$. The basic idea is we can consider the union of the slopes of the lines $$$\{A_i - A_{i-1} \} \cup \{B_i - B_{i-1} \}$$$ and sort them to get the slopes of $$$C$$$.

Another way to reason about this is using epigraphs (fancy word for colour everything above the line), which I find more intuitive. Because if we take the epigraph of $$$A$$$ and $$$B$$$, we get $$$2$$$ convex polygons, taking their Minkowski sum gets us the epigraph for $$$C$$$ and finding Minkowski sum on convex polygons is well known as taking the union of their edges, which is why this is also known as the Minkowski sum trick.

This speedup can be used in several dp problems such as ABC218H and JOISC18 candies. Furthermore, this can be abused in several dp problems where we can think of slope trick as $$$(min,+)$$$ convolution so we can store a priority queue of slopes such as in SG NOI 2021 Finals password. Another more direct application is CF1609G.

Anyways, we can actually solve $$$(\min,+)$$$ convolution quickly with a weaker constraint that only $$$B$$$ is convex.

First, let us discuss a $$$O(N \log N)$$$ method.

Define a $$$i$$$-chain as the line segment with points $$$\{(i+1,A_i+B_1),(i+2,A_i+B_2),(i+3,A_i+B_3),\ldots \}$$$. Then I claim that $$$2$$$ chains only intersect at a point, which is the sufficient condition for using Li Chao tree to store these lines $$$^{[7]}$$$.

Let us prove that this is actually true. Let the equation for the $$$i$$$-chain be $$$f_i(x)$$$, so $$$f_i(x)$$$ is a convex function. We want to show that for $$$g(x)=f_i(x)-f_j(x)$$$, there exists $$$k$$$ such that $$$g(x)<0$$$ for $$$x<k$$$ and $$$0 \leq g(x)$$$ for $$$k \leq x$$$ whenever $$$i<j$$$.

$$$g(x)=(A_i+B_{x-i})-(A_j+B_{x-j})=(B_{x-i}-B_{x-j})+(A_i-A_j)$$$.

Consider $$$g(x+1)-g(x)=(B_{x+1-i}-B_{x-i})-(B_{x+1-j}-B_{x-j})$$$. Recall that $$$B$$$ is convex (i.e. $$$B_{x+1}-B_x \geq B_{x}-B_{x-1}$$$). Since we have assumed that $$$i<j$$$, we can conclude that $$$(B_{x+1-i}-B_{x-i}) \geq (B_{x+1-j}-B_{x-j})$$$. Therefore, $$$g(x+1)-g(x) \geq 0$$$, i.e. $$$g$$$ is a (non-strict) increasing function.

Therefore, we can insert all chains into a Li Chao Tree and find the convolution in $$$O(N \log N)$$$.

Now, we will consider an $$$O(N)$$$ method. We will have to use the SMAWK algorithm $$$^{[8]}$$$ which I will not explain because the reference does a better job at this than me. Here I will actually use $$$(\max,+)$$$ convolution to explain.

Anyways, the result we will be using from SMAWK is that given an $$$N \times M$$$ matrix $$$X$$$ that is totally monotone, we can find the minimum value in each row.

A matrix is totally monotone if for each $$$2 \times 2$$$ sub-matrix is monotone i.e. for $$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$$:

If $$$c<d$$$, then $$$a<b$$$
If $$$c = d$$$ then $$$a \leq b$$$

The way I think about this is consider the function $$$\sigma(x)=\begin{cases} 1, & \text{if } x > 0 \\ 0, & \text{if } x = 0 \\ -1 , & \text{if } x < 0 \end{cases}$$$. We pick $$$2$$$ columns $$$1 \leq i < j \leq M$$$ and consider writing down $$$[\sigma(H_{1,i}-H_{1,j}), \sigma(H_{2,i}-H_{2,j}), \ldots, \sigma(H_{N,i}-H_{N,j})]$$$. Then this sequence should be increasing when $$$i<j$$$.

Now given the arrays $$$A$$$ and $$$B$$$, we define a $$$2N-1 \times N$$$ matrix $$$X$$$ such that $$$X_{i,j}=A_{i}+B_{i-j}$$$. It is clear that the row minimas of $$$X$$$ are the answers we want. It can be shown that if $$$B$$$ is convex, $$$X$$$ is totally monotone $$$^{[9]}$$$. I would just remark the proof is basically the same as the proof writen for the case on Li Chao Trees.

Here is a template for SMAWK written by tfg for $$$(\max,+)$$$ convolution with a single concave array.

code

// smawck for max
template<class F>
std::vector<int> smawck(F f, const std::vector<int> &rows, const std::vector<int> &cols) {
    std::vector<int> ans(rows.size(), -1);
    if((int) std::max(rows.size(), cols.size()) <= 2) {
        for(int i = 0; i < (int) rows.size(); i++) {
            for(auto j : cols) {
                if(ans[i] == -1 || f(rows[i], ans[i], j)) {
                    ans[i] = j;
                }
            }
        }
    } else if(rows.size() < cols.size()) {
        // reduce
        std::vector<int> st;
        for(int j : cols) {
            while(1) {
                if(st.empty()) {
                    st.push_back(j);
                    break;
                } else if(f(rows[(int) st.size() - 1], st.back(), j)) {
                    st.pop_back();
                } else if(st.size() < rows.size()) {
                    st.push_back(j);
                    break;
                } else {
                    break;
                }
            }
        }
        ans = smawck(f, rows, st);
    } else {
        std::vector<int> newRows;
        for(int i = 1; i < (int) rows.size(); i += 2) {
            newRows.push_back(rows[i]);
        }
        auto otherAns = smawck(f, newRows, cols);
        for(int i = 0; i < (int) newRows.size(); i++) {
            ans[2*i+1] = otherAns[i];
        }
        for(int i = 0, l = 0, r = 0; i < (int) rows.size(); i += 2) {
            if(i+1 == (int) rows.size()) r = (int) cols.size();
            while(r < (int) cols.size() && cols[r] <= ans[i+1]) r++;
            ans[i] = cols[l++];
            for(; l < r; l++) {
                if(f(rows[i], ans[i], cols[l])) {
                    ans[i] = cols[l];
                }
            }
            l--;
        }
    }
    return ans;
}

// max smawck
// F(i, j, k) checks if M[i][j] <= M[i][k]
// another interpretations is: 
// F(i, j, k) checks if M[i][k] is at least as good as M[i][j]
// higher == better
// when comparing 2 columns as vectors
// for j < k, column j can start better than column k
// as soon as column k is at least as good, it's always at least as good
template<class F>
std::vector<int> smawck(F f, int n, int m) {
    std::vector<int> rows(n), cols(m);
    for(int i = 0; i < n; i++) rows[i] = i;
    for(int i = 0; i < m; i++) cols[i] = i;
    return smawck(f, rows, cols);
}

template<class T>
std::vector<T> MaxConvolutionWithConvexShape(std::vector<T> anyShape, const std::vector<T> &convexShape) {
    if((int) convexShape.size() <= 1) return anyShape;
    if(anyShape.empty()) anyShape.push_back(0);
    int n = (int) anyShape.size(), m = (int) convexShape.size();
    auto function = [&](int i, int j) {
        return anyShape[j] + convexShape[i-j];
    };
    auto comparator = [&](int i, int j, int k) {
        if(i < k) return false;
        if(i - j >= m) return true;
        return function(i, j) <= function(i, k);
    };
    const std::vector<int> best = smawck(comparator, n + m - 1, n);
    std::vector<T> ans(n + m - 1);
    for(int i = 0; i < n + m - 1; i++) {
        ans[i] = function(i, best[i]);
    }
    return ans;
}

Knapsack Speedup 2

Now, we can talk about which special case of knapsack we can speed up.

Furthermore, the number of distinct weights is $$$D$$$. Then, we have a $$$O(DC)$$$ solution $$$^{[9]}$$$. Usually $$$D=O(N)$$$, but maybe there are some special conditions such as $$$w_i$$$ being small, such that we can bound $$$D$$$.

We will start with $$$D=1$$$. This has an obvious greedy solution of putting the largest value first. Let's suppose the values are $$$v_1,v_2,\ldots$$$ with $$$v_1 \geq v_2 \geq \ldots$$$ and they all have weight $$$w$$$. To make the sum of items become $$$kw$$$, the answer is $$$\sum\limits_{i=1}^k v_i$$$.

Therefore, it is easy to extend this to $$$O(DC)$$$ by performing $$$(\max,+)$$$ convolution with $$$B=[0,v_1,v_1+v_2,\ldots]$$$ on each residue class modulo $$$w_i$$$. We will perform $$$w_i$$$ convolutions and each convolution will take $$$O(\frac{C}{w_i})$$$ time since $$$B$$$ is concave and we are doing $$$(\max,+)$$$ convolutions. So each distinct weight we only need $$$O(C)$$$ time to process it.

Consider there to be $$$4$$$ items. $$$w=\{2,2,3,3,3\}$$$ and $$$v=\{5,1,7,10,2\}$$$. Initially $$$A=[0,-\infty,-\infty,\ldots]$$$.

We process $$$w_i=2$$$, $$$B=[0,5,6]$$$. Then $$$A=[0,-\infty,5,-\infty,6,-\infty,\ldots]$$$.

We process $$$w_i=3$$$, $$$B=[0,10,17,19]$$$. Then $$$A=[0,-\infty,5,10,6,15,17,16,22,19,\ldots]$$$.

Please note that we cannot assume that the sequence $$$[A_i,A_{i+w_i},A_{i+2w_i},\ldots]$$$ is convex which is shown by the above example.

Problems:

NAIPC16 thief

Knapsack Speedup 3

There are $$$N$$$ items. The $$$i$$$-th item has weight $$$w_i \leq D$$$. Find a multiset $$$S$$$ such that $$$\sum\limits_{i \in S} w_i \leq C$$$ and $$$\sum\limits_{i \in S} v_i$$$ is maximized.

We can solve this in $$$O(D^2 \log C)$$$ $$$^{[9]}$$$. Note that when we talk about convolution, we are refering to $$$(\max,+)$$$ convolution in $$$O(D^2)$$$ complexity. (I believe Algorithm 2 in the reference has a slight error and I have emailed the paper authors.)

Let us define $$$ans_i$$$ as the optimal answer with sum of weights being $$$i$$$. The main observation is that $$$ans_i=\max\limits_{j+k=i}(ans_j,ans_k)$$$, but we can actually impose a constraint that $$$|j-k| \leq D$$$ without loss of correctness. Suppose that $$$j-k >D$$$, then we can "move" an item from $$$ans_j$$$ to $$$ans_k$$$. This process can be repeated until $$$|j-k| \leq D$$$. Actually this can be generalized to $$$|j-k - \alpha| \leq D$$$ for some arbitrary $$$\alpha$$$ with the exact same constructive proof (of course we will assume reasonable values for $$$\alpha$$$).

Therefore, we can convolute $$$[ans_{i-\frac{D}{2}}, ans_{i-\frac{D}{2}+1}, \ldots, ans_{i+\frac{D}{2}}]$$$ with $$$[ans_{j-\frac{D}{2}}, ans_{j-\frac{D}{2}+1}, \ldots, ans_{j+\frac{D}{2}}]$$$ to get the value of $$$ans_{i+j}$$$.

From this, we have several extensions. We can convolute $$$[ans_{T-D}, ans_{T-D+1}, \ldots, ans_{T}]$$$ with $$$[ans_0,ans_1,\ldots,ans_{2D}]$$$ to get the values for $$$[ans_{T+1}, ans_{T+2}, \ldots, ans_{T+D}]$$$ (there are probably many off by one errors so just pad them with $$$\pm 5$$$ or something). We can also convolute $$$[ans_{T-D}, ans_{T-D+1}, \ldots, ans_{T+D}]$$$ with itself to get the values for $$$[ans_{2T-D}, ans_{2T-D+1}, \ldots, ans_{2T}]$$$.

We can get the array $$$[ans_0,ans_1,\ldots,ans_{2D}]$$$ in $$$O(D^2)$$$ by using the normal knapsack dp.

Since we have a method of "doubling", which allows us to obtain a complexity of $$$O(D^2 \log C)$$$ to compute the array $$$[ans_{C-D},ans_{C-D+1},\ldots,ans_{C}]$$$, which is sufficient to find our answer.

Problems:

USP16 knapsack

References

Full text and comments »

dp, knapsack, subset sum

+345

errorgorn
2 years ago
13

Linear Basis (Xor Basis Extended)

By errorgorn, 2 years ago, In English

As part of the graduation requirements for my school, I have to complete a simple research project, so I decided to do something related to data structure and algorithms. I believe I have come out with a data structure that maintains the basis of vectors in $$$(\mathbb{Z}/m\mathbb{Z})^d$$$, where $$$m$$$ may not be prime. Since this was related to competitive programming, I think it is a good idea to share it here. Hopefully, this algorithm is actually novel :P

I would like to thank:

icypiggy for being my research mentor and tolerating my dumb questions
rama_pang and adamant for their helpful suggestions and comments

Please comment under the blog or message me on codeforces if any parts are unclear or wrong. Also, I hope that some LGMs can help solve the open problems in this blog.

Introduction

Maintaining the basis of vectors in $$$(\mathbb{Z}/2 \mathbb{Z})^d$$$, also known as the xor basis algorithm is a well-studied algorithm in competitive programming. The standard xor basis algorithm supports $$$2$$$ operations. Given a set $$$S$$$ that is initially empty, you can add a vector to $$$S$$$ or query whether a vector is representable as a linear combination of vectors in $$$S$$$. Using the word RAM model, the basis can be maintained in $$$O(nd)$$$ as the addition of vectors in $$$(\mathbb{Z}/2 \mathbb{Z})^d$$$ can be done in $$$O(1)$$$ time, where $$$n$$$ is the number of vectors we add. This algorithm can also be easily modified to handle $$$2$$$ additional queries: the lexicographically largest representable vector and the number of representable vectors. This data structure is used in various problems such as AGC45A, UER3C, 1299D - Around the World, 1336E1 - Chiori and Doll Picking (easy version),1427E - Xum and 1466F - Euclid's nightmare.

An extended version of the xor basis algorithm allows for querying the basis set of vectors for a given subarray of an array of vectors. More formally, given an array $$$A$$$ where $$$A_i \in (\mathbb{Z}/2 \mathbb{Z})^d$$$, we can check if a vector is representable as a linear combination of vectors in $$$A_l,A_{l+1},\ldots,A_r$$$. It has appeared in 1100F - Ivan and Burgers and ABC223H which can be solved in $$$O(nd)$$$ offline or $$$O(nd \log n)$$$ online.

Another known extension of the xor basis algorithm allows one to maintain the basis of vectors in $$$(\mathbb{Z}/p\mathbb{Z})^d$$$ where $$$p$$$ is prime. It is used in XXII Opencup GP of Siberia F with $$$p=7$$$.

In this blog, we will propose an algorithm that can maintain the basis for a set of vectors in $$$(\mathbb{Z}/m\mathbb{Z})^d$$$.

GIANT DISCLAIMER: In this blog, the words "vector'' and "basis'' are used very loosely. The word vector is used to refer to an element of a module as $$$(\mathbb{Z}/m\mathbb{Z})^d$$$ is not necessarily a vector space. Furthermore, the notion of linear independence is not very well-defined when we are considering a ring. Consider the ring $$$\mathbb{Z}/4\mathbb{Z}$$$, then the set $$$\{2\}$$$ is not linearly independent because $$$\lambda=2$$$ satisfies $$$\lambda \cdot 2 = 0$$$ but $$$\lambda \neq 0$$$. But the term basis is used because after some restrictions to the set of scalars that are applied to the vectors we store, each set of scalars correspond to a unique "vector'', a fact that will be proven in theorem 5.

More specifically, given an initially empty set $$$S$$$ of vectors in $$$(\mathbb{Z}/m\mathbb{Z})^d$$$, we can add $$$n$$$ vectors with $$$O(n \cdot d^2 + \Omega(m) \cdot d^3 + d \cdot \log(m))$$$ pre-processing, where $$$\Omega(m)$$$ is the prime omega function that counts the total prime factors of $$$m$$$ and check if some vector in $$$(\mathbb{Z}/m\mathbb{Z})^d$$$ is can be represented by some linear combination of vectors in $$$S$$$ in $$$O(d^2)$$$ time online. Additionally, we can also query other things such as the number of distinct vectors we that can be represented as the sum of vectors in $$$S$$$ and the lexicographical largest vector that can be represented.

Later, we will extend this idea to handle elements of an arbitrary finite abelian group $$$G$$$.

The Algorithm

The main idea of constructing the linear basis is an extension of the xor basis algorithm where we maintain a set of $$$d$$$ vectors whose span is equal to the span of the basis. Similar to the xor basis algorithm we impose an additional constraint over our vectors such that the $$$i$$$-th vector we store will have the first $$$i-1$$$ dimensions to be $$$0$$$.

Although this problem may seem like a trivial extension to the xor basis algorithm, it is not. Consider the ring $$$(\mathbb{Z}/6\mathbb{Z})^2$$$ and the vector $$$(3~1)$$$. In the xor basis algorithm, checking if a vector is representable is done greedily by doing Gaussian elimination and greedily scalars to multiply to the vectors. So we would expect our basis to store $$$\begin{pmatrix} 3 & 1 \\ 0 & 0 \end{pmatrix}$$$. However, if we were to check if the vector $$$(0~2)$$$ is in the set, we would fail to produce it.

The way that the linear basis algorithm handles this is to allow a single vector to be propagated down to multiple rows. In the example mentioned above, we would realize that $$$(3~1) \cdot 2 = (0~2)$$$. So we would store $$$\begin{pmatrix} 3 & 1 \\ 0 & 2 \end{pmatrix}$$$ as our matrix. It should be noted that the row space of the matrix we store may not be linearly independent.

Another notable case to consider is on $$$\mathbb{Z}/6\mathbb{Z}$$$. With vectors $$$(2)$$$ and $$$(3)$$$, it actually spans the entire ring since $$$\gcd(2,3)=1$$$.

It should be noted that in the case where $$$m$$$ is prime, the algorithm behaves exactly like the generalized version of the xor basis algorithm that works on prime numbers.

Define $$$V,W$$$ as $$$1 \times d$$$ vectors of elements in $$$\mathbb{Z}/m\mathbb{Z}$$$ and $$$A$$$ as a $$$d \times d$$$ matrix of elements in $$$\mathbb{Z}/m\mathbb{Z}$$$ where initially all elements are set to $$$0$$$.

With such a construction of the basis, we can easily check if a vector is representable using a simple greedy algorithm.

Note that in the above $$$2$$$ algorithms, we are doing all operations modulo $$$m$$$ with the exception of $$$\frac{a}{b}$$$ and $$$a \% b$$$. In the above $$$2$$$ pseudo-codes and our proofs below, $$$\frac{a}{b}$$$ is used to denote integer division, it is always guaranteed that $$$b|a$$$ when $$$\frac{a}{b}$$$ is written. $$$a \% b$$$ is also used to denote the remainder after integer division of $$$a$$$ by $$$b$$$. We will also use the notation $$$a<b$$$ for elements in $$$\mathbb{Z}/m\mathbb{Z}$$$, this should be understood as comparing the integer values of the elements.

Let us first list out some properties of $$$A$$$ and $$$V$$$.

Lemma 1: Consider the vector $$$V$$$ after the $$$i$$$-th iteration of the loop at line 5 of Algorithm 1, then for all $$$1 \leq j \leq i$$$, $$$V_j=0$$$.

proof

Lemma 2: $$$A_{i,j}=0$$$ if $$$i>j$$$

Lemma 3: If $$$A_{i,i}=0$$$, then $$$A_{i,j}=0$$$.

Lemma 4: If $$$A_{i,i} \neq 0$$$, then $$$\gcd(A_{i,i},m)=A_{i,i}$$$.

Let $$$S$$$ be a set of vectors in $$$(\mathbb{Z}/m\mathbb{Z})^d$$$. Then $$$span(S)$$$ is the set of vectors that can be formed by a linear combination of vectors in $$$S$$$. More formally, let $$$S={s_1,s_2, \ldots, s_w}$$$. Then $$$V \in span(S)$$$ iff there exists $$$\lambda_1, \lambda_2, \ldots,\lambda_w \in (\mathbb{Z}/m\mathbb{Z})$$$ such that $$$V=\sum\limits_{k=1}^w \lambda_i s_i$$$.

Theorem 1: Suppose there exists $$$\mu_1, \mu_2, \ldots,\mu_d \in (\mathbb{Z}/m\mathbb{Z})$$$ such that $$$V=\sum\limits_{k=1}^w \mu_ks_k$$$ then there exists $$$\lambda_1, \lambda_2, \ldots, \lambda_d \in (\mathbb{Z}/m\mathbb{Z})$$$ such that $$$V=\sum\limits_{k=1}^d \lambda_k A_k$$$ i.e. the rows of $$$A$$$ spans $$$S$$$.

proof

Consider how $$$span(S)$$$ changes after a call to \AV{$$$V$$$}. $$$x \in span(S \cup V)$$$ implies that there exists $$$y \in span(S)$$$ and $$$\lambda \in (\mathbb{Z}/k\mathbb{Z})$$$ such that $$$x=y+\mu V$$$. Let $$$x=y+\mu V=\lambda_1 A_1 + \lambda_2 A_2 + \ldots + \lambda_d A_d + \lambda V$$$.

Case 1: lines 7 to 11

Let $$$g=\gcd(V_i,m)$$$, $$$(s,t)=\texttt{Extended-Euclidean}(V_i,m)$$$ and $$$m'=\frac{m}{g}$$$.

Let $$$A_i' = V \times s$$$ and $$$V' = V \times m'$$$.

We want to find $$$\lambda_i'$$$ and $$$\mu'$$$ such that $$$\lambda_i' A_i' + \mu'V'=\lambda_iA_i+\mu V$$$. Since $$$A_{i,i}=0$$$, we can see that $$$\lambda_iA_i=0$$$ and we can rewrite the above equation as $$$\lambda_i' \cdot s+\mu' \cdot m'=\mu$$$.

Since $$$s \cdot V_i + t \cdot m = g$$$, we have $$$\gcd(s \cdot V_i,m)=\gcd(s \cdot V_i + t \cdot m,m)=\gcd(g,m)=g$$$, which implies that $$$\gcd(s \cdot \frac{V_i}{g},\frac{m}{g})=1$$$.

Therefore, $$$\gcd(s,m')=\gcd(s,\frac{m}{g})=1$$$, by Bezout's lemma, we can find that there is a solution for $$$(\Lambda'_i,\lambda')$$$ for any value of $$$\mu$$$.

Case 2: lines 12 to 14

Let $$$V'=V-A_i \times \frac{V_i}{A_{i,i}}$$$.

We want to find $$$\lambda_i'$$$ and $$$\mu'$$$ such that $$$\lambda_i' A_i + \mu'V'=\lambda_iA_i+\mu V$$$.

$$$\lambda_i' A_i + \mu' V'=(\lambda_i'+\mu' \frac{V_i}{A_{i,i}})A_i+\mu' V=\lambda_iA_i+\mu V$$$.

$$$(\lambda', \Lambda_i')=(\lambda,\Lambda_i-\lambda\frac{V_i}{A_{i,i}})$$$ satisfies the equation.

Case 3: lines 15 to 23

Let $$$g=\gcd(V_i,A_{i,i})$$$, $$$(s,t)=\texttt{Extended-Euclidean}(V_i,A_{i,i})$$$, $$$A_{i}'=V \times s+A_{i} \times t$$$, $$$V'=V-A_{i}' \frac{V_i}{A_{i,i}'}$$$, $$$W=A_i-A_{i}' \frac{A_{i,i}}{A_{i,i}'}$$$. Note that $$$A_{i,i}'=g$$$.

$$$A_i'=\begin{pmatrix} s & t \end{pmatrix} \cdot \begin{pmatrix} V \\ A_i \end{pmatrix}$$$

$$$V'=\begin{pmatrix} 1-s\frac{V_i}{g} & -t\frac{V_i}{g} \end{pmatrix} \cdot \begin{pmatrix} V \\ A_i \end{pmatrix}$$$

$$$W=\begin{pmatrix} -s\frac{A_{i,i}}{g} & 1-t\frac{A_{i,i}}{g} \end{pmatrix} \cdot \begin{pmatrix} V \\ A_i \end{pmatrix}$$$

We want to find $$$\Lambda_i' A_i'+\lambda'V'+\lambda_w' W=\Lambda_i A_i+\lambda V$$$.

Let $$$M=\begin{pmatrix} s & t \\ 1-s\frac{V_i}{g} & -t\frac{V_i}{g} \\ -s\frac{A_{i,i}}{g} & 1-t\frac{A_{i,i}}{g} \end{pmatrix}$$$

$$$\begin{pmatrix} \Lambda_i' & \lambda' & \lambda_w' \end{pmatrix} M = \begin{pmatrix} \lambda & \Lambda_i \end{pmatrix}$$$

Observe that $$$\begin{pmatrix} \frac{V_i}{g} & 1 & 0 \end{pmatrix} M = \begin{pmatrix} 1 & 0 \end{pmatrix}$$$ and $$$\begin{pmatrix} \frac{A_{i,i}}{g} & 0 & 1 \end{pmatrix} M = \begin{pmatrix} 0 & 1 \end{pmatrix}$$$.

$$$(\Lambda_i', \lambda',\lambda_w')=(\frac{V_i}{g}\lambda+\frac{A_{i,i}}{g}\Lambda_i,\lambda,\Lambda_i)$$$ satisfies the equation.

Corollary 1: Suppose for some vector $$$W$$$, $$$V=W$$$ at the $$$i$$$-th iteration of the for loop. Then there exists $$$\lambda_i,\lambda_{i+1}, \ldots,\lambda_d \in (\mathbb{Z}/m\mathbb{Z})$$$ such that $$$W=\sum\limits_{k=i}^d \lambda_k A_k$$$ after $$$\texttt{Add-Vector}$$$ has returned.

Theorem 2: After each call to $$$\texttt{Add-Vector}$$$, the following holds: for all $$$i$$$ such that $$$A_{i,i} \neq 0$$$, there exists $$$\mu_{i+1}',\mu_{i+2}',\ldots,\mu_d' \in (\mathbb{Z}/m\mathbb{Z})$$$ such that $$$\frac{m}{A_{i,i}} A_i = \sum\limits_{k=i+1}^d \mu_k' A_k$$$.

proof

Theorem 3: Suppose for all $$$V \in span(S)$$$ there exists $$$V=\sum\limits_{k=1}^d \mu_kd_k$$$ where $$$\mu_1, \mu_2, \ldots,\mu_d \in (\mathbb{Z}/m\mathbb{Z})$$$ then for all $$$V' \in span(S)$$$ there exists $$$\lambda_1, \lambda_2, \ldots, \lambda_d \in (\mathbb{Z}/m\mathbb{Z})$$$ that satisfies $$$V=\sum\limits_{k=1}^d \lambda_k A_k$$$ and $$$\lambda_i=0 $$$ if $$$A_{i,i}=0$$$ and $$$0 \leq \lambda_i < \frac{m}{A_{i,i}}$$$ otherwise.

proof

Theorem 4: $$$\texttt{Inside-Basis}(V)$$$ is returns true iff $$$V \in S$$$.

proof

It is also straightforward to adapt the algorithm $$$\texttt{Inside-Basis}$$$ to find the lexicographically largest vector. We will now show that the matrix $$$A$$$ also maintains the number of representable vectors.

Define a function $$$c(z)=\begin{cases} \frac{m}{z}, & \text{if } z \neq 0 \\ 1, & \text{otherwise} \end{cases}$$$

Theorem 5: The number of representable vectors is $$$\prod\limits_{k=1}^d c(A_{k,k})$$$.

proof

Complexity

Suppose there are $$$n$$$ vectors being added into the basis. We will consider the number of times $$$\texttt{Add-Vector}$$$ is called. In 15 to 23 $$$\texttt{Add-Vector}$$$ is called twice, let us show that the number of times lines 15 to 23 is executed by bounded by $$$O(\Omega(m))$$$.

Let $$$A_i$$$ and $$$A_i'$$$ be the values of $$$A_i$$$ before and after the execution of lines 15 to 23. $$$A_{i,i}'=\gcd(V_i,A_{i,i})$$$ so it is easy to see that $$$A_{i,i}'|A_{i,i}$$$. Yet, $$$A_{i,i}' \neq A_{i,i}$$$ since that will only happen if $$$V_i = kA_{i,i}$$$ for some $$$k$$$ and lines 12 to 14 will be executed instead. Therefore, we can conclude that $$$A_{i,i}'|A_{i,i}$$$ and $$$A_{i,i}' \neq A_{i,i}$$$ implying that $$$\Omega(A_{i,i}')<\Omega(A_{i,i})$$$. Since $$$\Omega(A_{i,i})$$$ can only decrease at most $$$O(\Omega(m))$$$ times, the number of times that lines 15 to 23 is executed by bounded by $$$O(\Omega(m))$$$.

Therefore, there will only be $$$O(n+\Omega(m) \cdot d)$$$ calls to $$$\texttt{Add-Vector}$$$. It is easy to see that the time complexity excluding the calls to $$$\texttt{Extended-Euclidean}$$$ is $$$O(n \cdot d^2+\Omega(m) \cdot d^3)$$$.

It is easy to see that the total calls to $$$\texttt{Extended-Euclidean}$$$ will take $$$O(d \log (m))$$$, so the total complexity is $$$O(n \cdot d^2 + \Omega(m) \cdot d^3 + d \cdot \log(m))$$$.

The complexity of $$$\texttt{Inside-Basis}$$$ is trivially $$$O(d^2)$$$.

Extension to Handling Elements of Arbitrary Finite Abelian Group

First, let us consider maintaining the basis for vectors in $$$G=\mathbb{Z}/m_1\mathbb{Z} \times \mathbb{Z}/m_2\mathbb{Z} \times \ldots \times \mathbb{Z}/m_d\mathbb{Z}$$$.

Let $$$L=lcm(m_1,m_2,\ldots,m_d)$$$. It is clear that the above group is a subset of $$$(\mathbb{Z}/L\mathbb{Z})^d$$$, so we can maintain the basis of $$$G$$$ using this algorithm.

Upon analysis of the time complexity, it is easy to see that the time complexity is actually $$$O((n + \sum\limits_{k=1}^d \Omega(m_k)) \cdot d^2 + \sum\limits_{k=1}^d \log(m_k))$$$ assuming addition and multiplication under $$$\mathbb{Z}/L\mathbb{Z}$$$ can be done in $$$O(1)$$$.

By the fundamental theorem of finite abelian groups, every finite abelian group $$$G$$$ is an internal group direct product of cyclic groups whose orders are prime powers. So, $$$G=\mathbb{Z}/p_1^{k_1}\mathbb{Z} \times \mathbb{Z}/p_2^{k_2}\mathbb{Z} \times \ldots \times \mathbb{Z}/p_d^{k_d}\mathbb{Z}$$$ which can be handled easily.

Generalization to Euclidean Domains

Big thanks to adamant for discussing this with me.

It turns out that we can generalize this algorithm to work for any euclidean domain given that we are able to:

Do division and the extended euclidean algorithm quickly
Determine the size of the ideal $$$(r)$$$ generated by some element $$$r \in R$$$

We obviously have to do division quickly or else we will have a horrible time with things like $$$\mathbb{Z}[X]$$$.

As adamant suspected, this algorithm works also on $$$\mathbb{Z}[i]/p\mathbb{Z}[i]$$$ since similar to the case of integers, the extended euclidean algorithm works in $$$O(\log \max(N(a),N(b))$$$, which does not hold for the general euclidean domain, where $$$N(a+bi)=a^2+b^2$$$.

Now, we just have to show that we can compute the size of the ideal generated by $$$a+bi$$$ quickly. In the normal linear basis algorithm, it is intuitive to see that the size of the ideal generated by $$$a$$$ is $$$\frac{m}{\gcd(a,m)}$$$. It seems that this intuition actually carries over to $$$\mathbb{Z}[i]$$$ i.e. the size of the ideal is $$$\frac{N(p)}{N(\gcd(p,a+bi))}$$$.

I don't really have a good proof of this but I think I have some intuition on why this should work. A proof sketch would probably be looking at these things as groups and considering the size of the coset.

The size of the ideal generated by $$$a+bi$$$ is intuitively the same as the ideal generated by $$$\gcd(p,a+bi)$$$. Since $$$\gcd(p,a+bi) \cdot k_1 = a+bi$$$ for some $$$k_1$$$ and $$$(a+bi) \cdot k_2 = \gcd(p,a+bi)$$$ since the extended euclidean had found some $$$(s,t)$$$ such that $$$s \cdot (a+bi)+ p \cdot t=\gcd(p,a+bi)$$$.

Now, I will just ask you to appeal to geometric intuition for this part. Suppose we tile the plane with $$$pZ[i]$$$. Let's use $$$p=5+5i$$$ here.

Now, I claim that if $$$\gcd(g,p)=g$$$, then each "tile" will look the same.

Here are some examples:

$$$g=1+2i$$$

$$$g=3+i$$$

Some examples of $$$g$$$ that are not divisors of $$$p$$$ are $$$g=2+3i$$$.

Here is what it looks like.

Anyways, we can see that each blue tile takes up $$$N(g)$$$ squares. And since the blue tiles are a repeated pattern in the red tiles, we can only look at a single tile. Therefore, we can see that the size of the ideal is $$$\frac{N(p)}{N(g)}$$$, which was what we wanted.

It is probably better if you just tried these things yourself so here is the desmos graphs that I used to get the intuitions.

Open Problems

(SOLVED) In line 22 of Algorithm 1, $$$\texttt{Add-Vector}(\frac{m}{A_{i,i}}A_i)$$$ is called. This fact is used only in case 3 of theorem 2. Is the algorithm correct without this line? We have not found a set of vectors that causes the resulting matrix $$$A$$$ to be different when that line is removed.
Is there a fast (offline) algorithm for linear basis that allows for range query similar to 1100F - Ivan and Burgers and ABC223H when the vectors are in $$$(\mathbb{Z}/m\mathbb{Z})^d$$$?
What is the most general ring on which this algorithm works?

Implementation

Of course, this is a competitive programming site, so I will also provide a sample implementation in C++. I have also created an example problem in a mashup if you would like to submit (although testcases may not be super strong, if you want to provide stronger testcases please message me and I don't mind adding you on polygon).

Code

//雪花飄飄北風嘯嘯
//天地一片蒼茫

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/rope>
using namespace std;
using namespace __gnu_pbds;
using namespace __gnu_cxx;
#define ll long long
#define ii pair<ll,ll>
#define iii pair<ii,ll>
#define fi first
#define se second
#define endl '\n'
#define debug(x) cout << #x << ": " << x << endl

#define pub push_back
#define pob pop_back
#define puf push_front
#define pof pop_front
#define lb lower_bound
#define ub upper_bound

#define rep(x,start,end) for(auto x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

#define indexed_set tree<ll,null_type,less<ll>,rb_tree_tag,tree_order_statistics_node_update>
//change less to less_equal for non distinct pbds, but erase will bug

mt19937 rng(chrono::system_clock::now().time_since_epoch().count());

const int MOD=998244353;

ll q,d,m;
vector<ll> arr[105];

ll fix(ll i){
	i%=m;
	if (i<0) i+=m;
	return i;
}

pair<ll,ll> euclidean(ll a,ll b){ 
	//returns {s,t}
	
	if (b==0) return {1,0};
	else{
		ll x,y;
		tie(x,y)=euclidean(b,a%b);
		return {y,fix(x-(a/b)*y)};
	}
}

void add_vector(vector<ll> v){
	rep(x,0,d) if (v[x]){
		ll s,t;
		if (arr[x][x]==0){
			tie(s,t)=euclidean(v[x],m);
			rep(y,0,d) arr[x][y]=v[y]*s%m;
			ll temp=m/arr[x][x];
			rep(y,0,d) v[y]=v[y]*temp%m;
		}
		else if (v[x]%arr[x][x]==0){
			ll mul=m-(v[x]/arr[x][x]);
			rep(y,0,d) v[y]=(v[y]+arr[x][y]*mul)%m;
		}
		else{
			vector<ll> w=arr[x];
			tie(s,t)=euclidean(v[x],w[x]);
			rep(y,0,d) arr[x][y]=(v[y]*s+w[y]*t)%m;
			ll mul1=m-(v[x]/arr[x][x]),mul2=m-(w[x]/arr[x][x]);
			rep(y,0,d) v[y]=(v[y]+arr[x][y]*mul1)%m;
			rep(y,0,d) w[y]=(w[y]+arr[x][y]*mul2)%m;
			add_vector(w);
			
			vector<ll> v2=arr[x];
			ll mul=m/arr[x][x];
			rep(y,0,d) v2[y]=v2[y]*mul%m;
			add_vector(v2);
		}
	}
}

bool inside(vector<ll> v){
	rep(x,0,d){
		if (arr[x][x]!=0 && v[x]%arr[x][x]==0){
			ll mul=m-(v[x]/arr[x][x]);
			rep(y,0,d) v[y]=(v[y]+arr[x][y]*mul)%m;
		}
		if (v[x]) return false;
	}
	return true;
}

vector<ll> largest(){
	vector<ll> res(d,0);
	rep(x,0,d) if (arr[x][x]){
		ll mul=(m-1-res[x])/arr[x][x];
		rep(y,0,d) res[y]=(res[y]+arr[x][y]*mul)%m;
	}
	return res;
}

ll count(){
	ll res=1;
	rep(x,0,d) if (arr[x][x]){
		res=(res*(m/arr[x][x]))%MOD;
	}
	return res;
}

int main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	cin>>q>>d>>m;
	
	rep(x,0,d) arr[x]=vector<ll>(d,0);
	
	int a;
	vector<ll> v(d);
	while (q--){
		cin>>a;
		
		if (a<=2) rep(x,0,d) cin>>v[x];
		
		if (a==1){
			add_vector(v);
		}
		else if (a==2){
			cout<<inside(v)<<endl;
		}
		else if (a==3){
			v=largest();
			rep(x,0,d) cout<<v[x]<<" "; cout<<endl;
		}
		else{
			cout<<count()<<endl;
		}
	}
}

Full text and comments »

linear algebra, number theory, xor basis

+381

errorgorn
2 years ago
16

Invitation to TOKI Regular Open Contest #24

By errorgorn, 2 years ago, In English

Hello again Codeforces 👋,

We are excited to invite you to TOKI Regular Open Contest #24!

Key details:

Contest link: TLX
Time: 18th December 2021, 13:05 UTC
Writers: errorgorn, maomao90, oolimry
Duration: 150 minutes
Problems: 7, but one of the problems will have a subtask
Scoring distribution: 100 — 200 — 400 — 600 — (600 + 200) — 900 — 1000

Many thanks to:

prabowo 👨‍❤️‍💋‍👨 for 🕒 coordinating the 👦🌷 contest and translating our 💰🅱 stupid 😕💩 jokes into ➡ Indonesian 🇮🇩.
icypiggy for helping 💻 us 🅱👍 to prepare testdata.
Our 💩🏽 army of 😫👩 testers dantoh, tqbfjotld, -rs-, jamessngg, iLoveIOI, pavement for testing the problems and ~~making us delete a problem 😡🤬~~ giving useful feedback 🥰😇.
hocky for drawing beautiful diagrams 👌😍 for our problems.
fushar for the wonderful 🌈 TLX platform and fixing TLX after 🔜💯 we 😀👦 broke it multiple times.

Please 😩 register for 👍 the 🚪🅱 contest, and we hope you will 😘 enjoy 😋💋 the contest! Good 👌 luck 😄 have 😏😝 fun!

P.S. Please note ✏️📔 the unusual 🤪 duration 🕐⏱️🕝 I recommend 👍 not reading 🙈📖 all the problems 💀💀 to lose rating❗❗

UPD: contest is over! (no more emoji spam sadly)

Congratulations to our top 5:

Congratulations to our first solvers:

A: Itst
B: tourist
C: Maksim1744
D: Itst
E1: tourist
E2: tourist
F: Nyaan
G: no one :<

Rating has been updated.

You can upsolve the problems here.

Editorial is available in the upsolve link.

Thank you for participating and see you in the next contest!

Full text and comments »

tlx, troc, toki, indonesia

+227

errorgorn
2 years ago
27

Pragma Insanity: Weird Behaviour with Pragma O3

By errorgorn, 2 years ago, In English

Here is a simple problem. Count the number of array $$$A$$$ of non-negative integers size $$$N$$$ such that $$$\sum\limits_{i=1}^{N} A_i \cdot i=K$$$. Where $$$N,K \leq 5000$$$. Find the answer modulo $$$1000000007$$$.

I unfortunately cannot share the link to a submissions page as it is on a private local judge.

Update: I have asked judge devs of the local judge, the compile command is g++ -O2 -o {compiledName} {sourceName} -m64 -static -std=gnu++14 -lm -s -w -Wall -Wshadow -fmax-errors=512

But anyways, here is the problem.

AC code

#pragma GCC optimize("O3,unroll-loops")
#pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")

#include <bits/stdc++.h>
using namespace std;
#define rep(x,start,end) for(auto x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))

const int MOD=1000000007;

int n,k;
int memo[5005];

int main(){
	cin>>n>>k;
	
	memo[k]=1;
	
	rep(x,n+1,1) if (k>=x) rep(y,k+1,x){
		assert(y-x>=0);
		memo[y-x]=(memo[y-x]+memo[y])%MOD;
	}
	
	cout<<memo[0]<<endl;
}

WA code

#pragma GCC optimize("O3,unroll-loops")
#pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")

#include <bits/stdc++.h>
using namespace std;
#define rep(x,start,end) for(auto x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))

const int MOD=1000000007;

int n,k;
int memo[5005];

int main(){
	cin>>n>>k;
	
	memo[k]=1;
	
	rep(x,n+1,1) if (k>=x) rep(y,k+1,x){
		//assert(y-x>=0);
		memo[y-x]=(memo[y-x]+memo[y])%MOD;
	}
	
	cout<<memo[0]<<endl;
}

For the testcase $$$N=K=2500$$$ (which is the partition number for $$$2500$$$) the answer should be $$$729287017$$$, but the WA code gives $$$735199643$$$ (testing on atcoder custom invocation with C++ (GCC 9.2.1) ). This makes zero sense because the only thing different between these two codes is just the commented assert.

After further investigating, I managed to figure out that the problem was something to do with #pragma GCC optimize("O3").

Below is another set of AC and WA codes where the only difference is #pragma GCC optimize("O3").

AC code

#include <bits/stdc++.h>
using namespace std;

//#pragma GCC optimize("O3")

const int MOD=1000000007;

int n,k;
int memo[5005];

int main(){
	cin>>n>>k;
	
	memo[k]=1;
	
	for (int x=n;x>=1;x--) for (int y=k;y>=x;y--){
		memo[y-x]=(memo[y-x]+memo[y])%MOD;
	}
	
	cout<<memo[0]<<endl;
}

WA code

#include <bits/stdc++.h>
using namespace std;

#pragma GCC optimize("O3")

const int MOD=1000000007;

int n,k;
int memo[5005];

int main(){
	cin>>n>>k;
	
	memo[k]=1;
	
	for (int x=n;x>=1;x--) for (int y=k;y>=x;y--){
		memo[y-x]=(memo[y-x]+memo[y])%MOD;
	}
	
	cout<<memo[0]<<endl;
}

I was super confused when I found out about this so I told kostia244. He told me that perhaps it is overflow. And indeed, after changing int memo[5005] to long long memo[5005], it is AC.

AC code with long long

#include <bits/stdc++.h>
using namespace std;

#pragma GCC optimize("O3")

const int MOD=1000000007;

int n,k;
long long memo[5005];

int main(){
	cin>>n>>k;
	
	memo[k]=1;
	
	for (int x=n;x>=1;x--) for (int y=k;y>=x;y--){
		memo[y-x]=(memo[y-x]+memo[y])%MOD;
	}
	
	cout<<memo[0]<<endl;
}

So now we are both scratching our heads at this weird behaviour. Does anyone have an idea why this happens? Should we include macros in our codes by default now?

Sorry for tagging but ToxicPie9 nor

Full text and comments »

pragma

+122

errorgorn
2 years ago
4

Generalized FWHT: How I realized I did not understand FWHT

By errorgorn, 3 years ago, In English

I would like to preface this blog by saying this is not a tutorial on FWHT but some (maybe obvious) observations on FWHT that I wanted to share.

Thanks to rama_pang and oolimry for proof reading. Love you guys <3

Of course I (and the proof-readers) are not perfect so if you find anything unclear just tell me in the comments.

We will use the symbols $$$\&,|,\oplus$$$ to represent bitwise and, bitwise or, bitwise xor respectively.

FWHT

Given $$$2$$$ arrays $$$A$$$ and $$$B$$$. We will want to find array $$$C$$$ such that $$$C_i=\sum\limits_{j \star k=i} A_j \cdot B_k$$$. For our normal FFT, the operation $$$\star$$$ is just addition.

When finding fast for FFT, I found library code that does FWHT $$$^{[1]}$$$, that is it does the above operation where $$$\star$$$ can also be any of the $$$3$$$ bitwise operations.

We focus on the case for $$$\oplus$$$. We focus even more specifically on the very simple case of $$$A$$$ and $$$B$$$ being size $$$2$$$. We want to convolute $$$\{p_0,p_1\}$$$ and $$$\{q_0,q_1\}$$$ quickly. The way we do this is find some magic matrix $$$T$$$ such that we can just perform the dot product to "convolute" these transformed vectors then apply $$$T^{-1}$$$ to get the convoluted vector. We want to convolute these arrays into $$$\{p_0q_0 + p_1q_1 , p_0q_1 + p_1q_0\}$$$. So we want to find some invertible matrix $$$T$$$ that satisfies $$$T \Big( \matrix{p_0 \\ p_1} \Big) \cdot T \Big( \matrix{q_0 \\ q_1} \Big)=T \Big( \matrix{p_0q_0+p_1q_1 \\ p_0q_1+p_1q_0} \Big)$$$.

Let $$$T=\Big( \matrix{a & b \\ c& d} \Big)$$$. Expanding these, we obtain $$$\Big( \matrix{ a^2p_0q_0+abp_0q_1+abp_1q_0+b^2p_1q_1 \\ c^2p_0q_0+cdp_0q_1+cdp_1q_0+d^2p_1q_1} \Big)=\Big( \matrix{ ap_0q_0+bp_0q_1+bp_1q_0+ap_1q_1 \\ cp_0q_0+dp_0q_1+dp_1q_0+cp_1q_1} \Big)$$$.

We obtain $$$a^2=a,ab=b,b^2=a$$$ and similar set of equations for $$$c,d$$$. We find that $$$(a,b)={(0,0),(1,1),(1,-1)}$$$. But the only way to make a invertible matrix is $$$T=\Big( \matrix{1 & 1 \\ 1& -1} \Big)$$$. We also find that $$$T^{-1}=\frac 12 \Big( \matrix{1 & 1 \\ 1& -1} \Big)$$$.

We can do a similar thing for $$$\&$$$ and $$$|$$$ and find that $$$T=\Big( \matrix{0 & 1 \\ 1 & 1} \Big)$$$ and $$$T=\Big( \matrix{1 & 0 \\ 1 & 1} \Big)$$$ works.

Then we just apply the matrix $$$T \otimes T \otimes \ldots \otimes T$$$ to our vector to get it's point-coordinate form $$$^{[4]}$$$, where $$$\otimes$$$ is the Kronecker product $$$^{[5],[6]}$$$. Since we are applying the same operation $$$T$$$ to each "layer" of the vector, it makes sense why the Kronecker product is used.

I initially just whacked the math and didn't really pay much attention to it. So here is my attempt at going deeper I suppose.

Kronecker Product

It turns out that $$$(A \otimes B)(C \otimes D)=AC \otimes BD$$$ $$$^{[5]}$$$. This gives us a fast way to compute $$$T \otimes T \otimes \ldots \otimes T$$$.

We assume that $$$A,B,C$$$ are $$$n \times n$$$ square matrices and $$$I_n$$$ is the $$$n \times n$$$ identity matrix.

Claim: $$$A \otimes B \otimes C = (A \otimes I_n \otimes I_n)(I_n \otimes B \otimes I_n)(I_n \otimes I_n \otimes C)=(I_n \otimes I_n \otimes C)(I_n \otimes B \otimes I_n)(A \otimes I_n \otimes I_n)$$$.

Proof: Trivial

Corollary: we can switch the order we multiply the matrices $$$(A \otimes I_n \otimes I_n),(I_n \otimes B \otimes I_n),(I_n \otimes I_n \otimes C)$$$, (i.e. they are all pairwise commutative).

Corollary: $$$((A \otimes I_n \otimes I_n)(I_n \otimes B \otimes I_n)(I_n \otimes I_n \otimes C))^{-1}=(A \otimes I_n \otimes I_n)^{-1}(I_n \otimes B \otimes I_n)^{-1}(I_n \otimes I_n \otimes C)^{-1}$$$.

Notice that these things works for arbitrary finite length $$$T \otimes T \otimes \ldots \otimes T$$$ but I will not write it down or it will become notation hell. Also, this works if $$$A,B,C$$$ are not of the same size (but are still square matrices).

This gives us a fast way to multiply a matrix by $$$\underbrace{T \otimes T \otimes \ldots \otimes T}_{k \text{ times}}$$$ as each element of $$$(I_n \otimes \ldots \otimes I_n \otimes T \otimes I_n \otimes \ldots \otimes I_n)$$$ has at most $$$n^{k+1}$$$ non-zero element. So multiplying a vector by that matrix only takes $$$O(n^{k+1})$$$ time and we are multiplying by $$$O(k)$$$ matrices. This is exactly the speedup given by FWHT.

Generalization of Xor

What I did not realize was that $$$\oplus$$$ was literally addition under modulo $$$2$$$. That is, we were literally doing a "small" FFT in each layer of our FWHT.

Recall that in FFT, we when we convert a polynomial to the point-coordinate form, we are multiplying the coefficients of the polynomial by the matrix $$$T=\begin{pmatrix} \omega_n^0 & \omega_n^0 & \omega_n^0 & \ldots & \omega_n^{0} \\ \omega_n^0 & \omega_n^1 & \omega_n^2 & \ldots & \omega_n^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \\ \omega_n^0 & \omega_n^{n-1} & \omega_n^{2n-2} & & \omega_n^{n^2-2n+1} \end{pmatrix}$$$, where $$$w_a^b=e^{i\frac{b\tau}{a}}$$$.

Well, when we are doing convolution with $$$\oplus$$$, our matrix is $$$T=\Big( \matrix{1 & 1 \\ 1& -1} \Big)=\Big( \matrix{\omega_2^0 & \omega_2^0 \\ \omega_2^0 & \omega_2^1} \Big)$$$. That means we can do our "FWHT" with other things such as addition modulo $$$3$$$ with the matrix $$$T=\begin{pmatrix} \omega_3^0 & \omega_3^0 & \omega_3^0 \\ \omega_3^0 & \omega_3^1 & \omega_3^2 \\ \omega_3^0 & \omega_3^2 & \omega_3^4\end{pmatrix}$$$. This idea is used in GP of Tokyo 2021 G with addition modulo $$$7$$$ used. Note that we naively apply the matrix in $$$O(b^2)$$$ if we are dealing with $$$b$$$-bit numbers.

Generalization of And/Or

A way to think about $$$\&$$$ and $$$|$$$ when we are dealing with $$$0 \leq a,b < 2$$$ (i.e. single-bit numbers) is $$$a \& b=\min(a,b)$$$ and $$$a | b = \max (a,b)$$$. Since these are symmetric we can focus on the case of $$$|$$$.

Let's try to generalize this to each bit having digits from $$$0$$$ to $$$2$$$.

Our matrix will look like $$$T=\begin{pmatrix} a_0 & b_0 & c_0 \\ a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \end{pmatrix}$$$. If we apply the math we have used above, we find that we have to satisfy the equations $$$a_i^2=a_i, a_ib_i=b_i^2=b_i,a_ic_i=b_ic_i=c_i^2=c_i$$$.

So we have $$$(a_i,b_i,c_i)=\{(1,1,1),(1,1,0),(1,0,0),(0,0,0)\}$$$. Since we want $$$T$$$ to be invertible we can just have $$$T=\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}$$$.

Notice an observation about this $$$T$$$, it is just a "staircase" matrix, where the bottom right triangle is filled with $$$1$$$ and the rest of the elements are $$$0$$$.

Now what happens if we do $$$T \cdot \begin{pmatrix}a \\ b \\ c\end{pmatrix} = \begin{pmatrix}a \\ a+b \\ a+b+c \end{pmatrix}$$$. That is, $$$T$$$ is literally a "do prefix sum matrix".

So what happens if we consider numbers with $$$n$$$ bits, what would multiplying the matrix by $$$\underbrace{T \otimes T \otimes \ldots \otimes T}_{n \text{ times}}$$$ do? (Yes, we are considering the case for operator $$$|$$$).

Let us try to expand out $$$T \otimes T \otimes T$$$.

$$$T \otimes T \otimes T= \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}= \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1& 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1& 1 & 1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}=\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}$$$.

This may look like garbage but we will try to compute $$$(T \otimes T \otimes T) \cdot \begin{pmatrix} 000 \\ 001 \\ 010 \\ 011 \\ 100 \\ 101 \\ 110 \\ 111 \end{pmatrix}=\begin{pmatrix} 000 \\ 000+001 \\ 000+010 \\ 000+001+010+011 \\ 000+100 \\ 000+001+100+101 \\ 000+010+100+110 \\ 000+001+010+011+100+101+110+111 \end{pmatrix}$$$.

Notice that the effect of $$$T \otimes T \otimes T$$$ is exactly SOS. That means, FWHT for $$$\&$$$ and $$$|$$$ is just applying the zeta transform and the mobius transform in $$$O(n \cdot 2^n)$$$ using SOS dp $$$^{[2],[3]}$$$, which is the same complexity as the $$$O(N \cdot \log N)$$$ we have using FWHT taking $$$N=2^n$$$.

So we can generalize FWHT for the $$$b$$$-bit version of $$$ \&$$$ and $$$|$$$ using (generalized) SOS dp in $$$O(n \cdot b^n)$$$ where the array sizes of the convolution are at most size $$$b^n$$$.

The reason for why FWHT for $$$ \&$$$ and $$$|$$$ can be done using subset sum is actually quite intuitive in hindset, so I am quite confused how I have never made the connection before.

We can think of this as some PIE, by focusing on the case of $$$|$$$ on binary numbers (for simplicity). Define $$$\bar A_{mask}=\sum\limits_{sub \subseteq mask} A_{sub}$$$ and $$$\bar B_{mask}=\sum\limits_{sub \subseteq mask} B_{sub}$$$.

Now, by PIE, we know that $$$C_{mask}=\sum\limits_{sub \subseteq mask} (-1)^{|mask \setminus sub|} \bar A_{sub} \bar B_{sub}$$$. It is clear that $$$C=\mu(\bar A \cdot \bar B)=\mu(\zeta(A)\cdot \zeta(B))$$$.

Subset Sum Convolution

Now, we can actually use ideas from FWHT to do subset sum convolution, where we want to find $$$C(mask)=\sum\limits_{sub \subseteq mask} A_{sub} \cdot B_{mask \setminus sub}$$$ for all $$$mask$$$.

Define:

$$$\hat A_{i,mask}\begin{cases}A_{mask} &, \text{if } |mask|=i\\ 0 &, \text{otherwise}\end{cases}$$$
$$$\hat B_{i,mask}\begin{cases}B_{mask} &, \text{if } |mask|=i\\ 0 &, \text{otherwise}\end{cases}$$$

It can be shown that $$$C(mask)=\mu \Bigg(\sum\limits_{i=0}^{|mask|} \zeta(\hat A_{i}) \cdot \zeta(\hat B_{|mask|-i}) \Bigg)$$$ and it can be calculated in $$$O(n^2 \cdot 2^n)$$$ $$$^{[3]}$$$.

Well, we can reason about it very easily now. Since for convolution with $$$|$$$, FWHT is same as $$$\zeta$$$ and the inverse FWHT is same as $$$\mu$$$. So it is obvious why the above equation holds, the only way that some non-zero element in $$$\hat{A}_{i}$$$ and $$$\hat B_{|mask|-i}$$$ can convolute to an element with $$$|mask|$$$ bits is for those element 's bits to be mutually exclusive.

Also note that we can apply the inverse FWHT on the outside of the summation as $$$\mu(A)+\mu(B)=\mu(A+B)$$$ since $$$\mu$$$ can be thought of as some arbitrary matrix.

Actually, we don't have to restrict ourselves to convolution with $$$|$$$ for this task. Convolution with $$$\oplus$$$ and $$$\&$$$ works fine too.

More Trash

We can do more weird stuff like performing the operation $$$\oplus$$$ on the $$$0$$$-th bit, performing $$$\&$$$ on the $$$1$$$-th bit and performing $$$|$$$ on the $$$2$$$-th bit.

Recalling that the matrices for these are $$$T_\oplus=\Big( \matrix{1 & 1 \\ 1 & -1} \Big)$$$, $$$T_\&=\Big( \matrix{0 & 1 \\ 1 & 1} \Big)$$$ and $$$T_|=\Big( \matrix{1 & 0 \\ 1 & 1} \Big)$$$ respectively. We just multiply our vector by $$$T_| \otimes T_\& \otimes T_\oplus$$$ for the forward transform and do the inverse for the backward transform.

We can also "mix and match" transformations of various sizes as long as we modify the result given in the Kronecker Product section.

As an example, suppose we have a number system of $$$(a,b)$$$ where $$$0 \leq a < 2$$$ and $$$0 \leq b < 3$$$ and we define $$$(a,b)+(c,d)=(a+c \mod 2, b+d \mod 3)$$$.

We can do convolution on this by noting that we have $$$T_2=\Big( \matrix{\omega_2^0 & \omega_2^0 \\ \omega_2^0 & \omega_2^1} \Big)$$$ and $$$T_3=\begin{pmatrix} \omega_3^0 & \omega_3^0 & \omega_3^0 \\ \omega_3^0 & \omega_3^1 & \omega_3^2 \\ \omega_3^0 & \omega_3^2 & \omega_3^4\end{pmatrix}$$$. So the transformation is given by $$$T_2 \otimes T_3=(T_2 \otimes I_3) (I_2 \otimes T_3)$$$.

So as long as you can define the transformation matrix, you can combine them and do weird stuff.

References

Full text and comments »

fft, fwht, sos

+293

errorgorn
3 years ago
4

Interesting Coincidence

By errorgorn, history, 3 years ago, In English

Спойлер

Full text and comments »

+112

errorgorn
3 years ago
6

Codeforces Round #723 (Div. 2) Editorial

By errorgorn, history, 3 years ago, In English

1526A - Mean Inequality

Setter: antontrygubO_o
Preparer: errorgorn

Hint 1

Hint 2

Solution

Code (C++)

//雪花飄飄北風嘯嘯
//天地一片蒼茫

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/rope>
using namespace std;
using namespace __gnu_pbds;
using namespace __gnu_cxx;
#define ll long long
#define ii pair<ll,ll>
#define iii pair<ii,ll>
#define fi first
#define se second
#define endl '\n'
#define debug(x) cout << #x << " is " << x << endl

#define pub push_back
#define pob pop_back
#define puf push_front
#define pof pop_front
#define lb lower_bound
#define ub upper_bound

#define rep(x,start,end) for(auto x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

#define indexed_set tree<ll,null_type,less<ll>,rb_tree_tag,tree_order_statistics_node_update>
//change less to less_equal for non distinct pbds, but erase will bug

mt19937 rng(chrono::system_clock::now().time_since_epoch().count());

int n;
int arr[55];

int main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	int TC;
	cin>>TC;
	while (TC--){
		cin>>n;
		rep(x,0,2*n) cin>>arr[x];
		
		sort(arr,arr+2*n);
		
		rep(x,0,n) cout<<arr[x]<<" "<<arr[x+n]<<" ";
		cout<<endl;
	}
}

Code (Python)

[(lambda n: (lambda arr : [print(f'{arr[i]} {arr[i+n]}', end = ' \n'[i==n-1]) for i in range(n)])(sorted(list(map(int,input().split())))))(int(input())) for _ in range(int(input()))]

1526B - I Hate 1111

Setter: errorgorn
Preparer: errorgorn

Hint 0

Hint 1

Hint 2

Solution

Code (C++)

//雪花飄飄北風嘯嘯
//天地一片蒼茫

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/rope>
using namespace std;
using namespace __gnu_pbds;
using namespace __gnu_cxx;
#define ll long long
#define ii pair<ll,ll>
#define iii pair<ii,ll>
#define fi first
#define se second
#define endl '\n'
#define debug(x) cout << #x << " is " << x << endl

#define pub push_back
#define pob pop_back
#define puf push_front
#define pof pop_front
#define lb lower_bound
#define ub upper_bound

#define rep(x,start,end) for(auto x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

#define indexed_set tree<ll,null_type,less<ll>,rb_tree_tag,tree_order_statistics_node_update>
//change less to less_equal for non distinct pbds, but erase will bug

mt19937 rng(chrono::system_clock::now().time_since_epoch().count());

int main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	int TC;
	cin>>TC;
	while (TC--){
		int n;
		cin>>n;
		
		rep(x,0,20){
			if (n%11==0){
				cout<<"YES"<<endl;
				goto done;
			}
			
			n-=111;
			if (n<0) break;
		}
		cout<<"NO"<<endl;
		
		done:;
	}
}

Code (Python)

[(lambda n : print("YES" if (n >= 111*(n%11)) else "NO"))(int(input())) for _ in range(int(input()))]

1526C1 - Potions (Easy Version) and 1526C2 - Potions (Hard Version)

Setter: errorgorn and oolimry
Preparer: oolimry

Hint 1

Hint 2

Hint 3

Solution

Method 1 — DP

Let's consider a dynamic programming solution. Let $$$dp[i][k]$$$ be the maximum possible health achievable if we consider only the first $$$i$$$ potions, and $$$k$$$ is the total number of potions taken.

The transition is as follows: $$$dp[i][k] = max(dp[i-1][k-1] + a_i, dp[i-1][k])$$$ if $$$dp[i-1][k-1] + a_i \geq 0$$$, and just $$$dp[i-1][k]$$$ otherwise. The first term represents the case if we take potion $$$i$$$ and the second term is when we ignore potion $$$i$$$.

This runs in $$$O(n^2)$$$ and passes the easy version.

Method 2 — Greedy 1

We iterate through the potions in non-decreasing order and drink the potion if we do not die.

For convenience, we define $$$c_i=a_i+i \cdot \epsilon$$$, where $$$\epsilon$$$ is a very small real number so that we can treat $$$c_i$$$ as distinct integers. We will show by exchange argument that our greedy is optimal.

Let $$$S$$$ be the set of potions that is an optimal solution.

Suppose $$$i<j$$$ and $$$c_i<c_j$$$. If $$$i \in S$$$ and $$$j \notin S$$$, then removing $$$i$$$ and adding $$$j$$$ into $$$S$$$ will still make $$$S$$$ a valid solution.

Now, suppose that position $$$i$$$ is not drunk. We can assume that there are no $$$k$$$ ($$$k<i$$$) such that $$$c_k<c_i$$$ and $$$k$$$ is drunk by the previous assertion. Suppose we add $$$i$$$ into $$$S$$$. If we reach a position $$$j$$$ (we possibly die at $$$j$$$) where $$$c_j<c_i$$$, then we can remove $$$j$$$ from $$$S$$$ and add $$$i$$$ to $$$S$$$. Otherwise, our greedy will also not choose $$$i$$$, as it was not chosen when we only consider indexes $$$k$$$ such that $$$c_k<c_i$$$.

Notice that we can arbitrarily define the way $$$\epsilon$$$ is added, so basically we can process $$$a_i$$$ in any non-increasing fashion.

Doing this naively is $$$O(n^2)$$$ as well. However, using a range add range max lazy propagation segment tree, we can check if a certain potion can be drunk without dying, and the solution runs in $$$O(n \log n)$$$.

Method 3 — Greedy 2 / DP 2

We process the potions from left to right. At the same time, we maintain the list of potions we have taken so far. When processing potion $$$i$$$, if we can take $$$i$$$ without dying, then we take it. Otherwise, if the most negative potion we've taken is more negative than potion $$$i$$$, then we can swap out potion $$$i$$$ for that potion. To find the most negative potion we've taken, we can maintain the values of all potions in a minimum priority_queue. This runs in $$$O(nlogn)$$$ as well

To prove that this works, let's consider best solution where we take exactly $$$k$$$ potions (best as in max total health). The solution involves taking the $$$k$$$ largest values in our priority queue. Then when considering a new potion, we should see whether swapping out the new potion for the $k$th largest potion will improve the answer.

Since the priority queue is strictly decreasing, there will be a cutoff $$$K$$$, where for $$$k$$$ at most $$$K$$$, the answer is not affected, and for larger than $$$K$$$, we swap out the $$$k$$$th largest potion. It turns out this process is equivalent to inserting the new potion's value into the priority_queue. For those positions at most $$$K$$$, they are not affected. For the positions larger than $$$K$$$, the elements get pushed back one space, meaning that the smallest element is undrinked.

This can also be seen as an efficient way to to transition from one layer of the $$$dp$$$ table to the next.

Code (C++)

#include <bits/stdc++.h>
using namespace std;
typedef pair<long long, long long> ii;

struct node{
	int s, e, m;
	long long val = 0; long long lazy = 0;
	node *l, *r;
	
	node(int S, int E){
		s = S, e = E, m = (s+e)/2;
		if(s != e){
			l = new node(s, m);
			r = new node(m+1, e);
		}
	}
	
	void apply(long long L){
		val += L;
		lazy += L;
	}
	void push(){
		if(s == e) return;
		l->apply(lazy);
		r->apply(lazy);
		lazy = 0;
	}
	
	void update(int S, int E, long long L){
		push();
		if(s == S && E == e){
			apply(L);
			return;
		}
		else if(E <= m) l->update(S, E, L);
		else if(S >= m+1) r->update(S, E, L);
		else l->update(S, m, L), r->update(m+1, E, L);
		val = min(l->val, r->val);
	}
	
	long long query(int S, int E){
		push();
		if(s == S && E == e) return val;
		else if(E <= m) return l->query(S, E);
		else if(S >= m+1) return r->query(S, E);
		else return min(l->query(S, m), r->query(m+1, E));
	}
} *root;	


int main(){
	ios_base::sync_with_stdio(false); cin.tie(0);
	
	int n; cin >> n;
	ii arr[n];
	for(int i = 0;i < n;i++){
		cin >> arr[i].first;
		arr[i].second = i;
	}
	
	sort(arr,arr+n);
	reverse(arr,arr+n);
	int ans = 0;
	root = new node(0, n-1);
	for(int i = 0;i < n;i++){
		int pos = arr[i].second;
		long long x = arr[i].first;
		
		if(x + root->query(pos,n-1) >= 0){
			ans++;
			root->update(pos,n-1,x);
		}
	}
	
	cout << ans;
}

Code 2 (C++)

#include <bits/stdc++.h>
using namespace std;

int main(){
	ios_base::sync_with_stdio(false); cin.tie(0);
	
	int n; cin >> n;
	priority_queue<long long, vector<long long>, greater<long long> > pq;
	long long S = 0;
	
	for(int i = 1;i <= n;i++){
		long long x; cin >> x;
		S += x;
		pq.push(x);
		
		while(S < 0){
			S -= pq.top();
			pq.pop();
		}
	}
	
	cout << (int) pq.size();
}

Code (Python)

import heapq
H = []
n=int(input())
for i in list(map(int,input().split())):
    tot = (tot+i if 'tot' in locals() or 'tot' in globals() else i)
    tot -= ((heapq.heappush(H,i) if tot >= 0 else heapq.heappushpop(H,i)) or 0)
print(len(H))

1526D - Kill Anton

Setter: errorgorn
Preparer: errorgorn

Hint 1

Solution

We claim that in the optimal answer all characters of some type will appear consecutively.

Proof

Consider the character at position $$$i$$$ in string $$$s$$$. We define $$$C_i$$$ as the position in which that character will be in string $$$t$$$. For $$$s=\texttt{CAGCAA}$$$ and $$$t=\texttt{AAGCAC}$$$, $$$C={4,1,3,6,2,5}$$$. Then the minimum number of moves to transformed $$$s$$$ to $$$t$$$ is given by the inversion index of $$$C$$$.

Suppose we find a substring of $$$s$$$ where it is $$$\texttt{XXAA...AA[A]XX...XX[A]AA...AAXX}$$$, where characters $$$\texttt{X}$$$ can be either one of $$$\texttt{TCG}$$$. The square brackets are just for clarity of explanation. We will show that it our solution will not be worse if we merge these $$$2$$$ contiguous segments of $$$\texttt{A}$$$.

Consider transforming the string into $$$\texttt{XXAA...AAXX...XX[A][A]AA...AAXX}$$$ and $$$\texttt{XXAA...AA[A][A]XX...XXAA...AAXX}$$$. Let the difference in the number of moves be $$$D_1$$$ and $$$D_2$$$ respectively. And let the index of $$$\texttt{[A]}$$$s be $$$i$$$ and $$$j$$$ respectively.

Then $$$D_1=\sum\limits_{k=i+1}^{j-1} \sigma(C_k-C_i)$$$ and $$$D_2=\sum\limits_{k=i+1}^{j-1} \sigma(C_j-C_k)$$$, where $$$\sigma(x)=\frac{x}{|x|}$$$, or rather $$$\sigma$$$ is the sign function.

I claim that $$$D_1+D_2 \geq 0$$$. Suppose that $$$D_1+D_2<0$$$, then there exist $$$k$$$ such that $$$\sigma(C_k-C_i)+\sigma(C_j-C_k)<0$$$ which implies $$$C_k-C_i<0$$$ and $$$C_j-C_k<0$$$. However, this implies that $$$C_j<C_i$$$ which is clearly a contradiction.

Since $$$D_1+D_2 \geq 0$$$, either $$$D_1 \geq 0$$$ or $$$D_2 \geq 0$$$. WLOG, $$$D_1 \geq 0$$$. This implies that we turn $$$s$$$ into $$$\texttt{XXAA...AAXX...XX[A][A]AA...AAXX}$$$ without decreasing the number of moves to transform $$$s$$$ into $$$t$$$. Now, consider the rest of the $$$\texttt{A}$$$ on the left segment of $$$\texttt{A}$$$. We can move it too the right also. Recall that $$$D_1 + D_2 \geq 0$$$. In this case, we know that $$$D_2 \leq 0$$$ as it is the cost to move the left $$$\texttt{[A]}$$$ back to its original position. So $$$D_1 \geq 0$$$. Therefore, we have merged $$$2$$$ different segments of $$$A$$$ without decreasing the number of moves to transform $$$s$$$ into $$$t$$$.

So, we can try all $$$24$$$ possible strings and check the number of moves Anton's body needs to transform each string. The time limit is relaxed enough for a $$$O(n \log n)$$$ similar to 1430E - String Reversal.

But there is a $$$O(n)$$$ solution, the number of moves Anton's body needs to transform each string is given by the number of inversions in the string. But since we know that we only care about strings that have all the same characters appear consecutively, we can just keep a count of the number of inversions for each pair of characters.

Code (C++)

//雪花飄飄北風嘯嘯
//天地一片蒼茫

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/rope>
using namespace std;
using namespace __gnu_pbds;
using namespace __gnu_cxx;
#define ll long long
#define ii pair<ll,ll>
#define iii pair<ii,ll>
#define fi first
#define se second
#define endl '\n'
#define debug(x) cout << #x << " is " << x << endl

#define pub push_back
#define pob pop_back
#define puf push_front
#define pof pop_front
#define lb lower_bound
#define ub upper_bound

#define rep(x,start,end) for(auto x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

#define indexed_set tree<ll,null_type,less<ll>,rb_tree_tag,tree_order_statistics_node_update>
//change less to less_equal for non distinct pbds, but erase will bug

mt19937 rng(chrono::system_clock::now().time_since_epoch().count());

int n;
string s;
int cnt[4];

ll flips[4][4];

map<char,int> id={{'A',0},{'N',1},{'O',2},{'T',3}};
string cset="ANOT";

int main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	int TC;
	cin>>TC;
	while (TC--){
		cin>>s;
		n=sz(s);
		
		memset(cnt,0,sizeof(cnt));
		memset(flips,0,sizeof(flips));
		
		rep(x,0,n){
			cnt[id[s[x]]]++;
		}
		rep(a,0,4){
			int curr=0;
			rep(x,0,n){
				flips[a][id[s[x]]]+=curr;
				if (id[s[x]]==a) curr++;
			}
		}
		
		ll best=-1;
		vector<int> ans;
		
		vector<int> v={0,1,2,3};
		do{
			ll curr=0;
			rep(x,0,4){
				rep(y,x+1,4){
					curr+=flips[v[y]][v[x]];
				}
			}
			
			if (curr>best){
				best=curr;
				ans=v;
			}
		} while (next_permutation(all(v)));
		
		for (auto &it:ans) rep(x,0,cnt[it]) cout<<cset[it];
		cout<<endl;
	}
}

Code (Python)

import itertools

TC=int(input())

m={"A":0,"N":1,"O":2,"T":3}
st="ANOT"

for _ in range(TC):
	s=input()
	arr=[]
	
	for ch in s:
		arr.append(m[ch])
	
	cnt=[[0,0,0,0],[0,0,0,0],[0,0,0,0],[0,0,0,0]]
	cnt1=[0,0,0,0]
	
	for i in arr:
		for j in range(4):
			cnt[j][i]+=cnt1[j]
		cnt1[i]+=1
	
	val=-1
	best=[]
	
	for perm in itertools.permutations([0,1,2,3]):
		curr=0
		for i in range(4):
			for j in range(i+1,4):
				curr+=cnt[perm[j]][perm[i]]	
		
		if (curr>val):
			val=curr
			best=perm
			
	
	for i in range(4):
		for j in range(cnt1[best[i]]): print(st[best[i]],end="")
	
	print()

1526E - Oolimry and Suffix Array

Setter: iLoveIOI
Preparer: iLoveIOI

Hint 1

Solution

First, let's consider a simpler problem.

Is it possible to make a string with a certain suffix array given an alphabet size $$$k$$$.

Consider two adjacent suffixes in the suffix array "xy" and "ab" where $$$b$$$ and $$$y$$$ are some strings and $$$x$$$ and $$$a$$$ are some characters i.e. $$$x$$$ is the first character of that suffix and similarly for $$$a$$$. If $$$b$$$ and $$$y$$$ don't exist we consider them as \$, i.e. smaller than everything. Also, "xy" comes before "ab".

Observation: If the position of $$$b$$$ is less than the position of $$$y$$$ in the suffix array, $$$x$$$ must be less than $$$a$$$. Otherwise, $$$x$$$ must be less than or equal to $$$a$$$.

This can be easily shown as "xy" must be lexicographically smaller than "ab". This is sufficient also.

Thus we can iterate through the suffix array and check if $$$pos[arr[i]+1]>pos[arr[i+1]+1]$$$ where $$$arr$$$ is the suffix array and $$$pos$$$ is the position of the $$$i$$$th element in the suffix array. If this condition holds then the $$$arr[i]$$$ th character must be strictly less than the $$$arr[i+1]$$$ th character. Thus we can just count how many such pairs exist. If this count is larger than the alphabet size no such string is possible. Otherwise, such string exists. Note that special care must be taken when considering $$$arr[i]=n-1$$$ as $$$pos[n]$$$ may not be defined ($$$0$$$-indexed).

After tackling the simpler question we move on to the full question of counting how many such strings are there. If we consider the string as an array and the order of the array as the order of the suffix array meaning that the $$$i$$$ th element of this array is the $$$arr[i]$$$ th element of the string. We have now transformed the question into "Given that some elements must be greater than the previous elements while others can be equal. Count how many arrays are there such that the largest element is less than $$$k$$$".

Consider the difference array, some elements must be $$$\geq 1$$$ while some can be $$$\geq 0$$$. We add padding to the front and back of the array so as to account for the first value being non-zero and the last element being less than $$$k$$$. These two elements are both $$$\geq 0$$$. Let $$$cnt$$$ be the number of elements that must be $$$\geq 1$$$ in the difference array. Now, this becomes count how many arrays of $$$(n-1+2)=(n+1)$$$ non-negative elements sum to $$$k-cnt$$$. This can be solved using stars and bars so the final answer comes out to be $$$n+1+k-cnt \choose n$$$ which can be found easily. Note that we define $$$\binom{a}{b}=0$$$ when $$$a<b$$$.

Final Complexity $$$O(n\log MOD)$$$ or $$$O(n)$$$ depending on how you find the modular inverse.

Btw $$$k$$$ was chosen to also be $$$\leq 2\times 10^5$$$ so as to hide the final complexity :)

Code (C++)

/*input
5 4
3 4 2 1 0

*/
#include<bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace std;
using namespace __gnu_pbds;
typedef tree<long long, null_type, less_equal<long long>, rb_tree_tag, tree_order_statistics_node_update> indexed_set;
//order_of_key #of elements less than x
// find_by_order kth element
typedef long long int ll;
#define ld long double
#define pii pair<ll,ll>
#define f first
#define s second
#define pb push_back
#define REP(i,n) for(ll i=0;i<n;i++)
#define REP1(i,n) for(int i=1;i<=n;i++)
#define FILL(n,x) memset(n,x,sizeof(n))
#define ALL(_a) _a.begin(),_a.end()
#define sz(x) (int)x.size()
const ll maxn=2e5+5;
const ll maxlg=__lg(maxn)+2;
const ll INF64=4e17;
const int INF=0x3f3f3f3f;
const ll MOD=998244353;
const ld PI=acos(-1);
const ld eps=1e-9;
#define lowb(x) x&(-x)
#define MNTO(x,y) x=min(x,(__typeof__(x))y)
#define MXTO(x,y) x=max(x,(__typeof__(x))y)
#define SORT_UNIQUE(c) (sort(c.begin(),c.end()), c.resize(distance(c.begin(),unique(c.begin(),c.end()))))
ll mult(ll a,ll b){
    return ((a%MOD)*(b%MOD))%MOD;
}
ll mypow(ll a,ll b){

    if(b<=0) return 1;
    if(a<=0) return 0;
    ll res=1LL;
    while(b){
        if(b&1) res=mult(res,a);
        a=mult(a,a);
        b>>=1;
    }
    return res;
}
int arr[maxn],pos[maxn];
ll inv(ll x){
    return mypow(x,MOD-2);
}
int32_t main(){
    ios::sync_with_stdio(false),cin.tie(0);
    int n,k;
    cin>>n>>k;
    --k;
    REP(i,n) cin>>arr[i],pos[arr[i]]=i;
    pos[n]=-1;
    int cnt=0;
    REP(i,n-1){
        // character at arr[i] and arr[i+1] cannot be the same
        if(pos[arr[i]+1]>pos[arr[i+1]+1]){
            ++cnt;
        }
    } 
    //calculate c(k-cnt+n+1-1,n+1-1)
    k-=cnt;
    if(k<0){
        cout<<0;
        return 0;
    }
    ll ans=1;
    REP1(i,n){
        ans=mult(ans,inv(i));
        ans=mult(ans,(k+i));
    }
    cout<<ans;
}

Code (Python)

MOD=998244353

n,k=map(int,input().split())
arr=list(map(int,input().split()))

pos=[0]*(n+1)

for i in range(n):
	pos[arr[i]]=i
pos[n]=-1

cnt=0

for i in range(n-1):
	if (pos[arr[i]+1]>pos[arr[i+1]+1]): cnt+=1

#k-cnt+n-1 choose n
num,denom=1,1

for i in range(n):
	num=num*(k-cnt+n-1-i)%MOD
	denom=denom*(i+1)%MOD

print(num*pow(denom,MOD-2,MOD)%MOD)

1526F - Median Queries

Setter: errorgorn and oolimry
Preparer: errorgorn

Hint 1

Hint 2

Solution

The problem can be broken into $$$2$$$ main parts:

Finding a pair $$$(a,b)$$$ such that $$$|p[a]-p[b]|$$$ is roughly less than a third.
Finding elements $$$1$$$ and $$$2$$$ (actually we may find $$$N$$$ and $$$N-1$$$ but it is the same).
Find the rest of the elements

A convenient way to think about the queries, is that given $$$p[a]<p[b]<p[c]$$$, we will be returned $$$\max(p[b]-p[a],p[c]-p[b])$$$.

We will spend the first queries to find a tuple $$$(a,b,c)$$$ such that the return value of this query is at most $$$\lfloor \frac{n-4}{6} \rfloor$$$. The easiest way to do this is by randomly choosing distinct $$$a$$$,$$$b$$$,$$$c$$$ and querying it. See proof 1 for the random analysis. But there is also a deterministic solution. Choose any $$$13$$$ elements from the array, and it is guaranteed that there exists a tuple in those $$$13$$$ elements such that the return value of the query is at most $$$\lfloor \frac{n-4}{6} \rfloor$$$, see proof 2. To query all tuples in this sub-array of size $$$13$$$, it takes $$$286$$$ queries, which fits very comfortably under $$$420$$$. user14767553 conjectured that a sub-array of size $$$10$$$ works, however we are unable to prove it. You can try to hack his solution at 117685511.

Among $$$a$$$, $$$b$$$, and $$$c$$$, we choose $$$a$$$ and $$$b$$$. The important idea here is that the gap between $$$a$$$ and $$$b$$$ is small, because we have found a tuple $$$(a,b,c)$$$ is less than or equal to $$$\lfloor \frac{n-4}{6} \rfloor$$$, so $$$|p[a]-p[b]| \leq 2 \cdot \lfloor \frac{n-4}{6} \rfloor \leq \lfloor \frac{n-4}{3} \rfloor$$$. Now, we query all tuples $$$(a,b,x)$$$ for all $$$x \ne a$$$ and $$$x \ne b$$$. Since the gap between $$$a$$$ and $$$b$$$ is small, we should be able to identify either $$$(1,2)$$$ or $$$(n,n-1)$$$. This is because either $$$p[a]-1 > p[b]-p[a]$$$ or $$$n-p[b] > p[b] - p[a]$$$ (WLOG assuming $$$p[b] > p[a]$$$), see proof 3. Therefore, the furthest $$$x$$$ will return the largest value when querying for $$$(a,b,x)$$$. (If there's ties choose anyone).

Suppose we found element $$$1$$$, there are at most two possible candidates for element $$$2$$$, $$$y_1$$$ and $$$y_2$$$ (the other possible candidate will be either $$$n$$$ or $$$n-1$$$). We can use 2 queries to figure out which of them is element $$$2$$$. We can query $$$(1, y_1, a)$$$ and $$$(1, y_2, a)$$$. Whichever produces the smaller value is element $$$2$$$, see proof 4.

Once we have found elements $$$1$$$ and $$$2$$$, solving the rest of the problem is simple. Querying $$$(1,2,x)$$$ will always return $$$x-2$$$. And we can find the rest of the elements in another $$$n-2$$$ queries. Do not forgot about the condition that $$$p[0]<p[1]$$$!

Proof 1

Proof 2

Proof 3

Proof 4

Code (C++)

//雪花飄飄北風嘯嘯
//天地一片蒼茫

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/rope>
using namespace std;
using namespace __gnu_pbds;
using namespace __gnu_cxx;
#define ll long long
#define ii pair<ll,ll>
#define iii pair<ii,ll>
#define fi first
#define se second
#define debug(x) cout << #x << " is " << x << endl

#define pub push_back
#define pob pop_back
#define puf push_front
#define pof pop_front
#define lb lower_bound
#define up upper_bound

#define rep(x,start,end) for(auto x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

#define indexed_set tree<ll,null_type,less<ll>,rb_tree_tag,tree_order_statistics_node_update>
//change less to less_equal for non distinct pbds, but erase will bug

mt19937 rng(177013);

int query(int a,int b,int c){
	cout<<"? "<<a<<" "<<b<<" "<<c<<endl;
	int temp;
	cin>>temp;
	
	return temp;
}

void sort(int &a,int &b,int &c){
	 if (a>b) swap(a,b);
	 if (b>c) swap(b,c);
	 if (a>b) swap(a,b);
}

void solve(){
	int n;
	cin>>n;
	
	vector<int> ans(n+5,0);
	
	set<tuple<int,int,int> > s;
	
	int a,b,c;
	rep(x,1,14) rep(y,x+1,14) rep(z,y+1,14){
		if (query(x,y,z)<=(n-4)/6){
			a=x,b=y,c=z;
			break;
		}
	}
	
	map<int,vector<int>,greater<int> > m;
	rep(x,1,n+1) if (x!=a && x!=b){
		m[query(a,b,x)].push_back(x);
	}
	
	int hv=(*m.begin()).fi;
	if (sz(m[hv-1])>=2){ //there must be exactly 2
		if (query(m[hv][0],m[hv-1][0],a)>
			query(m[hv][0],m[hv-1][1],a))
				swap(m[hv-1][0],m[hv-1][1]);
	}
	
	int hi=m[hv][0],hi2=m[hv-1][0];
	
	//cout<<hi2<<" "<<hi<<endl;
	
	ans[hi]=1,ans[hi2]=2;
	rep(x,1,n+1) if (x!=hi2 && x!=hi){
		ans[x]=query(hi,hi2,x)+2;
	}
	
	if (ans[1]>ans[2]) rep(x,1,n+1) ans[x]=n-ans[x]+1;
	
	cout<<"! "; rep(x,1,n+1) cout<<ans[x]<<" "; cout<<endl;
	
	cin>>n; //read stupid value
}

int main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	int TC;
	cin>>TC;
	while (TC--){
		solve();
	}
}

Code (Python)

import sys, random

print = sys.stdout.write
input = sys.stdin.readline

def query(a, b, c):
    print("? " + str(a+1) + " " + str(b+1) + " " + str(c+1) + "\n")
    sys.stdout.flush()
    return int(input())

def answer():
    print("! " + " ".join([str(i) for i in ans]) + "\n")
    sys.stdout.flush()
    return int(input())

T = int(input())
testVal = 13

randomQueryOrder = []
for i in range(testVal):
    for j in range(i+1, testVal):
        for k in range(j+1, testVal):
            randomQueryOrder.append((i, j, k))

for caseno in range(T):
    N = int(input())
    
    a = -1
    b = -1

    ans = [0 for i in range(N)]
    res = [0 for i in range(N)]
    order = [i for i in range(N)]
    
    minVal = N+5
    
    random.shuffle(order)
    random.shuffle(randomQueryOrder)
    for ijk in randomQueryOrder:
        i, j, k = ijk
        i = order[i]
        j = order[j]
        k = order[k]
        temp = query(i, j, k)
        
        if temp <= (N-4)//6:
            a = i
            b = j
            break
    
    assert(a != -1 and b != -1)

    ''' Now, the key thing to note is that abs(P[a]-P[b]) is less than (roughly speaking) N/3.
    Observe that the index c such that query(a, b, c) is maximum must be such that P[c] == 1 or P[c] == N (some steps skipped).
    But given the condition P[1] < P[2], we can assume either, then flip using P[i] = N+1-P[i] if condition not satisfied.
    '''

    # Now, we assume P[c] = 1
    # There always exists another index d such that query(a, b, d) = query(a, b, c)-1 and P[d] = 2
    # There can be only at most 2 d which satisfy the former, and we can check which is which by comparing query(a, c, d)

    maxVal = -1
    c = -1
    for i in range(N):
        if a == i or b == i:
            continue
        
        res[i] = query(a, b, i)
        if res[i] > maxVal:
            maxVal = res[i]
            c = i

    d2 = []
    for i in range(N):
        if a == i or b == i:
            continue
        
        if res[i] == maxVal-1:
            d2.append(i)

    d = -1
    if len(d2) == 1:
        d = d2[0]
    elif len(d2) == 2:
        temp1 = query(a, c, d2[0])
        temp2 = query(a, c, d2[1])
        if temp1 < temp2:
            d = d2[0]
        else:
            d = d2[1]
    else:
        assert(False)

    # After finding indices c and d such P[c] = 1 and P[d] = 2, we can now find the rest of the values
    ans[c] = 1
    ans[d] = 2
    for i in range(N):
        if i == c or i == d:
            continue
        ans[i] = 2 + query(c, d, i)

    # Flip since P[1] < P[2] not satisfied
    if ans[0] >= ans[1]:
        for i in range(N):
            ans[i] = N+1-ans[i]

    result = answer()

    # I dislike getting WA, hence I make my code MLE
    if result != 1:
        MLE = [1 for i in range(1<<27)]
        sys.stdout.write(MLE[0])

Code Golf

You may have realized that python codes in the editorial are quite short. We actually had a code golf challenge among testers. You are invited to try code golf our problems and put them in the comments. Maybe I will put the shortest codes in the editorial ;)

Rules for code golf:

any language allowed
codes will be judged by number of characters
must get AC in the respective problems

Full text and comments »

Tutorial of Codeforces Round 723 (Div. 2)

+199

errorgorn
3 years ago
291

Codeforces Round #723 (Rated for Div 2)

By errorgorn, history, 3 years ago, In English

Hello again Codeforces!

errorgorn, oolimry and iLoveIOI are glad to invite you to participate in Codeforces Round 723 (Div. 2) which will be held at May/28/2021 17:05 (Moscow time). The round will be rated for the participants with a rating lower than 2100. Participants from the first division are also welcomed to take part in the competition but it will be unrated for them.

You will be given 2 hours and 30 minutes to solve 6 questions. One of the puzzles is interactive, please read the guide of interactive problems before the contest.

We would like to thank:

Our lord and saviour antontrygubO_o for amazing coordination and rejecting our poopoo tasks
dantoh and tqbfjotld for roasting our enigmas and testing them before they were prepared
our many testers balbit, kshitij_sodani, Monogon, T1duS, icypiggy,user14767553, maomao90, hbarp, wxhtzdy, Jellyman102, ETK. and jamessngg for bricking on bamboozlers and destroying our testcases
MikeMirzayanov for Codeforces and Polygon platforms

We hope you find the bugaboos interesting and fun! Wish you high rating!

Here is scoring distribution because you guys deserve it <3

500 — 1000 — (750+1000) — 2250 — 2500 — 3500

Btw, remember to downvote all testers who writes stupid comments to beg for likes.

UPD: editorial released

Full text and comments »

Announcement of Codeforces Round 723 (Div. 2)

+923

errorgorn
3 years ago
214

Codeforces Global Round 13 Editorial

By errorgorn, 3 years ago, In English

1491A - K-е наибольшее

Setter: syksykCCC
Prepared by: syksykCCC

Hint 1

Hint 2

Solution

Code (C++)

#include <bits/stdc++.h>
using namespace std;
const int N = 1e6 + 5;
int n, q, a[N], cnt;
int main()
{
    scanf("%d%d",&n,&q);
	for(int i = 1; i <= n; ++i)
	{
    	scanf("%d",a+i);
		cnt += a[i];
	}
	while(q--)
	{
		int opt, x;
		scanf("%d%d",&opt,&x);
		if(opt == 1)
		{
			if(a[x]) cnt--;
			else cnt++;
			a[x] = 1 - a[x];
		}
		else
		{
			if(cnt >= x) puts("1");
			else puts("0");
		}
	}
	return 0;
}

Code (Python) (vim1729)

n, q = map(int,input().split())
a = list(map(int,input().split()))
zero = a.count(0)
one = n - zero
for _ in range(q):
    t, x = map(int,input().split())
    if t == 1:
        if a[x-1] == 1:
            zero += 1
            one -= 1
            a[x-1] = 0
        else:
            zero -= 1
            one += 1
            a[x-1] = 1
    else:
        if x<=one:
            print(1)
        else:
            print(0)

1491B - Минимальная стоимость

Setter: syksykCCC
Prepared by: syksykCCC

Hint 1

Hint 2

Solution

Code (C++)

#include <bits/stdc++.h>
using namespace std;
const int N = 1e6 + 5;
int n, a[N], ans = INT_MAX, u, v, T;
int main()
{
	ios::sync_with_stdio(false);
	cin>>T;
	while(T--){
	    ans = INT_MAX;
	    cin >> n >> u >> v;
	    for(int i = 1; i <= n; i++)
		    cin >> a[i];
	    for(int i = 2; i <= n; i++)
	    {
		    if(abs(a[i] - a[i - 1]) >= 2) ans = 0;
	    	if(abs(a[i] - a[i - 1]) == 1) ans = min(ans, min(u, v));
		    if(a[i] == a[i - 1]) ans = min(ans, v + min(u, v));
	    }
	    cout << ans << endl;
	}
	
	return 0;
}

1491C - Pekora и батуты

Setter: oolimry
Prepared by: errorgorn

Hint 1

Hint 2

Solution

Code (C++)

//雪花飄飄北風嘯嘯
//天地一片蒼茫

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/rope>
using namespace std;
using namespace __gnu_pbds;
using namespace __gnu_cxx;
#define ll long long
#define ii pair<ll,ll>
#define iii pair<ii,ll>
#define fi first
#define se second
#define endl '\n'
#define debug(x) cout << #x << " is " << x << endl

#define pub push_back
#define pob pop_back
#define puf push_front
#define pof pop_front
#define lb lower_bound
#define up upper_bound

#define rep(x,start,end) for(auto x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

#define indexed_set tree<ll,null_type,less<ll>,rb_tree_tag,tree_order_statistics_node_update>
//change less to less_equal for non distinct pbds, but erase will bug

mt19937 rng(chrono::system_clock::now().time_since_epoch().count());

int n;
int arr[5005];
ll curr[5005];

int main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	int TC;
	cin>>TC;
	while (TC--){
		cin>>n;
		rep(x,0,n) cin>>arr[x];
		rep(x,0,n) curr[x]=0;
		
		ll ans=0;
		
		rep(x,0,n){
			ll temp=curr[x];
			
			if (temp<arr[x]-1){
				ans+=arr[x]-1-temp;
				temp+=arr[x]-1-temp;
			}
			
			curr[x+1]+=temp-arr[x]+1;
			if (x+2<n) rep(y,x+2,min(n,x+arr[x]+1)) curr[y]++;
		}
		
		cout<<ans<<endl;
	}
}

Code (Python)

TC=int(input())

for tc in range(TC):
    n=int(input())
    arr=list(map(int,input().split()))
    
    curr=[0]*(n+5)
    
    ans=0
    
    for x in range(n):
        temp=curr[x]
        
        if (temp<arr[x]-1):
            ans+=arr[x]-1-temp
            temp+=arr[x]-1-temp
            
        curr[x+1]+=temp-arr[x]+1
        for y in range(x+2,min(n,x+arr[x]+1)):
            curr[y]+=1
        
    print(ans)

1491D - Хранительница Бесконечного Зоопарка

Setter: errorgorn
Prepared by: errorgorn

Hint 1

Hint 2

Hint 3

Hint 4

Solution

Code (C++)

//雪花飄飄北風嘯嘯
//天地一片蒼茫

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/rope>
using namespace std;
using namespace __gnu_pbds;
using namespace __gnu_cxx;
#define ll long long
#define ii pair<ll,ll>
#define iii pair<ii,ll>
#define fi first
#define se second
#define endl '\n'
#define debug(x) cout << #x << " is " << x << endl

#define pub push_back
#define pob pop_back
#define puf push_front
#define pof pop_front
#define lb lower_bound
#define up upper_bound

#define rep(x,start,end) for(auto x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

#define indexed_set tree<ll,null_type,less<ll>,rb_tree_tag,tree_order_statistics_node_update>
//change less to less_equal for non distinct pbds, but erase will bug

mt19937 rng(chrono::system_clock::now().time_since_epoch().count());

int main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	int TC;
	cin>>TC;
	while (TC--){
		int a,b;
		cin>>a>>b;
		
		vector<int> va,vb;
		
		if (a>b){
			cout<<"NO"<<endl;
			goto done;
		}
		
		rep(x,0,30){
			if (a&(1<<x)) va.pub(x);
			if (b&(1<<x)) vb.pub(x);
		}
		
		if (sz(va)<sz(vb)){
			cout<<"NO"<<endl;
			goto done;
		}
		
		rep(x,0,sz(vb)){
			if (va[x]>vb[x]){
				cout<<"NO"<<endl;
				goto done;
			}
		}
		
		cout<<"YES"<<endl;
		
		done:;
	}
}

Code (C++) (gamegame)

#include<bits/stdc++.h>
using namespace std;
typedef long long ll;
int main(){
	ios::sync_with_stdio(false);cin.tie(0);
	int t;cin >> t;
	while(t--){
		int x,y;cin >> x >> y;
		int z=0;
		bool ok=(x<=y);
		for(int i=0; i<30 ;i++){
			if((x>>i)&1) z++;
			if((y>>i)&1) z--;
			if(z<0) ok=false;
		}
		if(ok) cout << "YES\n";
		else cout << "NO\n";
	}
}

Code (Python)

def lsb(x):
    return x & (-x)
Q = int(input())
for q in range(Q):
    a,b = map(int,input().split(" "))
    if a > b:
        print("NO")
    else:
        can = True
        while b > 0:
            if lsb(b) < lsb(a) or a == 0:
                can = False
                break
            a -= lsb(a)
            b -= lsb(b)
        print("YES" if can else "NO")

1491E - Фиб-дерево

Setter: Widowmaker
Prepared by: Widowmaker and syksykCCC

Hint

Solution

Code (C++)

#include <bits/stdc++.h>
using namespace std;
const int N = 2e5 + 5;
int n, siz[N];
vector<int> fib;
vector<pair<int, bool> > G[N];
void NO() { cout << "NO\n"; exit(0); }
void GetSize(int u, int fa)
{
    siz[u] = 1;
    for(pair<int, bool> e : G[u])
    {
        if(e.second) continue;
        int v = e.first;
        if(v == fa) continue;
        GetSize(v, u);
        siz[u] += siz[v];
    }
}
void CutEdge(int u, int fa, int k, int &pu, int &pv, int &kd)
{
    for(pair<int, bool> e : G[u])
    {
        if(pu) return;
        if(e.second) continue;
        int v = e.first;
        if(v == fa) continue;
        if(siz[v] == fib[k - 1] || siz[v] == fib[k - 2])
        {
            pu = u; pv = v;
            kd = (siz[v] == fib[k - 1]) ? k - 1 : k - 2;
            break;
        }
        CutEdge(v, u, k, pu, pv, kd);
    }
}
void Check(int u, int k)
{
//    cout << "Check " << u << ' ' << k << endl;
    if(k <= 1) return;
    GetSize(u, 0);
    int pu = 0, pv = 0, kd = 0;
    CutEdge(u, 0, k, pu, pv, kd);
//    cout << pu << ' ' << pv << ' ' << kd << endl;
    if(!pu) NO();
    for(pair<int, bool> &e : G[pu])
        if(e.first == pv) e.second = true;
    for(pair<int, bool> &e : G[pv])
        if(e.first == pu) e.second = true;
    Check(pv, kd);
    Check(pu, (kd == k - 1) ? k - 2 : k - 1);
}
int main()
{
    ios::sync_with_stdio(false);
    cin >> n;
    fib.push_back(1);
    fib.push_back(1);
    for(int i = 1; ; i++)
    {
        if(fib[i] >= n) break;
        int new_fib = fib[i] + fib[i - 1];
        fib.push_back(new_fib);
    }
    for(int i = 1; i < n; i++)
    {
        int u, v;
        cin >> u >> v;
        G[u].push_back(make_pair(v, false));
        G[v].push_back(make_pair(u, false));
    }
    if(fib[fib.size() - 1] != n) NO();
    Check(1, fib.size() - 1);
    cout << "YES\n";
    return 0;
}

1491F - Магниты

Setter: 3.141592653 and star_xingchen_c
Prepared by: 3.141592653 and star_xingchen_c

Hint 1

Hint 2

Hint 3

Solution

Code (C++)

#include<bits/stdc++.h>
int n;
std::vector<int>ans,tmp,hlf;
int main()
{
    int T;
	scanf("%d",&T);
	for(;T--;)
	{
	    ans.clear(),tmp.clear(),hlf.clear();
    	scanf("%d",&n);
    	register int i,ii;
    	int sec=0;
    	for(i=2;i<=n;i++)
    	{
    		printf("? 1 %d\n%d\n",i-1,i);
    		for(ii=1;ii<i;ii++)printf("%d ",ii);
    		puts(""),fflush(stdout);
    		int f;
    		scanf("%d",&f);
    		if(f){sec=i;break;}
    	}for(i=sec+1;i<=n;i++)
    	{
    		printf("? 1 1\n%d\n%d\n",sec,i);
    		fflush(stdout);
    		int f;
    		scanf("%d",&f);
    		if(!f)ans.push_back(i);
    	}for(i=1;i<sec;i++)tmp.push_back(i);
    	while(tmp.size()>1)
    	{
    		int md=tmp.size()>>1;
    		hlf.clear();
    		for(i=1;i<=md;i++)
    			hlf.push_back(tmp.back()),tmp.pop_back();
    		printf("? 1 %d\n%d\n",md,sec);
    		for(int t:hlf)printf("%d ",t);
    		puts(""),fflush(stdout);
    		int f;
    		scanf("%d",&f);
    		if(f)tmp=hlf;
    	}
    	for(i=1;i<sec;i++)if(tmp[0]!=i)ans.push_back(i);
    	printf("! %u",ans.size());
    	for(int t:ans)printf(" %d",t);
    	puts(""),fflush(stdout);
	}
}

1491G - Обменяйте и переверните

Setter: errorgorn and oolimry
Prepared by: errorgorn

Hint 1

Hint 2

Hint 3

Solution

Code (C++)

//雪花飄飄北風嘯嘯
//天地一片蒼茫

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/rope>
using namespace std;
using namespace __gnu_pbds;
using namespace __gnu_cxx;
#define ll long long
#define ii pair<ll,ll>
#define iii pair<ii,ll>
#define fi first
#define se second
#define endl '\n'
#define debug(x) cout << #x << " is " << x << endl

#define pub push_back
#define pob pop_back
#define puf push_front
#define pof pop_front
#define lb lower_bound
#define up upper_bound

#define rep(x,start,end) for(auto x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

#define indexed_set tree<ll,null_type,less<ll>,rb_tree_tag,tree_order_statistics_node_update>
//change less to less_equal for non distinct pbds, but erase will bug

mt19937 rng(chrono::system_clock::now().time_since_epoch().count());

int n;
int arr[200005];
bool vis[200005];
vector<ii> ans;

void cswap(int i,int j){
	swap(arr[i],arr[j]);
	arr[i]=-arr[i],arr[j]=-arr[j];
	ans.pub(ii(i,j));
}

void swap_cyc(int i,int j){
	cswap(i,j);
	
	int curr=i;
	while (arr[-arr[curr]]>0){
		cswap(curr,-arr[curr]);
	}
	
	curr=-arr[curr];
	while (arr[-arr[curr]]>0){
		cswap(curr,-arr[curr]);
	}
	
	cswap(curr,-arr[curr]);
}

int main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	cin>>n;
	rep(x,1,n+1) cin>>arr[x];
	
	int p=-1;
	
	rep(x,1,n+1) if (!vis[x]){
		if (arr[x]==x) continue;
		
		int curr=x;
		do{
			vis[curr]=true;
			curr=arr[curr];
		} while (curr!=x);
		
		if (p==-1) p=x;
		else{
			swap_cyc(p,x);
			p=-1;
		}
	}
	
	if (p!=-1){
		bool can=false;
		rep(x,1,n+1) if (arr[x]==x){
			swap_cyc(p,x);
			can=true;
			break;
		}
		
		if (!can){
			int t1=arr[p],t2=arr[arr[p]];
			cswap(p,t1);
			swap_cyc(t1,t2);
		}
	}
	
	cout<<sz(ans)<<endl;
	for (auto &it:ans) cout<<it.fi<<" "<<it.se<<endl;
}

Code (Python)

from sys import stdin, stdout

n=int(stdin.readline())

#make 1-indexed
arr=[0]+list(map(int,stdin.readline().split()))

vis=[0]*(n+1)

ans=[]

def cswap(i,j):
    arr[i],arr[j]=-arr[j],-arr[i]
    ans.append((i,j))
    
def swap_cyc(i,j):
    cswap(i,j)
    
    curr=i
    while (arr[-arr[curr]]>0):
        cswap(curr,-arr[curr])
        
    curr=-arr[curr]
    while (arr[-arr[curr]]>0):
        cswap(curr,-arr[curr])
        
    cswap(curr,-arr[curr])
    
p=-1
for i in range(1,n+1):
    if (vis[i]==1): continue
    if (arr[i]==i): continue
    
    curr=i
    while (True):
        vis[curr]=1
        curr=arr[curr]
        if (curr==i): break
        
    if (p==-1): 
        p=i
    else:
        swap_cyc(p,i)
        p=-1

if (p!=-1):
    can=False
    for i in range(1,n+1):
        if (arr[i]==i):
            swap_cyc(p,i)
            can=True
            break
        
    if (can==False):
        t1,t2=arr[p],arr[arr[p]]
        cswap(p,t1)
        swap_cyc(t1,t2)
            
print(len(ans))
[print(i[0],i[1]) for i in ans]

1491H - Yuezheng Ling и динамическое дерево

Setter: Ynoi
Prepared by: Widowmaker and Ynoi

Hint 1

Solution

Code (C++)

#include <bits/stdc++.h>
#define N 100005
using namespace std;

template <typename T>

void read(T &a)
{
	T x = 0,f = 1;
	char ch = getchar();
	while (ch < '0' || ch > '9')
	{
		if (ch == '-') f = -1;
		ch = getchar();
	}
	while (ch >= '0' && ch <= '9')
	{
		x = (x << 1) + (x << 3) + (ch ^ 48);
		ch = getchar();
	}
	a = x * f;
}

template <typename T>

void write(T x)
{
	if (x < 0) putchar('-'),x = -x;
	if (x < 10) return (void) putchar(x + '0');
	write(x / 10);
	putchar(x % 10 + '0');
}

template <typename T>

void writeln(T x)
{
	write(x);
	putchar('\n');
}

template <typename T>

void writes(T x)
{
	write(x);
	putchar(' ');
}

template <typename T,typename... Args> 

void read(T &maxx,Args &... args)
{
	read(maxx);
	read(args...);
}

template <typename T,typename... Args>

void writeln(T maxx,Args ... args)
{
	writes(maxx);
	writes(args...);
	putchar('\n');
}

const int B = 300;
int n,q,a[N],num,bl[N],vis[N];
std::vector<int> v;

struct Block
{
	int f[B],lazy,flag,cnt,L,R;
	void update()
	{
		flag = 1;
		for (int i = L; i <= R; i++)
		{
			if (bl[i] != bl[a[i]]) f[i - L] = i;
			else f[i - L] = f[a[i] - L],flag = 0;
		}
	}
	void update(int l,int r,int x)
	{
		for (int i = l; i <= r; i++)
			a[i] = i == 1 ? 0 : max(a[i] - x,1);
		update();
	}
	void update(int x)
	{
		if (flag) lazy += x;
		else update(L,R,x);
	}
}T[(N - 1) / B + 5];

void update(int l,int r,int x)
{
	if (bl[l] == bl[r])
	{
		T[bl[l]].update(l,r,x);
		return ;
	}
	T[bl[l]].update(l,T[bl[l]].R,x);
	for (int i = bl[l] + 1; i < bl[r]; i++)
		T[i].update(x);
	T[bl[r]].update(T[bl[r]].L,r,x);
	// cout << l << ' ' << r << ' ' << x << endl;
	// cout << '0' << ' ';
	// for (int i = 2; i <= n; i++)
	// 	cout << max(a[i] - T[bl[i]].lazy,1) << ' ';
	// cout << endl;
}

void clear()
{
	for (int i = 0; i < v.size(); i++)
		vis[v[i]] = 0;
	v.clear();
}

int query(int x,int y)
{
	// cout << '0' << ' ';
	// for (int i = 2; i <= n; i++)
	// 	cout << max(a[i] - T[bl[i]].lazy,1) << ' ';
	// cout << endl;
	while (T[bl[x]].f[x - T[bl[x]].L] != T[bl[y]].f[y - T[bl[y]].L])
	{
		if (bl[x] < bl[y]) swap(x,y);
		// cout << x << ' ' << T[bl[x]].f[x - T[bl[x]].L] << endl;
		x = (T[bl[x]].f[x - T[bl[x]].L] == 1) ? 0 : max(a[T[bl[x]].f[x - T[bl[x]].L]] - T[bl[x]].lazy,1);
	}
	int qaq = (T[bl[x]].f[x - T[bl[x]].L] == 1) ? 0 : max(a[T[bl[x]].f[x - T[bl[x]].L]] - T[bl[x]].lazy,1);
	while (x != qaq)
	{
		assert(x != qaq);
		vis[x] = 1;
		v.push_back(x);
		x = (x == 1) ? 0 : max(a[x] - T[bl[x]].lazy,1);
	}
	while (y != qaq)
	{
		if (vis[y]) return clear(),y;
		y = (y == 1) ? 0 : max(a[y] - T[bl[y]].lazy,1);
	}
}

signed main()
{
	read(n,q);
	for (int i = 2; i <= n; i++) read(a[i]);
	num = (n - 1) / B + 1;
	for (int i = 1; i <= num; i++)
	{
		T[i].L = (i - 1) * B + 1;
		T[i].R = min(i * B,n);
		T[i].cnt = i * B - (i - 1) * B;
		T[i].flag = T[i].lazy = 0;
		for (int j = T[i].L; j <= T[i].R; j++)
			bl[j] = i;
	}
	for (int i = 1; i <= num; i++)
		T[i].update();
	while (q--)
	{
		int opt;
		read(opt);
		if (opt == 1)
		{
			int l,r,x;
			read(l,r,x);
			update(l,r,x);
		}
		if (opt == 2)
		{
			int u,v;
			read(u,v);
			writeln(query(u,v));
		}
	}
	// cerr << (double) clock() / CLOCKS_PER_SEC << endl;
	return 0;
}

1491I - Правитель зоопарка

Setter: oolimry
Prepared by: oolimry

As the solution to this problem is very long, the full editorial is split into $$$4$$$ parts. If you want to challenge yourself, you can try reading one part at a time and see if you get any inspiration. You can also try to read the specific hints for each part.

Part 1 Hint 1

Part 1

Part 2 Hint 1

Part 2 Hint 2

Part 2 Hint 3

Part 2

Part 3 Hint 1

Part 3 Hint 2

Part 3 Hint 3

Part 3 Hint 4

Part 3 Hint 5

Part 3 Hint 6

Part 3

As such, we have the following conclusions:

REDS are never created only destroyed
The order of NONREDS will not change unless a RED inserts itself within the sequence
REDS will never battle REDS (because of * assumption)

Since the number of times REDS can be destroyed is limited, we should exploit that. We define an event as either when a RED is destroyed or when the game ends. Since events can occur at most $$$O(n)$$$ times, we just need to find some way to quickly simulate the moves until the next event.

Let's visualize the moves like this: instead of the RED and NONRED swapping positions, the NONREDs are fixed in position, and the REDs move around anticlockwise.

After $$$n-1$$$ moves, every RED moves one space anticlockwise around the belt of NONREDs Then in that case, we just need to check how many sets of $$$n-1$$$ moves is needed until the first event occurs.

For convenience, let's split NONRED into BLUE and GREEN. If a NONRED beats a red in front of it, and it loses to next animal, then it is BLUE. If it wins one more time (and hence the entire game) it is GREEN.

Let's look at the conditions for the events:

A RED is destroyed. This occurs when $$$B_{nr} < A_{r}$$$ where $$$nr$$$ is any BLUE or GREEN
The game ends. This occurs when $$$B_{g} > A_{r}$$$ where $$$g$$$ is any GREEN

To find the first how many times the reds need to move anticlockwise, we could maintain a monotonic vector across the belt of NONREDs, and then for each RED, binary search the earliest position where an event occurs.

Finally, when we find the number of sets of $$$n-1$$$ we need to move backwards, we move everything manually, check if the game ends. Then we recompute the color of all animals, and repeat. If no event occurs, then it will repeat infinitely.

The step of finding the number of steps before first event occurs takes $$$O(nlogn)$$$ in total. Since REDs can disappear at most $$$O(n)$$$ times, then the total time complexity is $$$O(n^2logn)$$$.

It's worth noting that the constant factor for this algorithm is small as the number of REDs is $$$0.5n$$$ and binary search is a very fast log. Hence, it passes quite comfortably (author's solution is under 400ms) even though $$$n \leq 6000$$$.

Part 4

Code (C++)

#include <bits/stdc++.h>
using namespace std;
const int RED = 0, BLUE = 1, GREEN = 2;
const long long inf = 1e16;
typedef pair<long long, long long> ii;
typedef pair<long long, ii> iii;

long long n;
struct animal{ int a, b, c, id, pos, colour, redHere; };
vector<animal> belt;
vector<animal> arr;

vector<iii> s; ///strictly increasing stack
vector<ii> reds;
void insert(long long b, long long beltPos, long long pos){
	while(!s.empty() && s.back().first >= b) s.pop_back();
	s.push_back(iii(b,ii(beltPos, pos)));
}

long long totalMoves = 0;

void brute(){
	deque<animal> q;
	animal w = arr[0];
	
	for(int i = 1;i < n;i++){
		q.push_back(arr[i]);
	}

	long long count = 1;
	for(int x = 1;;x++){
		if(x > 2*n){
			return;
		}
		
		if(count == 3){
			cout << w.id << " " << x << "\n"; // << "\n" << (int)((clock() - start));
			exit(0);
		}
			
		if(count == 1){
			if(w.b > q.front().a){
				animal y = q.front();
				q.push_back(y);
				q.pop_front();
				count++;
			}
			else{
				q.push_back(w);
				w = q.front();
				q.pop_front();
				count = 1;
			}
		}
		else{
			if(w.c > q.front().a){
				animal y = q.front();
				q.push_back(y);
				q.pop_front();
				count++;
			}
			else{
				q.push_back(w);
				w = q.front();
				q.pop_front();
				count = 1;
			}
		}
	}
}

int main(){
	ios_base::sync_with_stdio(false); cin.tie(0); cout.tie(0);
	
	cin >> n;
	for(int i = 0;i < n;i++){
		int a, b, c; cin >> a >> b >> c;
		assert(a > b and b < c);
		arr.push_back({a,b,c,i,-1,1,-1});
	}
	
	if(arr[0].a > arr[1].a) swap(arr[0],arr[1]); ///settle first game
	arr.push_back(arr[0]); arr.erase(arr.begin());
	
	brute();
	
	///deciding which are RED
	for(int i = 1;i < n;i++){
		if(arr[i-1].b > arr[i].a){
			arr[i].colour = RED;
		}
	}
	
	///for non-RED, decide if it's BLUE or GREEN
	for(int i = 0;i < n;i++){
		arr[i].pos = i;
		
		if(arr[i].colour == RED) continue;
		int nxt = i+1; if(nxt == n) nxt = 0;
		if(arr[nxt].colour == RED) nxt++; if(nxt == n) nxt = 0;
		
		if(arr[i].c > arr[nxt].a) arr[i].colour = GREEN;
		else arr[i].colour = BLUE;
	}
	
	///get ready the belt
	arr.push_back(arr[0]);
	for(int i = 0;i < n;i++){
		if(arr[i].colour != RED){
			if(arr[i+1].colour == RED){
				arr[i].redHere = arr[i+1].pos;
			}
			belt.push_back(arr[i]);
		}
	}
	
	while(true){
		s.clear(); reds.clear();
		
		///Account for cyclic nature, so every element is inserted once, then later inserted again
		for(int i = 0;i < (int) belt.size();i++){
			animal A = belt[i];
			if(A.colour == BLUE) insert(A.b, A.pos, i);
			else insert(-inf, A.pos, i); ///GREEN trigger events regardless of the value of a
		}
				
		long long minMoves = inf;
		for(int i = 0;i < (int) belt.size();i++){
			animal A = belt[i];
			if(A.colour == BLUE) insert(A.b, A.pos, i);
			else insert(-inf, A.pos, i); ///GREEN trigger events regardless of the value of a
			
			///find the RED at that position, if any
			int red = A.redHere; if(red == -1) continue;
			reds.push_back(ii(red, i));
			
			///find the earliest event that triggers this RED (rightmost non-Red with nonRED.a < RED.b or rightmost GREEN)
			auto early = lower_bound(s.begin(), s.end(), iii(arr[red].a, ii(-1,-1)));
			if(early == s.begin()) continue; ///no event in this case
			early--;
			
			///Number of rotations before event (rotation = N-1 turns)
			long long distance = i - (early->second).second;
			if(distance < 0) distance += (belt.size());
			minMoves = min(distance, minMoves); ///find the minimum number of rotations
		}
		
		///No event can occur, answer last infiinite
		if(minMoves == inf){
			cout << "-1 -1";
			return 0;
		}
		else if(minMoves != 0){ ///Move the reds CCW along the belt
			totalMoves += minMoves * (n-1);
			for(ii R : reds) belt[R.second].redHere = -1;
			for(ii R : reds){
				int pos = R.second;
				pos -= minMoves; if(pos < 0) pos += belt.size();
				belt[pos].redHere = R.first;
			}
		}
		
		ii ans = ii(inf, inf);
		
		///Updating the belt
		for(int i = 0;i < (int) belt.size();i++){
			if(belt[i].redHere == -1) continue;
			animal R = arr[belt[i].redHere];
			if(belt[i].b < R.a){ ///BLUE type event occured
				belt[i].redHere = -1;		
				belt.insert(belt.begin()+(i+1), R); ///inserting in middle of vector
				
				///updating the colours of the surrounding non-REDs
				if(belt[i].c > R.a) belt[i].colour = GREEN;
				else belt[i].colour = BLUE;
				
				int nxt = i+2; if(nxt == (int) belt.size()) nxt = 0;
				if(belt[i+1].c > belt[nxt].a) belt[i+1].colour = GREEN;
				else belt[i+1].colour = BLUE;
			}
			else if(belt[i].colour == GREEN){ ///GREEN type event occured, game ends
				ans = min(ans, ii(totalMoves + R.pos + 2, belt[i].id));
			}
		}
		
		if(ans.first != inf){
			cout << ans.second << " " << ans.first;
			return 0;
		}
	}
}

Code (C++) (errorgorn)

//雪花飄飄北風嘯嘯
//天地一片蒼茫

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/rope>
using namespace std;
using namespace __gnu_pbds;
using namespace __gnu_cxx;
#define ll long long
#define ii pair<ll,ll>
#define iii pair<ii,ll>
#define fi first
#define se second
#define endl '\n'
#define debug(x) cout << #x << " is " << x << endl

#define pub push_back
#define pob pop_back
#define puf push_front
#define pof pop_front
#define lb lower_bound
#define up upper_bound

#define rep(x,start,end) for(auto x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

#define indexed_set tree<ll,null_type,less<ll>,rb_tree_tag,tree_order_statistics_node_update>
//change less to less_equal for non distinct pbds, but erase will bug

mt19937 rng(chrono::system_clock::now().time_since_epoch().count());

struct dat{
	int a,b,c;
	int idx;
};

int n;
dat head;
deque<dat> dq;

dat state[6005];
dat trans_belt[6005];
bool green[6005];

int curr;
ll moves=1;

void brute(){
	rep(x,0,1000000){
		if ((curr==0?head.a:(curr==1?head.b:head.c))>dq.front().a){
			dq.pub(dq.front());
			dq.pof();
			curr++;
			if (curr==3){
				cout<<head.idx<<" "<<moves<<endl;
				exit(0);
			}
		}
		else{
			dq.pub(head);
			head=dq.front();
			dq.pof();
			curr=1;
		}
		
		moves++;
		
		if (x>n && curr==1) break;
	}
}

int main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	cin.exceptions(ios::badbit | ios::failbit);
	
	cin>>n;
	
	rep(x,0,n){
		int a,b,c;
		cin>>a>>b>>c;
		
		if (!x) head={a,b,c,x};
		else dq.pub({a,b,c,x});
	}
	
	curr=0;
	brute();
	
	//convert to state
	state[0]=head;
	rep(x,1,n) state[x]=dq[x-1];
	
	while (true){
		//cout<<"move: "<<moves-1<<endl;
		//rep(x,0,n) cout<<state[x].idx<<" "; cout<<endl;
		
		vector<ii> stk;
		
		//find the first instance where state[z].b<state[x].a
		//maintain min stack of state[z].b
		
		//now, we focus on the belt itself
		//there is a case where the red does not convert to a anything but the game ends
		//we need to handle that case carefully!
		//i think lets just insert it into stack with value -1
		//then if it appears its easily handled
		
		memset(green,false,sizeof(green));
		
		int cnt=0;
		
		int pidx=-1;
		rep(x,0,n){
			if (x && state[x-1].b>state[x].a){ //red
				
			}
			else{ //not red
				if (pidx!=-1 && state[pidx].c>state[x].a){
					green[pidx]=true;
				}
				
				pidx=x;
			}
		}
		if (state[pidx].c>state[0].a) green[0]=true;
		
		rep(x,0,n){
			if (x && state[x-1].b>state[x].a){ //red
				
			}
			else{ //not red
				int temp=green[x]?-1:state[x].b;
				
				while (!stk.empty() && stk.back().fi>temp) stk.pob();
				stk.pub(ii(temp,cnt));
				cnt++;
			}
		}
		
		int best=1e9; //shortest distance until red becomes not red
		int cnt2=cnt;
		
		//cout<<"non-reds: "<<cnt<<endl;
		
		rep(x,0,n){
			if (x && state[x-1].b>state[x].a){ //red
				if (stk.front().fi>state[x].a) continue; //it wont become red here
				
				int lo=0,hi=sz(stk),mi;
				while (hi-lo>1){
					mi=hi+lo>>1;
					if (stk[mi].fi<state[x].a) lo=mi;
					else hi=mi;
				}
				
				//cout<<"debug: "<<x<<endl;
				//for (auto &it:stk) cout<<it.fi<<"_"<<it.se<<" "; cout<<endl;
				//cout<<state[x].a<<endl;
				
				//cout<<"found: "<<pos[stk[hi]]<<endl;
				int dist=cnt2-stk[lo].se;
				//cout<<dist<<endl;
				if (dist<best){
					best=dist;
				}
				
				//cout<<endl;
				
			}
			else{ //not red
				int temp=green[x]?-1:state[x].b;
				
				while (!stk.empty() && stk.back().fi>temp) stk.pob();
				stk.pub(ii(temp,cnt2));
				cnt2++;
			}
		}
		
		//cout<<"number in belt: "<<cnt<<endl;
		//cout<<"d: "<<best<<" "<<idx<<endl;
		
		if (best==1e9){
			cout<<"-1 -1"<<endl;
			return 0;
		}
		
		//we simulate best-1 cycles for the reds
		
		best--;
		
		//cout<<"hmm: "<<endl;
		//cout<<best<<" "<<cnt<<endl;
		if (best>=0){
			moves+=best*(n-1);
			
			//cout<<endl;
			cnt2=0;
			rep(x,0,n){
				if (x && state[x-1].b>state[x].a){ //red
				
				}
				else{ //not red
					trans_belt[cnt2]=state[x];
					cnt2++;
				}
			}
			
			//rep(x,0,cnt2) cout<<trans_belt[x].idx<<" "; cout<<endl;
			
			rep(x,n,0){
				if (x && state[x-1].b>state[x].a){ //red
					
				}
				else{ //not red
					cnt2--;
					//cout<<x<<" "<<(cnt2-best+cnt)%cnt<<endl;
					state[x]=trans_belt[(cnt2-best+cnt)%cnt];
				}
			}
		}
		
		//cout<<cnt<<endl;
		//cout<<"move: "<<moves-1<<endl;
		//rep(x,0,n) cout<<state[x].idx<<" "; cout<<endl;
		
		head=state[0];
		dq.clear();
		rep(x,1,n) dq.pub(state[x]);
		
		curr=1;
		brute();
		
		//convert to state
		state[0]=head;
		rep(x,1,n) state[x]=dq[x-1];
		
		//cout<<"move: "<<moves-1<<endl;
		//rep(x,0,n) cout<<state[x].idx<<" "; cout<<endl;
		//cout<<endl;
		
		//break;
	}
}

Full text and comments »

Tutorial of Codeforces Global Round 13

#global round 13, global round, editorial

+255

errorgorn
3 years ago
188

Codeforces Raif Round 1 [Div. 1 + Div. 2]

By errorgorn, 4 years ago, In English

Hello, Codeforces!

Welcome to the Codeforces Raif Round 1 (Div. 1 + Div. 2) supported by Raiffeisenbank, that will start on Oct/17/2020 16:05 (Moscow time). It will be a combined rated round for both divisions. Note that the start time is unusual.

All problems were authored and prepared by bensonlzl, oolimry, errorgorn, dvdg6566, shenxy13.

Ari gato to:

isaf27 and KAN for round coordination and help with preparation
zhangguangxuan99 for being epic IOI trainer <3
Our army of testers: Ari, KAN, Monogon, prabowo, SYY, SleepyShashwat, TeaTime, Tlatoani, -rs-, agul, cstuart, dantoh, jhkoh, kai824, kymmykym, lperovskaya, mango_lassi, morzer, Retired_cherry, socho, zhangguangxuan99, zscoder for their invaluable advice and suggestions.
MikeMirzayanov for great systems Codeforces and Polygon.

You will be given 8 problems, one of which would be divided into easy and hard versions, and 150 minutes to solve them.

We hope that statements are short and pretests are strong and that you find the problems interesting! Good luck, have fun and we wish everyone high ratings!

The scoring distribution will be announced closer to the beginning of the round.

Thanks to Raiffeisenbank, winners will get awesome swag:

1st-3rd place = Bluetooth speaker
4th-10th place = Bumbag
11th-20th place = Power Bank

3HCumg.md.png

Random 50 participants outside of top-20, who solved at least one problem will receive:

Thermos Mugs
Raiffeisenbank t-shirt

About Algorithmic Trading team in Raiffeisenbank

We develop a high-frequency trading (HFT) system for equity, currency and derivative markets. Our business edge is in technology. The main goal is to create a top-notch platform based on fundamental and statistical models and machine learning, with low latency and high throughput. The efficiency and scalability of our code give us a competitive advantage. We are passionate about code quality and strive for highest standards in product development.

If you are interested in internship and employment opportunities in the Raiffeisenbank algo-trading team Capital Markets, or want to get in touch with the bank's recruitment , fill out a form below.

FILL OUT FORM →

UPD: Scoring distribution: 500 — 1000 — 1000 — 1500 — 1750 — 1750 — (2250+750) — 4000

UPD2: Editorial out!

UPD 3: First ACs and winners

First ACs

A: 300iq

B: icecuber

C: Not-Afraid

D: Radewoosh

E: Errichto

F: fmota

G: Radewoosh

H: Radewoosh

Top 5

Congratulations to everyone!

Full text and comments »

Announcement of Codeforces Raif Round 1 (Div. 1 + Div. 2)

+1254

errorgorn
4 years ago
270

Tutorial on Permutation Tree (析合树）

By errorgorn, 4 years ago, In English

So my O level Chinese exam is in 2 days so I decided to learn a data structure that I can only find resources for in Chinese. I thought I might as well write a tutorial in English.

This data structure is called 析合树, directly translated is cut join tree, but I think permutation tree is a better name. Honestly, after learning about it, it seems like a very niche data structure with very limited uses, but anyways here is the tutorial on it.

Thanks to dantoh and oolimry for helping me proofread.

Motivation

Consider this problem. We are given a permutation,$$$P$$$ of length $$$n$$$. A good range is a contiguous subsequence such that $$$\max\limits_{l \leq i \leq r} P_i - \min\limits_{l \leq i \leq r} P_i = r-l$$$. This can be thought of the number of contiguous subsequence such that when we sort the numbers in this subsequence, we get contiguous values. Count the number of good ranges.

Example: $$$P=\{5,3,4,1,2\}$$$.

All good ranges are $$$[1,1], [2,2], [3,3], [4,4], [5,5], [2,3], [4,5], [1,3], [2,5], [1,5]$$$.

The $$$O(n^2)$$$ solution for this is using sparse table and checking every subsequence if it fits the given conditions. But it turns out we can speed this up using permutation tree to $$$O(n\log{n})$$$.

Definitions

A permutation $$$P$$$ of length $$$n$$$ is defined as:

$$$|P|=n$$$
$$$\forall i, P_i \in [1,n]$$$
$$$\nexists i,j \in [1,n], P_i \ne P_j$$$

A good range is defined as a range, $$$[l,r]$$$ such that $$$\max\limits_{l \leq i \leq r} P_i - \min\limits_{l \leq i \leq r} P_i = r-l$$$ or equivalently $$$\nexists x,z \in [l,r], y \notin [l,r], P_x<P_y<P_z$$$.

We denote a good range $$$[l,r]$$$ of $$$P$$$ as $$$(P, [l,r])$$$, and also denote the set of all good ranges as $$$I_g$$$.

Permutation Tree

So we want a structure that can store all good ranges efficiently.

Firstly, we can notice something about these good ranges. They are composed by the concatenation of other good ranges.

So the structure of the tree is that a node can have some children and the range of the parent is made up of the concatenation of the children's ranges.

Here is an example permutation. $$$P=\{9,1,10,3,2,5,7,6,8,4\}$$$.

As we can see from the above image, every node represents a certain good range, where the values in the node represent the minimum and maximum values contains in this range.

Notice in this data structure, for any 2 nodes $$$[l_1,r_1]$$$ and $$$[l_2,r_2]$$$, WLOG $$$l_1 \leq l_2$$$, either $$$r_1<l_2$$$ or $$$r_2 \leq r_1$$$.

Definition of Cut Nodes and Join Nodes

We shall define some terms used in this data structure:

Node range: For some node $$$u$$$, $$$[u_l,u_r]$$$ will describe the minimum and maximum value contained in the range the node represents
Ranges of children: For some non-leaf node $$$u$$$, let the array $$$S_u$$$ denote the array of the ranges of its children. For example, the root node the above picture, $$$S_u$$$ is $$$\{[9,9],[1,1],[10,10],[2,8]\}$$$.
Order of children: For some non-leaf node $$$u$$$, we can discretize the ranges in $$$S_u$$$. Again using the example of the root node, the order of its children is $$$\{3,1,4,2\}$$$, we will call this $$$D_u$$$.
Join node: For some non-leaf node $$$u$$$, we call it a join node if $$$D_u=\{1,2,3,\cdots\}$$$ or $$$D_u=\{\cdots,3,2,1\}$$$. For simplicity we also consider all leaf nodes to be join nodes.
Cut node: Any node that is not a join node.

Properties of Cut Nodes and Join Nodes

Firstly, we have this very trivial property. The union of all ranges of children is the node's range. Or in fancy math notation, $$$\bigcup_{i=1}^{|S_u|} S_u[i]=[u_l,u_r]$$$.

For a join node $$$u$$$, any contiguous subsequence of ranges of its children is a good range. Or, $$$\forall i,j,1 \leq i \leq j \leq |S_u|, \bigcup_{i=l}^{r} S_u[i]\in I_g$$$.

For a cut node $$$u$$$, the opposite is true. Any contiguous subsequence of ranges of its children larger than 1 is not a good range. Or, $$$\forall i,j,1 \leq i < j \leq |S_u|, \bigcup_{i=l}^{r} S_u[i]\notin I_g$$$.

The property of join nodes is not too hard to show by looking at the definition of what a join node is.

But the property of cut nodes is slightly harder to prove. A way to think about this is that for some cut node such that there is a subsequence of ranges bigger than 1 that form a good range, then that subsequence would have formed a range. This is a contradiction.

Construction of Permutation Tree

Now we will discuss a method to create the Permutation Tree in $$$O(n\log{n})$$$. According to a comment by CommonAnts, the creator of this data structure, a $$$O(n)$$$ algorithm exists, but I could not find any resources on it.

Brief overview of algorithm

We process the permutation from left to right. We will also keep a stack of cut and join nodes that we have processed previously. Now let us consider adding $$$P_i$$$ to this stack. We firstly make a new node $$$[P_i,P_i]$$$ and call it the node we are currently processing.

Check if we can add the currently processed as a child of the node on top of the stack.
If we cannot, check if we can make a new parent node (this can either be a cut or join node) such that it contains some suffix of the stack and the current processed node as children.
Repeat this process until we cannot do any more operations of type 1 or 2.
Finally, push the currently processed node to the stack.

Notice that after processing all nodes, we will only have 1 node left on the stack, which is the root node.

Details of the algorithm

For operation 1, if we note that we can only do this if the node on top of the stack is a join node. Because if we can add this as a child to a cut node, then it contradicts the fact that no contiguous subsequence of ranges of children larger than 1 of a cut node can be a good range.

For operation 2, we need a fast way to find if there exists a good range such that we can make a new node from. There are 3 cases:

We cannot make a new node.
We can make a new join node. This new node has 2 children.
We can make a new cut node.

Checking if there exists a good range

We have established for a good range $$$(P,[l,r])$$$ that $$$\max\limits_{l \leq i \leq r} P_i - \min\limits_{l \leq i \leq r} P_i = r-l$$$.

Since $$$P$$$ is a permutation, we also have $$$\max\limits_{l \leq i \leq r} P_i - \min\limits_{l \leq i \leq r} P_i \geq r-l$$$ for all ranges $$$[l,r]$$$.

Equivalently, we have $$$\max\limits_{l \leq i \leq r} P_i - \min\limits_{l \leq i \leq r} P_i - (r-l) \geq 0$$$, where we have equality only for good ranges.

Say that we are currently processing $$$P_i$$$. We define a value $$$Q$$$ for each range $$$[j,i], Q_j=\max\limits_{j \leq k \leq i} P_k - \min\limits_{j \leq k \leq i} P_k - (i-j),0< j \leq i$$$. Now we just need to check if there is some $$$Q_j=0$$$, where $$$j$$$ is not in the current node being processed.

Now we only need to know how to maintain this values of $$$Q_j$$$ quickly when we transition from $$$P_i$$$ to $$$P_{i+1}$$$. We can do this by updating the max and min values every time it changes. How can we do this?

Let's focus on updating the max values since updating the min values are similar. Let's consider when the max value will change. It changes every time $$$P_{i+1} > \max $$$. Let us maintain a stack of the values of $$$\max\limits_{j \leq k \leq i}P_k$$$, where we will store distinct values only. It can be seen that this stack is monotonically decreasing. When we add a new element to this stack, we will pop all elements in the stack which are smaller than it and update their maximum values using a segment tree range add update. This amortizes to $$$O(n)$$$ as each value is pushed into the stack once.

Do note to decrement all $$$Q_j$$$ by 1 since we are incrementing $$$i$$$ by 1.

Now that we can maintain all values of $$$Q_j$$$, we can simply check the minimum value of the range we are interested in using segment tree range minimum queries.

If we can make a new cut node, then we greedily try to make new cut node. We can do this by adding another node from our stack until our new cut node is valid.

Since the above may be confusing, here is a illustration of how the construction looks like.

Problems using Permutation Tree

Codeforces 526F – Pudding Monsters

Idea

Code

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/rope>
using namespace std;
using namespace __gnu_pbds;
using namespace __gnu_cxx;
#define ll long long
#define ii pair<ll,ll>
#define iii pair<ii,ll>
#define fi first
#define se second
#define endl '\n'
#define debug(x) cout << #x << " is " << x << endl;

#define rep(x,start,end) for(auto x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

ll MAX(ll a){return a;}
ll MIN(ll a){return a;}
template<typename... Args>
ll MAX(ll a,Args... args){return max(a,MAX(args...));}
template<typename... Args>
ll MIN(ll a,Args... args){return min(a,MIN(args...));}

#define indexed_set tree<ll,null_type,less<ll>,rb_tree_tag,tree_order_statistics_node_update>

mt19937 rng(chrono::system_clock::now().time_since_epoch().count());

struct node{
	int s,e,m;
	ll val=0,lazy=0,num;
	node *l,*r;
	
	node (int _s,int _e){
		s=_s,e=_e,m=s+e>>1;
		num=e-s+1;
		
		if (s!=e){
			l=new node(s,m);
			r=new node(m+1,e);
		}
	}
	
	void propo(){
		if (lazy){
			val+=lazy;
			if (s!=e){
				l->lazy+=lazy;
				r->lazy+=lazy;
			}
			lazy=0;
		}
	}
	
	void update(int i,int j,ll k){
		if (s==i && e==j) lazy+=k;
		else{
			if (j<=m) l->update(i,j,k);
			else if (m<i) r->update(i,j,k);
			else l->update(i,m,k),r->update(m+1,j,k);
			
			l->propo(),r->propo();
			
			val=min(l->val,r->val);
			num=(l->val==val?l->num:0)+(r->val==val?r->num:0);
		}
	}
	
	ll query(int i,int j){
		propo();
		
		if (s==i && e==j){
			if (val==0) return num;
			else return 0;
		}
		else if (j<=m) return l->query(i,j);
		else if (m<i) return r->query(i,j);
		else return l->query(i,m)+r->query(m+1,j);
	}
}*root=new node(0,300005);

int n;
int arr[300005];

int main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	
	cin>>n;
	rep(x,0,n){
		int a,b;
		cin>>a>>b;
		arr[a-1]=b;
	}
	
	vector<int> mx={-1},mn={-1};
	ll ans=0;
	
	rep(x,0,n){
		while (mx.back()!=-1 && arr[mx.back()]<arr[x]){
			int temp=mx.back();
			mx.pop_back();
			root->update(mx.back()+1,temp,arr[x]-arr[temp]);
		}
		mx.push_back(x);
		
		while (mn.back()!=-1 && arr[mn.back()]>arr[x]){
			int temp=mn.back();
			mn.pop_back();
			root->update(mn.back()+1,temp,arr[temp]-arr[x]);
		}
		mn.push_back(x);
		
		ans+=root->query(0,x);
		
		root->update(0,x,-1);
	}
	
	cout<<ans<<endl;
}

CERC 17 Problem I – Instrinsic Interval

Idea

Code

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/rope>
using namespace std;
using namespace __gnu_pbds;
using namespace __gnu_cxx;
#define ll long long
#define ii pair<ll,ll>
#define iii pair<ii,ll>
#define fi first
#define se second
#define endl '\n'
#define debug(x) cout << #x << " is " << x << endl;

#define rep(x,start,end) for(auto x=(start)-((start)>(end));x!=(end)-((start)>(end));((start)<(end)?x++:x--))
#define all(x) (x).begin(),(x).end()
#define sz(x) (int)(x).size()

ll MAX(ll a){return a;}
ll MIN(ll a){return a;}
template<typename... Args>
ll MAX(ll a,Args... args){return max(a,MAX(args...));}
template<typename... Args>
ll MIN(ll a,Args... args){return min(a,MIN(args...));}

#define indexed_set tree<ll,null_type,less<ll>,rb_tree_tag,tree_order_statistics_node_update>

mt19937 rng(chrono::system_clock::now().time_since_epoch().count());

struct node{
	int s,e,m;
	ll val=0,lazy=0;
	node *l,*r;
	
	node (int _s,int _e){
		s=_s,e=_e,m=s+e>>1;
		
		if (s!=e){
			l=new node(s,m);
			r=new node(m+1,e);
		}
	}
	
	void propo(){
		if (lazy){
			val+=lazy;
			if (s!=e){
				l->lazy+=lazy;
				r->lazy+=lazy;
			}
			lazy=0;
		}
	}
	
	void update(int i,int j,ll k){
		if (s==i && e==j) lazy+=k;
		else{
			if (j<=m) l->update(i,j,k);
			else if (m<i) r->update(i,j,k);
			else l->update(i,m,k),r->update(m+1,j,k);
			
			l->propo(),r->propo();
			val=min(l->val,r->val);
		}
	}
	
	ll query(int i,int j){
		propo();
		
		if (s==i && e==j) return val;
		else if (j<=m) return l->query(i,j);
		else if (m<i) return r->query(i,j);
		else return min(l->query(i,m),r->query(m+1,j));
	}
};

int n,q;
int arr[100005];
ii range[200005];
ii span[200005];
vector<int> children[200005];
int parent[200005];
int typ[200005];
int idx; //new index to assign to nodes

ii get_range(ii i,ii j){
	return ii(min(i.fi,j.fi),max(i.se,j.se));
}

void add_edge(int u,int v){ //u is parent of v
	parent[v]=u;
	children[u].push_back(v);
}

bool adj(int i,int j){
	return range[i].se==range[j].fi-1;
}

int length(int i){
	return range[i].se-range[i].fi+1;
}

void build(){
	idx=n;
	memset(parent,-1,sizeof(parent));
	
	node *root=new node(0,100005);
	vector<int> mx={-1},mn={-1}; //stacks for max and min
	
	vector<int> nodes; //stack of cut and join nodes
	
	rep(x,0,n){
		//update Q values
		while (mx.back()!=-1 && arr[mx.back()]<arr[x]){
			int temp=mx.back();
			mx.pop_back();
			root->update(mx.back()+1,temp,arr[x]-arr[temp]);
		}
		mx.push_back(x);
		
		while (mn.back()!=-1 && arr[mn.back()]>arr[x]){
			int temp=mn.back();
			mn.pop_back();
			root->update(mn.back()+1,temp,arr[temp]-arr[x]);
		}
		mn.push_back(x);
		
		//handle stack updates
		range[x]=ii(arr[x],arr[x]);
		span[x]=ii(x,x);
		int curr=x;
		
		while (true){
			if (!nodes.empty() && (adj(nodes.back(),curr) || adj(curr,nodes.back()))){
				if ((adj(nodes.back(),curr) && typ[nodes.back()]==1)||
				  (adj(curr,nodes.back()) && typ[nodes.back()]==2)){
					add_edge(nodes.back(),curr);
					
					range[nodes.back()]=get_range(range[nodes.back()],range[curr]);
					span[nodes.back()]=get_range(span[nodes.back()],span[curr]);
					
					curr=nodes.back();
					nodes.pop_back();
				}
				else{ //make a new join node
					typ[idx]=(adj(nodes.back(),curr) ? 1:2);
					add_edge(idx,nodes.back());
					add_edge(idx,curr);
					
					range[idx]=get_range(range[nodes.back()],range[curr]);
					span[idx]=get_range(span[nodes.back()],span[curr]);
					
					nodes.pop_back();
					curr=idx++;
				}
			}
			else if (x-(length(curr)-1) && root->query(0,x-length(curr))==0){
				int len=length(curr);
				ii r=range[curr];
				ii s=span[curr];
				
				add_edge(idx,curr);
				
				do{
					len+=length(nodes.back());
					r=get_range(r,range[nodes.back()]);
					s=get_range(s,span[nodes.back()]);
					
					add_edge(idx,nodes.back());
					
					nodes.pop_back();
				} while (r.se-r.fi+1!=len);
				
				reverse(all(children[idx]));
				range[idx]=r;
				span[idx]=s;
				curr=idx++;
			}
			else{
				break;
			}
		}
		
		nodes.push_back(curr);
		root->update(0,x,-1);
	}
}

int tkd[200005][20];

void dfs(int i){
	for (auto &it:children[i]){
		int curr=tkd[it][0]=i;
		for (int x=0;curr!=-1;x++){
			curr=tkd[it][x+1]=tkd[curr][x];
		}
		
		dfs(it);
	}
}

int main(){
	ios::sync_with_stdio(0);
	cin.tie(0);
	cout.tie(0);
	
	cin>>n;
	rep(x,0,n) cin>>arr[x];
	
	build();
	
	memset(tkd,-1,sizeof(tkd));
	rep(x,0,idx){
		if (parent[x]==-1) dfs(x);
	}
	
	cin>>q;
	int a,b;
	while (q--){
		cin>>a>>b;
		
		if (a==b){
			cout<<a<<" "<<b<<endl;
			continue;
		}
		
		a--,b--;
		int curr=a;
		
		rep(x,20,0){
			if (tkd[curr][x]!=-1 && span[tkd[curr][x]].se<b) curr=tkd[curr][x];
		}
		
		curr=tkd[curr][0];
		if (typ[curr]==0) cout<<span[curr].fi+1<<" "<<span[curr].se+1<<endl;
		else{
			int lo=-1,hi=sz(children[curr]);
			
			rep(x,20,0){
				if (lo+(1<<x)<sz(children[curr]) && span[children[curr][lo+(1<<x)]].se<a) lo+=(1<<x);
				if (0<=hi-(1<<x) && b<span[children[curr][hi-(1<<x)]].fi) hi-=(1<<x);
			}
			
			cout<<span[children[curr][lo+1]].fi+1<<" "<<span[children[curr][hi-1]].se+1<<endl;
		}
	}
	
}

Codeforces 997E – Good Subsegments

Codeforces 1205F – Beauty of a Permutation

CodeChef – Army of Me

CodeChef – Good Subsequences

Full text and comments »

#permutation, #tutorial

+448

errorgorn
4 years ago
25

Sumitomo Mitsui Trust Bank Programming Contest 2019 — Unofficial Editorial

By errorgorn, history, 4 years ago, In English

Sumitomo Mitsui Trust Bank Programming Contest 2019 has just finished, and this is an unofficial editorial.

Thanks to my friends oolimry and shenxy13 for helping write some of the editorial.

A — November 30

You can solve it simply by checking for each end date of the Gregorian calendar. However, note that as the second date directly follows the first date (a fact which I think is not clear in the English translation), we can also check whether they're in different months, or whether the second date is the first day of a month. This can be done in constant time.

Code

B — Tax Rate

We note that there is monotonicty in this question. $$$\lfloor 1.08(X+1) \rfloor \geqslant \lfloor 1.08X \rfloor$$$. Hence, we can binary search the answer. When we binary search the value of X(the answer), if $$$\lfloor 1.08X \rfloor = N$$$, then we have our answer. Otherwise, we can search higher if $$$\lfloor 1.08X \rfloor > N$$$ and search lower otherwise. If we find that no number gives us desired N, then it is impossible.

Code

C — 100 to 105

We can simply do a 0-infinity knapsack with weights 100,101,...,105 and check if some value is reachable. We get a time complexity of $$$O(N)$$$.

Code

D — Lucky PIN

First, we note that there are $$$O(N^{3})$$$ subsequences of the string, so generating all of them and using a set to check for number of distinct subsequences is TLE. However, there are only at most 1000 distinct such subsequences, from 000 to 999. We can linearly scan through the string for each of these possible subsequences to check if it is actually a subsequence of the string in $$$O(N)$$$. Thus, this can be solved in $$$O(1000N)$$$, which is AC.

Code

E — Colorful Hats 2

Firstly, we can imagine there are 3 imaginary people standing at the very front, each with a different colour hat. For each person, we consider how many possible people could be the first person in front of him with the same hat colour. If the current person has number X, then the number of ways is:

(no. of ppl with X-1 in front) — (no. of ppl with X in front)

This is because those with X in front of him must be paired with one of the X-1 already, so this reduces our options.

Implementation wise, we can store an array which keeps track of the number of people with no. X who are not paired yet. Initially, all values are 0, except at index -1 with the value of 3. Then when processing the current p[user:AtillaAk]erson X, we multiply the answer by arr[X-1], decrease arr[X-1] by 1, and increase arr[X] by 1.

Code

F — Interval Running

Firstly, if $$$T_{1}A_{1}+T_{2}A_{2}=T_{1}B_{1}+T_{2}B_{2}$$$, the answer is infinity.

Else, WLOG, we let $$$T_{1}A_{1}+T_{2}A_{2} > T_{1}B_{1}+T_{2}B_{2}$$$. If $$$A_{1} > B_{1}$$$, then Takahashi and Aoki will never meet each other. The answer is 0. Now, we have solved all the corner cases. We shall move on to the general case. We shall call the period $$$T_{1}+T_{2}$$$. Now, we shall find the number of periods where Takahashi and Aoki meet each other. If we do some math, we get the number of periods to be $$$\lceil \frac{T_{1}(B_{1}-A_{1})}{(T_{1}A_{1}+T_{2}A_{2})-(T_{1}B_{1}+T_{2}B_{2})} \rceil$$$.

The number of times that Takahashi and Aoki meet each other is $$$2periods-1$$$ since every period they meet each other twice when Aoki overtakes Takahashi and Takahashi overtakes Aoki. We need to minus 1 since we do not count the first time they meet each other at the very start. We submit this and... WA.

Yes, we need to think about the case where $$$\frac{T_{1}(B_{1}-A_{1})}{(T_{1}A_{1}+T_{2}A_{2})-(T_{1}B_{1}+T_{2}B_{2})}$$$ is a whole number. In this case, we need to add one, as there will be one more time where Aoki will meet up with Takahashi but never overtake him. And now we get AC. (Thanks to Atill83 for pointing out the mistake)

Code

Full text and comments »

#atcoder

errorgorn
4 years ago
22

[Help] How to interpolate on lagrange (or aliens) speedup trick?

By errorgorn, history, 5 years ago, In English

Example problem: (I dont have the link as this problem only exists in my school's private judge)

We have an array of n positive intergers and we need to group them into k segments such that each element of the array is in a segment and that all elements in a segment are contigous. The cost of a segment is the sum of the numbers in the segment squared.

We need to minimize the total cost of all the segments.

Example 1, we have n=5 and k=3. Where the array is [1,2,3,4,5]. The optimal answer is by grouping (1+2+3) , (4) and (5) into different segments. The total cost is 6^2 + 4^2 + 5^2 =77.

Example 2, we have n=4, k=2. Where the array is [1,1000000,3,4]. The optimal answer is by grouping (1,1000000) and (3,4) into different segments. The total cost is 1000001^2+7^2=10000200050

Now I asked some people and they said this can be solved by doing the lagrange (or aliens) speedup trick. We shall define a cost C. Then do a convex hull trick where we try to minimize the value of all our segments but we also add C to our cost whenever we make a new segment. Now, if C is INF, then we will only have 1 segment, while if C is 0 we will have n segments. So people claim that we can binary search on C to find some C where there will be K segments existing.

This allowed me to AC the problem, but my solution was not always correct as the test cases were weak. I thought of this test case: n=4, k=3. Where is array is [1,1,1,1] Now, when C is 2, we get 4 segments. But when C is 3, we get 2 segments only. Therefore when I ran my code against this case my code printed 8, even though the answer was (1+1)^2+1^2+1^2=6.

So I think I need someway to iterpolate the lagrange speedup trick. Anyone can help? My code is here.

For my code, the input format is:

n k

a1 a2 a3 a3... an

Where ai is the ith element in the array.

Full text and comments »

convex hull optimization, lagrange, #help me

errorgorn
5 years ago
9

Make a discord server for rounds

Ensure that round is ready before even scheduling the contest

WE NEED MORE COORDINATORS

Splay Trees

Link-Cut Trees

$$$O(\log^2 n)$$$ Time Complexity Bound

$$$O(\log n)$$$ Time Complexity Bound

Tree Path Queries

HLD + Segment Tree

Worst Case for HLD

Really Fast Segment Trees

Link-Cut Tree

HLD + Splay Tree

Balanced HLD

Path and Subtree Queries

Benchmarks

1/3 Centroid Decomposition

References

The actual algorithm

Finding $$$h$$$

Handling Repeated Powers

Time Complexity

Mapping 1

Lemma

Case 1

Case 2

Mapping 2

Generating Balanced Bracket Sequences

A Surprisingly Fast Generation

References

Random Sidetrack

What is your philosophy on d2ABs?

Why should we care about the quality of d2AB if most div 1 competitors forget them after 5 mins anyways?

What things do you like/dislike about d2ABs on codeforces?

Do you think d2ABs must necessarily be short implementation?

Some high rated users have difficulty differentiating d2As-d2Cs what would be a good rule of thumb to determine the correct position of a easy problem?

Do you prefer div 2 rounds or educational rounds more? Why?

Ratings of d2ABs

Full Comments about each Problem

Poll

Results of Poll

Prerequisites

Knapsack

Unbounded Knapsack

Subset Sum

Subset Sum Speedup 1

Subset Sum Speedup 2

Knapsack Speedup 1

(max,+) convolution

Knapsack Speedup 2

Knapsack Speedup 3

References

Introduction

The Algorithm

Case 1: $$$V_i=0$$$

Case 2: lines 7 to 11

Case 3: lines 12 to 14

Case 4: lines 15 to 23

Case 1: lines 7 to 11

Case 2: lines 12 to 14

Case 3: lines 15 to 23

Case 1: lines 7 to 11

Case 2: lines 12 to 14

Case 3: lines 15 to 23

Complexity

Extension to Handling Elements of Arbitrary Finite Abelian Group

Generalization to Euclidean Domains

Open Problems

Implementation

FWHT

Kronecker Product

Generalization of Xor

Generalization of And/Or

Subset Sum Convolution

More Trash

References

Method 1

Method 2

1526C1 - Potions (Easy Version) and 1526C2 - Potions (Hard Version)

Method 1 — DP