Detailed Editorial for problem "The Struggle" from XXII Opencup, Grand Prix of XiAn

Hello, Codeforces!

"The Struggle" (Codeforces Gym 103329F) is a problem I authored which appeared in the HDU Multi-university Training, the Ptz Summer Camp and the Open Cup. Despite appearing in contests with a total of roughly 1300 three-person teams, I know of few people (possibly no more than 5) who have learned and independently implemented the solution.

The problem is a lot of fun and the solution is quite easy to implement (the actual implementation is < 2kb). hos_lyric said that this is a good problem! From this blog you will learn how the algorithm works and how to implement the solution. There shall be no more mystery, and you will be able to solve, today, this OpenCup problem that few people have solved!

The problem statement is very simple: Given an ellipse $$$E$$$ that is contained in $$$(0,4 \times 10^6) \times (0,4 \times 10^6)$$$, calculate the value $$$\sum_{(x, y) \in E}(x \oplus y)^{33} x^{-2} y^{-1} \mod 10^9+7$$$ over all integer points $$$(x,y)$$$. In this problem, $$$\oplus$$$ is the bitwise XOR operation.

While the solution does not seem to be obvious, we shall first consider an easier case: how should we compute $$$\sum_{x = 0}^{2^n-1}\sum_{y = 0}^{2^n-1} (x \oplus y)^{33} x^{-2} y^{-1} \mod 10^9+7$$$? That is, if the area is a square $$$[0,2^n-1] \times [0,2^n-1]$$$, how do we calculate the value? (For our purposes we shall consider $$$0^{-2} = 0^{-1} \equiv 0 \mod 10^9+7$$$.)

This is quite simple! It can be done in $$$O(N \log N)$$$ time for side length $$$N = 2^n$$$, using an algorithm called the "Fast Walsh Hadamard Transform", also known as FWHT, FWT or fast xor convolution. The convolution calculates $$$c_i = \sum_{x = 0}^{2^n-1}\sum_{y = 0}^{2^n-1} [x \oplus y = i]a_xb_y$$$. If we set $$$a_i = i^{-2}$$$ and $$$b_i = i^{-1}$$$, we can calculate $$$\sum_{i = 0}^{2^n-1} c_i \times i^{33}$$$ and this will be the answer for our easier case.
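
To make the easier case concrete, here is a minimal sketch of it, using the weights $$$x^{-2} y^{-1}$$$ from the problem statement and the convention that inverse powers of 0 count as 0. The names `power`, `fwht`, `square_sum_fwht` and `square_sum_brute` are made up for this illustration and are not taken from the author's solution; the brute-force function is only there to cross-check small cases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;
const u64 MOD = 1000000007ULL;

// Modular exponentiation: base^exp mod MOD.
u64 power(u64 base, u64 exp) {
    u64 result = 1;
    base %= MOD;
    for (; exp; exp >>= 1, base = base * base % MOD)
        if (exp & 1) result = result * base % MOD;
    return result;
}

// In-place Walsh-Hadamard transform modulo MOD; the length must be a power of two.
void fwht(std::vector<u64>& v) {
    const std::size_t n = v.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    for (std::size_t len = 1; len < n; len <<= 1)
        for (std::size_t s = 0; s < n; s += 2 * len)
            for (std::size_t j = s; j < s + len; ++j) {
                u64 l = v[j], r = v[j + len];
                v[j] = (l + r) % MOD;
                v[j + len] = (l + MOD - r) % MOD;
            }
}

// Sum over 0 <= x, y < 2^n of (x xor y)^33 * x^{-2} * y^{-1}, treating 0^{-k} as 0.
u64 square_sum_fwht(int n) {
    const std::size_t N = std::size_t(1) << n;
    std::vector<u64> a(N, 0), b(N, 0);
    for (std::size_t i = 1; i < N; ++i) {
        b[i] = power(i, MOD - 2);  // i^{-1}
        a[i] = b[i] * b[i] % MOD;  // i^{-2}
    }
    fwht(a);
    fwht(b);
    for (std::size_t i = 0; i < N; ++i) a[i] = a[i] * b[i] % MOD;
    fwht(a);  // transforming twice gives N times the xor convolution c
    const u64 invN = power(N, MOD - 2);
    u64 ans = 0;
    for (std::size_t i = 0; i < N; ++i)
        ans = (ans + a[i] * invN % MOD * power(i, 33)) % MOD;
    return ans;
}

// Quadratic reference implementation of the same sum.
u64 square_sum_brute(int n) {
    const u64 N = u64(1) << n;
    u64 ans = 0;
    for (u64 x = 1; x < N; ++x)
        for (u64 y = 1; y < N; ++y) {
            u64 ix = power(x, MOD - 2), iy = power(y, MOD - 2);
            ans = (ans + power(x ^ y, 33) * (ix * ix % MOD) % MOD * iy) % MOD;
        }
    return ans;
}
```

Calling `square_sum_fwht(n)` and `square_sum_brute(n)` for small `n` should give the same value; the first does the work with three transforms of length $$$2^n$$$.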

We shall then consider: what if the square is different from $$$[0,2^n-1] \times [0,2^n-1]$$$? What if the square we want to calculate on is $$$[x\times2^n,x\times2^n+2^n-1] \times [y\times2^n,y\times2^n+2^n-1]$$$?

This case turns out to be just as simple! Within the square, only the last $$$n$$$ bits of each coordinate vary while the higher bits stay fixed, so for $$$0 \le i,j < 2^n$$$ we have $$$(x\times 2^n+i) \oplus (y\times 2^n+j) = 2^n(x \oplus y)+(i \oplus j)$$$. Based on this observation we can simply set $$$a_i = (i+x\times 2^n)^{-2}, b_i = (i+y\times 2^n)^{-1}$$$ and calculate $$$c_i = \sum_{p = 0}^{2^n-1}\sum_{q = 0}^{2^n-1} [p \oplus q = i]a_pb_q$$$. Then $$$\sum_{i = 0}^{2^n-1} c_i \times (i+(x \oplus y)2^n)^{33}$$$ will be the answer. The complexity is again $$$O(N \log N)$$$ for side length $$$N = 2^n$$$.
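
The identity behind this step can be checked mechanically. The sketch below is only an illustration (the function name `xor_splits_over_block` is hypothetical): it tests whether $$$(base_x+i) \oplus (base_y+j) = (base_x \oplus base_y) + (i \oplus j)$$$ holds for all offsets inside a block, which is the case when both bases are multiples of $$$2^n$$$ and can fail otherwise.

```cpp
#include <cstdint>

// Returns true iff (base_x + i) xor (base_y + j) == (base_x xor base_y) + (i xor j)
// for every pair of offsets 0 <= i, j < 2^n.
// For aligned blocks, base_x = x * 2^n and base_y = y * 2^n, so
// (base_x xor base_y) equals 2^n * (x xor y), as in the text.
bool xor_splits_over_block(std::uint32_t base_x, std::uint32_t base_y, int n) {
    const std::uint32_t size = std::uint32_t(1) << n;
    for (std::uint32_t i = 0; i < size; ++i)
        for (std::uint32_t j = 0; j < size; ++j) {
            const std::uint32_t lhs = (base_x + i) ^ (base_y + j);
            const std::uint32_t rhs = (base_x ^ base_y) + (i ^ j);
            if (lhs != rhs) return false;
        }
    return true;
}
```

For example, `xor_splits_over_block(5u << 3, 9u << 3, 3)` holds because both blocks are aligned, while `xor_splits_over_block(1, 0, 1)` does not: the block starting at 1 is not aligned to size 2, and already $$$(1+1) \oplus (0+1) = 3$$$ differs from $$$(1 \oplus 0) + (1 \oplus 1) = 1$$$.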

After making the above observations, we can come up with a quite efficient algorithm already! The algorithm simply works as the following pseudocode:

int solve(square S = [x*2^n,x*2^n+2^n-1]*[y*2^n,y*2^n+2^n-1]){
    if(S is completely outside the ellipse){
        return 0; // nothing to add; a 1x1 square is always fully in or fully out
    }
    if(S is completely in the ellipse){
        return the value calculated by the above discussed FWHT method.
    }
    Let the four sub-squares be S1,S2,S3,S4;
    return solve(S1)+solve(S2)+solve(S3)+solve(S4);
}

What is the complexity of this algorithm? Unfortunately, it is $$$O(n \log^2 n)$$$, where $$$n$$$ now denotes the side length of the bounding square. The analysis is not simple, but I can assure you that I did the analysis for simpler cases where the region is like $$$|x-y|<c$$$, and there are two logs. The analysis is omitted here. The algorithm does not run fast enough.
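
The shape of this recursion can be tried out on a simpler region. The sketch below is an illustration under stated assumptions, not the contest solution: it uses a disc as a stand-in for the ellipse, and for a square that lies fully inside it merely counts the lattice points instead of running the FWHT step. The types `Disc` and `Stats` and the functions `contains`, `decompose` and `count_brute` are hypothetical names. Besides the point count, it records the total side length of the fully covered squares, which is the quantity the complexity discussion is about.

```cpp
#include <algorithm>
#include <cstdint>

using i64 = std::int64_t;

// Lattice points with (x - cx)^2 + (y - cy)^2 <= r2; a stand-in for the ellipse.
struct Disc { i64 cx, cy, r2; };

bool contains(const Disc& d, i64 x, i64 y) {
    return (x - d.cx) * (x - d.cx) + (y - d.cy) * (y - d.cy) <= d.r2;
}

struct Stats {
    i64 points = 0;      // lattice points covered by fully-inside squares
    i64 total_side = 0;  // sum of side lengths of those squares
};

// Square [x0, x0+size-1] x [y0, y0+size-1]; size is a power of two and
// x0, y0 are multiples of size.
void decompose(const Disc& d, i64 x0, i64 y0, i64 size, Stats& out) {
    const i64 x1 = x0 + size - 1, y1 = y0 + size - 1;
    // Closest point of the square to the centre: if even that point is
    // outside the disc, the whole square is outside.
    const i64 nx = std::max(x0, std::min(d.cx, x1));
    const i64 ny = std::max(y0, std::min(d.cy, y1));
    if (!contains(d, nx, ny)) return;
    // The disc is convex, so four corners inside means the square is inside.
    if (contains(d, x0, y0) && contains(d, x0, y1) &&
        contains(d, x1, y0) && contains(d, x1, y1)) {
        out.points += size * size;  // the real solution would do the FWHT step here
        out.total_side += size;
        return;
    }
    const i64 h = size / 2;  // size >= 2 here: a 1x1 square is caught above
    decompose(d, x0, y0, h, out);
    decompose(d, x0 + h, y0, h, out);
    decompose(d, x0, y0 + h, h, out);
    decompose(d, x0 + h, y0 + h, h, out);
}

// Reference count over the whole bounding square [0, size-1]^2.
i64 count_brute(const Disc& d, i64 size) {
    i64 cnt = 0;
    for (i64 x = 0; x < size; ++x)
        for (i64 y = 0; y < size; ++y)
            if (contains(d, x, y)) ++cnt;
    return cnt;
}
```

Running `decompose(d, 0, 0, size, stats)` on a disc inside the bounding square should leave `stats.points` equal to `count_brute(d, size)`, and printing `stats.total_side` for growing sizes gives a feel for how the work is distributed over the levels.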

How shall we improve this algorithm? We shall look at the code for FWHT as follows, which is very simple:

for(int i=0;i<n;i++){
    for(int j=0;j<(1<<n);j++){ // the array a has length 2^n
        if((j&(1<<i)) != 0)continue; // visit each pair once, from its lower index
        int l = a[j],r = a[j|1<<i];
        a[j] = l+r;a[j|1<<i] = l-r;
    }
}

There are two features of the algorithm which we will exploit.

It is easy to see that we can "merge" the two FWHT arrays of $$$[x\times 2^n,x\times 2^n+2^{n-1}-1]$$$ and $$$[x\times 2^n+2^{n-1},x\times 2^n+2^{n}-1]$$$ into the FWHT array of the interval $$$[x\times 2^n,x\times 2^n+2^{n}-1]$$$ in $$$O(2^n)$$$ time, i.e. linear in the length.
Each element $$$b_i$$$ of the FWHT array of an interval $$$[x\times 2^n,x\times 2^n+2^{n}-1]$$$ of array $$$a$$$ is a linear combination of the elements of $$$a$$$ in that interval. Namely, $$$b_i = \sum_{j = 0}^{2^n-1} a_{x2^n+j} c_{ij}$$$, where the coefficients $$$c$$$ are the same for any FWHT of the same length $$$2^n$$$.
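
The first feature can be illustrated with a small sketch in plain integers (no modulus, to keep it short). The helper names `fwht`, `merge_halves` and `merge_matches_direct` are assumptions made for this example: `merge_halves` is a single butterfly step over two already transformed halves, and `merge_matches_direct` compares the result with transforming the whole array at once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using i64 = long long;

// Plain in-place Walsh-Hadamard transform; the length must be a power of two.
void fwht(std::vector<i64>& v) {
    const std::size_t n = v.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    for (std::size_t len = 1; len < n; len <<= 1)
        for (std::size_t s = 0; s < n; s += 2 * len)
            for (std::size_t j = s; j < s + len; ++j) {
                const i64 l = v[j], r = v[j + len];
                v[j] = l + r;
                v[j + len] = l - r;
            }
}

// Given the FWHT of the left half and the FWHT of the right half (both of
// length m), build the FWHT of the concatenated array in O(m) time.
std::vector<i64> merge_halves(const std::vector<i64>& left,
                              const std::vector<i64>& right) {
    assert(left.size() == right.size());
    const std::size_t m = left.size();
    std::vector<i64> out(2 * m);
    for (std::size_t j = 0; j < m; ++j) {
        out[j] = left[j] + right[j];
        out[j + m] = left[j] - right[j];
    }
    return out;
}

// Checks that merging the transformed halves equals transforming the whole array.
bool merge_matches_direct(const std::vector<i64>& data) {
    assert(data.size() >= 2);
    const std::size_t m = data.size() / 2;
    std::vector<i64> left(data.begin(), data.begin() + m);
    std::vector<i64> right(data.begin() + m, data.end());
    std::vector<i64> full = data;
    fwht(left);
    fwht(right);
    fwht(full);
    return merge_halves(left, right) == full;
}
```

The merge works because the last layer of the loop in `fwht` (with `len` equal to half the length) is exactly this butterfly, while all earlier layers only touch one half at a time.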

Actually, the process of the FWHT algorithm is simply recursively merging arrays. The reason that this is not obvious is that the implementation is optimised to use loops.

Now, we can improve the $$$O(n \log^2 n)$$$ algorithm using these observations. We are going to calculate using the same squares the $$$O(n \log^2 n)$$$ algorithm would calculate, but for each square of edge length $$$m$$$, we will spend $$$O(m)$$$ instead of $$$O(m \log m)$$$ time.

We calculate the FWHT arrays bottom-up, each time merging intervals into intervals that are twice as large. After merging, we process all squares of that size which we were originally going to calculate. For each square $$$[x\times2^n,x\times2^n+2^n-1] \times [y\times2^n,y\times2^n+2^n-1]$$$, we add the product of $$$a_{x2^n+i}$$$ and $$$b_{y2^n+i}$$$ (both taken from the FWHT arrays) to $$$sum_{(x\oplus y)2^n+i}$$$, as squares with the same $$$x \oplus y$$$ are treated identically when inverting the FWHT arrays.

But how shall we invert the array $$$sum$$$? The answer is simple: just as we can calculate the FWHT arrays bottom-up, we can also "merge" sum arrays! This may not seem to make sense, but if we merge a sum array up and split it back down, the "contribution" of the values at the $$$n$$$-th level will be the same. This is because all the transforms are linear and the FWHT is invertible.
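
The "merge the sum array" idea can be checked on a toy instance. The sketch below is an illustration with hypothetical names (`layer`, `direct`, `via_transform`), using plain integers and a single block size $$$m$$$, whereas the real solution mixes contributions from all levels. Products are accumulated in the transform domain at block size $$$m$$$, the accumulated array is then lifted through the remaining layers, and only one inversion is done at the very end.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using i64 = long long;
using BlockPairs = std::vector<std::pair<std::size_t, std::size_t>>;

// One butterfly layer: merges FWHT blocks of length len into blocks of length 2*len.
void layer(std::vector<i64>& v, std::size_t len) {
    for (std::size_t s = 0; s < v.size(); s += 2 * len)
        for (std::size_t j = s; j < s + len; ++j) {
            const i64 l = v[j], r = v[j + len];
            v[j] = l + r;
            v[j + len] = l - r;
        }
}

// Reference: for each chosen pair of blocks (X, Y) of size m, add
// a[X*m+i] * b[Y*m+j] to c[(X^Y)*m + (i^j)].
std::vector<i64> direct(const std::vector<i64>& a, const std::vector<i64>& b,
                        std::size_t m, const BlockPairs& blocks) {
    std::vector<i64> c(a.size(), 0);
    for (const auto& p : blocks)
        for (std::size_t i = 0; i < m; ++i)
            for (std::size_t j = 0; j < m; ++j)
                c[(p.first ^ p.second) * m + (i ^ j)] +=
                    a[p.first * m + i] * b[p.second * m + j];
    return c;
}

// Same result, but each block pair costs only O(m) pointwise products.
// a.size() and m are assumed to be powers of two with m <= a.size().
std::vector<i64> via_transform(const std::vector<i64>& a, const std::vector<i64>& b,
                               std::size_t m, const BlockPairs& blocks) {
    const std::size_t N = a.size();
    std::vector<i64> fa = a, fb = b, res(N, 0);
    std::size_t len = 1;
    for (; len < m; len <<= 1) { layer(fa, len); layer(fb, len); }
    // fa and fb now hold the FWHT of every aligned block of length m.
    for (const auto& p : blocks)
        for (std::size_t k = 0; k < m; ++k)
            res[(p.first ^ p.second) * m + k] += fa[p.first * m + k] * fb[p.second * m + k];
    for (; len < N; len <<= 1) layer(res, len);       // lift res to the full-length FWHT
    for (len = 1; len < N; len <<= 1) layer(res, len); // transform again: N times the array
    for (auto& x : res) x /= static_cast<i64>(N);      // exact division
    return res;
}
```

With arrays of length 8, `m = 2` and the block pairs (0,1), (2,3), (3,0), both functions should return the same array; note that (0,1) and (2,3) land in the same output block, which is the accumulation the text describes.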

One issue in the complexity analysis of this problem is to prove that the sum of the side lengths of all squares is $$$O(n \log n)$$$. This can be proved on the condition that the border function is monotone, and the boundary of the ellipse can be split into four monotone functions. The idea of the proof is to see that the y-intervals corresponding to each x-interval must be a constant plus some "extra" intervals, and for x-intervals of the same size, the total length of the "extra y-intervals" cannot exceed $$$n$$$. Since there are only $$$\log n$$$ sizes of x-intervals, the proof is done.

For the implementation, please refer to the author's solution below:

#include <bits/stdc++.h>
using namespace std;
using ll = long long; 
#define tcT template<class T
#define tcTU tcT, class U
#define FOR(i,a,b) for (int i = (a); i < (b); ++i)
#define F0R(i,a) FOR(i,0,a)
#define ROF(i,a,b) for (int i = (b)-1; i >= (a); --i)
#define R0F(i,a) ROF(i,0,a)
#define each(a,x) for (auto& a: x)
const int mod = 1000000007;
constexpr int pct(int x) { return __builtin_popcount(x); } // # of bits set
ll fdiv(ll a, ll b) { return a/b-((a^b)<0&&a%b); } // divide a by b rounded down
tcTU> T lstTrue(T lo, T hi, U f) { lo --; assert(lo <= hi); while (lo < hi) { T mid = lo+(hi-lo+1)/2; f(mid) ? lo = mid : hi = mid-1; } return lo; }

const int MX = (2<<22)+10;
ll a,b,c,d,e,f;
int N,miv[MX],xv2[MX],yv2[MX],resv[MX],li[MX*2],ri[MX*2];

inline int mul(int x,int y){return 1ll*x*y%mod;}
inline int add(int x,int y){return x+y>=mod?x+y-mod:x+y;}
inline int sub(int x,int y){return x-y<0?x-y+mod:x-y;}
inline int sq(int x){return 1ll*x*x%mod;}
int mpow(int a,int b){return b == 0 ? 1 : ( b&1 ? mul(a,sq(mpow(a,b/2))) : sq(mpow(a,b/2)));}

void solve(){
    cin>>a>>b>>c>>d>>e>>f;
    ll xbnd = lstTrue(0,4000000,[&](ll x){return (4*c*a-e*e)*x*x<=4*c*f;});
    ll ybnd = lstTrue(0,4000000,[&](ll x){return (4*c*a-e*e)*x*x<=4*a*f;});
    int cn = max(b+xbnd,d+ybnd)+10;
    N = 1;while(N<cn)N*=2;
    F0R(i,N){
        yv2[i] = miv[i];
        xv2[i] = 1ll*yv2[i]*yv2[i]%mod;
        resv[i] = 0;
    }
    F0R(ii,N){
        if(ii<b-xbnd || ii>b+xbnd){
            li[ii+N] = 1;ri[ii+N] = 0;
            continue;
        }
        ll i = ii-b,cv = e*e*i*i-4*c*(a*i*i-f),ce = sqrt(cv);
        while(ce*ce>cv)ce-=1; while((ce+1)*(ce+1)<=cv)ce+=1;
        ri[ii+N] = fdiv(-e*i+ce,c*2)+d;
        li[ii+N] = fdiv(-e*i-ce+c*2-1,c*2)+d;
    }
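    // Segment tree over x (heap order): node i gets the y-range shared by all
    // of its x-columns, shrunk inward to multiples of the node's block size.
    int msk = N-2;
    R0F(i,N){
        li[i] = N-((N-max(li[i*2],li[i*2+1]))&msk);
        ri[i] = ((min(ri[i*2],ri[i*2+1])+1)&msk)-1;
        if(pct(i) == 1)msk-=msk&-msk; // finished a level: double the alignment
    }
    // One butterfly layer: merges FWHT blocks of length i into blocks of length 2*i.
    auto conv = [&](int* xxa,int i){
        for(int s =0;s<N;s+=i*2){
            int* f1 = xxa+s,*f2 =xxa+s+i;
            for(int j=0;j<i;j++){ int c1 = f1[j],c2 = f2[j]; f1[j]=add(c1,c2); f2[j]=sub(c1,c2); }
        }
    };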
    for(int i = 1;i<N;i*=2){
        int s;
        function<void(int,int)> calc= [&](int l,int r){
            for(int j=l;j<r;j+=i){
                int *a = xv2+s,*b = yv2+j,*res = resv+(s^j);
                for(int k=0;k<i;k++) res[k]=(1ll*a[k]*b[k]+res[k])%mod;
            }
        };
        for(s =0;s<N;s+=i){
            int id = (N+s)/i;
            if(li[id]>ri[id])continue;
            if(li[id/2]>ri[id/2]){
                calc(li[id],ri[id]+1);
            }else{
                calc(li[id],li[id/2]);
                calc(ri[id/2]+1,ri[id]+1);
            }
        }
        conv(xv2,i);conv(yv2,i);conv(resv,i);
    }
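    // resv now holds the full-length FWHT of the sum array; applying the
    // transform once more yields N times the array itself.
    for(int i = 1;i<N;i*=2) conv(resv,i);
    int ans = 0;
    F0R(i,N) ans=add(ans,mul(resv[i],mpow(i,33)));
    ans=mul(ans,mpow(N,mod-2)); // divide by N to undo the scaling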
    cout<<ans<<"\n";
}

int main() {
    int T;cin>>T;
    miv[0] = miv[1]= 1;
    FOR(i,2,MX) miv[i] = mod-(long long)mod/i*miv[mod%i]%mod;
    while(T--){
        solve();
    }
    return 0;
}

During the HDU competition the problem was $$$\sum_{(x, y) \in E}(x \oplus y)^{3} x^{-2} y^{-1} \mod 10^9+7$$$. The team Inverted Cross wrote a data-structure-based solution which runs in $$$O(3 n \log n)$$$ time (with a large constant). Unfortunately, the program was not fast enough.

Another solution, written by Inverted Cross, which only works when the exponent 33 is replaced by 3:

#include<bits/stdc++.h>
#define ll long long
#define ull unsigned long long
#define For(i,j,k) for (int i=(int)(j);i<=(int)(k);i++)
#define Rep(i,j,k) for (int i=(int)(j);i>=(int)(k);i--)
using namespace std;

const int max_N=1<<22;
const int mo=1000000007;
int power(int x,int y){
    int s=1;
    for (;y;y/=2,x=1ll*x*x%mo)
        if (y&1) s=1ll*s*x%mo;
    return s;
}

long long tim=0;

struct Solver{
    int fac[max_N];
    int inv[max_N];
    int nn;
    
    Solver(){
        inv[0]=inv[1]=1;
        for (int i=2;i<max_N;i++)
            inv[i]=1ll*inv[(mo%i)]*(mo-mo/i)%mo;
    }
    
    int t[max_N*2][4],pl[max_N*2],pr[max_N*2];
    int fl[max_N*2],lvl[max_N*2];
    void pushup(int k){
        int ls=2*k+fl[k];
        int rs=2*k+1-fl[k],w=1ll*lvl[k]*t[rs][0]%mo,ww=1ll*lvl[k]*lvl[k]%mo;
        t[k][0]=(t[ls][0]+t[rs][0]>=mo?t[ls][0]+t[rs][0]-mo:t[ls][0]+t[rs][0]);
        t[k][1]=(t[ls][1]+t[rs][1]>=mo?t[ls][1]+t[rs][1]-mo:t[ls][1]+t[rs][1]);
        t[k][1]=(t[k][1]+w>=mo?t[k][1]+w-mo:t[k][1]+w);
        t[k][2]=(t[ls][2]+t[rs][2]+1ll*lvl[k]*w+2ll*lvl[k]*t[rs][1])%mo;
        t[k][3]=(t[ls][3]+t[rs][3]+1ll*ww*w+3ll*ww*t[rs][1]+3ll*lvl[k]*t[rs][2])%mo;
    }
    unsigned long long ansl,ansr;
    void query(int l,int r,int x){
        l+=nn-1; r+=nn+1;
        for (;l^r^1;l>>=1,r>>=1){
            if (!(l&1)){
                int S=pl[l^1]^x; S-=S&(lvl[l>>1]-1);
                ansl=(ansl+((1ll*t[l^1][0]*S%mo+3ll*t[l^1][1])*S+3ll*t[l^1][2])%mo*S+t[l^1][3]);
            }
            if (r&1){
                int S=pl[r^1]^x; S-=S&(lvl[r>>1]-1);
                ansr=(ansr+((1ll*t[r^1][0]*S%mo+3ll*t[r^1][1])*S+3ll*t[r^1][2])%mo*S+t[r^1][3]);
            }
        }
    }
    int calc(int n,int *ly,int *ry){
        nn=1; int my=n;
        for (int i=1;i<=n;i++) my=max(my,ry[i]);
        for (;nn<=my;nn<<=1);
        for (int d=1,nw=nn>>1;d<nn;d<<=1,nw>>=1)
            for (int i=d;i<d+d;i++) lvl[i]=nw;
        for (int i=0;i<nn;i++){
            t[i+nn][0]=inv[i]; pl[i+nn]=pr[i+nn]=i;
            t[i+nn][1]=t[i+nn][2]=t[i+nn][3]=0;
        }
        for (int i=nn-1;i>=1;i--){
            fl[i]=0,pushup(i);
            pl[i]=pl[i*2];
            pr[i]=pr[i*2+1];
        }
        int x=0,ans=0,maxv=0;
        for (int i=1;i<nn;i++){
            int v=i^(i-1);
            for (;v!=(v&(-v));v-=v&(-v));
            for (int j=2*v-1;j>=v;j--) fl[j]^=1;
            maxv=max(maxv,2*v-1);
            x^=nn/v/2;
            if (x<=n&&ly[x]!=-1&&ly[x]<=ry[x]){
                for (;maxv;--maxv) pushup(maxv);
                ansl=ansr=0;
                query(ly[x],ry[x],x),ansl%=mo,ansr%=mo;
                ans=(ans+1ll*inv[x]*inv[x]%mo*(ansl+ansr))%mo;
            }
        }
        return ans;
    }    
}PJY;
long long a,b,c,d,e,f;
void calc(int n,int *ly,int *ry) {
    long long A,B,C;
    for (int x=0; x<=n; x++){
        A=c,B=-2ll*c*d+1ll*e*(x-b),C=1ll*a*(x-b)*(x-b)+1ll*c*d*d-1ll*e*(x-b)*d-f;
        if (B*B-4*A*C<0) ly[x]=ry[x]=-1; else
        {
            long double len=sqrt(B*B-4*A*C),l=(-B-len)/(2.0*A),r=(-B+len)/(2.0*A);
            ly[x]=floor(l-(1e-12))+1,ry[x]=floor(r+(1e-12));
        }
    }
}
int ly[max_N*2],ry[max_N*2];
void solve(){
    scanf("%lld%lld%lld%lld%lld%lld",&a,&b,&c,&d,&e,&f);
    calc(2*b,ly,ry);
    int n=2*b;
    for (;ly[n]==-1;--n);
    printf("%d\n",PJY.calc(n,ly,ry));
}
int main(){
    int T;
    scanf("%d",&T);
    while (T--) solve();
}

So there you have it, now you know how to solve "The Struggle"!

Comments (13)

realRainFestivalqwq

2 years ago

-70

Thanks for your great idea and wonderful analysis! It is a great problem.

jiangbowen_

2 years ago

-18

[deleted]

nocriz

+28

Best comedy of codeforces I've seen yet

hellocp

-59

Too much of maths

utensiale

-110

how to activate windows 11? i dont have money!!!

docriz

+33

Does what you said have anything to do with the content of the blog?

Um_nik

+65

It was insanity to put it in a contest, but I had fun upsolving it at 3 o'clock, so thanks!

hotedujikemiano

-51

Hello, can you give me the test data?

I want to use this problem in our contest

-55

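sorry my translate computer dont live!!! can you give test data problem of to me? i like use problem on contest of me for school compute class your jumper is very good, great and math. can you give problem data at private later in mail. i want know problem data test!!!! I apologize that it was not transferred directly to my computer!!! Can you tell me about the test data problem? I like to use questionnaires about problems in the school computer science course. Your bird is very good, kind and math. This information can be sent by personal email. I want to experience the data test problem!!!!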

+18

I don't think it will be necessary, nobody will solve this in the contest

ecnerwala

+38

There's a different approach/viewpoint to the "merge sum arrays step". Let's use $$$a[i]$$$ to denote a sequence and $$$\hat{a}[i]$$$ to denote its FWT. Let's use $$$(f * g)$$$ to denote the convolution of two sequences $$$f$$$ and $$$g$$$. Consider a single block. Instead of trying to run inverse FWT to find each $$$c[i]$$$ and then taking $$$\sum c[i] (k2^n + i)^{33}$$$, we can directly write it as a linear combination of $$$\hat{c}[i]$$$ (since FWT is just a linear transformation anyways).

What are the coefficients of this linear combination? Well, if we set $$$d[i] = (k2^n + i)^{33}$$$, we can view $$$\sum c[i] (k2^n + i)^{33}$$$ as the 0th term of the convolution: $$$(c * d)[0]$$$. The 0th term is easy to compute: it equals the mean of the elements of the FWT: $$$\frac{1}{2^n} \sum_{i=0}^{2^n-1} \widehat{(c*d)}[i] = \frac{1}{2^n} \sum_{i=0}^{2^n-1} \hat{c}[i] \hat{d}[i]$$$. We can compute the $$$\hat{d}[i]$$$ the same way we compute the $$$\hat{a}[i]$$$ and $$$\hat{b}[i]$$$ from the bottom up! This lets us compute the contribution of a block without running any IFWT.

(In fact, $$$\hat{c}[i] = \hat{a}[i] \hat{b}[i]$$$, so we're actually summing $$$\hat{a}[i] \hat{b}[i] \hat{d}[i]$$$. This makes sense if you view the original sum as $$$\sum_{x,y,z} [x \oplus y \oplus z = 0] z^{33} x^{-2} y^{-1}$$$.)
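
As an editorial illustration of the viewpoint in this comment (it is not code from the commenter), the sketch below checks on plain integers that $$$\sum_{i,j} a[i]\, b[j]\, d[i \oplus j]$$$ equals the mean of $$$\hat{a}[k] \hat{b}[k] \hat{d}[k]$$$ for a single block. The names `fwht`, `direct_weighted_sum` and `transform_weighted_sum` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using i64 = long long;

// Plain in-place Walsh-Hadamard transform; the length must be a power of two.
void fwht(std::vector<i64>& v) {
    const std::size_t n = v.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    for (std::size_t len = 1; len < n; len <<= 1)
        for (std::size_t s = 0; s < n; s += 2 * len)
            for (std::size_t j = s; j < s + len; ++j) {
                const i64 l = v[j], r = v[j + len];
                v[j] = l + r;
                v[j + len] = l - r;
            }
}

// Reference: sum over i, j of a[i] * b[j] * d[i xor j], i.e. sum_i c[i] * d[i]
// where c is the xor convolution of a and b.
i64 direct_weighted_sum(const std::vector<i64>& a, const std::vector<i64>& b,
                        const std::vector<i64>& d) {
    i64 total = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            total += a[i] * b[j] * d[i ^ j];
    return total;
}

// Same value without any inverse transform: the mean of the pointwise
// product of the three transformed arrays.
i64 transform_weighted_sum(std::vector<i64> a, std::vector<i64> b,
                           std::vector<i64> d) {
    assert(a.size() == b.size() && b.size() == d.size());
    fwht(a);
    fwht(b);
    fwht(d);
    i64 total = 0;
    for (std::size_t k = 0; k < a.size(); ++k) total += a[k] * b[k] * d[k];
    return total / static_cast<i64>(a.size());  // exact division
}
```

In the actual problem the arithmetic would be modulo $$$10^9+7$$$ and the division would become a multiplication by the inverse of the block length; small integers are used here only to keep the check free of overflow.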

I think I have seen your implementation and understood it, but a few months later I forgot seeing it...... Makes total sense!

nocriz's blog