CristianoPenaldo's blog

By CristianoPenaldo, history, 8 months ago, In English

I was using the Berlekamp-Massey (BM) algorithm for yesterday's H, but my solution was too slow. Here is my idea:

(1) Divide $$$[1, n]$$$ into scales. Let $$$S(scale)$$$ denote scale $$$scale$$$, that is, $$$S(scale) = \{x \mid x \times scale \geq n,\; x \times scale/2 < n\}$$$. For example, if $$$n=7$$$, scale $$$1$$$ is $$$[7, 7]$$$, scale $$$2$$$ is $$$[4, 6]$$$, scale $$$4$$$ is $$$[2, 3]$$$, and scale $$$8$$$ is $$$[1, 1]$$$. In my solution $$$scale$$$ is always a power of $$$2$$$.

(2) Fetch $$$O(\log scale)$$$ consecutive items for each scale using the matrix fast power algorithm. The matrix is the one constructed in step (3) for the previously processed scale.

(3) Use the BM algorithm to build a recurrence of length $$$O(\log scale)$$$ and a recurrence matrix $$$M \in (\mathbb{Z}/p\mathbb{Z})^{O(\log scale) \times O(\log scale)}$$$.

I mean that for each scale we fetch some items $$$I$$$, get a recurrence using the BM algorithm, and construct a recurrence matrix $$$M$$$ from that recurrence; for example, for $$$a_r = a_{r+1} + a_{r+2}$$$ the matrix is $$$M = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$$$, acting on the state vector $$$(a_{r+1}, a_{r+2})^T$$$.

The matrix $$$M$$$, the items $$$I$$$, and the fast matrix power algorithm are then used to fetch items for the next scale. This is a coarse-to-fine algorithm. The problem is that, for each scale, I need to fetch $$$O(\log scale)$$$ items and each item costs $$$O((\log scale)^3 \log n)$$$ with fast matrix power (one matrix multiplication is $$$O((\log scale)^3)$$$, and the exponent can be up to $$$O(n)$$$ since a scale may contain up to $$$O(n)$$$ elements). Therefore the cost per scale is $$$O((\log scale)^4 \log n)$$$ and the overall complexity is $$$\sum\limits_{scale} O((\log scale)^4 \log n) = O((\log n)^6)$$$ per test case, which is too slow. Any idea to speed this up?
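
For reference, the per-item cost quoted above is just binary exponentiation of an $$$m \times m$$$ matrix with $$$m = O(\log scale)$$$. A minimal generic sketch of that step, assuming a prime modulus (this is not the template in my actual code):

#include <bits/stdc++.h>
using namespace std;
using ll = long long;
const ll MOD = 998244353; // assumed prime modulus p

// multiply two m x m matrices modulo MOD: O(m^3)
vector<vector<ll>> mat_mul(const vector<vector<ll>>& A, const vector<vector<ll>>& B) {
    int m = (int)A.size();
    vector<vector<ll>> C(m, vector<ll>(m, 0));
    for (int i = 0; i < m; ++i)
        for (int k = 0; k < m; ++k)
            if (A[i][k])
                for (int j = 0; j < m; ++j)
                    C[i][j] = (C[i][j] + A[i][k] * B[k][j]) % MOD;
    return C;
}

// M^e by binary exponentiation: O(m^3 log e)
vector<vector<ll>> mat_pow(vector<vector<ll>> M, ll e) {
    int m = (int)M.size();
    vector<vector<ll>> R(m, vector<ll>(m, 0));
    for (int i = 0; i < m; ++i) R[i][i] = 1; // identity matrix
    while (e) {
        if (e & 1) R = mat_mul(R, M);
        M = mat_mul(M, M);
        e >>= 1;
    }
    return R;
}

int main() {
    // companion matrix of a_r = a_{r+1} + a_{r+2} from the example above
    vector<vector<ll>> M = {{1, 1}, {1, 0}};
    cout << mat_pow(M, 10)[0][1] << "\n"; // 55, the 10th Fibonacci number
}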



By CristianoPenaldo, history, 14 months ago, In English

I am serious, not a troll.


By CristianoPenaldo, history, 15 months ago, In English

I admit that AGC061C is too hard for me, and I don't even understand its official editorial. Today I saw a Luogu blog with a very clever and clear idea for this problem. I learned a lot from it, and I would like to share it with you now. The writer of that blog is possibly Little09, but I am not quite sure.

Part1: Problem Statement and Constraints

There are $$$N$$$ customers named $$$1$$$, ..., $$$N$$$ visiting a shop. Customer $$$i$$$ arrives at time $$$A_i$$$ and leaves at time $$$B_i$$$. The queue order is first in, first out, so $$$A_i$$$ and $$$B_i$$$ are both increasing. Additionally, all $$$A_i$$$ and $$$B_i$$$ are pairwise distinct.

At the entrance, there is a list for visitors to put their names in. Each customer will append their name to the list exactly once, either when they arrive or when they leave. How many different orders of names can there be in the end? Find the count modulo $$$998244353$$$.

Constraints:

$$$\cdot 1 \leq N \leq 500000$$$.

$$$\cdot 1 \leq A_i \leq B_i \leq 2N$$$.

$$$\cdot A_i < A_{i+1}\,(1 \leq i \leq N-1)$$$.

$$$\cdot B_i < B_{i+1}\,(1 \leq i \leq N-1)$$$.

$$$\cdot A_i \neq B_j \,(i \neq j)$$$.

$$$\cdot \text{All values in the input are integers}$$$.

Part2: Idea of the blog

The complexity of the algorithm in the blog is $$$O(N)$$$.

Consider two dynamic programming tables named $$$f$$$ and $$$g$$$ ($$$dp$$$ in the original blog). The blog computes $$$f,\,g$$$ in the reverse order. Formally,

$$$f(i)\,(1 \leq i \leq n)$$$: The number of permutations when we have already considered persons $$$i,\,i+1,\,...,\,n$$$.

$$$g(i)\,(1 \leq i \leq n)$$$: The number of permutations when we have already considered persons $$$i,\,i+1,\,...,\,n$$$, and the person $$$i$$$ is forced to sign her (or his) name on $$$B_i$$$.

The blog defines $$$c(i)$$$ as:

$$$c(i) := max\{j|A_j < B_i\}$$$. (1)

Obviously $$$i \in \{j|A_j < B_i\}$$$ and $$$c(i) \geq i$$$.

First, we consider $$$f(i)$$$. If $$$i$$$ signs on $$$A_i$$$, then no matter when $$$i+1,\,i+2,\,...,\,n$$$ sign their names, $$$i$$$ is always placed in front of them. This is a bijection: given a permutation $$$p$$$ of $$$[i+1,\,i+2,\,...,\,n]$$$, just put $$$i$$$ at the front of $$$p$$$. So the number of distinct permutations when $$$i$$$ signs on $$$A_i$$$ is exactly $$$f(i+1)$$$.

If $$$i$$$ signs on $$$B_i$$$, then $$$i$$$ must not be placed first among $$$\{i,\,i+1,\,...,\,n\}$$$, otherwise it would coincide with a situation where $$$i$$$ signs on $$$A_i$$$. Now we count the permutations in which $$$i$$$ is placed first among $$$\{i,\,i+1,\,...,\,n\}$$$. In this case ($$$i$$$ is first), every $$$x \in \{i+1,\,i+2,\,...,\,c(i)\}$$$ must sign on $$$B_x$$$ because $$$A_x < B_i$$$ (by the definition of $$$c$$$ in Eq. (1)), while the choices of $$$x \in \{c(i)+1,\,...,\,n\}$$$ do not matter. The number of permutations where $$$i$$$ signs on $$$B_i$$$ and $$$i$$$ is placed first among $$$\{i,\,i+1,\,...,\,n\}$$$ is $$$g(c(i))$$$, because $$$c(i)$$$ must choose $$$B_{c(i)}$$$ and $$$\{c(i)+1,\,c(i)+2,\,...,\,n\}$$$ can choose freely. Therefore, the situation where $$$i$$$ signs on $$$B_i$$$ contributes $$$g(i) - g(c(i))$$$ to $$$f(i)$$$, and $$$f(i)$$$ depends on $$$g(i)$$$:

$$$f(i) = f(i+1) + g(i) - g(c(i))$$$. (2)

Second, we consider $$$g(i)$$$. To count $$$g(i)$$$, an important idea is to divide it into non-overlapping cases. The blog considers

$$$P_j$$$ := The set of permutations where $$$j$$$ is the least integer among $$$\{i+1,\,i+2\,...,\,c(i)\}$$$ such that $$$j$$$ chooses $$$B_j$$$. In other words, $$$i$$$ chooses $$$B_i$$$, $$$\{i+1,\,i+2\,...,\,j-1\}$$$ all choose $$$A$$$, and $$$j$$$ chooses $$$B_j$$$. Also, we should not forget $$$P_{c(i)+1}$$$, because $$$\{i+1,\,i+2\,...,\,c(i)\}$$$ might all choose $$$A$$$!

(1) Do these $$$P_j$$$ (including $$$P_{c(i)+1}$$$) overlap? No! Because the permutations in $$$P_j$$$ have a unique prefix: $$$[i+1,\,i+2,\,...,\,j-1]$$$.

(2) Do these $$$P_j$$$ cover all situations? Yes, this is obvious.

(3) How do we calculate the size of $$$P_j$$$, i.e., $$$|P_j|$$$? For $$$j \leq c(i)$$$, since the prefix is fixed (the $$$[i+1,\,i+2,\,...,\,j-1]$$$ mentioned in (1)), there is a bijection from the set counted by $$$g(j)$$$ to $$$P_j$$$ obtained by adding the fixed prefix to each element, so $$$|P_j| = g(j)$$$. For $$$j=c(i)+1$$$, $$$|P_j|=f(c(i)+1)$$$ because all of $$$\{i+1,\,i+2,\,...,\,c(i)\}$$$ have to choose $$$A$$$.

Therefore,

$$$g(i) = \sum\limits_{j=i+1}^{c(i)} g(j) + f(c(i)+1)$$$ (3)

Although we consider $$$f$$$ first, we should calculate $$$g$$$ first because $$$f(i)$$$ depends on $$$g(i)$$$.

Part3: Implementation details

(1) You can use two pointers to calculate $$$c$$$.

(2) To make the computation of $$$g$$$ faster, you can use the prefix sum trick.

(3) The original blog considers two cases: $$$c(i)=i$$$ and $$$c(i) > i$$$. They can be unified into equations (2) and (3), i.e., it is not necessary to consider two cases.

(4) In the original blog Little09 calculates from $$$n$$$ to $$$1$$$. You can also calculate from $$$1$$$ to $$$n$$$. Symmetrically, you should define $$$c(i)$$$ as $$$min\{j|B_j > A_i\}$$$, and define $$$g(i)$$$ as "The number of permutations when we have already considered persons $$$1,\,2,\,...,\,i-1$$$, and the person $$$i$$$ is forced to sign her (or his) name on $$$A_i$$$". Here is my implementation that calculates from $$$1$$$ to $$$n$$$.
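
For concreteness, here is a minimal sketch of the $$$n$$$-to-$$$1$$$ version that follows Eqs. (2) and (3) directly, with the two-pointer computation of $$$c$$$ and a suffix sum $$$G(i) = \sum_{j \geq i} g(j)$$$ as the prefix-sum trick for Eq. (3). It assumes the input is $$$N$$$ followed by the pairs $$$A_i\ B_i$$$; this is a sketch of the idea above, not my submitted $$$1$$$-to-$$$n$$$ code.

#include <bits/stdc++.h>
using namespace std;
using ll = long long;
const ll MOD = 998244353;

int main() {
    cin.tie(0)->sync_with_stdio(0);
    int n;
    cin >> n;
    vector<int> A(n + 1), B(n + 1);
    for (int i = 1; i <= n; ++i) cin >> A[i] >> B[i];

    // c[i] = max{ j | A_j < B_i }; since B is increasing, the pointer p only moves left
    vector<int> c(n + 2);
    int p = n;
    for (int i = n; i >= 1; --i) {
        while (A[p] > B[i]) --p;   // stops at some p >= i because A_i < B_i
        c[i] = p;
    }

    // f, g as in Eqs. (2) and (3); G[i] = g[i] + g[i+1] + ... + g[n]
    vector<ll> f(n + 2, 0), g(n + 2, 0), G(n + 3, 0);
    f[n + 1] = 1;                  // no persons left: exactly one (empty) order
    for (int i = n; i >= 1; --i) {
        g[i] = ((G[i + 1] - G[c[i] + 1] + f[c[i] + 1]) % MOD + MOD) % MOD;  // Eq. (3)
        G[i] = (G[i + 1] + g[i]) % MOD;
        f[i] = ((f[i + 1] + g[i] - g[c[i]]) % MOD + MOD) % MOD;             // Eq. (2): g before f
    }
    cout << f[1] << "\n";
}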

Part4: What I have learned

(1) The most important idea in counting is dividing the whole set into non-overlapping subsets whose union is the whole set. Besides, counting each subset should be much simpler.

(2) The most important thing in dp is defining the states.

(3) Bijections are really important in combinatorics. For example, the Dyck path.

(4) Achieving (1) and (2) requires high IQ and talent. For example, the definition of $$$f$$$ in the blog is normal, while the definition of $$$g$$$ is genius!

Part5: The last

If you have any questions, please don't hesitate to ask me. It is beneficial for both of us. For you, I might resolve your confusion. For me, I can check whether I made any typos in this blog and whether I really understand the original one.


By CristianoPenaldo, history, 16 months ago, In English

Sorry for this shit blog. I am quite emotional now. I admit I am stupid!!!

I still only solved 4 problems in ABC284:

  1. RE in problem A because I selected the Python interpreter instead of G++.
  2. Stuck at D because of an unknown bug in my Pollard Rho template, and then stuck again because of a buggy binary search template.
  3. Stuck at E.
  4. Got the idea for F and passed 2 minutes after the contest. I used string hashing: https://atcoder.jp/contests/abc284/submissions/37848937

Please comfort me or just tell me I am stupid. I admit that I am stupid and behave like a so-called giant baby but I really need comfort, thank you for your help!


By CristianoPenaldo, history, 17 months ago, In English

Part1: Introduction

Atcoder Beginner Contest 279 is good for learning generating functions (GFs): two of its problems (G and Ex) can be solved with them. An English tutorial of 279Ex has already been posted on MathStackExchange by a kind person. Unfortunately, an English tutorial on 279G is not available, so I would like to write a second-hand tutorial on 279G based on PCTprobability's idea. I spent a huge amount of time understanding this idea. For contestants at about my level it is quite difficult to understand even when written in English, let alone when it is written only in Japanese (my Japanese is N5 level). I will make the following contributions.

(1)Write the tutorial in English. The original tutorial is in Japanese only.

(2)Fill in the details. I believe you can understand my words.

(3)Offer an accepted implementation.

I have to state that using generating functions is definitely not the best way to solve this problem; it can be solved much more simply with dynamic programming plus monotone deque optimization. However, this problem is also a good chance to learn generating functions (GFs). Generating functions encode the information of sequences in a continuous way. If you do not know anything about GFs, I suggest you read:

(1) How to use GFs to solve recurrences? Link.

(2) How to prove the Vandermonde convolution identity using GFs? Link.

(3) And, most relevant to this problem, how to count partitions using GFs? Link.

The most important notation $$$[x^k]f(x)$$$ denotes the coefficient of $$$x^k$$$ in function $$$f(x)$$$. For example, $$$[x^2](x^3+2x^2+1)=2$$$. And $$$[x^2]\frac{1}{1-2x}=4$$$, because we can expand $$$\frac{1}{1-2x}$$$ to $$$\sum\limits_{i=0}^\infty (2x)^i$$$ in the region of convergence $$$(-\frac{1}{2}, \frac{1}{2})$$$. So the coefficient of $$$x^2$$$ is $$$4$$$.

Part2: Problem Statement

The problem says: There is a grid with $$$1×N$$$ squares, numbered $$$1,2,…,N$$$ from left to right.

Takahashi prepared paints of $$$C$$$ colors and painted each square in one of the $$$C$$$ colors. Then, there were at most two colors in any $$$K$$$ consecutive squares. Formally, for every integer $$$i$$$ such that $$$1≤i≤N−K+1$$$, there were at most two colors in squares $$$i,i+1,…,i+K−1$$$.

In how many ways could Takahashi paint the squares? Since this number can be enormous, find it modulo $$$998244353$$$.

$$$\cdot \text{All inputs are integers.}$$$

$$$\cdot 2 \leq K \leq N \leq 10^6$$$.

$$$\cdot 1 \leq C \leq 10^9$$$.

Test Case $$$1$$$: $$$N=K=C=3$$$. In this input, we have a $$$1×3$$$ grid. Among the $$$27$$$ ways to paint the squares, there are $$$6$$$ ways to paint all squares in different colors, and the remaining $$$21$$$ ways are such that there are at most two colors in any consecutive three squares.

Test Case $$$2$$$: $$$N=10, K=5, C=2$$$: Print $$$1024$$$.

Test Case $$$3$$$: $$$N=998, K=244, C=353$$$: Print $$$952364159$$$.

Part3: Idea

(1) What are GFs good at? GFs are good at counting partitions, for example, the pentagonal number theorem. So the first step is to compress the colors by Run-Length Encoding (RLE). For example, if the colors are $$$(1,1,1,2,2,3,2,2)$$$, then they are uniquely compressed to $$$((1, 3), (2, 2), (3, 1), (2, 2))$$$. With RLE, $$$[1, n]$$$ is partitioned into $$$l$$$ segments with different colors. Let me denote these segments by $$$S_1, S_2, ..., S_l$$$. Please note that adjacent segments must be painted with different colors, otherwise you could merge them into one segment, which would violate the definition of RLE. The idea of dividing the interval into segments for counting also appears in Pinely Round Problem D.

(2)Consider $$$2 \leq i \leq l-1$$$. If $$$|S_i| \leq K-2$$$, then $$$S_{i+1}$$$ only has one choice: Paint it with the same color as $$$S_{i-1}$$$. Otherwise, the last element of $$$S_{i-1}$$$, the whole segment $$$S_{i}$$$ and the first element of $$$S_{i+1}$$$ will form an interval with size $$$\leq K$$$ and three colors, violating the rule. If $$$|S_i| > K-2$$$, then $$$S_{i+1}$$$ has $$$C-1$$$ choices. It is only required that the color of $$$S_{i+1}$$$ is different from that of $$$S_{i}$$$.

(3) I claim that the generating function for segmentations with $$$l \geq 2$$$ segments is:

$$$f(x, l) := C(C-1)(\sum\limits_{j=1}^\infty x^j)^2(\sum\limits_{j=1}^{K-2}x^j + \sum\limits_{j=K-1}^\infty (C-1)x^j)^{l-2} \tag{1}$$$.

The $$$C$$$ is because the first segment has $$$C$$$ choices.

The $$$C-1$$$ is because the second segment always has $$$C-1$$$ choices.

The factor $$$(\sum\limits_{j=1}^{K-2}x^j + \sum\limits_{j=K-1}^\infty (C-1)x^j)^{l-2}$$$ contains two parts: $$$x^j$$$ encodes the length of segment $$$i$$$ ($$$2 \leq i \leq l-1$$$), while the coefficients $$$1$$$ and $$$C-1$$$ encode the transfer contribution from $$$S_i$$$ to $$$S_{i+1}$$$. If $$$len(S_i) \leq K-2$$$, then $$$S_{i+1}$$$ has only $$$1$$$ choice (see (2)). Otherwise, $$$S_{i+1}$$$ has $$$C-1$$$ choices.

Now we have dealt with the lengths of $$$S_i\ (2 \leq i \leq l-1)$$$ and the transfer contributions between $$$i$$$ and $$$i+1$$$ ($$$2 \leq i \leq l-1$$$). But we have still omitted two things: the lengths of the head segment $$$S_1$$$ and the tail segment $$$S_l$$$! So we encode each of them with $$$\sum\limits_{j=1}^\infty x^j$$$:

$$$\sum\limits_{j=1}^\infty x^j=\frac{x}{1-x} \tag{2}$$$.

$$$\sum\limits_{j=1}^{K-2}x^j + \sum\limits_{j=K-1}^\infty (C-1)x^j = \frac{x-x^{K-1}}{1-x} + (C-1)\frac{x^{K-1}}{1-x} = \frac{x+(C-2)x^{K-1}}{1-x} \tag{3}$$$.

Putting $$$(1), (2), (3)$$$ together, $$$f(x, l)=C(C-1)\frac{x^2}{(1-x)^2}(\frac{x+(C-2)x^{K-1}}{1-x})^{l-2} \tag{4}$$$.

Here you might get confused: will summing to infinity, e.g., $$$\sum\limits_{j=1}^\infty x^j$$$, lead to overflow (i.e., contain terms whose orders are higher than $$$x^N$$$)? The answer is YES, but we are not afraid of it. This reflects a core idea of GFs: I only care about $$$[x^N]f(x, l)$$$, because the total length of the segments must be $$$N$$$; for the terms higher than $$$x^N$$$, I don't fxxking care!!! It is quite magical that including more terms may even decrease the computational complexity! If you are curious about this, you might read the English tutorial of 279Ex carefully: summing more terms can lead to closed-form expressions which are easier to compute. However, we should be cautious about the low-order terms. For example, if we mistake $$$\sum\limits_{j=1}^\infty x^j$$$ for $$$\sum\limits_{j=0}^\infty x^j$$$, we get into big trouble, because we then count the contribution of "empty segments" that should not be counted. So, for GF methods, the overflowing part does not matter, but we must be careful about the correctness of the non-overflowing part, especially the low-order terms.

Now, to take all possible length $$$l \geq 2$$$ into consideration, we sum $$$f(x, l)$$$ over $$$l$$$:

$$$g(x) := \sum\limits_{l=2}^\infty f(x, l) = \sum\limits_{l=2}^\infty C(C-1)\frac{x^2}{(1-x)^2}(\frac{x+(C-2)x^{K-1}}{1-x})^{l-2} \tag{5}$$$

And,

$$$\sum\limits_{l=0}^\infty(\frac{x+(C-2)x^{K-1}}{1-x})^{l} = \frac{1}{1-\frac{x+(C-2)x^{K-1}}{1-x}} = \frac{1-x}{1-2x-(C-2)x^{K-1}} \tag{6}$$$

Combining (5), (6):

$$$g(x) = \frac{C(C-1)x^2}{(1-x)(1-2x-(C-2)x^{K-1})} \tag{7}$$$.

The denominator of $$$g(x)$$$ is $$$(C-2)x^K - (C-2)x^{K-1} + 2x^2 - 3x + 1$$$. We care about $$$[x^N]g(x) = [x^{N-2}]\frac{C(C-1)}{(C-2)x^K - (C-2)x^{K-1} + 2x^2 - 3x + 1}$$$. What we need here is "polynomial inversion", which can be computed using FFT/NTT in $$$O(N\log N)$$$ time. Here is a useful summary: Operations on Formal Power Series. Formally, if you want to compute $$$[x^k]\frac{1}{A(x)}$$$, you need to find a $$$B(x)$$$ ($$$deg(B(x)) \leq k$$$) such that $$$A(x)B(x) \equiv 1 \pmod{x^{k+1}}$$$, and then $$$[x^k]\frac{1}{A(x)} = [x^k]B(x)$$$.

Submission: https://atcoder.jp/contests/abc279/submissions/36865779

Core code (with explanation):

int main(void){
    int n, k, c; cin >> n >> k >> c;
    poly<998244353> x, xinv; //define two polynomials
    x.a.resize(n-1); //We want to get [x^{N-2}](1/((C-2)x^K - (C-2)x^{K-1} + 2x^2 - 3x + 1)), so the inversion should be set to n-1
    if(n-1 > 0) x.a[0] += 1;
    if(n-1 > 1) x.a[1] -= 3;
    if(n-1 > 2) x.a[2] += 2; 
    if(n-1 > k-1) x.a[k-1] -= poly<998244353>::mint(c-2);
    if(n-1 > k) x.a[k] += poly<998244353>::mint(c-2); //remember to use += instead of =, as k and/or k-1 may equal to 0, 1, 2
    xinv = x.inverse(n-1); //Inverse!
    cout << (c + ((1ll*c*(c-1))%998244353)*xinv[n-2]())%998244353 << "\n"; //Care about integer overflow! Remember to add c for the case where l==1.
}
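
As a side note, since the denominator $$$A(x) = 1 - 3x + 2x^2 - (C-2)x^{K-1} + (C-2)x^K$$$ is sparse, its inverse coefficients $$$b_m$$$ satisfy the short recurrence $$$b_m = 3b_{m-1} - 2b_{m-2} + (C-2)b_{m-K+1} - (C-2)b_{m-K}$$$ (from $$$\sum_i a_i b_{m-i} = 0$$$ for $$$m \geq 1$$$, taking care when $$$K-1$$$ or $$$K$$$ collides with $$$1$$$ or $$$2$$$), so for this particular problem the answer can also be computed in $$$O(N)$$$ without any NTT. A minimal sketch of this alternative (my own illustration, not the submitted code):

#include <bits/stdc++.h>
using namespace std;
using ll = long long;
const ll MOD = 998244353;

int main() {
    ll n, k, c;
    cin >> n >> k >> c;
    // A(x) = 1 - 3x + 2x^2 - (C-2)x^{K-1} + (C-2)x^K, accumulated in a map
    // so that colliding exponents (K-1 or K equal to 1 or 2) are handled correctly
    ll cm = ((c - 2) % MOD + MOD) % MOD;
    map<ll, ll> a;                       // exponent -> coefficient (mod MOD); a_0 = 1 is kept implicit
    a[1] = (a[1] + MOD - 3) % MOD;
    a[2] = (a[2] + 2) % MOD;
    a[k - 1] = (a[k - 1] + MOD - cm) % MOD;
    a[k] = (a[k] + cm) % MOD;
    // b_0 = 1 and sum_i a_i b_{m-i} = 0 for m >= 1 give b_m = -sum_{i>=1} a_i b_{m-i}
    vector<ll> b(n - 1, 0);              // we need b_0 .. b_{N-2}
    b[0] = 1;
    for (ll m = 1; m <= n - 2; ++m) {
        ll s = 0;
        for (auto [i, ai] : a)
            if (i <= m) s = (s + ai * b[m - i]) % MOD;
        b[m] = (MOD - s) % MOD;
    }
    // [x^N] g(x) = C(C-1) * b_{N-2}; add C for the single-segment case l = 1
    ll ans = (c % MOD) * ((c - 1) % MOD) % MOD * b[n - 2] % MOD;
    ans = (ans + c % MOD) % MOD;
    cout << ans << "\n";                 // e.g. N=K=C=3 gives 21, N=10 K=5 C=2 gives 1024
}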


By CristianoPenaldo, history, 17 months ago, In English

Details will be updated next week. I am a little busy now, so please do not downvote me. If you are interested or confused, you can comment or chat with me.

The idea is from the official editorial. This blog is about the implementation.

My submission: submission.

My code here:

#include<bits/stdc++.h>
using namespace std;
#define BIT(x, i) (((x) & (1 << (i))) >> (i))
#define DEBUG 0
#define DEBUG_INTERACTIVE 0

constexpr int C = 11, MAXN=2005; //Since N <= 2000, nim number <= 2000, so we only need to scan 1<<0 to 1<<11.
int n, l, r;
int sg[MAXN]; //sg number for 1-n
int f[MAXN][MAXN]; 
//f[i][j] If we want to get sg number j from [1, i], where should I cut?
//For example, if sg(i) = 4 and get sg number 3 after cutting [j, k], then f[i][3] = j;

int nim(const vector<int>& v, int* index, int* value_after_change){
    //input: 
    //v: A vector of stones.
    //index and value_after_change: If the XOR sum of v is 0, then index remains unchanged. we modify v[*index]
    //to *value_after_change to guarantee that the XOR sum of v after modification is 0.
    //Output:
    //The XOR sum of v (current v)

    //Example1:
    //vector<int> v = {1, 4, 3};
    //int index, value_after_change;
    //int value = nim(v, &index, &value_after_change);
    //cout << value << " " << index << " " << value_after_change << "\n"; //6 1 2
    //value = 1^4^3 = 6;
    //change 4->2, 1^2^3=0
    
    //Example2:
    //vector<int> v = {1, 2, 3};
    //int index, value_after_change;
    //int value = nim(v, &index, &value_after_change);
    //cout << value << " " << index << " " << value_after_change << "\n"; //0 32763 -1204770170
    //1^2^3 = 0. You cannot modify any number, then index and value_after_change are uninitialized random numbers (may not be 32763 -1204770170)

    int sz = (int)v.size();
    vector<int> ones[C+1];
    int res = 0;
    for(int b = 0; b <= C; ++b){
        for(int i = 0; i < sz; ++i){
            if(BIT(v[i], b)){
                ones[b].push_back(i);
            }
            if(!b) res ^= v[i]; //Calculate the XOR sum, also known as the Nim sum. Remember only xor once.
        }
    }
    if(res == 0) return res;
    bool flipped = false;
    *value_after_change = 0;
    int choosed = -1; //FIX
    for(int b = C; b >= 0; --b){
        if(BIT(res, b)){
            if(!flipped){
                choosed = b;//FIX
                flipped = true;
                *index = ones[b][0]; //Never overflows, think out why?
            }else{
                *value_after_change += ((!BIT(v[*index], b)) << b);
            }
        }else if(flipped){
            (*value_after_change) += (v[*index] & (1 << b));
        }
    }
    for(int i = choosed+1; i <= C; ++i){ //FIX
        *value_after_change += (v[*index] & (1<<i)); //FIX
    } //FIX
    return res;
}

void sgdp(bool debug){ //dynamic programming to calculate sprague-grundy numbers
    //The following code is unnecessary because sg is global variable.
    //for(int i = 0; i <= l-1; ++i){
    //    sg[i] = 0; //Nowhere to fetch, have to lose, sg number is 0.
    //}
    for(int len = l; len <= n; ++len){
        set<int> mex;
        for(int i = 1; i <= len-l+1; ++i){
            //We fetch [i, i+l-1]
            int sgleft = sg[i-1]; //sg[0] is defined
            int sgright = sg[len-i-l+1];
            int sgall = sgleft ^ sgright; //Grundy theorem!!!
            f[len][sgall] = i;
            mex.insert(sgall);
        }
        int t = 0; 
        while(mex.count(t)) ++t;
        sg[len] = t;
    }
    //DEBUG
    if(!debug) return;
    for(int len = l; len <= n; ++len){
        printf("sg[%d]==%d\n", len, sg[len]);
        for(int k = 0; k < sg[len]; ++k){
            printf("f[%d][%d]==%d\n", len, k, f[len][k]);
        }
        printf("\n");
    }
}

int main(int argc, char** argv){
    if(DEBUG){
        //freopen(argc == 1?"test1.txt":argv[1], "r", stdin);
    }else if(DEBUG_INTERACTIVE){
        ; //DO NOTHING
    }else{
        cin.tie(0) -> sync_with_stdio(0);
    }
    cin >> n >> l >> r;
    //The first case: r is large enough:
    //Prefix (length <= l-1), Middle (length==r), Appendix (length <= l-1)
    int a = 1, b = 1; //Ready to receive moves from the AI.
    if(r + 2 * l - 2 >= n && l + r - 1 <= n){
        //Killer move
        cout << "First" << endl;
        cout << l << " " << r << endl;
        cin >> a >> b;
        //AI fails.
        return 0;
    }
    //The second case: Mimicking
    if(l != r || n % 2 == l % 2){
        //According to the tutorial, we can use the first move to separate [1-n] into two symmetric parts.
        int tmp = r;
        while((n - tmp) % 2) --tmp; //Don't be scary, it is O(1). 
        //Such tmp is always valid. If l != r, one of {r-1, r} is ok. Otherwise if l==r, r is eligible for tmp.
        int interval = (n - tmp)/2;
        cout << "First" << endl;
        cout << interval+1 << " " << tmp << endl;
        //Here we cut the interval [1, n] into two parts: [1, interval], [interval+tmp+1, n]
        //interval == n-interval-tmp.
        while(a || b){
            cin >> a >> b;
            if(a == 0 && b == 0){
                return 0;
            }
            if(a <= interval){
                //cout << a+interval+tmp << " " << b+interval+tmp << endl; //BUG. The second parameter is the LENGTH, not the ENDING POINT.
                cout << a+interval+tmp << " " << b << endl;
            }else{
                //cout << a-interval-tmp << " " << b-interval-tmp << endl; //BUG. The second parameter is the LENGTH, not the ENDING POINT.
                cout << a-interval-tmp << " " << b << endl;
            }
        }
        return 0;
    }
    //The third case: l == r && n % 2 != l % 2. We cannot mimick. No matter where we choose [x, x+y-1], We cannot separate [1, n]
    sgdp(DEBUG || DEBUG_INTERACTIVE);
    set<pair<int, int>> intervals = {{1, n}};
    bool turn = true, initial = true;
    if(sg[n] == 0) {
        cout << "Second" << endl;
        turn = false;
    }else{
        cout << "First" << endl;
    }
    while(a || b){
        if(turn){
            if(!initial){
                //Use {a, a+b-1} to separate intervals
                pair<int, int> ab = {a, a+b-1};
                auto it = intervals.upper_bound(ab); 
                if(it==intervals.end() || it->first > ab.first) it = prev(it);//Always valid
                int fi = it->first, se = it->second;
                if(DEBUG || DEBUG_INTERACTIVE) {
                    for(auto pa: intervals){
                        cout << "[Intervals1]" << pa.first << " " << pa.second << "\n";
                    }
                    printf("DEBUG1: it=={%d, %d}, ab=={%d, %d}\n", fi, se, ab.first, ab.second);
                }
                if(ab.first-1 >= it->first){
                    intervals.insert({it->first, ab.first-1});
                }
                if(ab.second+1 <= it->second){
                    intervals.insert({ab.second+1, it->second});
                }
                intervals.erase({fi, se});
            }
            int isz = (int)intervals.size();
            vector<int> v(isz);
            vector<pair<int, int>> vp(intervals.begin(), intervals.end()); //This is bad, but convenient...
            if(DEBUG || DEBUG_INTERACTIVE) {
                for(int i = 0; i < isz; ++i) {
                    printf("DEBUG_PLACE1 <SG should NOT be 0>: vp[%d]=={%d, %d}, gr==%d\n", i, vp[i].first, vp[i].second, sg[vp[i].second - vp[i].first + 1]);
                }
            }
            for(int i = 0; i < isz; ++i){
                v[i] = sg[vp[i].second - vp[i].first + 1];
            }
            int index = -1, value_after_change = -1;
            nim(v, &index, &value_after_change); //I will never make nim(v, &index, &value_after_change)==0 here!
            int cutplace = f[vp[index].second - vp[index].first + 1][value_after_change];

            if(DEBUG || DEBUG_INTERACTIVE) {
                for(int i = 0; i < isz; ++i) printf("v[%d]==%d, index==%d, value_after_change==%d\n", i, v[i], index, value_after_change);
                cout << vp[index].first + cutplace - 1 << " " << l << " AI" << endl;
            }
            else cout << vp[index].first + cutplace - 1 << " " << l << endl;

            //Also remember to separate intervals!
            pair<int, int> ab = {vp[index].first + cutplace - 1, vp[index].first + cutplace + l - 2};
            auto it = intervals.upper_bound(ab); //Always valid
            if(it==intervals.end() || it->first > ab.first) it = prev(it);
            int fi = it->first, se = it->second;
            if(DEBUG || DEBUG_INTERACTIVE) {
                for(auto pa: intervals){
                    cout << "[Intervals2]" << pa.first << " " << pa.second << "\n";
                }
                printf("DEBUG2: it=={%d, %d}, ab=={%d, %d}\n", fi, se, ab.first, ab.second);
            }
            if(ab.first-1 >= it->first){
                intervals.insert({it->first, ab.first-1});
            }
            if(ab.second+1 <= it->second){
                intervals.insert({ab.second+1, it->second});
            }
            intervals.erase({fi, se});
            if(DEBUG || DEBUG_INTERACTIVE) {
                int i = 0;
                for(pair<int, int> pa: intervals) {
                    printf("DEBUG_PLACE2 <SG should be 0>: i==%d, {%d, %d}, sg==%d\n", i, pa.first, pa.second, sg[pa.second-pa.first+1]);
                    ++i;
                }
            }
        }else{
            cin >> a >> b;
        }
        turn  = !turn;
        initial = false;
    }
    if(a == -1 && b == -1) return 0;
}

Acknowledgment: ABalobanov, Nyaan.


By CristianoPenaldo, history, 19 months ago, In English

This is a blog for cp newbies (like me).

For a long time I thought the Dijkstra algorithm (dij) had only two usages:

(1) Calculate the distance between vertices when the weights of edges are non-negative.

(2) (Minimax) Given a path $$$p = x_1x_2...x_n$$$, define $$$f(p) := \max_{i=1}^{n-1}d(x_i, x_{i+1})$$$. Given a source vertex $$$s$$$ and a target vertex $$$t$$$, dij is able to calculate $$$\min \{f(p) \mid p\ \text{is an } s\text{-}t\ \text{path}\}$$$.

However, dij works for a whole function class, not only the sum/max functions. The sum/max functions are just the two most frequently used members of this class, which is far larger. Once the function $$$f$$$ satisfies several mandatory properties, you can use dij. The term "function class" here is like an abstract class in C++:

struct f{
    virtual bool induction_property() = 0; //bool: satisfy or not
    virtual bool extension_property() = 0; //bool: satisfy or not
    virtual bool dp_property() = 0; //bool: satisfy or not
};//dijkstra function.
dij(f);

The graph is $$$G=(V(G),E(G))$$$. WLOG we assume $$$G$$$ is connected. There is a non-empty source set $$$S \subseteq V(G)$$$ that need not be a singleton. A path $$$p$$$ is a vector of vertices $$$\{v_1,v_2,...,v_n\}$$$ and the function $$$f$$$ maps paths to real numbers: $$$\text{path} \rightarrow \mathbb{R}$$$. To be brief, $$$f(\{v_1,v_2,...,v_n\})$$$ is shortened to $$$f(v_1,v_2,...,v_n)$$$. We say that dij works for $$$f$$$ if, $$$\forall v \in V(G)$$$, dij correctly computes $$$d(v) := \min \{f(p) \mid p\ \text{is an } S\text{-}v\ \text{path}\}$$$. $$$f$$$ should satisfy the following three properties, which are also sufficient:

(1, induction base) $$$\forall s \in S$$$, $$$f(s)$$$ should be correctly initialized. Note that $$$f(s)$$$ need not be $$$0$$$.

(2, extension property) $$$\forall p$$$, for every vertex $$$v$$$ adjacent to the end of $$$p$$$ (p.back()), $$$f(p \cup v) \geq f(p)$$$.

(3, dynamic programming (dp) property) If paths $$$p, q$$$ have the same end (i.e., p.back()==q.back()) and $$$f(p) \geq f(q)$$$, then for every vertex $$$v$$$ adjacent to p.back(), $$$f(p \cup v) \geq f(q \cup v)$$$.

Remarks of the dp property:

First, the dp property guarantees that, if $$$\{v_1, v_2, ..., v_n\}$$$ is the shortest path from $$$v_1$$$ to $$$v_n$$$, then for all $$$1 \leq k \leq n$$$, $$$\{v_1, v_2, ..., v_k\}$$$ is the shortest $$$v_1$$$-$$$v_k$$$ path. You can prove this by induction from the end to the front.

Second, if $$$f(p) = f(q)$$$, then $$$f(p \cup v) = f(q \cup v)$$$. Because $$$f(p) \geq f(q)$$$, $$$f(p \cup v) \geq f(q \cup v)$$$. Because $$$f(q) \geq f(p)$$$, $$$f(q \cup v) \geq f(p \cup v)$$$. Therefore $$$f(p \cup v) = f(q \cup v)$$$.

The necessity of the induction base is obvious: otherwise, the values $$$f(s)$$$ at the source vertices are already wrong. For the extension property, it is well known that the sum version of dij (shortest path) does not work for edges with negative cost (but the max version does!). For the dp property, let's consider the following example:

In this example the edge weights are $$$\text{dis}(A,B)=1$$$, $$$\text{dis}(B,C)=4$$$, $$$\text{dis}(A,C)=10$$$, $$$\text{dis}(C,D)=5$$$, and $$$f(p):=\sum\limits_{i=1}^{n-1} i \cdot \text{dis}(v_i, v_{i+1})$$$. Then $$$f(ABC) = 1+4\times 2 = 9$$$, $$$f(AC)=10$$$, $$$f(ABCD) = 1+4\times 2 + 5\times 3=24$$$, and $$$f(ACD)=10+5\times 2=20$$$. Since $$$f(ABC) < f(AC)$$$ but $$$f(ABCD) > f(ACD)$$$, $$$f$$$ violates the dp property. In this case dij does not work: it uses $$$ABC$$$ to slack $$$D$$$, giving the wrong result $$$24$$$ instead of the correct answer $$$20$$$.

We denote by $$$d(v)$$$ the true value of the shortest path (i.e., $$$\min \{f(p) \mid p\ \text{is an } S\text{-}v\ \text{path}\}$$$), and by $$$h(v)$$$ the value calculated by dij. We want to prove that $$$\forall v \in V(G),\,h(v) = d(v)$$$. We briefly review the outline of dij:

define a distance array h. Initially fill h with INF;
Initialize h(s) for all source vertices s in S;
define a set T of settled vertices, which is initially empty;
constexpr int calculation_limit = |V(G)|; //You can set the calculation limit to an arbitrary number you want, default |V(G)|
int i = 0; //frame number

while T != V(G) && i < calculation_limit{
    choose a vertex u that is not in T with the lowest h value; //Choice step
    for every vertex v not in T and adjacent to u, update h(v) as h(v) = min(h(v), f([recorded path from S to u] + v)); //You can record the shortest path from S to v here //Slack step
    T.insert(u);
    i++;
}
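
To make this outline concrete, here is a minimal C++ skeleton of the generalized dij (my own illustration, not code from any particular problem). By the second remark of the dp property above, $$$f(p \cup v)$$$ depends only on $$$f(p)$$$ and the new edge, so the caller supplies it as a callable extend(f_of_p, v, w):

#include <bits/stdc++.h>
using namespace std;
using ll = long long;

// Generalized Dijkstra over a function class f.
// adj[u] = list of (neighbour, edge weight); sources = (s, f({s})) pairs.
// Returns h, which equals d for every vertex when f satisfies the three properties.
template <class Extend>
vector<ll> generalized_dij(int n, const vector<vector<pair<int, ll>>>& adj,
                           const vector<pair<int, ll>>& sources, Extend extend) {
    const ll INF = numeric_limits<ll>::max();
    vector<ll> h(n, INF);
    priority_queue<pair<ll, int>, vector<pair<ll, int>>, greater<>> pq;
    for (auto [s, fs] : sources) {              // property 1: induction base
        if (fs < h[s]) { h[s] = fs; pq.push({fs, s}); }
    }
    while (!pq.empty()) {
        auto [hu, u] = pq.top(); pq.pop();
        if (hu != h[u]) continue;               // stale heap entry, u already settled
        for (auto [v, w] : adj[u]) {            // slack step
            ll cand = extend(hu, v, w);         // property 2 guarantees cand >= hu
            if (cand < h[v]) { h[v] = cand; pq.push({cand, v}); }
        }
    }
    return h;
}

int main() {
    // Usual shortest path on a triangle 0-1 (4), 1-2 (1), 0-2 (7): f is the sum, f({s}) = 0
    int n = 3;
    vector<vector<pair<int, ll>>> adj(n);
    adj[0].push_back({1, 4}); adj[1].push_back({0, 4});
    adj[1].push_back({2, 1}); adj[2].push_back({1, 1});
    adj[0].push_back({2, 7}); adj[2].push_back({0, 7});
    auto d = generalized_dij(n, adj, {{0, 0}}, [](ll h, int, ll w) { return h + w; });
    cout << d[2] << "\n";                       // 5
}

For the minimax usage in (2), extend is simply max(h, w); for the counterexample above with $$$f(p)=\sum i \cdot \text{dis}(v_i, v_{i+1})$$$, no such extend exists (the increment depends on the position inside the path), which matches the fact that $$$f$$$ violates the dp property.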

As the induction base, $$$h$$$ is correctly computed for every source vertex. In the $$$i$$$-th frame, a vertex $$$u$$$ is chosen in the choice step. We assume that for every $$$v$$$ chosen in the choice step of frames $$$0,\,1,\,2,\,...,\,i-1$$$, $$$h(v) = d(v)$$$. For the $$$i$$$-th frame, I am going to show that $$$h(u) = d(u)$$$. Consider a shortest path from $$$S$$$ to $$$u$$$ (we call it the $$$S-u$$$ path; it must exist since the graph is finite), denote by $$$y$$$ the first vertex on this path that does not belong to $$$T$$$, and denote by $$$x$$$ the vertex just before $$$y$$$ on this path.

In the following proof, $$$S-x$$$ means a path from S to $$$x$$$. It is not set difference.

First, for all source vertices, i.e., $$$\forall s \in S$$$, we have $$$h(s) = d(s)$$$. That is property 1, induction base.

(1) $$$h(u) \leq h(y)$$$. This is irrelevant to the three properties: dij always chooses the vertex $$$u \notin T$$$ with the lowest $$$h$$$ value, so if $$$h(u) > h(y)$$$, dij would not choose $$$u$$$ in this frame.

(2) $$$d(y) \leq d(u)$$$. This uses the extension property. The $$$S-u$$$ path is shortest, so $$$f(S-u)=d(u)$$$. By the extension property, its prefix ending at $$$y$$$ satisfies $$$f(S-y) \leq f(S-u)$$$. Since $$$d(y)$$$ is the minimum of $$$f$$$ over all $$$S-y$$$ paths, we get $$$d(y) \leq d(u)$$$.

(3) $$$h(y) \leq f(S-x-y)$$$. This uses the dp property. By the induction hypothesis, $$$y$$$ was slacked using some path $$$p$$$ from $$$S$$$ to $$$x$$$. Please be careful: $$$p$$$ need not be the $$$S-x$$$ prefix of the shortest $$$S-u$$$ path above. But according to the first remark of the dp property, $$$f(S-x) = f(p) = d(x)$$$. Therefore, by the second remark of the dp property, $$$y$$$ was slacked with the value $$$f(p-y) = f(S-x-y)$$$. When calculating $$$f(p-y)$$$, the algorithm actually uses $$$h(x)$$$ instead of $$$f(p)$$$; by the induction hypothesis, $$$h(x)$$$ is correctly computed and equals $$$f(p)$$$.

(4) $$$d(y) = f(S-x-y)$$$. This uses the dp property (the first remark).

(5) $$$h(y) = d(y)$$$. $$$h(y)$$$ is the value of some feasible path from $$$S$$$ to $$$y$$$, which is greater than or equal to the shortest value $$$d(y)$$$, that is, $$$h(y) \geq d(y)$$$. By (3) and (4), $$$h(y) \leq d(y)$$$. Therefore $$$h(y) = d(y)$$$.

(6) $$$h(u) = d(u)$$$. $$$h(u) \leq h(y) = d(y) \leq d(u) \leq h(u)$$$. Therefore $$$h(u) = d(u)$$$.

Now I offer three examples of elegant usage of the function class.

Example 1:

Leetcode2386: Find the K-Sum of an Array. I failed to solve it during the contest. The problem is: You are given an integer array nums and a positive integer k. You can choose any subsequence of the array and sum all of its elements together. We define the K-Sum of the array as the kth largest subsequence sum that can be obtained (not necessarily distinct). Return the K-Sum of the array. For example:

Input: nums = [2,4,-2], k = 5

Output: 2

Explanation: All the possible subsequence sums that we can obtain are the following sorted in decreasing order: 6, 4, 4, 2, 2, 0, 0, -2. The 5-Sum of the array is 2.

n := len(nums) <= 100000; k <= min(2000, 1 << n)

First, we are calculating the "largest", so we need to change $$$\leq$$$ to $$$\geq$$$ in the three properties. We first add up all nonnegative elements of nums; this sum is the source vertex, whose value is correct by construction (property 1). We denote the sum by $$$S$$$. For example, if $$$nums=[2, 4, -2]$$$, then $$$S = 6$$$. We then set every $$$nums[i]$$$ to $$$-nums[i]$$$ if $$$nums[i] < 0$$$; after this, all $$$nums[i] \geq 0$$$. Then we sort nums ascendingly. For example, nums: $$$[2, 4, -2] \rightarrow [2, 4, 2] \rightarrow [2, 2, 4]$$$. We define the graph $$$G$$$ as follows: a vertex $$$v \in [0, 2^n-1]$$$ is a binary string of length $$$n$$$; if $$$v[i]=1$$$, keep $$$nums[i]$$$, otherwise, if $$$v[i]=0$$$, discard $$$nums[i]$$$. For each vertex the out-degree is at most $$$2$$$: (1) set $$$v[i]$$$ to $$$0$$$; (2) if $$$i \geq 1$$$, set $$$v[i]$$$ to $$$0$$$ and set $$$v[i-1]$$$ back to $$$1$$$.

First, the graph $$$G$$$ is a connected tree; this is left to you as an exercise (feel free to comment if you disagree). For a vertex $$$v$$$, let $$$f(v) := S - \sum\limits_{i=0}^{n-1}(1 - v[i]) \cdot nums[i]$$$ ($$$f(v)$$$ is well defined because there is only one path from the root to $$$v$$$). You can verify that $$$f$$$ satisfies the extension property (that is why we need to sort nums), and, due to the additive structure, it also satisfies the dp property. Even when the abstract graph is super large, dij is feasible as long as the out-degree of each vertex (which decides how many adjacent vertices we need to explore; here $$$2$$$) and the calculation_limit (here $$$\leq 2000$$$) are small enough; remember to use a priority queue! Below is the code written by 0x3f:

//Original code address: https://leetcode.cn/problems/find-the-k-sum-of-an-array/solution/zhuan-huan-dui-by-endlesscheng-8yiq/
class Solution {
public:
    long long kSum(vector<int> &nums, int k) {
        long sum = 0L;
        for (int &x : nums)
            if (x >= 0) sum += x;
            else x = -x;
        sort(nums.begin(), nums.end());
        priority_queue<pair<long, int>> pq;
        pq.emplace(sum, 0);
        while (--k) {
            auto[sum, i] = pq.top();
            pq.pop();
            if (i < nums.size()) {
                pq.emplace(sum - nums[i], i + 1); //Discard nums[i-1]
                if (i) pq.emplace(sum - nums[i] + nums[i - 1], i + 1); //Keeps nums[i-1]
            }
        }
        return pq.top().first;
    }
};

Example 2:

ABC267E Erasing Vertices 2. Left as practice.

Example 3

ARC150C Path and Subsequence. This problem is super interesting, editorial: https://atcoder.jp/contests/arc150/editorial/4999. There are two sequences $$$A$$$ and $$$B$$$, and an undirected connected graph $$$G=(V(G), E(G))$$$. The problem asks: For every simple path from vertex $$$1$$$ to vertex $$$N$$$, $$$p = (v_1, v_2, ..., v_k);\,v_1=1,\;v_k=N$$$, is $$$B$$$ a (not necessarily contiguous) subsequence of $$$(A_{v_1}, A_{v_2}, ..., A_{v_k})$$$? All sequences are 1-indexed.

For every simple path $$$p$$$ (not necessarily starting from $$$1$$$ and ending at $$$n$$$), we define:

A(p) := list(map(lambda x: A_x, p))

$$$f(p) := \max\{j|[B_1, B_2,...,\,B_j] \text{is a subsequence of}\,A(p)\}$$$.

For example,

$$$A = [1, 2, 3, 5, 2], B = [1, 3, 2]$$$. If $$$p = [1, 2, 5]$$$, then $$$A(p) = [A[1], A[2], A[5]] = [1, 2, 2]$$$ and $$$f(p)=1$$$, because only the first element of $$$B$$$ ($$$[1]$$$) is a subsequence of $$$A(p)$$$. If $$$p = [1, 2, 3, 4, 5]$$$, then $$$A(p) = [A[1], A[2], A[3], A[4], A[5]] = [1, 2, 3, 5, 2]$$$ and $$$f(p) = 3$$$, because the first $$$3$$$ elements of $$$B$$$ ($$$[1, 3, 2]$$$) form a subsequence of $$$A(p)$$$. Does such an $$$f$$$ satisfy the three properties? For the first property, we need to correctly initialize $$$f(\{1\}) := [A_1 = B_1]$$$. Second, the extension property obviously holds. Third, for the dp property, if $$$p_1 = p + [u]$$$:

$$$ f(p_1)= \begin{cases} f(p)+1, A_u = B_{f(p)+1}\\ f(p), A_u \neq B_{f(p)+1} \end{cases} $$$

Therefore, if $$$p$$$ and $$$q$$$ are simple paths with the same ending and $$$f(p) > f(q)$$$, we must have $$$f(p + [u]) \geq f(q + [u])$$$, because adding one vertex can increase $$$f(q)$$$ by at most $$$1$$$. If $$$f(p) = f(q)$$$, then $$$f(p + [u]) = f(q + [u])$$$: whether $$$f(p_1)$$$ (resp. $$$f(q_1)$$$) equals $$$f(p)$$$ or $$$f(p)+1$$$ (resp. $$$f(q)$$$ or $$$f(q)+1$$$) depends only on $$$A_u$$$ and $$$B_{f(p)+1}$$$, which are the same for $$$p$$$ and $$$q$$$. Therefore, the dp property holds.

You can solve this problem using dij in $$$O(|V(G)| + |E(G)|)$$$.
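
A minimal sketch of this approach (my own illustration, not the editorial's code): because each extension increases $$$f$$$ by $$$0$$$ or $$$1$$$, a deque ("0-1 BFS") can replace the priority queue, which gives the linear complexity above. The answer is Yes iff $$$d(N) = |B|$$$.

#include <bits/stdc++.h>
using namespace std;

// Returns true iff B (1-indexed, B[1..K]) is a subsequence of the A-values along
// EVERY path from vertex 1 to vertex N.  A is 1-indexed over vertices, adj is a
// 1-indexed adjacency list.  d[v] = min f over all 1-v paths.
bool everyPathContainsB(int N, const vector<vector<int>>& adj,
                        const vector<int>& A, const vector<int>& B) {
    int K = (int)B.size() - 1;
    vector<int> d(N + 1, INT_MAX);
    deque<pair<int, int>> dq;                      // (d value, vertex)
    d[1] = (A[1] == B[1]) ? 1 : 0;                 // induction base f({1})
    dq.push_back({d[1], 1});
    while (!dq.empty()) {
        auto [du, u] = dq.front(); dq.pop_front();
        if (du != d[u]) continue;                  // stale entry, u already settled
        for (int v : adj[u]) {
            int w = (du < K && A[v] == B[du + 1]) ? 1 : 0;  // dp property: depends only on d[u] and A_v
            if (du + w < d[v]) {
                d[v] = du + w;
                if (w == 0) dq.push_front({d[v], v});
                else        dq.push_back({d[v], v});
            }
        }
    }
    return d[N] >= K;                              // d never exceeds K, so this means d[N] == K
}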

--- Update: Fix some typos. I do not intentionally occupy the "Recent actions" column.


By CristianoPenaldo, history, 19 months ago, In English

Please note that this blog is a review of this problem; it is not novel enough to be an editorial. The problem may not be difficult for people rated above purple, but it is very difficult for me.

To be more self-contained, I copy the problem here:

Problem Statement:

You are given strings $$$S$$$ and $$$T$$$ of length $$$n$$$ each, consisting of lowercase English letters.

For a string $$$X$$$ and an integer $$$i$$$, let $$$f(X,i)$$$ be the string obtained by performing on $$$X$$$ the following operation $$$i$$$ times:

Remove the first character of $$$X$$$ and append the same character to the end of $$$X$$$.

Find the number of integer pairs $$$(i,j)$$$ satisfying $$$0≤i,j≤n−1$$$ such that $$$f(S,i) \leq f(T,j)$$$ lexicographically.

For example, when $$$n=3$$$, $$$S=$$$"adb", $$$T=$$$"cab", there are 4 pairs: (1)$$$i=j=0$$$, "adb" $$$\leq$$$ "cab"; (2)$$$i=2, j=0$$$, "bad" $$$\leq$$$ "cab"; (3)$$$i=0, j=2$$$, "adb" $$$\leq$$$ "bca"; (4)$$$i=j=2$$$, "bad" $$$\leq$$$ "bca". You can find more test cases here: https://atcoder.jp/contests/abc272/tasks/abc272_f.

Here the problem statement ends. Now we analyze this problem.

The operation $$$f$$$ is a cyclic shift (CS). String concatenation is a powerful tool for handling cyclic shifts: e.g., for $$$i=2$$$, $$$f(S, 2) = $$$ "bad" is a substring of "adbadb". Concatenation frees us from the modular index arithmetic that a cyclic shift would otherwise require, and modular indices are quite annoying to handle.
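
As a tiny illustration of this trick (a hypothetical helper, not part of the solution code below):

#include <bits/stdc++.h>
using namespace std;

// f(S, i): the cyclic shift of S by i, read off the doubled string, no mod needed
string cyclic_shift(const string& S, int i) {
    string S2 = S + S;
    return S2.substr(i, S.size());
}
// cyclic_shift("adb", 2) == "bad"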

Let $$$S2=S+S$$$ and $$$T2=T+T$$$. We want to count the pairs $$$(i, j)$$$ such that $$$S2.substr(i, n) \leq T2.substr(j, n)$$$, but comparing all pairs directly is $$$O(n^3)$$$. So we need another tool: the suffix array, which sorts the suffixes of a string. For example, the suffix array of "baad" is $$$\{1, 2, 0, 3\}$$$, because the lexicographic order (from small to big) is ("aad" $$$1$$$, "ad" $$$2$$$, "baad" $$$0$$$, "d" $$$3$$$). Note that the SA-IS algorithm computes the suffix array in time linear in the length of the string. Here is example code; you can find a C++ implementation of SA-IS in the AC Library:

#include <bits/stdc++.h>
using namespace std;

template<typename T, T...>
struct myinteger_sequence { };

template<typename T, typename S1 = void, typename S2 = void>
struct helper{
    std::string operator()(const T& s){
        return std::string(s);
    }
}; 

template<typename T>
struct helper<T, decltype((void)std::to_string(std::declval<T>())), typename std::enable_if<!std::is_same<typename std::decay<T>::type, char>::value, void>::type>{
    std::string operator()(const T& s){
        return std::to_string(s);
    }
};

template<typename T>
struct helper<T, void, typename std::enable_if<std::is_same<typename std::decay<T>::type, char>::value, void>::type>{
    std::string operator()(const T& s){
        return std::string(1, s);
    }
};

template<typename T, typename S1 =void, typename S2 =void>
struct seqhelper{
    const static bool seq = false;
};

template<typename T>
struct seqhelper<T, decltype((void)(std::declval<T>().begin())), decltype((void)(std::declval<T>().end()))>{
    const static bool seq = !(std::is_same<typename std::decay<T>::type, std::string>::value);
};

template<std::size_t N, std::size_t... I>
struct gen_indices : gen_indices<(N - 1), (N - 1), I...> { };
template<std::size_t... I>
struct gen_indices<0, I...> : myinteger_sequence<std::size_t, I...> { };

template<typename T, typename REDUNDANT = void>
struct converter{
    template<typename H>
    std::string& to_string_impl(std::string& s, H&& h)
    {
        using std::to_string;
        s += converter<H>().convert(std::forward<H>(h));
        return s;    
    }

    template<typename H, typename... T1>
    std::string& to_string_impl(std::string& s, H&& h, T1&&... t)
    {
        using std::to_string;
        s += converter<H>().convert(std::forward<H>(h)) + ", ";
        return to_string_impl(s, std::forward<T1>(t)...);
    }

    template<typename... T1, std::size_t... I>
    std::string mystring(const std::tuple<T1...>& tup, myinteger_sequence<std::size_t, I...>)
    {
        std::string result;
        to_string_impl(result, std::get<I>(tup)...);
        return result;
    }

    template<typename... S>
    std::string mystring(const std::tuple<S...>& tup)
    {
        return mystring(tup, gen_indices<sizeof...(S)>{});
    }

    template<typename S=T>
    std::string convert(const S& x){
        return helper<S>()(x);
    }

    template<typename... S>
    std::string convert(const std::tuple<S...>& tup){
        std::string res = std::move(mystring(tup));
        res = "{" + res + "}";
        return res;
    }

    template<typename S1, typename S2>
    std::string convert(const std::pair<S1, S2>& x){
        return "{" + converter<S1>().convert(x.first) + ", " + converter<S2>().convert(x.second) + "}";
    }
};

template<typename T>
struct converter<T, typename std::enable_if<seqhelper<T>::seq, void>::type>{
    template<typename S=T>
    std::string convert(const S& x){
        int len = 0;
        std::string ans = "{";
        for(auto it = x.begin(); it != x.end(); ++it){
            ans += move(converter<typename S::value_type>().convert(*it)) + ", ";
            ++len;
        }
        if(len == 0) return "{[EMPTY]}";
        ans.pop_back(), ans.pop_back();
        return ans + "}";
    }
};

template<typename T>
std::string luangao(const T& x){
    return converter<T>().convert(x);
}

namespace atcoder {

namespace internal {

std::vector<int> sa_naive(const std::vector<int>& s) {
    int n = int(s.size());
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int l, int r) {
        if (l == r) return false;
        while (l < n && r < n) {
            if (s[l] != s[r]) return s[l] < s[r];
            l++;
            r++;
        }
        return l == n;
    });
    return sa;
}

std::vector<int> sa_doubling(const std::vector<int>& s) {
    int n = int(s.size());
    std::vector<int> sa(n), rnk = s, tmp(n);
    std::iota(sa.begin(), sa.end(), 0);
    for (int k = 1; k < n; k *= 2) {
        auto cmp = [&](int x, int y) {
            if (rnk[x] != rnk[y]) return rnk[x] < rnk[y];
            int rx = x + k < n ? rnk[x + k] : -1;
            int ry = y + k < n ? rnk[y + k] : -1;
            return rx < ry;
        };
        std::sort(sa.begin(), sa.end(), cmp);
        tmp[sa[0]] = 0;
        for (int i = 1; i < n; i++) {
            tmp[sa[i]] = tmp[sa[i - 1]] + (cmp(sa[i - 1], sa[i]) ? 1 : 0);
        }
        std::swap(tmp, rnk);
    }
    return sa;
}

// SA-IS, linear-time suffix array construction
// Reference:
// G. Nong, S. Zhang, and W. H. Chan,
// Two Efficient Algorithms for Linear Time Suffix Array Construction
template <int THRESHOLD_NAIVE = 10, int THRESHOLD_DOUBLING = 40>
std::vector<int> sa_is(const std::vector<int>& s, int upper) {
    int n = int(s.size());
    if (n == 0) return {};
    if (n == 1) return {0};
    if (n == 2) {
        if (s[0] < s[1]) {
            return {0, 1};
        } else {
            return {1, 0};
        }
    }
    if (n < THRESHOLD_NAIVE) {
        return sa_naive(s);
    }
    if (n < THRESHOLD_DOUBLING) {
        return sa_doubling(s);
    }

    std::vector<int> sa(n);
    std::vector<bool> ls(n);
    for (int i = n - 2; i >= 0; i--) {
        ls[i] = (s[i] == s[i + 1]) ? ls[i + 1] : (s[i] < s[i + 1]);
    }
    std::vector<int> sum_l(upper + 1), sum_s(upper + 1);
    for (int i = 0; i < n; i++) {
        if (!ls[i]) {
            sum_s[s[i]]++;
        } else {
            sum_l[s[i] + 1]++;
        }
    }
    for (int i = 0; i <= upper; i++) {
        sum_s[i] += sum_l[i];
        if (i < upper) sum_l[i + 1] += sum_s[i];
    }

    auto induce = [&](const std::vector<int>& lms) {
        std::fill(sa.begin(), sa.end(), -1);
        std::vector<int> buf(upper + 1);
        std::copy(sum_s.begin(), sum_s.end(), buf.begin());
        for (auto d : lms) {
            if (d == n) continue;
            sa[buf[s[d]]++] = d;
        }
        std::copy(sum_l.begin(), sum_l.end(), buf.begin());
        sa[buf[s[n - 1]]++] = n - 1;
        for (int i = 0; i < n; i++) {
            int v = sa[i];
            if (v >= 1 && !ls[v - 1]) {
                sa[buf[s[v - 1]]++] = v - 1;
            }
        }
        std::copy(sum_l.begin(), sum_l.end(), buf.begin());
        for (int i = n - 1; i >= 0; i--) {
            int v = sa[i];
            if (v >= 1 && ls[v - 1]) {
                sa[--buf[s[v - 1] + 1]] = v - 1;
            }
        }
    };

    std::vector<int> lms_map(n + 1, -1);
    int m = 0;
    for (int i = 1; i < n; i++) {
        if (!ls[i - 1] && ls[i]) {
            lms_map[i] = m++;
        }
    }
    std::vector<int> lms;
    lms.reserve(m);
    for (int i = 1; i < n; i++) {
        if (!ls[i - 1] && ls[i]) {
            lms.push_back(i);
        }
    }

    induce(lms);

    if (m) {
        std::vector<int> sorted_lms;
        sorted_lms.reserve(m);
        for (int v : sa) {
            if (lms_map[v] != -1) sorted_lms.push_back(v);
        }
        std::vector<int> rec_s(m);
        int rec_upper = 0;
        rec_s[lms_map[sorted_lms[0]]] = 0;
        for (int i = 1; i < m; i++) {
            int l = sorted_lms[i - 1], r = sorted_lms[i];
            int end_l = (lms_map[l] + 1 < m) ? lms[lms_map[l] + 1] : n;
            int end_r = (lms_map[r] + 1 < m) ? lms[lms_map[r] + 1] : n;
            bool same = true;
            if (end_l - l != end_r - r) {
                same = false;
            } else {
                while (l < end_l) {
                    if (s[l] != s[r]) {
                        break;
                    }
                    l++;
                    r++;
                }
                if (l == n || s[l] != s[r]) same = false;
            }
            if (!same) rec_upper++;
            rec_s[lms_map[sorted_lms[i]]] = rec_upper;
        }

        auto rec_sa =
            sa_is<THRESHOLD_NAIVE, THRESHOLD_DOUBLING>(rec_s, rec_upper);

        for (int i = 0; i < m; i++) {
            sorted_lms[i] = lms[rec_sa[i]];
        }
        induce(sorted_lms);
    }
    return sa;
}

}  // namespace internal

std::vector<int> suffix_array(const std::vector<int>& s, int upper) {
    assert(0 <= upper);
    for (int d : s) {
        assert(0 <= d && d <= upper);
    }
    auto sa = internal::sa_is(s, upper);
    return sa;
}

template <class T> std::vector<int> suffix_array(const std::vector<T>& s) {
    int n = int(s.size());
    std::vector<int> idx(n);
    iota(idx.begin(), idx.end(), 0);
    sort(idx.begin(), idx.end(), [&](int l, int r) { return s[l] < s[r]; });
    std::vector<int> s2(n);
    int now = 0;
    for (int i = 0; i < n; i++) {
        if (i && s[idx[i - 1]] != s[idx[i]]) now++;
        s2[idx[i]] = now;
    }
    return internal::sa_is(s2, now);
}

std::vector<int> suffix_array(const std::string& s) {
    int n = int(s.size());
    std::vector<int> s2(n);
    for (int i = 0; i < n; i++) {
        s2[i] = s[i];
    }
    return internal::sa_is(s2, 255);
}

}  // namespace atcoder

int main(){
    vector<int> v = move(atcoder::suffix_array("baad"));
    cout << luangao(v) << "\n"; //{1, 2, 0, 3}
}

Let $$$Sf(S, i)$$$ be the suffix of $$$S$$$ starting at index $$$i$$$ ($$$0$$$-indexed). For example, $$$Sf$$$("abcd", 1) = "bcd". The key idea is to construct a transform $$$g$$$ of $$$S, T$$$, together with their indices, such that $$$S2.substr(i, n) \leq T2.substr(j, n) \leftrightarrow Sf(g(S2, T2), g(i)) < Sf(g(S2, T2), g(j))$$$, where $$$g(S2, T2)$$$ is a large string obtained from $$$S2, T2$$$, and $$$g(i), g(j)$$$ are the transformed indices (because $$$i, j$$$ correspond to different indices in the large string $$$g(S2, T2)$$$). We need to turn the comparison of substrings into the comparison of suffixes, so that we can use the powerful tool SA-IS. Please note that two suffixes of a string are never equal, because they have different lengths.

One failed construction is $$$g(S2, T2) := T2 + S2$$$. For example, take $$$T=$$$"zabf", $$$T2=$$$"zabfzabf", $$$S=$$$"abfz", $$$S2=$$$"abfzabfz", T2+S2="zabfzabfabfzabfz". One can check that the suffix starting at the "abf" inside $$$T2$$$ ("abfzabfabfzabfz") is smaller than the suffix starting at the "abf" inside $$$S2$$$ ("abfzabfz"), even though the two cyclic shifts of length $$$4$$$ are equal.

Let's review the reason for the failure of this construction. First, $$$Sf(T2+S2, i+2n) < Sf(T2+S2, j) \rightarrow S2.substr(i, n) \leq T2.substr(j, n)$$$ is obviously correct, and if $$$S2.substr(i, n) < T2.substr(j, n)$$$, then $$$Sf(T2+S2, i+2n) < Sf(T2+S2, j)$$$. The problem occurs when $$$S2.substr(i, n) = T2.substr(j, n)$$$. Imagine two pointers $$$p, q$$$ whose initial positions are $$$i+2n$$$ and $$$j$$$, respectively. The pointers $$$p, q$$$ move simultaneously towards the end of the string $$$T2+S2$$$, one character per step. If $$$j > i$$$, then $$$q$$$ reaches index $$$2n-1$$$ before $$$p$$$ reaches index $$$4n-1$$$, so $$$q$$$ runs into the $$$S2$$$ area and the comparison gets out of control.

Both $$$S2$$$ and $$$T2$$$ are circular. If $$$S2.substr(i, n) = T2.substr(j, n)$$$, then $$$S2.substr(i, 2n-\max(i, j)) = T2.substr(j, 2n-\max(i, j))$$$. For example, with $$$T2=$$$"zabfzabf", $$$S2=$$$"abfzabfz", $$$i=0, j=1, n=4$$$, we have $$$2n-\max(i, j)=7$$$ and "abfzabf"=="abfzabf". While the pointers $$$p, q$$$ stay in the areas of $$$S2$$$ and $$$T2$$$, respectively, they are "safe"; the key problem is that $$$q$$$ might wander out of $$$T2$$$, so we need to protect $$$q$$$ once it leaves $$$T2$$$. The key idea is to add protection: we construct $$$g(T2, S2)$$$ as $$$T2$$$ + string(n-1, 'z') + $$$S2$$$, where string(n-1, 'z') is a safe zone. Since $$$0 \leq i,j \leq n-1$$$, when $$$p$$$ reaches the end of the string, it is guaranteed that $$$q$$$ is still inside $$$T2$$$ or the safe zone string(n-1, 'z'), and $$$Sf(g(T2, S2), j) > Sf(g(T2, S2), i + 3n-1)$$$. The reason for the offset $$$3n-1$$$: the length of $$$T2$$$ is $$$2n$$$, the length of the safe zone is $$$n-1$$$, and $$$2n + n-1 = 3n-1$$$.

Wrong answer (during contest) without the "safe zone": WA.

Correct answer (after contest, sad, QwQ) with a "safe zone": AC.

Core code:

int main(){
    int n;
    string s, t;
    cin >> n >> s >> t;
    string large = t + t + string(n-1, 'z') + s + s;
    vector<int> v = move(atcoder::suffix_array(large)); //O(n)
    ll ans = 0, snum = 0;
    //List the S/T suffixes in sorted order, like S S T S S T T ...
    //When you meet a T suffix, add the number of S suffixes before it to the answer.
    for(int i = 0; i < 5*n-1; ++i){
    	int x = v[i];
    	if(x >= 3*n-1 && x <= 4*n-2){
    		++snum;
    	}
    	if(x >= 0 && x <= n-1){
    		ans += snum;
    	}
    }
    cout << ans << "\n";
}
