[Variants] An interesting counting problem related to square product

The statement:

Given three integers $$$n, k, p$$$ with $$$1 \leq k \leq n < p$$$.

Count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

$$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$
$$$a_i \times a_j$$$ is perfect square $$$\forall 1 \leq i < j \leq k$$$

Since the result can be big, output it modulo $$$p$$$.

For convenience, you can assume $$$p$$$ is the large prime constant $$$10^9 + 7$$$.

Notice that in this blog, we will solve generalized, harder variants.

For the original problem, see the blog [Tutorial] An interesting counting problem related to square product

Extra Tasks

These are harder variants and generalizations of the original problem. You can see more detail here

*Marked as solved only if tested with at least $$$10^6$$$ queries

Solved A: Can we also use phi function or something similar to solve for $$$k = 2$$$ in $$$O(\sqrt{n})$$$ or faster ?

Solved B: Can we also use phi function or something similar to solve for general $$$k$$$ in $$$O(\sqrt{n})$$$ or faster ?

Solved C: Can we also solve the problem where duplicates are allowed: $$$a_i \leq a_j\ (\forall\ i < j)$$$ instead of $$$a_i < a_j\ (\forall\ i < j)$$$ ?

Solved D: Can we solve the problem when there is no restriction between $$$k, n, p$$$ ?

Solved E: Can we solve for negative integers, where $$$-n \leq a_1 < a_2 < \dots < a_k \leq n$$$ ?

Solved F: Can we solve for a specific range, where $$$L \leq a_1 < a_2 < \dots < a_k \leq R$$$ ?

Solved G: Can we solve for cube product $$$a_i \times a_j \times a_t$$$ efficiently ?

H: Can we solve it if $$$n$$$ is given and the queries are for $$$k$$$ ?

I: Can we solve it if $$$k$$$ is given and the queries are for $$$n$$$ ?

J: Can we also solve the problem where there is no order: just simply $$$1 \leq a_i \leq n$$$ ?

K: Can we also solve the problem where there is no order: just simply $$$0 \leq a_i \leq n$$$ ?

M: Can we solve for $$$q$$$-product $$$a_{i_1} \times a_{i_2} \times \dots \times a_{i_q} = x^q$$$ (for a given constant $$$q$$$) ?

N: Given $$$0 \leq \delta \leq n$$$, can we also solve the problem when $$$1 \leq a_1 \leq a_1 + \delta \leq a_2 \leq a_2 + \delta \leq \dots \leq a_k \leq n$$$ ?

O: What if the condition only applies to adjacent elements and not all pairs, i.e. $$$a_i \times a_{i+1}$$$ is a perfect square $$$\forall 1 \leq i < k$$$ ?

A better solution for k = 2

Extra task A

Problem

Easy Version

Hard Version

Examples

Example 1

Input 1:

Output 1:

Explanation 1:

There is no valid integer pair $$$(a, b)$$$ with $$$1 \leq a < b \leq 1$$$.

Example 2

Input 2:

Output 2:

Explanation 2:

There are $$$4$$$ valid pairs: {$$$1, 4$$$}, {$$$1, 9$$$}, {$$$2, 8$$$}, {$$$4, 9$$$}.

Example 3

Input 3:

Output 3:

Explanation 3:

There are $$$16$$$ valid pairs: {$$$1, 4$$$}, {$$$1, 9$$$}, {$$$1, 16$$$}, {$$$1, 25$$$}, {$$$2, 8$$$}, {$$$2, 18$$$}, {$$$3, 12$$$}, {$$$4, 9$$$}, {$$$4, 16$$$}, {$$$4, 25$$$}, {$$$5, 20$$$}, {$$$6, 24$$$}, {$$$8, 18$$$}, {$$$9, 16$$$}, {$$$9, 25$$$}, {$$$16, 25$$$}.
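To double-check explanations like the ones above on small inputs, a direct enumeration is enough. The following is only a minimal sketch: the helper names `is_perfect_square` and `square_pairs` are made up for this illustration and are not part of the solutions in this blog.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical helper: true when x >= 0 is a perfect square.
bool is_perfect_square(long long x) {
    long long r = std::llround(std::sqrt((long double)x));
    while (r * r > x) --r;                 // correct possible rounding up
    while ((r + 1) * (r + 1) <= x) ++r;    // correct possible rounding down
    return r * r == x;
}

// Lists every pair (a, b) with 1 <= a < b <= n whose product is a perfect square.
// O(n^2), so it is only meant for checking tiny cases.
std::vector<std::pair<int, int>> square_pairs(int n) {
    std::vector<std::pair<int, int>> out;
    for (int a = 1; a <= n; ++a)
        for (int b = a + 1; b <= n; ++b)
            if (is_perfect_square(1LL * a * b)) out.emplace_back(a, b);
    return out;
}
```

Calling `square_pairs(n)` for a small `n` returns the pairs in lexicographic order, which makes it easy to compare against a hand-written list.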

Idea

Observation

Definition

Property

Formula

Implementation

O(sqrt n log log sqrt n) solution

#include <iostream>
#include <cstring>
#include <numeric>
#include <cmath>

using namespace std;

const int MOD = 1e9 + 7;
const int LIM = 1e7 + 17;
const int SQRT_LIM = ceil(sqrt(LIM) + 1) + 1;

int euler[SQRT_LIM];
void sieve_phi(int n)
{
    iota(euler, euler + n + 1, 0);
    for (int x = 2; x <= n; x++) if (euler[x] == x)
        for (int j = x; j <= n; j += x)
            euler[j] -= euler[j] / x;
}

int solve(int n)
{
    sieve_phi(ceil(sqrt(n) + 1) + 1);
    
    long long res = 0;
    for (int p = 2; p * p <= n; ++p)
        res += 1LL * euler[p] * (n / (p * p));

    res %= MOD;
    return res;
}

int main()
{
    ios::sync_with_stdio(false);
    cin.tie(NULL);
    
    int n;
    cin >> n;
    cout << solve(n);
    return 0;
}

O(sqrt n) solution

#include <iostream>
#include <cstring>
#include <numeric>
#include <vector>
#include <cmath>

using namespace std;

const int MOD = 1e9 + 7;

vector<int> lpf;
vector<int> prime;
vector<int> euler;
void linear_sieve_phi(int n)
{
    lpf.assign(n + 1, 0);
    euler.assign(n + 1, 1);
    for (int x = 2; x <= n; ++x)
    {
        if (lpf[x] == 0)
        {
            prime.push_back(lpf[x] = x);
            euler[x] = x - 1;                    
        }
        for (int i = 0; i < prime.size() && x * prime[i] <= n; ++i)
        {
            lpf[x * prime[i]] = prime[i];
            if (x % prime[i] == 0) {
                euler[x * prime[i]] = euler[x] * prime[i];    
                break;
            }
            euler[x * prime[i]] = euler[x] * euler[prime[i]];
        }
    }
}

int solve(int n)
{
    linear_sieve_phi(ceil(sqrt(n) + 1) + 1);
    
    long long res = 0;
    for (int p = 2; p * p <= n; ++p)
        res += 1LL * euler[p] * (n / (p * p));

    res %= MOD;
    return res;
}

int main()
{
    ios::sync_with_stdio(false);
    cin.tie(NULL);
    
    int n;
    cin >> n;
    cout << solve(n);
    return 0;
}

Complexity

Hint

A better solution for general k

Extra task B

Problem

Very Easy Version

Easy Version

Hard Version

Examples

Example 1

Input 1:

2 1

Output 1:

2

Explanation 1:

There are $$$2$$$ valid arrays of size $$$1$$$: {$$$1$$$}, {$$$2$$$}.

Example 2

Input 2:

10 2

Output 2:

4

Explanation 2:

There are $$$4$$$ valid arrays of size $$$2$$$: {$$$1, 4$$$}, {$$$1, 9$$$}, {$$$2, 8$$$}, {$$$4, 9$$$}.

Example 3

Input 3:

27 3

Output 3:

12

Explanation 3:

There are $$$12$$$ valid arrays of size $$$3$$$: {$$$1, 4, 9$$$}, {$$$1, 4, 16$$$}, {$$$1, 4, 25$$$}, {$$$1, 9, 16$$$}, {$$$1, 9, 25$$$}, {$$$1, 16, 25$$$}, {$$$2, 8, 18$$$}, {$$$3, 12, 27$$$}, {$$$4, 9, 16$$$}, {$$$4, 9, 25$$$}, {$$$4, 16, 25$$$}, {$$$9, 16, 25$$$}.
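A slow reference counter for general $$$k$$$ can be handy when testing the faster solutions. The sketch below relies on the fact that $$$a \times b$$$ is a perfect square exactly when $$$a$$$ and $$$b$$$ have the same squarefree part, so each group of size $$$m$$$ contributes $$$C(m, k)$$$ arrays. The names `squarefree_part` and `count_by_classes` are assumptions of this example, and it uses Pascal's triangle instead of the precalculated factorials used in this blog.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: squarefree part of x, found by trial division.
int squarefree_part(int x) {
    for (int d = 2; d * d <= x; ++d)
        while (x % (d * d) == 0) x /= d * d;
    return x;
}

// Counts arrays 1 <= a_1 < ... < a_k <= n whose pairwise products are all
// perfect squares, modulo mod. Roughly O(n sqrt n + n k): small n only.
long long count_by_classes(int n, int k, long long mod = 1000000007LL) {
    std::vector<int> cls(n + 1, 0);  // cls[u] = how many x <= n have squarefree part u
    for (int x = 1; x <= n; ++x) ++cls[squarefree_part(x)];

    // Pascal's triangle modulo mod: valid for any modulus, no inverses needed.
    std::vector<std::vector<long long>> C(n + 1, std::vector<long long>(k + 1, 0));
    for (int i = 0; i <= n; ++i) {
        C[i][0] = 1 % mod;
        for (int j = 1; j <= std::min(i, k); ++j)
            C[i][j] = (C[i - 1][j - 1] + C[i - 1][j]) % mod;
    }

    long long total = 0;
    for (int u = 1; u <= n; ++u)
        total = (total + C[cls[u]][k]) % mod;  // choose k members of the same class
    return total;
}
```

For example, `count_by_classes(27, 3)` groups $$$1 \dots 27$$$ into classes such as {$$$1, 4, 9, 16, 25$$$}, {$$$2, 8, 18$$$} and {$$$3, 12, 27$$$} and sums $$$C(m, 3)$$$ over them.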

Idea

Definition

The formula

Implementation

O(sqrt n log sqrt n)

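#include <iostream>
#include <cstring>
#include <vector>
#include <cmath>
#include <algorithm>

using namespace std;

const int LIM = 5e6 + 56;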
const int SQRT_LIM = ceil(sqrt(LIM) + 1) + 1;
const int MOD = 1e9 + 7;

/// Precalculating factorials under prime modulo
int fact[SQRT_LIM + 10]; /// fact[n] = n!
int invs[SQRT_LIM + 10]; /// invs[n] = n^(-1)
int tcaf[SQRT_LIM + 10]; /// tcaf[n] = (n!)^(-1)
void precal_nck(int n = SQRT_LIM)
{
    fact[0] = fact[1] = 1;
    invs[0] = invs[1] = 1;
    tcaf[0] = tcaf[1] = 1;
    for (int i = 2; i <= n; ++i)
    {
        fact[i] = (1LL * fact[i - 1] * i) % MOD;
        invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
        tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
    }
}

/// Calculating binomial coefficient queries
int nck(int n, int k)
{
    k = min(k, n - k);
    if (k < 0) return 0;

    long long res = fact[n];
    res *= tcaf[k];         res %= MOD;
    res *= tcaf[n - k];     res %= MOD;
    return res;
}

/// Linear Sieve
vector<int> prime;           /// prime list              = A000040
bool isPrime[SQRT_LIM + 10]; /// characteristic function = A010051
int lpf[SQRT_LIM + 10];      /// lowest prime factor     = A020639
int mu[SQRT_LIM + 10];       /// mobius                  = A008683
void linear_sieve(int n)
{
    if (n < 1) return ;
    /// Extension Sieve || You can add something more
    memset(lpf, 0, sizeof(lpf[0]) * (n + 1));
    fill_n(mu, n + 1, 1);
    /// Main Sieve || Without this, you barely able to achive linear complexity
    prime.clear();
    prime.reserve(n / log(n - 1));
    memset(isPrime, true, sizeof(isPrime[0]) * (n + 1));
    isPrime[0] = isPrime[1] = false;
    for (int x = 2; x <= n; ++x) /// For each number
    {
        if (isPrime[x]) /// Func[Prime]
        {
            mu[x] = -1;
            lpf[x] = x;
            prime.push_back(x);
        }
        for (int p : prime) /// Func[Prime * X] <- Func[Prime]
        {
            if (p > lpf[x] || x * p > n) break;
            isPrime[x * p] = 0;
            lpf[x * p] = p;
            mu[x * p] = (lpf[x] == p) ? 0 : -mu[x];
        }
    }
}

/// Divisor sieve
vector<int> divisors[SQRT_LIM];
void precal_div(int n) /// O(n log n)
{
    for (int u = n; u >= 1; --u)
    {
        divisors[u].clear();
        for (int v = u; v <= n; v += u)
            divisors[v].push_back(u);
    }
}

/// Solving for n, k
long long solve(int n, int k)
{
    /// We only care for d that 1 <= d <= sqrt(n)
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t);
    precal_div(t);

    long long res = 0;
    for (int d = 1; d * d <= n; ++d) /// For each fixed p^2
    {
        long long sum = 0;
        for (int p : divisors[d]) /// For each (p | d)
            sum += mu[d / p] * nck(p - 1, k - 1);

        sum = (sum % MOD + MOD) % MOD; /// mu can make the sum negative, so normalize it
        res = (res + sum * (n / (d * d))) % MOD;
    }

    res %= MOD;
    return res;
}

int main()
{
    ios::sync_with_stdio(false);
    cin.tie(NULL);

    /// Assumming constant p = 10^9 + 7
    int n, k;
    cin >> n >> k;
    cout << solve(n, k);
    return 0;
}

O(sqrt n log log sqrt n)

vector<int> prime;           /// prime list              = A000040
bool isPrime[SQRT_LIM + 10]; /// characteristic function = A010051
int lpf[SQRT_LIM + 10];      /// lowest prime factor     = A020639
void linear_sieve(int n)
{
    if (n < 1) return ;
    prime.clear();
    prime.reserve(n / log(n - 1));
    memset(lpf, 0, sizeof(lpf[0]) * (n + 1));
    memset(isPrime, true, sizeof(isPrime[0]) * (n + 1));
    isPrime[0] = isPrime[1] = false;
    for (int x = 2; x <= n; ++x)
    {
        if (isPrime[x]) /// Func[Prime]
        {
            lpf[x] = x;
            prime.push_back(x);
        }
        for (int p : prime) /// Func[Prime * X] <- Func[Prime]
        {
            if (p > lpf[x] || x * p > n) break;
            isPrime[x * p] = 0;
            lpf[x * p] = p;
        }
    }
}

long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t);

    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d <= n; ++d) 
        res[d] = nck(d - 1, k - 1);

    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];

    long long ans = 0;
    for (int d = 1; d * d <= n; ++d)
        ans += res[d] * (n / (d * d));

    ans = (ans % MOD + MOD) % MOD; /// res[] may hold negative values, so normalize
    return ans;
}

But while doing research for task H, I found an improvement

O(sqrt (n/k) log log sqrt(n/k) - k) solution

vector<int> valid;
int cnt[SQRT_LIM];
bool is_squarefree[LIM];
long long res[SQRT_LIM + 10];
int solve(int n, int k)
{
    int t = ceil(sqrt(n) + 0.5);
    if (k > t) return 0;
    linear_sieve(t);
    precal_nck(t);

    memset(res, 0, sizeof(res[0]) * (t - k + 1));
    for (int d = k; d * d <= n; ++d) 
        res[d - k] = nck(d - 1, k - 1);

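    for (int p : prime)
        for (int d = t / p; d >= k; --d)
            res[d * p - k] -= res[d - k]; /// both indices are shifted by k

    long long ans = 0;
    for (int d = k; d <= t; ++d)
        ans += res[d - k] * (n / (d * d));

    ans = (ans % MOD + MOD) % MOD; /// res[] may hold negative values, so normalize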
    return ans;
}

Complexity

The first implementation

The second implementation

The third implementation

Solution for duplicates elements in array

Extra task C

Problem

Given $$$k, n (1 \leq k \leq n \leq 10^9)$$$, count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

$$$1 \leq a_1 \leq a_2 \leq \dots \leq a_k \leq n$$$
$$$a_i \times a_j$$$ is perfect square $$$\forall 1 \leq i < j \leq k$$$

Since the result can be big, output it modulo $$$10^9 + 7$$$.
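For this variant too, an exhaustive checker helps to validate the formulas on tiny inputs. The following is a minimal sketch that enumerates non-decreasing arrays directly; `is_square`, `dfs` and `brute_with_duplicates` are hypothetical names, and the running time is exponential in the worst case, so it is only for very small $$$n$$$ and $$$k$$$.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: true when x >= 0 is a perfect square.
bool is_square(long long x) {
    long long r = std::llround(std::sqrt((long double)x));
    while (r * r > x) --r;
    while ((r + 1) * (r + 1) <= x) ++r;
    return r * r == x;
}

// Extends the non-decreasing prefix `cur` in every valid way and counts
// the completed arrays of size k with values in [1, n].
long long dfs(int n, int k, std::vector<int>& cur) {
    if ((int)cur.size() == k) return 1;
    long long total = 0;
    int start = cur.empty() ? 1 : cur.back();   // equal values are allowed here
    for (int v = start; v <= n; ++v) {
        bool ok = true;
        for (int u : cur)
            if (!is_square(1LL * u * v)) { ok = false; break; }
        if (!ok) continue;
        cur.push_back(v);
        total += dfs(n, k, cur);
        cur.pop_back();
    }
    return total;
}

long long brute_with_duplicates(int n, int k) {
    std::vector<int> cur;
    return dfs(n, k, cur);
}
```

For $$$n = 4, k = 2$$$ the valid arrays are {$$$1, 1$$$}, {$$$2, 2$$$}, {$$$3, 3$$$}, {$$$4, 4$$$} and {$$$1, 4$$$}, which is what the enumeration finds.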

Idea

Observation

Calculation

Implementation

O(n) solution


int fact[2 * LIM + 12]; /// sized for precal_nck(2 * n + 1) below
int invs[2 * LIM + 12];
int tcaf[2 * LIM + 12];
void precal_nck(int n = SQRT_LIM)
{
    fact[0] = fact[1] = 1;
    invs[0] = invs[1] = 1;
    tcaf[0] = tcaf[1] = 1;
    for (int i = 2; i <= n; ++i)
    {
        fact[i] = (1LL * fact[i - 1] * i) % MOD;
        invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
        tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
    }
}

int nck(int n, int k)
{
    k = min(k, n - k);
    if (k < 0) return 0;

    long long res = fact[n];
    res *= tcaf[k];         res %= MOD;
    res *= tcaf[n - k];     res %= MOD;
    return res;
}

bool is_squarefree[LIM];
int solve(int n, int k)
{
    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (n + 1));
    precal_nck(2 * n + 1);

    long long res = 0;
    for (int i = 1, j; i <= n; ++i) if (is_squarefree[i]) 
    {
        for (j = 1; i * j * j <= n; ++j)
            is_squarefree[i * j * j] = false;

        res += nck(k + j - 2, k);
    }

    res %= MOD;
    return res;
}

O(sqrt n log sqrt n + k) solution

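#include <iostream>
#include <cstring>
#include <vector>
#include <cmath>
#include <algorithm>

using namespace std;

const int LIM = 5e6 + 56;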
const int SQRT_LIM = ceil(sqrt(LIM) + 1) + 1;
const int MOD = 1e9 + 7;

/// Precalculating factorials under prime modulo
int fact[SQRT_LIM + 10]; /// fact[n] = n!
int invs[SQRT_LIM + 10]; /// invs[n] = n^(-1)
int tcaf[SQRT_LIM + 10]; /// tcaf[n] = (n!)^(-1)
void precal_nck(int n = SQRT_LIM)
{
    fact[0] = fact[1] = 1;
    invs[0] = invs[1] = 1;
    tcaf[0] = tcaf[1] = 1;
    for (int i = 2; i <= n; ++i)
    {
        fact[i] = (1LL * fact[i - 1] * i) % MOD;
        invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
        tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
    }
}

/// Calculating binomial coefficient queries
int nck(int n, int k)
{
    k = min(k, n - k);
    if (k < 0) return 0;

    long long res = fact[n];
    res *= tcaf[k];         res %= MOD;
    res *= tcaf[n - k];     res %= MOD;
    return res;
}

/// Linear Sieve
vector<int> prime;           /// prime list              = A000040
bool isPrime[SQRT_LIM + 10]; /// characteristic function = A010051
int lpf[SQRT_LIM + 10];      /// lowest prime factor     = A020639
int mu[SQRT_LIM + 10];       /// mobius                  = A008683
void linear_sieve(int n)
{
    if (n < 1) return ;
    /// Extension Sieve || You can add something more
    memset(lpf, 0, sizeof(lpf[0]) * (n + 1));
    fill_n(mu, n + 1, 1);
    /// Main Sieve || Without this, you barely able to achive linear complexity
    prime.clear();
    prime.reserve(n / log(n - 1));
    memset(isPrime, true, sizeof(isPrime[0]) * (n + 1));
    isPrime[0] = isPrime[1] = false;
    for (int x = 2; x <= n; ++x) /// For each number
    {
        if (isPrime[x]) /// Func[Prime]
        {
            mu[x] = -1;
            lpf[x] = x;
            prime.push_back(x);
        }
        for (int p : prime) /// Func[Prime * X] <- Func[Prime]
        {
            if (p > lpf[x] || x * p > n) break;
            isPrime[x * p] = 0;
            lpf[x * p] = p;
            mu[x * p] = (lpf[x] == p) ? 0 : -mu[x];
        }
    }
}

/// Divisor sieve
vector<int> divisors[SQRT_LIM];
void precal_div(int n) /// O(n log n)
{
    for (int u = n; u >= 1; --u)
    {
        divisors[u].clear();
        for (int v = u; v <= n; v += u)
            divisors[v].push_back(u);
    }
}

/// Solving for n, k
long long solve(int n, int k)
{
    /// We only care for d that 1 <= d <= sqrt(n)
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t + k); /// nck arguments go up to t + k - 2
    precal_div(t);

    long long res = 0;
    for (int d = 1; d * d <= n; ++d) /// For each fixed p^2
    {
        long long sum = 0;
        for (int p : divisors[d]) /// For each (p | d)
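            sum += mu[d / p] * nck(p + k - 2, k - 1); /// depends on the divisor p, not on d

        sum = (sum % MOD + MOD) % MOD; /// mu can make the sum negative, so normalize it
        res = (res + sum * (n / (d * d))) % MOD;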
    }

    res %= MOD;
    return res;
}

int main()
{
    ios::sync_with_stdio(false);
    cin.tie(NULL);

    /// Assumming constant p = 10^9 + 7
    int n, k;
    cin >> n >> k;
    cout << solve(n, k);
    return 0;
}

O(sqrt n log log sqrt n + k) solution

int fact[SQRT_LIM + 10];
int invs[SQRT_LIM + 10];
int tcaf[SQRT_LIM + 10];
void precal_nck(int n = SQRT_LIM)
{
    fact[0] = fact[1] = 1;
    invs[0] = invs[1] = 1;
    tcaf[0] = tcaf[1] = 1;
    for (int i = 2; i <= n; ++i)
    {
        fact[i] = (1LL * fact[i - 1] * i) % MOD;
        invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
        tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
    }
}

int nck(int n, int k)
{
    k = min(k, n - k);
    if (k < 0) return 0;

    long long res = fact[n];
    res *= tcaf[k];         res %= MOD;
    res *= tcaf[n - k];     res %= MOD;
    return res;
}

vector<int> prime;           /// prime list              = A000040
bool isPrime[SQRT_LIM + 10]; /// characteristic function = A010051
int lpf[SQRT_LIM + 10];      /// lowest prime factor     = A020639
void linear_sieve(int n)
{
    if (n < 1) return ;
    prime.clear();
    prime.reserve(n / log(n - 1));
    memset(lpf, 0, sizeof(lpf[0]) * (n + 1));
    memset(isPrime, true, sizeof(isPrime[0]) * (n + 1));
    isPrime[0] = isPrime[1] = false;
    for (int x = 2; x <= n; ++x)
    {
        if (isPrime[x]) /// Func[Prime]
        {
            lpf[x] = x;
            prime.push_back(x);
        }
        for (int p : prime) /// Func[Prime * X] <- Func[Prime]
        {
            if (p > lpf[x] || x * p > n) break;
            isPrime[x * p] = 0;
            lpf[x * p] = p;
        }
    }
}

long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t); /// only primes up to t are used by the loops below
    precal_nck(t + k);

    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d <= n; ++d) 
        res[d] = nck(d + k - 2, k - 1);

    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];

    long long ans = 0;
    for (int d = 1; d * d <= n; ++d)
        ans += res[d] * (n / (d * d));

    ans = (ans % MOD + MOD) % MOD; /// res[] may hold negative values, so normalize
    return ans;
}

Complexity

The first implementation

The second and third implementation

Solution when there are no restriction between k, n, p

Extra task D

Problem

Given $$$k, n, p (1 \leq k, n, p \leq 10^9)$$$, count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

$$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$
$$$a_i \times a_j$$$ is perfect square $$$\forall 1 \leq i < j \leq k$$$

Since the result can be big, output it modulo $$$p$$$.

Idea

Observation

Large prime p

For a large prime $$$p > \max(n, k)$$$

Just use normal combinatorics with factorials (since $$$p > \max(n, k)$$$, no factorial involved is divisible by $$$p$$$, so nothing affects the result).
To divide under the modulo, take the modular inverse (for a prime modulus it always exists for nonzero values).
This is a standard problem, just be careful about overflow.
You can also precalculate factorials, inverses and inverse factorials in linear time.
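As a small illustration of the large-prime case, the sketch below computes one binomial coefficient with a single Fermat inversion. It assumes $$$p$$$ is prime, $$$p > n$$$ and $$$p < 2^{31}$$$ so that products fit in 64 bits; `pow_mod` and `binom_large_prime` are names chosen for this example, not the precalculation template used in the implementation.

```cpp
#include <vector>

// base^exp mod m, with 64-bit intermediates to avoid overflow (assumes m < 2^31).
long long pow_mod(long long base, long long exp, long long m) {
    long long result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = result * base % m;
        base = base * base % m;
        exp >>= 1;
    }
    return result;
}

// C(n, k) mod p for a prime p > n, using Fermat's little theorem:
// x^(p-2) is the inverse of x modulo p when p does not divide x.
long long binom_large_prime(int n, int k, long long p) {
    if (k < 0 || k > n) return 0;
    std::vector<long long> fact(n + 1, 1);
    for (int i = 1; i <= n; ++i) fact[i] = fact[i - 1] * i % p;
    long long denom = fact[k] * fact[n - k] % p;
    return fact[n] * pow_mod(denom, p - 2, p) % p;
}
```

When many queries are needed, the linear precalculation of factorials and inverse factorials mentioned above avoids recomputing the table for each call.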

For a general prime $$$p$$$

We can ignore the factors $$$p$$$ when calculating $$$n!$$$.
You also need to know how many times the factor $$$p$$$ appears in $$$1 \dots n$$$,
then combine it back when calculating the answer.
If we do not do this, $$$n!$$$ may be divisible by $$$p$$$ and then has no modular inverse.
With precalculation you can answer queries in $$$O(1)$$$.
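The idea above can be sketched as follows for a single query. This is only an illustration under the assumption that $$$p$$$ is prime and $$$p < 2^{31}$$$; the names `pow_mod` and `binom_any_prime` are hypothetical. Since the modulus is a prime, a positive leftover exponent of $$$p$$$ simply means the answer is $$$0$$$.

```cpp
#include <vector>

// base^exp mod m, with 64-bit intermediates (assumes m < 2^31).
long long pow_mod(long long base, long long exp, long long m) {
    long long result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = result * base % m;
        base = base * base % m;
        exp >>= 1;
    }
    return result;
}

// C(n, k) mod p for any prime p, including p <= n.
long long binom_any_prime(int n, int k, long long p) {
    if (k < 0 || k > n) return 0;
    std::vector<long long> fact(n + 1, 1);  // fact[i] = i! with every factor p removed, mod p
    std::vector<int> cnt(n + 1, 0);         // cnt[i]  = exponent of p in i!
    for (int i = 1; i <= n; ++i) {
        long long x = i;
        cnt[i] = cnt[i - 1];
        while (x % p == 0) { x /= p; ++cnt[i]; }
        fact[i] = fact[i - 1] * (x % p) % p;
    }
    int e = cnt[n] - cnt[k] - cnt[n - k];   // exponent of p left in C(n, k)
    if (e > 0) return 0;                    // still divisible by p
    long long denom = fact[k] * fact[n - k] % p;   // coprime to p, so invertible
    return fact[n] * pow_mod(denom, p - 2, p) % p;
}
```

Keeping `fact` and `cnt` as global precalculated tables instead of rebuilding them per call is what would bring queries closer to the $$$O(1)$$$ mentioned above, apart from the inversion.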

For squarefree $$$p$$$

Factorize $$$p = p_1 \times p_2 \times \dots \times p_q$$$ where all $$$p_i$$$ are prime.
Ignore all factors $$$p_i$$$ when calculating $$$n!$$$.
Remember to calculate how many times each factor $$$p_i$$$ appears in $$$1 \dots n$$$.
When answering a query, we just combine all those parts back.
Remember that exponents of values coprime to $$$p$$$ can be reduced modulo $$$\phi(p)$$$ (Euler's theorem), and $$$\phi(p)$$$ can be calculated while factorizing $$$p$$$.
Remember that the stored $$$n!$$$ must not be divisible by any factor $$$p_i$$$, otherwise you will get a wrong answer.
With precalculation you can answer queries in $$$O(\log p)$$$.

For a general positive modulo $$$p$$$

Factorize $$$p = p_1^{f_1} \times p_2^{f_2} \times \dots \times p_q^{f_q}$$$ where all $$$p_i$$$ are distinct primes.
We calculate $$$C(n, k)$$$ modulo $$$p_i^{f_i}$$$ for each $$$i = 1 \dots q$$$.
To do that, we need to calculate $$$n!$$$ modulo $$$p_i^{f_i}$$$ which is described here.
To get the final answer we can use CRT.
This is rather hard to code and debug, and it is easy to make mistakes, so be careful.
I will leave the implementation to you, lovely readers.
Depending on how you calculate things, your query complexity might increase.
There are few (effective or at least fully correct) papers about this, but you can read the one written here
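The CRT step on its own is short. Below is a minimal sketch that merges residues for pairwise coprime moduli one at a time; it assumes the product of the moduli fits comfortably in 64 bits (for example a product of at most $$$10^9$$$), and the names `ext_gcd` and `crt` are chosen for this illustration. It does not cover the harder part, computing $$$C(n, k)$$$ modulo each prime power.

```cpp
#include <cstddef>
#include <vector>

// Extended Euclid: returns g = gcd(a, b) and sets x, y with a*x + b*y = g.
long long ext_gcd(long long a, long long b, long long& x, long long& y) {
    if (b == 0) { x = 1; y = 0; return a; }
    long long x1, y1;
    long long g = ext_gcd(b, a % b, x1, y1);
    x = y1;
    y = x1 - (a / b) * y1;
    return g;
}

// Given x = r[i] (mod m[i]) with pairwise coprime m[i], returns x modulo
// the product of all m[i]. Assumes that product (and m[i]^2) fits in long long.
long long crt(const std::vector<long long>& r, const std::vector<long long>& m) {
    long long x = 0, M = 1;                      // invariant: x is correct modulo M
    for (std::size_t i = 0; i < r.size(); ++i) {
        long long inv, unused;
        ext_gcd(M % m[i], m[i], inv, unused);    // inv = M^(-1) mod m[i]
        inv = (inv % m[i] + m[i]) % m[i];
        long long diff = ((r[i] - x) % m[i] + m[i]) % m[i];
        long long t = diff * inv % m[i];         // x + M * t matches r[i] modulo m[i]
        x += M * t;
        M *= m[i];
    }
    return x;
}
```

For instance, the residues $$$3$$$ modulo $$$4$$$ and $$$8$$$ modulo $$$9$$$ combine to $$$35$$$ modulo $$$36$$$.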

Implementation

O(n) for prime p > max(n, k)

/// SPyofgame linear template for precalculating factorials under large prime modulo
int fact[SQRT_LIM + 10]; /// fact[n] = n!
int invs[SQRT_LIM + 10]; /// invs[n] = n^(-1)
int tcaf[SQRT_LIM + 10]; /// tcaf[n] = (n!)^(-1)
void precal_nck(int n = SQRT_LIM)
{
    fact[0] = fact[1] = 1;
    invs[0] = invs[1] = 1;
    tcaf[0] = tcaf[1] = 1;
    for (int i = 2; i <= n; ++i)
    {
        fact[i] = (1LL * fact[i - 1] * i) % MOD;
        invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
        tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
    }
}

/// Calculating binomial coefficient queries
int nck(int n, int k)
{
    k = min(k, n - k);
    if (k < 0) return 0;

    long long res = fact[n];
    res *= tcaf[k];         res %= MOD;
    res *= tcaf[n - k];     res %= MOD;
    return res;
}

O(n log mod + sqrt(mod)) for prime p or squarefree p

vector<int> factor;
int factorize(int n) /// Calculating phi(n) while factorizing (n) in O(sqrt n)
{
    factor.clear();
    int phi = n;

    if (!(n & 1))
    {
        n >>= __builtin_ctz(n);
        factor.push_back(2);
        phi -= phi / 2;
    }

    for (int x = 3; x * x <= n; x += 2)
    {
        if (n % x == 0)
        {
            do n /= x; while (n % x == 0);
            factor.push_back(x);
            phi -= phi / x;
        }
    }

    if (n > 1)
    {
        factor.push_back(n);
        phi -= phi / n;
    }

    return phi;
}

int f[LIM];    /// f[x] = nck(n, x)
int fact[LIM]; /// n! 
int tcaf[LIM]; /// n!^(-1)
int divp[LIM]; /// x but ignore all factors p[i]
int cntp[LIM][LOG_LIM]; /// cntp[x][i] = Number of time factor p[i] appear in 1..x
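/// Assumes n, k, LIM, LOG_LIM, powMOD(base, exp, mod) and mulMOD(x, y, mod) (in-place x = x * y % mod) are defined elsewhere
void precal(int MOD) /// Calculate f[x] for all x = 1 -> n in O(n log mod + sqrt mod)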
{
    int PHIMOD = factorize(MOD);
    for (int x = 1; x <= n; ++x) /// For each part x in n!
    {
        int &t = divp[x] = x;
        for (int i = 0; i < factor.size(); ++i) /// Ignore all factor p[i] of p
        {
            cntp[x][i] = cntp[x - 1][i];
            for (; t % factor[i] == 0; t /= factor[i]) /// Count how many times p[i] appears in 1..n
                ++cntp[x][i];
        }
    }

    fact[0] = fact[1] = 1;
    tcaf[0] = tcaf[1] = 1;
    for (int x = 2; x <= n; ++x) /// Finding n! and n!^(-1)
    {
        fact[x] = (1LL * fact[x - 1] * divp[x]) % MOD;
        tcaf[x] = powMOD(fact[x], PHIMOD - 1, MOD);
    }

    memset(f, 0, sizeof(f[0]) * k);
    for (int x = k; x <= n; ++x)
    {
        /// Calculate nck % p normally
        f[x] = fact[x];
        mulMOD(f[x], tcaf[k], MOD);
        mulMOD(f[x], tcaf[x - k], MOD);
        for (int i = 0; i < factor.size(); ++i) /// Bringing those factors back
        {
            int p = cntp[x][i] - cntp[k][i] - cntp[x - k][i];
            f[x] = 1LL * f[x] * powMOD(factor[i], p, MOD) % MOD;
        }
    }
}

Complexity

Spoiler

Solution when numbers are also bounded by negative number

Extra task E

Problem

Given $$$k, n (1 \leq k \leq n \leq 10^9)$$$, count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

$$$-n \leq a_1 < a_2 < \dots < a_k \leq n$$$
$$$a_i \times a_j$$$ is perfect square $$$\forall 1 \leq i < j \leq k$$$

Since the result can be big, output it modulo $$$10^9 + 7$$$.

Idea

Hint

This is the same as extra task C, where only the counting part should be changed.

As we only care about integers, let us not bring complex numbers into this problem.

If the array contains both a negative number and a positive number, their product is negative, so the array is not valid.

Be careful, there are the zeros too.

When the numbers are all unique, or $$$-n \leq a_1 < a_2 < \dots < a_k \leq n$$$

There are 4 cases:

This gives us the formula $$$task_E(n, k) = 2 \times task_B(n, k) + 2 \times task_B(n, k - 1)$$$.
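The formula can be sanity-checked against exhaustive search on small inputs. The sketch below is only a test aid with made-up names (`is_square`, `count_arrays`, `task_b_brute`, `task_e_brute`), and it assumes $$$k \geq 2$$$ so that the $$$k - 1$$$ term refers to arrays of size at least $$$1$$$.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: true when x is a perfect square (negative x is never one).
bool is_square(long long x) {
    if (x < 0) return false;
    long long r = std::llround(std::sqrt((long double)x));
    while (r * r > x) --r;
    while ((r + 1) * (r + 1) <= x) ++r;
    return r * r == x;
}

// Exhaustively counts strictly increasing arrays of size k with values in
// [lo, hi] whose pairwise products are all perfect squares.
long long count_arrays(int lo, int hi, int k, std::vector<int>& cur) {
    if ((int)cur.size() == k) return 1;
    long long total = 0;
    int start = cur.empty() ? lo : cur.back() + 1;
    for (int v = start; v <= hi; ++v) {
        bool ok = true;
        for (int u : cur)
            if (!is_square(1LL * u * v)) { ok = false; break; }
        if (ok) {
            cur.push_back(v);
            total += count_arrays(lo, hi, k, cur);
            cur.pop_back();
        }
    }
    return total;
}

long long task_b_brute(int n, int k) { std::vector<int> c; return count_arrays(1, n, k, c); }
long long task_e_brute(int n, int k) { std::vector<int> c; return count_arrays(-n, n, k, c); }
```

For $$$n = 4, k = 2$$$ the search finds {$$$1, 4$$$}, {$$$-4, -1$$$} and the $$$8$$$ pairs containing $$$0$$$, which agrees with $$$2 \times 1 + 2 \times 4$$$.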

Hint 1

Hint 2

Hint 3

Hint 4

Proof

Remember that when $$$k = 0$$$ the answer is $$$0$$$, otherwise you may get a wrong result from the binomial coefficient formula with a negative argument.

With duplicates case

So what if I mix the problem with task C too ?

When the numbers can have duplicates, or $$$-n \leq a_1 \leq a_2 \leq \dots \leq a_k \leq n$$$

There are 5 cases:

Once again you can simplify it into fewer cases for easier calculation.

There are 2 main cases:

This gives us the formula $$$task_E(n, k) = 1 + 2 \times \sum_{t = 1}^{k} task_C(n, t)$$$, where $$$task_C$$$ is the count with duplicates from extra task C.

Why is the formula 2 * ...?

No, I mean why is there no binomial coefficient for selecting the number of zeros ?

So where does the 1 come from ? Why isn't it 2 instead ?

But this gives you an $$$O(k)$$$ solution.

You can do better with math

Hint 1

Hint 2

Solution
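One identity that fits here is the hockey-stick identity, $$$\sum_{t = 1}^{k} C(d + t - 2, t - 1) = C(d + k - 1, k - 1)$$$, which collapses the loop over $$$t$$$ into a single binomial coefficient; this matches the term `nck(d + k - 1, k - 1)` used in the $$$O(k + \sqrt{n} \log \log \sqrt{n})$$$ implementation. The sketch below just checks the identity numerically with exact small binomials; `binom`, `sum_over_t` and `closed_form` are names made up for this example.

```cpp
#include <algorithm>
#include <vector>

// Exact C(n, r) by Pascal's rule on a single row; small arguments only.
long long binom(int n, int r) {
    if (r < 0 || r > n) return 0;
    std::vector<long long> row(r + 1, 0);
    row[0] = 1;
    for (int i = 1; i <= n; ++i)
        for (int j = std::min(i, r); j >= 1; --j)
            row[j] += row[j - 1];      // C(i, j) = C(i-1, j) + C(i-1, j-1)
    return row[r];
}

// Left-hand side: the O(k) sum over t = 1..k of C(d + t - 2, t - 1).
long long sum_over_t(int d, int k) {
    long long s = 0;
    for (int t = 1; t <= k; ++t) s += binom(d + t - 2, t - 1);
    return s;
}

// Right-hand side: the single binomial C(d + k - 1, k - 1).
long long closed_form(int d, int k) {
    return binom(d + k - 1, k - 1);
}
```

For $$$d = 3, k = 4$$$ both sides equal $$$1 + 3 + 6 + 10 = 20$$$.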

Implementation

O(sqrt n log log sqrt n) when the numbers are unique


long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t);

    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d <= n; ++d) 
    {
        res[d] += (k >= 1) * nck(d - 1, k - 1) * 2;
        res[d] += (k >= 2) * nck(d - 1, k - 2) * 2;  
    }

    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];

    long long ans = 0;
    for (int d = 1; d * d <= n; ++d)
        ans += res[d] * (n / (d * d));

    ans = (ans % MOD + MOD) % MOD; /// res[] may hold negative values, so normalize
    return ans;
}

And for duplicates (mixed with task C), we have:

O(kn) = O(n^2)

bool is_squarefree[LIM];
int brute(int n, int k)
{
    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (n + 1));
    precal_nck(2 * n + 1);

    long long res = 0;
    for (int i = 1, j; i <= n; ++i) if (is_squarefree[i]) 
    {
        for (j = 1; i * j * j <= n; ++j)
            is_squarefree[i * j * j] = false;

        res += nck(k + j - 2, k);
    }

    res %= MOD;
    return res;
}


long long solve(int n, int k)
{
    long long res = 1;
    for (int t = 1; t <= k; ++t)
        res += brute(n, t) * 2;

    res %= MOD;
    return res;
}

O(k sqrt n log sqrt n) = O(n sqrt n log n)

long long res[SQRT_LIM + 10];
long long brute(int n, int k)
{
    /// We only care for d that 1 <= d <= sqrt(n)
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t + k);
    precal_div(t);

    long long res = 0;
    for (int d = 1; d * d <= n; ++d) /// For each fixed p^2
    {
        long long sum = 0;
        for (int p : divisors[d]) /// For each (p | d)
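            sum += mu[d / p] * nck(p + k - 2, k - 1); /// depends on the divisor p, not on d

        sum = (sum % MOD + MOD) % MOD; /// mu can make the sum negative, so normalize it
        res = (res + sum * (n / (d * d))) % MOD;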
    }

    res %= MOD;
    return res;
}

long long solve(int n, int k)
{
    long long res = 1;
    for (int t = 1; t <= k; ++t)
        res += brute(n, t) * 2;

    res %= MOD;
    return res;
}

O(k sqrt n log log sqrt n) = O(n sqrt n log log n)

long long res[SQRT_LIM + 10];
long long brute(int n, int k)
{
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t + k);

    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d <= n; ++d) 
        res[d] = nck(d + k - 2, k - 1);

    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];

    long long ans = 0;
    for (int d = 1; d * d <= n; ++d)
        ans += res[d] * (n / (d * d));

    ans = (ans % MOD + MOD) % MOD; /// res[] may hold negative values, so normalize
    return ans;
}

long long solve(int n, int k)
{
    long long res = 1;
    for (int t = 1; t <= k; ++t)
        res += brute(n, t) * 2;

    res %= MOD;
    return res;
}

O(k sqrt n + sqrt n log log sqrt n) = O(n sqrt n)

long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t + k);

    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d <= n; ++d) 
        for (int t = 1; t <= k; ++t)
            res[d] += nck(d + t - 2, t - 1) * 2;

    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];

    long long ans = 1;
    for (int d = 1; d * d <= n; ++d)
        ans += res[d] * (n / (d * d));

    ans = (ans % MOD + MOD) % MOD; /// res[] may hold negative values, so normalize
    return ans;
}

O(k + sqrt n log log sqrt n) = O(n)


long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t + k);

    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d <= n; ++d) 
        res[d] += nck(d + k - 1, k - 1) * 2;

    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];

    long long ans = 1;
    for (int d = 1; d * d <= n; ++d)
        ans += res[d] * (n / (d * d));

    ans = (ans % MOD + MOD) % MOD; /// res[] may hold negative values, so normalize
    return ans;
}

Complexity

Spoiler

Conclusion

Spoiler

Solution when numbers are also bounded by a specific range

Extra task F

Problem

Given $$$k, L, R (1 \leq k, L, R \leq 10^9)$$$, count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

$$$L \leq a_1 < a_2 < \dots < a_k \leq R$$$
$$$a_i \times a_j$$$ is perfect square $$$\forall 1 \leq i < j \leq k$$$

Since the result can be big, output it modulo $$$10^9 + 7$$$.

Idea

Observation

We split into 4 cases

The cases

You can easily solve each case in linear time

Case 1

Case 2

Case 3

Case 4

Now assuming $$$1 \leq L \leq R$$$, here is how we solve it.

As a simple approach, we can just do the same as in the original problem for general $$$k$$$.

We just need to iterate over each fixed squarefree $$$u$$$ and count the number of ways to select $$$p^2$$$ as usual, with a small change.
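One way to express that small change: for a fixed $$$u$$$, the number of $$$v \geq 1$$$ with $$$L \leq u \times v^2 \leq R$$$ is the count up to $$$R$$$ minus the count up to $$$L - 1$$$. The sketch below assumes $$$1 \leq L \leq R$$$ and $$$u \geq 1$$$; `isqrt_floor` and `count_in_range` are hypothetical names, not functions from the implementations in this blog.

```cpp
#include <cmath>

// Largest r >= 0 with r * r <= x, for x >= 0 (corrects floating-point error).
long long isqrt_floor(long long x) {
    long long r = (long long)std::sqrt((long double)x);
    while (r * r > x) --r;
    while ((r + 1) * (r + 1) <= x) ++r;
    return r;
}

// Number of integers v >= 1 with L <= u * v * v <= R, for 1 <= L <= R, u >= 1.
// u * v^2 <= X holds exactly when v <= floor(sqrt(floor(X / u))).
long long count_in_range(long long u, long long L, long long R) {
    long long up_to_r = isqrt_floor(R / u);         // members of the class that are <= R
    long long below_l = isqrt_floor((L - 1) / u);   // members of the class that are <  L
    return up_to_r - below_l;
}
```

With `cnt = count_in_range(u, L, R)` for each squarefree $$$u$$$, the class contributes $$$C(cnt, k)$$$ arrays, the same way the class size is used when the range starts at $$$1$$$.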

Why do I get WA

Doing so naively gives a complexity of $$$O((R - L + 1) \log R)$$$ time and $$$O(R)$$$ space.

Optimize my time complexity

Hint 1 to optimize space complexity

Hint 2 to optimize space complexity

Hint 3 to optimize space complexity

Optimization

Is it over ? Can't we do better ?

Isn't there a trivial way that we forgot ?

Is there a way to iterate through [L, R] ?

Is there a way to iterate over less than the whole part [1, R] ?

Can we just iterate through $$$[1, \sqrt{R}]$$$ and $$$[L, R]$$$ to solve it ?

What is that way ?

Isn't it bad ?

Can we factorize numbers faster ?

Is there another way than Pollard rho ?

But how do we count the number of ways to select p^2 ?

But how can we iterate over each p^2 ?

Wait, is there another way to reduce the complexity ?

Wait, do we still use factorization ?

Another way, but still faster than applying Pollard rho ?

You mean something like sieving, or am I wrong about something ?

In the normal sieve we mark for primes, but now the loop is inside, do you mean to...

Can we use the trick like u * c^2 to reduce the number of cases when we take primes as the first loop ?

Wait, isn't this a kind of segmented sieve ?

So what is the better solution and how do we implement it step by step ?

Solution

Implementation

Let $$$Z = \max(|L|, |R|)$$$

O(Z) time - O(R - L) space

bool is_squarefree[LIM + 10];
int solve(int l, int r, int k)
{
    if (l > r) return 0;
    if (r < 0) return solve(-r, -l, k);
    if (l <= 0 && 0 <= r)
    {
        long long res = 0LL + taskB(abs(l), k) + taskB(abs(l), k - 1) + taskB(abs(r), k) + taskB(abs(r), k - 1);
        while (res >= MOD) res -= MOD;
        return res;
    }

    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (r - l + 1));
    precal_nck(r - l + 1);

    int tot = r - l + 1;
    long long res = 0;
    for (int i = 1; i <= r; ++i)
    {
        int j = sqrt(l / i);
        while (i * j * j > l) --j;
        while (i * j * j < l) ++j;
        if (1LL * i * j * j <= r && is_squarefree[i * j * j - l]) /// skip classes with no member in [l, r]
        {
            int cnt = 0;
            for (; i * j * j <= r; ++j)
            {
                ++cnt;
                --tot;
                is_squarefree[i * j * j - l] = false;
            }
            res += nck(cnt, k);
            if (tot == 0) break;
        }
    }

    res %= MOD;
    return res;
}

O(R * sqrt(Z) / log(Z)) time - O(R - L) space


long long res[SQRT_LIM + 10];
long long taskB(int n, int k)
{
    if (k == 0) return 0;
    if (k == 1) return n;

    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t);

    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d <= n; ++d) 
        res[d] = nck(d - 1, k - 1);

    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];

    long long ans = 0;
    for (int d = 1; d * d <= n; ++d)
        ans += res[d] * (n / (d * d));

    ans %= MOD;
    if (ans < 0) ans += MOD; // the sweep subtracts residues, so the sum may be negative
    return ans;
}

bool is_squarefree[LIM + 10];
int solve(int l, int r, int k)
{
    if (l > r) return 0;
    if (r < 0) return solve(-r, -l, k);
    if (l <= 0 && 0 <= r)
    {
        long long res = 0LL + taskB(abs(l), k) + taskB(abs(l), k - 1) + taskB(abs(r), k) + taskB(abs(r), k - 1);
        while (res >= MOD) res -= MOD;
        return res;
    }
    
    int t = ceil(sqrt(r + 1) + 1) + 1;
    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (r - l + 1));
    precal_nck(r - l + 1);
    linear_sieve(t);

    long long res = 0;
    for (int x = l; x <= r; ++x) if (is_squarefree[x - l])
    {
        int u = x;
        int v = 1;
        for (int p : prime)
        {
            int q = p * p;
            if (q > u) break;
            while (u % q == 0)
            {
                u /= q;
                v *= p;
            }
        }

        int cnt = 0;
        for (; u * v * v <= r; ++v)
        {
            is_squarefree[u * v * v - l] = false;
            ++cnt;
        }

        res += nck(cnt, k);
    }

    res %= MOD;
    return res;
}
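
The prime sweep `res[d * p] -= res[d]` in `taskB` is an in-place Dirichlet convolution with $$$\mu$$$. A minimal standalone sketch of just that step follows; `mobius_transform` is a hypothetical name, and a plain sieve stands in for the blog's `linear_sieve`.

```cpp
#include <cassert>
#include <vector>

// Returns g = mu * f (Dirichlet convolution) for indices 1..t, where t = f.size() - 1.
// Index 0 is unused. This mirrors the sweep in taskB, but on exact integers.
std::vector<long long> mobius_transform(std::vector<long long> f) {
    int t = (int)f.size() - 1;
    assert(t >= 1);
    std::vector<bool> composite(t + 1, false);
    for (int p = 2; p <= t; ++p) {
        if (composite[p]) continue;
        for (int j = 2 * p; j <= t; j += p) composite[j] = true;
        // Descending d: f[d] still holds the value from before this prime's sweep
        // at the moment it is subtracted from f[d * p].
        for (int d = t / p; d >= 1; --d) f[d * p] -= f[d];
    }
    return f;
}
```

With $$$f(e) = e$$$ the result is Euler's $$$\varphi$$$, so index $$$12$$$ becomes $$$4$$$. With $$$f(e) = e - 1$$$ (the $$$k = 2$$$ case) and $$$n = 10$$$, the final sum $$$\sum_d g(d) \lfloor n / d^2 \rfloor$$$ is $$$1 \cdot 2 + 2 \cdot 1 = 4$$$, matching the pairs $$$(1, 4), (1, 9), (4, 9), (2, 8)$$$.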

O(sqrt R log log R + (R - L)) time - O(R - L) space

long long res[SQRT_LIM + 10];
long long taskB(int n, int k)
{
    if (k == 0) return 0;
    if (k == 1) return n;

    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t);

    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d <= n; ++d) 
        res[d] = nck(d - 1, k - 1);

    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];

    long long ans = 0;
    for (int d = 1; d * d <= n; ++d)
        ans += res[d] * (n / (d * d));

    ans %= MOD;
    if (ans < 0) ans += MOD; // the sweep subtracts residues, so the sum may be negative
    return ans;
}

#include <algorithm>
#define all(x) (x).begin(), (x).end()
bool is_squarefree[LIM + 10];
int squarefree[LIM + 10];
int sqf_factor[LIM + 10];
int solve(int l, int r, int k)
{
    if (l > r) return 0;
    if (r < 0) return solve(-r, -l, k);
    if (l <= 0 && 0 <= r)
    {
        long long res = 0LL + taskB(abs(l), k) + taskB(abs(l), k - 1) + taskB(abs(r), k) + taskB(abs(r), k - 1);
        while (res >= MOD) res -= MOD;
        return res;
    }
    
    int t = ceil(sqrt(r + 1) + 1) + 1;
    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (r - l + 1));
    precal_nck(r - l + 1);
    linear_sieve(t);
    
    for (int x = l; x <= r; ++x)
    {
        squarefree[x - l] = x;
        sqf_factor[x - l] = 1;
    }

    long long res = 0;
    for (int p : prime)
    {
        for (int q = p * p, x = max(q, (l + q - 1) / q * q); x <= r; x += q)
        {
            while (squarefree[x - l] % q == 0)
            {
                squarefree[x - l] /= q;
                sqf_factor[x - l] *= p;
            }
        }
    }
    
    for (int x = l; x <= r; ++x)
    {
        int u = squarefree[x - l];
        int p = sqf_factor[x - l];
        if (is_squarefree[x - l])
        {
            int cnt = 0;
            for (; u * p * p <= r; ++p)
            {
                is_squarefree[u * p * p - l] = false;
                ++cnt;
            }
    
            res += nck(cnt, k);   
        }
    }
    
    res %= MOD;
    return res;
}
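
For cross-checking the implementations above on tiny ranges, a brute-force reference can help. This is only a sketch: `brute_range` and `is_square` are hypothetical names, the range is assumed to hold at most 20 integers, and $$$0$$$ is treated as a perfect square.

```cpp
#include <cassert>
#include <cmath>

// True if x is a perfect square (0 counts, negatives do not).
static bool is_square(long long x) {
    if (x < 0) return false;
    long long s = (long long)std::sqrt((long double)x);
    while (s * s > x) --s;                 // correct floating-point rounding
    while ((s + 1) * (s + 1) <= x) ++s;
    return s * s == x;
}

// Counts k-element subsets of [l, r] whose pairwise products are all perfect squares.
// Exponential in (r - l + 1): meant for testing only.
long long brute_range(int l, int r, int k) {
    int m = r - l + 1;
    if (m <= 0) return 0;
    assert(m <= 20 && k >= 1);
    long long count = 0;
    for (int mask = 0; mask < (1 << m); ++mask) {
        int bits = 0;
        for (int i = 0; i < m; ++i) bits += (mask >> i) & 1;
        if (bits != k) continue;
        bool ok = true;
        for (int i = 0; i < m && ok; ++i) {
            if (!((mask >> i) & 1)) continue;
            for (int j = i + 1; j < m && ok; ++j)
                if ((mask >> j) & 1) ok = is_square(1LL * (l + i) * (l + j));
        }
        if (ok) ++count;
    }
    return count;
}
```

For example, `brute_range(1, 10, 2)` counts the pairs $$$(1, 4), (1, 9), (4, 9), (2, 8)$$$, and `brute_range(-3, 3, 2)` counts the six pairs that contain $$$0$$$.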

Complexity

Spoiler

Solution when the product you must find is a perfect cube

Extra task G

Problem

Given $$$k, n (1 \leq k \leq n \leq 10^9)$$$, count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

$$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$
$$$a_i \times a_j \times a_t$$$ is a perfect cube $$$\forall 1 \leq i < j < t \leq k$$$

Since the result can be big, output it modulo $$$10^9 + 7$$$.

Idea

k < 3

k > 3

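For $$$k > 3$$$, you can prove that all selected numbers must share the same cube-free part: if $$$a_i a_j a_t$$$ and $$$a_i a_j a_s$$$ are both perfect cubes, then $$$a_t / a_s$$$ is the cube of a rational number. Therefore a linear cube-free sieve is enough.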
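
A small sketch of this grouping idea for tiny $$$n$$$, using trial division instead of a sieve and exact binomials instead of modular ones. The names `cubefree_part`, `binom`, and `count_same_class` are hypothetical and not taken from the blog's code.

```cpp
#include <cassert>
#include <map>

// Cube-free part of x: every prime exponent is reduced modulo 3.
long long cubefree_part(long long x) {
    assert(x >= 1);
    long long u = 1;
    for (long long p = 2; p * p <= x; ++p) {
        int e = 0;
        while (x % p == 0) { x /= p; ++e; }
        for (int i = 0; i < e % 3; ++i) u *= p;
    }
    return u * x; // the leftover x is 1 or a single prime with exponent 1
}

// Exact C(m, k) for small arguments; each intermediate value is itself a binomial.
long long binom(int m, int k) {
    if (k < 0 || k > m) return 0;
    long long r = 1;
    for (int i = 1; i <= k; ++i) r = r * (m - k + i) / i;
    return r;
}

// For k > 3: group 1..n by cube-free part and pick k numbers inside one group.
long long count_same_class(int n, int k) {
    assert(k > 3);
    std::map<long long, int> class_size;
    for (int x = 1; x <= n; ++x) ++class_size[cubefree_part(x)];
    long long total = 0;
    for (const auto& entry : class_size) total += binom(entry.second, k);
    return total;
}
```

For $$$n = 64$$$ and $$$k = 4$$$ the only class with at least four members is $$$\{1, 8, 27, 64\}$$$, so the count is $$$1$$$.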

Hint 1

Hint 2

Hint 3

Hint 4

Hint 5

Still, you can apply the same idea used in extra task B to achieve a better complexity.

Instead of throwing around a bunch of math with weird formulas, long chains of theorems, and proofs, we can look at the algorithm from the other direction and then arrive at the formula.

What we did in task B

What we are going to do in task G

So now we arrive at the formula:

Definitions

Hint 1

Hint 2

Hint 3

Hint 4
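
One way to reconstruct the formula from the implementations in this section (a sketch under that assumption, not necessarily the derivation hidden in the hints): write every $$$m \leq n$$$ as $$$m = u \cdot s(m)^3$$$ with $$$u$$$ cube-free, and count each valid array by its largest element $$$a_k = m$$$. The other $$$k - 1$$$ elements come from the $$$s(m) - 1$$$ smaller members of the same class, so with $$$f(e) = \binom{e - 1}{k - 1}$$$ and $$$g = \mu * f$$$ (Dirichlet convolution):

$$$ \displaystyle \sum_{m=1}^{n} f(s(m)) = \sum_{m=1}^{n} \sum_{d \mid s(m)} g(d) = \sum_{m=1}^{n} \sum_{d^3 \mid m} g(d) = \sum_{d^3 \leq n} \left\lfloor \frac{n}{d^3} \right\rfloor g(d) $$$

The middle step uses that $$$d \mid s(m)$$$ holds exactly when $$$d^3 \mid m$$$, because $$$u$$$ is cube-free. Replacing every cube by a square gives the corresponding sum for task B.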

Bonus

k = 3

Implementation

k<3::O(1) || k>3::O(n) || k=3::O(n^2 log n) solution

int cnt[LIM];
vector<int> appear[LIM];
int solveG(int n, int k)
{
    linear_sieve(n * n);
    for (int i = 1; i <= n; ++i)
        appear[i].clear();

    for (int i = 1; i <= n; ++i)
    {
        for (int j = i + 1; j <= n; ++j)
        {
            int u = 1;
            for (int x = i * j; x > 1; )
            {
                int a = lpf[x], b = a * a * a;
                for (; x % b == 0; x /= b);
                for (; x % a == 0; x /= a) u *= a;
            }
            appear[j + 1].push_back(u);
        }
    }

    long long res = 0;
    memset(cnt, 0, sizeof(cnt[0]) * (n * n + 1));
    for (int i = 1; i <= n; ++i)
    {
        for (int u : appear[i]) ++cnt[u];
        int v = 1;
        for (int x = i; x > 1; )
        {
            int a = lpf[x], b = a * a * a;
            for (; x % b == 0; x /= b);
            if (x % (a * a) == 0)
            {
                x /= a * a;
                v *= a;
            }
            else if (x % a == 0)
            {
                x /= a;
                v *= a * a;
            }
        }

        res += cnt[v];
    }    

    res %= MOD;
    return res;
}

bool is_cubefree[LIM + 10];
int solve(int n, int k)
{
    if (k == 0) return 0;
    if (k == 1) return n;
    if (k == 2) return (1LL * n * (n - 1) / 2) % MOD;
    if (k == 3) return solveG(n, k);

    memset(is_cubefree, true, sizeof(is_cubefree[0]) * (n + 1));
    precal_nck(n);

    long long res = 0;
    for (int i = 1, j; i <= n; ++i) if (is_cubefree[i]) 
    {
        for (j = 1; i * j * j * j<= n; ++j)
        {
            is_cubefree[i * j * j * j] = false;
        }

        res += nck(j - 1, k);
    }

    res %= MOD;
    return res;
}

k<3::O(1) || k>3::O(cbrt n log cbrt n) || k=3::Õ(n) but practically O(n^(0.59)) for small n

vector<int> divisor[LIM];
int solveG(int n, int k)
{
    linear_sieve(n);
    for (int i = 1; i <= n; ++i)
        divisor[i].clear();
 
    for (int j = 1; j <= n; ++j)
    {
        int x = 1;
        for (int t = j; t > 1; )
        {
            if (x > n) break;
            int a = lpf[t];
            ll b = 1LL * a * a * a;
            while (t % b == 0)
            {
                t /= b;
                if (1LL * x * a > n) goto skip;
                x *= a;
            }
 
            if (t % a == 0)
            {
                if (1LL * x * a > n) goto skip;
                x *= a;
                do t /= a; while (t % a == 0);
            }
        }
 
        for (int i = x; i <= n; i += x)
            divisor[i].push_back(j);
 
        skip:{};
    }
 
    int res = 0;
    for (int i = 1; i <= n; ++i)
    {
        ll t = 1LL * i * i * i;
        for (int x = 0; x < divisor[i].size(); ++x)
        {
            for (int y = x + 1; y < divisor[i].size(); ++y)
            {
                int a = divisor[i][x];
                int b = divisor[i][y];
                int c = t / (1LL * a * b);
                res += (n >= c && c > b && 1LL * a * b * c == 1LL * i * i * i);
            }
        }
    }
 
    return res;
}
 
int res[LIM + 10];
int solve(int n, int k)
{
    if (k == 0) return 0;
    if (k == 1) return n;
    if (k == 2) return (1LL * n * (n - 1) / 2) % MOD;
    if (k == 3) return solveG(n, k);
    int t = ceil(cbrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t);
    precal_div(t);

    long long res = 0;
    for (int d = 1; d * d * d <= n; ++d) /// For each d with d^3 <= n
    {
        long long sum = 0;
        for (int p : divisors[d]) /// For each (p | d)
            sum += mu[d / p] * nck(p - 1, k - 1);

        sum %= MOD;
        res += sum * (n / (d * d * d));
    }

    res %= MOD;
    if (res < 0) res += MOD; // mu can make the running sum negative
    return res;
}

k<3::O(1) || k>3::O(cbrt n log log cbrt n) || k=3::Õ(n) but practically fast

int ceil_sqrt(ll x)
{
    int t = sqrt(x);
    while (1LL * t * t > x) --t;
    while (1LL * t * t < x) ++t;
    return t;
}

#define sz(x) int((x).size())
#define all(x) (x).begin(), (x).end()
#define rall(x) (x).rbegin(), (x).rend()
#define lb(x, v) lower_bound(all(x), v) - (x).begin()
#define ub(x, v) upper_bound(all(x), v) - (x).begin()

ll scm[LIM];
vector<int> divisor[LIM];
int solveG(int n, int k)
{
    linear_sieve(n);
    fill_n(scm, n + 1, 1);
    for (int i = 1; i <= n; ++i)
        divisor[i].clear();
        
    for (int p : prime)
    {
        ll q = 1LL * p * p * p;
        for (int t = p; ; t *= q)
        {
            for (int i = t; i <= n; i += t)
                scm[i] = (scm[i] > n / p) ? n + 1 : scm[i] * p;
            
            if (1LL * t > n / q) break;
        }
    }

    for (int j = 1; j <= n; ++j)
        for (int i = scm[j]; i <= n; i += scm[j])
            divisor[i].push_back(j);
    
    int res = 0;
    for (int i = 1; i <= n; ++i)
    {
        ll t = 1LL * i * i * i;
        for (int x = 0; x < divisor[i].size(); ++x)
        {
            int a = divisor[i][x];
            int v = ceil_sqrt(t / a);
            int y0 = (divisor[i].back() < v) ? sz(divisor[i]) : lb(divisor[i], v);
            for (int y = y0 - 1; y > x; --y)
            {
                int b = divisor[i][y];
                int c = t / (1LL * a * b);
                if (c > n) break;
                res += (1LL * a * b * c == t);
            }
        }
    }

    return res;
}

long long res[LIM + 10]; // long long: the sweep and the product with n / d^3 overflow int
int solve(int n, int k)
{
    if (k == 0) return 0;
    if (k == 1) return n;
    if (k == 2) return (1LL * n * (n - 1) / 2) % MOD;
    if (k == 3) return solveG(n, k);
    int t = ceil(cbrt(n) + 1) + 1; // only d with d^3 <= n are used
    linear_sieve(t);
    precal_nck(t);

    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d * d <= n; ++d) 
        res[d] = nck(d - 1, k - 1);

    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];

    long long ans = 0;
    for (int d = 1; d * d * d <= n; ++d)
        ans += res[d] * (n / (d * d * d));

    ans %= MOD;
    if (ans < 0) ans += MOD; // the sweep subtracts residues, so the sum may be negative
    return ans;
}

Complexity

k < 3

k > 3

k = 3

The first implementation has to factorize each number, which costs $$$O(\log n)$$$ per number with the `lpf` table, hence that complexity.
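
A minimal sketch of why a smallest-prime-factor table gives $$$O(\log x)$$$ factorization: every division shrinks $$$x$$$ by a factor of at least $$$2$$$. The names `build_lpf` and `factorize` are hypothetical, and the simple sieve here only stands in for the blog's `linear_sieve`.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// lpf[x] = smallest prime factor of x for 2 <= x <= n (lpf[0] = lpf[1] = 0).
std::vector<int> build_lpf(int n) {
    assert(n >= 1);
    std::vector<int> lpf(n + 1, 0);
    for (int i = 2; i <= n; ++i) {
        if (lpf[i] != 0) continue;          // i is composite, already labelled
        for (int j = i; j <= n; j += i)
            if (lpf[j] == 0) lpf[j] = i;    // first prime to reach j is its smallest
    }
    return lpf;
}

// Prime factorization of x as (prime, exponent) pairs in increasing prime order.
// Performs at most log2(x) divisions in total.
std::vector<std::pair<int, int>> factorize(int x, const std::vector<int>& lpf) {
    std::vector<std::pair<int, int>> factors;
    while (x > 1) {
        int p = lpf[x], e = 0;
        while (x % p == 0) { x /= p; ++e; }
        factors.push_back({p, e});
    }
    return factors;
}
```

For instance, `factorize(360, build_lpf(1000))` yields the pairs $$$(2, 3), (3, 2), (5, 1)$$$.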

Bonus

Secondary Bonus

About the actual complexity of, or a better algorithm for, the first implementation

For the second implementation, however, things get crazy.

Precalculation part

Calculation part

I could not prove its complexity: I searched papers and blogs, and even with the help of some GMs I could not find anything good enough to establish its real complexity.

The observed running time might not reflect the real complexity either: it could be an illusion that hides a higher true complexity behind the constants.

Fast running time

Real Complexity

The third implementation is slightly better optimized.

Precalculation part

Calculation part

Still, it is hard to express its complexity in the form $$$O(n \log^k n)$$$.

Solution when you are given n and queries for k

Extra task H

Problem

Given $$$n$$$, you have to answer queries for $$$k$$$ $$$(1 \leq k \leq n \leq 10^9)$$$: count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

$$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$
$$$a_i \times a_j$$$ is a perfect square $$$\forall 1 \leq i < j \leq k$$$

Idea

Simplest idea

Observation

Implementation

O(n) precalculate - O(sqrt n - k) query

vector<int> valid;
int c[SQRT_LIM];
bool is_squarefree[LIM];
void precal(int n)
{
    int t = ceil(sqrt(n) + 0.5);
    memset(c, 0, sizeof(c[0]) * (t + 1));
    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (n + 1));
    for (int i = 1, j; i <= n; ++i) if (is_squarefree[i]) 
    {
        for (j = 1; i * j * j <= n; ++j)
            is_squarefree[i * j * j] = false;

        ++c[j - 1];
    }

    precal_nck(t);
    valid.clear();
    for (int i = t; i >= 1; --i) if (c[i])
        valid.push_back(i);
}

int query(int k)
{
    long long res = 0;
    for (int x : valid)
    {
        if (x < k) break;
        res += 1LL * c[x] * nck(x, k);
        res %= MOD;
    }

    return res;
}
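
The precompute/query split above can be illustrated at small scale with exact arithmetic. This is a sketch only: `class_size_histogram`, `binom`, and `query_exact` are hypothetical names, and no modulus is applied, so it is meant for small $$$n$$$.

```cpp
#include <cassert>
#include <vector>

// hist[s] = number of square-free bases u <= n whose class {u*1^2, u*2^2, ...}
// has exactly s members not exceeding n. Depends on n only, not on k.
std::vector<long long> class_size_histogram(int n) {
    assert(n >= 1);
    std::vector<bool> seen(n + 1, false);
    std::vector<long long> hist(1, 0);
    for (int u = 1; u <= n; ++u) {
        if (seen[u]) continue; // u was already reached from a smaller square-free base
        int s = 0;
        for (long long j = 1; u * j * j <= n; ++j) { seen[u * j * j] = true; ++s; }
        if ((int)hist.size() <= s) hist.resize(s + 1, 0);
        ++hist[s];
    }
    return hist;
}

// Exact C(m, k) for small arguments.
long long binom(int m, int k) {
    if (k < 0 || k > m) return 0;
    long long r = 1;
    for (int i = 1; i <= k; ++i) r = r * (m - k + i) / i;
    return r;
}

// Answer one query for k by summing over the distinct class sizes.
long long query_exact(const std::vector<long long>& hist, int k) {
    long long total = 0;
    for (int s = k; s < (int)hist.size(); ++s) total += hist[s] * binom(s, k);
    return total;
}
```

For $$$n = 10$$$ the histogram holds one class of size 3 ($$$\{1, 4, 9\}$$$), one of size 2 ($$$\{2, 8\}$$$), and five singletons, so the queries $$$k = 1, 2, 3$$$ give $$$10, 4, 1$$$.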

O(sqrt n) precalculate - O(sqrt (n/k) log log sqrt(n/k) + sqrt n - k) query

vector<int> valid;
int cnt[SQRT_LIM];
bool is_squarefree[LIM];
long long res[SQRT_LIM + 10];
int global_t, global_n;
void precal(int n)
{
    global_n = n;
    global_t = ceil(sqrt(n) + 0.5);
    linear_sieve(global_t);
    precal_nck(global_t);
}

int query(int k)
{
    if (k > global_t) return 0;
    memset(res + k, 0, sizeof(res[0]) * (global_t - k + 1));
    for (int d = k; d * d <= global_n; ++d) 
        res[d] = nck(d - 1, k - 1);

    for (int p : prime)
        for (int d = global_t / p; d >= k; --d)
            res[d * p] -= res[d];

    long long ans = 0;
    for (int d = k; d <= global_t; ++d)
        ans += res[d] * (global_n / (d * d));

    ans %= MOD;
    if (ans < 0) ans += MOD; // the sweep subtracts residues, so the sum may be negative
    return ans;
}

Complexity

The first implementation

The second implementation

Contribution

Yurushia for pointing out the linear complexity of squarefree sieve.
clyring for fixing typos, and for the approaches to tasks A, B, C, D, E, G, H, J.
errorgorn for adding details, the approaches to tasks F, J, M, O, and better complexity for C, E, G.
cuom1999 for contributing the $$$O(n^2)$$$ approach to problem G.
vinfat for contributing the approach of factorizing $$$p^3$$$ into products of $$$3$$$ factors in problem G; it initially failed to achieve a better complexity (edited: confirmed that the complexity now seems to be better).
Lihwy, jalsol for the combinatorics calculation and the proof of stars and bars in task C.
Editorial Slayers Team Lyde DeMen100ns Duy_e OnionEgg QuangBuiCPP _FireGhost_ Shironi for reviewing, fixing typos, and feedback.

Comments (4)

clyring

2 years ago

Regarding task G with $$$k=3$$$: It looks like your second implementation is rather close to what I described in my last message. I mentioned then already that it was $$$O(n^{1+\varepsilon})$$$; the reasoning is straightforward. Letting $$$\sigma_0(i)$$$ be the number of divisors of $$$i$$$, the runtime is clearly bounded by $$$\sum_{i=1}^n \sigma_0(i^3)^2$$$, and it's well-known that the divisor-counting function is $$$O(n^\varepsilon)$$$ for any $$$\varepsilon > 0$$$, so that $$$\sum_{i=1}^n \sigma_0(i^3)^2 \leq \sum_{i=1}^n C_{\varepsilon}^6 i^{6 \varepsilon} \in O(n^{1 + 6 \varepsilon})$$$.
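
To get a feel for this bound, the sum can be evaluated numerically for small $$$n$$$. A rough sketch follows, with `sum_sq_divisors_of_cubes` a hypothetical name; it uses $$$\sigma_0(i^3) = \prod (3e + 1)$$$ over the prime exponents $$$e$$$ of $$$i$$$.

```cpp
#include <cassert>
#include <vector>

// Returns the sum over i = 1..n of sigma0(i^3)^2, where sigma0 counts divisors.
long long sum_sq_divisors_of_cubes(int n) {
    assert(n >= 1);
    // Smallest-prime-factor table for 2..n.
    std::vector<int> lpf(n + 1, 0);
    for (int i = 2; i <= n; ++i)
        if (lpf[i] == 0)
            for (int j = i; j <= n; j += i)
                if (lpf[j] == 0) lpf[j] = i;

    long long total = 0;
    for (int i = 1; i <= n; ++i) {
        long long d = 1; // sigma0(i^3)
        for (int x = i; x > 1; ) {
            int p = lpf[x], e = 0;
            while (x % p == 0) { x /= p; ++e; }
            d *= 3 * e + 1; // p^e in i becomes p^(3e) in i^3
        }
        total += d * d;
    }
    return total;
}
```

For $$$n = 6$$$ the terms are $$$1, 16, 16, 49, 16, 256$$$, which sum to $$$354$$$.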

I correctly guessed it "may even be $$$\tilde{O}(n)$$$"; the analysis is interesting, but the poly-log factor hidden in the $$$\tilde{O}$$$-notation seems very large. First up: The preprocessing step. Temporarily borrowing your notation, I was able to show precisely that

$$$ \displaystyle \sum_{p=1}^n \left\lfloor\frac{p}{f(p)}\right\rfloor \in \Theta(n \cdot (\log{n})^3). $$$

Proof sketch for upper bound

Proof sketch for lower bound

From this it trivially follows that the main loop takes at least $$$\Omega(n \cdot (\log{n})^6)$$$ time to run. This happens if the $$$\Theta(n \cdot (\log{n})^3)$$$ relevant cube-divisors are reasonably evenly distributed among the $$$n$$$ buckets, but it is quite plausible that this is not the case and the main loop is actually slower. It seems much harder to precisely estimate the complexity of the main loop; the best upper bound I have come up with so far is $$$O(n \cdot (\log{n})^{16})$$$. This comes from applying the same Euler product idea to the multiplicative function $$$i \mapsto \frac{\sigma_0(i^3)^2}{i}$$$, where we have $$$\sum_{i=0}^{\infty} \frac{\sigma_0(p^{3i})^2}{p^i} = 1 + \frac{16}{p} + O(\frac{1}{p^2})$$$, hence the strange 16 exponent. This seems likely to be inefficient for several reasons, not least of which is that applying this direct approach to $$$i \mapsto \frac{\sigma_0(i^3)}{i}$$$ for the preprocess step would give only an $$$O(n \cdot (\log{n})^4)$$$ upper bound, which I already know isn't optimal. Improvement ideas welcome.

EDIT: I am almost sure techniques to estimate this more accurately than I have already exist in the literature. A good first place to look might be the references at oeis:A061502.

SPyofcode

2 years ago

Amazing work on those analyses. Although the real complexity is still unknown and only bounded by $$$O(n \log ^ 4 n)$$$, I never expected it to be this fast.

In the calculation part, we factorize $$$p^3 = a \times b \times c$$$, and there are 2 optimization options:

Fix $$$a$$$, then enumerate $$$b \times c$$$ while limiting the bound with binary search or something similar.

$$$O\left(\overset{n}{\underset{p = 1}{\Large \Sigma}} \underset{a | p^3}{\Large \Sigma} \left(g(a) + \log\left(\frac{p^3}{a}\right)\right) \right)$$$ where $$$g(a)$$$ is the number of valid pairs $$$b \times c$$$

Or go through the divisors of $$$\frac{p^3}{a}$$$ directly

$$$O\left(\overset{n}{\underset{p = 1}{\Large \Sigma}} d(a) \times d\left(\frac{p^3}{a}\right) \right)$$$, yet this should be harder to implement without increasing the precalculation time by around $$$O(d(n^3))$$$

Either option should also shave some $$$O(\log)$$$ factors off the calculation part.

Optimizing the pre-computation time isn't all that interesting, since the main loop is dominant.

This paper uses a second-order approximation of $$$\sum_{i=1}^n \frac{f(i)}{i}$$$ for certain multiplicative functions $$$f$$$ and Abel's summation formula to cancel out the largest-order term in my upper bound technique and get a precise estimate on the growth of $$$\sum_{i=1}^n f(i)$$$. Its theorem 1 is directly applicable and shows that $$$\sum_{i=1}^n \sigma_0(i^3) \in \Theta(n \cdot (\log{n})^3)$$$ and $$$\sum_{i=1}^n \sigma_0(i^3)^2 \in \Theta(n \cdot (\log{n})^{15})$$$. The differences between these and the runtimes of the pre-computation loop and main loop respectively come from the [1..n] restriction on factors considered, but this does not gain more than a constant factor. In the former case, this was already shown by my previous comment. For the latter, it is possible to generalize the same idea to pairs of divisors.

The innermost loop runs once for every triple $$$(i, x, y)$$$ with $$$1 \leq i \leq n$$$, $$$1 \leq x < y \leq n$$$, $$$x | i^3$$$, and $$$y | i^3$$$. Ignoring the $$$x < y$$$ requirement will little more than double the number of possible triples, but will make later calculations much easier. It is easy to see that for any pair $$$(x, y)$$$, the triple $$$(i, x, y)$$$ is valid if and only if $$$i$$$ is a multiple of the smallest positive integer whose cube is a multiple of both $$$x$$$ and $$$y$$$. Call a triple $$$(i, x, y)$$$ minimal if no triple $$$(j, x, y)$$$ with $$$j < i$$$ is valid. Then, the number of minimal triples with a given value of $$$i \leq \sqrt[3]{n}$$$ is exactly $$$(\mu * (j \mapsto \sigma_0(j^3)^2))(i)$$$. Call this value $$$h(i)$$$. Then, the number of total valid triples is at least $$$\sum_{i=1}^{\sqrt[3]{n}} \lfloor \frac{n}{i} \rfloor h(i)$$$, which in turn is at least $$$\frac{n}{2} \sum_{i=1}^{\sqrt[3]{n}} \frac{h(i)}{i}$$$. As a Dirichlet convolution of two multiplicative functions, $$$h$$$ is itself multiplicative.

As before, the idea is now to estimate this sum with the Euler product

$$$ \displaystyle \prod_{j=1}^s \sum_{\alpha=0}^{\infty} \frac{h(p_j^{\alpha})}{p_j^{\alpha}} = \sum_{\substack{i \in \mathbb{Z}_{>0} \\ \text{all prime factors of } i \text{ are at most } p_s}} \frac{h(i)}{i}, $$$

where $$$p_j$$$ is the $$$j$$$-th prime number, and $$$s$$$ controls the number of primes to use. Now, imagine $$$x$$$ is a random variable taking positive integer values with $$$P(x = i) \propto \frac{h(i)}{i}$$$ for $$$i$$$ with all prime factors at most $$$p_s$$$ and $$$P(x = i) = 0$$$ otherwise. By the Euler product factorization idea, $$$x$$$ is distributed as a product of $$$s$$$ independent random variables $$$y_j$$$, each one taking values on the powers of the prime $$$p_j$$$, with $$$P(y_j = p_j^{\alpha}) \propto \frac{h(p_j^{\alpha})}{p_j^{\alpha}}$$$. So, the mean of $$$\ln{y_j}$$$ is given by

$$$ \displaystyle E(\ln{y_j}) = \frac{\displaystyle \sum_{\alpha=0}^{\infty} \frac{h(p_j^{\alpha})}{p_j^{\alpha}}\cdot \alpha \cdot \ln{p_j}} {\displaystyle \sum_{\alpha=0}^{\infty} \frac{h(p_j^{\alpha})}{p_j^{\alpha}}} \leq \frac{C_1 \ln{p_j}}{p_j} $$$

for some absolute constant $$$C_1$$$. Now add up over all values of $$$j$$$ and apply the prime number theorem to get

$$$ \begin{array}{rl} \displaystyle E(\ln{x}) & \leq \sum_{j=1}^s \frac{C_1 \ln{p_j}}{p_j} \\ & \approx \sum_{j=1}^s \frac{C_1 \ln{j}}{j \cdot \ln{j}} \\ & \approx C_1 \cdot \ln{s}. \end{array} $$$

So, if $$$s$$$ is chosen to be approximately $$$n^{\frac{1}{6C_1}}$$$, Markov's inequality on $$$\ln{x}$$$ will give that $$$P(x > \sqrt[3]{n}) \leq 0.5$$$, so that

$$$ \begin{array}{rl} \displaystyle \frac{n}{2} \sum_{i=1}^{\sqrt[3]{n}} \frac{h(i)}{i} & \displaystyle \geq \frac{n}{2}\sum_{\substack{i \in \{1, 2, \ldots, \sqrt[3]{n}\} \\ \text{all prime factors of } i \text{ are at most } p_s}} \frac{h(i)}{i} \\ & \displaystyle \geq \frac{n}{4} \sum_{\substack{i \in \mathbb{Z}_{>0} \\ \text{all prime factors of } i \text{ are at most } p_s}} \frac{h(i)}{i} \\ & \displaystyle = \frac{n}{4} \prod_{j=1}^s \sum_{\alpha=0}^{\infty} \frac{h(p_j^{\alpha})}{p_j^{\alpha}} \\ & \displaystyle \geq \frac{n}{4} \exp{\left(\sum_{j=1}^s \ln{\left(\sum_{\alpha=0}^{\infty} \frac{h(p_j^{\alpha})}{p_j^{\alpha}}\right)}\right)} \\ & \displaystyle \geq \frac{n}{4} \exp{\left(\sum_{j=1}^s \frac{15}{p_j} + \frac{C_2}{p_j^2}\right)} \\ & \displaystyle \geq \frac{n}{4} \exp{\left(15 \ln{\ln{s}} + C_3\right)} \\ & \displaystyle = \frac{n}{4} \cdot (\ln{s})^{15} \cdot \exp{C_3} \\ & \geq C_4 \cdot n (\ln{n})^{15}, \end{array} $$$

for appropriate absolute constants $$$C_2$$$, $$$C_3$$$, and $$$C_4$$$, which is the desired bound.
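
The constant $$$15$$$ in the exponent can be checked directly, as a small worked step added for clarity. Since $$$\sigma_0(p^{3\alpha}) = 3\alpha + 1$$$ and $$$h = \mu * (j \mapsto \sigma_0(j^3)^2)$$$, for $$$\alpha \geq 1$$$:

$$$ \displaystyle h(p^{\alpha}) = (3\alpha + 1)^2 - (3\alpha - 2)^2 = 18\alpha - 3, \qquad \sum_{\alpha=0}^{\infty} \frac{h(p^{\alpha})}{p^{\alpha}} = 1 + \frac{15}{p} + \frac{33}{p^2} + \dots $$$

so the logarithm of each local factor is $$$\frac{15}{p} + O\left(\frac{1}{p^2}\right)$$$.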

So, the second implementation is in fact $$$\Theta(n \cdot (\log{n})^{15})$$$. It's obviously possible to lower the number of log factors by a decent amount. Considering just factors of $$$\frac{i^3}{x}$$$ should get it down to $$$\Theta(n \cdot (\log{n})^9)$$$, since there are 10 ways to partition 3 copies of a prime among 3 factors $$$x$$$, $$$y$$$, $$$z$$$.

Can I ask about $$$O(taskG(k=3))$$$? It seems that no other approach is faster than iterating through the factors in a clever way.
