## The statement:

Given three integers $$$n, k, p$$$, $$$(1 \leq k \leq n < p)$$$.

Count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

- $$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$
- $$$a_i \times a_j$$$ is a perfect square $$$\forall 1 \leq i < j \leq k$$$

Since the result can be big, output it under modulo $$$p$$$.

For convenience, you can assume $$$p$$$ is the large constant prime $$$10^9 + 7$$$.

**Notice that in this blog, we will solve for generalized harder variants**

**For the original problem, see this blog: [Tutorial] An interesting counting problem related to square product**

## Extra Tasks

**These are harder variants and generalizations of the original problem. You can see more detail here**

*Marked as solved only if tested with at least $$$10^6$$$ queries*

**Solved A:** Can we also use phi function or something similar to solve for $$$k = 2$$$ in $$$O(\sqrt{n})$$$ or faster ?

**Solved B:** Can we also use phi function or something similar to solve for general $$$k$$$ in $$$O(\sqrt{n})$$$ or faster ?

**Solved C:** Can we also solve the problem where duplicates are allowed: $$$a_i \leq a_j\ (\forall\ i < j)$$$ instead of $$$a_i < a_j\ (\forall\ i < j)$$$ ?

**Solved D:** Can we solve the problem where there is no restriction between $$$k, n, p$$$ ?

**Solved E:** Can we solve for negative integers, where $$$-n \leq a_1 < a_2 < \dots < a_k \leq n$$$ ?

**Solved F:** Can we solve for a specific range, where $$$L \leq a_1 < a_2 < \dots < a_k \leq R$$$ ?

**Solved G:** Can we solve for the cube product $$$a_i \times a_j \times a_t$$$ efficiently ?

**H:** Can we solve it if $$$n$$$ is given and there are queries for $$$k$$$ ?

**I:** Can we solve it if $$$k$$$ is given and there are queries for $$$n$$$ ?

**J:** Can we also solve the problem where there is no order: just simply $$$1 \leq a_i \leq n$$$ ?

**K:** Can we also solve the problem where there is no order: just simply $$$0 \leq a_i \leq n$$$ ?

**M**: Can we solve for the $$$q$$$-product $$$a_{i_1} \times a_{i_2} \times \dots \times a_{i_q} = x^q$$$ (for a given constant $$$q$$$) ?

**N**: Given $$$0 \leq \delta \leq n$$$, can we also solve the problem when $$$1 \leq a_1 \leq a_1 + \delta \leq a_2 \leq a_2 + \delta \leq \dots \leq a_k \leq n$$$ ?

**O**: What if the condition is only on adjacent elements and not all pairs ? Or you can say, $$$a_i \times a_{i+1}$$$ is a perfect square $$$\forall\ 1 \leq i < k$$$ ?

## A better solution for k = 2

Extra task A

#### Problem

**Easy Version**

Given $$$n\ (1 \leq n \leq 10^6)$$$, count the number of pairs $$$(a, b)$$$ that satisfy

- $$$1 \leq a < b \leq n$$$
- $$$a \times b$$$ is a perfect square.

Since the result can be big, output it under modulo $$$10^9 + 7$$$.

You can submit here: https://lqdoj.edu.vn/problem/aicprtsp2.

**Hard Version**

Given $$$q$$$ queries, you must find the answer for each query.

For each query, you are given a positive integer $$$n\ (1 \leq n \leq 10^{14})$$$.

Count the number of pairs $$$(a, b)$$$ that satisfy:

- $$$1 \leq a < b \leq n$$$
- $$$a \times b$$$ is a perfect square.

Since the result can be large, output it under modulo $$$977555333311111$$$.

You can submit here: https://lqdoj.edu.vn/problem/vaicprtspa

#### Examples

**Example 1**

**Input 1:**

```
1
```

**Output 1:**

```
0
```

**Explanation 1:**

There is no integer pair $$$(a, b)$$$ satisfying $$$1 \leq a < b \leq 1$$$.

**Example 2**

**Input 2:**

```
10
```

**Output 2:**

```
4
```

**Explanation 2:**

There are $$$4$$$ valid pairs: {$$$1, 4$$$}, {$$$1, 9$$$}, {$$$2, 8$$$}, {$$$4, 9$$$}.

**Example 3**

**Input 3:**

```
25
```

**Output 3:**

```
16
```

**Explanation 3:**

There are $$$16$$$ valid pairs: {$$$1, 4$$$}, {$$$1, 9$$$}, {$$$1, 16$$$}, {$$$1, 25$$$}, {$$$2, 8$$$}, {$$$2, 18$$$}, {$$$3, 12$$$}, {$$$4, 9$$$}, {$$$4, 16$$$}, {$$$4, 25$$$}, {$$$5, 20$$$}, {$$$6, 24$$$}, {$$$8, 18$$$}, {$$$9, 16$$$}, {$$$9, 25$$$}, {$$$16, 25$$$}.

#### Idea

**Observation**

In the above approach, we fix a squarefree $$$u$$$ and count the valid $$$p^2$$$.

But what if we fix $$$p^2$$$ and count $$$u$$$ instead ?

You can see that the first loop is now $$$O(\sqrt{n})$$$, but the total is still $$$O(n)$$$ because of the second loop.

**Swap for loop implementation**

```
const int LIM = 1e6 + 16;
const int MOD = 1e9 + 7;
bool used[LIM]; /// used[x] = x was already counted with a larger square part
long long solve(int n)
{
    memset(used, false, sizeof(used[0]) * (n + 1));
    long long res = 0;
    int t = sqrt(n);
    while (t * t < n) ++t;
    while (t * t > n) --t;
    for (int j = t; j > 1; --j) /// Fix the square part j^2, largest first
    {
        for (int i = 1; i * j * j <= n; ++i)
        {
            if (!used[i * j * j]) /// Count each b = i * j^2 only with its largest square part
            {
                used[i * j * j] = true;
                res += j - 1; /// The j - 1 numbers i * p^2 (p < j) pair with b
            }
        }
    }
    res %= MOD;
    return res;
}
```

**Definition**

Let $$$f(n)$$$ be the number of pairs $$$(a, b)$$$ with $$$1 \leq a < b \leq n$$$ such that $$$(a, b, n)$$$ is a three-term geometric progression.

Let $$$g(n)$$$ be the number of pairs $$$(a, b)$$$ with $$$1 \leq a \leq b \leq n$$$ such that $$$(a, b, n)$$$ is a three-term geometric progression.

Let $$$F(n) = \overset{n}{\underset{p=1}{\Large \Sigma}} f(p)$$$.

**But why do we need these functions anyway**

Well, you see, since $$$(a, b, n)$$$ forms a three-term geometric progression, you have $$$b^2 = an$$$, so $$$a \times n$$$ is a perfect square.

It is not hard to prove that $$$F(n)$$$ will be our answer, as we count over all possible squarefree $$$u$$$ for every fixed $$$p^2$$$.

We use $$$g(n)$$$ because of its beautiful property, which makes the calculation a lot easier.

**Property**

It is not hard to prove that $$$g(n) = f(n) + 1$$$ (the only extra case is $$$a = b = n$$$).

This interesting sequence $$$g(n)$$$ is A000188, having many properties, such as

- Number of solutions to $$$x^2 \equiv 0 \pmod n$$$.
- Square root of the largest square dividing $$$n$$$.
- Maximum of $$$\gcd \left(d, \frac{n}{d}\right)$$$ over all divisors $$$d$$$ of $$$n$$$.

Well, to make the whole problem easier, I am going to skip the proofs of this property (still, you can use the links in the sequence entry for references).

$$$g(n) = \underset{d^2 | n}{\Large \Sigma} \phi(d)$$$.

**Formula**

From this property, we can solve the problem in $$$O(\sqrt{n})$$$.

**Hint 1**

$$$F(n) = \overset{n}{\underset{p=1}{\Large \Sigma}} f(p)$$$.

and

$$$f(n) = g(n) - 1$$$.

**Hint 2**

We have

$$$F(n)$$$

$$$= \overset{n}{\underset{p=1}{\Large \Sigma}} g(p) - n$$$

$$$= \overset{n}{\underset{p=1}{\Large \Sigma}} \underset{d^2 | p}{\Large \Sigma} \phi(d) - n$$$

**Hint 3**

Break the loop, make it simpler.

**Hint 4**

Reduce a for loop to constant.

**Solution**

$$$F(n)$$$

$$$= \overset{n}{\underset{p=1}{\Large \Sigma}} f(p)$$$

$$$= -n + \overset{n}{\underset{p=1}{\Large \Sigma}} g(p)$$$

$$$= -n + \overset{n}{\underset{p=1}{\Large \Sigma}} \underset{d^2 | p}{\Large \Sigma} \phi(d)$$$

$$$= -n + \overset{\left \lfloor \sqrt{n} \right \rfloor}{\underset{p=1}{\Large \Sigma}} \underset{1 \leq u \times p^2 \leq n}{\Large \Sigma} \phi(p)$$$

$$$= -n + \overset{\left \lfloor \sqrt{n} \right \rfloor}{\underset{p=1}{\Large \Sigma}} \phi(p) \times \left \lfloor \frac{n}{p^2} \right \rfloor$$$

$$$= \overset{\left \lfloor \sqrt{n} \right \rfloor}{\underset{p=2}{\Large \Sigma}} \phi(p) \times \left \lfloor \frac{n}{p^2} \right \rfloor$$$

Well, you can sieve $$$\phi(p)$$$ for all $$$2 \leq p \leq \sqrt{n}$$$ in $$$O\left(\sqrt{n} \log \log \sqrt{n} \right)$$$, or improve it with a linear sieve to $$$O(\sqrt{n})$$$.

Therefore you can solve the whole problem in $$$O(\sqrt{n})$$$.

This paper also takes you through something similar.

#### Implementation

**O(sqrt n log log sqrt n) solution**

```
#include <iostream>
#include <cstring>
#include <numeric>
#include <cmath>
using namespace std;
const int MOD = 1e9 + 7;
const int LIM = 1e7 + 17;
const int SQRT_LIM = ceil(sqrt(LIM) + 1) + 1;
int euler[SQRT_LIM];
void sieve_phi(int n)
{
iota(euler, euler + n + 1, 0);
for (int x = 2; x <= n; x++) if (euler[x] == x)
for (int j = x; j <= n; j += x)
euler[j] -= euler[j] / x;
}
int solve(int n)
{
sieve_phi(ceil(sqrt(n) + 1) + 1);
long long res = 0;
for (int p = 2; p * p <= n; ++p)
res += 1LL * euler[p] * (n / (p * p));
res %= MOD;
return res;
}
int main()
{
ios::sync_with_stdio(false);
cin.tie(NULL);
int n;
cin >> n;
cout << solve(n);
return 0;
}
```

**O(sqrt n) solution**

```
#include <iostream>
#include <cstring>
#include <numeric>
#include <vector>
#include <cmath>
using namespace std;
const int MOD = 1e9 + 7;
vector<int> lpf;
vector<int> prime;
vector<int> euler;
void linear_sieve_phi(int n)
{
lpf.assign(n + 1, 0);
euler.assign(n + 1, 1);
for (int x = 2; x <= n; ++x)
{
if (lpf[x] == 0)
{
prime.push_back(lpf[x] = x);
euler[x] = x - 1;
}
for (int i = 0; i < prime.size() && x * prime[i] <= n; ++i)
{
lpf[x * prime[i]] = prime[i];
if (x % prime[i] == 0) {
euler[x * prime[i]] = euler[x] * prime[i];
break;
}
euler[x * prime[i]] = euler[x] * euler[prime[i]];
}
}
}
int solve(int n)
{
linear_sieve_phi(ceil(sqrt(n) + 1) + 1);
long long res = 0;
for (int p = 2; p * p <= n; ++p)
res += 1LL * euler[p] * (n / (p * p));
res %= MOD;
return res;
}
int main()
{
ios::sync_with_stdio(false);
cin.tie(NULL);
int n;
cin >> n;
cout << solve(n);
return 0;
}
```

#### Complexity

**Hint**

The complexity depends on the Euler totient sieve and on calculating the answer.

But as you can see, both can be done in $$$O(\sqrt n)$$$.

## A better solution for general k

Extra task B

#### Problem

**Very Easy Version**

Given $$$n\ (1 \leq n \leq 5 \times 10^6)$$$, count the number of triples $$$(a, b, c)$$$ that satisfy

- $$$1 \leq a < b < c \leq n$$$
- $$$a \times b, b \times c, c \times a$$$ are all perfect squares.

You can submit here: https://lqdoj.edu.vn/problem/square or https://oj.vnoi.info/problem/dhbb2020_square

**Easy Version**

Given $$$k, n\ (1 \leq k \leq n \leq 10^9)$$$, count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

- $$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$
- $$$a_i \times a_j$$$ is a perfect square $$$\forall 1 \leq i < j \leq k$$$

Since the result can be big, output it under modulo $$$10^9 + 7$$$.

You can submit here: https://lqdoj.edu.vn/problem/aicprtspk.

**Hard Version**

Given $$$q$$$ queries, you must find the answer for each query.

For each query, you are given $$$k, n\ (1 \leq k \leq n \leq 10^{13})$$$; count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

- $$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$
- $$$a_i \times a_j$$$ is a perfect square $$$\forall 1 \leq i < j \leq k$$$

Since the result can be big, output it under modulo $$$977555333311111$$$.

You can submit here: https://lqdoj.edu.vn/problem/vaicprtspb.

#### Examples

**Example 1**

**Input 1:**

```
2 1
```

**Output 1:**

```
2
```

**Explanation 1:**

There are $$$2$$$ valid arrays of size $$$1$$$: {$$$1$$$}, {$$$2$$$}.

**Example 2**

**Input 2:**

```
10 2
```

**Output 2:**

```
4
```

**Explanation 2:**

There are $$$4$$$ valid arrays of size $$$2$$$: {$$$1, 4$$$}, {$$$1, 9$$$}, {$$$2, 8$$$}, {$$$4, 9$$$}.

**Example 3**

**Input 3:**

```
27 3
```

**Output 3:**

```
12
```

**Explanation 3:**

There are $$$12$$$ valid arrays of size $$$3$$$: {$$$1, 4, 9$$$}, {$$$1, 4, 16$$$}, {$$$1, 4, 25$$$}, {$$$1, 9, 16$$$}, {$$$1, 9, 25$$$}, {$$$1, 16, 25$$$}, {$$$2, 8, 18$$$}, {$$$3, 12, 27$$$}, {$$$4, 9, 16$$$}, {$$$4, 9, 25$$$}, {$$$4, 16, 25$$$}, {$$$9, 16, 25$$$}.

#### Idea

**Definition**

As clyring described here.

We can generalize the definitions above to $$$k$$$ terms.

Let $$$f_k(n)$$$ be the number of tuples $$$(a_1, a_2, \dots, a_k, n)$$$ with $$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$ such that $$$(a_1, a_2, \dots, a_k, n)$$$ is a $$$(k+1)$$$-term geometric progression.

Let $$$g_k(n)$$$ be the number of tuples $$$(a_1, a_2, \dots, a_k, n)$$$ with $$$1 \leq a_1 \leq a_2 \leq \dots \leq a_k \leq n$$$ such that $$$(a_1, a_2, \dots, a_k, n)$$$ is a $$$(k+1)$$$-term geometric progression.

Let $$$F_k(n) = \overset{n}{\underset{p=1}{\Large \Sigma}} f_k(p)$$$.

Let $$$s_k(n)$$$ be the number of ways to choose $$$p^2$$$ among those $$$k$$$ numbers when you fix a squarefree $$$u$$$ (though we are doing it in reverse).

**The formula**

$$$s_k(n) = \binom{n-1}{k-1}$$$

$$$f_k(n) = F_k(n) - F_k(n-1) = s_k(g_k(n)) = \underset{d^2 | n}{\Large \Sigma} (\mu \times s_k)(d) = \underset{d^2 | n}{\Large \Sigma} \underset{p | d}{\Large \Sigma} \mu(p) \times s_k\left(\frac{d}{p}\right)$$$

$$$F_k(n) = \overset{n}{\underset{p=1}{\Large \Sigma}} f_k(p)$$$

$$$= F_k(0) + \overset{\left \lfloor \sqrt{n} \right \rfloor}{\underset{d = 1}{\Large \Sigma}}(\mu \times s_k)(d) \times \left \lfloor \frac{n}{d^2} \right \rfloor$$$

$$$= \overset{\left \lfloor \sqrt{n} \right \rfloor}{\underset{d = 1}{\Large \Sigma}} \left \lfloor \frac{n}{d^2} \right \rfloor \times \left ( \underset{p | d}{\Large \Sigma} \mu(p) \times s_k\left(\frac{d}{p}\right) \right)$$$

#### Implementation

**O(sqrt n log sqrt n)**

```
const int LIM = 5e6 + 56;
const int SQRT_LIM = ceil(sqrt(LIM) + 1) + 1;
const int MOD = 1e9 + 7;
/// Precalculating factorials under prime modulo
int fact[SQRT_LIM + 10]; /// fact[n] = n!
int invs[SQRT_LIM + 10]; /// invs[n] = n^(-1)
int tcaf[SQRT_LIM + 10]; /// tcaf[n] = (n!)^(-1)
void precal_nck(int n = SQRT_LIM)
{
fact[0] = fact[1] = 1;
invs[0] = invs[1] = 1;
tcaf[0] = tcaf[1] = 1;
for (int i = 2; i <= n; ++i)
{
fact[i] = (1LL * fact[i - 1] * i) % MOD;
invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
}
}
/// Calculating binomial coefficient queries
int nck(int n, int k)
{
k = min(k, n - k);
if (k < 0) return 0;
long long res = fact[n];
res *= tcaf[k]; res %= MOD;
res *= tcaf[n - k]; res %= MOD;
return res;
}
/// Linear Sieve
vector<int> prime; /// prime list = A000040
bool isPrime[SQRT_LIM + 10]; /// characteristic function = A010051
int lpf[SQRT_LIM + 10]; /// lowest prime factor = A020639
int mu[SQRT_LIM + 10]; /// mobius = A008683
void linear_sieve(int n)
{
if (n < 1) return ;
/// Extension Sieve || You can add something more
memset(lpf, 0, sizeof(lpf[0]) * (n + 1));
fill_n(mu, n + 1, 1);
/// Main Sieve || Without this, you barely able to achive linear complexity
prime.clear();
prime.reserve(n / log(n - 1));
memset(isPrime, true, sizeof(isPrime[0]) * (n + 1));
isPrime[0] = isPrime[1] = false;
for (int x = 2; x <= n; ++x) /// For each number
{
if (isPrime[x]) /// Func[Prime]
{
mu[x] = -1;
lpf[x] = x;
prime.push_back(x);
}
for (int p : prime) /// Func[Prime * X] <- Func[Prime]
{
if (p > lpf[x] || x * p > n) break;
isPrime[x * p] = 0;
lpf[x * p] = p;
mu[x * p] = (lpf[x] == p) ? 0 : -mu[x];
}
}
}
/// Divisor sieve
vector<int> divisors[SQRT_LIM];
void precal_div(int n) /// O(n log n)
{
for (int u = n; u >= 1; --u)
{
divisors[u].clear();
for (int v = u; v <= n; v += u)
divisors[v].push_back(u);
}
}
/// Solving for n, k
long long solve(int n, int k)
{
/// We only care for d that 1 <= d <= sqrt(n)
int t = ceil(sqrt(n) + 1) + 1;
linear_sieve(t);
precal_nck(t);
precal_div(t);
long long res = 0;
for (int d = 1; d * d <= n; ++d) /// For each fixed p^2
{
long long sum = 0;
for (int p : divisors[d]) /// For each (p | d)
sum += mu[d / p] * nck(p - 1, k - 1);
sum %= MOD;
res += sum * (n / (d * d));
}
res = ((res % MOD) + MOD) % MOD; /// mu can make partial sums negative
return res;
}
int main()
{
ios::sync_with_stdio(false);
cin.tie(NULL);
/// Assumming constant p = 10^9 + 7
int n, k;
cin >> n >> k;
cout << solve(n, k);
return 0;
}
```

**O(sqrt n log log sqrt n)**

```
vector<int> prime; /// prime list = A000040
bool isPrime[SQRT_LIM + 10]; /// characteristic function = A010051
int lpf[SQRT_LIM + 10]; /// lowest prime factor = A020639
void linear_sieve(int n)
{
if (n < 1) return ;
prime.clear();
prime.reserve(n / log(n - 1));
memset(lpf, 0, sizeof(lpf[0]) * (n + 1));
memset(isPrime, true, sizeof(isPrime[0]) * (n + 1));
isPrime[0] = isPrime[1] = false;
for (int x = 2; x <= n; ++x)
{
if (isPrime[x]) /// Func[Prime]
{
lpf[x] = x;
prime.push_back(x);
}
for (int p : prime) /// Func[Prime * X] <- Func[Prime]
{
if (p > lpf[x] || x * p > n) break;
isPrime[x * p] = 0;
lpf[x * p] = p;
}
}
}
long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
int t = ceil(sqrt(n) + 1) + 1;
linear_sieve(t);
precal_nck(t);
memset(res, 0, sizeof(res[0]) * (t + 1));
for (int d = 1; d * d <= n; ++d)
res[d] = nck(d - 1, k - 1);
for (int p : prime)
for (int d = t / p; d > 0; --d)
res[d * p] -= res[d];
long long ans = 0;
for (int d = 1; d * d <= n; ++d)
ans += res[d] * (n / (d * d));
ans = ((ans % MOD) + MOD) % MOD; /// partial sums can be negative after the subtractions
return ans;
}
```

But while doing research for task **H**, I found an improvement.

**O(sqrt(n/k) log log sqrt(n/k)) solution**

```
long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
    int t = ceil(sqrt(n) + 0.5);
    if (k > t) return 0;
    linear_sieve(t);
    precal_nck(t);
    memset(res, 0, sizeof(res[0]) * (t - k + 1));
    for (int d = k; d * d <= n; ++d)
        res[d - k] = nck(d - 1, k - 1); /// res[] is shifted: res[d - k] holds the value for d
    for (int p : prime)
        for (int d = t / p; d >= k; --d)
            res[d * p - k] -= res[d - k];
    long long ans = 0;
    for (int d = k; d <= t; ++d)
        ans += res[d - k] * (n / (d * d));
    ans = ((ans % MOD) + MOD) % MOD; /// partial sums can be negative after the subtractions
    return ans;
}
```

#### Complexity

**The first implementation**

The complexity of the first implementation is $$$O(\sqrt{n} \log \sqrt{n})$$$.

**Hint 1**

The complexity depends on the Möbius function, on traversing all divisors of each number, and on calculating binomial coefficients.

**Hint 2**

We can sieve the Möbius function in linear time.

**Hint 3**

We only care about $$$d$$$ that $$$1 \leq d \leq \sqrt{n}$$$.

**Hint 4**

We can use the factorial formula for binomial coefficients since $$$1 \leq k \leq n < p$$$.

**Proof**

As we only use $$$d$$$ with $$$d^2 \leq n$$$, let $$$t = \left \lfloor \sqrt{n} \right \rfloor$$$ for convenience.

We can use a linear sieve for $$$\mu(d)$$$ in $$$O(t)$$$.

We can use a divisor sieve to find all divisors of every number up to $$$t$$$ in $$$O(t \log t)$$$.

We can calculate the result in $$$O(t \log t)$$$.

Therefore the overall complexity is $$$O(t \log t)$$$.

**The second implementation**

The complexity of the second implementation is $$$O(\sqrt{n} \log \log \sqrt{n})$$$.

**Hint**

The complexity depends on the sieve, on calculating the result, and on calculating binomial coefficients.

**Proof**

The sieve and the binomial coefficients obviously take $$$O(t)$$$ for $$$t = \sqrt{n}$$$.

The calculation part reduces to the sum of reciprocals of primes up to $$$t$$$.

Therefore we achieve $$$O(t \log \log t)$$$ complexity for $$$t = \sqrt{n}$$$.

**The third implementation**

Notice that there are at most $$$O(\sqrt{n})$$$ valid $$$p^2$$$ for any fixed $$$u$$$.

This gives you an $$$O(1)$$$ solution when $$$k > \sqrt{n}$$$.

Also notice that the binomial coefficient $$$\binom{x}{k}$$$ is $$$0$$$ for all $$$x < k$$$.

This gives you an $$$O\left(\sqrt{\frac{n}{k}} \log \log \sqrt{\frac{n}{k}}\right)$$$ solution.

## Solution for duplicate elements in the array

Extra task C

#### Problem

Given $$$k, n\ (1 \leq k \leq n \leq 10^9)$$$, count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

- $$$1 \leq a_1 \leq a_2 \leq \dots \leq a_k \leq n$$$
- $$$a_i \times a_j$$$ is a perfect square $$$\forall 1 \leq i < j \leq k$$$

Since the result can be big, output it under modulo $$$10^9 + 7$$$.

#### Idea

**Observation**

It is not hard to prove that we can use the same algorithm as described in tasks **A, B** or in the original task.

**Hint**

Everything we have done so far just optimizes the counting part. The counting itself, which is the core, is the same as in the original.

**Same idea**

The counting part is only about the number of valid sequences bounded by $$$n$$$, or you could say, the number of possible $$$p^2$$$ you can choose when you fix a squarefree number $$$u$$$.

All the stuff above, $$$\mu(n)$$$, $$$\phi(n)$$$, sieving and everything else, are just optimizations.

**Calculation**

The core of the calculation is to find the number of non-decreasing integer sequences of size $$$k$$$ with values in $$$[1, n]$$$.

**The formula**

$$$\binom{n + k - 1}{k}$$$

and when you fix one number, by a similar idea:

$$$s_k(n) = \binom{n + k - 2}{k - 1}$$$

Can you prove it ?

**Hint 1**

It is a standard stars and bars problem.

**Hint 2**

Instead of thinking about the number of ways to select each $$$a_i$$$,

you can think of the number of times each value is used.

**Hint 3**

Let $$$x_v$$$ be the number of times value $$$v$$$ is used, i.e. the number of indices $$$i$$$ with $$$a_i = v$$$.

Each sequence can be represented uniquely by the sequence $$$x_1, x_2, \dots, x_n$$$.

Since we selected $$$k$$$ elements in total, we have $$$x_1 + x_2 + \dots + x_n = k$$$.

**Proof**

Let $$$x_v$$$ be the number of times value $$$v$$$ is used, i.e. the number of indices $$$i$$$ with $$$a_i = v$$$.

We need the number of solutions of $$$x_1 + x_2 + \dots + x_n = k$$$ with $$$x_v \geq 0$$$.

Draw $$$k$$$ stars ⭐ and insert $$$n - 1$$$ bars to split them into $$$n$$$ groups; group $$$v$$$ contains $$$x_v$$$ stars.

Since there are $$$n + k - 1$$$ symbols in total and we choose $$$k$$$ of them to be stars, there are $$$\binom{n + k - 1}{k}$$$ different ways.

And that is it !

The idea is the same as what clyring described here, just represented in a different way.

#### Implementation

**O(n) solution**

```
const int LIM = 1e6 + 16;
const int MOD = 1e9 + 7;
int fact[2 * LIM + 10]; /// sized 2n+ because nck(k + j - 2, k) goes up to about n + sqrt(n)
int invs[2 * LIM + 10];
int tcaf[2 * LIM + 10];
void precal_nck(int n)
{
    fact[0] = fact[1] = 1;
    invs[0] = invs[1] = 1;
    tcaf[0] = tcaf[1] = 1;
    for (int i = 2; i <= n; ++i)
    {
        fact[i] = (1LL * fact[i - 1] * i) % MOD;
        invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
        tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
    }
}
int nck(int n, int k)
{
    k = min(k, n - k);
    if (k < 0) return 0;
    long long res = fact[n];
    res *= tcaf[k]; res %= MOD;
    res *= tcaf[n - k]; res %= MOD;
    return res;
}
bool is_squarefree[LIM];
int solve(int n, int k)
{
    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (n + 1));
    precal_nck(2 * n + 1);
    long long res = 0;
    for (int i = 1, j; i <= n; ++i) if (is_squarefree[i])
    {
        for (j = 1; i * j * j <= n; ++j)
            is_squarefree[i * j * j] = false;
        res += nck(k + j - 2, k); /// j - 1 valid squares for this u, giving C((j - 1) + k - 1, k)
    }
    res %= MOD;
    return res;
}
```

**O(sqrt n log sqrt n + k) solution**

```
const int LIM = 5e6 + 56;
const int SQRT_LIM = ceil(sqrt(LIM) + 1) + 1;
const int MOD = 1e9 + 7;
/// Precalculating factorials under prime modulo
int fact[SQRT_LIM + LIM + 10]; /// fact[n] = n! (indices go up to about sqrt(n) + k)
int invs[SQRT_LIM + LIM + 10]; /// invs[n] = n^(-1)
int tcaf[SQRT_LIM + LIM + 10]; /// tcaf[n] = (n!)^(-1)
void precal_nck(int n = SQRT_LIM)
{
fact[0] = fact[1] = 1;
invs[0] = invs[1] = 1;
tcaf[0] = tcaf[1] = 1;
for (int i = 2; i <= n; ++i)
{
fact[i] = (1LL * fact[i - 1] * i) % MOD;
invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
}
}
/// Calculating binomial coefficient queries
int nck(int n, int k)
{
k = min(k, n - k);
if (k < 0) return 0;
long long res = fact[n];
res *= tcaf[k]; res %= MOD;
res *= tcaf[n - k]; res %= MOD;
return res;
}
/// Linear Sieve
vector<int> prime; /// prime list = A000040
bool isPrime[SQRT_LIM + 10]; /// characteristic function = A010051
int lpf[SQRT_LIM + 10]; /// lowest prime factor = A020639
int mu[SQRT_LIM + 10]; /// mobius = A008683
void linear_sieve(int n)
{
if (n < 1) return ;
/// Extension Sieve || You can add something more
memset(lpf, 0, sizeof(lpf[0]) * (n + 1));
fill_n(mu, n + 1, 1);
/// Main Sieve || Without this, you barely able to achive linear complexity
prime.clear();
prime.reserve(n / log(n - 1));
memset(isPrime, true, sizeof(isPrime[0]) * (n + 1));
isPrime[0] = isPrime[1] = false;
for (int x = 2; x <= n; ++x) /// For each number
{
if (isPrime[x]) /// Func[Prime]
{
mu[x] = -1;
lpf[x] = x;
prime.push_back(x);
}
for (int p : prime) /// Func[Prime * X] <- Func[Prime]
{
if (p > lpf[x] || x * p > n) break;
isPrime[x * p] = 0;
lpf[x * p] = p;
mu[x * p] = (lpf[x] == p) ? 0 : -mu[x];
}
}
}
/// Divisor sieve
vector<int> divisors[SQRT_LIM];
void precal_div(int n) /// O(n log n)
{
for (int u = n; u >= 1; --u)
{
divisors[u].clear();
for (int v = u; v <= n; v += u)
divisors[v].push_back(u);
}
}
/// Solving for n, k
long long solve(int n, int k)
{
/// We only care for d that 1 <= d <= sqrt(n)
int t = ceil(sqrt(n) + 1) + 1;
linear_sieve(t);
precal_nck(t + k); /// nck(p + k - 2, k - 1) needs indices up to t + k, so size the arrays accordingly
precal_div(t);
long long res = 0;
for (int d = 1; d * d <= n; ++d) /// For each fixed p^2
{
long long sum = 0;
for (int p : divisors[d]) /// For each (p | d)
sum += mu[d / p] * nck(p + k - 2, k - 1); /// s_k(p) = C(p + k - 2, k - 1)
sum %= MOD;
res += sum * (n / (d * d));
}
res = ((res % MOD) + MOD) % MOD; /// mu can make partial sums negative
return res;
}
int main()
{
ios::sync_with_stdio(false);
cin.tie(NULL);
/// Assumming constant p = 10^9 + 7
int n, k;
cin >> n >> k;
cout << solve(n, k);
return 0;
}
```

**O(sqrt n log log sqrt n + k) solution**

```
int fact[SQRT_LIM + LIM + 10]; /// sized for indices up to sqrt(n) + k
int invs[SQRT_LIM + LIM + 10];
int tcaf[SQRT_LIM + LIM + 10];
void precal_nck(int n = SQRT_LIM)
{
fact[0] = fact[1] = 1;
invs[0] = invs[1] = 1;
tcaf[0] = tcaf[1] = 1;
for (int i = 2; i <= n; ++i)
{
fact[i] = (1LL * fact[i - 1] * i) % MOD;
invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
}
}
int nck(int n, int k)
{
k = min(k, n - k);
if (k < 0) return 0;
long long res = fact[n];
res *= tcaf[k]; res %= MOD;
res *= tcaf[n - k]; res %= MOD;
return res;
}
vector<int> prime; /// prime list = A000040
bool isPrime[SQRT_LIM + 10]; /// characteristic function = A010051
int lpf[SQRT_LIM + 10]; /// lowest prime factor = A020639
void linear_sieve(int n)
{
if (n < 1) return ;
prime.clear();
prime.reserve(n / log(n - 1));
memset(lpf, 0, sizeof(lpf[0]) * (n + 1));
memset(isPrime, true, sizeof(isPrime[0]) * (n + 1));
isPrime[0] = isPrime[1] = false;
for (int x = 2; x <= n; ++x)
{
if (isPrime[x]) /// Func[Prime]
{
lpf[x] = x;
prime.push_back(x);
}
for (int p : prime) /// Func[Prime * X] <- Func[Prime]
{
if (p > lpf[x] || x * p > n) break;
isPrime[x * p] = 0;
lpf[x * p] = p;
}
}
}
long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
int t = ceil(sqrt(n) + 1) + 1;
linear_sieve(t); /// primes are only needed up to sqrt(n)
precal_nck(t + k);
memset(res, 0, sizeof(res[0]) * (t + 1));
for (int d = 1; d * d <= n; ++d)
res[d] = nck(d + k - 2, k - 1);
for (int p : prime)
for (int d = t / p; d > 0; --d)
res[d * p] -= res[d];
long long ans = 0;
for (int d = 1; d * d <= n; ++d)
ans += res[d] * (n / (d * d));
ans = ((ans % MOD) + MOD) % MOD; /// partial sums can be negative after the subtractions
return ans;
}
```

#### Complexity

**The first implementation**

In the first implementation it is obviously linear.

**Hint 1**

The complexity depends mainly on the sieving part and the binomial coefficient precalculation.

**Hint 2**

Sieving can be done in linear time.

Binomial coefficients can also be precalculated in linear time, assuming $$$p$$$ is a sufficiently large prime greater than $$$\max(n, k)$$$.

**The second and third implementation**

The complexity of the second and third implementations is also easy to show.

**Hint 1**

Like task **B**, the algorithm depends on calculating the result and on the binomial coefficient precalculation.

**Hint 2**

We proved that calculating the result only takes $$$O(\sqrt{n} \log \sqrt{n})$$$ [2] and $$$O(\sqrt{n} \log \log \sqrt{n})$$$ [3].

And we need binomial coefficients up to $$$\binom{d + k - 2}{k - 1}$$$ for $$$d = \left \lfloor \sqrt{n} \right \rfloor$$$, so we need $$$O(\sqrt{n} + k)$$$ precalculation time.

Sadly, since $$$k \leq n$$$, we conclude that the complexity is still $$$O(n)$$$ in the worst case, and it even carries a larger constant factor than the first implementation.

But it is still efficient enough to solve the problem when $$$k$$$ is small.

**Bonus**

You can optimize the problem using this.

Then you can solve the problem in $$$O(\sqrt{n} \log \log \sqrt{n} + \sqrt{p} \log^q p)$$$ (where $$$q$$$ depends on how you implement it).

## Solution when there is no restriction between k, n, p

Extra task D

#### Problem

Given $$$k, n, p\ (1 \leq k, n, p \leq 10^9)$$$, count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

- $$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$
- $$$a_i \times a_j$$$ is a perfect square $$$\forall 1 \leq i < j \leq k$$$

Since the result can be big, output it under modulo $$$p$$$.

#### Idea

**Observation**

First of all, the result **does** depend on how you calculate the binomial coefficients, **but** they are calculated independently, even if you somehow manage to run the binomial coefficient loop first.

Therefore, even without any restriction between $$$k, n, p$$$, the counting part and the algorithm don't change.

You just need to change how you calculate the binomial coefficients, and that is all for this task.

Though this needs more detail, the blog is not about computing binomial coefficients, so I will just make it quick.

You can also notice that for $$$k > n$$$, the result is obviously $$$0$$$.

**Large prime p**

For a large prime $$$p > \max(n, k)$$$:

- Just use the normal factorial-based combinatorics (since $$$p > \max(n, k)$$$, nothing will affect the result).
- For division under the modulo you can just use modular inverses (for a prime modulus such a number always exists).
- This is a standard problem, just be careful about overflow.
- You can also optimize by precalculating factorials, modular inverses and inverse factorials in linear time.

For a general prime $$$p$$$:

- We can simply ignore the factors $$$p$$$ when calculating $$$n!$$$.
- You also need to know how many times the factor $$$p$$$ appears in $$$1 \dots n$$$.
- Then combine it back when calculating the answer.
- If we don't do this, $$$n!$$$ might be divisible by $$$p$$$ and its modular inverse would not exist.
- With precalculation you can answer queries in $$$O(1)$$$.

For squarefree $$$p$$$:

- Factorize $$$p = p_1 \times p_2 \times \dots \times p_q$$$ where every $$$p_i$$$ is prime.
- Ignore all factors $$$p_i$$$ when calculating $$$n!$$$.
- Remember to count how many times each factor $$$p_i$$$ appears in $$$1 \dots n$$$.
- When answering a query, we just combine all those parts back.
- Remember you can reduce exponents modulo $$$\phi(p)$$$, which you can also calculate while factorizing $$$p$$$.
- Remember that the reduced $$$n!$$$ must not be divisible by any factor $$$p_i$$$, otherwise you will get a wrong answer.
- With precalculation you can answer queries in $$$O(\log p)$$$.

For a general positive modulo $$$p$$$:

- Factorize $$$p = p_1^{f_1} \times p_2^{f_2} \times \dots \times p_q^{f_q}$$$ where every $$$p_i$$$ is a distinct prime.
- We calculate $$$\binom{n}{k}$$$ modulo $$$p_i^{f_i}$$$ for each $$$i = 1 \dots q$$$.
- To do that, we need to calculate $$$n!$$$ modulo $$$p_i^{f_i}$$$, which is described here.
- To get the final answer we can use CRT.
- This is kind of hard to code and debug, and it is easy to make mistakes, so you must be careful.
- I will leave the implementation to you, lovely readers.
- Depending on how you calculate things, this might increase your query complexity.
- There are few (efficient, or at least fully correct) papers about this, but you can read the one written here.

#### Implementation

**O(n) for prime p > max(n, k)**

```
/// SPyofgame linear template for precalculating factorials under large prime modulo
int fact[SQRT_LIM + 10]; /// fact[n] = n!
int invs[SQRT_LIM + 10]; /// invs[n] = n^(-1)
int tcaf[SQRT_LIM + 10]; /// tcaf[n] = (n!)^(-1)
void precal_nck(int n = SQRT_LIM)
{
fact[0] = fact[1] = 1;
invs[0] = invs[1] = 1;
tcaf[0] = tcaf[1] = 1;
for (int i = 2; i <= n; ++i)
{
fact[i] = (1LL * fact[i - 1] * i) % MOD;
invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
}
}
/// Calculating binomial coefficient queries
int nck(int n, int k)
{
k = min(k, n - k);
if (k < 0) return 0;
long long res = fact[n];
res *= tcaf[k]; res %= MOD;
res *= tcaf[n - k]; res %= MOD;
return res;
}
```

**O(n log mod + sqrt(mod)) for prime p or squarefree p**

```
vector<int> factor;
int factorize(int n) /// Calculating phi(n) while factorizing (n) in O(sqrt n)
{
factor.clear();
int phi = n;
if (!(n & 1))
{
n >>= __builtin_ctz(n);
factor.push_back(2);
phi -= phi / 2;
}
for (int x = 3; x * x <= n; x += 2)
{
if (n % x == 0)
{
do n /= x; while (n % x == 0);
factor.push_back(x);
phi -= phi / x;
}
}
if (n > 1)
{
factor.push_back(n);
phi -= phi / n;
}
return phi;
}
int n, k; /// problem parameters, assumed to be read elsewhere
long long powMOD(long long b, long long e, long long m) /// b^e mod m by binary exponentiation
{
long long r = 1 % m;
for (b %= m; e > 0; e >>= 1, b = b * b % m)
if (e & 1) r = r * b % m;
return r;
}
void mulMOD(int &a, int b, int m) /// a = a * b mod m, in place
{
a = 1LL * a * b % m;
}
int f[LIM]; /// f[x] = nck(x, k)
int fact[LIM]; /// n!
int tcaf[LIM]; /// n!^(-1)
int divp[LIM]; /// x but ignore all factors p[i]
int cntp[LIM][LOG_LIM]; /// cntp[x][i] = Number of times factor p[i] appears in 1..x
void precal(int MOD) /// Calculate f[x] for all x = 1 -> n in O(n log mod + sqrt mod)
{
int PHIMOD = factorize(MOD);
for (int x = 1; x <= n; ++x) /// For each part x in n!
{
int &t = divp[x] = x;
for (int i = 0; i < factor.size(); ++i) /// Ignore all factor p[i] of p
{
cntp[x][i] = cntp[x - 1][i];
for (; t % factor[i] == 0; t /= factor[i]) /// Count how many times p[i] appears in 1..n
++cntp[x][i];
}
}
fact[0] = fact[1] = 1;
tcaf[0] = tcaf[1] = 1;
for (int x = 2; x <= n; ++x) /// Finding n! and n!^(-1)
{
fact[x] = (1LL * fact[x - 1] * divp[x]) % MOD;
tcaf[x] = powMOD(fact[x], PHIMOD - 1, MOD);
}
memset(f, 0, sizeof(f[0]) * k);
for (int x = k; x <= n; ++x)
{
/// Calculate nck % p normally
f[x] = fact[x];
mulMOD(f[x], tcaf[k], MOD);
mulMOD(f[x], tcaf[x - k], MOD);
for (int i = 0; i < factor.size(); ++i) /// Bringing those factors back
{
int p = cntp[x][i] - cntp[k][i] - cntp[x - k][i];
f[x] = 1LL * f[x] * powMOD(factor[i], p, MOD) % MOD;
}
}
}
```

#### Complexity

**Spoiler**

In the first implementation it is obviously linear.

**Hint**

Each number is visited once, and each transition only costs $$$O(1)$$$.

And for the second implementation.

**Hint 1**

As you might have noticed, $$$p$$$ is at best a prime, or at worst a squarefree number.

Since the complexity also depends on the number of prime factors of $$$p$$$, the worst case is a primorial $$$p = 2 \times 3 \times 5 \times 7 \times 11 \times 13 \times 17 \times 19 \times \dots$$$, which has the most prime factors, about $$$\log(p)$$$ of them.

**Hint 2**

The total complexity also depends on factorizing $$$p$$$ and calculating $$$\phi(p)$$$.

Yet with normal trial division you can do this in $$$O(\sqrt{p})$$$ time.

So you get $$$O(n \times \log p + \sqrt{p})$$$ in the end.

**Bonus**

You can also do all of this in $$$O(n \log p) + Õ(p^{1/4})$$$ (using Pollard's rho for the factorization).

Though you can still optimize this, but by doing that, why don't you just go straight to solving for non-squarefree $$$p$$$ too?

## Solution when numbers are also bounded by negative number

Extra task

E

#### Problem

Given $$$k, n$$$ $$$(1 \leq k \leq n \leq 10^9)$$$, count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

- $$$-n \leq a_1 < a_2 < \dots < a_k \leq n$$$
- $$$a_i \times a_j$$$ is perfect square $$$\forall 1 \leq i < j \leq k$$$

Since the result can be big, output it under modulo $$$10^9 + 7$$$.

#### Idea

**Hint**

Yet this is the same as extra task **C**, where only the counting part needs to change.

As we only care about integers, let's not bring complex math into this problem.

If there exist a negative number and a positive number, their product will be negative, thus the sequence will not satisfy the condition.

Be careful, there are the zeros too.

When the numbers are all unique, i.e. $$$-n \leq a_1 < a_2 < \dots < a_k \leq n$$$:

**There are 4 cases:**

The sequence contains positive integers only.

The sequence contains negative integers only.

The sequence contains a zero, and the rest are positive integers.

The sequence contains a zero, and the rest are negative integers.

This gives us the formula $$$task_E(n, k) = 2 \times task_B(n, k) + 2 \times task_B(n, k - 1)$$$ (be careful with $$$k = 1$$$: the two zero cases then collapse into the same single sequence $$$\{0\}$$$).

**Hint 1**

There can't be both a negative number and a positive number at the same time.

**Hint 2**

You can flip all the signs to positive, thus negative integer sequences are now the same as positive sequences.

**Hint 3**

Without zero, the problem is no different from the original task.

**Hint 4**

With zero, any product involving it is already a perfect square.

Hence zeros don't affect anything other than reducing the number of remaining elements you have to select.

**Proof**

A sequence containing both negative and positive numbers cannot satisfy the problem, as such a pair's product is not a perfect square (we only care about integers, not complex numbers).

The sequence contains positive integers only:

- This is the original task, as positive integers only means $$$1 \leq a_i \leq n$$$.

The sequence contains negative integers only:

- Flipping all the signs doesn't change the product of any pair.
- The problem is now the same as the positive-only case.

The sequence contains a zero, and the rest are positive integers:

- The product of zero with any number is still a perfect square.
- So the zero only reduces the number of elements you have to select.
- Therefore this is the same as the original task, but now you have to select $$$k - 1$$$ numbers.

The sequence contains a zero, and the rest are negative integers:

- Just flip the signs as well; the answer is the same as above.

Remember that when $$$k = 0$$$ the answer is $$$0$$$, otherwise you might get wrong results from negative arguments in the binomial coefficient formula.
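For small inputs, the 4-case formula can be checked against a direct brute force. This is my own sketch (the names `brute`/`formula` are mine), and it targets $$$k \geq 2$$$, since for $$$k = 1$$$ the two zero cases collapse into the single sequence $$$\{0\}$$$:

```cpp
#include <cassert>
#include <cmath>
#include <vector>
using namespace std;

/// True if x * y is a perfect square (x, y may be negative or zero).
bool square_product(long long x, long long y)
{
    long long p = x * y;
    if (p < 0) return false;
    long long r = llround(sqrt((double)p));
    return r * r == p;
}

/// Count strictly increasing sequences lo <= a_1 < ... < a_k <= hi
/// whose pairwise products are all perfect squares (pure brute force).
long long brute(int lo, int hi, int k)
{
    int m = hi - lo + 1;
    long long cnt = 0;
    for (int mask = 0; mask < (1 << m); ++mask)
    {
        if (__builtin_popcount(mask) != k) continue;
        vector<long long> pick;
        for (int i = 0; i < m; ++i)
            if (mask >> i & 1) pick.push_back(lo + i);
        bool ok = true;
        for (size_t i = 0; ok && i < pick.size(); ++i)
            for (size_t j = i + 1; ok && j < pick.size(); ++j)
                ok = square_product(pick[i], pick[j]);
        cnt += ok;
    }
    return cnt;
}

/// The 4-case formula (k >= 2): 2*taskB(n, k) + 2*taskB(n, k - 1),
/// where taskB is itself brute-forced over positive subsets only.
long long formula(int n, int k)
{
    return 2 * brute(1, n, k) + 2 * brute(1, n, k - 1);
}
```

For example, with $$$n = 4, k = 3$$$ both sides count exactly the two sequences $$$(0, 1, 4)$$$ and $$$(-4, -1, 0)$$$.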

**With duplicates case**

So what if I mix this problem with task **C** too?

When the numbers can have duplicates, i.e. $$$-n \leq a_1 \leq a_2 \leq \dots \leq a_k \leq n$$$:

**There are 5 cases:**

The sequence contains zeros only.

The sequence contains positive integers only.

The sequence contains negative integers only.

The sequence contains $$$0 < t < k$$$ positive integers, and the rest $$$k - t$$$ are zeros.

The sequence contains $$$0 < t < k$$$ negative integers, and the rest $$$k - t$$$ are zeros.

Yet once again you can simplify it into fewer cases for easier calculation.

**There are 2 main cases:**

The sequence contains zeros only.

The sequence contains $$$0 < t \leq k$$$ non-zero integers (all of the same sign), and the rest $$$k - t$$$ are zeros.

This gives us the formula $$$task_E(n, k) = 1 + 2 \times \overset{k}{\underset{t = 1}{\Large \Sigma}} task_B(n, t)$$$.

**Why is the formula 2 * ... ?**

You might be wondering where the $$$2 \times \dots$$$ comes from.

Yet it is the same as in the unique-numbers case.

The answer for positive numbers is the same as for negative numbers when you flip the signs.

So you have $$$1 \times \dots$$$ for positive, and $$$1 \times \dots$$$ for negative.

Therefore you have $$$2 \times \dots$$$ in total.

**No, I mean: why is there no binomial coefficient for selecting the positions of the zeros?**

You might be wondering why we don't have a formula like this:

$$$task_E(n, k) = 1 + \overset{k}{\underset{t = 1}{\Large \Sigma}} C^t_k \times task_B(n, t)$$$.

But you know, the sequence is ordered.

No matter which positions you select as zeros, there is one and only one way to sort the sequence back.

**So where does the 1 come from? Why isn't it 2 instead?**

You see, there is only one fully-zeros sequence (flipping its signs gives the same sequence).

But this gives you an $$$O(k)$$$ solution.

You can do better with math.

**Hint 1**

Isn't this sum familiar from some standard math problem related to binomial coefficients?

**Hint 2**

This formula is related to the hockey-stick identity $$$\overset{k}{\underset{t = 0}{\Large \Sigma}} \binom{n + t}{t} = \binom{n + k + 1}{k}$$$

**Solution**

$$$s_k(d) = \overset{k}{\underset{t = 1}{\Large \Sigma}} \binom{d + t - 2}{t - 1} \times 2 = \binom{d + k - 1}{k - 1} \times 2$$$
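The collapse of that sum is just the hockey-stick identity again; here is a tiny numeric sanity check of both sides (my own sketch, exact arithmetic, no modulo):

```cpp
#include <cassert>

/// Plain binomial coefficient, no modulo (small inputs only).
long long binom(long long n, long long k)
{
    if (k < 0 || k > n) return 0;
    long long r = 1;
    for (long long i = 1; i <= k; ++i)
        r = r * (n - k + i) / i; /// exact: partial products are binomials
    return r;
}

/// Left side: sum over t = 1..k of C(d + t - 2, t - 1), times 2.
long long lhs(int d, int k)
{
    long long s = 0;
    for (int t = 1; t <= k; ++t)
        s += binom(d + t - 2, t - 1);
    return 2 * s;
}

/// Right side: C(d + k - 1, k - 1), times 2.
long long rhs(int d, int k)
{
    return 2 * binom(d + k - 1, k - 1);
}
```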

#### Implementation

**O(sqrt n log log sqrt n) when the numbers are unique**

```
long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t);
    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d <= n; ++d)
    {
        if (k >= 1) res[d] += 2LL * nck(d - 1, k - 1); /// sequences without a zero
        if (k >= 2) res[d] += 2LL * nck(d - 1, k - 2); /// sequences containing a zero
    }
    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];
    long long ans = 0;
    for (int d = 1; d * d <= n; ++d)
        ans = (ans + res[d] % MOD * (n / (d * d))) % MOD;
    if (ans < 0) ans += MOD;
    return ans;
}
```

And for duplicates (mixed with task **C**), we have:

**O(kn) = O(n^2)**

```
bool is_squarefree[LIM];
int brute(int n, int k)
{
    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (n + 1));
    precal_nck(2 * n + 1);
    long long res = 0;
    for (int i = 1, j; i <= n; ++i) if (is_squarefree[i])
    {
        for (j = 1; i * j * j <= n; ++j)
            is_squarefree[i * j * j] = false;
        /// after the loop, (j - 1) numbers i * 1^2, ..., i * (j - 1)^2 are in range
        res += nck(k + j - 2, k);
    }
    res %= MOD;
    return res;
}
long long solve(int n, int k)
{
    long long res = 1;
    for (int t = 1; t <= k; ++t)
        res += brute(n, t) * 2;
    res %= MOD;
    return res;
}
```

**O(k sqrt n log sqrt n) = O(n sqrt n log n)**

```
long long res[SQRT_LIM + 10];
long long brute(int n, int k)
{
    /// We only care for d that 1 <= d <= sqrt(n)
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t + k);
    precal_div(t);
    long long res = 0;
    for (int d = 1; d * d <= n; ++d) /// For each fixed d^2
    {
        long long sum = 0;
        for (int p : divisors[d]) /// For each (p | d)
            sum += mu[d / p] * nck(p + k - 2, k - 1);
        sum %= MOD;
        res = (res + sum * (n / (d * d))) % MOD;
    }
    if (res < 0) res += MOD;
    return res;
}
long long solve(int n, int k)
{
    long long res = 1;
    for (int t = 1; t <= k; ++t)
        res += brute(n, t) * 2;
    res %= MOD;
    return res;
}
```

**O(k sqrt n log log sqrt n) = O(n sqrt n log log n)**

```
long long res[SQRT_LIM + 10];
long long brute(int n, int k)
{
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t + k);
    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d <= n; ++d)
        res[d] = nck(d + k - 2, k - 1);
    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];
    long long ans = 0;
    for (int d = 1; d * d <= n; ++d)
        ans = (ans + res[d] % MOD * (n / (d * d))) % MOD;
    if (ans < 0) ans += MOD;
    return ans;
}
long long solve(int n, int k)
{
    long long res = 1;
    for (int t = 1; t <= k; ++t)
        res += brute(n, t) * 2;
    res %= MOD;
    return res;
}
```

**O(k sqrt n + sqrt n log log sqrt n) = O(n sqrt n)**

```
long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t + k);
    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d <= n; ++d)
        for (int q = 1; q <= k; ++q) /// renamed from t to avoid shadowing the sieve bound
            res[d] = (res[d] + 2LL * nck(d + q - 2, q - 1)) % MOD;
    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];
    long long ans = 1;
    for (int d = 1; d * d <= n; ++d)
        ans = (ans + res[d] % MOD * (n / (d * d))) % MOD;
    if (ans < 0) ans += MOD;
    return ans;
}
```

**O(k + sqrt n log log sqrt n) = O(n)**

```
long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t + k);
    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d <= n; ++d)
        res[d] = 2LL * nck(d + k - 1, k - 1) % MOD;
    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];
    long long ans = 1;
    for (int d = 1; d * d <= n; ++d)
        ans = (ans + res[d] % MOD * (n / (d * d))) % MOD;
    if (ans < 0) ans += MOD;
    return ans;
}
```

#### Complexity

**Spoiler**

So... you might be tired of calculating complexities again and again for things that are too familiar to you.

So I am gonna skip this, as the proof is the same as what you can read above.

But as a bonus, you can optimize the problem using this:

Then you can solve the problem in $$$O(\sqrt{n} \log \log \sqrt{n} + \sqrt{p} \log^q \ p)$$$ ($$$q$$$ depends on how you implement it).

#### Conclusion

**Spoiler**

For a unique array, the complexity can be $$$O(\sqrt{n} \log \log \sqrt{n})$$$

For a duplicate array, the complexity can be $$$O(\sqrt{n} \log \log \sqrt{n} + k)$$$

For a duplicate array and squarefree $$$p$$$ (no longer a large prime $$$p > max(n, k)$$$), the complexity can be $$$O(\sqrt{n} \log \log \sqrt{n} + \sqrt{p} \log^q \ p)$$$

## Solution when numbers are also bounded by a specific range

Extra task

F

#### Problem

Given $$$k, L, R$$$ $$$(1 \leq k \leq 10^9$$$, $$$-10^9 \leq L, R \leq 10^9)$$$, count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

- $$$L \leq a_1 < a_2 < \dots < a_k \leq R$$$
- $$$a_i \times a_j$$$ is perfect square $$$\forall 1 \leq i < j \leq k$$$

Since the result can be big, output it under modulo $$$10^9 + 7$$$.

#### Idea

**Observation**

We split into 4 cases

**The cases**

Case $$$1$$$: $$$L \leq 0 \leq R$$$

Case $$$2$$$: $$$0 \leq L \leq R$$$

Case $$$3$$$: $$$L \leq R \leq 0$$$

Case $$$4$$$: $$$L > R$$$

You can easily solve each case in linear time.

**Case 1**

It is $$$task_B(|L|, k) + task_B(|L|, k - 1) + task_B(|R|, k) + task_B(|R|, k - 1)$$$ (negatives only, a zero with negatives, positives only, a zero with positives).

**Case 2**

This is what we are going to discuss.

A simple approach is: for each squarefree $$$u$$$ in $$$1 \dots R$$$:

Count the number of $$$p$$$ such that $$$u \times p^2$$$ lies in $$$[L, R]$$$.

Use combinatorics to count the number of ways to select the $$$p^2$$$'s for each $$$u$$$.

The answer is the sum of the results over all $$$u$$$.

**Case 3**

Same as case 2 when you flip the signs, but also remember to swap $$$(L, R)$$$ into $$$(-R, -L)$$$.

**Case 4**

It is simply $$$0$$$.

Now, assuming $$$1 \leq L \leq R$$$, here is how we solve it.

**As a simple approach,** we can just do as in the original problem for general $$$k$$$.

We just need to iterate over each fixed squarefree $$$u$$$ and count the number of ways to select the $$$p^2$$$'s as usual, with a bit of change.

**Why do I get WA**

**Notice carefully** that a fixed squarefree $$$u$$$ might also lie in $$$[1, L - 1]$$$ and not just in $$$[L, R]$$$.

Also remember: only count the $$$p^2$$$'s for which $$$u \times p^2$$$ is in $$$[L, R]$$$.

By doing this trivially, we get a complexity of $$$O((R - L + 1) \log R)$$$ time and $$$O(R)$$$ space.

**Optimize my time complexity**

Of course, we can just optimize it using a squarefree sieve in $$$O(R - L + 1)$$$ time.

Yet again, let's use a linear sieve and linear precalculation for the binomial coefficients too.

**Hint 1 to optimize space complexity**

We just need $$$O(R - L + 1)$$$ space.

But this leads us to another problem: how can we check whether $$$u$$$ is squarefree when we literally removed the ability to check any $$$u < L$$$?

A simple approach is to just redefine the squarefree sieve.

**Hint 2 to optimize space complexity**

You see, when we reach the first $$$u$$$ not yet marked in the sieve, we mark every number of the form $$$u \times p^2$$$ except $$$u$$$ itself.

Why don't we change it a little bit: say we check the first $$$u \times c^2$$$ for some $$$c > 0$$$, and then mark every number of the form $$$u \times p^2$$$ except $$$u \times c^2$$$.

Yes, we just move from $$$u$$$ to the number $$$u \times c^2$$$ that suits it, and mark only once for each unique $$$u$$$.

**Hint 3 to optimize space complexity**

How do we find $$$c$$$?

We can take $$$c$$$ as the smallest number such that $$$u \times c^2 \geq L$$$.

So by this definition, $$$c = \left \lceil \sqrt{\frac{L}{u}} \right \rceil$$$.
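Computing $$$c$$$ with floating point alone can be off by one near perfect squares, so a safe version (my own sketch, the helper name is made up) corrects the rounded estimate with integer checks:

```cpp
#include <cassert>
#include <cmath>
typedef long long ll;

/// Smallest c >= 1 with u * c * c >= L, i.e. c = max(1, ceil(sqrt(L / u))).
ll first_multiplier(ll u, ll L)
{
    ll c = llround(sqrt((double)L / u));
    if (c < 1) c = 1;
    while (c > 1 && u * (c - 1) * (c - 1) >= L) --c; /// fix an overestimate
    while (u * c * c < L) ++c;                       /// fix an underestimate
    return c;
}
```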

**Optimization**

Is it over? Can't we do better?

**Isn't there a trivial way that we forgot?**

- Yes

**Is there a way to iterate through [L, R] only?**

- Yes

**Can we avoid iterating over the whole range [1, R]?**

- Yes

**Can we just iterate through [1, \sqrt{R}] and [L, R] to solve it?**

- Yes

**What is that way?**

Factorizing the numbers.

**Isn't that bad?**

Well, I am not sure what kind of "bad" you are talking about.

But if it is about the complexity, then each number $$$x$$$ in $$$[L, R]$$$ can be factorized in $$$O(\sqrt{x})$$$.

Yet for $$$L \approx 1$$$ this will be about $$$O(R \sqrt{R} \log{R})$$$ if you implement it badly.

Even so, let's just ignore the time complexity for now.

**Can we factorize numbers faster ?**

Yes, you can use Pollard's rho to factorize in expected time $$$Õ(x^{1/4})$$$.

Also remember to combine it with Miller-Rabin for the prime cases.

For more detail, you can read this: https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm

**Is there another way besides Pollard's rho?**

Yes, you can first sieve the primes up to $$$\sqrt{R}$$$.

Then for each number $$$x$$$ in $$$[L, R]$$$ you just need $$$min(\sqrt{x}, \sqrt{R} / \log {R})$$$ trial divisions (by the sieved primes) to factorize it.

In practice this is significantly fast.

Though I am not quite sure what the actual complexity of doing so is.

There are few papers related to this specific topic, and even worse, most of them are not well researched.

**But how do we count the number of ways to select the p^2's?**

You can just iterate over each number of the form $$$u \times p^2$$$ that lies in $$$[L, R]$$$.

Also remember that each number should be visited only once.

**But how can we iterate over each p^2?**

Not sure what you mean, though.

When you factorize $$$x$$$, you can get the form $$$x = u \times p^2$$$.

Just remember: don't take the whole $$$p^2$$$; take $$$p$$$, iterate using steps of $$$p^2$$$, and increase $$$p$$$ by one at each step.

Concretely, for each prime factor $$$f$$$ of $$$x$$$ that appears more than once, we divide $$$x$$$ by $$$f^2$$$ and multiply $$$p$$$ by $$$f$$$, where $$$p$$$ is initially $$$1$$$.
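That extraction step can be sketched with plain trial division (my own illustration, not the blog's template): strip every squared prime factor of $$$x$$$ into $$$p$$$, leaving the squarefree part $$$u$$$ with $$$x = u \times p^2$$$.

```cpp
#include <cassert>
#include <utility>
using namespace std;

/// Write x = u * p * p with u squarefree: each prime's squared part
/// goes into p, the leftover (exponent mod 2) stays in u.
pair<long long, long long> squarefree_split(long long x)
{
    long long u = x, p = 1;
    for (long long f = 2; f * f <= u; ++f)
    {
        if (u % f) continue;
        int e = 0;
        while (u % f == 0) { u /= f; ++e; }
        if (e & 1) u *= f;              /// odd leftover stays in u
        for (int i = 0; i < e / 2; ++i) /// squared part moves into p
            p *= f;
    }
    /// whatever remains in u is 1 or a single prime, so u is squarefree
    return {u, p};
}
```

Then $$$u \times p^2, u \times (p+1)^2, \dots$$$ are exactly the later members of $$$x$$$'s class inside the range.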

**Wait, is there another way to reduce the complexity?**

- Yes

**Wait, do we still use factorization?**

- Yes

**Another way, but still faster than applying Pollard's rho?**

- Yes

**You mean something like sieving, or am I wrong about something?**

- Yes

**In a normal sieve we mark multiples of primes; now the loop is inverted, so you mean to...**

- Yes

**Can we use the trick like u * c^2 to reduce the number of cases when we take the primes in the outer loop?**

- Yes

**Wait, isn't this a kind of segmented sieve?**

- Yes

So what is this better solution, and how do we implement it step by step?

**Solution**

Sieve all primes up to $$$\sqrt{R}$$$.

Initialize $$$u_x$$$, the squarefree part of each number $$$x$$$ in $$$[L, R]$$$, to $$$x$$$ itself.

Also initialize $$$p_x$$$, the root of the square part of each number $$$x$$$ in $$$[L, R]$$$, to simply $$$1$$$.

Now, for each prime $$$p$$$ we sieved, partially factorize each number $$$x$$$: divide $$$u_x$$$ by $$$p^2$$$ while multiplying $$$p_x$$$ by $$$p$$$, as long as $$$p^2$$$ divides $$$u_x$$$.

Now, for each number $$$x$$$ in $$$[L, R]$$$, you can run the squarefree sieve as normal, starting from $$$u_x$$$ with steps based on $$$p_x$$$.

#### Implementation

Let $$$Z = max(|L|, |R|)$$$

**O(Z) time - O(R - L) space**

```
bool is_squarefree[LIM + 10];
int solve(int l, int r, int k)
{
    if (l > r) return 0;
    if (r < 0) return solve(-r, -l, k);
    if (l <= 0 && 0 <= r)
    {
        long long res = 0LL + taskB(abs(l), k) + taskB(abs(l), k - 1) + taskB(abs(r), k) + taskB(abs(r), k - 1);
        while (res >= MOD) res -= MOD;
        return res;
    }
    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (r - l + 1));
    precal_nck(r - l + 1);
    int tot = r - l + 1;
    long long res = 0;
    for (int i = 1; i <= r; ++i)
    {
        int j = sqrt(1.0 * l / i);
        while (j > 0 && 1LL * i * j * j > l) --j;
        while (1LL * i * j * j < l) ++j;
        if (is_squarefree[i * j * j - l])
        {
            int cnt = 0;
            for (; 1LL * i * j * j <= r; ++j)
            {
                ++cnt;
                --tot;
                is_squarefree[i * j * j - l] = false;
            }
            res += nck(cnt, k);
            if (tot == 0) break;
        }
    }
    res %= MOD;
    return res;
}
```

**O(R * sqrt(Z) / log(Z)) time - O(R - L) space**

```
long long res[SQRT_LIM + 10];
long long taskB(int n, int k)
{
    if (k == 0) return 0;
    if (k == 1) return n;
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t);
    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d <= n; ++d)
        res[d] = nck(d - 1, k - 1);
    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];
    long long ans = 0;
    for (int d = 1; d * d <= n; ++d)
        ans = (ans + res[d] % MOD * (n / (d * d))) % MOD;
    if (ans < 0) ans += MOD;
    return ans;
}
bool is_squarefree[LIM + 10];
int solve(int l, int r, int k)
{
    if (l > r) return 0;
    if (r < 0) return solve(-r, -l, k);
    if (l <= 0 && 0 <= r)
    {
        long long res = 0LL + taskB(abs(l), k) + taskB(abs(l), k - 1) + taskB(abs(r), k) + taskB(abs(r), k - 1);
        while (res >= MOD) res -= MOD;
        return res;
    }
    int t = ceil(sqrt(r + 1) + 1) + 1;
    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (r - l + 1));
    precal_nck(r - l + 1);
    linear_sieve(t);
    long long res = 0;
    for (int x = l; x <= r; ++x) if (is_squarefree[x - l])
    {
        int u = x;
        int v = 1;
        for (int p : prime) /// factorize x = u * v^2 by the sieved primes
        {
            int q = p * p;
            if (q > u) break;
            while (u % q == 0)
            {
                u /= q;
                v *= p;
            }
        }
        int cnt = 0;
        for (; 1LL * u * v * v <= r; ++v)
        {
            is_squarefree[u * v * v - l] = false;
            ++cnt;
        }
        res += nck(cnt, k);
    }
    res %= MOD;
    return res;
}
```

**O(sqrt R log log R + (R - L)) time - O(R - L) space**

```
long long res[SQRT_LIM + 10];
long long taskB(int n, int k)
{
    if (k == 0) return 0;
    if (k == 1) return n;
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t);
    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d <= n; ++d)
        res[d] = nck(d - 1, k - 1);
    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];
    long long ans = 0;
    for (int d = 1; d * d <= n; ++d)
        ans = (ans + res[d] % MOD * (n / (d * d))) % MOD;
    if (ans < 0) ans += MOD;
    return ans;
}
#include <algorithm>
#define all(x) (x).begin(), (x).end()
bool is_squarefree[LIM + 10];
int squarefree[LIM + 10];  /// squarefree[x - l] = u with x = u * p^2, u squarefree
int sqf_factor[LIM + 10];  /// sqf_factor[x - l] = that p
int solve(int l, int r, int k)
{
    if (l > r) return 0;
    if (r < 0) return solve(-r, -l, k);
    if (l <= 0 && 0 <= r)
    {
        long long res = 0LL + taskB(abs(l), k) + taskB(abs(l), k - 1) + taskB(abs(r), k) + taskB(abs(r), k - 1);
        while (res >= MOD) res -= MOD;
        return res;
    }
    int t = ceil(sqrt(r + 1) + 1) + 1;
    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (r - l + 1));
    precal_nck(r - l + 1);
    linear_sieve(t);
    for (int x = l; x <= r; ++x)
    {
        squarefree[x - l] = x;
        sqf_factor[x - l] = 1;
    }
    long long res = 0;
    for (int p : prime)
    {
        for (int q = p * p, x = max(q, (l + q - 1) / q * q); x <= r; x += q)
        {
            while (squarefree[x - l] % q == 0)
            {
                squarefree[x - l] /= q;
                sqf_factor[x - l] *= p;
            }
        }
    }
    for (int x = l; x <= r; ++x)
    {
        int u = squarefree[x - l];
        int p = sqf_factor[x - l];
        if (is_squarefree[x - l])
        {
            int cnt = 0;
            for (; 1LL * u * p * p <= r; ++p)
            {
                is_squarefree[u * p * p - l] = false;
                ++cnt;
            }
            res += nck(cnt, k);
        }
    }
    res %= MOD;
    return res;
}
```

#### Complexity

**Spoiler**

Well, you might feel bored by a long chain of proofs for such simple complexities, so I will make it quick.

If there is a part you find hard to understand, you can comment below, in case you can't accept it as an exercise.

For the first implementation, it is obviously linear time in $$$max(|L|, |R|)$$$ and linear space in $$$|R - L|$$$.

For the second implementation it is harder to prove. Yet it is bounded by the complexity of factorizing each number in the range $$$[L, R]$$$.

For the third implementation it is based on the fact that:

- The sum of the inverses of the squared primes converges to a constant.
- Each number $$$x$$$ is touched at most $$$O(\log(\sqrt{x}))$$$ times during factorization.
- You can precalculate binomial coefficient in linear and query for constant time.
- You can sieve in linear, and only sieve to the square root of $$$R$$$.
- For each number in the range we care, we just need to visit once.

Yet you can further optimize it to $$$O\left(\frac{\sqrt{R} \log \log \sqrt{R}}{\log \sqrt{R}} + \frac{R - L}{\pi^2}\right)$$$.

*I am working on an $$$O {\Large ( } Õ(\sqrt{R}) + Õ(\sqrt{R - L}) {\Large ) }$$$ solution right now, as it is much harder to apply the same difficult math idea described in task **B**.*

## Solution when the product you must find is a perfect cube

Extra task

G

#### Problem

Given $$$k, n$$$ $$$(1 \leq k \leq n \leq 10^9)$$$, count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

- $$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$
- $$$a_i \times a_j \times a_t$$$ is perfect cube $$$\forall 1 \leq i < j < t \leq k$$$

Since the result can be big, output it under modulo $$$10^9 + 7$$$.

#### Idea

**k < 3**

For $$$k < 3$$$, the formula is easy to show, as the second condition is always satisfied.

**k = 0**

Obviously $$$res = 0$$$.

**k = 1**

Obviously $$$res = n$$$.

**k = 2**

Obviously $$$res = \frac{n(n-1)}{2}$$$.

Becareful of overflow.

**k > 3**

For $$$k > 3$$$, you can prove that every number you select must share the same cubefree part, therefore just build a cubefree sieve in linear time.

**Hint 1**

$$$a_1 < a_2 < \dots < a_k$$$.

**Hint 2**

What if $$$a_1 \times a_2 \times a_3$$$ is a perfect cube, and so is $$$a_1 \times a_2 \times a_4$$$?

**Hint 3**

Let $$$a_x = b_x \times c_x^3$$$, where $$$b_x$$$ is cubefree and $$$c_x$$$ is some integer.

Let $$$b_{12} = b_1 \times b_2$$$

We have that $$$b_{12} \times b_3$$$ and $$$b_{12} \times b_4$$$ are both cubes.

**Hint 4**

The fundamental theorem of arithmetic states that every positive integer can be represented in exactly one way, apart from rearrangement, as a product of primes with positive integer powers, hence we can express a number in prime factorization form as:

$$$b_x = p_0^{f_0} \times p_1^{f_1} \times \dots$$$ for unique primes $$$p_i$$$ and non-negative integers $$$f_i$$$.

By the definition of cubefree $$$b_x$$$, we have $$$f_z \in \{0, 1, 2\}$$$ for all $$$x, z$$$.

**[1]** If $$$b_x \times b_y$$$ is a cube and you know $$$b_x$$$, then you can easily express $$$b_y$$$ uniquely (based on the theorem above).

**[2]** We also have that $$$b_{12} \times b_3$$$ and $$$b_{12} \times b_4$$$ are both cubes.

From **[1]** and **[2]**, you can prove that $$$b_3 = b_4$$$.
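To make **[1]** concrete: for a cubefree $$$b_x = \prod p_i^{f_i}$$$, the unique cubefree partner has exponents $$$(3 - f_i) \bmod 3$$$, i.e. it swaps the exponents $$$1 \leftrightarrow 2$$$. A small sketch of mine (assuming the input really is cubefree):

```cpp
#include <cassert>

/// Given cubefree b (every prime exponent is 1 or 2), return the unique
/// cubefree y such that b * y is a perfect cube: swap exponents 1 <-> 2.
long long cube_complement(long long b)
{
    long long y = 1;
    for (long long f = 2; f * f <= b; ++f)
    {
        if (b % f) continue;
        int e = 0;
        while (b % f == 0) { b /= f; ++e; } /// e is 1 or 2 since b is cubefree
        int need = (3 - e) % 3;
        for (int i = 0; i < need; ++i) y *= f;
    }
    if (b > 1) y *= b * b; /// leftover prime has exponent 1, partner needs 2
    return y;
}
```

In particular the map is an involution: applying it twice returns $$$b_x$$$.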

**Hint 5**

Without loss of generality, we can conclude: take any $$$4$$$ different selected numbers $$$a_x, a_y, a_z, a_t$$$.

Then you can easily prove that $$$b_x = b_y = b_z = b_t$$$, where $$$b_p$$$ is the cubefree part of $$$a_p$$$.

But still, you can apply the same idea used in extra task **B** to achieve a better complexity.

Instead of throwing in a bunch of math using weird formulas with long chains of theorems and proofs,

we can look at the algorithm the other way around, then come to the formula.

**What we had done in task B**

We have discussed that in extra task **B** we count things per squarefree part.

So we just build a squarefree sieve, and for each fixed squarefree $$$u$$$ we count the number of ways to select the $$$p^2$$$'s.

In order to make it faster, we just reverse the loops, using combinatorics to count the $$$u$$$'s for each fixed $$$p^2$$$.

For those $$$p > \sqrt{n}$$$, or equivalently $$$p^2 > n$$$, the answer is obviously $$$0$$$.

So the complexity depends on how you count the $$$u$$$'s for each $$$p^2$$$.

**What we are going to do in task G**

We are discussing that in extra task **G** we count things per cubefree part.

So we just build a cubefree sieve, and for each fixed cubefree $$$u$$$ we count the number of ways to select the $$$p^3$$$'s.

In order to make it faster, we just reverse the loops, using combinatorics to count the $$$u$$$'s for each fixed $$$p^3$$$.

For those $$$p > \sqrt[3]{n}$$$, or equivalently $$$p^3 > n$$$, the answer is obviously $$$0$$$.

So the complexity depends on how you count the $$$u$$$'s for each $$$p^3$$$.

So now we come to the formula:

**Defining stuffs**

Let $$$f_k^3(n)$$$ be the number of tuples $$$(a_1, a_2, \dots, a_k, n)$$$ such that $$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$ and $$$(a_1, a_2, \dots, a_k, n)$$$ is a $$$(k+1)$$$-term geometric progression in 3-dimensional space.

Let $$$g_k^3(n)$$$ be the number of tuples $$$(a_1, a_2, \dots, a_k, n)$$$ such that $$$1 \leq a_1 \leq a_2 \leq \dots \leq a_k \leq n$$$ and $$$(a_1, a_2, \dots, a_k, n)$$$ is a $$$(k+1)$$$-term geometric progression in 3-dimensional space.

Let $$$F_k^3(n)$$$, once again, be our answer.

Let $$$s_k^3(n)$$$ be the number of ways to choose the $$$p^3$$$'s among those $$$k$$$ numbers when you fix the cubefree $$$u$$$ (though we are doing it in reverse).

**Hint 1**

What are the differences when you count cubefree instead of squarefree?

**Hint 2**

$$$s_k^3(n) = C^{n-1}_{k-1}$$$.

Yes, it is just that simple.

It just counts the number of ways to select a set of size $$$k$$$ among the possible choices of $$$p^3$$$.

The formula does not depend on the squarefree/cubefree stuff; only the parameters (the "$$$n$$$") in the formula change.

**Hint 3**

$$$F_k^3(n) = \overset{n}{\underset{p=1}{\Large \Sigma}} f_k^3(p)$$$

In the same way as before, summing those $$$f_k^3(p)$$$ means you cover every possibility of fixing $$$p^3$$$ among those you can fix.

The calculations do not overlap each other, as for each $$$f_k^3(p)$$$ the sequences are unique, since they always end at $$$p$$$.

**Hint 4**

$$$f_k^3(n) = F_k^3(n) - F_k^3(n-1) = s_k^3(g_k^3(n)) = \underset{d^3 | n}{\Large \Sigma} (\mu * s_k^3)(d) = \underset{d^3 | n}{\Large \Sigma} \underset{p | d}{\Large \Sigma} \mu(p) \times s_k^3\left(\frac{d}{p}\right)$$$

Well, the formula isn't that different.

The only thing that changed is that we try to fix $$$d^3$$$ instead of $$$d^2$$$.

The $$$\mu(n)$$$ function does not depend on squarefree or cubefree things, so it should not become some higher dimensional Möbius function.

Even if it is expected to be somewhat related to counting squarefree numbers, that is not how we use it in the formula: **it just appears when you swap the two for-loops using Dirichlet convolution.**

**Hint 5**

$$$F_k^3(n) = \overset{n}{\underset{p=1}{\Large \Sigma}} f_k^3(p)$$$

$$$= F_k^3(0) + \overset{\left \lfloor \sqrt[3]{n} \right \rfloor}{\underset{d = 1}{\Large \Sigma}}(\mu * s_k^3)(d) \times \left \lfloor \frac{n}{d^3} \right \rfloor$$$

$$$= \overset{\left \lfloor \sqrt[3]{n} \right \rfloor}{\underset{d = 1}{\Large \Sigma}} \left \lfloor \frac{n}{d^3} \right \rfloor \times \left ( \underset{p | d}{\Large \Sigma} \mu(p) \times s_k^3\left(\frac{d}{p}\right) \right)$$$

So now that we have the formula, just apply it to the problem.
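Before trusting the derivation, the closed form can be cross-checked against the plain cubefree-sieve counting for small $$$n$$$. This is my own sketch: `brute_G` is the sieve-and-binomial count, `mobius_G` evaluates the sum above with a naive $$$\mu$$$.

```cpp
#include <cassert>
#include <vector>
using namespace std;

/// Plain binomial coefficient, no modulo (small inputs only).
long long binom(long long n, long long k)
{
    if (k < 0 || k > n) return 0;
    long long r = 1;
    for (long long i = 1; i <= k; ++i)
        r = r * (n - k + i) / i;
    return r;
}

/// Brute force: group [1, n] by cubefree part u; a class with m elements
/// (u, u*2^3, u*3^3, ...) contributes C(m, k) sequences.
long long brute_G(int n, int k)
{
    vector<bool> cubefree(n + 1, true);
    long long ans = 0;
    for (long long i = 1; i <= n; ++i)
    {
        if (!cubefree[i]) continue;
        long long m = 0;
        for (long long j = 1; i * j * j * j <= n; ++j)
        {
            if (j > 1) cubefree[i * j * j * j] = false;
            ++m;
        }
        ans += binom(m, k);
    }
    return ans;
}

/// Closed form: sum over d^3 <= n of floor(n / d^3) * (mu * s)(d),
/// where s(d) = C(d - 1, k - 1) and (mu * s) is a Dirichlet convolution.
long long mobius_G(int n, int k)
{
    auto mu = [](int x) {
        int res = 1;
        for (int f = 2; f * f <= x; ++f)
        {
            if (x % f) continue;
            x /= f;
            if (x % f == 0) return 0; /// squared factor -> mu = 0
            res = -res;
        }
        if (x > 1) res = -res;
        return res;
    };
    long long ans = 0;
    for (int d = 1; 1LL * d * d * d <= n; ++d)
    {
        long long coef = 0;
        for (int p = 1; p <= d; ++p)
            if (d % p == 0)
                coef += mu(d / p) * binom(p - 1, k - 1);
        ans += coef * (n / (1LL * d * d * d));
    }
    return ans;
}
```

Both agree, e.g. for $$$n = 200, k = 4$$$ they count the same $$$7$$$ sequences.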

**Bonus**

You can also reduce $$$O(\log n)$$$ factor to $$$O(\log \log n)$$$ too.

**k = 3**

Yet in the case $$$k = 3$$$, things go differently.

By the theorem used above: if $$$b_x \times b_y$$$ is a cube and $$$b_x, b_y$$$ are cubefree numbers, then knowing $$$b_x$$$, you can always find one and only one $$$b_y$$$.

**Which part are you talking about????**

Let $$$a_x = b_x \times c_x^3$$$, where $$$b_x$$$ is cubefree and $$$c_x$$$ is some integer.

The fundamental theorem of arithmetic states that every positive integer can be represented in exactly one way, apart from rearrangement, as a product of primes with positive integer powers, hence we can express a number in prime factorization form as:

$$$b_x = p_0^{f_0} \times p_1^{f_1} \times \dots$$$ for unique primes $$$p_i$$$ and non-negative integers $$$f_i$$$.

By the definition of cubefree $$$b_x$$$, we have $$$f_z \in \{0, 1, 2\}$$$ for all $$$x, z$$$.

If $$$b_x \times b_y$$$ is a cube and you know $$$b_x$$$, then you can easily express $$$b_y$$$ uniquely (based on the theorem above).

So instead of pure brute force, you can fix $$$2$$$ numbers, then factorize their product into its cubefree form.

Yet you can know the number of ways to select $$$b_x \times b_y$$$ with precalculation.

Then you can just iterate over all $$$b_z$$$ and count the total number of ways to select $$$b_x \times b_y \times b_z$$$.

Since they are ordered you must be careful: you can only register a new pair $$$b_x \times b_y$$$ $$$(x < y)$$$ once your $$$z > y$$$.

#### Implementation

**k<3::O(1) || k>3::O(n) || k=3::O(n^2 log n) solution**

```
int cnt[LIM];
vector<int> appear[LIM];
int solveG(int n, int k)
{
    linear_sieve(n * n);
    for (int i = 1; i <= n + 1; ++i)
        appear[i].clear();
    for (int i = 1; i <= n; ++i)
    {
        for (int j = i + 1; j <= n; ++j)
        {
            int u = 1; /// cubefree part of i * j
            for (int x = i * j; x > 1; )
            {
                int a = lpf[x], b = a * a * a;
                for (; x % b == 0; x /= b);
                for (; x % a == 0; x /= a) u *= a;
            }
            appear[j + 1].push_back(u); /// pair (i, j) becomes usable once z > j
        }
    }
    long long res = 0;
    memset(cnt, 0, sizeof(cnt[0]) * (n * n + 1));
    for (int i = 1; i <= n; ++i)
    {
        for (int u : appear[i]) ++cnt[u];
        int v = 1; /// cubefree complement of cubefree(i)
        for (int x = i; x > 1; )
        {
            int a = lpf[x], b = a * a * a;
            for (; x % b == 0; x /= b);
            if (x % (a * a) == 0)
            {
                x /= a * a;
                v *= a;
            }
            else if (x % a == 0)
            {
                x /= a;
                v *= a * a;
            }
        }
        res += cnt[v];
    }
    res %= MOD;
    return res;
}
bool is_cubefree[LIM + 10];
int solve(int n, int k)
{
    if (k == 0) return 0;
    if (k == 1) return n;
    if (k == 2) return (1LL * n * (n - 1) / 2) % MOD;
    if (k == 3) return solveG(n, k);
    memset(is_cubefree, true, sizeof(is_cubefree[0]) * (n + 1));
    precal_nck(n);
    long long res = 0;
    for (int i = 1, j; i <= n; ++i) if (is_cubefree[i])
    {
        for (j = 1; i * j * j * j <= n; ++j)
            is_cubefree[i * j * j * j] = false;
        res += nck(j - 1, k);
    }
    res %= MOD;
    return res;
}
```

**k<3::O(1) || k>3::O(cbrt n log cbrt n) || k=3::Õ(n) but practically O(n^(0.59)) for small n**

```
vector<int> divisor[LIM];
int solveG(int n, int k)
{
    linear_sieve(n);
    for (int i = 1; i <= n; ++i)
        divisor[i].clear();
    for (int j = 1; j <= n; ++j)
    {
        int x = 1; /// smallest x such that j divides x^3
        for (int t = j; t > 1; )
        {
            if (x > n) break;
            int a = lpf[t];
            ll b = 1LL * a * a * a;
            while (t % b == 0)
            {
                t /= b;
                if (1LL * x * a > n) goto skip;
                x *= a;
            }
            if (t % a == 0)
            {
                if (1LL * x * a > n) goto skip;
                x *= a;
                do t /= a; while (t % a == 0);
            }
        }
        for (int i = x; i <= n; i += x) /// j divides i^3 exactly when x divides i
            divisor[i].push_back(j);
        skip:;
    }
    long long res = 0;
    for (int i = 1; i <= n; ++i)
    {
        ll t = 1LL * i * i * i;
        for (int x = 0; x < (int)divisor[i].size(); ++x)
        {
            for (int y = x + 1; y < (int)divisor[i].size(); ++y)
            {
                int a = divisor[i][x];
                int b = divisor[i][y];
                ll c = t / (1LL * a * b);
                res += (n >= c && c > b && 1LL * a * b * c == t);
            }
        }
    }
    return res % MOD;
}
long long res[LIM + 10];
int solve(int n, int k)
{
    if (k == 0) return 0;
    if (k == 1) return n;
    if (k == 2) return (1LL * n * (n - 1) / 2) % MOD;
    if (k == 3) return solveG(n, k);
    int t = ceil(cbrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t);
    precal_div(t);
    long long ans = 0;
    for (int d = 1; d * d * d <= n; ++d) /// For each fixed d^3
    {
        long long sum = 0;
        for (int p : divisors[d]) /// For each (p | d)
            sum += mu[d / p] * nck(p - 1, k - 1);
        sum %= MOD;
        ans = (ans + sum * (n / (d * d * d))) % MOD;
    }
    if (ans < 0) ans += MOD;
    return ans;
}
```

**k<3::O(1) || k>3::O(cbrt n log log cbrt n) || k=3::Õ(n) but practically fast**

```
int ceil_sqrt(ll x)
{
    int t = sqrt(x);
    while (1LL * t * t > x) --t;
    while (1LL * t * t < x) ++t;
    return t;
}
#define sz(x) int((x).size())
#define all(x) (x).begin(), (x).end()
#define rall(x) (x).rbegin(), (x).rend()
#define lb(x, v) lower_bound(all(x), v) - (x).begin()
#define ub(x, v) upper_bound(all(x), v) - (x).begin()
ll scm[LIM];
vector<int> divisor[LIM];
int solveG(int n, int k)
{
    linear_sieve(n);
    fill_n(scm, n + 1, 1);
    for (int i = 1; i <= n; ++i)
        divisor[i].clear();
    for (int p : prime)
    {
        ll q = 1LL * p * p * p;
        for (int t = p; ; t *= q)
        {
            for (int i = t; i <= n; i += t)
                scm[i] = (scm[i] > n / p) ? n + 1 : scm[i] * p;
            if (1LL * t > n / q) break;
        }
    }
    for (int j = 1; j <= n; ++j)
        for (int i = scm[j]; i <= n; i += scm[j])
            divisor[i].push_back(j);
    int res = 0;
    for (int i = 1; i <= n; ++i)
    {
        ll t = 1LL * i * i * i;
        for (int x = 0; x < sz(divisor[i]); ++x)
        {
            int a = divisor[i][x];
            int v = ceil_sqrt(t / a);
            int y0 = (divisor[i].back() < v) ? sz(divisor[i]) : lb(divisor[i], v);
            for (int y = y0 - 1; y > x; --y)
            {
                int b = divisor[i][y];
                ll ab = 1LL * a * b;
                ll c = t / ab;
                if (c > n) break;
                res += (t % ab == 0); /// a * b * c == t iff (a * b) | t; avoids overflow
            }
        }
    }
    return res;
}
int res[LIM + 10];
int solve(int n, int k)
{
    if (k == 0) return 0;
    if (k == 1) return n;
    if (k == 2) return (1LL * n * (n - 1) / 2) % MOD;
    if (k == 3) return solveG(n, k);
    int t = ceil(cbrt(n) + 1) + 1; /// only d with d^3 <= n matter
    linear_sieve(t);
    precal_nck(t);
    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d * d <= n; ++d)
        res[d] = nck(d - 1, k - 1);
    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] = ((res[d * p] - res[d]) % MOD + MOD) % MOD;
    long long ans = 0;
    for (int d = 1; d * d * d <= n; ++d)
        ans = (ans + 1LL * res[d] * ((n / (d * d * d)) % MOD)) % MOD;
    return ans;
}
```
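As a sanity check for the $$$k = 3$$$ case, the results of `solveG` for small $$$n$$$ can be compared against a naive $$$O(n^3)$$$ brute force. A throwaway sketch (hypothetical helper names, only usable for tiny $$$n$$$); for example, for $$$n = 12$$$ the valid triples are $$$(1,2,4), (1,3,9), (2,4,8), (2,9,12), (3,6,12), (3,8,9), (4,6,9)$$$:

```
#include <cassert>

// Returns true iff x is a perfect cube (x small enough for an exact loop).
bool is_cube(long long x)
{
    long long r = 0;
    while (r * r * r < x) ++r;
    return r * r * r == x;
}

// O(n^3) brute force: count triples a < b < c <= n with a*b*c a perfect cube.
int brute_cube_triples(int n)
{
    int res = 0;
    for (int a = 1; a <= n; ++a)
        for (int b = a + 1; b <= n; ++b)
            for (int c = b + 1; c <= n; ++c)
                if (is_cube(1LL * a * b * c)) ++res;
    return res;
}
```
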

#### Complexity

**k < 3**

The complexity is obviously $$$O(1)$$$, and there is nothing left to optimize.

**k > 3**

In the first implementation, we use a cubefree sieve in $$$O(n)$$$, and precalculating the binomial coefficients is also $$$O(n)$$$.

In the second and third implementations, we just apply the idea we used in task **B**.

**k = 3**

In the first implementation, since each number must be factorized, the cost is $$$O(\log n)$$$ per number, hence the stated complexity.

**Bonus**

You can reduce the $$$O(\log n)$$$ part to $$$O(\log \log n)$$$.

**Extra Bonus**

You can remove that $$$O(\log \log n)$$$ part too.

**Secondary Bonus**

If the problem is mixed with task **J**, then it might be possible to improve further, to about $$$O(n \sqrt n)$$$ or even linear.

**About the actual complexity and better algorithms for the first implementation**

No papers in Russian, Chinese, or English related to this task have been found.

Not a single useful known sequence related to this one $$$(k = q = 3)$$$ has been found.

This is a very hard and specific problem reaching deep into number theory, even with the recommended tricks related to GEF, DFT, FFT, NTT.

For the second implementation, however, things get interesting.

**Precalculation part**

The complexity is $$$O\left( \overset{n}{\underset{p=1}{\Large \Sigma}} \left \lfloor \frac{n}{f(p)} \right \rfloor \right)$$$ where $$$f(p) = \min\{x \times p \mid x \times p \text{ is a perfect cube},\ x \in \mathbb{N}^*\}$$$.

This ignores the $$$O(\log x)$$$ factorization cost per number, which can anyway be optimized away.
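For intuition, $$$f(p)$$$ can be computed directly: if $$$p = \prod q^e$$$, then $$$f(p) = \prod q^{3\lceil e/3 \rceil}$$$, i.e. each prime exponent is rounded up to the next multiple of $$$3$$$. A minimal sketch by trial division (the helper name is mine):

```
#include <cassert>

// f(p): smallest multiple of p that is a perfect cube.
// For p = prod q^e, round each exponent e up to the next multiple of 3.
long long smallest_cube_multiple(long long p)
{
    long long res = 1;
    for (long long q = 2; q * q <= p; ++q)
    {
        if (p % q) continue;
        int e = 0;
        while (p % q == 0) { p /= q; ++e; }
        int r = (e + 2) / 3 * 3;          // round e up to a multiple of 3
        for (int i = 0; i < r; ++i) res *= q;
    }
    if (p > 1) res *= p * p * p;          // leftover prime has exponent 1
    return res;
}
```

For example $$$f(12) = 2^3 \cdot 3^3 = 216$$$ since $$$12 = 2^2 \cdot 3$$$.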

**Calculation part**

The complexity is $$$O\left( \overset{n}{\underset{p=1}{\Large \Sigma}} d(p)^2 \right)$$$ where $$$d(p)$$$ is the number of divisors of $$$p^3$$$ that lie in $$$[1, n]$$$.

Though it might be possible to improve this from $$$O(d(p)^2)$$$ to $$$O(d(p)^{1 + \varepsilon})$$$.

I could not prove the exact complexity: despite searching papers and blogs, even with the help of some GMs, I found nothing good enough to pin down its real complexity.

The observed behavior might even be an illusion: a high constant could be masking a genuinely higher real complexity.

**Fast running time**

I am assuming the running time is $$$a \times n^b$$$ for some constants $$$a, b$$$ as $$$n$$$ approaches $$$10^8$$$.

Tested on thousands of samples up to $$$n = 10^8$$$ using a power-progression fitting algorithm, the exponent $$$b$$$ seems to shrink as $$$n$$$ grows, approaching $$$0.5$$$.

For samples up to $$$n = 10^8$$$ the best fit seems to be $$$225.014140682692 \times n^{0.590337564727864}$$$, but I am not really sure.
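The exponent $$$b$$$ of such a power-law fit can be recovered by ordinary least squares on the log-log samples: if $$$t = a \times n^b$$$ then $$$\ln t = \ln a + b \ln n$$$, so the fitted slope is $$$b$$$. A minimal sketch (the function name and the sample data below are mine, not the actual measurements):

```
#include <cassert>
#include <cmath>
#include <vector>

// Least squares on (ln n, ln t); the returned slope is the exponent b
// of the power-law model t = a * n^b.
double fit_power_exponent(const std::vector<double>& ns, const std::vector<double>& ts)
{
    int m = (int)ns.size();
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int i = 0; i < m; ++i)
    {
        double x = std::log(ns[i]), y = std::log(ts[i]);
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    return (m * sxy - sx * sy) / (m * sxx - sx * sx);
}
```

Feeding it samples generated exactly from $$$225 \times n^{0.59}$$$ returns $$$0.59$$$ up to floating-point error.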

**Though I said there are no useful papers, there are still some blogs whose techniques might apply to find its real complexity.**

**Bonus**

It seems possible to reach an $$$a \times \sqrt{n}$$$ running time with heavy optimization, though proving it would require going far deeper into number theory.

That would need significantly more time and work, though.

**Real Complexity**

The third implementation optimizes the complexity a bit.

**Precalculation part**

$$$O\left( \overset{n}{\underset{p=1}{\Large \Sigma}} \left \lfloor \frac{n}{f(p)} \right \rfloor + \underset{\text{prime } p \leq n}{\Large \Sigma}\ \ \underset{p^{3u+1} \leq n}{\Large \Sigma} \left \lfloor \frac{n}{p^{3u+1}} \right \rfloor \right)$$$

There is no $$$O(\log n)$$$ factor anymore.

**Calculation part**

$$$O\left(\overset{n}{\underset{p = 1}{\Large \Sigma}} \underset{a | p^3}{\Large \Sigma} \left(g(a) + \log\left(\frac{p^3}{a}\right)\right) \right)$$$ where $$$g(a) \leq d(a)$$$ is the number of products $$$b \times c$$$ with $$$(b \mid p^3)$$$ and $$$(a < b \leq c \leq n)$$$.

So it is roughly $$$d(p)^2 \rightarrow d(p)^{1+\epsilon}$$$ now.

Yet it is still hard to express the complexity in the form $$$O(n \log^k n)$$$.

## Solution when you are given n and queries for k

Extra task **H**

#### Problem

Given $$$n$$$, you have to answer queries for $$$k$$$ $$$(1 \leq k \leq n \leq 10^9)$$$: count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

- $$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$
- $$$a_i \times a_j$$$ is perfect square $$$\forall 1 \leq i < j \leq k$$$

#### Idea

**Simplest idea**

You can just run the square-root sieve $$$k$$$ times, for $$$O(nk)$$$ complexity.

You can also reuse task **B** and make $$$k$$$ queries on it, for $$$O(k \sqrt n \log \log \sqrt{n})$$$.

**Observation**

Let $$$cnt_u$$$ be the number of valid $$$p^2$$$ to choose for a fixed squarefree $$$u$$$.

Let $$$c_x$$$ be the number of squarefree $$$u$$$ having $$$cnt_u = x$$$.

You should precalculate $$$c_x$$$ for all $$$1 \leq x \leq \sqrt{n}$$$; this can be done in linear time with a squarefree sieve.

Then you can answer each query for $$$k$$$ easily.

**The formula**

$$$taskH(k) = \overset{\lfloor \sqrt{n} \rfloor}{\underset{x = 1}{\Large \Sigma}} c_x \times \binom{x}{k}$$$

Also, don't forget to take the result modulo $$$p$$$.
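The formula can be cross-checked against a direct brute force for small $$$n$$$: two numbers have a perfect-square product iff they share the same squarefree part, so $$$taskH(k) = \sum_u \binom{cnt_u}{k}$$$ over squarefree $$$u$$$, which equals the $$$c_x$$$ grouping above. A throwaway sketch (hypothetical names, exponential brute force, only for tiny $$$n$$$):

```
#include <cassert>

// Squarefree kernel of x: product of the primes with odd exponent in x.
int squarefree_part(int x)
{
    int res = 1;
    for (int q = 2; q * q <= x; ++q)
        if (x % q == 0)
        {
            int e = 0;
            while (x % q == 0) { x /= q; ++e; }
            if (e & 1) res *= q;
        }
    return res * x; // leftover prime (if any) has exponent 1
}

// Brute force over all subsets of {1..n} (n <= ~20): count k-subsets
// whose elements pairwise multiply to perfect squares.
long long brute_taskH(int n, int k)
{
    long long res = 0;
    for (int mask = 1; mask < (1 << n); ++mask)
    {
        if (__builtin_popcount(mask) != k) continue;
        bool ok = true;
        int key = -1;
        for (int i = 1; i <= n && ok; ++i)
            if (mask >> (i - 1) & 1)
            {
                int s = squarefree_part(i);
                if (key == -1) key = s;
                else ok = (s == key);   // same squarefree part <=> square product
            }
        res += ok;
    }
    return res;
}

// The grouped formula: sum over squarefree u of C(cnt_u, k),
// where cnt_u = floor(sqrt(n / u)).
long long formula_taskH(int n, int k)
{
    long long res = 0;
    for (int u = 1; u <= n; ++u)
    {
        if (squarefree_part(u) != u) continue;           // u must be squarefree
        int c = 0;
        while (1LL * (c + 1) * (c + 1) * u <= n) ++c;    // cnt_u
        long long binom = 1;                              // C(c, k), exact step by step
        for (int i = 0; i < k; ++i) binom = binom * (c - i) / (i + 1);
        if (c >= k) res += binom;
    }
    return res;
}
```

For $$$n = 10$$$ the groups are $$$\{1,4,9\}$$$ and $$$\{2,8\}$$$ plus singletons, so $$$taskH(2) = \binom{3}{2} + \binom{2}{2} = 4$$$.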

#### Implementation

**O(n) precalculate - O(sqrt n - k) query**

```
vector<int> valid;
int c[SQRT_LIM];
bool is_squarefree[LIM];
void precal(int n)
{
    int t = ceil(sqrt(n) + 0.5);
    memset(c, 0, sizeof(c[0]) * (t + 1));
    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (n + 1));
    for (int i = 1, j; i <= n; ++i) if (is_squarefree[i])
    {
        for (j = 1; i * j * j <= n; ++j)
            is_squarefree[i * j * j] = false;
        ++c[j - 1];
    }
    precal_nck(t);
    valid.clear();
    for (int i = t; i >= 1; --i) if (c[i])
        valid.push_back(i);
}
int query(int k)
{
    long long res = 0;
    for (int x : valid)
    {
        if (x < k) break;
        res += 1LL * c[x] * nck(x, k);
        res %= MOD;
    }
    return res;
}
```

**O(sqrt n) precalculate - O(sqrt (n/k) log log sqrt(n/k) + sqrt n - k) query**

```
vector<int> valid;
int cnt[SQRT_LIM];
bool is_squarefree[LIM];
long long res[SQRT_LIM + 10];
int global_t, global_n;
void precal(int n)
{
    global_n = n;
    global_t = ceil(sqrt(n) + 0.5);
    linear_sieve(global_t);
    precal_nck(global_t);
}
int query(int k)
{
    if (k > global_t) return 0;
    memset(res + k, 0, sizeof(res[0]) * (global_t - k + 1));
    for (int d = k; d * d <= global_n; ++d)
        res[d] = nck(d - 1, k - 1);
    for (int p : prime)
        for (int d = global_t / p; d >= k; --d)
            res[d * p] -= res[d];
    long long ans = 0;
    for (int d = k; d <= global_t; ++d)
        ans = (ans + res[d] % MOD * ((global_n / (d * d)) % MOD)) % MOD;
    return (ans + MOD) % MOD; /// res[d] may be negative after the subtractions
}
```

#### Complexity

**The first implementation**

Notice that $$$cnt_u \leq \sqrt{\frac{n}{u}}$$$, so there are $$$O(\sqrt{n})$$$ unique values.

There is actually an optimization in the code: all entries with $$$c_x = 0$$$ are skipped, as they don't contribute to the answer.

Also for $$$x < k$$$ we have $$$\binom{x}{k} = 0$$$.

So you can make each query a little faster: $$$O(1)$$$ for $$$k \gtrapprox \sqrt{n}$$$ and $$$O(\sqrt{n} - k)$$$ in general.

**The second implementation**

As you already know, there are only $$$O(\sqrt{n})$$$ unique numbers $$$p^2$$$.

So you can just precalculate the prime and binomial coefficient in $$$O(\sqrt{n})$$$.

Also for $$$x < k$$$ we have $$$0 \leq res_x \leq \binom{x}{k} = 0 \Rightarrow res_x = 0$$$.

So we can skip those parts to achieve $$$O(1)$$$ for $$$k > \sqrt{n}$$$ and $$$O\left(\sqrt{\frac{n}{k}} \log \log \sqrt{\frac{n}{k}} + \sqrt{n} - k\right)$$$ in general.

## Contribution

_SS for pointing out the linear complexity of squarefree sieve.

clyring for fixing typos, and the approach for tasks **A, B, C, D, E, G, H, J**.

errorgorn for adding details, the approach for tasks **F, J, M, O**, and better complexity for **C, E, G**.

cuom1999 for contributing the $$$O(n^2)$$$ approach for problem **G**.

ta4p for contributing the approach of factorizing $$$p^3$$$ into $$$3$$$ product partitions in problem **G**, though it failed to achieve better complexity **(edited: confirmed that the complexity seems to be better now)**.

KazamaHoang, jalsol for the combinatorics calculation and the proof of stars and bars in task **C**.

Editorial Slayers Team: Lyde, DeMen100ns, Duy_e, OnionEgg, QuangBuiCP, _FireGhost_, Shironi for reviewing, fixing typos, and feedback.

Regarding task G with $$$k=3$$$: It looks like your second implementation is rather close to what I described in my last message. I mentioned then already that it was $$$O(n^{1+\varepsilon})$$$; the reasoning is straightforward. Letting $$$\sigma_0(i)$$$ be the number of divisors of $$$i$$$, the runtime is clearly bounded by $$$\sum_{i=1}^n \sigma_0(i^3)^2$$$, and it's well-known that the divisor-counting function is $$$O(n^\varepsilon)$$$ for any $$$\varepsilon > 0$$$, so that $$$\sum_{i=1}^n \sigma_0(i^3)^2 \leq \sum_{i=1}^n C_{\varepsilon}^6 i^{6 \varepsilon} \in O(n^{1 + 6 \varepsilon})$$$.

I correctly guessed it "may even be $$$\tilde{O}(n)$$$"; the analysis is interesting, but the poly-log factor hidden in the $$$\tilde{O}$$$-notation seems very large. First up: The preprocessing step. Temporarily borrowing your notation, I was able to show precisely that

**Proof sketch for upper bound**

The parts performing factorization of the variable `j` are easily seen to total to $$$\Theta(n \log{\log{n}})$$$ and can be ignored. The limiting factor is the loop `for (int i = x; i <= n; i += x) divisor[i].push_back(j);`. The variable `x` ends up as the smallest positive integer whose cube is divisible by `j`. So, each value $$$x$$$ is used once for each value `j` in `[1..n]` which is a factor of $$$x^3$$$ but not a factor of $$$d^3$$$ for any proper divisor $$$d$$$ of $$$x$$$. Ignoring the `[1..n]` restriction on `j`, let $$$h(x)$$$ be the number of times a value $$$x$$$ can be used. Then, $$$\sum_{d|x} h(d) = \sigma_0(x^3)$$$, so that by Möbius inversion, $$$h = \mu * (i \mapsto \sigma_0(i^3))$$$ and is therefore a multiplicative function with $$$h(x) = 3^{\omega(x)}$$$, where $$$\omega(x)$$$ is the number of distinct prime factors of $$$x$$$. The runtime is bounded above by $$$n \sum_{x=1}^n \frac{h(x)}{x}$$$. But since $$$h$$$ is multiplicative, this series can be estimated via an Euler product. Specifically,

$$$C_1 (\log{n})^3 \leq \sum_{x=1}^n \frac{h(x)}{x} \leq C_2 (\log{n})^3$$$

for some appropriate absolute constants $$$C_1$$$ and $$$C_2$$$. Multiply by $$$n$$$ and this is exactly the desired upper bound.
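The identity $$$h = \mu * (i \mapsto \sigma_0(i^3))$$$ with $$$h(x) = 3^{\omega(x)}$$$ is easy to verify numerically for small $$$x$$$; a quick sketch (all helper names are mine):

```
#include <cassert>

// Number of divisors of x^3, via the exponents of x: prod (3e + 1).
long long sigma0_cube(int x)
{
    long long res = 1;
    for (int q = 2; q * q <= x; ++q)
        if (x % q == 0)
        {
            int e = 0;
            while (x % q == 0) { x /= q; ++e; }
            res *= 3 * e + 1;
        }
    if (x > 1) res *= 4;   // leftover prime, exponent 1 -> 3*1 + 1
    return res;
}

// Moebius function by trial division.
int mobius(int x)
{
    int res = 1;
    for (int q = 2; q * q <= x; ++q)
        if (x % q == 0)
        {
            x /= q;
            if (x % q == 0) return 0;   // not squarefree
            res = -res;
        }
    if (x > 1) res = -res;
    return res;
}

// 3^omega(x), omega = number of distinct prime factors.
long long pow3_omega(int x)
{
    long long res = 1;
    for (int q = 2; q * q <= x; ++q)
        if (x % q == 0)
        {
            res *= 3;
            while (x % q == 0) x /= q;
        }
    if (x > 1) res *= 3;
    return res;
}

// h = mu * (i -> sigma0(i^3)) as a Dirichlet convolution over divisor pairs.
long long h_by_inversion(int x)
{
    long long res = 0;
    for (int d = 1; d * d <= x; ++d)
        if (x % d == 0)
        {
            res += mobius(x / d) * sigma0_cube(d);
            if (d != x / d) res += mobius(d) * sigma0_cube(x / d);
        }
    return res;
}
```

For instance $$$h(12) = \mu(6)\sigma_0(2^3) + \mu(3)\sigma_0(4^3) + \mu(2)\sigma_0(6^3) + \mu(1)\sigma_0(12^3) = 4 - 7 - 16 + 28 = 9 = 3^{\omega(12)}$$$.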

**Proof sketch for lower bound**

The only steps in the upper bound which are at all inefficient are ignoring the `[1..n]` restriction on `j`, and extending the Euler product over all primes. The first of these is easy to adjust for: $$$x \leq \sqrt[3]{n}$$$ implies $$$1 \leq j \leq n$$$, and allows us to use the nice multiplicative function $$$h$$$ for the lower bound while only losing at worst a factor of $$$3^3$$$. The second one is tricky; the idea is to use Markov's inequality on $$$\ln{x}$$$ to show that some constant positive fraction of the weight in the sum occurs for $$$x \leq \sqrt[3]{n}$$$. This can be used to justify using the Euler product on primes less than $$$n^{1/12}$$$, which will provide a good enough lower bound.

I will likely provide more detail on the Markov inequality step at a later date, when I have time to do so.

From this it trivially follows that the main loop takes at least $$$\Omega(n \cdot (\log{n})^6)$$$ time to run. This happens if the $$$\Theta(n \cdot (\log{n})^3)$$$ relevant cube-divisors are reasonably evenly distributed among the $$$n$$$ buckets, but it is quite plausible that this is not the case and the main loop is actually slower. It seems much harder to precisely estimate the complexity of the main loop; the best upper bound I have come up with so far is $$$O(n \cdot (\log{n})^{16})$$$. This comes from applying the same Euler product idea to the multiplicative function $$$i \mapsto \frac{\sigma_0(i^3)^2}{i}$$$, where we have $$$\sum_{i=0}^{\infty} \frac{\sigma_0(p^{3i})^2}{p^i} = 1 + \frac{16}{p} + O(\frac{1}{p^2})$$$, hence the strange 16 exponent. This seems likely to be inefficient for several reasons, not least of which is that applying this direct approach to $$$i \mapsto \frac{\sigma_0(i^3)}{i}$$$ for the preprocess step would give only an $$$O(n \cdot (\log{n})^4)$$$ upper bound, which I already know isn't optimal. Improvement ideas welcome.

EDIT: I am almost sure techniques to estimate this more accurately than I have already exist in the literature. A good first place to look might be the references at oeis:A061502.

Amazing work on those analyses. Though the real complexity is still unknown and bounded by $$$O(n \log^4 n)$$$, I never expected it to be this fast.

In the calculation part, we factorize $$$p^3 = a \times b \times c$$$, and we can have 2 optimization options:

Fix $$$a$$$, then calculate $$$b \times c$$$ while limiting the bound with binary search or something similar:

$$$O\left(\overset{n}{\underset{p = 1}{\Large \Sigma}} \underset{a | p^3}{\Large \Sigma} \left(g(a) + \log\left(\frac{p^3}{a}\right)\right) \right)$$$ where $$$g(a)$$$ is the number of satisfying products $$$b \times c$$$

Or go through the divisors of $$$\frac{p^3}{a}$$$ itself:

$$$O\left(\overset{n}{\underset{p = 1}{\Large \Sigma}} d(a) \times d\left(\frac{p^3}{a}\right) \right)$$$, yet this should be harder to implement without increasing precalculation time by around $$$O(d(n^3))$$$

It should also shave off some $$$O(\log)$$$ factors in the calculation part.

Optimizing the pre-computation time isn't all that interesting, since the main loop is dominant.

This paper uses a second-order approximation of $$$\sum_{i=1}^n \frac{f(i)}{i}$$$ for certain multiplicative functions $$$f$$$ and Abel's summation formula to cancel out the largest-order term in my upper bound technique and get a precise estimate on the growth of $$$\sum_{i=1}^n f(i)$$$. Its theorem 1 is directly applicable and shows that $$$\sum_{i=1}^n \sigma_0(i^3) \in \Theta(n \cdot (\log{n})^3)$$$ and $$$\sum_{i=1}^n \sigma_0(i^3)^2 \in \Theta(n \cdot (\log{n})^{15})$$$. The differences between these and the runtimes of the pre-computation loop and main loop respectively come from the `[1..n]` restriction on factors considered, but this does not gain more than a constant factor. In the former case, this was already shown by my previous comment. For the latter, it is possible to generalize the same idea to pairs of divisors.

The innermost loop runs once for every triple $$$(i, x, y)$$$ with $$$1 \leq i \leq n$$$, $$$1 \leq x < y \leq n$$$, $$$x | i^3$$$, and $$$y | i^3$$$. Ignoring the $$$x < y$$$ requirement will little more than double the number of possible triples, but will make later calculations much easier. It is easy to see that for any pair $$$(x, y)$$$, the triple $$$(i, x, y)$$$ is valid if and only if $$$i$$$ is a multiple of the smallest positive integer whose cube is a multiple of both $$$x$$$ and $$$y$$$. Call a triple $$$(i, x, y)$$$ minimal if no triple $$$(j, x, y)$$$ with $$$j < i$$$ is valid. Then, the number of minimal triples with a given value of $$$i \leq \sqrt[3]{n}$$$ is exactly $$$(\mu * (j \mapsto \sigma_0(j^3)^2))(i)$$$. Call this value $$$h(i)$$$. Then, the number of total valid triples is at least $$$\sum_{i=1}^{\sqrt[3]{n}} \lfloor \frac{n}{i} \rfloor h(i)$$$, which in turn is at least $$$\frac{n}{2} \sum_{i=1}^{\sqrt[3]{n}} \frac{h(i)}{i}$$$. As a Dirichlet convolution of two multiplicative functions, $$$h$$$ is itself multiplicative.

As before, the idea is now to estimate this sum with the Euler product

$$$\prod_{j=1}^{s} \left( \sum_{\alpha=0}^{\infty} \frac{h(p_j^{\alpha})}{p_j^{\alpha}} \right)$$$

where $$$p_j$$$ is the $$$j$$$-th prime number, and $$$s$$$ controls the number of primes to use. Now, imagine $$$x$$$ is a random variable taking positive integer values with $$$P(x = i) \propto \frac{h(i)}{i}$$$ for $$$i$$$ with all prime factors at most $$$p_s$$$ and $$$P(x = i) = 0$$$ otherwise. By the Euler product factorization idea, $$$x$$$ is distributed as a product of $$$s$$$ independent random variables $$$y_j$$$, each one taking values on the powers of the prime $$$p_j$$$, with $$$P(y_j = p_j^{\alpha}) \propto \frac{h(p_j^{\alpha})}{p_j^{\alpha}}$$$. So, the mean of $$$\ln{y_j}$$$ is given by

$$$\mathbb{E}[\ln{y_j}] = \frac{\sum_{\alpha \geq 0} \alpha \ln(p_j) \cdot h(p_j^{\alpha})\, p_j^{-\alpha}}{\sum_{\alpha \geq 0} h(p_j^{\alpha})\, p_j^{-\alpha}} \leq C_1 \frac{\ln{p_j}}{p_j}$$$

for some absolute constant $$$C_1$$$. Now add up over all values of $$$j$$$ and apply the prime number theorem to get

$$$\mathbb{E}[\ln{x}] = \sum_{j=1}^{s} \mathbb{E}[\ln{y_j}] \leq C_1 (1 + o(1)) \ln{p_s}$$$

So, if $$$s$$$ is chosen to be approximately $$$n^{\frac{1}{6C_1}}$$$, Markov's inequality on $$$\ln{x}$$$ will give that $$$P(x > \sqrt[3]{n}) \leq 0.5$$$, so that

$$$\sum_{i=1}^{\sqrt[3]{n}} \frac{h(i)}{i} \;\geq\; C_2 \prod_{j=1}^{s} \left( 1 + \frac{15}{p_j} \right) \;\geq\; C_3 (\ln{p_s})^{15} \;\geq\; C_4 (\ln{n})^{15}$$$

for appropriate absolute constants $$$C_2$$$, $$$C_3$$$, and $$$C_4$$$, which is the desired bound.

So, the second implementation is in fact $$$\Theta(n \cdot (\log{n})^{15})$$$. It's obviously possible to lower the number of log factors by a decent amount. Considering just factors of $$$\frac{i}{x}$$$ should get it down to $$$\Theta(n \cdot (\log{n})^9)$$$, since there are 10 ways to partition 3 copies of a prime among 3 factors $$$x$$$, $$$y$$$, $$$z$$$.
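The count of 10 here is just stars and bars: $$$\binom{3+2}{2} = 10$$$ ways to split the three copies of a prime among the ordered factors $$$x, y, z$$$, which a tiny enumeration confirms:

```
#include <cassert>

// Count ordered triples (ex, ey, ez) of nonnegative exponents with ex + ey + ez = 3.
int count_exponent_splits()
{
    int cnt = 0;
    for (int ex = 0; ex <= 3; ++ex)
        for (int ey = 0; ex + ey <= 3; ++ey)
            ++cnt;   // ez = 3 - ex - ey is then determined
    return cnt;
}
```
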

Can I ask about $$$O(taskG(k=3))$$$? It seems all other approaches are no faster than iterating through the factors in a clever way.