### SPyofcode's blog

By SPyofcode, history, 11 months ago, ## The statement:

Given three integers $n, k, p$, $(1 \leq k \leq n < p)$.

Count the number of array $a[]$ of size $k$ that satisfied

• $1 \leq a_1 < a_2 < \dots < a_k \leq n$
• $a_i \times a_j$ is perfect square $\forall 1 \leq i < j \leq k$

Since the result can be big, output it under modulo $p$.

For convenient, you can assume $p$ is a large constant prime $10^9 + 7$

Notice that in this blog, we will solve for generalized harder variants

For original problem you can see in this blog [Tutorial] An interesting counting problem related to square product

These are harder variants, and generalization from the original problem. You can see more detail here

*Marked as solved only if tested with atleast $10^6$ queries

Solved A: Can we also use phi function or something similar to solve for $k = 2$ in $O(\sqrt{n})$ or faster ?

Solved B: Can we also use phi function or something similar to solve for general $k$ in $O(\sqrt{n})$ or faster ?

Solved C: Can we also solve the problem where there can be duplicate: $a_i \leq a_j\ (\forall\ i < j)$ and no longer $a_i < a_j (\forall\ i < j)$ ?

Solved D: Can we solve the problem where there is no restriction between $k, n, p$ ?

Solved E: Can we solve for negative integers, whereas $-n \leq a_1 < a_2 < \dots < a_k \leq n$ ?

Solved F: Can we solve for a specific range, whereas $L \leq a_1 < a_2 < \dots < a_k \leq R$ ?

Solved G: Can we solve for cube product $a_i \times a_j \times a_t$ effectively ?

H: Can we solve if it is given $n$ and queries for $k$ ?

I: Can we solve if it is given $k$ and queries for $n$ ?

J: Can we also solve the problem where there are no order: Just simply $1 \leq a_i \leq n$ ?

K: Can we also solve the problem where there are no order: Just simply $0 \leq a_i \leq n$ ?

M: Can we solve for $q$-product $a_{i_1} \times a_{i_2} \times \dots \times a_{i_q} = x^q$ (for given constant $q$) ?

N: Given $0 \leq \delta \leq n$, can we also solve the problem when $1 \leq a_1 \leq a_1 + \delta + \leq a_2 \leq a_2 + \delta \leq \dots \leq a_k \leq n$ ?

O: What if the condition is just two nearby elements and not all pairs. Or you can say $a_i \times a_{i+1} \forall 1 \leq i < n$ is a perfect square ?

## A better solution for k = 2

Easy Version
Hard Version

Example 1
Example 2
Example 3

Observation
Definition
Property
Formula

#### Implementation

O(sqrt n log log sqrt n) solution
O(sqrt) solution
Hint

## A better solution for general k

#### Problem

Very Easy Version
Easy Version
Hard Version

Example 1
Example 2
Example 3

Definition
The formula

#### Implementation

O(sqrt n log sqrt n)
O(sqrt log log sqrt n)

But while doing research for task H, I found an improvement

O(sqrt (n/k) log log sqrt(n/k) - k) solution

#### Complexity

The first implementation
The second implementation
The third implementation

## Solution for duplicates elements in array

#### Problem

Given $k, n (1 \leq k \leq n \leq 10^9)$, count the number of array $a[]$ of size $k$ that satisfied

• $1 \leq a_1 \leq a_2 \leq \dots \leq a_k \leq n$
• $a_i \times a_j$ is perfect square $\forall 1 \leq i < j \leq k$

Since the result can be big, output it under modulo $10^9 + 7$.

Observation
Calculation

#### Implementation

O(n) solution
O(sqrt n log sqrt n + k) solution
O(sqrt n log log sqrt n + k) solution

#### Complexity

The first implementation
The second and third implementation

## Solution when there are no restriction between k, n, p

#### Problem

Given $k, n, p (1 \leq k, n, p \leq 10^9)$, count the number of array $a[]$ of size $k$ that satisfied

• $1 \leq a_1 < a_2 < \dots < a_k \leq n$
• $a_i \times a_j$ is perfect square $\forall 1 \leq i < j \leq k$

Since the result can be big, output it under modulo $p$.

Observation
Large prime p

#### Implementation

O(n) for prime p > max(n, k)
O(n log mod + sqrt(mod)) for prime p or squarefree p

Spoiler

## Solution when numbers are also bounded by negative number

#### Problem

Given $k, n (1 \leq k \leq n \leq 10^9)$, count the number of array $a[]$ of size $k$ that satisfied

• $-n \leq a_1 < a_2 < \dots < a_k \leq n$
• $a_i \times a_j$ is perfect square $\forall 1 \leq i < j \leq k$

Since the result can be big, output it under modulo $10^9 + 7$.

#### Idea

Hint
With duplicates case

#### Implementation

O(sqrt n log log sqrt n) when the numbers are unique

And for duplicates (mixed with task C), we have:

O(kn) = O(n^2)
O(k sqrt n log sqrt n) = O(n sqrt n log n)
O(k sqrt n log log sqrt n) = O(n sqrt n log log n)
O(k sqrt n + sqrt n log log sqrt n) = O(n sqrt n)
O(k + sqrt n log log sqrt n) = O(n)

Spoiler

Spoiler

## Solution when numbers are also bounded by a specific range

#### Problem

Given $k, L, R (1 \leq k, L, R \leq 10^9)$, count the number of array $a[]$ of size $k$ that satisfied

• $L \leq a_1 < a_2 < \dots < a_k \leq R$
• $a_i \times a_j$ is perfect square $\forall 1 \leq i < j \leq k$

Since the result can be big, output it under modulo $10^9 + 7$.

Observation
Optimization

#### Implementation

Let $Z = max(|L|, |R|)$

O(Z) time - O(R - L) space
O(R * sqrt(Z) / log(Z)) time - O(R - L) space
O(sqrt R log log R + (R - L)) time - O(R - L) space

Spoiler

## Solution when the product you must find is a perfect cube

#### Problem

Given $k, n (1 \leq k \leq n \leq 10^9)$, count the number of array $a[]$ of size $k$ that satisfied

• $1 \leq a_1 < a_2 < \dots < a_k \leq n$
• $a_i \times a_j \times a_t$ is perfect cube $\forall 1 \leq i < j < t \leq k$

Since the result can be big, output it under modulo $10^9 + 7$.

k < 3
k > 3
k = 3

#### Implementation

k<3::O(1) || k>3::O(n) || k=3::O(n^2 log n) solution
k<3::O(1) || k>3::O(cbrt n log cbrt n) || k=3::Õ(n) but practically O(n^(0.59)) for small n
k<3::O(1) || k>3::O(cbrt n log log cbrt n) || k=3::Õ(n) but practically fast

k < 3
k > 3
k = 3

## Solution when you are given n and queries for k

#### Problem

Given $n$ you have to answer for queries of $k$ $(1 \leq k \leq n \leq 10^9)$, count the number of array $a[]$ of size $k$ satisfied

• $1 \leq a_1 < a_2 < \dots < a_k \leq n$
• $a_i \times a_j$ is perfect square $\forall 1 \leq i < j \leq k$

Simplest idea
Observation

#### Implementation

O(n) precalculate - O(sqrt n - k) query
O(sqrt n) precalculate - O(sqrt (n/k) log log sqrt(n/k) + sqrt n - k) query

#### Complexity

The first implementation
The second implementation

## Contribution

• _SS for pointing out the linear complexity of squarefree sieve.

• clyring for fixing typos, and the approach for tasks A, B, C, D, E, G, H, J.

• errorgorn for adding details, and the approach for task F, J, M, O, better complexity for C, E, G.

• cuom1999 for participating $O(n^2)$ approach for problem G.

• ta4p for participating approach related to factorize $p^3$ into $3$ product partions in problem G though failed to achieve better complexity (editted: confirmed that the complexity seems to be better now).

• KazamaHoang, jalsol for combinatorics calculation and the proof of stars and bars in task C.

• Editorial Slayers Team Lyde DeMen100ns Duy_e OnionEgg QuangBuiCP _FireGhost_ Shironi for reviewing, fixing typos and feed backs. phi, Comments (4)
 » 11 months ago, # | ← Rev. 3 →   Regarding task G with $k=3$: It looks like your second implementation is rather close to what I described in my last message. I mentioned then already that it was $O(n^{1+\varepsilon})$; the reasoning is straightforward. Letting $\sigma_0(i)$ be the number of divisors of $i$, the runtime is clearly bounded by $\sum_{i=1}^n \sigma_0(i^3)^2$, and it's well-known that the divisor-counting function is $O(n^\varepsilon)$ for any $\varepsilon > 0$, so that $\sum_{i=1}^n \sigma_0(i^3)^2 \leq \sum_{i=1}^n C_{\varepsilon}^6 i^{6 \varepsilon} \in O(n^{1 + 6 \varepsilon})$.I correctly guessed it "may even be $\tilde{O}(n)$"; the analysis is interesting, but the poly-log factor hidden in the $\tilde{O}$-notation seems very large. First up: The preprocessing step. Temporarily borrowing your notation, I was able to show precisely that $\displaystyle \sum_{p=1}^n \left\lfloor\frac{p}{f(p)}\right\rfloor \in \Theta(n \cdot (\log{n})^3).$ Proof sketch for upper boundThe parts performing factorization of the variable j are easily seen to total to $\Theta(n \log{\log{n}})$ and can be ignored. The limiting factor is the loop for (int i = x; i <= n; i += x) divisor[i].push_back(j);. The variable x ends up as the smallest positive integer whose cube is divisible by j. So, each value $x$ is used once for each value j in [1..n] which is a factor of $x^3$ which is not a factor of $d^3$ for any divisor of $x$. Ignoring the [1..n] restriction on j, let $h(x)$ be the number of times a value $x$ can be used. Then, $\sum_{d|x} h(d) = \sigma_0(x^3)$, so that by Möbius inversion, $h = \mu * (i \mapsto \sigma_0(i^3))$ and is therefore a multiplicative function with $h(x) = 3^{\omega(x)}$, where $\omega(x)$ is the number of distinct prime factors of $x$.The runtime is bounded above by $n \sum_{x=1}^n \frac{h(x)}{x}$. But since $h$ is multiplicative, this series can be estimated via an Euler product. Specifically, $\begin{array}{rl} \displaystyle \sum_{x=1}^n \frac{h(x)}{x} & \leq \displaystyle \sum_{\substack{x \in \mathbb{Z}_{>0} \\ \text{all prime factors of } x \text{ are at most } n}} \frac{h(x)}{x} \\ & = \displaystyle \prod_{\substack{\text{primes } p \\ 1 \leq p \leq n}} \sum_{i=0}^{\infty} \frac{h(p^i)}{p^i} \\ & = \displaystyle \prod_{\substack{\text{primes } p \\ 1 \leq p \leq n}} 1 + \frac{3}{p-1} \\ & = \exp{\left(\displaystyle \sum_{\substack{\text{primes } p \\ 1 \leq p \leq n}} \ln{(1 + \frac{3}{p-1})}\right)} \\ & \leq \exp{\left(\displaystyle \sum_{\substack{\text{primes } p \\ 1 \leq p \leq n}} \frac{3}{p} + \frac{C_1}{p^2}\right)} \\ & \leq \exp{\left(\displaystyle 3 \ln{\ln{n}} + C_2\right)} \\ & = (C_2 \ln{n})^3, \end{array}$for some appropriate absolute constants $C_1$ and $C_2$. Multiply by $n$ and this is exactly the desired upper bound. Proof sketch for lower boundThe only steps in the upper bound which are at all inefficient are: Weakening "$1 \leq j \leq n$" to "$1 \leq x \leq n$ in the construction of $h$", and Weakening "$1 \leq x \leq n$" to "$x$ has no prime factors greater than $n$", in the estimation of $\sum_{x=1}^n \frac{h(x)}{x}$. The first of these is easy to adjust for: Unable to parse markup [type=CF_MATHJAX] implies $1 \leq j \leq n$, and allows us to use the nice multiplicative function $h$ for the lower bound while only losing at worst a factor of $3^3$. The second one is tricky; the idea is use Markov's inequality on $\ln{x}$ to show that some constant positive fraction of the weight in the sum $\displaystyle \sum_{\substack{x \in \mathbb{Z}_{>0} \\ \text{all prime factors of } x \text{ are at most } n}} \frac{h(x)}{x}$occurs for $x \leq n^4$. This can be used to justify using the Euler product on primes less than $n^{1/12}$, which will provide a good enough lower bound.I will likely provide more detail on the Markov inequality step at a later date, when I have time to do so.From this it trivially follows that the main loop takes at least $\Omega(n \cdot (\log{n})^6)$ time to run. This happens if the $\Theta(n \cdot (\log{n})^3)$ relevant cube-divisors are reasonably evenly distributed among the $n$ buckets, but it is quite plausible that this is not the case and the main loop is actually slower. It seems much harder to precisely estimate the complexity of the main loop; the best upper bound I have come up with so far is $O(n \cdot (\log{n})^{16})$. This comes from applying the same Euler product idea to the multiplicative function $i \mapsto \frac{\sigma_0(i^3)^2}{i}$, where we have $\sum_{i=0}^{\infty} \frac{\sigma_0(p^{3i})^2}{p^i} = 1 + \frac{16}{p} + O(\frac{1}{p^2})$, hence the strange 16 exponent. This seems likely to be inefficient for several reasons, not least of which is that applying this direct approach to $i \mapsto \frac{\sigma_0(i^3)}{i}$ for the preprocess step would give only an $O(n \cdot (\log{n})^4)$ upper bound, which I already know isn't optimal. Improvement ideas welcome.EDIT: I am almost sure techniques to estimate this more accurately than I have already exist in the literature. A good first place to look might be the references at oeis:A061502.
•  » » 11 months ago, # ^ | ← Rev. 3 →   Amazing work on those analyses. Though the real complexity is still unknown and bounded by $O(n \log ^ 4 n)$, yet I never expected it to be this fast.In the calculation part, we factorize $p^3 = a \times b \times c$, and we can have 2 optimization options:To fix $a$ then calculate $b \times c$ but limiting the bound by using binary search or something similar.$O\left(\overset{n}{\underset{p = 1}{\Large \Sigma}} \underset{a | p^3}{\Large \Sigma} \left(g(a) + \log\left(\frac{p^3}{a}\right)\right) \right)$ for $g(a)$ is the number of satisfied $b \times c$Or to go through the divisor of $\frac{p^3}{a}$ itself$O\left(\overset{n}{\underset{p = 1}{\Large \Sigma}} d(a) \times d\left(\frac{p^3}{a}\right) \right)$, yet this should be harder to implement without increasing precalculation time by around $O(d(n^3))$It should be somewhat also reduce some amount of $O(\log)$ factors in calculation part too.
•  » » » Optimizing the pre-computation time isn't all that interesting, since the main loop is dominant.This paper uses a second-order approximation of $\sum_{i=1}^n \frac{f(i)}{i}$ for certain multiplicative functions $f$ and Abel's summation formula to cancel out the largest-order term in my upper bound technique and get a precise estimate on the growth of $\sum_{i=1}^n f(i)$. Its theorem 1 is directly applicable and shows that $\sum_{i=1}^n \sigma_0(i^3) \in \Theta(n \cdot (\log{n})^3)$ and $\sum_{i=1}^n \sigma_0(i^3)^2 \in \Theta(n \cdot (\log{n})^{15})$. The differences between these and the runtimes of the pre-computation loop and main loop respectively come from the [1..n] restriction on factors considered, but this does not gain more than a constant factor. In the former case, this was already shown by my previous comment. For the latter, it is possible to generalize the same idea to pairs of divisors. SpoilerThe innermost loop runs once for every triple $(i, x, y)$ with $1 \leq i \leq n$, $1 \leq x < y \leq n$, $x | i^3$, and $y | i^3$. Ignoring the $x < y$ requirement will little more than double the number of possible triples, but will make later calculations much easier. It is easy to see that for any pair $(x, y)$, the triple $(i, x, y)$ is valid if and only if $i$ is a multiple of the smallest positive integer whose cube is a multiple of both $x$ and $y$. Call a triple $(i, x, y)$ minimal if no triple $(j, x, y)$ with $j < i$ is valid. Then, the number of minimal triples with a given value of $i \leq \sqrt{n}$ is exactly $(\mu * (j \mapsto \sigma_0(j^3)^2))(i)$. Call this value $h(i)$. Then, the number of total valid triples is at least $\sum_{i=1}^{\sqrt{n}} \lfloor \frac{n}{i} \rfloor h(i)$, which in turn is at least $\frac{n}{2} \sum_{i=1}^{\sqrt{n}} \frac{h(i)}{i}$. As a Dirichlet convolution of two multiplicative functions, $h$ is itself multiplicative.As before, the idea is now to estimate this sum with the Euler product $\displaystyle \prod_{j=1}^s \sum_{\alpha=0}^{\infty} \frac{h(p_j^{\alpha})}{p_j^{\alpha}} = \sum_{\substack{i \in \mathbb{Z}_{>0} \\ \text{all prime factors of } i \text{ are at most } p_s}} \frac{h(i)}{i},$where $p_j$ is the $j$-th prime number, and $s$ controls the number of primes to use. Now, imagine $x$ is a random variable taking positive integer values with $P(x = i) \propto \frac{h(i)}{i}$ for $i$ with all prime factors at most $p_s$ and $P(x = i) = 0$ otherwise. By the Euler product factorization idea, $x$ is distributed as a product of $s$ independent random variables $y_j$, each one taking values on the powers of the prime $p_j$, with $P(y_j = p_j^{\alpha}) \propto \frac{h(p_j^{\alpha})}{p_j^{\alpha}}$. So, the mean of $\ln{y_j}$ is given by $\displaystyle E(\ln{y_j}) = \frac\sum_{\alpha=0}^{\infty} \frac{h(p_j^{\alpha})}{p_j^{\alpha}}\cdot \alpha \cdot \ln{p_j}} \sum_{\alpha=0}^{\infty} \frac{h(p_j^{\alpha})}{p_j^{\alpha}}} \leq \frac{C_1 \ln{p_j}}{p_j}$for some absolute constant $C_1$. Now add up over all values of $j$ and apply the prime number theorem to get $\begin{array}{rl} \displaystyle E(\ln{x}) & \leq \sum_{j=1}^s \frac{C_1 \ln{p_j}}{p_j} \\ & \approx \sum_{j=1}^s \frac{C_1 \ln{j}}{j \cdot \ln{j}} \\ & \approx C_1 \cdot \ln{s}. \end{array}$So, if $s$ is chosen to be approximately $n^{\frac{1}{6C_1}}$, Markov's inequality on $\ln{x}$ will give that $P(x > \sqrt{n}) \leq 0.5$, so that $\begin{array}{rl} \displaystyle \frac{n}{2} \sum_{i=1}^{\sqrt{n}} \frac{h(i)}{i} & \displaystyle \geq \frac{n}{2}\sum_{\substack{i \in \{1, 2, \ldots, \sqrt{n}\} \\ \text{all prime factors of } i \text{ are at most } p_s}} \frac{h(i)}{i} \\ & \displaystyle \geq \frac{n}{4} \sum_{\substack{i \in \mathbb{Z}_{>0} \\ \text{all prime factors of } i \text{ are at most } p_s}} \frac{h(i)}{i} \\ & \displaystyle = \frac{n}{4} \prod_{j=1}^s \sum_{\alpha=0}^{\infty} \frac{h(p_j^{\alpha})}{p_j^{\alpha}} \\ & \displaystyle \geq \frac{n}{4} \exp{\left(\sum_{j=1}^s \ln{\left(\sum_{\alpha=0}^{\infty} \frac{h(p_j^{\alpha})}{p_j^{\alpha}}\right)}\right)} \\ & \displaystyle \geq \frac{n}{4} \exp{\left(\sum_{j=1}^s \frac{15}{p_j} + \frac{C_2}{p_j^2}\right)} \\ & \displaystyle \geq \frac{n}{4} \exp{\left(15 \ln{\ln{s}} + C_3\right)} \\ & \displaystyle = \frac{n}{4} \cdot (\ln{s})^{15} \cdot \exp{C_3} \\ & \geq C_4 \cdot n (\ln{n})^{15}, \end{array}$for appropriate absolute constants $C_2$, $C_2$, and $C_4$, which is the desired bound.So, the second implementation is in fact $\Theta(n \cdot (\log{n})^{15})$. It's obviously possible to lower the number of log factors by a decent amount. Considering just factors of $\frac{i}{x}$ should get it down to $\Theta(n \cdot (\log{n})^9)$, since there are 10 ways to partition 3 copies of a prime among 3 factors $x$, $y$, $z$.
•  » » » » 11 months ago, # ^ | ← Rev. 2 →   Can I ask about $O(taskG(k=3))$, as it seems all other approach are not faster then iterate through factors in a clever way.