Topcoder SRM 712 - Codeforces

7 years ago, # |

Auto comment: topic has been updated by cgy4ever.

7 years ago, # |

+15

Contest will start in 18 hours.

7 years ago, # ^ |

+15

Will start in 4 hours, and that means the registration is open now.

HandIeNdeede

7 years ago, # |

-23

Looks like the 600 problem is very similar to this problem: LINK

7 years ago, # ^ |

+10

I don't think it's even similar. The limits are different, and we are calculating "variance", not "mean".

7 years ago, # |

How to solve Div. 2 1000?

MiteshAgrawal

7 years ago, # |

Can anyone please explain solution for Div1 600? TIA.

I_love_tigersugar

7 years ago, # ^ |

+10

Let's take a look at a subtree consisting of N nodes whose values are X₁, X₂, ..., X_N. The variance of this subtree is $\frac{1}{N}\sum_{k=1}^{N} X_k^2 - \frac{S^2}{N^2}$ where $S = \sum_{k=1}^{N} X_k$. This means that whenever a node i belongs to a subtree of size N, it contributes $\frac{X_i^2}{N} - \frac{X_i S}{N^2}$ to the "final answer" (here the "final answer" means the sum of variances over all subtrees, not the average). Hence, iterating over all nodes i in the tree, the problem reduces to: find the number of subtrees of size j containing node i, and the sum of the values of all nodes over all such subtrees.
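
A minimal Java sketch of this reduction (O(N^3) overall), assuming the tree is given as a parent array p where p[i] is the parent of vertex i + 1. In a subtree of size j with value sum S, node i contributes X_i^2/j - X_i*S/j^2. For each node i taken as the root, a tree knapsack computes cnt[j], the number of subtrees of size j containing i, and sum[j], the total of S over those subtrees. Dividing by the number of subtrees, obtained as the sum over i and j of cnt[j]/j (each size-j subtree is seen from its j nodes), gives the average. The weights are shifted by their mean first, which leaves every variance unchanged but avoids cancellation when the weights are large and nearly equal. Class and method names are hypothetical, and plain doubles are used for the counts.

```java
import java.util.ArrayList;
import java.util.List;

class SubtreeVarianceSketch {
    private static List<List<Integer>> adj; // adjacency lists, filled by average()
    private static double[] w;              // mean-shifted weights

    /** p[i] is the parent of vertex i + 1; returns the average variance over all connected subtrees. */
    static double average(int[] p, int[] weight) {
        int n = p.length + 1;
        adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int i = 0; i < p.length; i++) {
            adj.get(i + 1).add(p[i]);
            adj.get(p[i]).add(i + 1);
        }
        double mean = 0;
        for (int x : weight) mean += x;
        mean /= n;
        w = new double[n];
        for (int i = 0; i < n; i++) w[i] = weight[i] - mean; // variance is shift-invariant

        double sumVar = 0, totalSubtrees = 0;
        for (int i = 0; i < n; i++) {
            double[][] r = dfs(i, -1); // r[0][j]: count, r[1][j]: total S, over size-j subtrees containing i
            for (int j = 1; j < r[0].length; j++) {
                sumVar += w[i] * w[i] * r[0][j] / j - w[i] * r[1][j] / ((double) j * j);
                totalSubtrees += r[0][j] / j; // each size-j subtree is counted once per vertex
            }
        }
        return sumVar / totalSubtrees;
    }

    /** Connected vertex sets inside v's subtree that contain v: count and total weight, by size. */
    private static double[][] dfs(int v, int parent) {
        double[] cnt = {0, 1};
        double[] sum = {0, w[v]};
        for (int c : adj.get(v)) {
            if (c == parent) continue;
            double[][] child = dfs(c, v);
            double[] cc = child[0], cs = child[1];
            cc[0] += 1; // the option of taking nothing from this child's side
            double[] ncnt = new double[cnt.length + cc.length - 1];
            double[] nsum = new double[ncnt.length];
            for (int a = 0; a < cnt.length; a++)
                for (int b = 0; b < cc.length; b++) {
                    ncnt[a + b] += cnt[a] * cc[b];
                    nsum[a + b] += sum[a] * cc[b] + cnt[a] * cs[b];
                }
            cnt = ncnt;
            sum = nsum;
        }
        return new double[][]{cnt, sum};
    }

    public static void main(String[] args) {
        System.out.println(average(new int[]{0}, new int[]{0, 2}));
    }
}
```

For a two-vertex path with weights 0 and 2, the three subtrees have variances 0, 0 and 1, so the average is 1/3.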

7 years ago, # ^ |

+10

But S changes when we change the subtree. Shouldn't we iterate over pairs of vertices and compute the number of subtrees containing both of them?

I_love_tigersugar

7 years ago, # ^ |

Hmm. I think my idea follows my own line of thinking, and the resulting solution is not hard to implement. Btw, could you explain your idea in more detail?

7 years ago, # ^ |

+10

Let's take a look at a subtree consisting of N nodes whose values are X1, X2,..., XN.

You defined S here for each subtree.

whenever a node i belongs to a subtree of size N, it contributes to the "final answer"....

If you mean S = Sigma(Xi), then I think it contradicts your earlier definition.

Btw, it's good to share solutions, but if you haven't verified one, please let people know that it is not verified. (By "verifying" I mean AC, or something better.)

7 years ago, # ^ |

Ok, I didn't notice that what you did was state a reduction, not a whole solution. This looks good to me, but it is still not trivial how to complete the solution (though it's definitely doable; I did a very similar thing).

kingofnumbers

7 years ago, # |

+31

Var(X) = E(X^2) - E^2(X)

I realized it too late; I should have paid more attention in my probability classes in college :(

7 years ago, # ^ |

Same here xD

I would have realized it faster if he hadn't mentioned the other rule :D

yaswanth

7 years ago, # ^ |

Can you explain the approach for the 1000-pointer?

7 years ago, # ^ |

yaswanth Sure.

I will tell you my idea, but I haven't implemented it yet.

We need to sum the variances, where the variance of a sequence S = s1, s2, ..., sn is VAR(X) = E[(si)^2] - E^2[si].

So if we have k sequences S1, S2, S3, ..., Sk,

the answer will be ( ∑ (E[Sij^2] - E^2[Sij]) ) / k = (∑ E[Sij^2] - ∑ E^2[Sij]) / k. So we need to know four things:

1- All the sequences we can make.

2- How to compute E[X] for a sequence.

3- How to compute E[X ^ 2] for a sequence.

4- The number of sequences K.

Ok, let's sort the array and call it A. Now, for every A[i], consider all the sequences that end with A[i]. Get the smallest j such that j <= i and A[i] - A[j] <= R (the index of the smallest element we can include in a set that contains A[i]), call that index Left, and call our index i Right. So now we want to get ∑E[X] over all the sequences that end with Right, ∑E[X ^ 2] over all the sequences that end with Right, and the number K of sequences that end with Right.

If we get these 3 things, then the problem is solved.

So I will tell you how to compute the total number of sequences K and ∑E[X] over all sequences that end with Right, and you should be able to continue from there.

Now we have fixed Right, and any element with index in [Left, Right - 1] may or may not be included in our sequence, so the total number of sequences here is:

K = 2 ^ ((Right - 1) - Left + 1) = 2 ^ (Right - Left).

Now I will tell you how to compute E[X] for this sequence S = s1, s2, ..., sm.

Note that s1 <= s2 <= s3 <= ... <= sm, since the array is sorted.

You can solve it using dynamic programming. The state would be (curIdx, endIdx, left).

curIdx is the index of the element you are currently deciding whether to take or not (you add both choices to the answer).

endIdx is the last element you can take.

left is the number of remaining elements you still need to take.

Now, E[X] for this sequence S = s1, s2, s3, ..., sm is (∑ (i : 0 -> m)) dp(1, m, i).

This means that I try to include 0, 1, 2, 3, ..., m elements from this array.

BTW, by S I mean the elements from Left to Right - 1.

yaswanth

7 years ago, # ^ |

Thanks Omar. I have a question. Using the DP we get the sum of the means of all the subsets of S, that is, E1 + E2 + ... + Ek. But we have to calculate E1^2 + E2^2 + ... + Ek^2. How do we do that?

7 years ago, # ^ |

Use the above DP to compute E(X ^ 2); it should be easy now.

For ∑ E^2[X], let's state the problem.

Suppose that we have an array A = [a1, a2, ..., an]. For every subsequence S = [s1, s2, s3, ..., sm] of A we need to find E^2[X] for that sequence, which is ( ∑si/m ) ^ 2, which is ∑si^2 / m^2. So now you can use the same DP: for every element, you may take si ^ 2 or you may not.

yaswanth

7 years ago, # ^ |

I got how to compute E(X^2); that's a straightforward DP. But how does the DP work for (E[X])^2? I mean, E[X] = (s1 + s2 + ... + sm) / m, and m varies for each subsequence. Moreover, what we need is (s1 + s2 + ... + sm)^2. How do we calculate the next state from this? Can you elaborate on this DP?
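
One way to handle the (E[X])^2 part, sketched in Java for the setting described above (A[Right] is always taken, and each element of A[Left..Right-1] may or may not be). This is a swapped-in technique, not the DP from the comments above. Instead of tracking the mean, keep four totals for each subset size m: the number of subsets, the sum over subsets of Σs, the sum of (Σs)^2, and the sum of Σs^2. Adding an element y turns a subset sum t into t + y, and t^2 into t^2 + 2yt + y^2, so all four totals update from the previous ones, and the variance of each size is recovered at the end as Σs^2/m - (Σs)^2/m^2. The class and method names are hypothetical.

```java
class SubsetVarianceSketch {
    /**
     * Sum of the variances of all sequences made of `mandatory` plus any subset of `optional`.
     * For each size m: cnt[m] = #subsets, s[m] = sum of (sum x), s2[m] = sum of (sum x)^2,
     * q[m] = sum of (sum x^2).
     */
    static double sumOfVariances(double mandatory, double[] optional) {
        int maxSize = optional.length + 1;
        double[] cnt = new double[maxSize + 1], s = new double[maxSize + 1];
        double[] s2 = new double[maxSize + 1], q = new double[maxSize + 1];
        cnt[1] = 1;
        s[1] = mandatory;
        s2[1] = mandatory * mandatory;
        q[1] = mandatory * mandatory;
        int cur = 1; // largest subset size reached so far
        for (double y : optional) {
            // Go downwards so that index m still holds the values from before adding y.
            for (int m = cur; m >= 1; m--) {
                cnt[m + 1] += cnt[m];
                s[m + 1] += s[m] + y * cnt[m];
                s2[m + 1] += s2[m] + 2 * y * s[m] + y * y * cnt[m];
                q[m + 1] += q[m] + y * y * cnt[m];
            }
            cur++;
        }
        double total = 0;
        for (int m = 1; m <= maxSize; m++) total += q[m] / m - s2[m] / ((double) m * m);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfVariances(0, new double[]{0, 3}));
    }
}
```

Under these assumptions, one would call it with A[Right] as the mandatory element and A[Left..Right-1] as the optional ones, sum the results over every Right, and divide by the total number of sequences.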

Voleking

7 years ago, # ^ |

Thx a lot!

poikniok

7 years ago, # |

I always want to write a brute-force simulation of the states immediately. Today I resisted doing so, but it turns out this was exactly the thing to do for D1 300. So much time wasted for nothing :(

7 years ago, # ^ |

+13

I think people who brute-force A will all get TL. So don't worry.

zscoder

7 years ago, # ^ |

+31

I think if you memoize the states, brute force can pass as well, since there are ≤ N states per level.

poikniok

7 years ago, # ^ |

+18

I tested it: on every test I generated, fewer than 1600 distinct states were visited. That is, you cannot create more than 1600 valid arrays starting from S, so if we just do BFS and keep track of the already visited states, the solution is very fast. If you have a counterexample or a theory why, I would be happy to see it :)

zscoder

7 years ago, # ^ |

+23

I can prove an $O(n \log_2(T/S))$ bound on the number of states.

Handle the case when the sum of either array is 0 separately.

Let S be the sum of the first array and T be the sum of the second. Note that the operation multiplies the sum by 2 on each step, so we must end in at most $\log_2(T/S)$ steps.

Now, for each step, we can have at most n distinct states. To see why, we note that the set of values we obtain will be the same no matter whether we use L or R. In fact, the states of each level are only cyclic shifts of each other. Thus, there are at most n states per level.

This proves the bound.

rajat1603

7 years ago, # ^ |

+28

The operations are commutative anyway, so the number of distinct states visited will be very small.

Kaban-5

7 years ago, # ^ |

+28

If we look at the sequence $a_0, a_1, \ldots, a_{n-1}$ as a polynomial $a_0x^0 + a_1x^1 + \ldots + a_{n-1}x^{n-1}$, then the operations are just multiplication by $1 + x$ and $1 + x^{n-1}$ modulo $x^n - 1$, so they commute. Because of that, there will be at most 100 + 99 + ... + 0 states (only the number of operations of each type matters, not their order), and actually much fewer.

UPD. Fixed an error. Thanks to Swistakk.

7 years ago, # ^ |

+18

Modulo $x^n - 1$, not modulo $x^n$ :D

7 years ago, # ^ |

+10

Oh, I didn't think about memoization... I said that because I hacked two solutions without memoization.

And yes, there are at most 2000 states. This is because LR is equivalent to RL, so, just like in bubble sort, you can rearrange any operation sequence into LLLLL...RRRRR... and get the same result. (Actually, this observation leads to a short O(lg(n*ai)^3) solution.)
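
A minimal Java sketch of that observation, assuming the polynomial view from this thread: "L" is taken to be multiplication by 1 + x modulo x^n - 1 (c[i] = a[i] + a[i-1], indices cyclic) and "R" multiplication by 1 + x^(n-1) (c[i] = a[i] + a[i+1]). Which letter is which in the actual problem, the class name, and the maxOps cap are all assumptions. Because only the counts of each operation matter, it tries every L^l R^r with l + r <= maxOps, which takes O(maxOps^2 * n) time.

```java
import java.util.Arrays;

class LRSearchSketch {
    // Assumed "L": multiply by (1 + x) modulo x^n - 1, i.e. c[i] = a[i] + a[i - 1] (cyclic).
    static long[] applyL(long[] a) {
        int n = a.length;
        long[] c = new long[n];
        for (int i = 0; i < n; i++) c[i] = a[i] + a[(i - 1 + n) % n];
        return c;
    }

    // Assumed "R": multiply by (1 + x^(n-1)) modulo x^n - 1, i.e. c[i] = a[i] + a[i + 1] (cyclic).
    static long[] applyR(long[] a) {
        int n = a.length;
        long[] c = new long[n];
        for (int i = 0; i < n; i++) c[i] = a[i] + a[(i + 1) % n];
        return c;
    }

    /** Returns "L" * l + "R" * r with l + r <= maxOps that turns s into t, or null if none exists. */
    static String findOps(long[] s, long[] t, int maxOps) {
        long[] byL = s.clone(); // s after l applications of L
        for (int l = 0; l <= maxOps; l++) {
            long[] cur = byL;
            for (int r = 0; l + r <= maxOps; r++) {
                if (Arrays.equals(cur, t)) return "L".repeat(l) + "R".repeat(r);
                cur = applyR(cur);
            }
            byL = applyL(byL);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findOps(new long[]{1, 2, 3}, new long[]{7, 8, 9}, 5));
    }
}
```

Each operation at most doubles the largest absolute value, so maxOps has to stay small enough for the values to fit in a long (about 30 steps is safe for inputs up to 1e9). `String.repeat` needs Java 11 or later.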

rhezo

7 years ago, # |

+40

"There is an issue with one of the problems, which we are working on addressing. It may be several hours. We will post results when we are able."

Will it be rated?

7 years ago, # ^ |

+57

Petr challenged my solution of Div1-Medium, and we just got a fix.

Now we are running the system tests. Some challenges may have been affected, but it should be rated.

The test case is:

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48}

{999999999, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000, 1000000000}

The output should be around 0.0022541745919277614

Sorry for the inconvenience and thanks Petr for finding that.

rng_58

7 years ago, # ^ |

+15

Maybe you forgot to write: this is a precision issue.

The easiest solution is O(N⁴), but I guess it can be improved to O(N²) (I'm not very confident right now). Then, we can use slow BigDecimal to compute more precise answers.

Are there other ways to get a solution with high precision?

7 years ago, # ^ |

+10

I'm using BigDecimal, but it looks like Petr didn't use it and still produced the correct result. (The key is to divide by the number of subtrees in each term, instead of dividing at the end.) (Edit: No, I tried it and this doesn't work.)

Petr

7 years ago, # ^ |

+35

Actually, my trick is to subtract the mean from all weights in the beginning.
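
A small Java illustration of why this helps, on a single array rather than on subtrees (the class and the input are hypothetical). The direct E[X^2] - E[X]^2 formula subtracts two numbers around 1e18, where adjacent doubles are 128 apart, far more than the variance itself; subtracting the mean first keeps every intermediate value small. Shifting all weights by the same constant does not change any subtree's variance, which is why the trick is safe.

```java
class MeanShiftSketch {
    // Var = E[X^2] - E[X]^2 computed directly: two huge, nearly equal numbers are subtracted.
    static double naiveVariance(double[] xs) {
        double sum = 0, sumSq = 0;
        for (double x : xs) {
            sum += x;
            sumSq += x * x;
        }
        double mean = sum / xs.length;
        return sumSq / xs.length - mean * mean;
    }

    // Same formula after subtracting the mean from every value (variance is shift-invariant).
    static double shiftedVariance(double[] xs) {
        double mean = 0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        double[] ys = new double[xs.length];
        for (int i = 0; i < xs.length; i++) ys[i] = xs[i] - mean;
        return naiveVariance(ys);
    }

    public static void main(String[] args) {
        double[] xs = {999999999, 1e9, 1e9, 1e9}; // the exact variance is 0.1875
        System.out.println(naiveVariance(xs) + " vs " + shiftedVariance(xs));
    }
}
```

On this input the direct formula loses all significant digits, while the shifted version returns 0.1875.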

riadwaw

7 years ago, # ^ |

+10

Could you share your code in a copy-pastable format?

Petr

7 years ago, # ^ |

Source

import java.util.List;
import java.util.ArrayList;

public class AverageVarianceSubtree {
    public double average(int[] p, int[] weight) {
        AverageVarianceSubtree.Vertex[] vs = new AverageVarianceSubtree.Vertex[p.length + 1];
        for (int i = 0; i < vs.length; ++i) {
            vs[i] = new AverageVarianceSubtree.Vertex();
            vs[i].w = weight[i];
        }
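        // Shift all weights so that their mean is 0: every subtree's variance stays the same,
        // but the final subtraction no longer cancels huge, nearly equal numbers.
        double mean = 0;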
        for (AverageVarianceSubtree.Vertex v : vs) mean += v.w;
        mean /= vs.length;
        for (AverageVarianceSubtree.Vertex v : vs) v.w -= mean;
        for (int i = 0; i < p.length; ++i) {
            AverageVarianceSubtree.Vertex a = vs[i + 1];
            AverageVarianceSubtree.Vertex b = vs[p[i]];
            a.adj.add(b);
            b.adj.add(a);
        }
        AverageVarianceSubtree.Vertex root = vs[0];
        root.init(null);
        // totalTrees[n] = number of connected subtrees with n vertices
        double[] totalByN = root.totalTrees;
        double total = 0;
        for (double x : totalByN) total += x;
        double res = 0;
        for (AverageVarianceSubtree.Vertex a : vs) {
            a.init(null);
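            // After re-rooting at a, countWithA[n] = number of size-n subtrees containing a
            double[] countWithA = a.findContaining(null, a);
            for (int n = 1; n < countWithA.length; ++n) {
                // Size-n variance = sum_a w_a^2 (n-1)/n^2 - sum_{a != b} w_a w_b / n^2
                res += a.w * a.w * ((n - 1) / (double) (n * n)) * countWithA[n] / total;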
            }
            for (AverageVarianceSubtree.Vertex b : vs)
                if (a != b) {
                    // countWithAB[n] = number of size-n subtrees containing both a and b
                    double[] countWithAB = a.findContaining(null, b);
                    for (int n = 1; n < countWithAB.length; ++n) {
                        res -= a.w * b.w / (double) (n * n) * countWithAB[n] / total;
                    }
                }
        }
        return res;
    }

    static class Vertex {
        double w;
        List<AverageVarianceSubtree.Vertex> adj = new ArrayList<>();
        double[] totalTrees;
        double[] totalTreesWithRoot;

        public void init(AverageVarianceSubtree.Vertex parent) {
            totalTrees = new double[]{0};
            totalTreesWithRoot = new double[]{0, 1};
            for (AverageVarianceSubtree.Vertex v : adj)
                if (v != parent) {
                    v.init(this);
                    double[] tmp = v.totalTreesWithRoot.clone();
                    tmp[0] += 1;
                    totalTreesWithRoot = combine(totalTreesWithRoot, tmp);
                    totalTrees = add(totalTrees, v.totalTrees);
                }
            totalTrees = add(totalTrees, totalTreesWithRoot);
        }

        private double[] combine(double[] a, double[] b) {
            double[] c = new double[a.length + b.length - 1];
            for (int i = 0; i < a.length; ++i) {
                for (int j = 0; j < b.length; ++j) {
                    c[i + j] += a[i] * b[j];
                }
            }
            return c;
        }

        private double[] add(double[] a, double[] b) {
            double[] c = new double[Math.max(a.length, b.length)];
            for (int i = 0; i < c.length; ++i) {
                if (i < a.length) c[i] += a[i];
                if (i < b.length) c[i] += b[i];
            }
            return c;
        }

        public double[] findContaining(AverageVarianceSubtree.Vertex parent, AverageVarianceSubtree.Vertex must) {
            if (this == must) {
                return totalTreesWithRoot;
            }
            double[] res = null;
            AverageVarianceSubtree.Vertex inside = null;
            for (AverageVarianceSubtree.Vertex v : adj)
                if (v != parent) {
                    double[] cur = v.findContaining(this, must);
                    if (cur != null) {
                        res = cur;
                        inside = v;
                    }
                }
            if (inside == null) return null;
            for (AverageVarianceSubtree.Vertex v : adj)
                if (v != parent && v != inside) {
                    double[] tmp = v.totalTreesWithRoot.clone();
                    tmp[0] += 1;
                    res = combine(res, tmp);
                }
            res = combine(res, new double[]{0, 1});
            return res;
        }

    }

}

riadwaw

7 years ago, # ^ |

Thanks.

I wasn't able to get an error anywhere close to 1e-9.

Petr

7 years ago, # ^ |

+36

Here's a rough argument why this trick is enough:

Suppose the maximum absolute value of a weight after shifting the mean to 0 is x. Then there exists an edge in the tree whose endpoints' weights differ by at least x/50. Both ends of this edge belong to at least ~1/50 of all trees (to see that, take any tree that doesn't contain them and add the shortest path to them, together with them; at most ~50 trees would merge into one here). So at least 1/50 of all trees would have a variance of at least (x/100)*(x/100)*2/50, which means that the return value will be at least x**2/12500000. On the other hand, the "positive" component of the formula is at most x**2, so the most precision we can lose by subtraction is on the scale of 1e7*machine_eps=1e7*1e-16=1e-9. Of course, all the bounds above are super generous, and in reality the loss is much less than 1e-9.

IvL

7 years ago, # ^ |

+43

My solution writes each term of the sums as Integer + Float, where 0 <= Float < 1, then sums them component-wise: all Integer parts are summed into an __int128, and all Float parts into a double. Since the Floats are of the form a/b with b <= 50^2, I assumed this would give high enough precision :\
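
A rough Java analogue of this split, with BigInteger standing in for __int128. The helper is hypothetical and assumes each term is available as an exact fraction num/den with a long numerator and a positive denominator. Integer parts are summed exactly, so large terms cancel without error, and only the fractional parts, each in [0, 1), go through a double.

```java
import java.math.BigInteger;

class SplitSumSketch {
    /** Sums num[k] / den[k] (den[k] > 0): integer parts exactly, fractional parts in a double. */
    static double splitSum(long[] num, long[] den) {
        BigInteger intPart = BigInteger.ZERO;
        double fracPart = 0;
        for (int k = 0; k < num.length; k++) {
            long q = Math.floorDiv(num[k], den[k]); // integer part, exact
            long r = Math.floorMod(num[k], den[k]); // remainder, 0 <= r < den[k]
            intPart = intPart.add(BigInteger.valueOf(q));
            fracPart += (double) r / den[k];        // each fractional term lies in [0, 1)
        }
        return intPart.doubleValue() + fracPart;
    }

    public static void main(String[] args) {
        long[] num = {3000000000000001L, -3000000000000000L};
        long[] den = {3, 3};
        System.out.println(splitSum(num, den));
    }
}
```

For example, for the terms 3000000000000001/3 and -3000000000000000/3, dividing in double first and then adding gives 0.375 instead of 1/3, while splitSum returns 1/3 to double precision.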

7 years ago, # ^ |

+20

Is it connected to the fact that the identity Var=EX^2-(EX)^2 introduces big cancellation errors? Does it mean that most solutions will fail the system tests? I think it would be a good idea for setters to request computing it modulo some prime.

7 years ago, # ^ |

Yes, basically when we subtract two big numbers, the relative error can be huge. (Even if we try the case where all weights are 10^9, we output something much larger than 0.)

Somehow, when I did the analysis, I thought we were just adding O(n^2) numbers and that should be fine. And I didn't test it thoroughly.

Btw, it seems reducing the weights to 1000 should be enough to avoid the precision error.

fruwajacybyk

7 years ago, # ^ |

There will be another problem: suppose that we have to compute it modulo p = 1e9+7; then the number of trees could be equal to p. In that case, if the numerator of our huge fraction is also divisible by p, we need to know this numerator mod p^2... I guess bignums are needed in that case.

7 years ago, # ^ |

Uh, but that can be dealt with in a few ways: either guarantee that the denominator will not be divisible by P, or ask to output 0 in that case (or set P to something larger than 2^P xD).

7 years ago, # ^ |

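A better way is to print the sum of the variances over all possible subtrees, rather than the expectation. This doesn't change the problem significantly, and then every division a / b is well-defined, since each denominator is at most 50^2 and is therefore never divisible by the prime.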

rng_58

7 years ago, # ^ |

+10

Another problem is that debugging can be harder. For example everyone knows 0.4 is 2/5, but it takes some time to understand that 800000006 is 2/5.
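
A Java sketch of the modular side of this, with hypothetical helper names and P = 1e9+7 as in the comments above. a / b is encoded as a * b^(P-2) mod P by Fermat's little theorem. For debugging, a small-denominator search can turn a residue back into a readable fraction. The search is only a heuristic: it returns the first n/d with n, d <= limit, which for limits far below P is usually the intended fraction but is not guaranteed to be.

```java
class ModFractionSketch {
    static final long P = 1_000_000_007L;

    // Fast exponentiation: b^e mod m (products stay below 2^63 because m < 2^31).
    static long modPow(long b, long e, long m) {
        long result = 1;
        b %= m;
        while (e > 0) {
            if ((e & 1) == 1) result = result * b % m;
            b = b * b % m;
            e >>= 1;
        }
        return result;
    }

    // a / b modulo the prime P via Fermat's little theorem; assumes a >= 0 and b not divisible by P.
    static long fraction(long a, long b) {
        return a % P * modPow(b, P - 2, P) % P;
    }

    // Debugging aid: first small fraction n / d (n, d <= limit) with n == r * d (mod P), or null.
    static String guessFraction(long r, int limit) {
        for (long d = 1; d <= limit; d++) {
            long n = r * d % P;
            if (n <= limit) return n + "/" + d;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(fraction(2, 5) + " -> " + guessFraction(fraction(2, 5), 1000));
    }
}
```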

7 years ago, # ^ |

+38

Was 10^-9 precision necessary, or was it intentional (to make this problem "challenging")? I just don't understand the reason behind such tight bounds.

Kostroma

7 years ago, # ^ |

+43

And I don't understand why the authors don't prove the precision of their double solutions.

7 years ago, # ^ |

+10

cgy did that but made a mistake. And proving precision is a terrible thing to do ;__;. And tight bounds always suck. IMHO doubles should be used only in problems without subtraction / division, or when there is no possible way of doing it modulo P.

rng_58

7 years ago, # ^ |

+30

I'm the kind of person who always says "prove this! prove that!", but still I ignore precision issues (unless I use epsilons for comparisons). I know it's not a very good thing, though. Do you know how to prove precision?

MagnificantDigit

7 years ago, # ^ |

+10

Yes. There's a quite elaborate theory in terms of relative error and machine epsilon. For example, one can estimate the relative precision of a d-dimensional dot product (or Sigma(X^2) and the like) as (d+1)*machine_epsilon. In this problem one may deduce something like O(poly(n)*machine_epsilon), where n is the tree size. It seems the 10^-9 tolerance may not be met if computations are done in double (that is, about 10^-14), assuming (this is not a proof) that poly(n) = O(n^3) or of higher degree, with max n = 50.

For details on the general theory, see, for example, the book Applied Numerical Linear Algebra by James Demmel.
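
For reference, the standard forward error bound for a floating-point dot product of length d, with unit roundoff u (about 1.1e-16 for double) and assuming d*u < 1, is:

$$
\left|\mathrm{fl}\Big(\sum_{i=1}^{d} x_i y_i\Big) - \sum_{i=1}^{d} x_i y_i\right| \le \gamma_d \sum_{i=1}^{d} |x_i y_i|, \qquad \gamma_d = \frac{d\,u}{1 - d\,u}.
$$

The error is bounded relative to the sum of |x_i y_i|, not relative to the result itself, so a relative-error guarantee for the final answer only follows when the terms do not cancel. For something like E[X^2] - E[X]^2 with large, nearly equal weights, the ratio of the sum of |x_i y_i| to the absolute value of the result is what blows up.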

Gassa

7 years ago, # ^ |

The problem here is not the error slowly accumulating over the last bits. It is catastrophic cancellation. So, at some step, the analysis must take the magnitude of input values (or input precision) into account.

For example, if the given weights were integers up to 1000, not to 10⁹, the whole precision issue would most likely be gone.

MagnificantDigit

7 years ago, # ^ |

← Rev. 2 →

I agree. I believe there exists a solution with + and * operations only, i.e. a cumulative DP, possibly with increased complexity, but I haven't come up with a correct one yet. It would be interesting to see such a solution, if one exists.

7 years ago, # ^ |

The 1e-9 tolerance is the default, so if we want to change it to 1e-6 we need to write checkAnswer(), and when we don't think that is necessary we use 1e-9.

Btw, even if we changed the tolerance to 1.0, lots of solutions would still fail.

7 years ago, # ^ |

Would it be possible to change the default value to 1e-4 or 1e-6? I think 1e-9 is a very strict precision tolerance, and it is really not fun to reject solutions that differ from the correct answer by 1e-8 instead of 1e-9. What would be the cons of such a decision?

7 years ago, # ^ |

+10

Hmm, I don't know why it is 1e-9; that decision was made before I could write Hello World.

And I agree that maybe 1e-6 is a better choice.

Btw, it is still much better than printing 6 digits after the decimal point and then comparing strings: many years ago I used a task in a local OI contest whose answer was something like 0.12345650000000000001, so in order to get 6 digits right you needed to be accurate to 1e-20.

-XraY-

7 years ago, # ^ |

+45

So what's the point? I changed long double to float128 and got AC.

Arterm

7 years ago, # ^ |

I was a tester this time, so that's my fault too.
I implemented a Python solution (code) and wrongly assumed that it had enough precision and that precision was not an issue at all.
Sorry for that.

Probably a good idea is to stress-test against a solution using double-precision decimals, or to ask for the answer modulo a prime. I like the second way more.

7 years ago, # ^ |

+10

I think this time the key was to consider those corner cases. Our solutions were wrong, but they gave different results, and that would have been enough to catch the precision error. And if we had tested the case where all weights are 1e9, we could easily have seen that our solutions were wrong.

I did the stress test with random weights between 1 and 1e9, and our solutions always agreed.

saharshluthra

7 years ago, # |

+93

Rank 14, yanQval has 299.84 points in the 300 pointer. Does it seem legit?

jqdai0815

7 years ago, # |

+52

What is the intended solution for 900pts?

I implemented a very complicated solution (something like: at each node of the tree, we color the related nodes in two colors and use knapsack to distribute the connected components between the two subtrees). And I think it's worth more than 900 pts.

geniucos

7 years ago, # ^ |

That's what I thought about too, but when I saw that I needed to use knapsack as well, I gave up. To do it, were you first solving the problem for subtrees of nodes that have maximum height and appear as a c in some of the constraints? That was my idea, but I thought that, for just 900 points, it couldn't be that difficult.

jqdai0815

7 years ago, # ^ |

+36

Yes, I think it's exactly what I implemented. I didn't finish it during the contest time. And I failed in the practice room once because of a corner case.

I think it is really difficult to pass using this solution during the contest.

-XraY-

7 years ago, # ^ |

Wow, I thought about this solution and suggested 900 to be too easy.) What's the problem with knapsack?

geniucos

7 years ago, # ^ |

+10

If you're asking me, there was no problem, except for the fact that I already had 150 lines of code by the time I realized knapsack was needed. It's not the algorithm itself; the overall implementation was really hard to code. It's true that the idea itself is not very hard, but the implementation was close to impossible. The problem was nice, but hard to code, at least for me. 900 is not too much for the idea, but too little for the time spent coding it.

Kostroma

7 years ago, # ^ |

+10

I agree with you; I was coding this solution too. I had about half an hour for it during the contest, but I think I needed at least half an hour more.
