ICPCNews's blog

By ICPCNews, 2 months ago, In English


Hello, Codeforces!

The ICPC Challenge World Finals in Luxor is approaching for some of you, and we are delighted to provide an additional exciting opportunity to compete, open to all!

We are happy to invite you to the 2023 Post World Finals Online ICPC Challenge, powered by Huawei, starting on May 6, 2024 at 15:00 UTC and ending on May 20, 2024 at 14:59 UTC.

In this Challenge, you will have a unique chance:

  • to compete with top programmers globally

  • to solve 1 exciting problem prepared by Huawei

  • to win amazing prizes from Huawei!

It is an individual competition.

2023 Post World Finals Online ICPC Challenge powered by Huawei:

Start: May 6, 2024 15:00 UTC

Finish: May 20, 2024 14:59 UTC

We hope you'll enjoy this complex Challenge!

Problem

We are glad to offer you an exciting challenge, “Accuracy-preserving summation algorithm”, prepared by the Huawei Computing Product Line.

With this challenge, we focus on algorithms for summing floating-point numbers in different precisions, with the goal of using the lowest possible precision without losing too much accuracy in the end result. This problem arises in high-performance computing, as lower precision can be computed faster. At the same time, lower precision also loses accuracy faster and can even lose it completely. Finding the right balance between fast low-precision calculations and higher-precision intermediate summations is a challenging task on modern architectures, and you will have an opportunity to try addressing this challenge.
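To make the trade-off concrete, here is a minimal illustration of my own (not the contest's checker): a million tiny increments vanish entirely in a binary32 accumulator but survive in binary64.

```cpp
// A binary32 accumulator silently drops addends smaller than half an ulp of
// the running sum: 1e-8 is far below ulp(1.0f) ~ 1.19e-7, so every addition
// rounds back to 1.0f. In binary64 the same additions are all captured.
float sum_float_demo() {
    float s = 1.0f;
    for (int i = 0; i < 1000000; ++i) s += 1e-8f;
    return s;                 // still exactly 1.0f
}

double sum_double_demo() {
    double s = 1.0;
    for (int i = 0; i < 1000000; ++i) s += 1e-8;
    return s;                 // approximately 1.01
}
```

This is exactly why low precision is fast but dangerous: the float loop does the same million additions and keeps none of them.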

REGISTER

Prizes from Huawei

Rank Prize
Grand Prize (Rank 1): €12,000 + a trip to the 48th Annual ICPC World Finals in a guest role
First Prize (Ranks 2–10): €8,000
Second Prize (Ranks 11–30): €3,000
Third Prize (Ranks 31–60): €800
Top 200 participants: souvenir T-shirt
* If the allocated Huawei Challenge prize cannot be delivered to your region for any reason, it may be replaced by another prize (absent legal restrictions), at the discretion of the Sponsor.

Challenge Rules and Conditions

By participating in this Challenge, you agree to the Challenge Rules and Conditions of Participation.

Good luck, we hope this will be fun!

  • Vote: I like it
  • +236
  • Vote: I do not like it

»
2 months ago, # |
  Vote: I like it -18 Vote: I do not like it

Why was this postponed in the first place?

»
2 months ago, # |
  Vote: I like it -10 Vote: I do not like it

I remember a summation problem from long ago, but it got wiped from CF and I can't find it. Now it has come back with a different name.

»
2 months ago, # |
  Vote: I like it -21 Vote: I do not like it

Is it rated?

  • »
    »
    2 months ago, # ^ |
      Vote: I like it -21 Vote: I do not like it

    No

  • »
    »
    2 weeks ago, # ^ |
      Vote: I like it -76 Vote: I do not like it

I think it's rated, because a Carrot extension column called (just now) has appeared in the standings table.

»
5 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

Do I need to register somewhere other than CF, or is CF registration for the event sufficient to be eligible for the prizes?

  • »
    »
    5 weeks ago, # ^ |
      Vote: I like it +11 Vote: I do not like it

    According to para. 8 of the challenge rules, "only participants who entered a valid ICPC username (email address) during registration are eligible for recognition as a Challenge winner", which means (if I remember correctly) that you need your codeforces email to match your icpc.global email. It should be possible to register there even if you are not icpc-eligible (e.g. not a student).

You can probably put this off and only do it after the contest ends, if you are among the winners.

    • »
      »
      »
      4 weeks ago, # ^ |
        Vote: I like it -8 Vote: I do not like it

So we should have an account on the ICPC website?

      • »
        »
        »
        »
        4 weeks ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Yes, I think so.

      • »
        »
        »
        »
        4 weeks ago, # ^ |
        Rev. 2   Vote: I like it 0 Vote: I do not like it

You can participate whether or not you have an account. If you don't have one, you'll be invited to create one after the contest.

»
4 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

Best of luck to all the participants.

»
4 weeks ago, # |
  Vote: I like it -23 Vote: I do not like it

Hello! This contest has been such a great pleasure to complete! Thank you to all the writers, editors, and testers :) Such a great contest.

»
4 weeks ago, # |
  Vote: I like it -15 Vote: I do not like it

How many tasks are there in this contest?

»
4 weeks ago, # |
  Vote: I like it +18 Vote: I do not like it

What about the T-shirts for the last challenge?

  • »
    »
    4 weeks ago, # ^ |
      Vote: I like it -10 Vote: I do not like it

    I never got mine. Did anyone?

  • »
    »
    3 days ago, # ^ |
      Vote: I like it +10 Vote: I do not like it

    Thanks for the question. We realize the wait has been too long. As mentioned earlier, there were some manufacturing challenges (pardon the pun). The t-shirts, we understand, have finally arrived in EACH region globally for all the top 200 ranked contestants. Each region is now delivering the t-shirts, and we hope you will receive it soon. On behalf of Huawei, thank you for your patience!

»
4 weeks ago, # |
  Vote: I like it -9 Vote: I do not like it

What should I learn to reach the level where I can solve this kind of problem?

»
4 weeks ago, # |
  Vote: I like it -44 Vote: I do not like it

Is it rated?

»
4 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

I have to say, I have never spent as much as €12,000 in my whole life.

»
4 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

Can I participate in other rated contests during the two-week challenge?

»
4 weeks ago, # |
  Vote: I like it +37 Vote: I do not like it

When will I receive the T-shirt I won last round?

»
4 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

What is the perfect score?

»
4 weeks ago, # |
  Vote: I like it +88 Vote: I do not like it

There are a lot of minor details about the scoring process that are missing from the statement. Do you plan to release a local scorer to clarify some of them? Thank you.

»
4 weeks ago, # |
  Vote: I like it +11 Vote: I do not like it

The testing queue is so long, I need to wait 10 minutes to get the results.

»
4 weeks ago, # |
  Vote: I like it +27 Vote: I do not like it

DelayForces:( I have been waiting for judging for 25 minutes, but it is still "In queue" now.

  • »
    »
    4 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    For some reason this is happening to random submissions :(

»
4 weeks ago, # |
  Vote: I like it +3 Vote: I do not like it

Maybe the next time we will have to write an evaluator for codeforces :)

  • »
    »
    4 weeks ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

Or write a program to predict (guess) a proper time to submit, to avoid being stuck in the queue.

»
4 weeks ago, # |
Rev. 2   Vote: I like it +12 Vote: I do not like it

Can someone explain in what order Codeforces judges submissions? I sent a submission 1 hour ago and it's still in queue, while more recent ones have already been judged.

»
3 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

I just noticed the grader is extremely fast; can we expect the same performance on the hidden tests?

»
3 weeks ago, # |
  Vote: I like it -42 Vote: I do not like it

How hard is it to reach 3000 points? How long would code achieving such a score be?

»
3 weeks ago, # |
  Vote: I like it +20 Vote: I do not like it

The time limit is "10s" just for adding 1000000 double numbers? I am confused: what is the point of solving this? Spending 10s of CPU time to generate a non-reusable "program" for one specific input. Am I missing something important? I really do not understand the "real" problem behind this "abstraction".

  • »
    »
    3 weeks ago, # ^ |
      Vote: I like it +48 Vote: I do not like it

Not only that, they don't reveal many details which are very important, so you have to guess them. If you want your "real" problem solved, you don't hide any details; you provide a local tester and a few tests (or a test generation method).

  • »
    »
    3 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I second this

»
3 weeks ago, # |
  Vote: I like it +102 Vote: I do not like it

You might think that, because I'm in the lead, I've understood the task statement.

I assure you: that's not the case. I have no idea how to add two fp16 numbers the "right" way, by which I mean the way used in the evaluator. Especially subnormal fp16.

  • »
    »
    3 weeks ago, # ^ |
      Vote: I like it +31 Vote: I do not like it

    I would like to thank the organizers for the latest announcement.

  • »
    »
    3 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Still have no idea how they cast fp64 to fp32 or sum fp32.

    • »
      »
      »
      3 weeks ago, # ^ |
        Vote: I like it +48 Vote: I do not like it

      Two doubles -> Two floats -> Two doubles -> Sum

      • »
        »
        »
        »
        3 weeks ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

Oohhh, thanks a lot! And they do not cast the sum to float? Anyway, I will try when I have access to a computer.
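If that casting chain is right, a single fp32-type addition can be sketched as below (hypothetical helper name; whether the final result is rounded back to binary32 was exactly the open question in this thread):

```cpp
// Hypothetical model of the chain "two doubles -> two floats ->
// two doubles -> sum": each binary64 operand is rounded to binary32,
// widened back to binary64, and the addition itself happens in binary64.
double add_as_fp32(double a, double b) {
    double a32 = static_cast<double>(static_cast<float>(a));
    double b32 = static_cast<double>(static_cast<float>(b));
    return a32 + b32;   // whether this sum is then rounded back to
                        // binary32 is an open question in the thread
}
```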

»
3 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

The standings change so fast!

»
3 weeks ago, # |
  Vote: I like it +69 Vote: I do not like it

Organizers aren't answering questions in the contest system so I will ask here:

1) In order to compute the perfect sum $$$S_e$$$, do we read input values as fp64 (double) or something more precise?
2) What happens when we convert fp16 infinity back to fp64? Does it become infinity or e.g. the max fp16 value?

pinging @ICPCNews

  • »
    »
    3 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    1) Note that the actual binary64 value is not always exactly equal to the given decimal value. Instead, the actual given value is the closest number representable in binary64. When reading the input, most programming languages will do the conversion for you automatically.

Based on this statement, it is enough to just read the input value into a double, without any extra actions.

    2) When the algorithm performs an addition in type T, the two arguments are converted into type T, and the result is also of type T. ... The addition of two fp16 happens in fp64, but the result is immediately converted to fp16.

    If I understand correctly: double a, double b -> half a, half b -> double a, double b -> double sum -> half sum

    • »
      »
      »
      3 weeks ago, # ^ |
        Vote: I like it +7 Vote: I do not like it

      This doesn't answer my doubts :(

      1) I'm asking about the definition of $$$S_e$$$, which organizers calculate "as precisely as [they] can". Is it the sum of input values, each rounded to binary64 first? Or is it the precise sum of input values, only at the end rounded to binary 64?

      2) What happens when we convert fp16 infinity back to fp64?

      • »
        »
        »
        »
        3 weeks ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        1) I also struggle with understanding $$$S_e$$$, but if " the actual given value is the closest number representable in binary64 ", I suppose that $$$S_e$$$ is computed based on the "given values".

        2) Based on my submissions, fp16 infinity converts into fp64 infinity. And it would be weird if infinity converts into some non-infinity value.

      • »
        »
        »
        »
        3 weeks ago, # ^ |
          Vote: I like it +25 Vote: I do not like it

        My best guess is the following: the expected sum S_e is the exact sum of input values converted to binary64. The tester has a bug, and does not calculate the sum correctly for 3 of the 76 test cases.

      • »
        »
        »
        »
        3 weeks ago, # ^ |
          Vote: I like it +10 Vote: I do not like it

2) Shouldn't it be the same as what happens when we convert a float's infinity to a double?

  • »
    »
    3 weeks ago, # ^ |
      Vote: I like it +29 Vote: I do not like it

    For 1), I have a simple algorithm that computes $$$S_e$$$. It may not give the theoretical $$$S_e$$$, but it seems to give the same $$$S_e$$$ as the evaluator. It may be that "as precisely as we can" != "as precisely as possible". I'm not sure I'm allowed to reveal more details.

    I have no idea for 2).

    • »
      »
      »
      3 weeks ago, # ^ |
        Vote: I like it +19 Vote: I do not like it

      "... I'm not sure I'm allowed to reveal more details..."

      Once the contest comes to an end, feel free to reveal all the details (procedure, source code, etc.) :). I'm eager to see a good solution in action.

»
3 weeks ago, # |
  Vote: I like it +54 Vote: I do not like it

The fifth Huawei-CF optimization challenge was halfway through... Participants were still guessing evaluation details...

»
3 weeks ago, # |
  Vote: I like it +38 Vote: I do not like it

I also asked questions, but organizers aren't answering the questions...

The announcement says "When a value is fp64 or fp32, we just use the native data types.", but I don't think this is very clear. IEEE 754 allows several different rounding algorithms, as explained in https://en.wikipedia.org/wiki/IEEE_754#Rounding_rules , and the behavior depends on the language or compiler. (Python users may be in trouble because it doesn't support fp32 calculation in built-in functions.)

  • »
    »
    3 weeks ago, # ^ |
      Vote: I like it +19 Vote: I do not like it

    It seems to be consistent with C++ double and float.

»
2 weeks ago, # |
  Vote: I like it +16 Vote: I do not like it

In my humble opinion, these recent announcements are not entirely fair to those individuals in top positions, as they were clever enough to unravel the hidden aspects of the contest unaided. Now, they may face additional competition for the top spots. While such adjustments might be acceptable in a 'just for fun' contest, they seem unjustified here, especially considering the significant monetary stakes involved.

Fear not, top performers! I pose no threat to you! I currently reside at the bottom of the leaderboard and intend to stay there for the duration of the competition. :)

Good luck!

»
2 weeks ago, # |
  Vote: I like it +110 Vote: I do not like it

Come on... releasing the checker and some tests after 10 days of the competition? +Extension

I did most of the work in the first week, and planned to do very little this week. Good luck with that plan now, right?

  • »
    »
    2 weeks ago, # ^ |
      Vote: I like it -66 Vote: I do not like it

    We know that not every solution is the best solution, because there are a thousand Hamlets for a thousand readers.

The current release plan is probably the one that best meets most candidates' needs. I apologize for the impact on you.

    The contest is still on, who knows what the final result will be?

»
2 weeks ago, # |
Rev. 2   Vote: I like it +30 Vote: I do not like it

Why?
"Next, we calculate the expected sum as precisely as we can"
"//'Exact' answer based on Kahan summation"
Or is Kahan only what's shown, while the expected sum is truly exact during testing?
I hope it is.

  • »
    »
    2 weeks ago, # ^ |
      Vote: I like it +31 Vote: I do not like it

The expected sum is not always exact. It is calculated by Kahan summation, as in the published checker implementation.
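For reference, the textbook compensated-summation scheme looks like this (a standard Kahan implementation, not the checker's actual source; note that aggressive optimizations or excess precision can defeat the compensation):

```cpp
#include <vector>

// Classic Kahan compensated summation. Each step recovers the low-order
// bits lost by the plain addition into the correction term c.
double kahan_sum(const std::vector<double>& xs) {
    double sum = 0.0, c = 0.0;      // c accumulates the lost low-order bits
    for (double x : xs) {
        double y = x - c;           // apply the previous correction
        double t = sum + y;         // low bits of y may be lost here...
        c = (t - sum) - y;          // ...and are recovered into c
        sum = t;
    }
    return sum;
}
```

For example, adding a million copies of 1e-16 to 1.0 leaves a naive binary64 accumulator at exactly 1.0 (each addend is below half an ulp), while Kahan summation recovers the expected ~1.0000000001.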

»
2 weeks ago, # |
  Vote: I like it +5 Vote: I do not like it

My clarification request from a few days ago:
Q. In scoring, "Next, we calculate the expected sum S_e as precisely as we can". Does this mean S_e is the exact sum (after converting the inputs into binary64)?
A. The intent is that it is the closest possible binary64 value to the actual (infinite precision) sum.

... but then, computing the expected sum with Kahan's algorithm contradicts this answer; will the checker be fixed?

»
2 weeks ago, # |
  Vote: I like it +78 Vote: I do not like it

Is the checker numerically stable? Will it always add the same numbers to the same value?

I locally got a situation where printing extra stuff in the checker changes the sum that it computes. Maybe that's allowed in C++.

I also created a test where s{a,b,c} != s{s{a,b},c}, but I don't understand float->double->float casting enough to say if this is incorrect.

  • »
    »
    2 weeks ago, # ^ |
    Rev. 2   Vote: I like it +7 Vote: I do not like it

    The checker seems deterministic, why would the order of operations change?

    Could you share this situation or describe how to reproduce it? I don't think it should be legal in this code in standard C++, at least with IEEE 754-compliant FP arithmetic.

    Could you share this test? float->double->float roundtrip should be lossless.

    Edit: The example yosupo provided answers all my questions; "GNU G++17 7.3.0" doesn't strictly adhere to IEEE 754.

    • »
      »
      »
      2 weeks ago, # ^ |
      Rev. 3   Vote: I like it +34 Vote: I do not like it
      #include <iostream>
      #include <stack>
      #include <cstdio>   // for printf
      
      
      using namespace std;
      
      // Calculate the fp32 sum sequence like {s:1,2,3}
      double calculateFp32(stack<double>& nums) {
          float currResultSingle = 0;
          while (!nums.empty()) {
              currResultSingle += static_cast<float>(nums.top());
              nums.pop();
          }
          return static_cast<double>(currResultSingle);
      }
      
      int main() {
          stack<double> s({1.0e-10, 1.0e-10, 1.0e0});
          double answer = calculateFp32(s);
          printf("%.20lf\n", answer);
      }
      

      This code prints 1.00000000020000001655 under "GNU G++17 7.3.0", and prints 1.00000000000000000000 under "GNU G++20 13.2". In addition to that, if we add a printf like printf("value: %.20lf\n", nums.top()); currResultSingle += static_cast<float>(nums.top());, the result changes to 1.00000000000000000000 under "GNU G++17 7.3.0".

      • »
        »
        »
        »
        2 weeks ago, # ^ |
          Vote: I like it +7 Vote: I do not like it

        Interesting. I was aware of this issue a while ago, but lately couldn't reproduce it so discarded the knowledge as no longer relevant in practice. Thank you!

  • »
    »
    9 days ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    It looks like the checker behavior is as follows:

    1. For d sums, the accumulator containing the running binary64 sum and the next binary64 number to sum are expanded to 80-bit (long double) floating-point numbers, added, and then rounded back to binary64.

    2. For s sums, the binary64 numbers to sum are rounded to binary32 and then expanded to 80 bits and added into an 80-bit accumulator, except that whenever there are a multiple of 64 numbers left to add, counting the number just added, the accumulator is rounded to binary32 and then expanded back to 80 bits. The accumulator is rounded to binary64 to give the final value of the sum.

    3. For h sums, the accumulator is a 16-bit floating-point number, according to the nonstandard format explained in the contest announcements. Each addend is truncated to the 16-bit format; then the accumulator and addend are expanded to 80 bits, added, rounded to binary64, and then truncated back to 16 bits.

    This strange behavior, including the behavior changing when you insert prints, is similar to other excess precision problems in C++, as mentioned in this blog post; it happens when the compiler produces code that uses the 80-bit 8087 registers. This problem generally doesn't occur when compiling for the 64-bit architecture as compilers use the SSE2 instructions instead.

»
11 days ago, # |
  Vote: I like it 0 Vote: I do not like it

Finish it, prof!

  • »
    »
    11 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

Come on, one more little push and we'll reach the top 20! Let's go, let's go! :)

»
11 days ago, # |
  Vote: I like it -41 Vote: I do not like it

reduce the submission rate limits NOW or I riot

  • »
    »
    11 days ago, # ^ |
      Vote: I like it -18 Vote: I do not like it

Is that the reason you are so rude? Who is afraid of you?

»
11 days ago, # |
  Vote: I like it 0 Vote: I do not like it

Will the tests really be completely different in the end or will some new tests get added?

»
11 days ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

I tried to use the Python Decimal library to calculate more accurate results, but they seem to differ from the results calculated by the checker. And Kahan summation will also lose precision. Can I assume that we are not expected to calculate the most accurate answer, but the answer that best matches the checker?

For example, with this self-defined input: 5 -1.577492e-05 -2.576098e-05 3.994e-3 100000 -99999.3233. If we load the values as doubles in Python and sum them via Decimal, the result is 0.6806524640963445590490334732164390274533616321. But through the checker the result is 0.68065246409969404339790344238281. There are several bits of difference, which leads to a big score downgrade in accuracy.

  • »
    »
    11 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

Yes, you can assume they use the Kahan algorithm to compute $$$S_e$$$, not the exact sum.

    • »
      »
      »
      11 days ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

Thanks for confirming. It took me quite some time to realize this, and I was struggling to match the checker's behavior.

»
10 days ago, # |
  Vote: I like it +1 Vote: I do not like it

The queue is stuck. Neither new solutions nor the ones already being tested are progressing at the moment.

»
10 days ago, # |
Rev. 2   Vote: I like it -6 Vote: I do not like it

You are right, but we can't get verdicts as fast as before now.

So, can Codeforces allocate only 10% of the judging capacity to other submissions?

»
9 days ago, # |
  Vote: I like it +5 Vote: I do not like it

In this contest, you can never assume that you have achieved an absolute lead.

»
9 days ago, # |
  Vote: I like it 0 Vote: I do not like it

Dear problem managers, I would really appreciate it if you would put some restrictions on submissions.

  • »
    »
    9 days ago, # ^ |
      Vote: I like it +6 Vote: I do not like it

It won't change much at this point. The queue is almost 1 hour, and the contest ends in 2 hours anyway.

    • »
      »
      »
      9 days ago, # ^ |
        Vote: I like it +7 Vote: I do not like it

Quick unrelated question: your profile says you're working at Huawei; doesn't that prevent you from participating?

      • »
        »
        »
        »
        9 days ago, # ^ |
          Vote: I like it +37 Vote: I do not like it

I use it on CF as my affiliation, but I'm not an employee (and I don't have access to the codebase). I sometimes help with events (B2B), e.g. workshops for their interns last year, a summer online IOI camp, and an upcoming China trip for ICPC students.

Btw, I hope they will let me organize future CF Challenges. There were a lot of issues with this one.

»
9 days ago, # |
  Vote: I like it 0 Vote: I do not like it

The standings are on fire!

»
9 days ago, # |
  Vote: I like it -6 Vote: I do not like it

Let’s send our North Korean homie Sung.An to world finals.

»
9 days ago, # |
  Vote: I like it 0 Vote: I do not like it

When can we see others' solutions?

  • »
    »
    9 days ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    Yes! Please, please, please! Now, as everyone reading this post envisions me holding a hat and begging for a tip, what a grotesque image of myself! Hahaha :).

  • »
    »
    8 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

Never. Every problem Huawei poses has a practical application in their proprietary research, so they won't make it open source.

»
9 days ago, # |
  Vote: I like it +35 Vote: I do not like it

Here is a graph showing ranking on the leaderboard vs. score, and how it changed between May 12 and just before the end of the contest, on May 23.

Here is an enlargement of the upper left corner.

  • »
    »
    6 days ago, # ^ |
      Vote: I like it +20 Vote: I do not like it

    Here are the preliminary scores (x axis) shown against the final scores (y axis):

»
9 days ago, # |
Rev. 2   Vote: I like it +56 Vote: I do not like it

I have provisional 6th place. My main idea for 5000+ points:

Build a binary tree, make sure to keep every block of 16 elements as one subtree. Greedily assign the fastest possible operation in every node (Half, Single, Double). Never decrease the type from child to parent. In random order, iterate leaves and greedily upgrade the operation type if it decreases the total error (this requires updating the path to the root, like in a segment tree). Now use meet in the middle: for each child of the root, consider $$$K \approx 10^5$$$ possibilities of what few changes we should make to slightly modify the result. Sort the possible results in two children of the root, use two pointers to find the best pair. Like in birthday paradox, this is actually like considering $$$K^2$$$ possibilities.
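The meet-in-the-middle pairing step can be sketched like this (a hypothetical helper illustrating only the sort + binary-search pairing over candidate deltas, not the full solution):

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Given candidate result adjustments A (left child of the root) and B (right
// child), find the pair (a, b) whose combined effect a + b is closest to the
// desired target. Sorting B lets each lookup be a binary search, so K
// candidates per side effectively cover K^2 combinations.
std::pair<double, double> best_pair(const std::vector<double>& A,
                                    std::vector<double> B, double target) {
    std::sort(B.begin(), B.end());
    double bestErr = INFINITY;
    std::pair<double, double> best{0.0, 0.0};
    for (double a : A) {
        // the closest b is either the first element >= target - a
        // or its predecessor; check both
        auto it = std::lower_bound(B.begin(), B.end(), target - a);
        for (auto c : {it, it == B.begin() ? it : std::prev(it)}) {
            if (c == B.end()) continue;
            double err = std::fabs(a + *c - target);
            if (err < bestErr) { bestErr = err; best = {a, *c}; }
        }
    }
    return best;
}
```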

Improvements:

  1. Make two phases of improvements: one from Half to Single, then from Single to Double. This way, each phase you need to get lucky with fewer bits.
  2. The Half (fp16) type is very special because it always rounds towards 0. If all numbers are positive, you can't use this type because you almost always decrease the final sum. There are tests where almost all numbers are small and positive. You can use the few big (positive and negative) values to manipulate the sum. In particular, $$$60000 + 0.01 = 60000$$$ and $$$-60000 + 0.01 \approx -59970$$$. You can combine those two operations and again use something like meet in the middle.
  3. Some greedy approaches like merging from smallest absolute value, or combining a positive with negative value if their absolute value is close.
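A rough model of the truncating fp16 from point 2 (my own sketch using frexp/ldexp: 11 significant bits, rounding toward zero, ignoring overflow and subnormals) reproduces the effect described above:

```cpp
#include <cmath>

// Hedged model of the contest's fp16: keep 11 significant bits (1 implicit +
// 10 stored), truncating toward zero as described above. Overflow to
// infinity and subnormals are ignored in this sketch.
double to_fp16_trunc(double x) {
    if (x == 0.0 || !std::isfinite(x)) return x;
    int e;
    double m = std::frexp(x, &e);              // x = m * 2^e, 0.5 <= |m| < 1
    double t = std::trunc(std::ldexp(m, 11));  // truncate mantissa to 11 bits
    return std::ldexp(t, e - 11);
}

// One fp16-type addition as discussed in this thread: both operands are
// truncated to fp16, added in binary64, and the result truncated back.
double fp16_add(double a, double b) {
    return to_fp16_trunc(to_fp16_trunc(a) + to_fp16_trunc(b));
}
```

In this simplified model, fp16_add(60000, 0.01) stays at 60000, while fp16_add(-60000, 0.01) jumps to -59968, close to the ≈ -59970 quoted above.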

I'm very afraid that the final tests will be so different that my algorithm (2) will be useless.

And I have no idea how the top 1-2 achieved their scores.

scores per test
  • »
    »
    9 days ago, # ^ |
      Vote: I like it +18 Vote: I do not like it

As I understand it, the dfs order of leaves in your solution is always 0, 1, 2, etc.? In other words, do you even change their order? If not, then what is the point of "make sure to keep every block of 16 elements as one subtree"?

    • »
      »
      »
      9 days ago, # ^ |
      Rev. 3   Vote: I like it 0 Vote: I do not like it

      That's a reasonable question. The point (3) describes some examples of changing order. Though 5000+ points is possible with just 0..n-1 order.

      Some other small improvement: if I'm extremely close to the final sum, I collect nodes close to the root, shuffle them, rebuild that part of the tree, calculate the new sum and see if it's better. This approach shifts around bigger blocks, e.g. sizes 1024. And I try random orders for small $$$N$$$, basically ignoring the order penalty.

      • »
        »
        »
        »
        9 days ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        By the way, if you split everything into blocks of size 16 and then shuffle them, then you will probably have about $$$(16-k)n/32$$$ cache misses, where $$$k$$$ is the size of the last block.

        • »
          »
          »
          »
          »
          9 days ago, # ^ |
            Vote: I like it +44 Vote: I do not like it

          Yup. I always use the uneven block last. I came up with a nice hack: don't worry about it in the solution, but (when printing the tree) first go to child with size divisible by 16.

  • »
    »
    9 days ago, # ^ |
      Vote: I like it +6 Vote: I do not like it

    Thank you very much for sharing your approach, and congratulations on a top result!

    Can you also share the scores you got on individual tests?

  • »
    »
    9 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    For three of the tests (#72-74) the value of $$$S_e$$$ used by the checker was wrong, meaning that it wasn't the value of the infinitely precise sum rounded to the nearest binary64 number. Did this make any difference? I had to special-case these tests.

    • »
      »
      »
      9 days ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

I think I didn't do anything particular for those tests. Once the checker was published, I just copied their Kahan sum to compute the target sum. I don't remember if my scores improved thanks to that.

  • »
    »
    9 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    If you dont mind, can you please share your code for the final submission?

  • »
    »
    9 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

Hi! First of all, congratulations on the amazing score and performance you achieved in this contest! I noticed something interesting when looking at the top submissions: you had a lot of submissions that achieved 1920 points, which is the same as a simple solution that linearly iterates over the numbers. Could you please tell me what idea or problem was behind this score?

    • »
      »
      »
      8 days ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Data mining. I used asserts and memory allocation to get some information about tests. You can read about it in the past Huawei Challenges.

      • »
        »
        »
        »
        6 days ago, # ^ |
        Rev. 2   Vote: I like it -10 Vote: I do not like it

Is there any point in getting information about the tests, if our solution will be judged on the final tests?

  • »
    »
    4 days ago, # ^ |
    Rev. 3   Vote: I like it 0 Vote: I do not like it

Can you also describe how you picked which operation to use for a node (Half, Single or Double)? I see that you wrote "Greedily assign the fastest possible operation", but I couldn't come up with any greedy that improved my score (instead of decreasing it).

And I am very surprised that 5000+ points can be obtained using only the order 0, 1, 2, ..., n - 1

    • »
      »
      »
      4 days ago, # ^ |
      Rev. 3   Vote: I like it +18 Vote: I do not like it

      First I greedily assign the fastest operation that doesn't compute infinity. So:

      if (max({abs(left), abs(right), abs(left+right)}) <= 131000 && both children used HALF) USE HALF;
      else if (max(...) <= 1e38 && both children used HALF or SINGLE) USE SINGLE;  
      else use DOUBLE;
      

      This obviously creates a big final error and a terrible score, much worse than using only DOUBLE operations. In my previous comment (starting from "In random order"), I explained how I upgrade those operations (HALF->SINGLE and SINGLE->DOUBLE) to decrease the error, eventually getting it down to 0, usually.

      Think about this easier problem: Given a sequence of random values in range 1..1e9, insert operators + and - between them to get a sum equal to 0. Solution: insert operators at random first, then flip some of them to slightly improve the total sum, then use MITM to consider a lot of possible sets of changes at the same time, hopefully getting the sum 0.

      And I am very surprised that 5000+ points can be obtained by only using the following order: 0, 1, 2, ..., n — 1

      Note that it's the order of leaves. We build a (balanced) binary tree.
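The "easier problem" above can be sketched as a plain local search (the MITM refinement is omitted, and accepting equal-cost flips is my own assumption to keep the walk moving across plateaus):

```cpp
#include <cstdlib>
#include <vector>

// Choose signs s_i in {+1, -1} for the values so that the signed sum gets
// close to zero: start with all +, then randomly flip one sign at a time,
// keeping flips that do not increase |sum|. Returns the final signed sum.
long long flip_to_zero(const std::vector<long long>& v, std::vector<int>& sign,
                       int iters) {
    sign.assign(v.size(), 1);
    long long sum = 0;
    for (long long x : v) sum += x;
    unsigned rng = 12345u;                          // tiny deterministic LCG
    for (int it = 0; it < iters; ++it) {
        rng = rng * 1664525u + 1013904223u;
        std::size_t i = rng % v.size();
        long long cand = sum - 2 * sign[i] * v[i];  // effect of flipping s_i
        if (std::llabs(cand) <= std::llabs(sum)) {  // accept non-worsening
            sum = cand;
            sign[i] = -sign[i];
        }
    }
    return sum;
}
```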

      • »
        »
        »
        »
        4 days ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

Okay, I got it. Initially, I didn't get the idea of updating the path to the root. I did the same thing (constructed a binary tree), but as I said, my best score was when all my operations were DOUBLE.

        Thank you so much for the explanation and congrats :)

»
9 days ago, # |
Rev. 2   Vote: I like it +20 Vote: I do not like it
My scores
  • »
    »
    9 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    If you don't mind, can you please explain your approach?

  • »
    »
    9 days ago, # ^ |
      Vote: I like it +10 Vote: I do not like it

    My solution is based on maintaining a list of numbers, with the invariant that their exact sum is the target sum. The goal is then to reduce this list to a single element. The transitions are:

    • Take two numbers, and replace them by their (fp16,fp32,fp64) sum.
    • Replace a sublist of the numbers by their fp16/fp32 approximation.
    • A few other heuristics, e.g. trying to create fp16 values by adding positive and negative (fp32/fp64) values.

    The main subroutine takes a list of pairs (precise value, approximate value), and produces a list with as many approximate values as possible. For each input pair, the error is the difference between the two values.

    • Without loss of generality, assume that the total error is positive.
    • Initialize two priority queues: one holds the positive errors, the other one contains the negative errors.
    • Until the queues are empty, repeat the following:
    1. Pop the top element of the positive queue. If it is less than the total error, then subtract it from the total error.
    2. Otherwise, pop the top element of the negative queue, add it to the positive element, and push the result to the correct queue.

    At the end of this process, we have written the total error as the sum of lots of small positive errors. Each of these small errors corresponds to a sublist of the input list of pairs. For each such sublist, we check whether replacing all precise values by the corresponding approximate values preserves the target sum.
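
    A minimal reconstruction of that queue loop on raw error values (my own sketch from the description; `decompose`, the eps tolerance and the tie-breaking are assumptions). It assumes the per-group errors sum exactly to the total error, and returns the committed positive pieces, which add up to that total, while the merged groups cancel to (near) zero:

    ```cpp
    #include <queue>
    #include <vector>
    #include <cmath>
    #include <functional>

    // Repeatedly either commit a positive error that fits into the remaining
    // total, or merge the top positive error with a negative one. Committed
    // pieces sum to `total`; merged groups cancel and can be approximated.
    std::vector<double> decompose(const std::vector<double>& errs, double total) {
        const double eps = 1e-12;
        std::priority_queue<double> pos;  // largest positive error first
        std::priority_queue<double, std::vector<double>, std::greater<double>> neg; // most negative first
        for (double e : errs) {
            if (e > eps) pos.push(e);
            else if (e < -eps) neg.push(e);
        }
        std::vector<double> committed;
        while (!pos.empty()) {
            double p = pos.top(); pos.pop();
            if (p <= total + eps || neg.empty()) {  // this piece fits: commit it
                committed.push_back(p);
                total -= p;
            } else {                                // too big: merge with a negative error
                double m = p + neg.top(); neg.pop();
                if (m > eps) pos.push(m);
                else if (m < -eps) neg.push(m);
                // |m| <= eps: the merged group cancels out
            }
        }
        return committed;
    }
    ```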

    This subroutine is used in various places, most importantly to create blocks of 16 numbers. Each block is initialized with the fp64 sum of the 16 numbers. (With some randomization of the permutation of the 16 numbers, it is possible to reach the target sum). We want to instead compute each block using fp16 and/or fp32 operations. We can compute some candidate tree for each block; such a tree determines an approximate value for the block. The above subroutine will try to use as many of these approximate values as possible.

»
9 days ago, # |
  Vote: I like it +14 Vote: I do not like it

I didn't achieve good scores, but look at the spreadsheet I used to analyze my solutions! (this was the first time I used such a simple technique as "visualize my results")

»
9 days ago, # |
  Vote: I like it +11 Vote: I do not like it

I want to share the testcase data I extracted using the memory-allocation trick I learnt from the last ICPC challenge. It shows the distribution and some features of each testcase. The values are not exact, since I applied square root or log10 to the large ones, but they give the rough magnitude. Here ld is long double, d is double, s is single, h is half. Also, log is log10 and error is the accuracy error calculated from the score.

»
9 days ago, # |
  Vote: I like it +6 Vote: I do not like it

Are the standings final? I don’t believe the final tests have been run, but the scoreboard says final standings.

  • »
    »
    9 days ago, # ^ |
      Vote: I like it +1 Vote: I do not like it

    The standings are not final. The final solutions will be run against a hidden set of testcases.

»
9 days ago, # |
Rev. 33   Vote: I like it +35 Vote: I do not like it

There is a bug in the mingw-w64 implementation of scanf, which is used by the checker (the Codeforces testing system runs on Windows). If you're using any method of input other than scanf and its relatives, you'll sometimes get a different last bit when reading numbers compared to what the checker sees. This results in a difference in the final sum between you and the checker if this bit is high enough. In this problem 1 bit of difference means you'll get half the points for the test. The probability of the bug is 1/4096 for numbers that are not representable exactly by long double (most decimals, e.g. 0.1, 0.12, 0.123, ...). In my tests the probability was around 1/4300. I found this bug because my sum and the checker's sum differed in the last bit on some of my random tests.

P.S. Apparently the bug only occurs if the number is written with fewer than 17 digits of precision. Otherwise we get 0 in the first extra long double mantissa bit and everything is A-OK. So the bug is absent in all numbers from tests #16 and #10, because they have 19 digits of precision. But if there are long tests like #5 (with 7 digits of precision), there will be problems

P.P.S. Don't blame me for not telling everyone during contest. I wasn't aware of it until the last day.


Example: godbolt

#include <cstdio>

int main() {
    double val          =  1337.1337221 ;
    const char *str_val = "1337.1337221";
    
    double scanned;
    sscanf(str_val, "%lf", &scanned);
    if (scanned != val) {
        printf("⛔ %s ( %.13la != %.13la )\n", str_val, scanned, val);
    } else {
        printf("✅ %s\n", str_val);
    }
}

If you're using mingw-w64 (like Codeforces does), this will print the ⛔ line.


Explanation of example

More examples


Fun fact: there is no bug if you #include <stdio.h> instead of <cstdio>, but only if there are no other C++ headers included before it. The solution to this mystery lies in the depths of libstdc++ config/os/mingw32-w64/os_defines.h:l51

  • »
    »
    8 days ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    Update:

    The bug is absent in preliminary tests

    Code
    Results

    Reported to mingw-w64/bugs/989

    Reported to testlib/issues/203

»
9 days ago, # |
  Vote: I like it +25 Vote: I do not like it

I used some very funny ideas:

Suppose we want to compute the exact sum $$$S_e$$$ with a mantissa of 52 bits. Then I first search for two input values $$$x$$$ and $$$y$$$ such that summing them in double precision gives a number whose mantissa has the same bits as the last 29 bits of $$$S_e$$$. As in most cases $$$N^2 \gg 2^{29}$$$, $$$x$$$ and $$$y$$$ can easily be found in $$$\mathcal{O}(N \log N)$$$.

Now I can build a tree with the $$$N-2$$$ remaining values. My only condition for building this tree is $$$\texttt{float(T) + (x+y)} = S_e$$$, so that I just have to compute the sum of the remaining values with a precision of 23 bits instead of 52 bits. That way it's easier to use float summations.

But we can also repeat this process if $$$T$$$ can be cast to fp16. It's possible to search for two input values $$$z$$$ and $$$w$$$ such that $$$z+w$$$ has the same bits as the last $$$13$$$ bits of $$$float(T)$$$. Then I can compute a tree $$$T'$$$ with the $$$N-4$$$ remaining values such that $$$\texttt{float(fp16(T') + (z+w))} = \texttt{float(T)}$$$. That way I can compute $$$T'$$$ with 10 bits of precision, making it easier to use fp16 summation.

In the end my tree looks like $$$\verb|{d: {s: {d: {h:T'}, {d: z, w}}}, {d: x, y}}|$$$

  • »
    »
    9 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Very interesting idea! Could you share your results on the tests because I'm curious how your solution did, especially when it didn't find a sum of 2 numbers with the required last bits. Or did you have some improvements to handle this case?

    • »
      »
      »
      9 days ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it
      results
    • »
      »
      »
      9 days ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      For the last 29 bits, I always found a correct pair except for cases in the 72-78 range and the 1-12 range (and 43 if I'm not wrong). For the last 13 bits of $$$\texttt{float(T)}$$$ I only search when $$$T$$$ can be cast to fp16, and I always succeed except for cases 32, 37, 39 and 40, where I have no clue why no pair was found :(

      • »
        »
        »
        »
        9 days ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Thanks! I think the tests 72-78 were specifically created to counter a lot of solutions. You can even see that by the n given in those tests (n = 100, 1000, 100000, while in the others n is more random). For tests 1-12 and 43, n is small (<= 10000), so perhaps your solution didn't have a big enough sample of numbers to find a pair. Weird for the fp16 cases, but maybe it's related to the way approximation works in fp16, as in the example Errichto gave: −60000 + 0.01 ≈ −59970. So overall the solution consistently finds a pair, which is very satisfying haha

  • »
    »
    9 days ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    Lol, this is so clever and funny at the same time.

»
9 days ago, # |
Rev. 8   Vote: I like it 0 Vote: I do not like it

There was no need for a custom janky fp16. Both Intel and AMD have had the F16C extension since around 2011-2012; it's supported by GCC 12+ and Clang 15+ as _Float16, and since C++23 as std::float16_t.
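
For reference, the kind of hand-rolled fp16 rounding that _Float16 replaces might look like this portable sketch (my own illustration: subnormals are ignored and ties round away from zero instead of to even, so it is not bit-exact IEEE binary16):

```cpp
#include <cmath>
#include <limits>

// Round a double to the nearest fp16-representable value (sketch: no
// subnormal handling, round-half-away-from-zero instead of round-to-even).
double toHalf(double x) {
    if (x == 0.0 || !std::isfinite(x)) return x;
    int e;
    double m = std::frexp(x, &e);          // x = m * 2^e, |m| in [0.5, 1)
    if (e > 16)                            // beyond the fp16 exponent range
        return x > 0 ? std::numeric_limits<double>::infinity()
                     : -std::numeric_limits<double>::infinity();
    m = std::round(m * 2048.0) / 2048.0;   // keep 11 significant bits
    return std::ldexp(m, e);
}
```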

»
9 days ago, # |
Rev. 2   Vote: I like it +15 Vote: I do not like it

For small tests, I ran some simple SA (simulated annealing) where I randomly move nodes around (keeping blocks of 16) and change their type. For most tests, I tried something a bit smarter:

  • First I check what accuracy is needed in terms of powers of 2 (it is the exponent of the final sum minus 52).
  • For each input value, I compute the error if we cast it to float32 or float16. For each bit above the threshold computed above, I pick two values whose cast error has its most significant bit at that position.
  • For each pair of values, I want one that gives me the option to increase the error and one that can decrease it. If both have the same sign, I can cast one of them and leave the other one alone.
  • I move those values to the end, sorted by most significant bit (higher first)
  • I compute my initial error, which depends on how many of those values have been cast, and I try to generate a tree with an error as small as possible.
  • Finally I compute the total error, and I switch the casts on and off in order to progressively reduce the error to 0.

I needed a special case for the least significant bit where I need one of the two values to not be rounded up.

To build the tree:

  • I select an order (randomly permute blocks of 16 and randomly permute within a block),
  • I keep track of the current error
  • Each turn I merge two adjacent nodes and replace them by the result in my list. Which nodes I pick is determined by prioritizing float16 casts, and among all the options I pick the one that reduces my error the most (and if none does, one that keeps me in range of what can be corrected).

This works well for most seeds from 11 to 71 (one exception at 42 I think) and 75/76; for the others my SA works decently.

One issue here is that the fp16 cast only goes down (in absolute value), so when the input is all positive we can't really use it unless we start with numbers that are already close to a float16 value. The max score will usually be 63 there. For most other cases it's possible to achieve 81.

my scores
  • »
    »
    8 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Nice solution!

    Checker issues aside, it's great that there was such a variety of ideas.

    • »
      »
      »
      7 days ago, # ^ |
        Vote: I like it +14 Vote: I do not like it

      Thanks.

      The contest was very poorly designed, I can't imagine that's what they were expecting to get, but that aside, the problem was actually interesting with many ideas that could achieve a good result. Of course I am also biased by the final result, had I not performed well, I would just remember the headache to get it working and not the interesting ideas that came along the way =)

  • »
    »
    7 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thanks for sharing your methods! I have some trouble understanding "pick two values such that the error of the cast has its most significant bit matching". What does the error mean here? The value in fp64 minus the value in fp32? And what is the most significant bit: the exponent of the error? I just want to understand the basic idea behind this method. Thanks!

    • »
      »
      »
      6 days ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Yes, correct: the error is val - fp32(val) or val - fp16(val). And the most significant bit is its exponent, because I can basically flip that bit by casting or not casting. If I can do that for all the bits, then I can fix the final error.
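
      In code, that error and its most significant bit for the fp32 case might look like this (illustrative names, not from the problem):

      ```cpp
      #include <cmath>
      #include <climits>

      // Error introduced by rounding a double to float, and the position
      // (binary exponent) of that error's most significant bit.
      double castErr32(double v) { return v - (double)(float)v; }

      int errExp(double e) {
          return e == 0.0 ? INT_MIN : std::ilogb(e);  // ilogb ignores the sign
      }
      ```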

»
9 days ago, # |
  Vote: I like it -24 Vote: I do not like it

my best submission scored 5k+ points, but the last one was 4k+. Can I contact someone to sort this out? (

»
8 days ago, # |
  Vote: I like it 0 Vote: I do not like it

Is it still possible to register on the ICPC website? I don't know where to find the competition there.

»
8 days ago, # |
  Vote: I like it +36 Vote: I do not like it

When will the system test be conducted?

P.S.: Nice contest, and there are quite a lot of simple but efficient solutions in the comments, which really amazes me :).

»
8 days ago, # |
  Vote: I like it -21 Vote: I do not like it

I want the system to test both the last submission and the highest-scoring submission, so that people like me who scored low on the last submission don't lose the score they deserve.

  • »
    »
    8 days ago, # ^ |
      Vote: I like it +24 Vote: I do not like it

    I think that the reason it is not done this way is that it favours randomised solutions. However, I think it would be great to have an opportunity to know whether the solution failed one of the main tests due to RE/TL/ML, as is done in some other competitions.

  • »
    »
    8 days ago, # ^ |
      Vote: I like it -15 Vote: I do not like it

    I tried writing to the organizers to ask them to test my best submission, as the latest one has a very poor score, but they do not respond (

  • »
    »
    8 days ago, # ^ |
      Vote: I like it +49 Vote: I do not like it

    I think it can be unfair to tell the organizers to use one particular solution because now that the participants can communicate with each other and test the system outside the contest time, you could illegitimately know which solution is better.

    Also, would you take the max of those scores? Then people who sent their best scoring code last would be at a disadvantage since the system would test only one solution.

    • »
      »
      »
      8 days ago, # ^ |
        Vote: I like it +3 Vote: I do not like it

      I agree with you in some ways, but I wrote about the situation to the organizers almost immediately after the end of the competition, before I had a chance to test anything. I also could not resubmit my best submission because of the 10-minute limit. Besides, I am not asking them to choose some arbitrary submission, but the one that is the best and is displayed in the table right now. I spent quite a lot of effort and time on this competition (

  • »
    »
    7 days ago, # ^ |
      Vote: I like it -13 Vote: I do not like it

    I don't understand why I got downvoted. I just wanted to make my point clear; obviously I knew it couldn't be done, and I should pay for my mistakes

»
7 days ago, # |
  Vote: I like it +69 Vote: I do not like it

It seems that the final judging is over and the tests are very similar. I guess the organizers used the same generator, just with different seeds. Here are some statistics, assuming that nothing will change.

maxplus didn't lose many points and thus climbed to first place, congrats. The top 10 didn't change, but the threshold dropped by around 60-70 points. Three new people got into the top 30, and obviously three people dropped out.

Rank Name Score Preliminary
1 ($$$\color{green}{\texttt{-1}}$$$) maxplus 5721.93 ($$$\color{red}{\texttt{-7.50}}$$$) 5729.43
2 ($$$\color{green}{\texttt{-1}}$$$) sullyper 5669.12 ($$$\color{red}{\texttt{-18.20}}$$$) 5687.31
3 ($$$\color{red}{\texttt{+2}}$$$) Sung.An 5620.23 ($$$\color{red}{\texttt{-142.82}}$$$) 5763.05
4 ($$$\color{gray}{\texttt{+0}}$$$) teapotd 5613.47 ($$$\color{red}{\texttt{-35.51}}$$$) 5648.99
5 ($$$\color{green}{\texttt{-1}}$$$) Errichto 5599.94 ($$$\color{green}{\texttt{+3.62}}$$$) 5596.32
6 ($$$\color{green}{\texttt{-3}}$$$) square1001 5571.81 ($$$\color{green}{\texttt{+21.22}}$$$) 5550.59
7 ($$$\color{green}{\texttt{-1}}$$$) msmits 5562.13 ($$$\color{red}{\texttt{-0.61}}$$$) 5562.74
8 ($$$\color{red}{\texttt{+3}}$$$) RNS_KSR 5518.78 ($$$\color{red}{\texttt{-88.58}}$$$) 5607.36
9 ($$$\color{green}{\texttt{-1}}$$$) gleb.astashkin 5509.04 ($$$\color{red}{\texttt{-13.86}}$$$) 5522.9
10 ($$$\color{red}{\texttt{+3}}$$$) dbdr 5453.75 ($$$\color{red}{\texttt{-138.14}}$$$) 5591.9
top35
»
7 days ago, # |
  Vote: I like it 0 Vote: I do not like it

When I execute my code on a self-generated random test with N = 10^6, where each number has the form A.BCDEFGHIJKLMNOPQRSTeX with -300 <= X <= 300, it takes over 9 seconds to finish (and I got more than 47.14 points, which indicates perfect accuracy). However, the same code runs in less than 5 seconds on each of the organizer's tests. Does that mean the final system tests are still weak somehow?

  • »
    »
    7 days ago, # ^ |
    Rev. 4   Vote: I like it 0 Vote: I do not like it

    Most of the system tests have a low X, since you can't really cast big numbers to fp16 or fp32 without getting ±INF. You could indeed try to sum some of them until they get low enough to cast, but the odds of that actually paying off in the tests are so low there was almost no profit in doing so.

    Aside from that, reading the numbers as strings and parsing them to long doubles was the only way I found to get the same Kahan sum as the checker.

»
7 days ago, # |
  Vote: I like it +12 Vote: I do not like it

Is there going to be any plagiarism tests before the prizes are distributed or should I lose hope as I placed 205 ;(

»
6 days ago, # |
  Vote: I like it +28 Vote: I do not like it

Hi, everyone! I've finished 9th this time. And this was by some margin the hardest contest for me so far.

I had a somewhat unique approach from the start and kept faith in it until the very end. But I was struggling so hard to guess the correct way of calculating fp16 addition that I basically only started optimizing on the 11th day, after the checker's code was published :)

Let me share the idea with you. First of all, I split the input array into two parts.

  1. First part consists of 90-97.5% of the elements. The main goal is to get W as low as possible while keeping the absolute error relatively low

  2. Second part is the "compensator". Let's build a balanced tree. Leaves are elements, internal nodes are operations. All the internal operations are fp64, and only the lowest level of operations (between two leaves) may be fp32 or fp16. We can approximate the desired error almost transparently by choosing between different types of operations on the lowest level (the other operations on the path to the root are fp64, so we usually don't lose any precision there)

The main challenge is keeping the absolute error in the first part relatively low. You can notice that rounding during fp16 addition always truncates the result towards zero. That's not a problem if the array is balanced in terms of the number of positive and negative elements. So the main problem is how to use as many fp16 operations as possible when the array is not balanced at all

Without loss of generality, let's assume that the array consists only of positive numbers.

Consider following approach:

  • Split the array into some number of blocks of a predetermined size (say blockSize = 16, to get rid of the random-memory-access penalty)

  • Sum up all the elements in each block, constructing a balanced tree using as many fp16/fp32 operations as possible. We almost don't care about the resulting error at this point

  • Sum up all the blocks one by one, choosing between fp32 and fp64. We pick fp32 only if rounding helps to lower the error. The "compensated" amount at each step is the rounding error, which clearly depends on the magnitude of the addends. But this is good news: since all the elements are positive, the partial sum increases after each step, so we can compensate bigger amounts closer to the end of the block summation process

This approach allows us to get a relatively low resulting error (which can most probably be compensated by the second part) even if all the elements have the same sign. In the worst case we have to skip some of the worst (in terms of increasing error) fp16 operations during the block sum calculation
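
The block-combining step of this approach (choosing fp32 only when rounding helps) can be sketched as follows (`sumBlocks` is an illustrative name, with plain fp64 standing in for the exact reference sum):

```cpp
#include <vector>
#include <cmath>

// Combine per-block sums one by one, using an fp32 addition only when its
// rounding moves the running total closer to the exact partial sum.
double sumBlocks(const std::vector<double>& blocks) {
    double run = 0.0;    // running total, as the checker would see it
    double exact = 0.0;  // exact partial sum (here: plain fp64 reference)
    for (double b : blocks) {
        exact += b;
        double f32 = (double)(float)(run + b);  // candidate fp32 addition
        double f64 = run + b;                   // candidate fp64 addition
        run = (std::fabs(f32 - exact) < std::fabs(f64 - exact)) ? f32 : f64;
    }
    return run;
}
```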

my final scores
»
6 days ago, # |
  Vote: I like it +14 Vote: I do not like it

those who don't have positive contribution cannot make a comment with an image ****

thank you

»
6 days ago, # |
  Vote: I like it +23 Vote: I do not like it

Just so you realize how useless and impractical this task is:

Rounding numbers to fp16 never reduces the error => by replacing fp64 with fp16 you only increase the answer.

Is it logical that fp32 should also never reduce the answer? Yes... but a bunch of bugs in the authors' checker changes the problem dramatically.

How it should have been: we use fp64 to compute the answer with a small error, then fp16 and fp32 only increase it, and the task is to scrape out the points, because that is really hard to do (even ~55 on a random test). Buuuut, we have a different task. We have fp64, which can compute the answer very slightly inaccurately. We have fp16, which only increases the error. And there's fp32, which doesn't know how to behave at all.

A task in which we had to think about rounding and how it works would be at least slightly meaningful. But a task where we can make random changes to the result thanks to incorrect authors' code is a task that can never be applied anywhere.

  • »
    »
    4 days ago, # ^ |
      Vote: I like it +47 Vote: I do not like it

    If you pay attention to the changes brought by GPU chips, you may guess the original purpose of this topic. Unfortunately, some hardware conditions could not be fully simulated, so some adjustments and changes were made to the problem. Thank you very much for your suggestions; we will continue to optimize and improve. Congratulations on entering the TOP60 list.

»
16 hours ago, # |
  Vote: I like it +10 Vote: I do not like it

I was expecting a problem about minimizing the error, but it turned out to be a problem about cancelling the error while knowing the accurate answer. Still a fun problem, but it seems not as useful in practice as advertised (they said "addressing this challenge", not "a problem inspired by this challenge"). It would surprise me if the organizer could use any of the algorithms from this contest on their chips.