ICPCNews's blog

By ICPCNews, 2 months ago, In English


Hello, Codeforces!

The ICPC Challenge World Finals in Luxor is approaching for some of you, and we are delighted to provide an additional exciting opportunity to compete, open to all!

We are happy to invite you to the 2023 Post World Finals Online ICPC Challenge, powered by Huawei, starting on May 6, 2024 at 15:00 UTC and ending on May 20, 2024 at 14:59 UTC.

In this Challenge, you will have a unique chance:

  • to compete with top programmers globally

  • to solve 1 exciting problem prepared by Huawei

  • to win amazing prizes from Huawei!

It is an individual competition.

2023 Post World Finals Online ICPC Challenge powered by Huawei:

Start: May 6, 2024 15:00 UTC

Finish: May 20, 2024 14:59 UTC

We hope you'll enjoy this complex Challenge!

Problem

We are glad to offer you an exciting challenge, “Accuracy-preserving summation algorithm”, prepared by the Huawei Computing Product Line.

With this challenge, we focus on algorithms for summing floating-point numbers in different precisions, with the goal of using the lowest possible precision without losing too much accuracy in the end result. This problem arises in high-performance computing, as lower precision can be computed faster. At the same time, lower precision also loses accuracy faster and can even lose it completely. Finding the right balance between fast low-precision calculations and higher-precision intermediate summations is a challenging task on modern architectures, and you will have an opportunity to try addressing this challenge.
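To make the trade-off concrete, here is a minimal illustration of my own (not the contest's checker): a million tiny increments vanish entirely in a binary32 accumulator but survive in binary64.

```cpp
// A binary32 accumulator silently drops addends smaller than half an ulp of
// the running sum: 1e-8 is far below ulp(1.0f) ~ 1.19e-7, so every addition
// rounds back to 1.0f. In binary64 the same additions are all captured.
float sum_float_demo() {
    float s = 1.0f;
    for (int i = 0; i < 1000000; ++i) s += 1e-8f;
    return s;                 // still exactly 1.0f
}

double sum_double_demo() {
    double s = 1.0;
    for (int i = 0; i < 1000000; ++i) s += 1e-8;
    return s;                 // approximately 1.01
}
```

This is exactly why low precision is fast but dangerous: the float loop does the same million additions and keeps none of them.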

REGISTER

Prizes from Huawei

Rank Prize
Grand Prize (Rank 1): €12,000 + a trip to the 48th Annual ICPC World Finals in a guest role
First Prize (Ranks 2–10): €8,000
Second Prize (Ranks 11–30): €3,000
Third Prize (Ranks 31–60): €800
Top 200 participants: souvenir T-shirt
* If the allocated Huawei Challenge prize cannot be delivered to your region for any reason, it may be replaced by another prize (absent legal restrictions), at the discretion of the Sponsor.

Challenge Rules and Conditions

By participating in this Challenge, you agree to the Challenge Rules and Conditions of Participation.

Good luck, we hope this will be fun!

  • Vote: I like it
  • +236
  • Vote: I do not like it

»
2 months ago, # |
  Vote: I like it -18 Vote: I do not like it

Why was this postponed in the first place?

»
2 months ago, # |
  Vote: I like it -10 Vote: I do not like it

I remember a summation problem from long ago, but it got wiped from CF and I can't find it. Now it has come back with a different name.

»
2 months ago, # |
  Vote: I like it -21 Vote: I do not like it

Is it rated?

  • »
    »
    2 months ago, # ^ |
      Vote: I like it -21 Vote: I do not like it

    No

  • »
    »
    2 weeks ago, # ^ |
      Vote: I like it -76 Vote: I do not like it

I think it's rated, because a Carrot extension column called (just now) has appeared in the standings table.

»
5 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

Do I need to register somewhere other than CF, or is CF registration for the event sufficient to be eligible for the prizes?

  • »
    »
    5 weeks ago, # ^ |
      Vote: I like it +11 Vote: I do not like it

    According to para. 8 of the challenge rules, "only participants who entered a valid ICPC username (email address) during registration are eligible for recognition as a Challenge winner", which means (if I remember correctly) that you need your codeforces email to match your icpc.global email. It should be possible to register there even if you are not icpc-eligible (e.g. not a student).

You can probably put this off and only do it after the contest ends, if you are among the winners.

    • »
      »
      »
      4 weeks ago, # ^ |
        Vote: I like it -8 Vote: I do not like it

So we should have an account on the ICPC website?

      • »
        »
        »
        »
        4 weeks ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Yes, I think so.

      • »
        »
        »
        »
        4 weeks ago, # ^ |
        Rev. 2   Vote: I like it 0 Vote: I do not like it

You can participate whether or not you have an account. If you don't have one, you'll be invited to create one after the contest.

»
4 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

Best of luck to all the participants.

»
4 weeks ago, # |
  Vote: I like it -23 Vote: I do not like it

Hello! This contest has been such a great pleasure to complete! Thank you to all the writers, editors, and testers :) Such a great contest.

»
4 weeks ago, # |
  Vote: I like it -15 Vote: I do not like it

How many tasks are there in this contest?

»
4 weeks ago, # |
  Vote: I like it +18 Vote: I do not like it

What about the T-shirts for the last challenge?

  • »
    »
    4 weeks ago, # ^ |
      Vote: I like it -10 Vote: I do not like it

    I never got mine. Did anyone?

  • »
    »
    3 days ago, # ^ |
      Vote: I like it +10 Vote: I do not like it

    Thanks for the question. We realize the wait has been too long. As mentioned earlier, there were some manufacturing challenges (pardon the pun). The t-shirts, we understand, have finally arrived in EACH region globally for all the top 200 ranked contestants. Each region is now delivering the t-shirts, and we hope you will receive it soon. On behalf of Huawei, thank you for your patience!

»
4 weeks ago, # |
  Vote: I like it -9 Vote: I do not like it

What should I learn to reach the level where I can solve this kind of problem?

»
4 weeks ago, # |
  Vote: I like it -44 Vote: I do not like it

Is it rated?

»
4 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

I have to say, I have never spent as much as €12,000 in my whole life.

»
4 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

Can I participate in other rated contests during the two-week challenge?

»
4 weeks ago, # |
  Vote: I like it +37 Vote: I do not like it

When will I receive the T-shirt I won last round?

»
4 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

What is the perfect score?

»
4 weeks ago, # |
  Vote: I like it +88 Vote: I do not like it

There are a lot of minor details about the scoring process that are missing from the statement. Do you plan to release a local scorer to clarify some of them? Thank you.

»
4 weeks ago, # |
  Vote: I like it +11 Vote: I do not like it

The testing queue is so long, I need to wait 10 minutes to get the results.

»
4 weeks ago, # |
  Vote: I like it +27 Vote: I do not like it

DelayForces:( I have been waiting for judging for 25 minutes, but it is still "In queue" now.

  • »
    »
    4 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    For some reason this is happening to random submissions :(

»
4 weeks ago, # |
  Vote: I like it +3 Vote: I do not like it

Maybe the next time we will have to write an evaluator for codeforces :)

  • »
    »
    4 weeks ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

Or write a program to predict (guess) a proper time to submit, to avoid being stuck in the queue.

»
4 weeks ago, # |
Rev. 2   Vote: I like it +12 Vote: I do not like it

Can someone explain in what order Codeforces judges submissions? I sent a submission 1 hour ago and it's still in queue, while more recent ones have already been judged.

»
3 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

I just noticed the grader is extremely fast; can we expect the same performance on the hidden tests?

»
3 weeks ago, # |
  Vote: I like it -42 Vote: I do not like it

How hard is it to reach 3000 points? How long would code achieving such a score be?

»
3 weeks ago, # |
  Vote: I like it +20 Vote: I do not like it

The time limit is "10s" just for adding 1000000 double numbers? I am confused: what is the point of solving this? Spending 10s of CPU time to generate a non-reusable "program" for one specific input. Am I missing something important? I really do not understand the "real" problem behind this "abstraction".

  • »
    »
    3 weeks ago, # ^ |
      Vote: I like it +48 Vote: I do not like it

Not only that, they don't reveal many details which are very important, so you have to guess them. If you want your "real" problem solved, you don't hide any details; you provide a local tester and a few tests (or a test generation method).

  • »
    »
    3 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I second this

»
3 weeks ago, # |
  Vote: I like it +102 Vote: I do not like it

You might think that, because I'm in the lead, I've understood the task statement.

I assure you: that's not the case. I have no idea how to add two fp16 numbers the "right" way, by which I mean the way used in the evaluator. Especially subnormal fp16.

  • »
    »
    3 weeks ago, # ^ |
      Vote: I like it +31 Vote: I do not like it

    I would like to thank the organizers for the latest announcement.

  • »
    »
    3 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Still have no idea how they cast fp64 to fp32 or sum fp32.

    • »
      »
      »
      3 weeks ago, # ^ |
        Vote: I like it +48 Vote: I do not like it

      Two doubles -> Two floats -> Two doubles -> Sum

      • »
        »
        »
        »
        3 weeks ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

Oohhh, thanks a lot! And they do not cast the sum to float? Anyway, I will try when I have access to a computer.
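If that casting chain is right, a single fp32-type addition can be sketched as below (hypothetical helper name; whether the final result is rounded back to binary32 was exactly the open question in this thread):

```cpp
// Hypothetical model of the chain "two doubles -> two floats ->
// two doubles -> sum": each binary64 operand is rounded to binary32,
// widened back to binary64, and the addition itself happens in binary64.
double add_as_fp32(double a, double b) {
    double a32 = static_cast<double>(static_cast<float>(a));
    double b32 = static_cast<double>(static_cast<float>(b));
    return a32 + b32;   // whether this sum is then rounded back to
                        // binary32 is an open question in the thread
}
```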

»
3 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

The standings change so fast!

»
3 weeks ago, # |
  Vote: I like it +69 Vote: I do not like it

Organizers aren't answering questions in the contest system so I will ask here:

1) In order to compute the perfect sum $$$S_e$$$, do we read input values as fp64 (double) or something more precise?
2) What happens when we convert fp16 infinity back to fp64? Does it become infinity or e.g. the max fp16 value?

pinging @ICPCNews

  • »
    »
    3 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    1) Note that the actual binary64 value is not always exactly equal to the given decimal value. Instead, the actual given value is the closest number representable in binary64. When reading the input, most programming languages will do the conversion for you automatically.

Based on this statement, it is enough to just read the input value into a double, without any extra actions.

    2) When the algorithm performs an addition in type T, the two arguments are converted into type T, and the result is also of type T. ... The addition of two fp16 happens in fp64, but the result is immediately converted to fp16.

    If I understand correctly: double a, double b -> half a, half b -> double a, double b -> double sum -> half sum

    • »
      »
      »
      3 weeks ago, # ^ |
        Vote: I like it +7 Vote: I do not like it

      This doesn't answer my doubts :(

      1) I'm asking about the definition of $$$S_e$$$, which organizers calculate "as precisely as [they] can". Is it the sum of input values, each rounded to binary64 first? Or is it the precise sum of input values, only at the end rounded to binary 64?

      2) What happens when we convert fp16 infinity back to fp64?

      • »
        »
        »
        »
        3 weeks ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        1) I also struggle with understanding $$$S_e$$$, but if " the actual given value is the closest number representable in binary64 ", I suppose that $$$S_e$$$ is computed based on the "given values".

        2) Based on my submissions, fp16 infinity converts into fp64 infinity. And it would be weird if infinity converts into some non-infinity value.

      • »
        »
        »
        »
        3 weeks ago, # ^ |
          Vote: I like it +25 Vote: I do not like it

        My best guess is the following: the expected sum S_e is the exact sum of input values converted to binary64. The tester has a bug, and does not calculate the sum correctly for 3 of the 76 test cases.

      • »
        »
        »
        »
        3 weeks ago, # ^ |
          Vote: I like it +10 Vote: I do not like it

2) Shouldn't it be the same as what happens when we convert a float's infinity to a double?

  • »
    »
    3 weeks ago, # ^ |
      Vote: I like it +29 Vote: I do not like it

    For 1), I have a simple algorithm that computes $$$S_e$$$. It may not give the theoretical $$$S_e$$$, but it seems to give the same $$$S_e$$$ as the evaluator. It may be that "as precisely as we can" != "as precisely as possible". I'm not sure I'm allowed to reveal more details.

    I have no idea for 2).

    • »
      »
      »
      3 weeks ago, # ^ |
        Vote: I like it +19 Vote: I do not like it

      "... I'm not sure I'm allowed to reveal more details..."

      Once the contest comes to an end, feel free to reveal all the details (procedure, source code, etc.) :). I'm eager to see a good solution in action.

»
3 weeks ago, # |
  Vote: I like it +54 Vote: I do not like it

The fifth Huawei-CF optimization challenge was halfway through... Participants were still guessing evaluation details...

»
3 weeks ago, # |
  Vote: I like it +38 Vote: I do not like it

I also asked questions, but organizers aren't answering the questions...

The announcement says "When a value is fp64 or fp32, we just use the native data types.", but I don't think this is very clear. IEEE 754 allows several different rounding algorithms, as explained in https://en.wikipedia.org/wiki/IEEE_754#Rounding_rules , and the behavior depends on the language or compiler. (Python users may be in trouble because it doesn't support fp32 calculation in built-in functions.)

  • »
    »
    3 weeks ago, # ^ |
      Vote: I like it +19 Vote: I do not like it

    It seems to be consistent with C++ double and float.

»
2 weeks ago, # |
  Vote: I like it +16 Vote: I do not like it

In my humble opinion, these recent announcements are not entirely fair to those individuals in top positions, as they were clever enough to unravel the hidden aspects of the contest unaided. Now, they may face additional competition for the top spots. While such adjustments might be acceptable in a 'just for fun' contest, they seem unjustified here, especially considering the significant monetary stakes involved.

Fear not, top performers! I pose no threat to you! I currently reside at the bottom of the leaderboard and intend to stay there for the duration of the competition. :)

Good luck!

»
2 weeks ago, # |
  Vote: I like it +110 Vote: I do not like it

Come on... releasing the checker and some tests after 10 days of the competition? +Extension

I did most of the work in the first week, and planned to do very little this week. Good luck with that plan now, right?

  • »
    »
    2 weeks ago, # ^ |
      Vote: I like it -66 Vote: I do not like it

    We know that not every solution is the best solution, because there are a thousand Hamlets for a thousand readers.

The current release plan is probably the one that best meets most candidates' needs. I apologize for the impact on you.

    The contest is still on, who knows what the final result will be?

»
2 weeks ago, # |
Rev. 2   Vote: I like it +30 Vote: I do not like it

Why?
"Next, we calculate the expected sum as precisely as we can"
"//'Exact' answer based on Kahan summation"
Or is Kahan only what's shown, while the expected sum is truly exact during testing?
I hope it is.

  • »
    »
    2 weeks ago, # ^ |
      Vote: I like it +31 Vote: I do not like it

The expected sum is not always exact. It is calculated by Kahan summation, as in the published checker implementation.
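For reference, the textbook compensated-summation scheme looks like this (a standard Kahan implementation, not the checker's actual source; note that aggressive optimizations or excess precision can defeat the compensation):

```cpp
#include <vector>

// Classic Kahan compensated summation. Each step recovers the low-order
// bits lost by the plain addition into the correction term c.
double kahan_sum(const std::vector<double>& xs) {
    double sum = 0.0, c = 0.0;      // c accumulates the lost low-order bits
    for (double x : xs) {
        double y = x - c;           // apply the previous correction
        double t = sum + y;         // low bits of y may be lost here...
        c = (t - sum) - y;          // ...and are recovered into c
        sum = t;
    }
    return sum;
}
```

For example, adding a million copies of 1e-16 to 1.0 leaves a naive binary64 accumulator at exactly 1.0 (each addend is below half an ulp), while Kahan summation recovers the expected ~1.0000000001.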

»
2 weeks ago, # |
  Vote: I like it +5 Vote: I do not like it

My clarification request from a few days ago:
Q. In scoring, "Next, we calculate the expected sum S_e as precisely as we can". Does this mean S_e is the exact sum (after converting the inputs into binary64)?
A. The intent is that it is the closest possible binary64 value to the actual (infinite precision) sum.

... but then, computing the expected sum with Kahan's algorithm contradicts this answer; will the checker be fixed?

»
2 weeks ago, # |
  Vote: I like it +78 Vote: I do not like it

Is the checker numerically stable? Will it always add the same numbers to the same value?

I locally got a situation where printing extra stuff in the checker changes the sum that it computes. Maybe that's allowed in C++.

I also created a test where s{a,b,c} != s{s{a,b},c}, but I don't understand float->double->float casting enough to say if this is incorrect.

  • »
    »
    2 weeks ago, # ^ |
    Rev. 2   Vote: I like it +7 Vote: I do not like it

    The checker seems deterministic, why would the order of operations change?

    Could you share this situation or describe how to reproduce it? I don't think it should be legal in this code in standard C++, at least with IEEE 754-compliant FP arithmetic.

    Could you share this test? float->double->float roundtrip should be lossless.

    Edit: The example yosupo provided answers all my questions; "GNU G++17 7.3.0" doesn't strictly adhere to IEEE 754.

    • »
      »
      »
      2 weeks ago, # ^ |
      Rev. 3   Vote: I like it +34 Vote: I do not like it
      #include <iostream>
      #include <stack>
      #include <cstdio>   // for printf
      
      
      using namespace std;
      
      // Calculate the fp32 sum sequence like {s:1,2,3}
      double calculateFp32(stack<double>& nums) {
          float currResultSingle = 0;
          while (!nums.empty()) {
              currResultSingle += static_cast<float>(nums.top());
              nums.pop();
          }
          return static_cast<double>(currResultSingle);
      }
      
      int main() {
          stack<double> s({1.0e-10, 1.0e-10, 1.0e0});
          double answer = calculateFp32(s);
          printf("%.20lf\n", answer);
      }
      

      This code prints 1.00000000020000001655 under "GNU G++17 7.3.0", and prints 1.00000000000000000000 under "GNU G++20 13.2". In addition to that, if we add a printf like printf("value: %.20lf\n", nums.top()); currResultSingle += static_cast<float>(nums.top());, the result changes to 1.00000000000000000000 under "GNU G++17 7.3.0".

      • »
        »
        »
        »
        2 weeks ago, # ^ |
          Vote: I like it +7 Vote: I do not like it

        Interesting. I was aware of this issue a while ago, but lately couldn't reproduce it so discarded the knowledge as no longer relevant in practice. Thank you!

  • »
    »
    9 days ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    It looks like the checker behavior is as follows:

    1. For d sums, the accumulator containing the running binary64 sum and the next binary64 number to sum are expanded to 80-bit (long double) floating-point numbers, added, and then rounded back to binary64.

    2. For s sums, the binary64 numbers to sum are rounded to binary32 and then expanded to 80 bits and added into an 80-bit accumulator, except that whenever there are a multiple of 64 numbers left to add, counting the number just added, the accumulator is rounded to binary32 and then expanded back to 80 bits. The accumulator is rounded to binary64 to give the final value of the sum.

    3. For h sums, the accumulator is a 16-bit floating-point number, according to the nonstandard format explained in the contest announcements. Each addend is truncated to the 16-bit format; then the accumulator and addend are expanded to 80 bits, added, rounded to binary64, and then truncated back to 16 bits.

    This strange behavior, including the behavior changing when you insert prints, is similar to other excess precision problems in C++, as mentioned in this blog post; it happens when the compiler produces code that uses the 80-bit 8087 registers. This problem generally doesn't occur when compiling for the 64-bit architecture as compilers use the SSE2 instructions instead.

»
11 days ago, # |
  Vote: I like it 0 Vote: I do not like it

Finish it, prof!

  • »
    »
    11 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

Come on, one more little push and we'll reach the top 20! Let's go, let's go! :)

»
11 days ago, # |
  Vote: I like it -41 Vote: I do not like it

reduce the submission rate limits NOW or I riot

  • »
    »
    11 days ago, # ^ |
      Vote: I like it -18 Vote: I do not like it

Is that the reason you are so rude? Who is afraid of you?

»
11 days ago, # |
  Vote: I like it 0 Vote: I do not like it

Will the tests really be completely different in the end or will some new tests get added?

»
11 days ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

I tried to use the Python Decimal library to calculate more accurate results, but they seem to differ from the results calculated by the checker. And Kahan summation will also lose precision. Can I assume that we are not expected to calculate the most accurate answer, but the answer that best matches the checker?

For example, with this self-defined input: 5 -1.577492e-05 -2.576098e-05 3.994e-3 100000 -99999.3233. If we load the values as doubles in Python and sum them via Decimal, the result is 0.6806524640963445590490334732164390274533616321. But through the checker the result is 0.68065246409969404339790344238281. There are several bits of difference, which leads to a big score downgrade in accuracy.

  • »
    »
    11 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

Yes, you can assume they use the Kahan algorithm to compute $$$S_e$$$, not the exact sum.

    • »
      »
      »
      11 days ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

Thanks for confirming. It took me quite some time to realize this, and I was struggling to match the checker's behavior.

»
10 days ago, # |
  Vote: I like it +1 Vote: I do not like it

The queue is stuck. Neither new solutions nor the ones already being tested are progressing at the moment.

»
10 days ago, # |
Rev. 2   Vote: I like it -6 Vote: I do not like it

You are right, but we can't get verdicts as fast as before now.

So, can Codeforces allocate only 10% of the judging capacity to other submissions?

»
9 days ago, # |
  Vote: I like it +5 Vote: I do not like it

In this contest, you can never assume that you have achieved an absolute lead.

»
9 days ago, # |
  Vote: I like it 0 Vote: I do not like it

Dear problem managers, I would really appreciate it if you would put some restrictions on submissions.

  • »
    »
    9 days ago, # ^ |
      Vote: I like it +6 Vote: I do not like it

It won't change much at this point. The queue is almost 1 hour, and the contest ends in 2 hours anyway.

    • »
      »
      »
      9 days ago, # ^ |
        Vote: I like it +7 Vote: I do not like it

Quick unrelated question: your profile says you're working at Huawei; doesn't that prevent you from participating?

      • »
        »
        »
        »
        9 days ago, # ^ |
          Vote: I like it +37 Vote: I do not like it

I use it on CF as my affiliation, but I'm not an employee (and I don't have access to the codebase). I sometimes help with events (B2B), e.g. workshops for their interns last year, a summer online IOI camp, and an upcoming China trip for ICPC students.

Btw, I hope they will let me organize future CF Challenges. There were a lot of issues with this one.

»
9 days ago, # |
  Vote: I like it 0 Vote: I do not like it

The standings are on fire!

»
9 days ago, # |
  Vote: I like it -6 Vote: I do not like it

Let’s send our North Korean homie Sung.An to world finals.

»
9 days ago, # |
  Vote: I like it 0 Vote: I do not like it

When can we see others' solutions?

  • »
    »
    9 days ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    Yes! Please, please, please! Now, as everyone reading this post envisions me holding a hat and begging for a tip, what a grotesque image of myself! Hahaha :).

  • »
    »
    8 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

Never. Every problem Huawei poses has a practical application in their proprietary research, so they won't make it open source.

»
9 days ago, # |
  Vote: I like it +35 Vote: I do not like it

Here is a graph showing ranking on the leaderboard vs. score, and how it changed between May 12 and just before the end of the contest, on May 23.

Here is an enlargement of the upper left corner.

  • »
    »
    6 days ago, # ^ |
      Vote: I like it +20 Vote: I do not like it

    Here are the preliminary scores (x axis) shown against the final scores (y axis):

»
9 days ago, # |
Rev. 2   Vote: I like it +56 Vote: I do not like it

I have provisional 6th place. My main idea for 5000+ points:

Build a binary tree, make sure to keep every block of 16 elements as one subtree. Greedily assign the fastest possible operation in every node (Half, Single, Double). Never decrease the type from child to parent. In random order, iterate leaves and greedily upgrade the operation type if it decreases the total error (this requires updating the path to the root, like in a segment tree). Now use meet in the middle: for each child of the root, consider $$$K \approx 10^5$$$ possibilities of what few changes we should make to slightly modify the result. Sort the possible results in two children of the root, use two pointers to find the best pair. Like in birthday paradox, this is actually like considering $$$K^2$$$ possibilities.
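The meet-in-the-middle pairing step can be sketched like this (a hypothetical helper illustrating only the sort + binary-search pairing over candidate deltas, not the full solution):

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Given candidate result adjustments A (left child of the root) and B (right
// child), find the pair (a, b) whose combined effect a + b is closest to the
// desired target. Sorting B lets each lookup be a binary search, so K
// candidates per side effectively cover K^2 combinations.
std::pair<double, double> best_pair(const std::vector<double>& A,
                                    std::vector<double> B, double target) {
    std::sort(B.begin(), B.end());
    double bestErr = INFINITY;
    std::pair<double, double> best{0.0, 0.0};
    for (double a : A) {
        // the closest b is either the first element >= target - a
        // or its predecessor; check both
        auto it = std::lower_bound(B.begin(), B.end(), target - a);
        for (auto c : {it, it == B.begin() ? it : std::prev(it)}) {
            if (c == B.end()) continue;
            double err = std::fabs(a + *c - target);
            if (err < bestErr) { bestErr = err; best = {a, *c}; }
        }
    }
    return best;
}
```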

Improvements:

  1. Make two phases of improvements: one from Half to Single, then from Single to Double. This way, each phase you need to get lucky with fewer bits.
  2. The Half (fp16) type is very special because it always rounds towards 0. If all numbers are positive, you can't use this type because you almost always decrease the final sum. There are tests where almost all numbers are small and positive. You can use the few big (positive and negative) values to manipulate the sum. In particular, $$$60000 + 0.01 = 60000$$$ and $$$-60000 + 0.01 \approx -59970$$$. You can combine those two operations and again use something like meet in the middle.
  3. Some greedy approaches like merging from smallest absolute value, or combining a positive with negative value if their absolute value is close.
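A rough model of the truncating fp16 from point 2 (my own sketch using frexp/ldexp: 11 significant bits, rounding toward zero, ignoring overflow and subnormals) reproduces the effect described above:

```cpp
#include <cmath>

// Hedged model of the contest's fp16: keep 11 significant bits (1 implicit +
// 10 stored), truncating toward zero as described above. Overflow to
// infinity and subnormals are ignored in this sketch.
double to_fp16_trunc(double x) {
    if (x == 0.0 || !std::isfinite(x)) return x;
    int e;
    double m = std::frexp(x, &e);              // x = m * 2^e, 0.5 <= |m| < 1
    double t = std::trunc(std::ldexp(m, 11));  // truncate mantissa to 11 bits
    return std::ldexp(t, e - 11);
}

// One fp16-type addition as discussed in this thread: both operands are
// truncated to fp16, added in binary64, and the result truncated back.
double fp16_add(double a, double b) {
    return to_fp16_trunc(to_fp16_trunc(a) + to_fp16_trunc(b));
}
```

In this simplified model, fp16_add(60000, 0.01) stays at 60000, while fp16_add(-60000, 0.01) jumps to -59968, close to the ≈ -59970 quoted above.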

I'm very afraid that the final tests will be so different that my algorithm (2) will be useless.

And I have no idea how the top 1-2 achieved their scores.

scores per test
  • »
    »
    9 days ago, # ^ |
      Vote: I like it +18 Vote: I do not like it

As I understand it, the dfs order of leaves in your solution is always 0, 1, 2, etc.? In other words, do you even change their order? If not, then what is the point of "make sure to keep every block of 16 elements as one subtree"?

    • »
      »
      »
      9 days ago, # ^ |
      Rev. 3   Vote: I like it 0 Vote: I do not like it

      That's a reasonable question. The point (3) describes some examples of changing order. Though 5000+ points is possible with just 0..n-1 order.

      Some other small improvement: if I'm extremely close to the final sum, I collect nodes close to the root, shuffle them, rebuild that part of the tree, calculate the new sum and see if it's better. This approach shifts around bigger blocks, e.g. sizes 1024. And I try random orders for small $$$N$$$, basically ignoring the order penalty.

      • »
        »
        »
        »
        9 days ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        By the way, if you split everything into blocks of size 16 and then shuffle them, then you will probably have about $$$(16-k)n/32$$$ cache misses, where $$$k$$$ is the size of the last block.

        • »
          »
          »
          »
          »
          9 days ago, # ^ |
            Vote: I like it +44 Vote: I do not like it

          Yup. I always use the uneven block last. I came up with a nice hack: don't worry about it in the solution, but (when printing the tree) first go to child with size divisible by 16.

  • »
    »
    9 days ago, # ^ |
      Vote: I like it +6 Vote: I do not like it

    Thank you very much for sharing your approach, and congratulations on a top result!

    Can you also share the scores you got on individual tests?

  • »
    »
    9 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    For three of the tests (#72-74) the value of $$$S_e$$$ used by the checker was wrong, meaning that it wasn't the value of the infinitely precise sum rounded to the nearest binary64 number. Did this make any difference? I had to special-case these tests.

    • »
      »
      »
      9 days ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

I think I didn't do anything particular for those tests. Once the checker was published, I just copied their Kahan sum to compute the target sum. I don't remember if my scores improved thanks to that.

  • »
    »
    9 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    If you dont mind, can you please share your code for the final submission?

  • »
    »
    9 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

Hi! First of all, congratulations on the amazing score and performance you achieved in this contest! I noticed something interesting when looking at the top submissions: you had a lot of submissions that achieved 1920 points, which is the same as a simple solution that linearly iterates over the numbers. Could you please tell me what idea or problem was behind this score?

    • »
      »
      »
      8 days ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Data mining. I used asserts and memory allocation to get some information about tests. You can read about it in the past Huawei Challenges.

      • »
        »
        »
        »
        6 days ago, # ^ |
        Rev. 2   Vote: I like it -10 Vote: I do not like it

Is there any point in getting information about the tests, if our solution will be judged on the final tests?

  • »
    »
    4 days ago, # ^ |
    Rev. 3   Vote: I like it 0 Vote: I do not like it

Can you also describe how you picked which operation to use for a node (Half, Single or Double)? I see that you wrote "Greedily assign the fastest possible operation", but I couldn't come up with any greedy that improved my score (instead of decreasing it).

And I am very surprised that 5000+ points can be obtained using only the order 0, 1, 2, ..., n - 1

    • »
      »
      »
      4 days ago, # ^ |
      Rev. 3   Vote: I like it +18 Vote: I do not like it

      First I greedily assign the fastest operation that doesn't compute infinity. So:

      if (max({abs(left), abs(right), abs(left+right)}) <= 131000 && both children used HALF) USE HALF;
      else if (max(...) <= 1e38 && both children used HALF or SINGLE) USE SINGLE;  
      else use DOUBLE;
      

      This obviously creates a big final error and a terrible score, much worse than using only DOUBLE operations. In my previous comment (starting from "In random order"), I explained how I upgrade those operations (HALF->SINGLE and SINGLE->DOUBLE) to decrease the error, eventually getting it down to 0, usually.

      Think about this easier problem: Given a sequence of random values in range 1..1e9, insert operators + and - between them to get a sum equal to 0. Solution: insert operators at random first, then flip some of them to slightly improve the total sum, then use MITM to consider a lot of possible sets of changes at the same time, hopefully getting the sum 0.

      And I am very surprised that 5000+ points can be obtained by only using the following order: 0, 1, 2, ..., n — 1

      Note that it's the order of leaves. We build a (balanced) binary tree.
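The "easier problem" above can be sketched as a plain local search (the MITM refinement is omitted, and accepting equal-cost flips is my own assumption to keep the walk moving across plateaus):

```cpp
#include <cstdlib>
#include <vector>

// Choose signs s_i in {+1, -1} for the values so that the signed sum gets
// close to zero: start with all +, then randomly flip one sign at a time,
// keeping flips that do not increase |sum|. Returns the final signed sum.
long long flip_to_zero(const std::vector<long long>& v, std::vector<int>& sign,
                       int iters) {
    sign.assign(v.size(), 1);
    long long sum = 0;
    for (long long x : v) sum += x;
    unsigned rng = 12345u;                          // tiny deterministic LCG
    for (int it = 0; it < iters; ++it) {
        rng = rng * 1664525u + 1013904223u;
        std::size_t i = rng % v.size();
        long long cand = sum - 2 * sign[i] * v[i];  // effect of flipping s_i
        if (std::llabs(cand) <= std::llabs(sum)) {  // accept non-worsening
            sum = cand;
            sign[i] = -sign[i];
        }
    }
    return sum;
}
```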

      • »
        »
        »
        »
        4 days ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

Okay, I got it. Initially, I didn't get the idea of updating the path to the root. I did the same thing (constructed a binary tree), but as I said, my best score was when all my operations were DOUBLE.

        Thank you so much for the explanation and congrats :)

»
9 days ago, # |
Rev. 2   Vote: I like it +20 Vote: I do not like it
My scores
  • »
    »
    9 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    If you don't mind, can you please explain your approach?

  • »
    »
    9 days ago, # ^ |
      Vote: I like it +10 Vote: I do not like it

    My solution is based on maintaining a list of numbers, with the invariant that their exact sum is the target sum. The goal is then to reduce this list to a single element. The transitions are:

    • Take two numbers, and replace them by their (fp16,fp32,fp64) sum.
    • Replace a sublist of the numbers by their fp16/fp32 approximation.
    • A few other heuristics, e.g. trying to create fp16 values by adding positive and negative (fp32/fp64) values.

    The main subroutine takes a list of pairs (precise value, approximate value), and produces a list with as many approximate values as possible. For each input pair, the error is the difference between the two values.

    • Without loss of generality, assume that the total error is positive.
    • Initialize two priority queues: one holds the positive errors, the other one contains the negative errors.
    • Until the queues are empty, repeat the following:
    1. Pop the top element of the positive queue. If it is less than the total error, then subtract it from the total error.
    2. Otherwise, pop the top element of the negative queue, add it to the positive element, and push the result to the correct queue.

    At the end of this process, we have written the total error as the sum of lots of small positive errors. Each of these small errors corresponds to a sublist of the input list of pairs. For each such sublist, we check whether replacing all precise values by the corresponding approximate values preserves the target sum.
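
    A minimal reconstruction of that queue loop on raw error values (my own sketch from the description; `decompose`, the eps tolerance and the tie-breaking are assumptions). It assumes the per-group errors sum exactly to the total error, and returns the committed positive pieces, which add up to that total, while the merged groups cancel to (near) zero:

    ```cpp
    #include <queue>
    #include <vector>
    #include <cmath>
    #include <functional>

    // Repeatedly either commit a positive error that fits into the remaining
    // total, or merge the top positive error with a negative one. Committed
    // pieces sum to `total`; merged groups cancel and can be approximated.
    std::vector<double> decompose(const std::vector<double>& errs, double total) {
        const double eps = 1e-12;
        std::priority_queue<double> pos;  // largest positive error first
        std::priority_queue<double, std::vector<double>, std::greater<double>> neg; // most negative first
        for (double e : errs) {
            if (e > eps) pos.push(e);
            else if (e < -eps) neg.push(e);
        }
        std::vector<double> committed;
        while (!pos.empty()) {
            double p = pos.top(); pos.pop();
            if (p <= total + eps || neg.empty()) {  // this piece fits: commit it
                committed.push_back(p);
                total -= p;
            } else {                                // too big: merge with a negative error
                double m = p + neg.top(); neg.pop();
                if (m > eps) pos.push(m);
                else if (m < -eps) neg.push(m);
                // |m| <= eps: the merged group cancels out
            }
        }
        return committed;
    }
    ```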

    This subroutine is used in various places, most importantly to create blocks of 16 numbers. Each block is initialized with the fp64 sum of the 16 numbers. (With some randomization of the permutation of the 16 numbers, it is possible to reach the target sum). We want to instead compute each block using fp16 and/or fp32 operations. We can compute some candidate tree for each block; such a tree determines an approximate value for the block. The above subroutine will try to use as many of these approximate values as possible.

»
9 days ago, # |
  Vote: I like it +14 Vote: I do not like it

I didn't achieve good scores, but look at the spreadsheet I used to analyze my solutions! (this was the first time I used such a simple technique as "visualize my results")

»
9 days ago, # |
  Vote: I like it +11 Vote: I do not like it

I want to share the testcase data I extracted using the memory-allocation trick I learnt from the last ICPC challenge. It shows the distribution and some features of each testcase. The values are not exact, since I applied square root or log10 to the large ones, but they give the rough magnitude. Here ld is long double, d is double, s is single, h is half. Also, log is log10 and error is the accuracy error calculated from the score.

»
9 days ago, # |
  Vote: I like it +6 Vote: I do not like it

Are the standings final? I don’t believe the final tests have been run, but the scoreboard says final standings.

  • »
    »
    9 days ago, # ^ |
      Vote: I like it +1 Vote: I do not like it

    The standings are not final. The final solutions will be run against a hidden set of testcases.

»
9 days ago, # |
Rev. 33   Vote: I like it +35 Vote: I do not like it

There is a bug in the mingw-w64 implementation of scanf, which is used by the checker (the Codeforces testing system runs on Windows). If you're using any method of input other than scanf and its relatives, you'll sometimes get a different last bit when reading numbers compared to what the checker sees. This results in a difference in the final sum between you and the checker if this bit is high enough. In this problem 1 bit of difference means you'll get half the points for the test. The probability of the bug is 1/4096 for numbers that are not representable exactly by long double (most decimals, e.g. 0.1, 0.12, 0.123, ...). In my tests the probability was around 1/4300. I found this bug because my sum and the checker's sum differed in the last bit on some of my random tests.

P.S. Apparently the bug only occurs if the number is written with fewer than 17 digits of precision. Otherwise we get 0 in the first extra long double mantissa bit and everything is A-OK. So the bug is absent in all numbers from tests #16 and #10, because they have 19 digits of precision. But if there are long tests like #5 (with 7 digits of precision), there will be problems

P.P.S. Don't blame me for not telling everyone during contest. I wasn't aware of it until the last day.


Example: godbolt

#include <cstdio>

int main() {
    double val          =  1337.1337221 ;
    const char *str_val = "1337.1337221";
    
    double scanned;
    sscanf(str_val, "%lf", &scanned);
    if (scanned != val) {
        printf("⛔ %s ( %.13la != %.13la )\n", str_val, scanned, val);
    } else {
        printf("✅ %s\n", str_val);
    }
}

If you're using mingw-w64 (like Codeforces does), this will print the ⛔ line.


Explanation of example

More examples


Fun fact: there is no bug if you #include <stdio.h> instead of <cstdio>, but only if there are no other C++ headers included before it. The solution to this mystery lies in the depths of libstdc++ config/os/mingw32-w64/os_defines.h:l51

  • »
    »
    8 days ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    Update:

    The bug is absent in preliminary tests

    Code
    Results

    Reported to mingw-w64/bugs/989

    Reported to testlib/issues/203

»
9 days ago, # |
  Vote: I like it +25 Vote: I do not like it

I used some very funny ideas:

Suppose we want to compute the exact sum $$$S_e$$$ with a mantissa of 52 bits. Then I first search for two input values $$$x$$$ and $$$y$$$ such that summing them in double precision gives a number whose mantissa has the same bits as the last 29 bits of $$$S_e$$$. As in most cases $$$N^2 \gg 2^{29}$$$, $$$x$$$ and $$$y$$$ can easily be found in $$$\mathcal{O}(N \log N)$$$.

Now I can build a tree with the $$$N-2$$$ remaining values. My only condition for building this tree is $$$\texttt{float(T) + (x+y)} = S_e$$$, so that I just have to compute the sum of the remaining values with a precision of 23 bits instead of 52 bits. That way it's easier to use float summations.

But we can also repeat this process if $$$T$$$ can be cast to fp16. It's possible to search for two input values $$$z$$$ and $$$w$$$ such that $$$z+w$$$ has the same bits as the last $$$13$$$ bits of $$$float(T)$$$. Then I can compute a tree $$$T'$$$ with the $$$N-4$$$ remaining values such that $$$\texttt{float(fp16(T') + (z+w))} = \texttt{float(T)}$$$. That way I can compute $$$T'$$$ with 10 bits of precision, making it easier to use fp16 summation.

In the end my tree looks like $$$\verb|{d: {s: {d: {h:T'}, {d: z, w}}}, {d: x, y}}|$$$

  • »
    »
    9 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Very interesting idea! Could you share your results on the tests because I'm curious how your solution did, especially when it didn't find a sum of 2 numbers with the required last bits. Or did you have some improvements to handle this case?

    • »
      »
      »
      9 days ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it
      results
    • »
      »
      »
      9 days ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      For the last 29 bits, I always found a correct pair except for cases in the 72-78 range and the 1-12 range (and 43 if I'm not wrong). For the last 13 bits of $$$\texttt{float(T)}$$$ I only search when $$$T$$$ can be cast to fp16, and I always succeed except for cases 32, 37, 39 and 40, where I have no clue why no pair was found :(

      • »
        »
        »
        »
        9 days ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Thanks! I think the tests 72-78 were specifically created to counter a lot of solutions. You can even see that by the n given in those tests (n = 100, 1000, 100000, while in the others n is more random). For tests 1-12 and 43, n is small (<= 10000), so perhaps your solution didn't have a big enough sample of numbers to find a pair. Weird for the fp16 cases, but maybe it's related to the way approximation works in fp16, as in the example Errichto gave: −60000 + 0.01 ≈ −59970. So overall the solution consistently finds a pair, which is very satisfying haha

  • »
    »
    9 days ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    Lol, this is so clever and funny at the same time.

»
9 days ago, # |
Rev. 8   Vote: I like it 0 Vote: I do not like it

There was no need for a custom janky fp16. Both Intel and AMD have had the F16C extension since around 2011-2012; it's supported by GCC 12+ and Clang 15+ as _Float16, and since C++23 as std::float16_t.
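
For reference, the kind of hand-rolled fp16 rounding that _Float16 replaces might look like this portable sketch (my own illustration: subnormals are ignored and ties round away from zero instead of to even, so it is not bit-exact IEEE binary16):

```cpp
#include <cmath>
#include <limits>

// Round a double to the nearest fp16-representable value (sketch: no
// subnormal handling, round-half-away-from-zero instead of round-to-even).
double toHalf(double x) {
    if (x == 0.0 || !std::isfinite(x)) return x;
    int e;
    double m = std::frexp(x, &e);          // x = m * 2^e, |m| in [0.5, 1)
    if (e > 16)                            // beyond the fp16 exponent range
        return x > 0 ? std::numeric_limits<double>::infinity()
                     : -std::numeric_limits<double>::infinity();
    m = std::round(m * 2048.0) / 2048.0;   // keep 11 significant bits
    return std::ldexp(m, e);
}
```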

»
9 days ago, # |
Rev. 2   Vote: I like it +15 Vote: I do not like it

For small tests, I ran some simple SA (simulated annealing) where I randomly move nodes around (keeping blocks of 16) and change their type. For most tests, I tried something a bit smarter:

  • First I check what accuracy is needed in terms of powers of 2 (it is the exponent of the final sum minus 52).
  • For each input value, I compute the error if we cast it to float32 or float16. For each bit above the threshold computed above, I pick two values whose cast error has its most significant bit at that position.
  • For each pair of values, I want one that gives me the option to increase the error and one that can decrease it. If both have the same sign, I can cast one of them and leave the other one alone.
  • I move those values to the end, sorted by most significant bit (higher first)
  • I compute my initial error, which depends on how many of those values have been cast, and I try to generate a tree with an error as small as possible.
  • Finally I compute the total error, and I switch the casts on and off in order to progressively reduce the error to 0.

I needed a special case for the least significant bit where I need one of the two values to not be rounded up.

To build the tree:

  • I select an order (randomly permute blocks of 16 and randomly permute within a block),
  • I keep track of the current error
  • Each turn I merge two adjacent nodes and replace them by the result in my list. Which nodes I pick is determined by prioritizing float16 casts, and among all the options I pick the one that reduces my error the most (and if none does, one that keeps me in range of what can be corrected).

This works well for most seeds from 11 to 71 (one exception at 42 I think) and 75/76; for the others my SA works decently.

One issue here is that the fp16 cast only goes down (in absolute value), so when the input is all positive we can't really use it unless we start with numbers that are already close to a float16 value. The max score will usually be 63 there. For most other cases it's possible to achieve 81.

my scores
  • »
    »
    8 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Nice solution!

    Checker issues aside, it's great that there was such a variety of ideas.

    • »
      »
      »
      7 days ago, # ^ |
        Vote: I like it +14 Vote: I do not like it

      Thanks.

      The contest was very poorly designed, I can't imagine that's what they were expecting to get, but that aside, the problem was actually interesting with many ideas that could achieve a good result. Of course I am also biased by the final result, had I not performed well, I would just remember the headache to get it working and not the interesting ideas that came along the way =)

  • »
    »
    7 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thanks for sharing your methods! I have some trouble understanding "pick two values such that the error of the cast has its most significant bit matching". What does the error mean here? The value in fp64 minus the value in fp32? And what is the most significant bit: the exponent of the error? I just want to understand the basic idea behind this method. Thanks!

    • »
      »
      »
      6 days ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Yes, correct: the error is val - fp32(val) or val - fp16(val). And the most significant bit is its exponent, because I can basically flip that bit by casting or not casting. If I can do that for all the bits, then I can fix the final error.
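
      In code, that error and its most significant bit for the fp32 case might look like this (illustrative names, not from the problem):

      ```cpp
      #include <cmath>
      #include <climits>

      // Error introduced by rounding a double to float, and the position
      // (binary exponent) of that error's most significant bit.
      double castErr32(double v) { return v - (double)(float)v; }

      int errExp(double e) {
          return e == 0.0 ? INT_MIN : std::ilogb(e);  // ilogb ignores the sign
      }
      ```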

»
9 days ago, # |
  Vote: I like it -24 Vote: I do not like it

my best submission scored 5k+ points, but the last one was 4k+. Can I contact someone to sort this out? (

»
8 days ago, # |
  Vote: I like it 0 Vote: I do not like it

Is it still possible to register on the ICPC website? I don't know where to find the competition there.

»
8 days ago, # |
  Vote: I like it +36 Vote: I do not like it

When will the system test be conducted?

P.S.: Nice contest, and there are quite a lot of simple but efficient solutions in the comments, which really amazes me :).

»
8 days ago, # |
  Vote: I like it -21 Vote: I do not like it

I want the system to test both the last submission and the highest-scoring submission, so that people like me who scored low on the last submission don't lose the score they deserve.

  • »
    »
    8 days ago, # ^ |
      Vote: I like it +24 Vote: I do not like it

    I think that the reason it is not done this way is that it favours randomised solutions. However, I think it would be great to have an opportunity to know whether the solution failed one of the main tests due to RE/TL/ML, as is done in some other competitions.

  • »
    »
    8 days ago, # ^ |
      Vote: I like it -15 Vote: I do not like it

    I tried writing to the organizers to ask them to test my best submission, as the latest one has a very poor score, but they do not respond (

  • »
    »
    8 days ago, # ^ |
      Vote: I like it +49 Vote: I do not like it

    I think it can be unfair to tell the organizers to use one particular solution because now that the participants can communicate with each other and test the system outside the contest time, you could illegitimately know which solution is better.

    Also, would you take the max of those scores? Then people who sent their best scoring code last would be at a disadvantage since the system would test only one solution.

    • »
      »
      »
      8 days ago, # ^ |
        Vote: I like it +3 Vote: I do not like it

      I agree with you in some ways, but I wrote about the situation to the organizers almost immediately after the end of the competition, before I had a chance to test anything. I also could not resubmit my best submission because of the 10-minute limit. Besides, I am not asking them to choose some arbitrary submission, but the one that is the best and is displayed in the table right now. I spent quite a lot of effort and time on this competition (

  • »
    »
    7 days ago, # ^ |
      Vote: I like it -13 Vote: I do not like it

    I don't understand why I got downvoted. I just wanted to make my point clear; obviously I knew it couldn't be done, and I should pay for my mistakes

»
7 days ago, # |
  Vote: I like it +69 Vote: I do not like it

It seems that the final judging is over and the tests are very similar. I guess the organizers used the same generator, just with different seeds. Here are some statistics, assuming that nothing will change.

maxplus didn't lose many points and thus climbed to first place, congrats. The top 10 didn't change, but the threshold dropped by around 60-70 points. Three new people got into the top 30, and obviously three people dropped out.

Rank Name Score Preliminary
1 ($$$\color{green}{\texttt{-1}}$$$) maxplus 5721.93 ($$$\color{red}{\texttt{-7.50}}$$$) 5729.43
2 ($$$\color{green}{\texttt{-1}}$$$) sullyper 5669.12 ($$$\color{red}{\texttt{-18.20}}$$$) 5687.31
3 ($$$\color{red}{\texttt{+2}}$$$) Sung.An 5620.23 ($$$\color{red}{\texttt{-142.82}}$$$) 5763.05
4 ($$$\color{gray}{\texttt{+0}}$$$) teapotd 5613.47 ($$$\color{red}{\texttt{-35.51}}$$$) 5648.99
5 ($$$\color{green}{\texttt{-1}}$$$) Errichto 5599.94 ($$$\color{green}{\texttt{+3.62}}$$$) 5596.32
6 ($$$\color{green}{\texttt{-3}}$$$) square1001 5571.81 ($$$\color{green}{\texttt{+21.22}}$$$) 5550.59
7 ($$$\color{green}{\texttt{-1}}$$$) msmits 5562.13 ($$$\color{red}{\texttt{-0.61}}$$$) 5562.74
8 ($$$\color{red}{\texttt{+3}}$$$) RNS_KSR 5518.78 ($$$\color{red}{\texttt{-88.58}}$$$) 5607.36
9 ($$$\color{green}{\texttt{-1}}$$$) gleb.astashkin 5509.04 ($$$\color{red}{\texttt{-13.86}}$$$) 5522.9
10 ($$$\color{red}{\texttt{+3}}$$$) dbdr 5453.75 ($$$\color{red}{\texttt{-138.14}}$$$) 5591.9
top35
»
7 days ago, # |
  Vote: I like it 0 Vote: I do not like it

When I execute my code on a self-generated random test with N = 10^6, where each number has the form A.BCDEFGHIJKLMNOPQRSTeX with -300 <= X <= 300, it takes over 9 seconds to finish (and I got more than 47.14 points, which indicates perfect accuracy). However, the same code runs in less than 5 seconds on each of the organizer's tests. Does that mean the final system tests are still weak somehow?

  • »
    »
    7 days ago, # ^ |
    Rev. 4   Vote: I like it 0 Vote: I do not like it

    Most of the system tests have a low X, since you can't really cast big numbers to fp16 or fp32 without getting ±INF. You could indeed try to sum some of them until they get low enough to cast, but the odds of that actually paying off in the tests are so low there was almost no profit in doing so.

    Aside from that, reading the numbers as strings and parsing them to long doubles was the only way I found to get the same Kahan sum as the checker.

»
7 days ago, # |
  Vote: I like it +12 Vote: I do not like it

Is there going to be any plagiarism tests before the prizes are distributed or should I lose hope as I placed 205 ;(

»
6 days ago, # |
  Vote: I like it +28 Vote: I do not like it

Hi, everyone! I've finished 9th this time. And this was by some margin the hardest contest for me so far.

I had a somewhat unique approach from the start and kept faith in it until the very end. But I was struggling so hard to guess the correct way of calculating fp16 addition that I basically only started optimizing on the 11th day, after the checker's code was published :)

Let me share the idea with you. First of all, I split the input array into two parts.

  1. First part consists of 90-97.5% of the elements. The main goal is to get W as low as possible while keeping the absolute error relatively low

  2. Second part is the "compensator". Let's build a balanced tree. Leaves are elements, internal nodes are operations. All the internal operations are fp64, and only the lowest level of operations (between two leaves) may be fp32 or fp16. We can approximate the desired error almost transparently by choosing between different types of operations on the lowest level (the other operations on the path to the root are fp64, so we usually don't lose any precision there)

The main challenge is keeping the absolute error in the first part relatively low. You can notice that rounding during fp16 addition always truncates the result towards zero. That's not a problem if the array is balanced in terms of the number of positive and negative elements. So the main problem is how to use as many fp16 operations as possible when the array is not balanced at all

Without loss of generality, let's assume that the array consists only of positive numbers.

Consider following approach:

  • Split the array into some number of blocks of a predetermined size (say blockSize = 16, to get rid of the random-memory-access penalty)

  • Sum up all the elements in each block, constructing a balanced tree using as many fp16/fp32 operations as possible. We almost don't care about the resulting error at this point

  • Sum up all the blocks one by one, choosing between fp32 and fp64. We pick fp32 only if rounding helps to lower the error. The "compensated" amount at each step is the rounding error, which clearly depends on the magnitude of the addends. But this is good news: since all the elements are positive, the partial sum increases after each step, so we can compensate bigger amounts closer to the end of the block summation process

This approach allows us to get a relatively low resulting error (which can most probably be compensated by the second part) even if all the elements have the same sign. In the worst case we have to skip some of the worst (in terms of increasing error) fp16 operations during the block sum calculation
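
The block-combining step of this approach (choosing fp32 only when rounding helps) can be sketched as follows (`sumBlocks` is an illustrative name, with plain fp64 standing in for the exact reference sum):

```cpp
#include <vector>
#include <cmath>

// Combine per-block sums one by one, using an fp32 addition only when its
// rounding moves the running total closer to the exact partial sum.
double sumBlocks(const std::vector<double>& blocks) {
    double run = 0.0;    // running total, as the checker would see it
    double exact = 0.0;  // exact partial sum (here: plain fp64 reference)
    for (double b : blocks) {
        exact += b;
        double f32 = (double)(float)(run + b);  // candidate fp32 addition
        double f64 = run + b;                   // candidate fp64 addition
        run = (std::fabs(f32 - exact) < std::fabs(f64 - exact)) ? f32 : f64;
    }
    return run;
}
```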

my final scores
»
6 days ago, # |
  Vote: I like it +14 Vote: I do not like it

those who don't have positive contribution cannot make a comment with an image ****

thank you

»
6 days ago, # |
  Vote: I like it +23 Vote: I do not like it

Just so you realize how useless and impractical this task is:

Rounding numbers to fp16 never reduces the error => by replacing fp64 with fp16 you only increase the answer.

Is it logical that fp32 should also never reduce the answer? Yes... but a bunch of bugs in the authors' checker changes the problem dramatically.

How it should have been: we use fp64 to compute the answer with a small error, then fp16 and fp32 only increase it, and the task is to scrape out the points, because that is really hard to do (even ~55 on a random test). Buuuut, we have a different task. We have fp64, which can compute the answer very slightly inaccurately. We have fp16, which only increases the error. And there's fp32, which doesn't know how to behave at all.

A task in which we had to think about rounding and how it works would be at least slightly meaningful. But a task where we can make random changes to the result thanks to incorrect authors' code is a task that can never be applied anywhere.

  • »
    »
    4 days ago, # ^ |
      Vote: I like it +47 Vote: I do not like it

    If you pay attention to the changes brought by GPU chips, you may guess the original purpose of this topic. Unfortunately, some hardware conditions could not be fully simulated, so some adjustments and changes were made to the problem. Thank you very much for your suggestions; we will continue to optimize and improve. Congratulations on entering the TOP60 list.

»
16 hours ago, # |
  Vote: I like it +10 Vote: I do not like it

I was expecting a problem about minimizing the error, but it turned out to be a problem about cancelling the error while knowing the accurate answer. Still a fun problem, but it seems not as useful in practice as advertised (they said "addressing this challenge", not "a problem inspired by this challenge"). It would surprise me if the organizer could use any of the algorithms from this contest on their chips.