Weak test cases for problem G from Grakn Forces, or how I hacked tourist

Today I was upsolving problem 1408G - Clusterization Counting from the recent Grakn Forces 2020 contest. I solved it, but I didn't really like my solution, so I decided to look at how other participants had solved it. Of course, I first opened tourist's submission (94320762), because his code is always written in a good style and is easy to understand.

But when I looked carefully through his solution, I was a little confused, because he used an idea that I believed to be wrong.

The idea is the following: computers with indices $$$x_1, x_2, \ldots, x_k$$$ form a valid set iff for each $$$i \in \{x_1, x_2, \ldots, x_k\}$$$, the values $$$a[i][x_1], a[i][x_2], \ldots, a[i][x_k]$$$ are smaller than all $$$a[i][j]$$$ with $$$j \notin \{x_1, x_2, \ldots, x_k\}$$$. It's obvious that this is a necessary condition for a set to be valid, and while solving the problem I first thought that it might also be sufficient. But after thinking a little longer, I realized that it is not sufficient, for example on the following test:

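Computers $$$1, 2, 3$$$ don't form a valid set, because $$$a[2][3] > a[1][4]$$$: the pair $$$(2, 3)$$$ lies inside the set, the pair $$$(1, 4)$$$ connects the set to a computer outside it, and in a valid set every pair inside must be lighter than every such connecting pair. However, all the row-wise conditions hold: $$$a[1][2] < a[1][4]$$$, $$$a[1][3] < a[1][4]$$$, $$$a[2][1] < a[2][4]$$$, $$$a[2][3] < a[2][4]$$$, $$$a[3][1] < a[3][4]$$$ and $$$a[3][2] < a[3][4]$$$.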

So I ran tourist's submission on this test, and it turned out that his output was wrong. After I successfully hacked his submission, I also ran all 97 accepted solutions from the contest on that test, and 13 of them turned out to be wrong. Among them were 3 submissions from participants who placed in the top 10.

I think there are many different tests that break this idea, so it's strange that none of them was among the system tests. Maybe the reason is that this problem has only 24 system tests. A similar situation (a small number of system tests) happened with problem 1408F - Two Different from the same round. So I think preparing a larger number of tests is a good idea, especially for hard problems, which don't receive many submissions to test against.

Comments (9)

Radewoosh

+27

Yeah, the tests were really weak. I hacked aid's solution because it used a DSU without any optimizations (so it was $$$O(n^3)$$$), and Errichto hacked mine because of a bug.

Here's the bug: we scan the edges in ascending order of weight and merge components when needed, always multiplying the two polynomials that represent the results for the merged components. Also, if the current edge is the heaviest edge contained in some component, we do something like $$$DP[component][1]$$$++, since at that moment the component is a valid set. So, when merging components A and B, the weight of the heaviest edge in the new component should be max(max_in_A, max_in_B, max_between_A_and_B); computing max_between_A_and_B directly amortizes to $$$O(n^2)$$$ in total. Because of the bug I was taking only max(max_in_A, max_between_A_and_B), and that also passed the system tests...

Bhosideke

-68

Can you help with the CodeChef Long Challenge? We can do a Zoom call. I can pay you.

+53

Yeah, I can cheer you on while you solve the problem by yourself, no problem. Maybe I won't even be too expensive!

-28

Cheering probably won't help either, because I struggle with Div. 2 A's.

rng_X

+11

Then maybe nothing will help in your case!

antontrygub

+10

So tourist submits obviously wrong solutions without proofs, wow.

+16

Cause submitting obviously wrong solutions, but with proofs, is better...

dorijanlendvaj

+41

One way all the WA uphacks could have been avoided is a pseudo-multitest: make several components of small size and then merge them; the final answer is correct only if the answers for all of those components are also correct. Here is an example of a generator that breaks all the solutions to G that were uphacked with WA:

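#include <bits/stdc++.h>

using namespace std;

const int N=3010,LEN=6; //it would probably be best to have a test with LEN=4 and a few with LEN=5, 6 and 7
int t,n,h[N][N]; // h[i][j]: weight of the pair (i, j); 0 means not assigned yet

int main()
{
	mt19937 rng(1); // random_shuffle was removed in C++17; a fixed seed keeps the test reproducible
	t=1500/LEN; // number of blocks ("pseudo-tests")
	n=t*LEN;
	cout<<n<<endl;
	vector<int> v; // weights for the LEN*(LEN-1)/2 pairs inside one block
	for (int i=1;i<=LEN*(LEN-1)/2;++i) v.push_back(i);
	for (int z=0;z<t;++z)
	{
		// block z gets a random permutation of the weights z*C+1..(z+1)*C, where C=LEN*(LEN-1)/2
		shuffle(v.begin(),v.end(),rng);
		int cn=0;
		for (int di=1;di<LEN;++di) for (int i=z*LEN;i<(z+1)*LEN-di;++i) h[i][i+di]=v[cn++]+z*LEN*(LEN-1)/2;
	}
	// pairs between different blocks are heavier than every in-block pair,
	// so every block is a valid set on its own
	int cu=t*LEN*(LEN-1)/2;
	for (int i=0;i<n;++i) for (int j=i+1;j<n;++j) if (!h[i][j]) h[i][j]=++cu;
	for (int i=0;i<n;++i) for (int j=0;j<i;++j) h[i][j]=h[j][i]; // the matrix is symmetric
	for (int i=0;i<n;++i)
	{
		cout<<h[i][0];
		for (int j=1;j<n;++j) cout<<' '<<h[i][j];
		cout<<'\n';
	}
}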

SuperJ6

How is dori not coordinator yet?

BigBag's blog