You can use this test case in custom invocation to confirm that the solution using long double is indeed twice as fast.
EDIT: I pinpointed the problem: it is the mod member function, which computes the norm of a vector using
hypot(x,y), which gives the same result as
sqrt(x*x+y*y) but with better precision. The solution with double gets accepted if I use double everywhere and cast to long double only here:
hypot((long double)x,y) (code). Using
sqrt(x*x+y*y) instead works too. Nonetheless, it is still very strange that the hypot function is faster when dealing with long double.
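
For reference, here is a minimal sketch of the three variants of mod discussed above. The struct name Vec, its fields, and the method names are my own placeholders, not the ones from the actual submission, and the comments only repeat what I observed on this problem rather than any general timing guarantee.

```cpp
#include <cmath>

// Hypothetical 2D vector type; the names here are assumptions for illustration.
struct Vec {
    double x, y;

    // Variant that was too slow in my submission: hypot on double arguments.
    double mod_hypot_double() const {
        return std::hypot(x, y);
    }

    // Variant that got accepted: casting one argument selects the
    // long double version of hypot; the result is converted back to double.
    double mod_hypot_long_double() const {
        return static_cast<double>(std::hypot(static_cast<long double>(x), y));
    }

    // Variant that also got accepted: the plain formula.
    double mod_sqrt() const {
        return std::sqrt(x * x + y * y);
    }
};
```

All three return the same norm for ordinary inputs (for example 5 for the vector (3, 4)); they differ only in which library routine does the work, so swapping one for another in a loop is an easy way to compare timings in custom invocation.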