Interactive problems in Code Jam

This post in Code Jam group

Hi,

I want to discuss interactive problems in Code Jam. Code Jam didn't have interactive problems until recently, and there are still major differences between interactive problems in Code Jam and those in other competitions where they are more established.

One difference is in how sample interactions are presented. I think that in Code Jam, sample interactions are very inconvenient to read. For regular input and output, there is a place where you can read just the input and just the output, but for interactive problems, you have to read a mix of pseudocode, comments and the actual input/output.

Other competitions that have interactive problems usually use more concise formats, which are easier to read quickly. One possibility is to just present the input and the output separately, like here (problems H and I). This doesn't convey the relative order of the input and output lines, but it can be reconstructed from the description of the interaction protocol. Also, it is possible to use empty lines to show the order, like here (problem C). There are multiple approaches that can be used to show a sample interaction in a single column, like showing input and output lines in different colors, or prefixing them with different symbols (for example, left and right arrows). In all cases, it is possible to add an additional column for comments.
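To make the single-column idea concrete, here is a minimal sketch of the "prefix each line with a symbol" format. The function name `render_interaction`, the `<` and `>` markers, and the guessing-game transcript are all made up for illustration; they are not taken from any real problem statement.

```python
def render_interaction(events):
    """Render (sender, line) pairs as a single column of text.

    "<" marks a line printed by the judge (read by the solution),
    ">" marks a line printed by the solution (read by the judge).
    """
    prefix = {"judge": "< ", "solution": "> "}
    return "\n".join(prefix[sender] + line for sender, line in events)


# Hypothetical transcript of a number-guessing game, for illustration only.
sample = [
    ("judge", "1 10"),
    ("solution", "5"),
    ("judge", "TOO_SMALL"),
    ("solution", "8"),
    ("judge", "CORRECT"),
]
print(render_interaction(sample))
```

A comment column could be added by carrying a third field in each tuple and padding the lines to a common width.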

Another feature of interactive problems specific to Code Jam is that when a solution produces an incorrect result, the judge program communicates this to the solution and waits for it to terminate. This doesn't make sense. Once the solution produces an incorrect result, there is no point in continuing to run it, and in other competitions, it would be terminated immediately in such a situation. But apparently the Code Jam team expects contestants to write code that checks the judge's output and handles the case where it indicates that the solution's output is incorrect. This is useless, because it can't make any solution pass that wouldn't pass otherwise, and that's all that matters for a solution. I for one don't write such code (my solutions always skip this part of the judge's output), but I still have to remember to read (and ignore) that line, something that I don't have to do in other competitions.

Finally, I'd prefer sample interactions to be correct. It's simple: for regular problems, sample inputs and outputs are (almost) always correct, so why should it be any different for interactive problems? Yet both of this year's interactive problems available so far have sample interactions in which a solution provides an incorrect answer, apparently to demonstrate the judge's response to it (the right response, as I already said, is to terminate the solution with a WA verdict).

What do you think of all this?

# This is a small program that runs two processes, connecting the
# stdin of each one to the stdout of the other.
# It doesn't perform a lot of checking, so many errors may
# be caught internally by Python (e.g., if your command line has incorrect
# syntax) or not caught at all (e.g., if the judge or solution hangs).
#
# Run this as:
# python interactive_runner.py <cmd_line_judge> -- <cmd_line_solution>
#
# For example:
# python interactive_runner.py python testing_tool.py 0 -- ./my_binary
#
# this will run the first test set of a python judge called "testing_tool.py"
# that receives the test set number (starting from 0) via command line parameter
# with a solution compiled into a binary called "my_binary".
#
# This is only intended as a convenient tool to help contestants test solutions
# locally. In particular, it is not identical to the implementation on our
# server, which is more complex.

from __future__ import print_function
import sys, subprocess, threading


class SubprocessThread(threading.Thread):
  def __init__(self,
               args,
               stdin_pipe=subprocess.PIPE,
               stdout_pipe=subprocess.PIPE,
               stderr_pipe=subprocess.PIPE):
    threading.Thread.__init__(self)
    self.p = subprocess.Popen(
        args,
        stdin=stdin_pipe,
        stdout=stdout_pipe,
        stderr=stderr_pipe)

  def run(self):
    try:
      self.return_code = self.p.wait()
      # Fallbacks are bytes so that the .decode() calls at the end always work.
      self.stdout = b"" if self.p.stdout is None else self.p.stdout.read()
      self.stderr = b"" if self.p.stderr is None else self.p.stderr.read()
    except (SystemError, OSError):
      self.return_code = -1
      self.stdout = b""
      self.stderr = b"The process crashed or produced too much output."


class ProxySubprocessThread(threading.Thread):
  # Runs the judge and echoes every line passing in either direction.
  def __init__(self,
               args,
               stdin_pipe=subprocess.PIPE,
               stdout_pipe=subprocess.PIPE,
               stderr_pipe=subprocess.PIPE):
    threading.Thread.__init__(self)
    self.p = subprocess.Popen(
        args,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE)
    self.stdin_pipe = stdin_pipe
    self.stdout_pipe = stdout_pipe
    self.stderr_pipe = stderr_pipe
    self.ix = 1

  def report(self, message, party):
    message = message.strip('\n')
    if party == 'USER':
      a = ""
      b = "USER"
    else:
      a = "JUDGE"
      b = ""
    print("{:>6} {:>6} {:<6} {}".format(self.ix, a, b, message))
    self.ix += 1

  def run(self):
    # Strictly alternates: one line from the judge, then one from the solution.
    while True:
      data = self.p.stdout.readline()
      if not data:
        break
      self.report(data.decode(), "JUDGE")
      self.stdout_pipe.write(data)
      self.stdout_pipe.flush()

      data = self.stdin_pipe.readline()
      if not data:
        break
      self.report(data.decode(), "USER")
      self.p.stdin.write(data)
      self.p.stdin.flush()

    self.return_code = 0
    self.stderr = b""


if '-debug' in sys.argv:
  sys.argv.remove('-debug')
  SThread = ProxySubprocessThread
else:
  SThread = SubprocessThread

assert sys.argv.count("--") == 1, (
    "There should be exactly one instance of '--' in the command line.")
sep_index = sys.argv.index("--")
judge_args = sys.argv[1:sep_index]
sol_args = sys.argv[sep_index + 1:]

t_sol = SubprocessThread(sol_args)
t_judge = SThread(
    judge_args, stdin_pipe=t_sol.p.stdout, stdout_pipe=t_sol.p.stdin)
t_sol.start()
t_judge.start()
t_sol.join()
t_judge.join()

print("Judge return code:", t_judge.return_code)
print("Judge standard error:", t_judge.stderr.decode())
print("Solution return code:", t_sol.return_code)
print("Solution standard error:", t_sol.stderr.decode())

Comments (8)

mango_lassi

5 years ago

+28

"Once the solution produces an incorrect result, there is no point in continuing to run it, and in other competitions, it would be immediately terminated in such a situation."

I don't think this is usually the case. On Codeforces it definitely isn't; see, for example, 872D - Something with XOR Queries. Similarly, in BOI 2018 there's this problem.

It doesn't really bother me that much either, since I usually write a separate function that writes to and reads from the grader, and you can just put an exit(0) there if the grader returns -1.
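A minimal sketch of such a helper might look like the following. The name `ask`, the one-line-in, one-line-out protocol, and the use of "-1" as the error marker are assumptions for illustration; the actual protocol depends on the problem statement.

```python
import sys


def ask(query, reader=sys.stdin, writer=sys.stdout):
    """Send one line to the judge and return its one-line reply.

    Assumption: the judge replies "-1" after a wrong or invalid query.
    """
    writer.write(query + "\n")
    writer.flush()  # interactive judges need an explicit flush
    reply = reader.readline().strip()
    if reply == "-1":
        # The verdict is already decided, so stop quietly.
        sys.exit(0)
    return reply
```

Routing all communication through one function like this keeps the check in a single place, so the rest of the solution never has to think about the error marker.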

pwild

5 years ago

+13

Actually, both Kattis and DOMjudge have recently changed the protocol for interactive problems so that when the validator detects a wrong answer, that is also the verdict that will be shown, see this PR.

I think the interaction protocol on Code Jam would also be a lot cleaner if they split each test case into a separate run of the submission. Does anybody know why they don't do that? Is it because of server load?

mnbvmar

+99

The visual design thing: I particularly loved the sample interaction in this year's ICPC finals dress rehearsal:

IMHO it leaves very little room for interpretation: you instantly see who writes what and when. Of course, the only problem is that it's not straightforward to recreate in the web browser. :/

Another thing is that I needed some time to get used to the sample interactor. I had to guess that I needed to look at the interactor's source code and understand some comments in there just to run it. This could be another thing that puts people off the interactive problems in Code Jam.

rmn

+33

"Of course, the only problem is that it's not straightforward to recreate in the web browser. :/"

With flexbox, it's actually not that difficult: https://cdpn.io/rbJWGX

Coder

+49

You forgot to mention that they provide a tester script, which may be more important than how pretty the sample input/output looks in the statement.

Al.Cash

I wish their script could copy the interaction history to console so I could see what went wrong...

mnaeraxr

+22

I modified the interactor for Round 1A to print the messages passing through it. Just pass the -debug flag to the interactor like this:

python interactive_runner.py -debug python testing_tool.py 0 -- ./my_binary

interactive_runner.py

Note that I have only tested this with working programs. I haven't explored what happens when the program crashes (RTE) or when the interactor crashes.

This should print to the terminal something like this:

output

     1  JUDGE        3 365 100
     2        USER   16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16
     3  JUDGE        0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
     4        USER   9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
     5  JUDGE        0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0
     6        USER   5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
     7  JUDGE        0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0
     8        USER   7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
     9  JUDGE        0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    10        USER   11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
    11  JUDGE        0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0
    12        USER   13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13
    13  JUDGE        0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0
    14        USER   17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17
    15  JUDGE        0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
    16        USER   2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    17  JUDGE        0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0
    18        USER   2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    19  JUDGE        1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    20        USER   2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    21  JUDGE        0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0
    22        USER   2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    ...

godmod

3 years ago

Exactly what I was looking for. Still works!! Thanks, mnaeraxr.

eatmore's blog