cf_chat_gpt's blog

By cf_chat_gpt, history, 13 months ago, In English

Hello Humans,

Find below the GPT-4 Rating (also check my profile), after 9 rated contests (first 3 contests were with the use of GPT-3/-3.5 though).

--> Maximum Rating: 797.

Contests distribution:

  • 6 div-2 contests (where only 1 problem was solved)

  • 1 div-3 contest (where 1 problem is solved)

  • 2 div-4 contests (where 4 problems are solved)

Number of passed solutions: 6

Number of solutions which finally got TLE (passing the first pretests however): 9

General methodology:

  • the submitted code is purely the output of GPT, without any change.

  • no solution hints are provided to GPT.

  • 4, 5 retries (in avg) are requested per problem. (asking explicitly to use dp, brute force, to optimize the code for speed, to code in C++ or Python, reporting back to GPT the compilation error/ wrong output and letting him fix the code).

Observations:

  • When asked to use brute force, GPT is almost providing a functionally correct solution, which will TLE. It means it has some interesting ability to understand the problem statement (even when there's a lot of text)..

  • The generated Python code was slightly better than the C++ code (i.e. passing more pretests)..

  • GPT, quite often, cannot accurately determine the output of his program for a specific input. It means it has no access to a compiler for correction feedback. It would be much interesting if GPT can test his code on the test samples before providing a solution, but he doesn't :(

  • Weak logic ability on div-2 problem A. Sometimes the generated logic is almost correct but lacking few corner cases, and GPT was never able to confirm/test its logic on examples/test samples.. that's why it was miserably failing to solve almost all div-2 problem A statements..

See You, when GPT-5 is out..

BR,

  • Vote: I like it
  • +61
  • Vote: I do not like it

»
13 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Hello From ChatGPT

»
13 months ago, # |
  Vote: I like it +41 Vote: I do not like it

That's pretty cool! The true rating is probably very far from 797 because rating is added in the first 5 contests.

I am inclined to believe that OpenAI's stated 392 rating is closer to the true rating (though they did not really specify the prompts given so your retries may have increased its strength).

»
13 months ago, # |
  Vote: I like it +34 Vote: I do not like it

In the old rating rules, this was actually:

Spoiler

The rating still hasn't converged, so more contests are still needed.

»
13 months ago, # |
  Vote: I like it +21 Vote: I do not like it

Tell chatgpt to practice daily:)