By raiffeisen, 13 months ago, translation, In English

Codeforces, welcome!

Registration for the ML competition from Raiffeisenbank that starts on May 17 is now open. The prize fund of the championship is 700,000 rubles. The competition is unrated.

All terms of participation were developed by the Raiffeisenbank eFX trading team in collaboration with Codeforces. Huge thanks to geranazavr555 for testing and great advice, and to MikeMirzayanov for the Codeforces systems.

Participants are invited to build a predictive model based on the provided historical data.

Cash prizes will be contested only among Russian residents (we expect winning participants to provide descriptions of their solutions in Russian).

Prizes:

  • 1-4 places — 100K rubles each
  • 5-10 places — 50K rubles each
  • 1-60 places — merch packages

Registration is open until the end of the contest. Please fill in the contest registration form.

UPD1: Competition will now run for two weeks until May 31, 19:00 MSK!

UPD2: Use our baseline solution for a quick start

UPD3: Contest results are open. You can download the system testing data from this link: http://assets.codeforces.com/rounds/1522/8f2fa64f1730ae12dc37504765d7e012e16613f0/tests2.csv

UPD4: Results will be announced within the coming week! Stay tuned!

UPD5: The results are now final and the winners are:

  1. PenaFeministka
  2. arefiev.mc
  3. YANK01
  4. IgorDr1999
  5. l4morak
  6. Derbent
  7. catforces
  8. ZADaCHI
  9. Taube
  10. evteev

Congratulations! Everyone in the global top 60: don't forget to check your inbox, we'll message you about merch packages.

Announcement of Codeforces Raif ML Round 1
 
 
 
 

»
13 months ago, # |
  Vote: I like it +21 Vote: I do not like it

Is this only for Russians?

»
13 months ago, # |
Rev. 2   Vote: I like it +1 Vote: I do not like it

Hi, currently I am not looking for any job opportunities. Could you please add "None" under "Are you looking for internships or full-time positions?*". I am just participating for fun. Also, what should I provide under "Telegram or another messenger"?

»
13 months ago, # |
  Vote: I like it 0 Vote: I do not like it

I'm assuming this is unrated, right? (since it seems to be an ML contest rather than an algorithmic contest). It isn't mentioned in the description.

»
13 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Is this contest supposed to be a normal CF contest (with algorithmic problems), or is this something else?

  • »
    »
    13 months ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    It is mentioned above that

    Participants are invited to build a predictive model based on the provided historical data

    So, it is not a normal CF contest.

»
13 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Is this competition going to be ranked?

»
13 months ago, # |
  Vote: I like it +12 Vote: I do not like it

Is it rated?

»
13 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Are there any similar contests from the past that we can practice on?

  • »
    »
    13 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    You can refer to the last two problems in the Quora Programming Contest this year:

    https://codeforces.com/blog/entry/86539

    The server has long been offline, and the test cases are not published. I wrote some explanations for the ML problems in the Codeforces thread.

»
13 months ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

In simpler terms: are participants given a dataset to build a machine learning model on, or is it something different?

»
13 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Is this contest rated?

»
13 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Where can we find the contest rules, please? x)

»
13 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Is deep learning allowed, and which libraries are we allowed to use?

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

I have never tried writing ML algorithms, but I do want to learn. Can someone please point me to some problems that are supposed to be solved with ML algorithms?

  • »
    »
    12 months ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    You can explore Kaggle for ML-related material. It's like CF for ML. There are regular competitions on Kaggle, and it also has some good learning resources.

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

I hope to work remotely at Raiffeisen bank.

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Is this only for university students?

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

What is the round format?
1. Is it a dynamic evaluation or a submission-upload format?
2. If dynamic, please share the available resources and libraries (GPU/CPU specs, maximum memory, installed libraries, whether custom libraries can be uploaded, internet access).

»
12 months ago, # |
  Vote: I like it -21 Vote: I do not like it

Is it rated?

»
12 months ago, # |
  Vote: I like it -22 Vote: I do not like it

anyone here?

»
12 months ago, # |
Rev. 2   Vote: I like it +1 Vote: I do not like it

How is the winner of a football match determined? If both teams score the same total number of goals, do we have to check other parameters, or can we say that it is a draw?

»
12 months ago, # |
  Vote: I like it -9 Vote: I do not like it

Nice to see Codeforces hosting data science contests as well. Hoping to learn something new.

»
12 months ago, # |
Rev. 3   Vote: I like it +1 Vote: I do not like it

The interactive part is not clear to me. Why are there 2 blank lines before each bet in the output? I need more clarification on this point: "After every response, your program should print a line feed and flush the output buffer." Does "every response" mean every bet? I do not want to spend 2-3 days on this problem only to find out my solution "hung" in the system test.

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Has anyone figured out what "Wrong answer on test 1" is supposed to mean here, as opposed to, say, a bad negative score?

  • »
    »
    12 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Hey, I also got the "Wrong Answer on Test 1". What files did you put in zip apart from main.py and train.csv?

    • »
      »
      »
      12 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Apparently, I was missing some header files. Make sure to keep main.py at the root level of your zip file. Other than main.py, I have two other files: the model and a helper dictionary.

      • »
        »
        »
        »
        12 months ago, # ^ |
        Rev. 2   Vote: I like it 0 Vote: I do not like it

        Don't you have train.csv in your zip file? Can you list the names of all the files in your zip file, with their extensions?

        • »
          »
          »
          »
          »
          12 months ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          I am mainly using my pretrained model for predictions, so I don't need a train.csv in the zip file.

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

If I try to import numpy, pandas or scikit-learn, I get a runtime error. Is there any workaround for it?

»
12 months ago, # |
  Vote: I like it +1 Vote: I do not like it

I found that each match has a fixed answer with the highest score, which is one of "HOME", "DRAW", and "AWAY", and that the score for outputting "SKIP" is 0. So for each match, three tries are enough to learn which output scores highest: when probing the i-th match, output "SKIP" for all other matches. That way the best answer for the i-th match can be tested, and the highest total score can be reached in n*3 submissions. Am I right?

  • »
    »
    12 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I think so, but then you would need to make 25,500 submissions, which is a hilarious number XD

  • »
    »
    12 months ago, # ^ |
      Vote: I like it +5 Vote: I do not like it

    Also, these are preliminary tests, and the real tests will be run after the contest. If you did that, it would pass the preliminary tests but fail afterwards.

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Anyone tried submitting with Python3 Lib +Zip files? What files did your zip file contain? 1) main.py 2) train.csv

.....??

  • »
    »
    12 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Yes, I have these two files, but I get a "compilation error". Please help.

  • »
    »
    12 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    The only requirement is to have a main.py file at the archive's root.

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

@geranazavr555 Do we have to submit the training code also?

  • »
    »
    12 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Hi Harsh, you have to print an outcome n times. The first line of input is n, the number of test cases. For each of the n cases, read the parameters known in advance; based on them, print HOME, AWAY or DRAW and flush the output; then read input again.
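
A minimal Python sketch of the loop described in this reply, for illustration only. The exact line format is an assumption (one whitespace-separated line of pre-match parameters, then one line of post-match statistics after each answer, as other comments in this thread suggest); `predict` and `run` are hypothetical names, and the problem statement remains the authority on the real protocol.

```python
import io

OUTCOMES = ("HOME", "DRAW", "AWAY", "SKIP")


def predict(pre_match_fields):
    """Hypothetical placeholder model: always skips the bet."""
    return "SKIP"


def run(inp, out, predict_fn=predict):
    """Read n, then answer each match and flush before reading further."""
    n = int(inp.readline())
    history = []
    for _ in range(n):
        # ASSUMPTION: one line of parameters known before the match.
        pre = inp.readline().split()
        answer = predict_fn(pre)
        if answer not in OUTCOMES:
            answer = "SKIP"
        out.write(answer + "\n")
        out.flush()  # without a flush an interactive judge may hang
        # ASSUMPTION: one line of post-match statistics follows each answer.
        post = inp.readline().split()
        history.append((pre, answer, post))
    return history


# Local dry run on made-up input (two matches, invented numbers).
demo_in = io.StringIO("2\n1 2 1.5 3.4 5.0\n2 1\n3 4 2.1 3.0 3.2\n0 0\n")
demo_out = io.StringIO()
demo_history = run(demo_in, demo_out)
```

In an actual submission the same function would presumably be called as `run(sys.stdin, sys.stdout)` from main.py; keeping the streams as parameters just makes the loop easy to test locally.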

    • »
      »
      »
      12 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Thanks Subarna, but if I have trained a model on the training set, how do I use that model for prediction? Do I have to include the model file in my submission?

      • »
        »
        »
        »
        12 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        If you have trained your model on the given training dataset, you have to include the model, the main code and the csv file all together.

  • »
    »
    12 months ago, # ^ |
      Vote: I like it +1 Vote: I do not like it

    No, but after the contest the jury may ask you to provide your training code.

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Can I get some help? I am trying to submit my solution. I tried sklearn first, which worked, but when I do the same with TensorFlow, I get TLE.

I feel like I have done something wrong in the implementation :(

Can anyone help? Thanks.

»
12 months ago, # |
  Vote: I like it -11 Vote: I do not like it

Why do we get the post-match characteristics as input after predicting the result of each match? I mean, it's of no use.

  • »
    »
    12 months ago, # ^ |
      Vote: I like it +12 Vote: I do not like it

    It is of use as you can use that data for improving future predictions.

  • »
    »
    12 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Well, IRL the statistics are useful for understanding whom to blame: judges, players, or coaches.

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

It seems that the contest has been extended by one week.

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

How do I resolve the "Can not find 'main.py'" issue?

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Help required!

I am trying to upload a zip file whose size is 8.8 MB. It contains main.py and a model saved as a .pkl file. The size of main.py is negligible, but the pickle file is 52 MB. Any suggestions for reducing the size?

  • »
    »
    12 months ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    Hi Aditya, you can try the following:

    a) Use bz2 compression. The bz2 library is available in Python.

    b) Use the highest protocol with pickle. For example: pickle.dump(clf, open("raif_model.pkl", 'wb'), protocol=pickle.HIGHEST_PROTOCOL)

    c) Raise min_samples_split, if your model has it (a higher value means fewer splits and a smaller tree)
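
Points a) and b) can be combined with the standard library alone. The sketch below is only an illustration: the dictionary stands in for a trained model, and `save_compressed` / `load_compressed` are hypothetical helper names, not anything provided by the contest.

```python
import bz2
import os
import pickle
import tempfile


def save_compressed(obj, path):
    """Pickle obj with the highest protocol into a bz2-compressed file."""
    with bz2.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_compressed(path):
    """Load an object written by save_compressed."""
    with bz2.open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a trained model (ASSUMPTION: any picklable object works here).
model = {"weights": [0.0] * 100_000}

tmp_dir = tempfile.mkdtemp()
raw_path = os.path.join(tmp_dir, "raif_model.pkl")
bz2_path = os.path.join(tmp_dir, "raif_model.pkl.bz2")

# Plain pickle for comparison.
with open(raw_path, "wb") as f:
    pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)

save_compressed(model, bz2_path)
restored = load_compressed(bz2_path)
raw_size = os.path.getsize(raw_path)
bz2_size = os.path.getsize(bz2_path)
```

In main.py the model would then be read back with `load_compressed`. Since the zip archive already compresses its contents, the gain on a real model may well be smaller than on this highly repetitive toy object.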

»
12 months ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

Is there some problem with reading match stats after SKIP? I get more points when I don't read the match stats after a skip than when I do.

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Are merch packages for everyone in the top places?

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Is the training dataset also given in chronological order?

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Will the team identifiers given in testing be among those given in the training data?

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

"Cash prizes will be contested only among Russian residents (we expect winning participants to provide descriptions of their solutions in Russian)."

Does this hold for merch packages as well?

»
12 months ago, # |
  Vote: I like it +33 Vote: I do not like it

It seems that the test data in the system is of a completely different nature compared with the training data. One reason to think so is the following. If the training data is split somehow into training and validation sets, the local score on the validation set does not correlate with the score in the test system (e.g. local score $$$\approx 100$$$, test score $$$-150$$$; local score $$$20$$$, test score $$$350$$$, wtf). And this happens regardless of the split method (I tried a lot of them but, unfortunately, did not find one good enough for comfortable local testing and solution evaluation, which is really important for, say, hyperparameter search). In many well-known ML contests (e.g. on Kaggle), this is usually handled as follows: each contestant can choose 2 different solutions, and the final score is the maximum of the scores of the chosen solutions. Why is that impossible in this contest?
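
For readers wondering what a time-respecting split could look like, here is a small walk-forward sketch in Python. It is not the commenter's method. It assumes the training rows are in chronological order (which this thread does not confirm), and the row counts and the name `walk_forward_splits` are made up for illustration.

```python
def walk_forward_splits(n_rows, n_folds, min_train):
    """Return (train_range, valid_range) pairs where validation always
    comes strictly after the rows used for training."""
    fold_size = (n_rows - min_train) // n_folds
    if fold_size <= 0:
        raise ValueError("not enough rows for the requested folds")
    splits = []
    for i in range(n_folds):
        train_end = min_train + i * fold_size
        valid_end = train_end + fold_size
        splits.append((range(0, train_end), range(train_end, valid_end)))
    return splits


# Hypothetical sizes: 40000 rows, first 20000 always used for training.
splits = walk_forward_splits(n_rows=40000, n_folds=4, min_train=20000)
for train_idx, valid_idx in splits:
    # A model would be fit on train_idx and scored on valid_idx here.
    pass
```

Looking at the spread of scores across the folds, rather than at a single split, gives at least a rough idea of how noisy a local estimate is.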

  • »
    »
    12 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it
    • »
      »
      »
      12 months ago, # ^ |
        Vote: I like it +31 Vote: I do not like it

      GL with the randomly shuffled final leaderboard...

      • »
        »
        »
        »
        12 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        While I agree there might be a random shuffle, you are complaining about footballers playing inconsistently and blaming the dataset for not meeting your model.

        • »
          »
          »
          »
          »
          12 months ago, # ^ |
            Vote: I like it +23 Vote: I do not like it

          I am not. That is the usual situation with real data, and there is an enormous number of olympiads with the same situation. I said that, to prevent a large difference between the public and private leaderboards and to find the most effective solutions among participants, it might be useful to add the option of scoring the final solution as the max of the two best. And you are trying to say that this is not important, using inappropriate jokes. If the organizers' main purpose is to find a good solution, then it would be helpful.

          • »
            »
            »
            »
            »
            »
            12 months ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            Oh sorry. I thought you were complaining.

    • »
      »
      »
      12 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      bruh

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

raiffeisen Please, add the way to participate out of contest. I want to train my ML skills, but I study at school, not at university so I can't fill the registration form.

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Can we put a .mat file in the zip file we are submitting?

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Less than 19 hours left until the end of the contest. After the end of the contest, system testing will be launched using the new dataset. Make sure that your last submission is your final solution; it will be used in the final standings.

»
12 months ago, # |
  Vote: I like it +25 Vote: I do not like it

Now that the contest is over, I can announce that teams 280-281 are Man City and Man United (though my solution doesn't use this fact).

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

What happened to the results? The tables changed all at once.

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

What happened to the standings? They have changed completely. It looks as if the admin reversed the leaderboard. I have never seen such a big difference between the public and private LB. Even Kaggle leaderboards don't change like this.

  • »
    »
    12 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    This kind of leaderboard permutation is a common thing for such competitions. It always happens when the train and test data have different distributions: the better your model fits the train data, the worse it performs on the test set, so the only way to get public test points is to overfit the leaderboard, which means you will have a huge gap on the private data.

    • »
      »
      »
      12 months ago, # ^ |
      Rev. 4   Vote: I like it 0 Vote: I do not like it

      I agree with what you said, but I have taken part in a lot of different ML contests and have never seen such a huge difference. My public LB score was 280 and my private score was around -30, and I don't think it was overfitting, keeping in mind that the top scorer got an 800+ score.

      • »
        »
        »
        »
        12 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        It is possible to get a huge score with an algorithm like this: take some team and make 3 submissions predicting all wins, all ties and all losses for that team; keep the best predictions; do the same for another team; repeat until you reach the score you want. But considering that the top-scoring (800) competitors from China made just 3 submissions, I thought they had some cool solution too.

        • »
          »
          »
          »
          »
          12 months ago, # ^ |
          Rev. 2   Vote: I like it 0 Vote: I do not like it

          I guess the train and test sets were completely different, but in that case there is no point in the training data, because a model can't correctly predict data that is completely different from the training set, hence the large difference.

          • »
            »
            »
            »
            »
            »
            12 months ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            Yes, but it is still possible to train just on those n <= 8500 samples from the test inside main.py without using the training data (I was doing that), or to use the train data with a small weight and the data you get from the test with a bigger weight. Still, the main problem here is that we can't say anything about how good our solution is, because all 3 sets (train, public test, private test) have different distributions; even selecting the solution to send is like a casino.

            • »
              »
              »
              »
              »
              »
              »
              12 months ago, # ^ |
                Vote: I like it 0 Vote: I do not like it

              Leave it. I meant something else. Yesterday, after the competition ended, my score on the private LB looked good, but then they blocked it, and in the morning it got worse. That is why I was curious whether they made a mistake or something.

»
12 months ago, # |
  Vote: I like it +8 Vote: I do not like it

"Cash prizes will be contested only among Russian residents (we expect winning participants to provide descriptions of their solutions in Russian)."

Does this hold for merch packages as well?

»
12 months ago, # |
  Vote: I like it +21 Vote: I do not like it

I would like to thank RF for an interesting 2 weeks of competition! This is my first ML competition. During the competition, I had doubts about the ML capabilities to solve this problem, possibly related to my lack of knowledge.

It would be very useful if someone closely associated with ML shared their experience. Did you treat this task as a classification problem, a regression problem, or something else? Which model did you prefer? What features were the most significant? What tricks did you use? What metrics did you get on the test data?

As for me, I tried to solve it as a classification problem and tested various "out of the box" classifiers: xgboost, catboost, a keras fully connected neural network (very slow for single-sample prediction and would hardly fit into the TL), some stacking of them, etc. The features were series of different team indicators over the last n matches. The series were updated as the matches progressed. The most significant feature turned out to be the Elo rating.
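
For readers unfamiliar with it, the standard Elo update can be sketched in a few lines of Python. This is not the commenter's implementation, which was not shared: the starting rating of 1500, the factor K = 20, the absence of a home-advantage term, and the names `expected_score` and `update_elo` are all assumptions made for illustration.

```python
from collections import defaultdict


def expected_score(r_home, r_away):
    """Standard Elo expectation for the home side (draw counts as 0.5)."""
    return 1.0 / (1.0 + 10 ** ((r_away - r_home) / 400.0))


def update_elo(ratings, home, away, result, k=20.0):
    """Update both ratings in place after a match.

    result is "HOME", "DRAW" or "AWAY"; k=20 is an assumed constant.
    """
    score_home = {"HOME": 1.0, "DRAW": 0.5, "AWAY": 0.0}[result]
    delta = k * (score_home - expected_score(ratings[home], ratings[away]))
    ratings[home] += delta
    ratings[away] -= delta
    return ratings[home], ratings[away]


# Every team starts at an assumed rating of 1500.
ratings = defaultdict(lambda: 1500.0)
new_home, new_away = update_elo(ratings, "team_a", "team_b", "HOME")
```

Because the update only needs the final result, ratings like these can be refreshed after every match as the post-match input arrives, and the pre-match rating difference can then be fed to a classifier as a feature.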

I am stuck on the following. With some train-test splits, for example in the proportion 32000:8000, the models with some tuning gained +100, +200 (and even +600) points. The accuracy on the validation data was 0.5-0.6, which is of course better than random guessing, since there are 3 classes. But at the same time, with a different split (for example 28000:5000), the same models gained -100 or -200 points with 0.5 accuracy. It is clear that these models are not suitable for successful prediction because of the large fluctuations. I tried to improve this by selecting various combinations of features, but it didn't work. I also tried predicting only home team wins: validation accuracy increased to 0.7 and the possible income naturally decreased, but the worst thing is that fluctuations of -50/+50 remain.

The one way I found to increase accuracy is to skip the model's low-probability answers. This way, out of 8000 matches, it is possible to answer the 300 most predictable ones with an accuracy of about 0.9.

The problem is that the probable outcomes of these matches have very low odds, so low that the losses on the 10% of failed predictions cancel out the income from the 90% guessed correctly.
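
The arithmetic behind this observation can be made explicit with a short Python sketch. It assumes a simplified scoring model that may differ from the contest's actual one: a unit stake per bet, decimal odds, a win paying `odds - 1` and a loss costing the stake. The function names are hypothetical.

```python
def expected_profit(p_win, decimal_odds, stake=1.0):
    """Expected profit per bet: p*(odds - 1) - (1 - p) = p*odds - 1."""
    return stake * (p_win * decimal_odds - 1.0)


def break_even_odds(p_win):
    """Odds at which a bet with win probability p_win has zero expectation."""
    return 1.0 / p_win


# With 0.9 accuracy the odds must exceed about 1.11 just to break even.
be = break_even_odds(0.9)
profit_low = expected_profit(0.9, 1.05)   # odds below break-even: negative
profit_high = expected_profit(0.9, 1.20)  # odds above break-even: positive
```

Under these assumptions, 90% accuracy on heavy favourites only pays off if their odds stay above roughly 1.11, which matches the commenter's experience that very low odds wipe out the gains.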

I think the models learn features that are already "included" in the bookmaker's odds (maybe these features separate so well because they are "right" in some way?). The way forward would be to find features that the bookmakers have not accounted for (overestimated outcomes). I think it is very difficult (impossible?) to find such features, isn't it?

Can the scores gained by using such good features outperform the lucky fluctuations of a nominal "xgboost" with 0.5 accuracy and -N/+N points?

In any case, this is all newbie reasoning, and it would be very interesting to see the experts' solutions.

»
12 months ago, # |
  Vote: I like it +17 Vote: I do not like it

geranazavr555

I finished in the top 30 in this contest and I'm not Russian.

Can I receive non-cash prizes? If possible, do I have to explain my solution?

»
11 months ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

Can anyone tell me how many points the baseline solution received?