vlchen888's blog

By vlchen888, history, 3 months ago, In English

Hey all,

We are excited to invite everyone to the Quora Programming Challenge 2021, a free programming competition open to participants from around the world*! The competition will take place from Saturday, February 6th, 2021 to Sunday, February 7th, 2021, in two divisions of 4 hours each, to accommodate different timezones.

Quora is a platform to ask questions, get useful answers, and share what you know with the world. On this contest, we include some problems inspired by real-world challenges that Quora engineers faced in building and growing the product. In addition to competitive programming-style questions, there will also be some machine learning problems. We hope that this contest will be fun for everyone!

The top contestants in each division will be able to win the following prizes:

  • 1st place: $2000
  • 2nd place: $1000
  • 3rd place: $500
  • 4th — 10th: $200
  • Top 50 finishers: Quora Challenge T-Shirt

Our recruiting team will also be reaching out to top participants for their interest in interviewing for roles at Quora. We recently became a remote-first company, so you don’t have to relocate to work with us.

Register for the competition here by 2/5/2021: https://www.quora.com/careers/challenge.

Hope to see you at the contest!

 
 
 
 
  • Vote: I like it
  • +127
  • Vote: I do not like it

»
3 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by vlchen888 (previous revision, new revision, compare).

»
3 months ago, # |
  Vote: I like it +9 Vote: I do not like it

Deja vu.

»
3 months ago, # |
  Vote: I like it +21 Vote: I do not like it

Why is LinkedIn profile a required field?
If it is so much required, why does it accept https://linked.in/none as answer?
Mind that the answer is accurate and truthful to the extent possible, as required in the rules.

  • »
    »
    3 months ago, # ^ |
      Vote: I like it +79 Vote: I do not like it

    It's not really surprising that the form doesn't validate that the URL is a real profile URL, they did some rough validation and that's enough; do you expect forms to also automatically check that your City is a "real city" against some magical database before allowing your submission? No one ever does that.

    LinkedIn profile is required because, clearly, Quora is running this (for free, with real prize money!) as a tool to recruit people to work there.

    • »
      »
      »
      3 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      For important stuff, there's a way to validate an address: "We sent you a mail / message / etc. with a code, please input it in the next form to complete the registration process".

      This one looks mandatory (required field in the form) but not enforced (no validation and no specific mention of LinkedIn profile in the rules, only "fill the form accurately and truthfully"). Hence the question, which is more about the company's intent: if it is really required, it can be clearer in the rules or via validation, if not, the required status can be dropped.

      • »
        »
        »
        »
        3 months ago, # ^ |
          Vote: I like it +11 Vote: I do not like it

        Validation only ever really happens during actual account creation; no one ever validates things like that in registration forms.

        I guess the real question is whether they'll disqualify you if you don't have a LinkedIn? I'm not sure if they will, but from the rules about "accurately and truthfully", it sounds like they can, so I'd just make a LinkedIn and leave it mostly empty.

»
3 months ago, # |
  Vote: I like it 0 Vote: I do not like it

The problems in the 2014 version of this challenge has a not-strictly-algorithmic challenge, in particular Sorted Set where you have to parallelize a task, which is not common in CP. Are these types of questions fair game under your description of "competitive programming-style questions"?

  • »
    »
    3 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    This is basically to recruit people, so I think most probably they will ask problems similar to that they find in their daily workflow instead of totally focusing on "competitive programming-style questions".

  • »
    »
    3 months ago, # ^ |
      Vote: I like it +46 Vote: I do not like it

    Hi, thanks for the question! This iteration of the challenge will contain problems that fall into two categories.

    1. “Algorithm” problems: These are competitive programming-style questions that have strictly algorithmic solutions. It is possible to achieve a full score on these problems. Most of the problems will fall into this category.
    2. “Machine Learning” problems: These are problems around machine learning concepts. Scoring is problem-dependent, but roughly would be based on some scaled accuracy metric. We do not guarantee that it is possible to achieve a full score on these problems.
    • »
      »
      »
      3 months ago, # ^ |
        Vote: I like it +55 Vote: I do not like it

      I think many people are familiar with type 1, but for type 2, do you think you could provide a simple example problem ahead of time?

      • »
        »
        »
        »
        3 months ago, # ^ |
          Vote: I like it +5 Vote: I do not like it

        +1. Will any machine learning background be required to solve those problems?

      • »
        »
        »
        »
        3 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Hi Anand, you can take a look at this problem from our 2014 contest for an example of what we’re considering an ML problem: https://www.quora.com/q/quorahaqathon/Quora-Haqathon-Labeler. These problems will not have perfect solutions, and you’ll find knowledge of machine learning useful for solving them. Resources are constrained though, so you won’t be training a SOTA model, and heuristic based approaches may be sufficient or superior for some problems.

    • »
      »
      »
      3 months ago, # ^ |
        Vote: I like it +53 Vote: I do not like it

      Have you considered splitting the challenge into two contests? Also it's been my experience that it's quite hard to give a meaningful solution to a ML problem in 5 hours :(.

      Will it be clear on which problems you can guarantee a full score?

»
3 months ago, # |
  Vote: I like it -33 Vote: I do not like it

Since I don't really like Quora as a platform (I believe in the idea, but not the current implementation), is there any way for me to register for the contest without logging into a Quora account? I try to minimize my social media accounts, especially those based around ads and data monetization (which is all of them nowadays, except Microsoft Circles).

Additionally, what's the approximate value of a Quora Challenge T-Shirt? I'm trying to decide whether it's worth attending based on the expected value.

  • »
    »
    3 months ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    I think a shirt like that mainly has value to you, so no one else can answer that question. Unless you mean the monetary value of a shirt which can't be over 15 dollars (and could be a lot less)

»
3 months ago, # |
  Vote: I like it +13 Vote: I do not like it

Reminder that registration for this contest is closing soon!

  • »
    »
    3 months ago, # ^ |
      Vote: I like it +20 Vote: I do not like it

    Am I right that all problems allow partial scoring, and that wrong attempts don't increase penalty? Will there be a limit on a number of submissions per problem?

  • »
    »
    3 months ago, # ^ |
      Vote: I like it +28 Vote: I do not like it

    29 January 2021 — "In the week before the contest, you should receive another email from us with the link to the contest platform and instructions on how to log in. We will use the same link for both divisions of the contest, and you will be able to log in to either one (or both) of the divisions."

    I have not received another email ...

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Ok I have received an email

      "The Quora Programming Challenge 2021 is almost upon us!

      Here is the login information that you will need to access the contest (please save this information):

      Contest URL: challenge.quora.com ..."

»
2 months ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

Missed registration by some time, thought I will be able to register by 5 Feb IST time :(

  • »
    »
    2 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I thought we had all of February 5th (until 23:59 UTC) at least to register... ):

  • »
    »
    2 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I thought the same thing, two rounds missed.

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +14 Vote: I do not like it

    +1, can the registration time be extended? :(

»
2 months ago, # |
Rev. 3   Vote: I like it +6 Vote: I do not like it
From https://challenge.quora.com/rules

I find this interesting for discussion :)

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +22 Vote: I do not like it

    Doubling the time limit is understandable but not needed; doubling the memory limit is just weird.

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      I think they were right to allow double time limit (for java at least, no idea about python).

      I personally solved the datacenter problem from division 2 first in java, got TLE, implemented same approach (almost same code) which got accepted.

»
2 months ago, # |
Rev. 3   Vote: I like it +6 Vote: I do not like it

Any Java Coders out here, I keep getting the error, "Execution failed because the return code was nonzero". I named my file as Example.java and removed the package and also removed the public keyword from my class. But this continues to show up. I tried submitting a solution using CPP and It got accepted. (BTW this is for the practice session.)

EDIT : (I managed to solve it, my file name had to be "example.java" and not "Example.java")

  • »
    »
    2 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    There isn't a custom invocation like what Codeforces have that can show us what is the error code :/

  • »
    »
    2 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Actually, I missed the practice session. Can you tell me that what should be the name of class for Java solution ? Or can we name it anything ?

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      What I think is, class and file name will be the same as question name.

      But the thing is, they have not mentioned this anywhere and if you name it something else then a weird error will come

      • »
        »
        »
        »
        2 months ago, # ^ |
        Rev. 3   Vote: I like it +4 Vote: I do not like it

        "Since the grading system renames files to match the task name, you will need to use that same task name for the public class name in Java solutions. In the practice problem, this would be "example", since the grading system renames the file to be example.java" . They've sent something like this as an announcement. So, according to it name your file as the "questionname".java and make sure there are no capitals.

»
2 months ago, # |
  Vote: I like it +33 Vote: I do not like it

Will standings page be available during the contest? I don't see one in practice session.

»
2 months ago, # |
  Vote: I like it +29 Vote: I do not like it

"A Participant choosing to participate in both divisions could win prizes in both divisions. Limit one (1) prize per division per Participant.".

So, is it possible that best 50 in both divisions is exactly the same?

»
2 months ago, # |
  Vote: I like it +2 Vote: I do not like it

FYI, here are the Installed Python packages quoted from the official website:

  • Keras 2.4.3
  • numpy 1.19.5
  • Keras-Preprocessing 1.1.2
  • lightgbm 3.1.1
  • pandas 1.1.5
  • scikit-learn 0.24.1
  • scipy 1.5.4
  • tensorboard 2.4.1
  • tensorboard-plugin-wit 1.8.0
  • tensorflow 2.4.1
  • tensorflow-estimator 2.4.0
  • torch 1.7.1
  • xgboost 1.3.3
»
2 months ago, # |
  Vote: I like it -8 Vote: I do not like it

Is there any possibility of late registration for either division ?

»
2 months ago, # |
  Vote: I like it -41 Vote: I do not like it

first long challenge, now this, I am going to use telegram in the coming weeks.

»
2 months ago, # |
  Vote: I like it +92 Vote: I do not like it

anton would've surely rejected atleast 6/7 problems. :)

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +18 Vote: I do not like it

    seeing same type of comments nowadays It feels that anton have redefined competitive programming.

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it +56 Vote: I do not like it

      I think this problemset would have been rejected by 2015 coordinators as well.

      • »
        »
        »
        »
        2 months ago, # ^ |
        Rev. 4   Vote: I like it -18 Vote: I do not like it

        why because they were too difficult ? I usually observe that people like kind of easy problem set and dislike difficult one.

        Also notice that format was completely different . It was 4 hrs instead of 2 hrs.

        -is-this-fft- i don't know why i can't reply two times in 10 minutes , So you want to say because problems were kind of popular or well known and thus they were boring ? From "standard" i usually feel well known problems which people solve while learning some topic.

        Was only me who found problemset was only for > master people ?

        • »
          »
          »
          »
          »
          2 months ago, # ^ |
            Vote: I like it +32 Vote: I do not like it

          No, because of problem style. Most of the problems were very standard, even a few years ago. I basically quit the contest out of boredom.

        • »
          »
          »
          »
          »
          2 months ago, # ^ |
            Vote: I like it +15 Vote: I do not like it

          -is-this-fft- i don't know why i can't reply two times in 10 minutes ,

          Probably because of unrated and low contribution.

          So you want to say because problems were kind of popular or well known and thus they were boring ? From "standard" i usually feel well known problems which people solve while learning some topic.

          Yes, that. For example there was a basic "learning König's theorem" problem and a pretty basic "learning HLD" problem. Note that using these techniques to set problems is fine, but not if knowing the thing is basically the whole problem. I also think that it is fine if there is an "educational" problem from time to time, but if all problems are like that then the problemset is bad.

          Was only me who found problemset was only for > master people ?

          Well, not master+, more like expert+. But you're right that this contest wasn't very green-friendly.

          • »
            »
            »
            »
            »
            »
            2 months ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            thanks . Basically the topics you mentioned are well known to high rated people and thus it was boring for them and difficult for less rating people because they aren't aware of those standard techniques.

            More like it was asking directly from topics (I guess that you want to say).

            Any hint for Escape (maize , wall , victor were terms) problem ? I got partial by brute force but wasn't able to solve for full points , because i was marking points for all guards .

          • »
            »
            »
            »
            »
            »
            2 months ago, # ^ |
              Vote: I like it +56 Vote: I do not like it

            There was no need for HLD, binary climb was more than enough

»
2 months ago, # |
  Vote: I like it 0 Vote: I do not like it

In problem C(Problem ID: students), won't the graph always be bipartite? (Solving with this assumption fetched me 21 points though) or was it an implementation mistake?

21 Pointer CODE
  • »
    »
    2 months ago, # ^ |
      Vote: I like it +19 Vote: I do not like it

    It is bipartite but your complexity is wrong.

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      cant it be solved with the help of priority queue?

  • »
    »
    2 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Yeah I did the same. But I dont know anything about Maxmimum Independent set. So I just copied some template from github.

    I am really curious about the expected solution.

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    Yes, the graph is bipartite. But I think you are assuming that you can always get a maximum independent set in a bipartite graph by taking all the nodes in one of it sides (same color). That's clearly not true.

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Yes, you're absolutely right, so is finding maximum independent set of a graph some standard problem?

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Can you please tell me what is wrong with my code. I also used bipartite graph and took max of two colors for every connected component. https://ideone.com/cKVuu2

      • »
        »
        »
        »
        2 months ago, # ^ |
        Rev. 2   Vote: I like it +1 Vote: I do not like it

        Imagine bipartie graph

        1 1
        2 1
        3 1
        3 2
        3 3
        

        Each half has 3 vertices, so your solution will output 3. But if we take 1, 2 from left side and 2, 3 from right side we can achieve 4

        • »
          »
          »
          »
          »
          2 months ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          I'm sorry but didn't understand what you mean by 1 2 and 3 in that array. Basically I did a bfs on each connected components and assigned 0 or 1 color to each node and took the maximum of two colors for each connected components. Can you please tell me what is wrong with my methodology here.

          • »
            »
            »
            »
            »
            »
            2 months ago, # ^ |
              Vote: I like it +1 Vote: I do not like it

            Sorry, it was badly formatted. Hope it is understandable now

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    I think you have to find the maximum independent set instead of choosing only one color.

  • »
    »
    2 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Maximum independent set = total vertexes — edges in maximum matching

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Do you mean maximum bipartite matching? If yes can you share some insight of how you got this formula(sorry if it's something very standard)

      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it +10 Vote: I do not like it

        From König's theorem we know that number of vertexes in minimal cover equals to number of edges in maximum matching. So total number of vertex is sum of vertex in minimum cover and maximal independent set so we have this equation.

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      How to find maximum matching for this graph? My Dinic only gets sub 2.

      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it +24 Vote: I do not like it

        If u use dinic on vertex only if it appears in the input, which is $$$4\times 10^5$$$, then it would be fast enough. Since it works in O(sqrt(V) * E) in bipartite graph.

        • »
          »
          »
          »
          »
          2 months ago, # ^ |
            Vote: I like it +25 Vote: I do not like it

          I got TLE when I did what you mentioned. I got AC by running dinic for every connetected component.

          • »
            »
            »
            »
            »
            »
            2 months ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            Does the contest provide a TLE verdict? I had used the less efficient solution and received a "killed (possibly memory exceeded)" message and suspect it may be a TLE.

        • »
          »
          »
          »
          »
          2 months ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          Ah, I forgot it :|. Thanks

»
2 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Is the wall problem (forgot the exact name) a connected component dp problem? Can't find a valid transition during contest.

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +10 Vote: I do not like it

    Yea it is, you just need to maintain the current type of connected components in each column and also which squares in the column are "active".

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it +8 Vote: I do not like it

      Any chance you could send me your solution? I'm not sure why mine doesn't pass, and I'd like to stress test it to find a countercase.

      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it +10 Vote: I do not like it
      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        I coded the bruteforce solution during the contest to debug my full solution. I felt like it was worth it since it is difficult to debug otherwise, and at least I can gain 14 pts from the first subtask in case I did not manage to debug it until the end of the contest.

        Even the O(2^N) solution requires a MST. :|

        • »
          »
          »
          »
          »
          2 months ago, # ^ |
          Rev. 2   Vote: I like it 0 Vote: I do not like it

          I also wrote a brute force in contest, both to submit for 14 points and to use for stress testing. Unfortunately, I was unable to find a countercase using that solution, so I requested other competitors' solutions so that I could try to generate a countercase with a higher N. Unfortunately, even that didn't work out, so apparently there's a relatively obscure bug somewhere in my solution.

          Did your brute force give you the countercase you were looking for, and if so, would you mind sharing the case? It sounds like a fair number of others failed on the same test case I did (TC7 from subtask one), so there's a decent chance countercases that worked for others will also break my solution.

          • »
            »
            »
            »
            »
            »
            2 months ago, # ^ |
            Rev. 3   Vote: I like it +18 Vote: I do not like it

            Did your brute force give you the countercase you were looking for, and if so, would you mind sharing the case?

            Fortunately, yes. This is the countercase

            Spoiler

            It should return 111 where my buggy full solution code returns 115. It does not pass the first hidden testcase though. The issue was related to the first and the third row squares being connected while the second is not.

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Thanks man

»
2 months ago, # |
  Vote: I like it +8 Vote: I do not like it

How can people apart from top-50 check their ranks?

»
2 months ago, # |
  Vote: I like it +10 Vote: I do not like it

Did anyone get an issue on the wall task where on test case 7 the code would throw a runtime error? I know others who had the same problem

  • »
    »
    2 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I didn't end up with an RTE, but I got WA7 on the first subtask and couldn't finish debugging before the end of the round. I'm also looking for a countercase; I've stress tested my solution on thousands of cases against a slow solution I wrote and the correct solutions I've just been sent and can't find a case that I WA. If anyone happens to have gotten WA7/RTE7, found a countercase, and successfully debugged their solution, that case would be very much appreciated!

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Can you show your code?

      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Sure, here's a link: https://pastebin.com/pRjgYdNH.

        Essentially, I use a state that, after all vertical walls prior to a given column and all horizontal walls in that column have been processed, indicates the connectivity structure of the current column and which connected components contain workers. There are then 22 states, though I track them as values from 0 to 47 to make keeping track of the underlying values easier--essentially, for a given state S, column 0 is in connected component 0, S % 2 is the connected component containing column 1, (S / 2) % 3 is the connected component containing column 2, and S / 6 is a bitmask representing which components contain workers, and 32 possible transitions at each column (as there are five walls that could be removed in each transition). I precalculate all possible transitions, after which my solution is a fairly straightforward DP approach.

        • »
          »
          »
          »
          »
          2 months ago, # ^ |
            Vote: I like it +13 Vote: I do not like it

          I ran a stress-test against your solution compiled with sanitizers, and what I found is that your code fails with RE when $$$n$$$ is $$$1$$$. The problem is that in line 150 you allocate an array of size $$$N-1$$$, and it seems to be not legal (maybe it is undefined behaviour, I am not sure): I get an error "variable length array evaluates to non-positive value 0".

          • »
            »
            »
            »
            »
            »
            2 months ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            Ooh, gotcha—thanks! I’ll see if fixing that helps.

»
2 months ago, # |
  Vote: I like it +7 Vote: I do not like it

Will tasks be available to submit somewhere later?

»
2 months ago, # |
  Vote: I like it 0 Vote: I do not like it

what was test case 7 of problem escape. it was getting stuck.

  • »
    »
    2 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    One thing that I missed in first attempt and got WA at test 7 was that guards will also be stopped by the walls. So you can't just check manhattan distance of a cell from a guard and determine its feasibility.

    • »
      »
      »
      2 months ago, # ^ |
      Rev. 2   Vote: I like it -13 Vote: I do not like it

      Ya I got it now that I also missed this only . Why not it was in bold? Thanks

»
2 months ago, # |
  Vote: I like it +24 Vote: I do not like it

How to solve flipped? My 93 points code: CODE

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +24 Vote: I do not like it

    I got AC by assuming each column originally followed a normal distribution with its post-flip mean and standard deviation and picking the rows where the product of the normal PMF corresponding to each value assuming the row is not flipped divided by the product of the normal PMF corresponding to each value assuming the row is flipped is minimized. Intuitively, I assumed that the flips didn't substantially change the overall distribution of each column and then picked the rows where flipping most drastically increases the probability that the values in the row appear in their corresponding distributions.

    Here's my code (excluding my template): https://pastebin.com/YtYNwPh5

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +24 Vote: I do not like it

    I computed the mean and standard deviation of each column. Then, for each row $$$r$$$, let $$$f(r)$$$ denote the sum of squared distance from 0.5 of all the value of the cumulative normal distribition on each value in the row (drawn from the estimated normal distribution for each column). Do the same for reversed row $$$r'$$$ and finally sort all rows by the value $$$f(r')-f(r)$$$ and output the first $$$b$$$ rows.

»
2 months ago, # |
Rev. 2   Vote: I like it +37 Vote: I do not like it

Seems like the problemset is not (yet) publicly available. I uploaded the problemset to my server for public consumption (so those who missed the registration can also follow the discussion).

»
2 months ago, # |
  Vote: I like it +7 Vote: I do not like it

Does anyone know the ML techniques that should be used for the last 2 problems?

Also, I was getting RTE for 3 cases in problem Flipped with Cpp. I switched to Java and I was able to pass those cases. I suspect that I had to use the extra memory limit.

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +13 Vote: I do not like it

    For the second last problem, we want to select the reversed rows. A cell is misplaced if its value deviates from the mean of the column. We do want to normalise the deviation by dividing with the standard deviation of the column. The rows with many very misplaced cells are likely reversed.

    We can reverse the b worst rows, and the result was already quite good (around 70?).

    The issue with reversing many rows at once is that we might reverse the wrong row. One way is to reverse one row at a time, and recalculate the mean and standard deviation. But this is too slow.

    The middle ground is to reverse a portion of the b rows, in maybe 16 portions. I scored 99 points (even though it displayed 100 on the console).

    The "ML technique" here is statistical knowledge of mean and standard deviation.

    Code: https://github.com/tonghuikang/codecomp/blob/quora-2021-1/template/f.py

    For the last problem, I used LightGBM (which is one the packages Quora allows, and is faster than XGBoost).

    The challenge is feature engineering. Each data should have the same set of features, but what we see in the data is that each user can visit multiple sites. What I did is to cycle through the visits — each user will revisit the first site again. For the time, I calculated the time difference, and cycle. Each data will have 200+200 features.

    I was prepared to handle more feature engineering and fine-tune the threshold, but I got full marks after a few tries.

    The "ML technique" feature engineering, and trust the black box GBDT model and pray.

    Code: https://github.com/tonghuikang/codecomp/blob/quora-2021-1/template/spam.ipynb

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it +3 Vote: I do not like it

      For the last problem, I think trying to do feature engineering is unnecessary. They tell you the exact model used to generate the data (a Markov chain for each user type), and there are pretty much no hidden variables, so you can directly estimate the parameters of the Markov chain using the input data, and then directly compute the Bayesian probabilities of the user types given the test inputs.

»
2 months ago, # |
  Vote: I like it 0 Vote: I do not like it

For problem Walls, how can we use the size of the grid to our advantage, or is there any algorithm to find the minimum spanning tree of a subgraph?

  • »
    »
    2 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I believe in general this is the steiner tree problem which is NP-hard. On a narrow graph I think it is possible to use broken profile DP but did not implement it, not sure if there is an easier method.

»
2 months ago, # |
  Vote: I like it +55 Vote: I do not like it

Will there be editorial?

»
2 months ago, # |
  Vote: I like it +3 Vote: I do not like it

Sorry for Asking, but can anyone help me with problem B " Escape ". I could not find the idea how to mark all the cell visible to atleast 1 guard any help.

  • »
    »
    2 months ago, # ^ |
    Rev. 2   Vote: I like it +4 Vote: I do not like it

    For each cell $$$c$$$ that a guard with radius $$$r$$$ is on, set $$$dist[c] = -(r+1)$$$. Now do a multisource shortest path using a priority queue starting from all these cells. For any cell, if the shortest distance is found to be less than 0 at the end, then it is covered by atleast one guard.

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Do you mean we should use multisource BFS with priority queue to prioritize popping the cells with lower $$$dist[c]$$$ first? Because when using a normal queue it won't be correct I think, for instance on this example.

      5 10 2
      ##########
      #S.......#
      #........#
      #E.......#
      ##########
      3 4 1
      3 9 10
      
      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it +3 Vote: I do not like it

        Yep, you're right! Sorry I missed mentioning priority queue.

    • »
      »
      »
      2 months ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      shivensinha4 But what if I come across a cell that is already marked(monitored by another guard), now should I add this cell in the queue ? If yes then time complexity will be quadratic, if No, then what if a cell adjacent to that cell is not monitored by the previous guards, but is monitored by the current guard ?

      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        You'll be using a priority queue. It's almost dijkstra.

        • »
          »
          »
          »
          »
          2 months ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          ok, but what should we do if we come across a cell that is already guarded by someone else, should we add it to priority queue ? If No then what if a cell adjacent to that cell is not monitored by the previous guards, but is monitored by the current guard ?

          • »
            »
            »
            »
            »
            »
            2 months ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            Exactly like you handle relaxations in Dijkstra.

»
2 months ago, # |
  Vote: I like it +1 Vote: I do not like it

How to solve problem "Escape". My approach was to first block all the cells which are at the distance <= d from guard. Then I find the length of the shortest path using bfs. It runs on subtask 1 only. Code

  • »
    »
    2 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    No need to call fillit() for each guard.Push all guards in the queue in beginning.Because given k can be upto 10^4 and each fillit() will take 10^6 in worst case.

    • »
      »
      »
      2 months ago, # ^ |
      Rev. 3   Vote: I like it 0 Vote: I do not like it

      can you please elaborate your "queue" approach ?

      Do you want to say that we fill all guards based on their "d" (distance till which they can guard) value in queue , then we pop the guard from queue which has biggest "d" value .

      The whole point is to not mark cells multiple times which can be reached by multiple guards to avoid TLE .

      Are you saying we need to do BFS ( from each guard in the queue ) ? I guess that would work

      Please help . Thanks

      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Just check if it is blocked or not and its distance <= (d of guard removed from the queue) if true mark that cell(say '#', now it is blocked) and push it into queue for further check.Just normal multisource bfs.

        • »
          »
          »
          »
          »
          2 months ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          thanks for your reply.

          When we pop a guard from queue , then we check all of it's neighboring cells or you want to say for each guard popped we will check for each cell in the grid (in that case it will give TLE) ?

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +1 Vote: I do not like it

    Hey, I used multisource BFS, pushed the coordinate of every guard in the queue with their distance, and mark the cell visited if a guard can reach the cell. Now, if our source is visited means at least one guard can reach our initial position and we can't go any further, so "IMPOSSIBLE". If our destination is visited means we can't go to destination, so "IMPOSSIBLE" and if all the neighbors of destination are visited then there is no chance to go to destination cell so the answer is "IMPOSSIBLE". If none of the conditions is satisfied above then apply simple BFS from source to destination. If we can reach the destination cell then print the distance else "IMPOSSIBLE".

    • »
      »
      »
      2 months ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      hi there, suppose u have guard in cell 3,4 with d=1. Now u visited 4,4 from 3,4 with d=0 and markded 4,4 visited... now if from another guard cell 7,4 with d=100, u can visit 4,4 with d>0 this time and also move many other cell from 4,4 now with d>0,, how did u handle this case in Multisource BFS.

      my point is if u mark visied a cell for smaller d from a guard,,, and if u could have visited that cell from another guard with larger d and could get to many other cell via that particular cell, how did u handle that?

»
2 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Division 1 recently concluded! Congratulations to all the winners and really happy to see all of the interesting discussion here. We hope you enjoyed the Division 1 problems and hope to see you at Division 2, which is starting in less than 6 hours: Feb 7 2021, 01:00 to 05:00 (UTC + 00:00)

»
2 months ago, # |
  Vote: I like it -15 Vote: I do not like it

How tf ML problems are ML?

»
2 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Does anyone happen to have the sample I/O for Spam downloaded, and if so, would you mind posting it publicly? I wrote a solution to the problem after the contest ended, and I'm curious about how it would perform on a larger test case.

»
2 months ago, # |
  Vote: I like it +3 Vote: I do not like it

Any link for tutorial?

»
2 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Can anybody tell the efficient solution for escape problem for full points ? None of the comments in this blog mention the efficient solution .

I think Multi-source BFS would give TLE as well , because you will visit a cell say (x,y) which is being gaurd by multiple enemies , so you will have to mark it multiple times which will waste time , and you cant ignore to mark it multiple times as well as each enemy may have different 'd' .

Please someone throw light on the efficient solution for escape ?

  • »
    »
    2 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    You only need to keep track of the enemy with the longest remaining range if starting from that cell. That way you only need to propagate from that cell once.

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Did you use a priority queue to maintain the "enemy with longest remaining range"?

      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        I think there is a pretty good reason , for : k<=10000 .

»
2 months ago, # |
  Vote: I like it +10 Vote: I do not like it

Very surprised that https://en.wikipedia.org/wiki/Strong_connectivity_augmentation is not given as a CP problem before, or maybe my Google-fu skill is bad.

  • »
    »
    2 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Same, I found a solution on Google, but I could only get the first subtask because I had some weird bug causing me to WA on the second subtask.

    Best I could find was this link in Russian, which I tried to piece together with Google Translate :)

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Doesn't this give the solution as well?

      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Yes, but good luck trying to understand / implement this in 45 mins if you haven't seen the idea before.

      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        That gives the correct answer, I believe, but getting the construction right is the (much) harder part of the problem. I attempted to implement the approach given in this paper http://www.springer.com/cda/content/document/cda_downloaddocument/9780387235288-c2.pdf but was unsuccessful in doing so (I suspect because I couldn't get the details of the T > S case, which isn't discussed in the paper, right).

        I was not a fan of this problem, given that it's relatively standard and is essentially an implementation (or copying/pasting, if anyone was able to find code online) test. (I also liked malicious less than the other ML problems--it felt like it required much more esoteric knowledge of machine learning, while most of the other ML problems could be solved using general intuition.)

        • »
          »
          »
          »
          »
          2 months ago, # ^ |
            Vote: I like it +3 Vote: I do not like it

          My guess is, it's very hard to give meaningful ML problems within the time frame of 3-4 hour contest that is not just merely pulling something from the library. Also the satisfaction depends on how quickly you get to a good score :)

          I got quite frustrated with malicious; turns out I only scratch the surface idea, kudos to huikang for going into exploring the dataset and figuring out the correct insight!

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Oops, I forgot the crucial line 327 of breaking when you find a matching. My code now AC's on CSES.

      Submission

      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Actually, the paper I was following to implement this code also forgot to put that line in. Though it's definitely my fault for not checking their code first before blindly implementing :P

      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it -8 Vote: I do not like it

        CSES' testcases are weak. I stresstested my code and it got wrong answer on simple testcase.

»
2 months ago, # |
  Vote: I like it +2 Vote: I do not like it

https://cses.fi/problemset/task/1685 The exact same problem.. This is a joke :/
The quora one got me wrong answer on TC 6.

»
2 months ago, # |
  Vote: I like it +25 Vote: I do not like it

Finally tourist and I have something in common — we are the only ones in the top 100 who completely solve malicious before the last hour.

https://www.quora.com/q/quoraprogrammingchallenge/Division-2-One-Hour-Remaining

Code: https://github.com/tonghuikang/codecomp/blob/quora-2021-2/template/malicious.ipynb

»
2 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Where can the list of all Div1 problems be found ?

»
2 months ago, # |
  Vote: I like it 0 Vote: I do not like it

A chance of editorials for the divisions in this contest?

»
2 months ago, # |
  Vote: I like it 0 Vote: I do not like it

By any chance if anyone who has downloaded the problemset for Div2 can share it here?

»
2 months ago, # |
  Vote: I like it +4 Vote: I do not like it

Someone pls share the implementation of Escape and Students problem of DIV-1 Thanks in advance:)

  • »
    »
    2 months ago, # ^ |
    Rev. 4   Vote: I like it +2 Vote: I do not like it

    Here is my code for Problem Escape (excluding my template):

    AC_CODE

    EDIT: It seems the test-cases for this problems were weak, thanks to quinoa for giving a counter case for the above solution. The following solution seems correct to us now, though confirmation of the same is appreciated.

    REVISED_CODE
    • »
      »
      »
      2 months ago, # ^ |
      Rev. 4   Vote: I like it +2 Vote: I do not like it

      Did you get AC with this? I don't think it's correct since you're not prioritizing the nodes with greatest remaining distance when popping them from the queue. For instance it will fail on this testcase. Your code outputs "2" whereas it should say "IMPOSSIBLE".

      5 10 2
      ##########
      #S.......#
      #........#
      #E.......#
      ##########
      3 4 1
      3 9 10
      
      

      Whereas when removing the first guard

      5 10 1
      ##########
      #S.......#
      #........#
      #E.......#
      ##########
      3 9 10
      

      your code gives "IMPOSSIBLE" (which is correct).

      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it +2 Vote: I do not like it

        True, it seems like test-cases were weak, so how will we remove them prioritizing the distance, using a priority queue. Can you share your implementation?

        • »
          »
          »
          »
          »
          2 months ago, # ^ |
            Vote: I like it +3 Vote: I do not like it

          I could not solve the problem, my solution only worked for the small test case. I think if you change your queue by a priority queue then it will work. Because then we will visit every cell when the remaining distance is maximal and then we should never visit it again. Whereas when using a normal queue we might visit a node when this distance is not maximal, and it is incorrect to label it as visited at this point.

          But I'm also looking for some confirmation that this is the expected solution..

        • »
          »
          »
          »
          »
          2 months ago, # ^ |
            Vote: I like it +1 Vote: I do not like it
        • »
          »
          »
          »
          »
          2 months ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          Hey fireblaze777 , what was the intended time complexity of the first problem "digest" ? Was it O(n^3) or O(n^4) ? If it was the former , what was the solution idea for it for 100 points ?

»
2 months ago, # |
  Vote: I like it +3 Vote: I do not like it

How do we solve datacenter and skyscraper(div-2) optimally?

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +6 Vote: I do not like it

    Datacenter: transform each coordinate (x, y) into (x + y, x — y). The problem is now equivalent to finding the datacenter i that minimizes

    $$$\sum_{j = 1}^n |x_i - x_j| + |y_i - y_j|$$$

    which is much nicer to handle: precompute some prefix / suffix sum and iterate through each possible datacenter as the choice.

    Skyscraper: maintain a segment tree where each node contains the maximum height (and possibly the minimum height, not sure if this is needed). The goal is, for a skyscraper i, to quickly find the minimum and maximum indices L and R such that all skyscrapers within [L; R] have heights no more than that of i. You can adapt the segment tree to quickly find these indices in O(log N) time.

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it +6 Vote: I do not like it

      I don't get the solution for Data Center. Can you please elaborate why it is equivalent?

      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        We need to compute the distance of two points $$$(x_1, y_1)$$$ and $$$(x_2, y_2)$$$ under the original and transformed formula. By suitable translation and reflection, we can shift these two points to $$$(0, 0)$$$ and $$$(x, y)$$$, and furthermore we can assume $$$x \ge y \ge 0$$$ (the formula is similar for all other cases).

        Under the original formula, the distance of two points is $$$\max (x, y) = x$$$.

        Under the transformed formula, the distance of two transformed points $$$(0, 0)$$$ and $$$(x + y, x - y)$$$ are $$$(x + y) + (x - y) = 2x$$$. So the new distance under the transformed formula is just double the old distance.

        • »
          »
          »
          »
          »
          2 months ago, # ^ |
            Vote: I like it +13 Vote: I do not like it

          Another way to think about this is we are rotating the entire coordinate by 45 degrees. So Manhattan distance becomes Chebyshev (chessboard) distance, and vice-versa.

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      How do you adapt segment tree for this problem?

      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Store the maximum for its range in every node.

        Now, to find the smallest $$$i$$$ in the range $$$[l, r]$$$, for which $$$h[i] > x$$$, descend carefully down the segtree.

        Something like:

        if (tree[left_child] > x) return descend(left_child);
        else return descend(right_child);
        

        This way, you will find the required index in $$$O(log N)$$$, per query.

        • »
          »
          »
          »
          »
          2 months ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          Does it mean we have to decompose [i+1, n] into $$$O(\log n)$$$ intervals and find the first one having max > h[i]?

          • »
            »
            »
            »
            »
            »
            2 months ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            Yeah, in a way.

            Dividing it into intervals on the fly.

            This should help (see "Searching for the first element greater than a given amount")

»
2 months ago, # |
  Vote: I like it -6 Vote: I do not like it

Every time there is an asterisk after "world", I automatically assume it excludes Iran. What a shame :(

»
2 months ago, # |
Rev. 2   Vote: I like it +10 Vote: I do not like it

How to solve div2 message for any decent score? The English prediction problem.

I know that the dataset is generated from actual English phrases, so I've tried different greedy techniques of picking words based on some brewed up weighted score (frequency, position of the missing word, distance to other words of the same phrase, etc.). But I wasn't able to get anywhere 30+

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +10 Vote: I do not like it

    We were tasked to predict the missing word in the sentence.

    When I want to predict the missing word b in a triplet a,b,c — I consider for every possible b, the frequency of (a,b), the frequency of (b,c) and the freqency of (a,b,c). The candidate word for b that appears the most frequently will be my prediction.

    It makes sense if we give greater weights to (a,b,c) because it is less likely to appear compared to (a,b) or (b,c). I count the freqency every appearance of (a,b,c) 10 times rather than once that I did for (a,b) and (b,c). This scored around 47/100 points.

    To get 78/100 points, I considered two words before and two words after the missing word. In the final minutes of the contest, I was fine-tuning the weights and submitted the code repeatedly.

    Code: https://github.com/tonghuikang/codecomp/blob/quora-2021-2/template/f.py

    It is also possible to consider up to 30 characters before and after for a slightly higher score at a very little increase in complexity, but my code was not structured to do that.

    For the full score, I think we need to use a neural network model. The general problem here is masked language encoding.

    https://keras.io/examples/nlp/masked_language_modeling/

    There are readily available templates out there, but it needs to download billon-parameter models which is not allowed by the online judge. We need to build (or copy) a downsized model that is suitable for the dataset.

  • »
    »
    2 months ago, # ^ |
    Rev. 2   Vote: I like it +10 Vote: I do not like it

    I think I got lucky when hitting high score pretty quickly.

    First I got 70 pts by keeping track of a frequency table of all 2-gram (an ordered pair of word (x, y)), and for each possible candidate w, its score is freq (p, w) * freq (w, r) where p and r are the word before and after w, respectively (if there's no such word p/r or the pair doesn't exist in the frequency table, assign the weight of 1 for that pair).

    Return the word of highest score.

    I got 82 by improving the above idea with including the 3-grams.

»
2 weeks ago, # |
  Vote: I like it +40 Vote: I do not like it

Got my T-shirt (Singapore)

Applied and got an offer, and I have accepted the offer ... Yay

  • »
    »
    2 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    congrats !

    Did you get any link to track the T-shirt? (cause no link sent to me)

    • »
      »
      »
      2 weeks ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      There was no tracking link sent to me too, the package does not indicate UPS or FedEx.