Блог пользователя Aksenov239

Автор Aksenov239, история, 3 года назад, По-английски

Hello everybody!

After a long break I would like to announce the fourth Bioinformatics contest organized by ITMO University, Bioinformatics Institute, Rosalind and Stepik. The contest is sponsored by JetBrains, Yandex, Serokell, and Genotek.

The contest is hosted on Stepik platform. To participate you have to sign the rules.

The contest consists of two rounds:

The contest is prepared by the following team: Alexey Sergushichev, Nikita Alexeev, budalnik, demon1999, tourist, cdkrot, GShark, doreshnikov (all ITMO University), German Demidov (Center for Genomic Regulation), Andrey Prjibelski (CAB SPbU).

Are there any prizes?

  • 1st place – Whole Genome Sequencing
  • 2nd and 3rd – Whole Exome Sequencing
  • 4th and 5th – 23andMe or Genotek DNA Service
  • 1st–30th – Honorable Bioinformatics T-Shirt

Do I need to know biology?

The knowledge of biology is not necessary to solve the problems!

We wish that you will like the problems!

Good luck!

  • Проголосовать: нравится
  • +254
  • Проголосовать: не нравится

»
3 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

What is the requirement for being qualified to the final round? It's not stated how maby participants advance, and only 'score the right number of points' is mentioned on the website :(

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +3 Проголосовать: не нравится

    The only requirement is 'scoring the right number of points', it is stated clearly :)

    Everybody who will get at least 952 points (including one point from the "Welcome!" step with the Honor Code) will advance to the Final Round.

»
3 года назад, # |
  Проголосовать: нравится +1 Проголосовать: не нравится

Daniil Oreshnikov is doreshnikov, you could have made him a colorful link too. Now it looks like he isn't registered on Codeforces.

»
3 года назад, # |
Rev. 2   Проголосовать: нравится +157 Проголосовать: не нравится

Is it ethical to sequence the genome of one of the greatest programmers of our time? What if a bad agent gets access to it, creating a million clones, thereby causing rating deflation?

  • »
    »
    3 года назад, # ^ |
    Rev. 2   Проголосовать: нравится -101 Проголосовать: не нравится

    Ok why the downvotes?

    What Monogon said is COMPLETELY dumb. Your DNA is "legally" flying around, everything you touch, your hair, saliva, typically anything. If a bad agent needs your DNA, they already have it.

    So why the downvotes? cz unrated? Elitist/racist ruskis again?

    • »
      »
      »
      3 года назад, # ^ |
        Проголосовать: нравится +69 Проголосовать: не нравится

      Because his comment was a joke and you are treating as it was a serious inference, genius.

»
3 года назад, # |
  Проголосовать: нравится +8 Проголосовать: не нравится

Welcome back! You were missed, even if my CPU is dreading what cursed implementations I come up with.

»
3 года назад, # |
  Проголосовать: нравится +5 Проголосовать: не нравится

Hope Benq(or someone else) can share us something about his genome after the contest:-)

»
3 года назад, # |
  Проголосовать: нравится +1 Проголосовать: не нравится

One of it's kind!

»
3 года назад, # |
  Проголосовать: нравится +9 Проголосовать: не нравится

The link for time and date says "19 June", but leads to "19 July"

»
3 года назад, # |
  Проголосовать: нравится -7 Проголосовать: не нравится

Contest prepared by tourist

»
3 года назад, # |
Rev. 2   Проголосовать: нравится -42 Проголосовать: не нравится

is this contest rated? if so, how do you know which user is which on the other platform? edit 1: so, let me get things clear, in this website people downvote posts where there's a genuine question and mods do nothing to prevent this?

»
3 года назад, # |
  Проголосовать: нравится +8 Проголосовать: не нравится

t is a substring of s if t is contained as a contiguous collection of symbols in contiguous collection of symbols in s

Is this a non-standard definition of a substring? It's very confusing.

»
3 года назад, # |
Rev. 3   Проголосовать: нравится +36 Проголосовать: не нравится

Could anyone tell me how to solve Diagnosis without running 2.5 hours for all the tests?

I only have solultions with time complexity $$$O(m(n+q))$$$ or $$$O((\sum cq_i)(\sum cm_i))$$$ or $$$O(m(\sum cq_i)+q(\sum cm_i))$$$.

  • »
    »
    3 года назад, # ^ |
    Rev. 2   Проголосовать: нравится +29 Проголосовать: не нравится

    I solved it by realizing that several test-cases are weak and having a certain pattern e.g.:

    • There is one input file with $$$|Q_p|=1$$$ and $$$|D_m|=1$$$. I solved it using a linear $$$O(n+m+q)$$$ solution to associate every vertex with a disease.

    • On another input file, the vertices for each patient are a subset of a certain disease, i.e. $$$Q_p \subseteq D_m$$$. Hence for each patient, I only need to find a disease that is a superset of patient vertices. With a decent filtering trick, my code ran for less than 1 minute for this test-case.

    • On another input file, I need to draw a portion of the tree to realize that the vertices that belong to the same disease or patient is not randomly placed within the tree, e.g. there is a certain pattern on it. The solution for this test-case becomes pretty trivial.

    The fact that the constraint for each test-case isn't explicitly stated made me think that the clue is hidden within the test-case itself.

    • »
      »
      »
      3 года назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      How did you solve tests 6 and 7? Are those the last two cases you described?

      • »
        »
        »
        »
        3 года назад, # ^ |
          Проголосовать: нравится +11 Проголосовать: не нравится

        test #7 is $$$Q_p \subseteq D_m$$$ the vertices for each patient are a subset of a certain disease

        test #6 is the one with the funny pattern (I need to draw a portion of the tree on a paper).

        • »
          »
          »
          »
          »
          3 года назад, # ^ |
            Проголосовать: нравится +11 Проголосовать: не нравится

          Another way to solve test 6 is to notice that depths are $$$\leq 5$$$(let's call this number $$$d$$$) and the sizes are $$$\leq 3$$$(let's call this number $$$s$$$). Now, for each (disease, patient) pair the solution is the maximum sum of nodes such that the $$$i$$$-th node is a parent of some disease's node and the $$$i$$$-th patient's node. For each disease, the number of possible parents is $$$s \cdot d$$$, so we can just precompute for each disease all possible ways its parents can contribute to the solution; there are up to $$$(s \cdot d)^s$$$ ways. Then, for each patient, try all $$$d^s$$$ ways to choose the parents and see if there is a disease which can also have that way of choosing parents.

    • »
      »
      »
      3 года назад, # ^ |
      Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

      Thanks!

      Maybe I am not good at finding the patterns of tests...

      • »
        »
        »
        »
        3 года назад, # ^ |
          Проголосовать: нравится +3 Проголосовать: не нравится

        I think that (is this task and competetion) finding patterns + thinking about + impelemnting multiple solutions per test takes not less time than running $$$O(m (n + q))$$$. I'd better spent this time on actualizing template from old GCJ to run output only tasks in multiple threads (but I had spent it on optimizing running time from 2h+ to 1h...)

»
3 года назад, # |
Rev. 3   Проголосовать: нравится +13 Проголосовать: не нравится

Problem 2 was B(inary)S(earch)

»
3 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Thanks a lot for challenge, I really enjoyed it (especially binary search on Stepik on B).

1). What was the optimal strategy for problem A?

2). OK, I received some points on problem D. But I don't understand, how was counting the rate of minor haplotype? Was some read counted if it differed with minor haplotype by less than k polymorphisms?

»
3 года назад, # |
  Проголосовать: нравится +19 Проголосовать: не нравится

When you got 299.97 of 300 points trying to reconstruct the minor haplotype in Problem 4: