Gassa's blog

By Gassa, history, 8 years ago, translation, In English

The submission time is over for VeeRoute Marathon. Let us discuss the contest — the problem itself and related stuff — while the final testing takes place. Which solutions work best? Were the problem statement and materials sufficient to start solving, or was something lacking? What tools did you implement yourself?

Discussion of VeeRoute Marathon
  Vote: I like it +105 Vote: I do not like it

»
8 years ago, # |
  Vote: I like it +52 Vote: I do not like it

I used a greedy approach to solve the problem: first, sort all customers according to time, and then, for each customer, find the driver which gives the lowest "cost" of driving the customer to their destination.

Here, the cost consists of four parts (a sketch follows the list):

  1. If the driver hasn't been used before, the cost will be something in terms of base

  2. The extra distance the driver needs to drive relative to the ideal case

  3. The extra time the driver spends driving relative to the ideal case (base/32400 per second)

  4. The waiting time when the driver has nothing to do (base/43200 per second)
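
A minimal sketch of how these four terms might combine (my reading of the list above; names are illustrative, and part 1 is assumed to contribute base itself):

    // Cost of assigning an order to a driver, combining the four parts.
    // The 32400 and 43200 divisors are the per-second rates from parts 3 and 4.
    double assignmentCost(bool driverUsedBefore, double extraDistance,
                          double extraDrivingSec, double waitingSec, double base) {
        double cost = 0.0;
        if (!driverUsedBefore) cost += base;          // 1: hiring a new driver
        cost += extraDistance;                        // 2: extra distance vs. the ideal case
        cost += extraDrivingSec * base / 32400.0;     // 3: extra driving time
        cost += waitingSec * base / 43200.0;          // 4: idle waiting time
        return cost;                                  // pick the driver minimizing this
    }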

And finally, I get around 8400 for the ten cases, in ~600 lines of code: 16703064

I wonder what the solutions that score 9000 or above look like. 0.0

orz #adore

»
8 years ago, # |
  Vote: I like it +69 Vote: I do not like it

Here is my final submission: http://lpaste.net/8778658654736875520

Score on the test cases :

1: OK [29078 ms, 366 mb]: points 1189.49915560350 optimistic 157200063, actual 132156515: 0/1889 queries missed, 215/1165 drivers hired, score = 1189.499
2: OK [28999 ms, 42 mb]: points 636.19767951860 optimistic 16550320, actual 26014430: 0/621 queries missed, 57/192 drivers hired, score = 636.198
3: OK [29015 ms, 225 mb]: points 1172.05069368070 optimistic 88096670, actual 75164556: 0/1508 queries missed, 123/433 drivers hired, score = 1172.051
4: OK [29030 ms, 290 mb]: points 974.45092542630 optimistic 65958262, actual 67687618: 0/1655 queries missed, 124/742 drivers hired, score = 974.451
5: OK [29015 ms, 124 mb]: points 988.32412068890 optimistic 62443583, actual 63181280: 0/1095 queries missed, 107/470 drivers hired, score = 988.324
6: OK [29047 ms, 191 mb]: points 1201.62851196460 optimistic 127653700, actual 106233914: 0/1328 queries missed, 156/1161 drivers hired, score = 1201.629
7: OK [28999 ms, 62 mb]: points 660.00671929090 optimistic 16519590, actual 25029427: 0/768 queries missed, 52/189 drivers hired, score = 660.007
8: OK [28999 ms, 89 mb]: points 671.01204427080 optimistic 15715502, actual 23420596: 0/915 queries missed, 48/157 drivers hired, score = 671.012
9: OK [29030 ms, 334 mb]: points 963.24433559860 optimistic 58063754, actual 60279362: 0/1802 queries missed, 112/574 drivers hired, score = 963.244
10: OK [28999 ms, 30 mb]: points 1170.00869876670 optimistic 49264350, actual 42105969: 0/535 queries missed, 70/405 drivers hired, score = 1170.009

I used the lemon library (link) to compute minimum-weight maximum matchings and minimum-weight maximum bipartite matchings. (I had to reduce the lemon sources to get them to fit in a 64 KB zip.)
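
For reference, a minimal usage sketch of lemon's weighted matching, written from memory of the lemon 1.x API (not Rafbill's actual code; double-check the signatures against the docs). A minimum-weight maximum matching is obtained from the maximum-weight solver with the classic weight = BIG - cost trick:

    // Toy example: min-weight max matching via lemon's MaxWeightedMatching.
    #include <lemon/list_graph.h>
    #include <lemon/matching.h>
    #include <iostream>

    int main() {
        lemon::ListGraph g;
        lemon::ListGraph::EdgeMap<int> weight(g);

        lemon::ListGraph::Node a = g.addNode(), b = g.addNode(),
                               c = g.addNode(), d = g.addNode();

        // The solver maximizes total weight. Setting weight = BIG - cost,
        // with BIG large enough that cardinality dominates, makes it pick
        // a maximum matching of minimum total cost.
        const int BIG = 1000000;
        weight[g.addEdge(a, b)] = BIG - 3;
        weight[g.addEdge(b, c)] = BIG - 1;
        weight[g.addEdge(c, d)] = BIG - 2;

        lemon::MaxWeightedMatching<lemon::ListGraph, lemon::ListGraph::EdgeMap<int>>
            mwm(g, weight);
        mwm.run();
        std::cout << "matched edges: " << mwm.matchingSize() << "\n";
    }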

The main idea of my solution is that the right representation to work with is a partition of the orders into chains: a min-cost max bipartite matching can then match these chains with the drivers (function tryChains). To decide whether a chain is admissible (function canDoChains), I ensure that there exists a garage with at least 3 drivers able to do this chain of orders (in practice, all chains can then be matched in most cases).

A possible algorithm would be to try local optimization on a small number (2-4) of chains (swap orders between chains, or go from AAABBB CCCDDD to AAADDD BBBCCC, etc.), but it is better to optimize all chains at the same time. To do that, I split each chain (function splitChains) into two parts randomly and compute (function mergeChains) a min-cost maximum matching where the nodes are the chains and the edges are possible ways to merge them. Because the matching from before the split is still admissible, the cost can only decrease. We need to optimize both the number of chains and the total distance, so the cost depends on both time and distance at the beginning of the algorithm, when we want to reduce the number of chains, and only on the distance at the end.

That's the basic idea, but the fact that we can do 2 orders at the same time makes it more complex. In an initialization step, I compute a good set of merged orders, again with a min-cost max matching and a suitable cost. However, we may want to change this set later. To do so, when merging chains, we compute several new chains with different orderings of the inner orders (function orderChain::merge). Also, when we split a chain, we can split inside a merged order. I additionally run a local optimization pass (function optimizeDoubles) to improve this further.

I can do between 300 and 10000 (split, merge) iterations in 30 seconds, depending on the test case.

  • »
    »
    8 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I had roughly similar ideas, but didn't have time to implement the split-merge optimization. I did have the pairwise optimization, though. Also, does "split each chain randomly" mean picking a point in time at which to split the chain, or something else?

    How exactly did you choose the initial chains? I processed the (unpaired) orders by time and then tried to pair the current order with another order and a driver so that the total added distance is minimal — so basically greedy.

»
8 years ago, # |
  Vote: I like it +8 Vote: I do not like it

When will we have the verdict on the large set of tests?

  • »
    »
    8 years ago, # ^ |
    Rev. 2   Vote: I like it +3 Vote: I do not like it

    Looks like the large tests are beginning now. Lots of problemset submissions have been queued, and I'm guessing this is because of VeeRoute. You can see when the systests started (I think) here. They will take a while, though: 30 seconds per test * 500 tests = 15,000 seconds, or a little over 4 hours.

»
8 years ago, # |
  Vote: I like it -40 Vote: I do not like it

If you care, Lost's final is not dream.

»
8 years ago, # |
Rev. 4   Vote: I like it +157 Vote: I do not like it

Edit #1: Maybe I should move it to a separate blog post, considering the amount of space it takes? :)

Edit #2: Link to solution: 16709816

Provisional Score: 9863.098

#1: OK [29530 ms, 37 mb]: points 1228.35437875220 optimistic 157200063, actual 127976149: 0/1889 queries missed, 201/1165 drivers hired, score = 1228.354
#2: OK [29530 ms, 36 mb]: points 651.79769614650 optimistic 16550320, actual 25391805: 0/621 queries missed, 55/192 drivers hired, score = 651.798
#3: OK [29530 ms, 36 mb]: points 1200.58759877510 optimistic 88096670, actual 73377961: 0/1508 queries missed, 118/433 drivers hired, score = 1200.588
#4: OK [29530 ms, 36 mb]: points 998.34182496180 optimistic 65958262, actual 66067814: 0/1655 queries missed, 118/742 drivers hired, score = 998.342
#5: OK [29530 ms, 36 mb]: points 1015.72997954770 optimistic 62443583, actual 61476558: 0/1095 queries missed, 102/470 drivers hired, score = 1015.730
#6: OK [29530 ms, 37 mb]: points 1241.02193079330 optimistic 127653700, actual 102861760: 0/1328 queries missed, 154/1161 drivers hired, score = 1241.022
#7: OK [29530 ms, 36 mb]: points 678.06036101950 optimistic 16519590, actual 24363008: 0/768 queries missed, 51/189 drivers hired, score = 678.060
#8: OK [29515 ms, 36 mb]: points 673.27449949010 optimistic 15715502, actual 23341894: 0/915 queries missed, 48/157 drivers hired, score = 673.274
#9: OK [29546 ms, 36 mb]: points 989.68445935870 optimistic 58063754, actual 58668956: 0/1802 queries missed, 107/574 drivers hired, score = 989.684
#10: OK [29515 ms, 36 mb]: points 1186.24587728070 optimistic 49264350, actual 41529628: 0/535 queries missed, 70/405 drivers hired, score = 1186.246

Summary:

Simple greedy for initialization, SA #1 for minimizing the number of drivers, SA #2 for minimizing the total distance (which amounts to optimizing the final score).

Intro:

Originally, I decided to participate because I had done almost no coding in 2016, and whenever I did, I spent an extraordinary amount of time on debugging. Since this contest was very heavy on implementation, it looked like a good excuse to exercise a bit. My goal was to just implement all of the "basic" ideas, do a little bit of tweaking and be done with it. But the problem turned out to be so complex (mostly due to not enforcing the triangle inequality) that my solution is longer than in most TC Marathons, even though it's not very polished.

Details:

My greedy phase can almost be skipped, because the later SA phases are good enough to start from a random (or empty) state. I left it in only because I didn't want to add another round of bug hunting.

SA has 2 phases, and the main difference is that they use different scoring functions. The first one tries to optimize the number of drivers and the second tries to maximize the score. The reason for two phases is that improving the score in small steps tends to split orders evenly among the drivers. This means that in order to get rid of a single driver, local search has to perform a lot of "uphill" moves in a row, which most probably won't happen.

There were two workarounds for this. Either create a very complex transition function that tries to get rid of a single driver entirely by splitting/merging existing chains of orders (something along the lines of what Rafbill did), or create an alternative scoring function that achieves the same goal using a simple (and, in my case, already implemented) set of transitions.

The scoring function for the first phase is as simple as it gets: pow(number_of_orders * X, Y) * Z, where X, Y and Z are some constants. When Y is smaller than 1.0 (in my case it's 0.3), the function is concave, so removing an order from a driver yields a larger gain the fewer orders that driver has. In practice, this will just try to take orders away from drivers that already have very few of them.
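
As I read it, this penalty applies per driver and is summed over all of them; a minimal sketch under that assumption (constants and names are illustrative, not from the actual code):

    #include <cmath>
    #include <vector>

    // Y < 1 makes the per-driver penalty concave, so removing an order
    // from an almost-empty driver gives a much larger gain than removing
    // one from a busy driver.
    const double X = 1.0, Y = 0.3, Z = 1000.0;

    double phase1Cost(const std::vector<int>& ordersPerDriver) {
        double cost = 0.0;
        for (int n : ordersPerDriver)
            if (n > 0)                        // unused drivers cost nothing
                cost += std::pow(n * X, Y) * Z;
        return cost;                          // lower = fewer, fuller drivers
    }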

State Representation:

Despite the complex statement, representing states is surprisingly easy. We either complete a single order at a time, or a double (two different people who share either the arrival or the destination point). This means that it's enough to just remember the list of orders for each driver.
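
A minimal sketch of that representation (types and names are my illustration):

    #include <vector>

    struct Order {
        int passengerA;
        int passengerB;    // -1 if this is a single order
    };

    // schedule[d] = the (possibly double) orders driver d completes, in order.
    std::vector<std::vector<Order>> schedule;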

Moves (Transitions):

I decided to keep only simple types of moves. The reason is that SA works pretty badly with moves of different complexity (and thus, different ranges of returned scores).

  • moveSwapDrivers(): Swap the orders of two different drivers. Possibly one of them doesn't have any orders.
  • moveSwapOrders(): Take two orders from different drivers and swap them.
  • movePushOrders(): Take one order and move it to a different driver.
  • moveSwapDoubleOrder(): Take a single double order and reverse it.
  • moveJoinOrders(): Take two different single orders and join them.
  • moveSwapPartialOrders(): Take two different orders (one of them has to be a double), and swap only parts of them.
  • moveSplitOrders(): Take a double order and split it into two single ones.
  • movePushUnused(), moveAddUnused(): Used only when there are orders that are not yet completed. Push/join (in hindsight, the latter should've been moveJoinUnused()) an unused order.

All of them, except for moveSplitOrders(), are pretty important. Most of the work is done by moveSwapOrders() and movePushOrders(), but the other moves are important for escaping local extrema and increasing the number of double orders.
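
To make the accept/reject mechanics around these moves concrete, here is a tiny self-contained simulated-annealing skeleton on a toy one-dimensional state. This is the textbook scheme, not the actual contest code; in the real solution, a proposed move would be one of the transitions listed above:

    #include <cmath>
    #include <cstdint>
    #include <cstdio>

    static uint64_t rngState = 88172645463325252ULL;
    static double rand01() {                    // xorshift64-based uniform in [0, 1)
        rngState ^= rngState << 13;
        rngState ^= rngState >> 7;
        rngState ^= rngState << 17;
        return (rngState >> 11) * (1.0 / 9007199254740992.0);
    }

    int main() {
        double x = 0.0, cur = (x - 42) * (x - 42);  // cost to minimize
        double T = 100.0;                           // initial temperature
        for (int iter = 0; iter < 3000000; iter++) {
            double nx = x + (rand01() - 0.5);       // propose a local move
            double nxt = (nx - 42) * (nx - 42);
            // Accept improvements always; accept uphill moves with the
            // usual exp(-delta / T) probability to escape local extrema.
            if (nxt <= cur || rand01() < std::exp((cur - nxt) / T)) {
                x = nx;
                cur = nxt;
            }
            T *= 0.999997;                          // geometric cooling
        }
        std::printf("x ~= %.3f (target 42)\n", x);
    }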

Important Optimizations (in no particular order):

  • Adding an O(1) function that tells whether a move is possible. It doesn't cover all of the possibilities; it just acts as quick pruning.
  • Replacing my Mersenne Twister RNG with a very simple XORShift. It turned out that the RNG used a significant amount of time :) And MT is still much faster than rand().
  • Since I had 7 different types of transitions, it was very tiresome to tell which moves were more important, especially since it changes after every addition to the code. So I just wrote a primitive Hill Climbing on the set of parameters responsible for choosing the move type.
  • I split the entire day into 30-minute time slots. For each time slot, I remember which (driver, order #) pairs are active during it. Thanks to that, when swapping/moving orders, I first generate a random order and then generate another one from the same time slot (see the sketch after this list).
  • Since there are only a few types of orders that can be done together, it's easy to group them and consider moves only within the same group. This is a very similar optimization to the one above; the main difference is that the grouping here is more natural, and I don't have to recompute the numbers after accepting each new state.
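
Here is a sketch of the 30-minute slot index from the fourth bullet, as I understand it (my reconstruction; all names are illustrative):

    #include <utility>
    #include <vector>

    const int SLOT_SECONDS = 30 * 60;
    const int SLOTS = 24 * 60 * 60 / SLOT_SECONDS;      // 48 slots per day

    // slotIndex[s] lists the (driver, order #) pairs active in slot s, so a
    // swap partner from the same time window can be drawn in O(1).
    std::vector<std::pair<int, int>> slotIndex[SLOTS];

    void registerOrder(int driver, int order, int startSec, int endSec) {
        // Assumes 0 <= startSec < endSec <= 24 * 60 * 60.
        for (int s = startSec / SLOT_SECONDS; s <= (endSec - 1) / SLOT_SECONDS; ++s)
            slotIndex[s].push_back({driver, order});
    }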

Random Stuff:

  • Greedy + Phase 2 SA with 7 moves gave me ~8850 (6-9 hours)
  • Adding Phase 1 SA with almost no tweaking pushed my score to ~9450 (around 3 more hours)
  • Getting to 9850 was a real pain and involved lots of tweaking, optimization and bugfixing (for example, accidentally pruning correct state transitions). Honestly, I don't know how many hours I spent on it. It's really hard to tell, because most coding sessions lasted 2-3 minutes and then I had to wait for test runs to complete.
  • Using simple HC instead of SA gives me ~9500, which shows that the effect of SA was close to negligible.
  • I used 100 tests locally in order to be able to see small gains. On seeds 1-100, my score is 97158.
  • In order to validate the correctness of my solution, I just added a lot of asserts to the function that prints the final solution.

And finally, thanks to Gassa, KOTEHOK and the other people behind organizing this contest. It was an interesting problem (I don't think I've ever had more than one SA phase in my code), and I was really impressed that there were no bugs in the tester, considering the complexity of the problem. My only small hint would be to have something more approachable for beginners ;)

  • »
    »
    8 years ago, # ^ |
      Vote: I like it +22 Vote: I do not like it

    Sorry about the stupid questions, but what are SA and HC? And what were Phase 1 SA and Phase 2 SA in your solution?

    • »
      »
      »
      8 years ago, # ^ |
        Vote: I like it +26 Vote: I do not like it

      SA means simulated annealing and HC means hill climbing I think.

      • »
        »
        »
        »
        8 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Thanks, but the question of what Phase 1 and Phase 2 are still remains.

        • »
          »
          »
          »
          »
          8 years ago, # ^ |
            Vote: I like it +16 Vote: I do not like it

          That's just me not sticking to the chosen terminology. By saying "SA has 2 phases", I just meant those two separate runs of SA. In practice, they are completely separate: the first one runs for roughly 15 seconds, and then the other one runs. But since the only major difference is the scoring function (and the temperature schedule), it's easy to think about them in terms of two phases.

»
8 years ago, # |
  Vote: I like it 0 Vote: I do not like it

I wonder if anyone else, like me, didn't notice that the seed is an unsigned int instead of an int.

The judging program seems to treat the number as an unsigned int: using scanf and printf with %d is still considered correct, while using cin and cout will cause WA.

And I am just asking: will you consider making an adjustment to the seed so that it fits the int type? It is sad to lose around half of the points because of this.

Anyway, I need to look more carefully at the input format next time :(
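
For reference, a minimal illustration of the pitfall (my example, not the judge's code). Seeds go up to 2^32 - 1, which overflows a signed 32-bit int; reading into an unsigned int (or a 64-bit type) round-trips safely, while cin >> into an int fails outright on values above INT_MAX:

    #include <cstdio>

    int main() {
        unsigned int seed;                 // or: long long seed; with "%lld"
        if (std::scanf("%u", &seed) == 1)
            std::printf("%u\n", seed);     // echoes the seed back unchanged
        return 0;
    }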

  • »
    »
    8 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Lol, I hadn't noticed it either :)

    Looking at the code from others, almost no one is using unsigned int.

    I guess people overlooked it because it's a very strange requirement. Also because the partial tests don't check for it. I remember thinking something like "yeah, whatever, probably it's an old workaround for something that they fixed already, and they don't even read this number" :)

    • »
      »
      »
      8 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      The intent for having the seed printed was to open more possibilities for the tooling.

      The seed is included in the input to enable re-generating the test instead of reading it — which is generally faster. Reading a whole 40 MB file may consume a second or two unless you specifically optimize for input speed. It is not too important for a solution which has 30 seconds to run, but more so for the checking program or other tools which are otherwise fast.

      The seed is included in the output to enable using the output without the input.

      • »
        »
        »
        »
        8 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        "The seed is included in the output to enable using the output without the input."

        But in order to tell that the printed seed is wrong, you have to know the original seed beforehand? :)

        Or do you just generate the whole test and look for any wrong data (too many drivers/orders, etc.)?

        • »
          »
          »
          »
          »
          8 years ago, # ^ |
          Rev. 2   Vote: I like it 0 Vote: I do not like it

          I mean, if you have a local tool which re-generates the input based on the seed, you don't even have to store whole inputs.

          For example, the checking program reads the seed from the input file, and then it has two possibilities:

          1. Read the whole input.

          2. Generate the input based on the seed.

          To put it another way: locally, I've used one-liner input files containing just the seed. It saves disk space and disk access time.

          Edit: Of course, all of this makes sense only if we don't consider any custom inputs and use generated inputs exclusively. However, the test generator specification is part of the problem statement anyway, so there's little use in solving tests of other kinds.

          • »
            »
            »
            »
            »
            »
            8 years ago, # ^ |
              Vote: I like it +9 Vote: I do not like it

            My comment was about the server-side testing code, because I assumed that the line you wrote was about it. I just found it funny that we have to output the seed even though the tester knows it anyway. In any case, that was a slight misunderstanding on my part.

            Otherwise, I understand the idea. I think it's pretty neat, but I wonder how many people used it. In my case, adding faster I/O (reading whole lines and parsing from memory) took maybe 2-3 minutes and cut the input reading time down to around 0.5 s worst case on my non-SSD system.
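
            A sketch of that kind of fast input, as one common way to do it (my illustration, not the actual code): slurp stdin with a few large fread calls, then parse integers straight from the buffer:

                #include <cstdio>
                #include <vector>

                static std::vector<char> buf;   // the whole input
                static size_t pos = 0;

                void slurpStdin() {             // one pass of large fread calls
                    char chunk[1 << 16];
                    size_t n;
                    while ((n = std::fread(chunk, 1, sizeof chunk, stdin)) > 0)
                        buf.insert(buf.end(), chunk, chunk + n);
                }

                long long nextInt() {           // hand-rolled scanner over the buffer
                    while (pos < buf.size() && buf[pos] != '-' && (buf[pos] < '0' || buf[pos] > '9'))
                        ++pos;
                    if (pos >= buf.size()) return 0;
                    bool neg = buf[pos] == '-';
                    if (neg) ++pos;
                    long long v = 0;
                    while (pos < buf.size() && buf[pos] >= '0' && buf[pos] <= '9')
                        v = v * 10 + (buf[pos++] - '0');
                    return neg ? -v : v;
                }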

    • »
      »
      »
      8 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      I'm using long long for the seed, because I read the problem statement carefully :)) And I don't think I'm the only person who handled this situation correctly.

  • »
    »
    8 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    We are aware of the problem, and will discuss whether we can restart the testing using only seeds that fit into [0, 2^31). I'd expect the decision to be made within 12 hours, but unfortunately, not right now.

    • »
      »
      »
      8 years ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      I think that if you gave us the code of the test generator, a participant could evaluate the quality of their solution on tests where 2^32 > Seed >= 2^31 and decide what parameters to use for their optimization methods. Their solution could then work better on such test cases, for example. That's why I think the restriction on Seed should remain "< 2^32", as it was in the problem statement.

      P.S. All participants could re-read the problem statement during the two weeks (you can do that thousands of times in such a time frame :)), so this restriction on Seed should have been read too. It is purely a matter of someone's carelessness; other participants, who handled this case correctly, should not be affected.

      • »
        »
        »
        »
        8 years ago, # ^ |
          Vote: I like it +13 Vote: I do not like it

        I used unsigned, but if most of the top participants fail on such a stupid bug, it will make the competition less interesting. So I think the jury needs to change the test set (but please don't rejudge it all again, just remove some tests).

  • »
    »
    8 years ago, # ^ |
      Vote: I like it +20 Vote: I do not like it

    The final decision is to use the reduced interval [0, 2^31) for choosing seeds.

    We considered the following points in favor of this decision:

    • The seed line does not relate to the problem domain in any way.

    • Seeds not fitting into a signed int32 were not present in the preliminary test set, while most other aspects of the problem were covered.

    • For some extra confusion, all other input fit into signed int32 quite well.

    While we recognize that careful reading of the problem statement is an important skill, in this case, correct behavior also depends on the choice of programming language and other factors.

»
8 years ago, # |
  Vote: I like it 0 Vote: I do not like it

I've published the current version of the checking program on Pastebin. The checker uses testlib.h. Please report if you find anything wrong with it!

»
8 years ago, # |
  Vote: I like it +9 Vote: I do not like it

I wonder what the verdict JUDGEMENT FAILED means?

  • »
    »
    8 years ago, # ^ |
      Vote: I like it +22 Vote: I do not like it

    The platform is not very comfortable yet with testing Marathon-type problems, so issues happen. Don't worry, we'll rejudge all such cases.

»
8 years ago, # |
  Vote: I like it +40 Vote: I do not like it

Psyho Great job!

  • »
    »
    8 years ago, # ^ |
      Vote: I like it +29 Vote: I do not like it

    Thanks!

    • »
      »
      »
      8 years ago, # ^ |
        Vote: I like it +18 Vote: I do not like it

      What will you choose, iPhone or Nexus?

      • »
        »
        »
        »
        8 years ago, # ^ |
          Vote: I like it +2 Vote: I do not like it

        Huh, asking the important questions.

        This weekend is Google Hash Code and I'm blindly guessing that they will give out Nexuses as prizes. Not that it matters, since I'm already drowning in gadgets ;)

»
8 years ago, # |
  Vote: I like it +28 Vote: I do not like it

My approach consisted of constructing a time-expanded network. Basically, each node is a (time, place) pair, and there is an arc from (t1, p1) to (t2, p2) if an order or a pair of orders can be satisfied starting from time t1 and place p1, moving to do the pickup(s), then doing the dropoff(s). So (t2, p2) was always the time and place of a drop-off. In order to keep the size of this network feasible:

  • I rounded up all times to multiples of 30 seconds (so, essentially, there were much fewer distinct times)
  • I introduced "waiting" nodes and arcs in order to avoid creating too many arcs (otherwise there would have been an arc from the end of each order/pair of orders to the beginning of each potential subsequent order/pair of orders)
  • I "collapsed" all the arcs with fixed starting times (e.g. pickup from the airport) into one
  • I limited the number of 2-order arcs to a small-ish number if the test was large (keeping only those with the best "cost", where cost was the difference in travelled distance compared to satisfying the orders independently)
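
A compact sketch of this representation (field names are mine, not the author's):

    #include <vector>

    const int TICK = 30;                 // seconds per time step after rounding

    struct Node {                        // a (time, place) pair
        int tick;                        // time, in units of TICK
        int place;                       // location id
    };

    struct Arc {
        int from, to;                    // node indices; `to` is a drop-off node
        int order1, order2;              // orders served (-1 = none; both -1 = a waiting arc)
        int distance;                    // travelled distance, used as the arc cost
    };

    std::vector<Node> nodes;             // tens of thousands of these
    std::vector<Arc> arcs;               // millions, so keep the fields lean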

The number of nodes in this network was on the order of tens of thousands, and the number of arcs on the order of millions (up to 19-20 million arcs on the larger tests). Still, I easily exceeded the 512 MB memory limit, which was my biggest source of pain. I had to remove fields from my structs and otherwise optimize things in ways I didn't necessarily intend, just to squeeze my solution under the memory limit.

Then I ran a greedy solution using this time-expanded network. At each step, I considered each starting time of a driver (at most 13 different values) and tried to compute a path through this graph containing only arcs with yet-unsatisfied orders, such that the average cost per order is as low as possible (i.e. trying to minimize (base penalty + total distance) divided by the number of satisfied orders). Of course, I didn't do this optimally (that would have required me to at least keep the best cost for each number of orders — not that large, since I noticed drivers couldn't satisfy more than about 30 orders), but heuristically. I also had to avoid picking up the same order twice (for orders where the travel time is very low, the time-expanded network, despite being a DAG, may contain paths with arcs handling the same order at different starting times).

I also had to make this path computation somewhat independent of the destination garage, since it was too time-consuming to try it out for every (starting time, starting garage) pair. So I just guaranteed that the path could end in some garage with drivers still available at the given starting time, at most 12 hours after the starting time and with at most 9 hours of travel time.

I noticed that the initial drivers were picking up quite a lot of orders (20+), but as I added new drivers, they were less and less efficient (in the end, many drivers picked up only 1 or 2 orders).

This is what I did in a few days during the first week of the contest, and it gave me about 8374 points (if I remember correctly). In the 2nd week I spent almost no time on this problem, because I participated in another contest (the Codechef monthly long contest), so I only added a step which, while time permits, clears the schedules of a few randomly selected drivers (at least 10% of the used drivers and at least 5% of the satisfied orders) and tries to rebuild their schedules by considering the arcs in some order (depending on total distance travelled, but giving preference to arcs with 2 orders) and inserting each arc into the schedule of the driver where it adds the least additional cost (as long as all the other constraints are still satisfied). This gave me +150 points (so about 8524 points in total).

I have a question for the other competitors, though. I just looked at my results on the 500 test cases and was surprised to notice that my solution very rarely ran up to the time limit I set for it (which was 29 seconds). I saw many cases where it ran for less than 20 seconds (sometimes as low as 14 seconds). So, unless I have some stupid bug in the code, I'm now wondering if I measured time incorrectly. I used gettimeofday, which is what I use (and need to use) in Topcoder Marathon matches. But this measures the actual "real" time, while the time spent running my solution may be less.

Was the 30-second time limit imposed on just the time spent in my code? (e.g. if multiple solutions were run on the same machine and my solution was sometimes preempted, then in a physical interval of 30 seconds, my solution could actually have run much less than that, e.g. only 20 seconds). Was this the case? And if so, was this mentioned anywhere? (I am wondering if I just missed this, which seems to be a pretty important thing.)

What did you use for measuring time here on Codeforces?
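
For comparison, a minimal sketch contrasting the two clocks in question (my example): gettimeofday() measures wall-clock time, while clock() measures the CPU time actually granted to the process, and the two can diverge on a loaded judging machine:

    #include <cstdio>
    #include <ctime>
    #include <sys/time.h>

    double wallSeconds() {                        // real (wall-clock) time
        timeval tv;
        gettimeofday(&tv, nullptr);
        return tv.tv_sec + tv.tv_usec * 1e-6;
    }

    double cpuSeconds() {                         // CPU time of this process
        return double(clock()) / CLOCKS_PER_SEC;
    }

    int main() {
        double w0 = wallSeconds(), c0 = cpuSeconds();
        for (volatile long i = 0; i < 300000000L; ++i) {}  // burn some CPU
        std::printf("wall: %.2f s, cpu: %.2f s\n",
                    wallSeconds() - w0, cpuSeconds() - c0);
    }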

  • »
    »
    8 years ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    About the time questions: take a look at the last announcement on the contest page.

    • »
      »
      »
      8 years ago, # ^ |
        Vote: I like it +8 Vote: I do not like it

      Thanks, I hadn't noticed it (as I said, I kind of stopped working on the problem after the first week, and this announcement was posted one day before the end of the contest?). Anyway, it only cost me one place in the final standings, due to my solution not using the full amount of time it could have, so it's no big deal. But it would have been nice to find out about such things much earlier.

  • »
    »
    8 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Take a look at the last announcement. I had the same issues using gettimeofday().

»
8 years ago, # |
  Vote: I like it +21 Vote: I do not like it

It was a pleasure to participate. Thanks to the organizers. I wish Codeforces had more marathons.