Franklyn_W's blog

By Franklyn_W, history, 3 years ago,

Addendum: This is mostly about the story behind our solution to N. To see more about the math, see here

When I was younger, fairytales always captivated me, with their nice and clean stories. All through my life I've liked the idea of having a fairytale ending to a life arc, but I was frequently disappointed. Many times after an unsuccessful experience, one simply gets more unsucessful experiences and then time runs out...

At ICPC recently, I was lucky enough to finally attain my fairytale ending to my competitive programming career. I figured I would tell the fairytale, to close a career that started about 6 years ago.

On my World Finals team, my role is to help the team get hard math problems, as I haven't actively processed algorithms in years. So imagine my delight when my teammate (ekzhang) points out that problem N reduces to finding a vector so that the (l2)-norm of x is r, and $Ax = b$ for a matrix $A$ and vector $b$. This has to be doable!

The first observation is that finding the min-norm solution $x_{\text{min}}$ to $Ax=b$ is useful. Once we have the minimum-norm solution, it's sufficient to just find an $e$ in the null space of $Ax = b$, and consider $x_c = x_{\text{min}} + c \cdot e$, choosing $c$ so that the norm of $x_c$ is equal to $r$. Note that if we replace $x_{\text{min}}$ with any such $x$, it may not be possible to find a $c$ so that the norm of $x_c$ is equal to $r$.

When I hear min-norm solution to $Ax = b$ my ears perk up -- a well known result in machine learning is that stochastic gradient descent on $Ax - b$ (more precisely, $(Ax - b)^{\top}(Ax - b)$)) converges to the min-norm solution. Could this be my chance to use preconditioned gradient descent?

As it turns out this is not to be. I try using gradient descent, and the precision after a finite number of operations is simply not high enough, especially given the massive precision demanded by the problem.

It's here that I stop trying to overcomplicate things. I rewrite $Ax = b$ as $Cx = d$, where $C$ is a matrix composed of orthogonal rows, via Gram-Schmidt orthogonalization (or QR decomposition). As such, if we write

$C = \begin{bmatrix} -- & e_1 & -- \\ -- & \vdots & -- \\ -- & e_n & -- \end{bmatrix}$

Now, to find the min-norm solution of $Cx = d$, note that we can simply take $x_{\text{min}} = d_1e_1 + \ldots + d_ne_n$. Note that any other solution to $Cx = d$ is just $x_{\text{min}}$ added to vectors in $e_{n+1}, e_{n+2}, \ldots e_d$, each of which only add norm to $x$. And now I knew I was almost done! We just need to find an $e_{n+1}$ that's orthogonal to each of $e_1, e_2, \ldots e_n$, and then find a $c$ so that the norm of $x_{\text{min}} + ce_{n+1}$ is equal to $r$.

Inspired by this, and wanting to get the coveted first solve award, I begin coding this solution, typing the fastest that I have in a while. After about 20 minutes, at the three-hour mark, I have a mostly working solution that seems to check out on the sample data, and I excitedly submit, to get

${\color{red}{\textbf{Wrong Answer}}}$.

After this, there's a fairly obvious typo I fix, and I resubmit again, getting

${\color{red}{\textbf{Runtime Error}}}$.

At the last few programming contests I've participated in, run-time error has been a pretty common occurence. This is a fairly rare error to have, and one that is usually not too hard to fix. Calculating the value of $c$ in the above uses a square root operation, and due to floating point precision issues sometimes this number is slightly below zero, so I patch that issue. Resubmit!

${\color{red}{\textbf{Runtime Error}}}$

At this point, I try running my new code on the samples again, and get something quite unusual: my code prints the answer, but throws a run-time error in the middle. Frustrated, I step away from the problem briefly and let my other teammate (halohalo) try looking for a run-time error. He quickly finds something amiss. Namely, in order to compute $e_{n+1}$ my code calls a subroutine which essentially runs Gram-Schmidt orthogonalization on $Ax = b$, but with a random vector appended to the end of $A$. The last value of $b$ doesn't matter for this component (in fact the value of $b$ doesn't matter at all for computing $e_{n+1}$) but the way I had it implemented, $b$ was one shorter than it was supposed to be, and was thus leading to out-of-bounds errors. We fix this, and are greeted by:

${\color{red}{\textbf{Wrong Answer}}}$

At this point ekzhang decided to reimplement my program from scratch. I worked on preparing large tests, hoping to find an error. I find it, a large test where the variable $n$ is equal to $500$ and where $d$ is $490$. In this case, the linear system is actually overdetermined, and we have $499$ equations in $490$ variables. In these cases, naive Gram-Schmidt will start trying to nomalize vectors which (up to floating point error) are zero, leading to potentially unusual consequences. In this case, my program explodes.

His program passes the samples, but doesn't pass the large case either. There are now about $10$ minutes left in the contest, so he tries a hail mary adding one line of code to the beginning of the program: $n = min(n, d + 1)$. In fact, I thought that I had expressed this implicitly later in the program.

His solution still fails the weird case, but is much closer to right. With nothing else to try, and with 4 minutes left and 15 wrong submissions, I add this line of code to the beginning of my program, and submit.

${\color{green}{\textbf{Correct}}}$

When I see the green ${\color{green}{\textbf{Correct}}}$ on the screen, it's probably the happiest moment I've ever had in any contest ever.

You couldn't have scripted a better ending to my ICPC career (I'm not doing it in the future), one where I solved the hardest math problem on the set.

Originally we thought that Latvia may have solved the problem before us, but then when we see their last submission is after ours (at 298, with 2 minutes left) we know that we've won the first to solve award for the problem.

There were many things that could have gone better in solving this problem. We probably didn't need to make these wrong submits, and I could've made the $n = min(n, d + 1)$ check earlier (never hurts to check twice!) But at the end we had our shiny Correct, and that's all that mattered to me in the end.

I'd like to thank everyone who helped me (far too numerous to list in the post) for a great 6 years of competitive programming. Bon voyage!

• +531

 » 3 years ago, # |   +47 Thanks “coach Wang” — happy to provide any debugging and/or “Hail Mary” moments you need :)
 » 3 years ago, # |   0 Very inspiring! Thank a lot and, at the same time, I also wish you a better life after retirement in the future!
 » 3 years ago, # |   +11 Thanks “Coach Wang”! Good times. Remember to not access memory out of bounds next time :D
 » 3 years ago, # |   +11 I read this as i was participating in the competition (well, too hard for me to even qualify), it made me happy for you. I can't imagine the feelings you would have gotten if you did not get the AC during the contest and realizing that the line $n = min(n,d+1)$ could've change the WA to AC.Anyways, congratulations to you and the team :D.
 » 3 years ago, # |   +13 Sine ICPC does not allow for recycled code, did you have to implement QR decomposition/Gram-Schmidt algorithm from scratch? Or did you use something like Julia (which I think already has those built-in?)
 » 3 years ago, # |   0 Auto comment: topic has been updated by Franklyn_W (previous revision, new revision, compare).
 » 3 years ago, # |   +16 Note that the reason that Gram-Schmidt worked is that the inputs to the problem is randomly generated. Gram-Schmidt is highly unstable in many situations, compared to Householder method for QR decomposition (or similar operations). For example, see last slide in:http://www.seas.ucla.edu/~vandenbe/133A/lectures/ls.pdf