So, the qualification round was upon us, which is always a nice way to have some fun. In each of the past two years I wrote my solutions in multiple languages, and in both years I used Excel for one problem. Without any VBA, of course, as that would've been too boring. This year I decided to go nuclear and use Excel for everything. I managed to get 100 points. Here are my notes on this glorious experience.
So, how does this work overall? Excel can import data, in particular from text files. You can also specify a separator, usually a space, so the numbers spread out over the columns. Then you write formulas to compute the output row for every test, and if the results are laid out sequentially in one column, you can easily select them and copy-paste them into any text editor.
Let's start with the problem that's easiest to implement in Excel — D. X is in column A, R in column B, C in column C. One simple formula does it:
=IF(A2="", "", "Case #" & ROW(D2)-1 & ": " & IF(MOD(B2*C2, A2) = 0, IF(A2 <= 2, "GABRIEL", IF(A2 >= 7, "RICHARD", IF(A2 = 3, IF(MIN(B2, C2) <= 1, "RICHARD", "GABRIEL"), IF(A2 = 4, IF(MIN(B2, C2) <= 2, "RICHARD", "GABRIEL"), IF(A2 = 5, IF(OR(MIN(B2, C2) <= 2, AND(MIN(B2, C2) = 3, MAX(B2, C2) = 5)), "RICHARD", "GABRIEL"), IF(MIN(B2, C2) <= 3, "RICHARD", "GABRIEL")))))), "RICHARD"))
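For readability, here's the same decision logic translated into a Python sketch (the function name `winner` is mine; the case analysis just mirrors the nested IFs above):

```python
def winner(x, r, c):
    """Who wins for an X-omino on an R x C grid, mirroring the nested IFs."""
    lo, hi = min(r, c), max(r, c)
    if (r * c) % x != 0:          # area not divisible by X: tiling is impossible
        return "RICHARD"
    if x <= 2:                    # monominoes and dominoes always tile
        return "GABRIEL"
    if x >= 7:                    # 7-ominoes and larger always have a winning pick
        return "RICHARD"
    if x == 3:
        return "RICHARD" if lo <= 1 else "GABRIEL"
    if x == 4:
        return "RICHARD" if lo <= 2 else "GABRIEL"
    if x == 5:
        return "RICHARD" if lo <= 2 or (lo == 3 and hi == 5) else "GABRIEL"
    return "RICHARD" if lo <= 3 else "GABRIEL"   # x == 6

print(winner(2, 2, 2))  # GABRIEL
print(winner(2, 1, 3))  # RICHARD (area 3 not divisible by 2)
```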
Done. Boring. Next, please.
So let's deal with A next. This introduces some complications. The input grows up to 1000 in size, so we might want to use 1000 cells per test. With 100 test cases, we run into our first problem — the source code size limit. However, Excel has a nice feature: if we select a range of cells and drag its bottom-right corner, it will copy the formulas and adjust the references incrementally. We can also control exactly what gets adjusted by placing
$ before column name or row number. For example,
$A$1 will always remain A1,
$A1 will keep the column fixed but will increase the row number in the next row, and so on. So essentially we can still have the large data tables; we just ship the spreadsheet without them and include instructions to auto-fill certain ranges before using it. With that in mind, we can get under 100KB for the .xlsx.
Great. So for A, let's have 1001 columns where we compute partial sums over all prefixes. Then, in another 1000 columns, for every position we compute how many people we'd need to add for that group to stand up, and in the result column we just pick the maximum of those. If we skip checks for empty test cases and small stuff like that, the formulas look like this:
=IF(LEN($B2) >= D$1, VALUE(MID($B2,D$1,1)) + C2, "")
=IF($A2 >= ALP$1, MAX(ALP$1-D2, 0), "")
=IF($B2="", "", "Case #" & ROW()-1 & ": " & MAX(0, ALP2:BYA2))
MID() takes one character of the string,
VALUE() converts it to an integer, and we get our partial sums; the second formula checks how many people we need to add at that index. Row 1 just stores the indexes for convenience; we could get the same result using
COLUMN() - CONST. Nothing complicated here; we just need to make sure the column where the strings end up is formatted as text. Otherwise they might get treated as numbers and lose their leading zeroes, messing everything up. Anyway, let's move on.
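One more look at A before moving on: the whole approach fits in a few lines of Python (a sketch; the name `min_friends` is mine, and `shyness` is the digit string indexed by shyness level):

```python
def min_friends(smax, shyness):
    """Minimum friends to invite so everyone stands; shyness[i] = count of
    people with shyness level i, as a digit string."""
    standing = 0   # running prefix sum: people already standing before level i
    friends = 0
    for i in range(smax + 1):
        # to make level-i people stand, at least i people must already be up
        friends = max(friends, i - standing)
        standing += int(shyness[i])
    return friends

print(min_friends(4, "11111"))  # 0
print(min_friends(1, "09"))     # 1
```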
Now we get to B. My approach: for every possible number of non-special minutes M, sum
(X-1)/M (rounded down) over all given values X to figure out how many special minutes we'd spend moving pancakes around, then pick the M that minimizes M plus that sum. Sounds great. However, we'd need 1000 blocks for all possible values of M, and we'd need to divide all the data by each of those numbers, meaning 1M numbers per test case. Not ideal. I solved the small input that way with 36 columns, but for the large we need something else, since Excel doesn't have (or at least I'm not aware of) a way to sum over a range with some function applied to every cell beforehand.
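Ignoring the Excel constraints for a second, the underlying brute force is tiny; a Python sketch (names are mine):

```python
def min_minutes(plates):
    """Try every non-special eating time M. Splitting a plate of x pancakes
    into chunks of size <= M takes (x - 1) // M special minutes."""
    best = None
    for m in range(1, max(plates) + 1):
        total = m + sum((x - 1) // m for x in plates)
        best = total if best is None else min(best, total)
    return best

print(min_minutes([3]))           # 3: just eat for 3 minutes
print(min_minutes([1, 2, 1, 2]))  # 2
print(min_minutes([4]))           # 3: one split, then eat for 2 minutes
```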
But! We do have such a function for one particular operation —
SUMPRODUCT(). It takes two or more ranges of identical size and returns the sum of their element-wise products. We need division, though. So instead let's transform the data into counts of how many diners have X pancakes on their plate, like this, with 1000 columns:
INDIRECT() is an awesome function. It takes a string with a cell name and returns a reference to that cell. So we just construct the names of the range endpoints and convert them to references, meaning we'll count our required number of pancakes in the correct range. Again, row 1 stores pancake counts for convenience, and column ALM stores the row numbers where to look for the data (it could be
ROW()*2-1 if I weren't too lazy). Great. Now notice that if we fix M, the answer is the SUMPRODUCT of the data row stored this way with a row holding (X-1) divided by M for every X. Great. However, we need such a row for every M. Oh well, let's put this 1000x1000 table somewhere in the spreadsheet. To make it auto-fillable, we can use something like this:
=QUOTIENT(ALO$1 - 1, ROW()-101)
ROW() will get us our divisor for every row, and row 1 stores pancake counts, so we can auto-fill everything. Now we can use
SUMPRODUCT() to get sums for every M:
=SUMPRODUCT($ALN2:$BXY2, INDIRECT("ALN" & 101+BXZ$1):INDIRECT("BXY" & 101+BXZ$1)) + BXZ$1
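In Python terms, this SUMPRODUCT arrangement is just a dot product of the count row with a per-M quotient row; a sketch of the equivalence (names are mine):

```python
def min_minutes_counts(plates, max_p=1000):
    """SUMPRODUCT view: count[x] = diners with exactly x pancakes; for each M
    the candidate answer is M + dot(count, quotients_M), where
    quotients_M[x] = (x - 1) // M."""
    count = [0] * (max_p + 1)
    for x in plates:
        count[x] += 1
    best = None
    for m in range(1, max(plates) + 1):
        # dot product of the count row with the quotient row for this M, plus M
        total = m + sum(count[x] * ((x - 1) // m) for x in range(1, max_p + 1))
        best = total if best is None else min(best, total)
    return best

print(min_minutes_counts([1, 2, 1, 2]))  # 2
```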
Finally, we just look for the minimum in that range and get our output. Done. Let's move on to the final one.
So, C. Our string size is now 10K, which is getting close to failing, since the maximum string length in an Excel cell is 32K. Also, a sheet only has 16K columns, which is another concern. In my approach, I compute the running product for every prefix of length <= 40K and for every suffix of length <= 40K. Then I find the shortest prefix giving i and the shortest suffix giving k, and check whether the remainder gives us j.
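The quaternion bookkeeping is the fiddly part, so here's a Python sketch of that plan (all names are mine, and this ignores the repetition count the full problem adds on top):

```python
# Quaternion units encoded 0..3 for 1, i, j, k; a value is (sign, unit).
# TABLE[a][b] is the product of units a and b -- the sheet's 8x8 table in miniature.
TABLE = [
    [(1, 0), (1, 1), (1, 2), (1, 3)],
    [(1, 1), (-1, 0), (1, 3), (-1, 2)],   # i*i=-1, i*j=k,  i*k=-j
    [(1, 2), (-1, 3), (-1, 0), (1, 1)],   # j*i=-k, j*j=-1, j*k=i
    [(1, 3), (1, 2), (-1, 1), (-1, 0)],   # k*i=j,  k*j=-i, k*k=-1
]

def mul(a, b):
    s, u = TABLE[a[1]][b[1]]
    return (a[0] * b[0] * s, u)

def inv(q):
    return q if q[1] == 0 else (-q[0], q[1])   # i * -i = 1, etc.

def can_split(s):
    """Can s be cut into three nonempty parts with products i, j, k?"""
    n = len(s)
    pre = [(1, 0)]                              # pre[p] = product of s[:p]
    for ch in s:
        pre.append(mul(pre[-1], (1, "1ijk".index(ch))))
    # shortest prefix with product i
    a = next((p for p in range(1, n) if pre[p] == (1, 1)), None)
    if a is None:
        return False
    # shortest suffix with product k: product(s[p:]) = inv(pre[p]) * pre[n]
    b = next((p for p in range(n - 1, a, -1)
              if mul(inv(pre[p]), pre[n]) == (1, 3)), None)
    # the middle s[a:b] must multiply out to j
    return b is not None and mul(inv(pre[a]), pre[b]) == (1, 2)

print(can_split("ijk"))  # True
print(can_split("ii"))   # False
```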
First problem: with the previous approach we don't have enough columns, so we have to store that array in the other orientation. Next, how do we look up a character's product in the table? I just extend the multiplication table with the negative elements for simplicity and keep that 8x8 table somewhere in the sheet. We can then use
MATCH() to find the position in header arrays for that table, and get a correct reference from there:
=INDIRECT(CHAR(MATCH(MID(L$2, MOD(ROW()-5, L$3)+1, 1), $D$1:$K$1, 0)+67) & MATCH(L4, $C$2:$C$9, 0)+1)
The third argument of
MATCH() changes the behavior — when it's 0, we look for the position of the first element in the array that matches exactly. We'll use it to find the shortest prefix and suffix later on as well. Now, having computed all the prefixes and suffixes, I check in another 20 columns whether the split exists. The other useful function is
ADDRESS(), which takes row and column number and computes the string, which you can then feed to
INDIRECT(), so there's no need to manually compute three-letter column names from column numbers; for one-letter column names I just use
CHAR() for simplicity.
Anyway, that's all great, but we hit another major problem — the sheet now won't finish calculating within 4 minutes on the small dataset. Well, shit. I tried slightly modifying the formulas to make them shorter, but to no avail. In a fit of desperation, I installed LibreOffice in the hope that it might somehow be faster. My expectations of LibreOffice were not disappointed: it crashed when opening the already-filled 50MB .xlsx and crashed again when trying to fill those 8M cells, so it got uninstalled as quickly as it got installed.
It was 9AM and I still hadn't slept, because I had decided that spending time using Excel in ways Excel shouldn't be used is way more fun than sleeping. I was about to give up, but then I had one final idea — what if
MID(), which I use to get a single character of the data in those 8M cells, is somehow not O(1)? The idea was stupid, but it led to some juicy results. I quickly inserted spaces between every letter of the data in the .small input I had failed to solve in time earlier, and tried to see how fast I could recalculate those 8M values. Since I would've had to manually fix formulas after changing the column layout due to some $ signs, I just removed the whole result-calculation part for now and tested with every character in its own cell. It processed reasonably quickly. Great! Add the result calculation back and it'll all work, right? Wrong. Same slowness. Hm, this is very weird. Is the behavior the same when I'm using
MID() on a huge string? Yep. Remove all the result calculation and leave just the prefixes and suffixes — it calculates in 20 seconds. I still have no idea why; my current guess is that caching is to blame: with the result calculation distorting my nice dependency grid, I was somehow getting a lot of cache misses. Anyway, whatever the reason, I now had something to go on.
In Excel, you can turn off automatic calculation. That's a start. However, you still can't recalculate just a selected region. But you can recalculate a whole sheet, and that's our solution. I left the prefix/suffix calculation on one sheet, so after importing the data I recalculate that sheet in about 20 seconds. Then I go to the second sheet, where the results are computed from those prefixes/suffixes, and calculate that sheet separately, so it no longer drags down the prefix/suffix calculation. And so we improved spreadsheet calculation speed by splitting the cells across two sheets, which was quite a fun trick.
So in the end, I managed to get my spreadsheets bug-free and score 100 points with pure Excel. For anyone interested in seeing this madness in full, here is the link. Also, as mentioned above, you do need Microsoft Office (I used 2013), as LibreOffice can't handle the madness. That's totally fine this year, though, as they relaxed the rule on having a free compiler or interpreter available for your language in the qualification round. Now it reads: "For the qualification round of Code Jam, you may use any programming language to solve a Code Jam problem, using any development environment or text editor". So, that's cool. Enjoy, but probably don't try this at home. For your own sanity. I don't really want to optimize any more spreadsheets myself. At least for another year.