Right after the end of Facebook Hacker Cup 2021 Round 1, wjomlex posted the following comment in the discussion section under the announcement blog:

A common issue was people writing O(N^2) solutions to problem A when O(N) solutions are necessary. You won't be able to run an O(N^2) solution in 6 minutes when N = 800,000.

But is this really true? Challenge accepted!

##### Problem A1

**Spoiler**

First of all, here is an $$$O(N)$$$ solution for problem A1:

**Spoiler**

```
#include <iostream>
#include <vector>
#include <stdint.h>
using namespace std;
/*
* This is a solution for problem A1. Time complexity: O(N)
* Characters in the input buffer are remapped: 'F' => 0, 'X' => 1, 'O' => 2
*/
static int32_t solve_a1(const int8_t *arr, int n)
{
int32_t ans = 0, hand = 0;
for (int i = 0; i < n; i++) {
int32_t ch = arr[i];
hand |= ch;
if (hand == 3) {
ans++;
hand = ch;
}
}
return ans;
}
int main()
{
ios_base::sync_with_stdio(false); cin.tie(NULL);
int t;
cin >> t;
for (int casenum = 1; casenum <= t; casenum++) {
int n;
string s;
cin >> n >> s;
vector<int8_t> buf(n);
for (int i = 0; i < n; i++) {
if (s[i] == 'X')
buf[i] = 1;
else if (s[i] == 'O')
buf[i] = 2;
}
int32_t ans = solve_a1(&buf[0], n);
cout << "Case #" << casenum << ": " << ans << "\n";
}
}
```

A bit of preprocessing is done to the input data for remapping letters to numbers 0, 1 and 2. This allows to use bitwise OR operation for updating the current state of hands: 0 — undecided, 1 — left hand, 2 — right hand. If the hands state variable becomes equal to 3, then it's a hands change situation and the answer is increased by 1. The reason for implementing it this way is that the elementary operation in the innermost loop needs to be very fast. It's going to be reused in A2 solutions too.

##### Problem A2, solution v0

Reusing code from the problem A1 solution allows to easily create an $$$O(N^3)$$$ solution for problem A2 by just iterating over all the possible substrings, calling function "solve_a1" for each of them and accumulating the result. But we need $$$O(N^2)$$$ to satisfy the conditions of the challenge and this can be achieved by applying a minor adjustment, transforming "solve_a1" into "solve_a2l":

**Solution v0 for problem A2**

```
#include <iostream>
#include <vector>
#include <string>
#include <stdint.h>
using namespace std;
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx,avx2,sse,sse2,sse3,sse4,popcnt,fma")
/*
* The data type used for the array. Can be int8_t, int16_t or int32_t. Smaller
* data types reduce memory usage and cache misses, but may introduce a bit of
* conversion overhead.
*/
#ifndef ETYPE
#define ETYPE int32_t
#endif
const int32_t mod = 1000000007;
/*
* This is a solution for modified problem A2: account only substrings, which are
* aligned at the left side of the original string. Example: valid left aligned
* substrings of "OXXF" are "O", "OX", "OXX", and "OXXF". Time complexity: O(N)
*
* Note: characters in the input buffer are remapped: 'F' => 0, 'X' => 1, 'O' => 2
*/
static int32_t solve_a2l(const ETYPE *arr, int n)
{
int32_t partial_ans = 0, ans = 0, hand = 0;
for (int i = 0; i < n; i++) {
int32_t ch = arr[i];
hand |= ch;
if (hand == 3) {
partial_ans++;
hand = ch;
}
ans += partial_ans;
if (ans >= mod)
ans -= mod;
}
return ans;
}
/*
* This is a solution for problem A2. Time complexity: O(N^2)
*
* Note: characters in the input buffer are remapped: 'F' => 0, 'X' => 1, 'O' => 2
*/
static int32_t solve_a2(const ETYPE *arr, int n)
{
int32_t ans = 0;
for (int i = 0; i < n; i++) {
ans += solve_a2l(arr + i, n - i);
if (ans >= mod)
ans -= mod;
}
return ans;
}
int main()
{
ios_base::sync_with_stdio(false); cin.tie(NULL);
int t;
cin >> t;
for (int casenum = 1; casenum <= t; casenum++) {
int n;
string s;
cin >> n >> s;
vector<ETYPE> buf(n);
for (int i = 0; i < n; i++) {
if (s[i] == 'X')
buf[i] = 1;
else if (s[i] == 'O')
buf[i] = 2;
}
int32_t ans = solve_a2(&buf[0], n);
cout << "Case #" << casenum << ": " << ans << "\n";
}
}
```

Unsurprisingly, this solution is way too slow and needs 28 minutes to process the practice mode data even on my desktop computer with a quad-core Intel Core i7-860 processor @2.8GHz:

**Spoiler**

```
# g++-11.2.0 -O3 -march=native -funroll-loops solution0.cpp
# time ./a.out < weak_typing_chapter_2_input.txt > output.txt
real 28m13.998s
user 28m12.630s
sys 0m1.280s
```

##### Problem A2, solution v1 (OpenMP pragma)

A good thing is that the performance can be very easily improved by just taking advantage of more CPU cores:

**Solution v1 for problem A2 (use OpenMP pragma)**

```
#include <iostream>
#include <vector>
#include <string>
#include <stdint.h>
using namespace std;
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx,avx2,sse,sse2,sse3,sse4,popcnt,fma")
/*
* The data type used for the array. Can be int8_t, int16_t or int32_t. Smaller
* data types reduce memory usage and cache misses, but may introduce a bit of
* conversion overhead.
*/
#ifndef ETYPE
#define ETYPE int32_t
#endif
const int32_t mod = 1000000007;
/*
* This is a solution for modified problem A2: account only substrings, which are
* aligned at the left side of the original string. Example: valid left aligned
* substrings of "OXXF" are "O", "OX", "OXX", and "OXXF". Time complexity: O(N)
*
* Note: characters in the input buffer are remapped: 'F' => 0, 'X' => 1, 'O' => 2
*/
static int32_t solve_a2l(const ETYPE *arr, int n)
{
int32_t partial_ans = 0, ans = 0, hand = 0;
for (int i = 0; i < n; i++) {
int32_t ch = arr[i];
hand |= ch;
if (hand == 3) {
partial_ans++;
hand = ch;
}
ans += partial_ans;
if (ans >= mod)
ans -= mod;
}
return ans;
}
/*
* This is a solution for problem A2. Time complexity: O(N^2)
*
* Note: characters in the input buffer are remapped: 'F' => 0, 'X' => 1, 'O' => 2
*/
static int32_t solve_a2(const ETYPE *arr, int n)
{
int64_t ans = 0;
#pragma omp parallel for reduction(+:ans) schedule(guided)
for (int i = n - 1; i >= 0; i--)
ans += solve_a2l(arr + i, n - i);
return ans % mod;
}
int main()
{
ios_base::sync_with_stdio(false); cin.tie(NULL);
int t;
cin >> t;
for (int casenum = 1; casenum <= t; casenum++) {
int n;
string s;
cin >> n >> s;
vector<ETYPE> buf(n);
for (int i = 0; i < n; i++) {
if (s[i] == 'X')
buf[i] = 1;
else if (s[i] == 'O')
buf[i] = 2;
}
int32_t ans = solve_a2(&buf[0], n);
cout << "Case #" << casenum << ": " << ans << "\n";
}
}
```

Just a single line `#pragma omp parallel for`

before a single loop helps a lot. The `reduction(+:ans)`

part is also needed to tell OpenMP how to combine results from each loop iteration (without it we would need to store these results in a temporary buffer and then add them together in an extra $$$O(N)$$$ loop). The accumulator variable is upgraded to int64_t to avoid overflows. Additionally, the `schedule(guided)`

part of the pragma and the reversed order of processing is there to keep all CPU cores busy and improve performance. The compiler command line needs to include `-fopenmp`

option.

Below are benchmarks on my desktop computer, old laptop and Samsung Galaxy Tab S6 Lite tablet:

**Spoiler**

```
# g++-11.2.0 -O3 -march=native -funroll-loops -fopenmp solution1.cpp
# time ./a.out < weak_typing_chapter_2_input.txt > output.txt
real 6m31.028s
user 52m6.475s
sys 0m0.020s
# g++-11.2.0 -O3 -march=core2 -funroll-loops -fopenmp solution1.cpp
# time ./a.out < weak_typing_chapter_2_input.txt > output.txt
real 25m39.187s
user 51m15.532s
sys 0m0.156s
# aarch64-unknown-linux-gnu-g++-11.2.0 -static -O3 -mtune=cortex-a73.cortex-a53 -funroll-loops -fopenmp solution1.cpp
# adb push a.out /data/local/tmp/
# time adb shell /data/local/tmp/a.out < weak_typing_chapter_2_input.txt > output.txt
real 8m34.429s
user 0m0.000s
sys 0m0.005s
```

Solution | Device | Processor | Time |
---|---|---|---|

solution v0 | desktop computer | quad-core Intel Core i7-860 @2.8GHz | 28m13.998s |

solution v1 (OpenMP) | desktop computer | quad-core Intel Core i7-860 @2.8GHz | 6m31.028s |

old laptop | dual-core Intel Core2 Duo T7200 @2.0GHz | 25m39.187s | |

android tablet | octa-core 4x ARM Cortex-A73 @2.3GHz + 4x ARM Cortex-A53 @1.7GHz | 8m34.429s |

##### Running it all on a homemade "cluster"

I have multiple devices at home and none of them is fast enough by itself. But what if they all work together? It isn't very difficult to make a script, which dissects the input data into separate testcases, runs them on different devices and combines their output together:

**Ruby script**

```
require 'open3'
# This script splits a big test into individual testcases. And then runs
# them on multiple computers in parallel.
# Each entry in the 'workers' array describes a command line, which can be
# executed to process a single testcase. Testcases can be run on Android
# phones/tablets via 'adb' or on other Linux computers via 'ssh'.
workers = [
"./a.out", # Desktop computer
"ssh [email protected] /tmp/a.out", # Laptop
"adb shell /data/local/tmp/a.out", # Android tablet
]
# If true, then print debugging information and statistics to stderr
VERBOSE = true
#########################################################################
jobs_todo = {}
jobs_in_progress = {}
jobs_done = {}
# read all testcases from standard input
testcases = gets.to_i.times.map {|i| gets ; [i + 1, gets.strip] }
# sort testcases by lengths of strings, longest first
testcases.sort! {|x, y| y[1].length <=> x[1].length }
# initialize the 'jobs_todo' hash (key: testcase number, value: string)
testcases.each {|x| jobs_todo[x[0]] = x[1] }
workers_contribution = {}
workers.each {|w| workers_contribution[w] = {:cases => 0, :work => 0} }
mutex = Mutex.new
# Spawn a thread for each worker
threads = workers.map do |cmdline|
Thread.new do
while true
id, data = nil, nil
mutex.synchronize do
queue = jobs_todo
queue = jobs_in_progress if queue.empty?
if queue.empty?
# Nothing left to do? Then print results and exit.
if VERBOSE
total_work = workers_contribution.to_a.map {|x| x[1][:work] }.sum
workers_contribution.each do |k, x|
x[:work_percentage] = x[:work].to_f / total_work * 100
end
sorted_wc = workers_contribution.to_a.sort {|x, y| y[1][:work] <=> x[1][:work]}
STDERR.printf("\n== Workers contribution: ==\n")
sorted_wc.each do |worker, x|
STDERR.printf(" '%s' : %.2f%% (%d cases)\n",
worker, x[:work_percentage], x[:cases])
end
end
jobs_done.to_a.sort {|x, y| x[0] <=> y[0] }.each do |x|
printf("Case #%d: %s\n", x[0], x[1])
end
exit
end
id, data = queue.shift
jobs_in_progress[id] = data
end
STDERR.printf("'%s' working on case #%d\n", cmdline, id) if VERBOSE
ans, status = Open3.capture2(cmdline, :stdin_data=>format("1\n%d\n%s\n", data.size, data))
unless status.success? && ans =~ /^Case\s+\#1\:\s*(\d+)/
STDERR.printf("Error: something failed when running '%s'\n", cmdline)
break
end
ans = $1.to_i
STDERR.printf("'%s' finished case #%d (ans = %d)\n", cmdline, id, ans) if VERBOSE
mutex.synchronize do
if jobs_in_progress.delete(id)
# Account only useful work (the other worker hasn't completed it first)
workers_contribution[cmdline][:cases] += 1
workers_contribution[cmdline][:work] += data.size * data.size
end
jobs_done[id] = ans
end
end
end
end
# Wait until threads completion
threads.each {|thread| thread.join }
```

**Results**

```
# time ruby a2_dispatcher.rb < weak_typing_chapter_2_input.txt > output.txt
'ssh [email protected] /tmp/a.out' working on case #17
'./a.out' working on case #19
'adb shell /data/local/tmp/a.out' working on case #18
'adb shell /data/local/tmp/a.out' finished case #18 (ans = 148472118)
'adb shell /data/local/tmp/a.out' working on case #16
'adb shell /data/local/tmp/a.out' finished case #16 (ans = 834058048)
'adb shell /data/local/tmp/a.out' working on case #37
'adb shell /data/local/tmp/a.out' finished case #37 (ans = 56032428)
'adb shell /data/local/tmp/a.out' working on case #84
'adb shell /data/local/tmp/a.out' finished case #84 (ans = 58222271)
'adb shell /data/local/tmp/a.out' working on case #52
'adb shell /data/local/tmp/a.out' finished case #52 (ans = 50657009)
'adb shell /data/local/tmp/a.out' working on case #101
'adb shell /data/local/tmp/a.out' finished case #101 (ans = 42545057)
'adb shell /data/local/tmp/a.out' working on case #91
'adb shell /data/local/tmp/a.out' finished case #91 (ans = 56100830)
'adb shell /data/local/tmp/a.out' working on case #112
'adb shell /data/local/tmp/a.out' finished case #112 (ans = 59108176)
'adb shell /data/local/tmp/a.out' working on case #65
'adb shell /data/local/tmp/a.out' finished case #65 (ans = 27529065)
'adb shell /data/local/tmp/a.out' working on case #79
'adb shell /data/local/tmp/a.out' finished case #79 (ans = 41898492)
'adb shell /data/local/tmp/a.out' working on case #67
'adb shell /data/local/tmp/a.out' finished case #67 (ans = 12521059)
'adb shell /data/local/tmp/a.out' working on case #29
'adb shell /data/local/tmp/a.out' finished case #29 (ans = 64753838)
'adb shell /data/local/tmp/a.out' working on case #85
'adb shell /data/local/tmp/a.out' finished case #85 (ans = 7280734)
'adb shell /data/local/tmp/a.out' working on case #50
'adb shell /data/local/tmp/a.out' finished case #50 (ans = 52510928)
'adb shell /data/local/tmp/a.out' working on case #68
'adb shell /data/local/tmp/a.out' finished case #68 (ans = 36776368)
'adb shell /data/local/tmp/a.out' working on case #113
'adb shell /data/local/tmp/a.out' finished case #113 (ans = 31104515)
'adb shell /data/local/tmp/a.out' working on case #96
'adb shell /data/local/tmp/a.out' finished case #96 (ans = 54221446)
'adb shell /data/local/tmp/a.out' working on case #57
'adb shell /data/local/tmp/a.out' finished case #57 (ans = 24358475)
'adb shell /data/local/tmp/a.out' working on case #60
'adb shell /data/local/tmp/a.out' finished case #60 (ans = 19820558)
'adb shell /data/local/tmp/a.out' working on case #41
'adb shell /data/local/tmp/a.out' finished case #41 (ans = 29639004)
'adb shell /data/local/tmp/a.out' working on case #47
'adb shell /data/local/tmp/a.out' finished case #47 (ans = 44802909)
'adb shell /data/local/tmp/a.out' working on case #102
'adb shell /data/local/tmp/a.out' finished case #102 (ans = 15457955)
'adb shell /data/local/tmp/a.out' working on case #51
'adb shell /data/local/tmp/a.out' finished case #51 (ans = 17078071)
'adb shell /data/local/tmp/a.out' working on case #20
'adb shell /data/local/tmp/a.out' finished case #20 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #38
'adb shell /data/local/tmp/a.out' finished case #38 (ans = 17236422)
'adb shell /data/local/tmp/a.out' working on case #82
'adb shell /data/local/tmp/a.out' finished case #82 (ans = 20074945)
'adb shell /data/local/tmp/a.out' working on case #74
'adb shell /data/local/tmp/a.out' finished case #74 (ans = 39100184)
'adb shell /data/local/tmp/a.out' working on case #53
'adb shell /data/local/tmp/a.out' finished case #53 (ans = 33315064)
'adb shell /data/local/tmp/a.out' working on case #31
'adb shell /data/local/tmp/a.out' finished case #31 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #105
'adb shell /data/local/tmp/a.out' finished case #105 (ans = 24581824)
'adb shell /data/local/tmp/a.out' working on case #64
'adb shell /data/local/tmp/a.out' finished case #64 (ans = 6801215)
'adb shell /data/local/tmp/a.out' working on case #71
'adb shell /data/local/tmp/a.out' finished case #71 (ans = 25461827)
'adb shell /data/local/tmp/a.out' working on case #58
'adb shell /data/local/tmp/a.out' finished case #58 (ans = 12450431)
'adb shell /data/local/tmp/a.out' working on case #69
'adb shell /data/local/tmp/a.out' finished case #69 (ans = 18137497)
'adb shell /data/local/tmp/a.out' working on case #43
'adb shell /data/local/tmp/a.out' finished case #43 (ans = 14613562)
'adb shell /data/local/tmp/a.out' working on case #70
'adb shell /data/local/tmp/a.out' finished case #70 (ans = 16770582)
'adb shell /data/local/tmp/a.out' working on case #59
'adb shell /data/local/tmp/a.out' finished case #59 (ans = 10269141)
'adb shell /data/local/tmp/a.out' working on case #95
'adb shell /data/local/tmp/a.out' finished case #95 (ans = 18002252)
'adb shell /data/local/tmp/a.out' working on case #26
'adb shell /data/local/tmp/a.out' finished case #26 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #90
'adb shell /data/local/tmp/a.out' finished case #90 (ans = 2042265)
'adb shell /data/local/tmp/a.out' working on case #23
'adb shell /data/local/tmp/a.out' finished case #23 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #109
'adb shell /data/local/tmp/a.out' finished case #109 (ans = 3227509)
'adb shell /data/local/tmp/a.out' working on case #110
'adb shell /data/local/tmp/a.out' finished case #110 (ans = 17228709)
'adb shell /data/local/tmp/a.out' working on case #72
'adb shell /data/local/tmp/a.out' finished case #72 (ans = 10254945)
'adb shell /data/local/tmp/a.out' working on case #61
'adb shell /data/local/tmp/a.out' finished case #61 (ans = 3391455)
'adb shell /data/local/tmp/a.out' working on case #77
'adb shell /data/local/tmp/a.out' finished case #77 (ans = 6142681)
'adb shell /data/local/tmp/a.out' working on case #116
'adb shell /data/local/tmp/a.out' finished case #116 (ans = 9898960)
'adb shell /data/local/tmp/a.out' working on case #21
'adb shell /data/local/tmp/a.out' finished case #21 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #78
'adb shell /data/local/tmp/a.out' finished case #78 (ans = 10133048)
'adb shell /data/local/tmp/a.out' working on case #75
'adb shell /data/local/tmp/a.out' finished case #75 (ans = 8831706)
'adb shell /data/local/tmp/a.out' working on case #80
'adb shell /data/local/tmp/a.out' finished case #80 (ans = 11639385)
'adb shell /data/local/tmp/a.out' working on case #103
'adb shell /data/local/tmp/a.out' finished case #103 (ans = 10890701)
'adb shell /data/local/tmp/a.out' working on case #32
'adb shell /data/local/tmp/a.out' finished case #32 (ans = 5747095)
'adb shell /data/local/tmp/a.out' working on case #88
'adb shell /data/local/tmp/a.out' finished case #88 (ans = 6963374)
'adb shell /data/local/tmp/a.out' working on case #22
'adb shell /data/local/tmp/a.out' finished case #22 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #98
'adb shell /data/local/tmp/a.out' finished case #98 (ans = 8240529)
'adb shell /data/local/tmp/a.out' working on case #100
'adb shell /data/local/tmp/a.out' finished case #100 (ans = 8808845)
'adb shell /data/local/tmp/a.out' working on case #89
'adb shell /data/local/tmp/a.out' finished case #89 (ans = 1542116)
'adb shell /data/local/tmp/a.out' working on case #63
'adb shell /data/local/tmp/a.out' finished case #63 (ans = 4589254)
'adb shell /data/local/tmp/a.out' working on case #104
'adb shell /data/local/tmp/a.out' finished case #104 (ans = 2110139)
'adb shell /data/local/tmp/a.out' working on case #81
'adb shell /data/local/tmp/a.out' finished case #81 (ans = 4156428)
'adb shell /data/local/tmp/a.out' working on case #48
'adb shell /data/local/tmp/a.out' finished case #48 (ans = 960469)
'adb shell /data/local/tmp/a.out' working on case #24
'adb shell /data/local/tmp/a.out' finished case #24 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #62
'adb shell /data/local/tmp/a.out' finished case #62 (ans = 3086380)
'adb shell /data/local/tmp/a.out' working on case #42
'adb shell /data/local/tmp/a.out' finished case #42 (ans = 330330)
'adb shell /data/local/tmp/a.out' working on case #27
'adb shell /data/local/tmp/a.out' finished case #27 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #25
'adb shell /data/local/tmp/a.out' finished case #25 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #97
'adb shell /data/local/tmp/a.out' finished case #97 (ans = 2311248)
'adb shell /data/local/tmp/a.out' working on case #92
'adb shell /data/local/tmp/a.out' finished case #92 (ans = 3409902)
'adb shell /data/local/tmp/a.out' working on case #87
'adb shell /data/local/tmp/a.out' finished case #87 (ans = 3018037)
'adb shell /data/local/tmp/a.out' working on case #44
'adb shell /data/local/tmp/a.out' finished case #44 (ans = 437384)
'adb shell /data/local/tmp/a.out' working on case #106
'adb shell /data/local/tmp/a.out' finished case #106 (ans = 1645320)
'adb shell /data/local/tmp/a.out' working on case #34
'adb shell /data/local/tmp/a.out' finished case #34 (ans = 1615237)
'adb shell /data/local/tmp/a.out' working on case #111
'adb shell /data/local/tmp/a.out' finished case #111 (ans = 2286510)
'adb shell /data/local/tmp/a.out' working on case #56
'adb shell /data/local/tmp/a.out' finished case #56 (ans = 671796)
'adb shell /data/local/tmp/a.out' working on case #49
'adb shell /data/local/tmp/a.out' finished case #49 (ans = 2486902)
'adb shell /data/local/tmp/a.out' working on case #33
'adb shell /data/local/tmp/a.out' finished case #33 (ans = 1626785)
'adb shell /data/local/tmp/a.out' working on case #73
'adb shell /data/local/tmp/a.out' finished case #73 (ans = 944403)
'adb shell /data/local/tmp/a.out' working on case #46
'adb shell /data/local/tmp/a.out' finished case #46 (ans = 1871972)
'adb shell /data/local/tmp/a.out' working on case #76
'adb shell /data/local/tmp/a.out' finished case #76 (ans = 1036408)
'adb shell /data/local/tmp/a.out' working on case #54
'adb shell /data/local/tmp/a.out' finished case #54 (ans = 1223607)
'adb shell /data/local/tmp/a.out' working on case #55
'adb shell /data/local/tmp/a.out' finished case #55 (ans = 1109920)
'adb shell /data/local/tmp/a.out' working on case #30
'adb shell /data/local/tmp/a.out' finished case #30 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #114
'adb shell /data/local/tmp/a.out' finished case #114 (ans = 697327)
'adb shell /data/local/tmp/a.out' working on case #66
'adb shell /data/local/tmp/a.out' finished case #66 (ans = 907592)
'adb shell /data/local/tmp/a.out' working on case #94
'adb shell /data/local/tmp/a.out' finished case #94 (ans = 778303)
'adb shell /data/local/tmp/a.out' working on case #108
'adb shell /data/local/tmp/a.out' finished case #108 (ans = 316631)
'adb shell /data/local/tmp/a.out' working on case #35
'adb shell /data/local/tmp/a.out' finished case #35 (ans = 207097)
'adb shell /data/local/tmp/a.out' working on case #45
'adb shell /data/local/tmp/a.out' finished case #45 (ans = 729879)
'adb shell /data/local/tmp/a.out' working on case #40
'adb shell /data/local/tmp/a.out' finished case #40 (ans = 220988)
'adb shell /data/local/tmp/a.out' working on case #39
'adb shell /data/local/tmp/a.out' finished case #39 (ans = 154206)
'adb shell /data/local/tmp/a.out' working on case #86
'adb shell /data/local/tmp/a.out' finished case #86 (ans = 230182)
'adb shell /data/local/tmp/a.out' working on case #107
'adb shell /data/local/tmp/a.out' finished case #107 (ans = 132748)
'adb shell /data/local/tmp/a.out' working on case #99
'adb shell /data/local/tmp/a.out' finished case #99 (ans = 274610)
'adb shell /data/local/tmp/a.out' working on case #36
'adb shell /data/local/tmp/a.out' finished case #36 (ans = 107906)
'adb shell /data/local/tmp/a.out' working on case #83
'adb shell /data/local/tmp/a.out' finished case #83 (ans = 32833)
'adb shell /data/local/tmp/a.out' working on case #115
'adb shell /data/local/tmp/a.out' finished case #115 (ans = 16016)
'adb shell /data/local/tmp/a.out' working on case #93
'adb shell /data/local/tmp/a.out' finished case #93 (ans = 3280)
'adb shell /data/local/tmp/a.out' working on case #28
'adb shell /data/local/tmp/a.out' finished case #28 (ans = 1390)
'adb shell /data/local/tmp/a.out' working on case #5
'adb shell /data/local/tmp/a.out' finished case #5 (ans = 146)
'adb shell /data/local/tmp/a.out' working on case #10
'adb shell /data/local/tmp/a.out' finished case #10 (ans = 188)
'adb shell /data/local/tmp/a.out' working on case #4
'adb shell /data/local/tmp/a.out' finished case #4 (ans = 36)
'adb shell /data/local/tmp/a.out' working on case #9
'adb shell /data/local/tmp/a.out' finished case #9 (ans = 32)
'adb shell /data/local/tmp/a.out' working on case #11
'adb shell /data/local/tmp/a.out' finished case #11 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #12
'adb shell /data/local/tmp/a.out' finished case #12 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #13
'adb shell /data/local/tmp/a.out' finished case #13 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #14
'adb shell /data/local/tmp/a.out' finished case #14 (ans = 165)
'adb shell /data/local/tmp/a.out' working on case #15
'adb shell /data/local/tmp/a.out' finished case #15 (ans = 165)
'adb shell /data/local/tmp/a.out' working on case #3
'adb shell /data/local/tmp/a.out' finished case #3 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #8
'adb shell /data/local/tmp/a.out' finished case #8 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #2
'adb shell /data/local/tmp/a.out' finished case #2 (ans = 1)
'adb shell /data/local/tmp/a.out' working on case #7
'adb shell /data/local/tmp/a.out' finished case #7 (ans = 2)
'adb shell /data/local/tmp/a.out' working on case #1
'adb shell /data/local/tmp/a.out' finished case #1 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #6
'adb shell /data/local/tmp/a.out' finished case #6 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #17
'./a.out' finished case #19 (ans = 553484346)
'./a.out' working on case #17
'./a.out' finished case #17 (ans = 735866676)
== Workers contribution: ==
'./a.out' : 66.68% (2 cases)
'adb shell /data/local/tmp/a.out' : 33.32% (114 cases)
'ssh [email protected] /tmp/a.out' : 0.00% (0 cases)
real 4m45.626s
user 38m2.469s
sys 0m0.111s
```

Now the time is down to **4m45.626s** and this is within the 6 minutes limit. Assigning jobs to different devices could be surely improved. Right now I'm just sorting them by the lenghts of strings and processing more difficult cases first. The old laptop was totally useless and couldn't handle even a single big testcase #17. The other devices came to the rescue in the end and finished it faster.

##### TL;DR;

Facebook Hacker Cup has an unusual format. The contestants have 6 minutes to run their solutions on their own computers. This provides a lot of opportunities to speed up even poor time complexity solutions by taking advantage of multiple CPU cores and multiple computers. OpenMP is easy to use.

##### To be continued

This is not the end, but the blog is already becoming too large and it's necessary to wrap it up. The next part will feature Clang vs. GCC, different compiler optimization options and taking advantage of SSE2/AVX2.

**Spoiler**

It's possible to process the true worst case (5 x 800000) using just a single thread of an Intel Core i7-860 processor in less than 6 minutes by a vectorized $$$O(N^2)$$$ solution.

Really nice blog, I remember people talking about doing this but not having a concrete example. Glad to see someone actually went ahead and did it.

Just a side-note: the use of a separate script sounds nonstandard (but it works around some issues with heterogeneous systems, so it's really nice). As far as I know, there's a way to use MPI (even as a hybrid with OpenMP) over a network (even though people claim heterogeneous support is broken, from what I've heard, it tends to work for Android on ARM and Linux on x86-64), which could allow for even better distribution of tasks across nodes (perhaps in terms of load balancing), though this doesn't sound as necessary for this problem.

See? This is what we mean when we talk about problem solving. Don't let anybody tell you what you can't do.

Nice. People run solutions on multiple machines in GHC but I've never heard about using a tablet. It seems that it would be enough for you to just split test cases in half between PC and tablet, right? No need for any smart assignment.

I used a less complicated set-up several times already. Split the test cases into 8 groups and run in parallel, no pragmas. It even got me to FHC finals once — I squeezed a slow $$$O(N \cdot \log^2 N)$$$ in 5 minutes.

Sadly, all of this rewards having a strong computer or using an online cluster. I would love to see this format in onsite competitions though, where all contestants have the same computing power.