$$$O(N^2)$$$ solution to Facebook Hacker Cup 2021 Round 1 problem A2 (part one)

#	User	Rating
1	tourist	3880
2	jiangly	3669
3	ecnerwala	3654
4	Benq	3627
5	orzdevinwang	3612
6	Geothermal	3569
6	cnnfls_csy	3569
8	jqdai0815	3532
9	Radewoosh	3522
10	gyh20	3447

#	User	Contrib.
1	awoo	161
2	maomao90	160
3	adamant	156
4	maroonrk	153
5	-is-this-fft-	148
5	atcoder_official	148
5	SecondThread	148
8	Petr	147
9	nor	144
9	TheScrasse	144

Right after the end of Facebook Hacker Cup 2021 Round 1, wjomlex posted the following comment in the discussion section under the announcement blog:

A common issue was people writing O(N^2) solutions to problem A when O(N) solutions are necessary. You won't be able to run an O(N^2) solution in 6 minutes when N = 800,000.

But is this really true? Challenge accepted!

Problem A1

Spoiler

Problem A2, solution v0

Reusing code from the problem A1 solution allows to easily create an $$$O(N^3)$$$ solution for problem A2 by just iterating over all the possible substrings, calling function "solve_a1" for each of them and accumulating the result. But we need $$$O(N^2)$$$ to satisfy the conditions of the challenge and this can be achieved by applying a minor adjustment, transforming "solve_a1" into "solve_a2l":

Solution v0 for problem A2

#include <iostream>
#include <vector>
#include <string>
#include <stdint.h>
using namespace std;

// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx,avx2,sse,sse2,sse3,sse4,popcnt,fma")

/*
 * The data type used for the array. Can be int8_t, int16_t or int32_t. Smaller
 * data types reduce memory usage and cache misses, but may introduce a bit of
 * conversion overhead.
 */
#ifndef ETYPE
#define ETYPE int32_t
#endif

const int32_t mod = 1000000007;

/*
 * This is a solution for modified problem A2: account only substrings, which are
 * aligned at the left side of the original string. Example: valid left aligned
 * substrings of "OXXF" are "O", "OX", "OXX", and "OXXF". Time complexity: O(N)
 *
 * Note: characters in the input buffer are remapped: 'F' => 0, 'X' => 1, 'O' => 2
 */
static int32_t solve_a2l(const ETYPE *arr, int n)
{
  int32_t partial_ans = 0, ans = 0, hand = 0;
  for (int i = 0; i < n; i++) {
    int32_t ch = arr[i];
    hand |= ch;
    if (hand == 3) {
      partial_ans++;
      hand = ch;
    }
    ans += partial_ans;
    if (ans >= mod)
      ans -= mod;
  }
  return ans;
}

/*
 * This is a solution for problem A2. Time complexity: O(N^2)
 *
 * Note: characters in the input buffer are remapped: 'F' => 0, 'X' => 1, 'O' => 2
 */
static int32_t solve_a2(const ETYPE *arr, int n)
{
  int32_t ans = 0;
  for (int i = 0; i < n; i++) {
    ans += solve_a2l(arr + i, n - i);
    if (ans >= mod)
      ans -= mod;
  }
  return ans;
}

int main()
{
  ios_base::sync_with_stdio(false); cin.tie(NULL);
  int t;
  cin >> t;
  for (int casenum = 1; casenum <= t; casenum++) {
    int n;
    string s;
    cin >> n >> s;
    vector<ETYPE> buf(n);
    for (int i = 0; i < n; i++) {
      if (s[i] == 'X')
        buf[i] = 1;
      else if (s[i] == 'O')
        buf[i] = 2;
    }
    int32_t ans = solve_a2(&buf[0], n);
    cout << "Case #" << casenum << ": " <<  ans << "\n";
  }
}

Unsurprisingly, this solution is way too slow and needs 28 minutes to process the practice mode data even on my desktop computer with a quad-core Intel Core i7-860 processor @2.8GHz:

Spoiler

# g++-11.2.0 -O3 -march=native -funroll-loops solution0.cpp
# time ./a.out < weak_typing_chapter_2_input.txt > output.txt

real	28m13.998s
user	28m12.630s
sys	0m1.280s

Problem A2, solution v1 (OpenMP pragma)

A good thing is that the performance can be very easily improved by just taking advantage of more CPU cores:

Solution v1 for problem A2 (use OpenMP pragma)

#include <iostream>
#include <vector>
#include <string>
#include <stdint.h>
using namespace std;

// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx,avx2,sse,sse2,sse3,sse4,popcnt,fma")

/*
 * The data type used for the array. Can be int8_t, int16_t or int32_t. Smaller
 * data types reduce memory usage and cache misses, but may introduce a bit of
 * conversion overhead.
 */
#ifndef ETYPE
#define ETYPE int32_t
#endif

const int32_t mod = 1000000007;

/*
 * This is a solution for modified problem A2: account only substrings, which are
 * aligned at the left side of the original string. Example: valid left aligned
 * substrings of "OXXF" are "O", "OX", "OXX", and "OXXF". Time complexity: O(N)
 *
 * Note: characters in the input buffer are remapped: 'F' => 0, 'X' => 1, 'O' => 2
 */
static int32_t solve_a2l(const ETYPE *arr, int n)
{
  int32_t partial_ans = 0, ans = 0, hand = 0;
  for (int i = 0; i < n; i++) {
    int32_t ch = arr[i];
    hand |= ch;
    if (hand == 3) {
      partial_ans++;
      hand = ch;
    }
    ans += partial_ans;
    if (ans >= mod)
      ans -= mod;
  }
  return ans;
}

/*
 * This is a solution for problem A2. Time complexity: O(N^2)
 *
 * Note: characters in the input buffer are remapped: 'F' => 0, 'X' => 1, 'O' => 2
 */
static int32_t solve_a2(const ETYPE *arr, int n)
{
  int64_t ans = 0;
  #pragma omp parallel for reduction(+:ans) schedule(guided)
  for (int i = n - 1; i >= 0; i--)
    ans += solve_a2l(arr + i, n - i);
  return ans % mod;
}

int main()
{
  ios_base::sync_with_stdio(false); cin.tie(NULL);
  int t;
  cin >> t;
  for (int casenum = 1; casenum <= t; casenum++) {
    int n;
    string s;
    cin >> n >> s;
    vector<ETYPE> buf(n);
    for (int i = 0; i < n; i++) {
      if (s[i] == 'X')
        buf[i] = 1;
      else if (s[i] == 'O')
        buf[i] = 2;
    }
    int32_t ans = solve_a2(&buf[0], n);
    cout << "Case #" << casenum << ": " <<  ans << "\n";
  }
}

Just a single line #pragma omp parallel for before a single loop helps a lot. The reduction(+:ans) part is also needed to tell OpenMP how to combine results from each loop iteration (without it we would need to store these results in a temporary buffer and then add them together in an extra $$$O(N)$$$ loop). The accumulator variable is upgraded to int64_t to avoid overflows. Additionally, the schedule(guided) part of the pragma and the reversed order of processing is there to keep all CPU cores busy and improve performance. The compiler command line needs to include -fopenmp option.

Below are benchmarks on my desktop computer, old laptop and Samsung Galaxy Tab S6 Lite tablet:

Spoiler

# g++-11.2.0 -O3 -march=native -funroll-loops -fopenmp solution1.cpp
# time ./a.out < weak_typing_chapter_2_input.txt > output.txt

real	6m31.028s
user	52m6.475s
sys	0m0.020s

# g++-11.2.0 -O3 -march=core2 -funroll-loops -fopenmp solution1.cpp
# time ./a.out < weak_typing_chapter_2_input.txt > output.txt

real	25m39.187s
user	51m15.532s
sys	0m0.156s

# aarch64-unknown-linux-gnu-g++-11.2.0 -static -O3 -mtune=cortex-a73.cortex-a53 -funroll-loops -fopenmp solution1.cpp
# adb push a.out /data/local/tmp/
# time adb shell /data/local/tmp/a.out < weak_typing_chapter_2_input.txt > output.txt

real	8m34.429s
user	0m0.000s
sys	0m0.005s

Solution	Device	Processor	Time
solution v0	desktop computer	quad-core Intel Core i7-860 @2.8GHz	28m13.998s
solution v1 (OpenMP)	desktop computer	quad-core Intel Core i7-860 @2.8GHz	6m31.028s
	old laptop	dual-core Intel Core2 Duo T7200 @2.0GHz	25m39.187s
	android tablet	octa-core 4x ARM Cortex-A73 @2.3GHz + 4x ARM Cortex-A53 @1.7GHz	8m34.429s

Running it all on a homemade "cluster"

I have multiple devices at home and none of them is fast enough by itself. But what if they all work together? It isn't very difficult to make a script, which dissects the input data into separate testcases, runs them on different devices and combines their output together:

Ruby script

require 'open3'

# This script splits a big test into individual testcases. And then runs
# them on multiple computers in parallel.

# Each entry in the 'workers' array describes a command line, which can be
# executed to process a single testcase. Testcases can be run on Android
# phones/tablets via 'adb' or on other Linux computers via 'ssh'.

workers = [
  "./a.out",                         # Desktop computer
  "ssh ssvb@t61 /tmp/a.out",         # Laptop
  "adb shell /data/local/tmp/a.out", # Android tablet
]

# If true, then print debugging information and statistics to stderr
VERBOSE = true

#########################################################################

jobs_todo = {}
jobs_in_progress = {}
jobs_done = {}

# read all testcases from standard input
testcases = gets.to_i.times.map {|i| gets ; [i + 1, gets.strip] }
# sort testcases by lengths of strings, longest first
testcases.sort! {|x, y| y[1].length <=> x[1].length }
# initialize the 'jobs_todo' hash (key: testcase number, value: string)
testcases.each {|x| jobs_todo[x[0]] = x[1] }

workers_contribution = {}
workers.each {|w| workers_contribution[w] = {:cases => 0, :work => 0} }

mutex = Mutex.new

# Spawn a thread for each worker
threads = workers.map do |cmdline|
  Thread.new do
    while true
      id, data = nil, nil

      mutex.synchronize do
        queue = jobs_todo
        queue = jobs_in_progress if queue.empty?

        if queue.empty?
          # Nothing left to do? Then print results and exit.

          if VERBOSE
            total_work = workers_contribution.to_a.map {|x| x[1][:work] }.sum
            workers_contribution.each do |k, x|
              x[:work_percentage] = x[:work].to_f / total_work * 100
            end
            sorted_wc = workers_contribution.to_a.sort {|x, y| y[1][:work] <=> x[1][:work]}
            STDERR.printf("\n== Workers contribution: ==\n")
            sorted_wc.each do |worker, x|
              STDERR.printf("  '%s' : %.2f%% (%d cases)\n",
                            worker, x[:work_percentage], x[:cases])
            end
          end

          jobs_done.to_a.sort {|x, y| x[0] <=> y[0] }.each do |x|
            printf("Case #%d: %s\n", x[0], x[1])
          end
          exit
        end

        id, data = queue.shift
        jobs_in_progress[id] = data
      end

      STDERR.printf("'%s' working on case #%d\n", cmdline, id) if VERBOSE

      ans, status = Open3.capture2(cmdline, :stdin_data=>format("1\n%d\n%s\n", data.size, data))
      unless status.success? && ans =~ /^Case\s+\#1\:\s*(\d+)/
        STDERR.printf("Error: something failed when running '%s'\n", cmdline)
        break
      end
      ans = $1.to_i

      STDERR.printf("'%s' finished case #%d (ans = %d)\n", cmdline, id, ans) if VERBOSE

      mutex.synchronize do
        if jobs_in_progress.delete(id)
          # Account only useful work (the other worker hasn't completed it first)
          workers_contribution[cmdline][:cases] += 1
          workers_contribution[cmdline][:work] += data.size * data.size
        end
        jobs_done[id] = ans
      end
    end
  end
end

# Wait until threads completion
threads.each {|thread| thread.join }

Results

# time ruby a2_dispatcher.rb < weak_typing_chapter_2_input.txt > output.txt
'ssh ssvb@t61 /tmp/a.out' working on case #17
'./a.out' working on case #19
'adb shell /data/local/tmp/a.out' working on case #18
'adb shell /data/local/tmp/a.out' finished case #18 (ans = 148472118)
'adb shell /data/local/tmp/a.out' working on case #16
'adb shell /data/local/tmp/a.out' finished case #16 (ans = 834058048)
'adb shell /data/local/tmp/a.out' working on case #37
'adb shell /data/local/tmp/a.out' finished case #37 (ans = 56032428)
'adb shell /data/local/tmp/a.out' working on case #84
'adb shell /data/local/tmp/a.out' finished case #84 (ans = 58222271)
'adb shell /data/local/tmp/a.out' working on case #52
'adb shell /data/local/tmp/a.out' finished case #52 (ans = 50657009)
'adb shell /data/local/tmp/a.out' working on case #101
'adb shell /data/local/tmp/a.out' finished case #101 (ans = 42545057)
'adb shell /data/local/tmp/a.out' working on case #91
'adb shell /data/local/tmp/a.out' finished case #91 (ans = 56100830)
'adb shell /data/local/tmp/a.out' working on case #112
'adb shell /data/local/tmp/a.out' finished case #112 (ans = 59108176)
'adb shell /data/local/tmp/a.out' working on case #65
'adb shell /data/local/tmp/a.out' finished case #65 (ans = 27529065)
'adb shell /data/local/tmp/a.out' working on case #79
'adb shell /data/local/tmp/a.out' finished case #79 (ans = 41898492)
'adb shell /data/local/tmp/a.out' working on case #67
'adb shell /data/local/tmp/a.out' finished case #67 (ans = 12521059)
'adb shell /data/local/tmp/a.out' working on case #29
'adb shell /data/local/tmp/a.out' finished case #29 (ans = 64753838)
'adb shell /data/local/tmp/a.out' working on case #85
'adb shell /data/local/tmp/a.out' finished case #85 (ans = 7280734)
'adb shell /data/local/tmp/a.out' working on case #50
'adb shell /data/local/tmp/a.out' finished case #50 (ans = 52510928)
'adb shell /data/local/tmp/a.out' working on case #68
'adb shell /data/local/tmp/a.out' finished case #68 (ans = 36776368)
'adb shell /data/local/tmp/a.out' working on case #113
'adb shell /data/local/tmp/a.out' finished case #113 (ans = 31104515)
'adb shell /data/local/tmp/a.out' working on case #96
'adb shell /data/local/tmp/a.out' finished case #96 (ans = 54221446)
'adb shell /data/local/tmp/a.out' working on case #57
'adb shell /data/local/tmp/a.out' finished case #57 (ans = 24358475)
'adb shell /data/local/tmp/a.out' working on case #60
'adb shell /data/local/tmp/a.out' finished case #60 (ans = 19820558)
'adb shell /data/local/tmp/a.out' working on case #41
'adb shell /data/local/tmp/a.out' finished case #41 (ans = 29639004)
'adb shell /data/local/tmp/a.out' working on case #47
'adb shell /data/local/tmp/a.out' finished case #47 (ans = 44802909)
'adb shell /data/local/tmp/a.out' working on case #102
'adb shell /data/local/tmp/a.out' finished case #102 (ans = 15457955)
'adb shell /data/local/tmp/a.out' working on case #51
'adb shell /data/local/tmp/a.out' finished case #51 (ans = 17078071)
'adb shell /data/local/tmp/a.out' working on case #20
'adb shell /data/local/tmp/a.out' finished case #20 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #38
'adb shell /data/local/tmp/a.out' finished case #38 (ans = 17236422)
'adb shell /data/local/tmp/a.out' working on case #82
'adb shell /data/local/tmp/a.out' finished case #82 (ans = 20074945)
'adb shell /data/local/tmp/a.out' working on case #74
'adb shell /data/local/tmp/a.out' finished case #74 (ans = 39100184)
'adb shell /data/local/tmp/a.out' working on case #53
'adb shell /data/local/tmp/a.out' finished case #53 (ans = 33315064)
'adb shell /data/local/tmp/a.out' working on case #31
'adb shell /data/local/tmp/a.out' finished case #31 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #105
'adb shell /data/local/tmp/a.out' finished case #105 (ans = 24581824)
'adb shell /data/local/tmp/a.out' working on case #64
'adb shell /data/local/tmp/a.out' finished case #64 (ans = 6801215)
'adb shell /data/local/tmp/a.out' working on case #71
'adb shell /data/local/tmp/a.out' finished case #71 (ans = 25461827)
'adb shell /data/local/tmp/a.out' working on case #58
'adb shell /data/local/tmp/a.out' finished case #58 (ans = 12450431)
'adb shell /data/local/tmp/a.out' working on case #69
'adb shell /data/local/tmp/a.out' finished case #69 (ans = 18137497)
'adb shell /data/local/tmp/a.out' working on case #43
'adb shell /data/local/tmp/a.out' finished case #43 (ans = 14613562)
'adb shell /data/local/tmp/a.out' working on case #70
'adb shell /data/local/tmp/a.out' finished case #70 (ans = 16770582)
'adb shell /data/local/tmp/a.out' working on case #59
'adb shell /data/local/tmp/a.out' finished case #59 (ans = 10269141)
'adb shell /data/local/tmp/a.out' working on case #95
'adb shell /data/local/tmp/a.out' finished case #95 (ans = 18002252)
'adb shell /data/local/tmp/a.out' working on case #26
'adb shell /data/local/tmp/a.out' finished case #26 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #90
'adb shell /data/local/tmp/a.out' finished case #90 (ans = 2042265)
'adb shell /data/local/tmp/a.out' working on case #23
'adb shell /data/local/tmp/a.out' finished case #23 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #109
'adb shell /data/local/tmp/a.out' finished case #109 (ans = 3227509)
'adb shell /data/local/tmp/a.out' working on case #110
'adb shell /data/local/tmp/a.out' finished case #110 (ans = 17228709)
'adb shell /data/local/tmp/a.out' working on case #72
'adb shell /data/local/tmp/a.out' finished case #72 (ans = 10254945)
'adb shell /data/local/tmp/a.out' working on case #61
'adb shell /data/local/tmp/a.out' finished case #61 (ans = 3391455)
'adb shell /data/local/tmp/a.out' working on case #77
'adb shell /data/local/tmp/a.out' finished case #77 (ans = 6142681)
'adb shell /data/local/tmp/a.out' working on case #116
'adb shell /data/local/tmp/a.out' finished case #116 (ans = 9898960)
'adb shell /data/local/tmp/a.out' working on case #21
'adb shell /data/local/tmp/a.out' finished case #21 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #78
'adb shell /data/local/tmp/a.out' finished case #78 (ans = 10133048)
'adb shell /data/local/tmp/a.out' working on case #75
'adb shell /data/local/tmp/a.out' finished case #75 (ans = 8831706)
'adb shell /data/local/tmp/a.out' working on case #80
'adb shell /data/local/tmp/a.out' finished case #80 (ans = 11639385)
'adb shell /data/local/tmp/a.out' working on case #103
'adb shell /data/local/tmp/a.out' finished case #103 (ans = 10890701)
'adb shell /data/local/tmp/a.out' working on case #32
'adb shell /data/local/tmp/a.out' finished case #32 (ans = 5747095)
'adb shell /data/local/tmp/a.out' working on case #88
'adb shell /data/local/tmp/a.out' finished case #88 (ans = 6963374)
'adb shell /data/local/tmp/a.out' working on case #22
'adb shell /data/local/tmp/a.out' finished case #22 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #98
'adb shell /data/local/tmp/a.out' finished case #98 (ans = 8240529)
'adb shell /data/local/tmp/a.out' working on case #100
'adb shell /data/local/tmp/a.out' finished case #100 (ans = 8808845)
'adb shell /data/local/tmp/a.out' working on case #89
'adb shell /data/local/tmp/a.out' finished case #89 (ans = 1542116)
'adb shell /data/local/tmp/a.out' working on case #63
'adb shell /data/local/tmp/a.out' finished case #63 (ans = 4589254)
'adb shell /data/local/tmp/a.out' working on case #104
'adb shell /data/local/tmp/a.out' finished case #104 (ans = 2110139)
'adb shell /data/local/tmp/a.out' working on case #81
'adb shell /data/local/tmp/a.out' finished case #81 (ans = 4156428)
'adb shell /data/local/tmp/a.out' working on case #48
'adb shell /data/local/tmp/a.out' finished case #48 (ans = 960469)
'adb shell /data/local/tmp/a.out' working on case #24
'adb shell /data/local/tmp/a.out' finished case #24 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #62
'adb shell /data/local/tmp/a.out' finished case #62 (ans = 3086380)
'adb shell /data/local/tmp/a.out' working on case #42
'adb shell /data/local/tmp/a.out' finished case #42 (ans = 330330)
'adb shell /data/local/tmp/a.out' working on case #27
'adb shell /data/local/tmp/a.out' finished case #27 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #25
'adb shell /data/local/tmp/a.out' finished case #25 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #97
'adb shell /data/local/tmp/a.out' finished case #97 (ans = 2311248)
'adb shell /data/local/tmp/a.out' working on case #92
'adb shell /data/local/tmp/a.out' finished case #92 (ans = 3409902)
'adb shell /data/local/tmp/a.out' working on case #87
'adb shell /data/local/tmp/a.out' finished case #87 (ans = 3018037)
'adb shell /data/local/tmp/a.out' working on case #44
'adb shell /data/local/tmp/a.out' finished case #44 (ans = 437384)
'adb shell /data/local/tmp/a.out' working on case #106
'adb shell /data/local/tmp/a.out' finished case #106 (ans = 1645320)
'adb shell /data/local/tmp/a.out' working on case #34
'adb shell /data/local/tmp/a.out' finished case #34 (ans = 1615237)
'adb shell /data/local/tmp/a.out' working on case #111
'adb shell /data/local/tmp/a.out' finished case #111 (ans = 2286510)
'adb shell /data/local/tmp/a.out' working on case #56
'adb shell /data/local/tmp/a.out' finished case #56 (ans = 671796)
'adb shell /data/local/tmp/a.out' working on case #49
'adb shell /data/local/tmp/a.out' finished case #49 (ans = 2486902)
'adb shell /data/local/tmp/a.out' working on case #33
'adb shell /data/local/tmp/a.out' finished case #33 (ans = 1626785)
'adb shell /data/local/tmp/a.out' working on case #73
'adb shell /data/local/tmp/a.out' finished case #73 (ans = 944403)
'adb shell /data/local/tmp/a.out' working on case #46
'adb shell /data/local/tmp/a.out' finished case #46 (ans = 1871972)
'adb shell /data/local/tmp/a.out' working on case #76
'adb shell /data/local/tmp/a.out' finished case #76 (ans = 1036408)
'adb shell /data/local/tmp/a.out' working on case #54
'adb shell /data/local/tmp/a.out' finished case #54 (ans = 1223607)
'adb shell /data/local/tmp/a.out' working on case #55
'adb shell /data/local/tmp/a.out' finished case #55 (ans = 1109920)
'adb shell /data/local/tmp/a.out' working on case #30
'adb shell /data/local/tmp/a.out' finished case #30 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #114
'adb shell /data/local/tmp/a.out' finished case #114 (ans = 697327)
'adb shell /data/local/tmp/a.out' working on case #66
'adb shell /data/local/tmp/a.out' finished case #66 (ans = 907592)
'adb shell /data/local/tmp/a.out' working on case #94
'adb shell /data/local/tmp/a.out' finished case #94 (ans = 778303)
'adb shell /data/local/tmp/a.out' working on case #108
'adb shell /data/local/tmp/a.out' finished case #108 (ans = 316631)
'adb shell /data/local/tmp/a.out' working on case #35
'adb shell /data/local/tmp/a.out' finished case #35 (ans = 207097)
'adb shell /data/local/tmp/a.out' working on case #45
'adb shell /data/local/tmp/a.out' finished case #45 (ans = 729879)
'adb shell /data/local/tmp/a.out' working on case #40
'adb shell /data/local/tmp/a.out' finished case #40 (ans = 220988)
'adb shell /data/local/tmp/a.out' working on case #39
'adb shell /data/local/tmp/a.out' finished case #39 (ans = 154206)
'adb shell /data/local/tmp/a.out' working on case #86
'adb shell /data/local/tmp/a.out' finished case #86 (ans = 230182)
'adb shell /data/local/tmp/a.out' working on case #107
'adb shell /data/local/tmp/a.out' finished case #107 (ans = 132748)
'adb shell /data/local/tmp/a.out' working on case #99
'adb shell /data/local/tmp/a.out' finished case #99 (ans = 274610)
'adb shell /data/local/tmp/a.out' working on case #36
'adb shell /data/local/tmp/a.out' finished case #36 (ans = 107906)
'adb shell /data/local/tmp/a.out' working on case #83
'adb shell /data/local/tmp/a.out' finished case #83 (ans = 32833)
'adb shell /data/local/tmp/a.out' working on case #115
'adb shell /data/local/tmp/a.out' finished case #115 (ans = 16016)
'adb shell /data/local/tmp/a.out' working on case #93
'adb shell /data/local/tmp/a.out' finished case #93 (ans = 3280)
'adb shell /data/local/tmp/a.out' working on case #28
'adb shell /data/local/tmp/a.out' finished case #28 (ans = 1390)
'adb shell /data/local/tmp/a.out' working on case #5
'adb shell /data/local/tmp/a.out' finished case #5 (ans = 146)
'adb shell /data/local/tmp/a.out' working on case #10
'adb shell /data/local/tmp/a.out' finished case #10 (ans = 188)
'adb shell /data/local/tmp/a.out' working on case #4
'adb shell /data/local/tmp/a.out' finished case #4 (ans = 36)
'adb shell /data/local/tmp/a.out' working on case #9
'adb shell /data/local/tmp/a.out' finished case #9 (ans = 32)
'adb shell /data/local/tmp/a.out' working on case #11
'adb shell /data/local/tmp/a.out' finished case #11 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #12
'adb shell /data/local/tmp/a.out' finished case #12 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #13
'adb shell /data/local/tmp/a.out' finished case #13 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #14
'adb shell /data/local/tmp/a.out' finished case #14 (ans = 165)
'adb shell /data/local/tmp/a.out' working on case #15
'adb shell /data/local/tmp/a.out' finished case #15 (ans = 165)
'adb shell /data/local/tmp/a.out' working on case #3
'adb shell /data/local/tmp/a.out' finished case #3 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #8
'adb shell /data/local/tmp/a.out' finished case #8 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #2
'adb shell /data/local/tmp/a.out' finished case #2 (ans = 1)
'adb shell /data/local/tmp/a.out' working on case #7
'adb shell /data/local/tmp/a.out' finished case #7 (ans = 2)
'adb shell /data/local/tmp/a.out' working on case #1
'adb shell /data/local/tmp/a.out' finished case #1 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #6
'adb shell /data/local/tmp/a.out' finished case #6 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #17
'./a.out' finished case #19 (ans = 553484346)
'./a.out' working on case #17
'./a.out' finished case #17 (ans = 735866676)

== Workers contribution: ==
  './a.out' : 66.68% (2 cases)
  'adb shell /data/local/tmp/a.out' : 33.32% (114 cases)
  'ssh ssvb@t61 /tmp/a.out' : 0.00% (0 cases)

real	4m45.626s
user	38m2.469s
sys	0m0.111s

Now the time is down to 4m45.626s and this is within the 6 minutes limit. Assigning jobs to different devices could be surely improved. Right now I'm just sorting them by the lenghts of strings and processing more difficult cases first. The old laptop was totally useless and couldn't handle even a single big testcase #17. The other devices came to the rescue in the end and finished it faster.

TL;DR;

Facebook Hacker Cup has an unusual format. The contestants have 6 minutes to run their solutions on their own computers. This provides a lot of opportunities to speed up even poor time complexity solutions by taking advantage of multiple CPU cores and multiple computers. OpenMP is easy to use.

To be continued

This is not the end, but the blog is already becoming too large and it's necessary to wrap it up. The next part will feature Clang vs. GCC, different compiler optimization options and taking advantage of SSE2/AVX2.

Spoiler

Comments (3)

Write comment?

nor

3 years ago, # |

+16

Really nice blog, I remember people talking about doing this but not having a concrete example. Glad to see someone actually went ahead and did it.

Just a side-note: the use of a separate script sounds nonstandard (but it works around some issues with heterogeneous systems, so it's really nice). As far as I know, there's a way to use MPI (even as a hybrid with OpenMP) over a network (even though people claim heterogeneous support is broken, from what I've heard, it tends to work for Android on ARM and Linux on x86-64), which could allow for even better distribution of tasks across nodes (perhaps in terms of load balancing), though this doesn't sound as necessary for this problem.

→ Reply

wjomlex

+23

See? This is what we mean when we talk about problem solving. Don't let anybody tell you what you can't do.

Errichto

Nice. People run solutions on multiple machines in GHC but I've never heard about using a tablet. It seems that it would be enough for you to just split test cases in half between PC and tablet, right? No need for any smart assignment.

I used a less complicated set-up several times already. Split the test cases into 8 groups and run in parallel, no pragmas. It even got me to FHC finals once — I squeezed a slow $$$O(N \cdot \log^2 N)$$$ in 5 minutes.

Sadly, all of this rewards having a strong computer or using an online cluster. I would love to see this format in onsite competitions though, where all contestants have the same computing power.

ssvb's blog

Problem A1

Problem A2, solution v0

Problem A2, solution v1 (OpenMP pragma)

Running it all on a homemade "cluster"

TL;DR;

To be continued