Multi-threaded CodeJam-style templates

→ Pay attention

Before contest
Codeforces Round 940 (Div. 2) and CodeCraft-23
3 days
Register now »

*has extra registration

→ Top rated

#	User	Rating
1	ecnerwala	3648
2	Benq	3580
3	orzdevinwang	3570
4	cnnfls_csy	3569
5	Geothermal	3568
6	tourist	3565
7	maroonrk	3530
8	Radewoosh	3520
9	Um_nik	3481
10	jiangly	3467

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	maomao90	174
2	adamant	164
2	awoo	164
4	TheScrasse	160
5	nor	159
6	maroonrk	156
7	-is-this-fft-	150
8	SecondThread	147
9	orz	146
10	pajenegod	145

View all →

→ Find user

→ Recent actions

Detailed →

mukel's blog

Multi-threaded CodeJam-style templates

By mukel, 10 years ago, In English

UPD: Added Scala futures approach.

Google CodeJam 2014 is coming, and it will be nice if someone could share a way to run multiple tests concurrently, all programming languages are welcome. I've seen very handy codes in D, but I still use mainly C++, Scala has some nice ways to do it also. Running programs concurrently can be useful in CF also, eg. offline output generation. I'll try to populate this thread with some examples. Right now is up to you.

codejam, concurrent, code, gcj

mukel
10 years ago
8

Comments (8)

Write comment?

andreyv

10 years ago, # |

← Rev. 3 →

+59

C++11 makes this easy. If you have a function f(), you can run it asynchronously with async(f), and get the result from the object returned by std::async:

auto t = async(f);

int res = t.get();

Now suppose f(int a, int b) calculates a sum of numbers within an interval and takes two parameters. You can make two lambdas that will cover the first half and the second half, and pass them to std::async():

auto t1 = async([](){ return f(0, 50); });
auto t2 = async([](){ return f(51, 100); });

int sum = t1.get() + t2.get();

Full example: http://pastie.org/8954684

Note that std::async() decides by itself how many threads to start. If you want to force a new thread, use std::async(std::launch::async, f).

EDIT: It looks like you can write even shorter code:

auto t1 = async(f, 0, 50);
auto t2 = async(f, 51, 100);

int sum = t1.get() + t2.get();

→ Reply

niklasb

10 years ago, # ^ |

I think in most common C++ implementation async is not pooled, so it's probably counterproductive to use it for functions that run only shortly or to use it to run more tasks in parallel than you have hardware threads.

→ Reply

andreyv

10 years ago, # ^ |

Yes, std::async() starts a new thread per each function if it decides to evaluate it asynchronously.

I think std::async() should be able to limit the number of threads to the number of CPUs, because when running the function, it can also decide to run it later or use lazy evaluation in the same thread.

→ Reply

Xellos

10 years ago, # ^ |

This is good. It doesn't require templates, which makes it applicable in a contest where you don't have access to pre-written code.

As long as it supports C++11...

→ Reply

CountZero

10 years ago, # |

you can easily parallelize your C++ code by using OpenMP pragmas. Example:

int a1 = 0, a2 = 1;

#pragma omp parallel for reduction(+: a1) reduction(*: a2)
for (int i = 0; i < 10; ++i) {
	a1 += a[i];
	a2 *= a[i];
}

//g++ -fopenmp openmp.cpp

→ Reply

limed

10 years ago, # |

← Rev. 3 →

Scala:

object A extends App {

  def tokenizeLine = new java.util.StringTokenizer(readLine)
  def readInts(n: Int) = { val t = tokenizeLine; Array.fill(n)(t.nextToken.toInt) }
  
  type TInput = (Int, Int, Int)
  type TOutput = (Int, Long)
  
  val solve: PartialFunction[TInput, TOutput] = { case (testNo, n, m) => 

    var sum = 0L
    // do something stupid but heavy:
    for (i <- 0 until n) sum += m

    (testNo, sum)
  }

  val testCount = readInt
  val inputs = scala.collection.mutable.ArrayBuffer[TInput]()
  
  for (testNo <- 1 to testCount) {
    val Array(n, m) = readInts(2)
    inputs += ((testNo, n, m))
  }

  //val outputs = inputs.par.collect { solve } seq // multi-threaded
  val outputs = inputs.collect { solve } // single-threaded
    
  for ((testNo, sum) <- outputs) println(s"Case #SStestNo: SSsum")
}

P.S. Forum parser goes crazy about $ sign in strings so I had to replace it with SS. Also, I have removed debugging and profiling from my code to make it shorter. Full code is in my original message.

Sample input:

8
1000000000 8
100000000 7
10000000 6
100 5
1000000000 4
100000000 3
1000000000 2
1 1

Multi-threaded output:

(...edited...)
Finished after 1332 ms

Single-threaded output:

(...)
Finished after 2367 ms

.par is the only switch into a parallel execution so it is easy to run sequentially for debugging.

I see you are from EPFL (the birth place of Scala), so I hope you will come up with some improvements to this code:) But in general, I don't think it is worth bothering with multiple threads in Code Jam unless you have 8 or more cores.

→ Reply

mukel

10 years ago, # ^ |

← Rev. 3 →

UPD: Sad story about when a multi-threaded solution is a good idea... In fact, I didn't qualified in the FB Hackercup once because my Scala program had a silly bug, which I fixed almost instantly; and then my program finished flawlesly 20 seconds after the deadline, too late. Since then I promise myself not to use Scala for contests until I becomes really comfortable with it, and if I use it this year for the GCJ or the HackerCup, my code will run in multiple threads for sure.

The .par is really handy for executing things in parallel with a almost no changes on the code. Another approach is using futures, check the following example :


import scala.concurrent.duration._
import scala.concurrent._
import ExecutionContext.Implicits.global

object Main {

    val in = new java.util.Scanner(System.in)
    
    def nextInt = in.nextInt
    
    def time(block: => Unit) = {
        val ticks = System.currentTimeMillis
        block
        System.currentTimeMillis - ticks
    }
    
    def main(args: Array[String]) = {
      
        val t = time {
       
            val taskCount = nextInt
            val tasks = for (_ <- 1 to taskCount) yield {
                
                // read the input sequentially
                val begin, end  = nextInt
                
                // yields a Future[Long], magic begins
                future {
                    // slow computation
                    var sum = 0L    
                    for (i <- begin to end)
                        sum += i
                    sum
                }
            }
        
            // transform a Seq[Future[Long]] in Future[Seq[Long]]
            val all = Future.sequence(tasks)
            
            // wait 10s before raising a timeout exception
            val result = Await.result(all, 10 seconds)
        
            // be happy
            println(result mkString " ")
        }
        
        println("Total time: " + t + " ms") 
   }
}

Even if I have a quadcore laptop (i7 with 4 physical cores and 8 virtual threads), the above code doesn't scale by a factor of 4 as expected, but only 2. Maybe I should try with a tweaked ExecutionContext instead of the default one. Any hints???

UPD: I tested the code in a different machine and is scaling fine.

→ Reply

limed

10 years ago, # ^ |

Well, maybe your loops are too short? There can be issues with hotspot compiler kicking in after some time, and besides you are including IO in your timing, which is single-threaded. IMO it should be more scalable. My code with .par scales by 1.77 on dual core, and Futures should have a similar benefit.

As a side note, you should avoid Scanner and use StringTokenizer for input. As far as I remember, for some problems here on CF only StringTokenizer was able to read large inputs within TL.

→ Reply