kfx's blog

By kfx, history, 5 years ago, In English

Python is a great programming language: simple, expressive, compact.

In Codeforces, Python is often the best choice for Div 2 A and B problems. For example, problem 600A - Extract Numbers is very easy to write in Python: first tokenize the string with the built-in split() function, then try to parse the integers with the built-in int(), then output the comma-separated strings of results with ",".join(lst).

For the more complex problems, writing fast-enough Python code is often a challenge. Here is a list of tips to improve performance.

  • Use PyPy instead of the standard Python interpreter. According to 20 offical benchmarks it is on average 7 times faster than Python 2. PyPy2 is in my opinion the best option at the moment for competitive programming.

  • Append to an existing string with "+=" instead of concatenating more than two strings with "+" and storing the result with "=". (With two strings, both ways work fine. s1 = s1 + s2 is fast because Python interpreter optimizes that to s1 += s2.)

  • string.join(iterable) is as fast as it gets.

  • list comprehension is faster than for loops.

  • prefer xrange() instead of range(), as the former returns an iterator, while the latter a new list.

  • avoid zip() in Python 2: it constructs a new list. Instead use a for loop . In Python 3, zip()` is fine, because there it returns an iterator.

  • it's ok to use Python list as an array; it has O(1) element access time. It is NOT ok to use it as queue, even if the list is sorted. Use collections.deque instead if you need fast extraction of the max element.

  • finally, realize that acceptable solutions to many Codeforces problems simply cannot be coded in Python, no matter how hard you try.

Experimental results

For the following tests I used a Lenovo laptop with Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz, running Ubuntu 14.04.3 LTS. Software versions were: g++ 4.8.4, Python 2.7.6 and 3.4.3, PyPy 2.2.1 (Python 2.7.3) and 2.4.0 (Python 3.2.5). Each test was run 10 times, and the average results are reported.

I/O: Python vs. printf/scanf vs. cin/cout

The problem: read a file with integers from 0 to 500000, store them in an array, then print out the array.

Implementations: scanf/printf, cin/cout, "fast" cin/cout, Python 2, Python 3.

The results:

 scanf/printf:    0.1056 sec
 cin/cout:        0.2274 sec
 "fast" cin/cout: 0.0943 sec

 Python 2:        0.3535 sec
 PyPy 2:          0.1645 sec

 Python 3:        0.2535 sec
 PyPy 3:          0.3324 sec

Verdict: C++ is faster, but PyPy2 should be fast enough for most problems.

Iterator construction vs. list construction

The problem: sum all integers from 0 to 1e6 by using a for loop.

The results:

Loop with range:

 python2: 0.502402067184 sec
 pypy2:   0.0170 sec

Loop with xrange:

 python2: 0.334251880646 sec
 pypy2:   0.0166962146759 sec

Loop with range in Python3 that is in fact the same as xrange in Python2:

 python3: 0.6606624750002084 sec
 pypy3:   0.0168 sec

Verdict: the differences between PyPy and standard Python are actually much much more important than range vs. xrange!

List construction: iterative growth vs. pre-reserving all memory at start.

The problem: Create a list with 1e6 elements.

The code when using iterative growth:

l = []
for x in range(1000000):
  l.append(x)

The code when pre-reserving memory:

l = [0] * 1000000
for x in range(1000000):
  l[x] = x

The results: With append:

 python2: 0.698101997375 sec
 pypy2:   0.483808994293 sec
 python3: 0.844806871000 sec
 pypy3:   0.510159969329 sec

With []:

 python2: 0.618577957153 sec
 pypy2:   0.0570020675659 sec
 python3: 0.6261008259998 sec
 pypy3:   0.0605409145355 sec

Verdict: PyPy is able to take advantage of the faster algorithm (with one-time growth), while standard Python is not able to. The PyPy speedup is an order-of-magnitude (~10 times).

Result summary: PyPy2 beats the others hands down in these tests. PyPy3 looks a bit unready at the moment, sometimes slower than the standard Python.

 
 
 
 
  • Vote: I like it
  • +124
  • Vote: I do not like it

»
5 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by kfx (previous revision, new revision, compare).

»
5 years ago, # |
  Vote: I like it +8 Vote: I do not like it

Few notes.

  1. string.* became obsolete many years ago, all those functions are now available as methods of a class str.

  2. It's ok to use Python list as an array; it has O(1) element access time. It is NOT ok to use it as queue, even if the list is sorted. Use collections.deque instead if you need fast extraction of the max element.

"Queue" is a wrong word to describe what you are talking about. None of list and collections.deque are available to extract the max element efficiently, it's the job of priority queue.

3 . Very efficient trick is to wrap the code of your solution into a function:


def main(): # write your solution here main()

It can speedup your solution 1.5-3 times (from my previous experience), as far as I understand, this reduces the size of the namespace interpreter has to keep during the runtime.

4 . Don't use lists of lists. It's much faster to use a 1d-list and to transform pair of indices to a single index in this 1d list rather than using 2d-lists = lists of lists.

  • »
    »
    6 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Anyone explain 4th point plz.

    • »
      »
      »
      4 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Suppose you want to use a 2d array in python. For storing a 2d array, we use a list of lists. example:- suppose you want to store following 2d array/matrix:- 1 2 3 4 5 6 Basically, it is a list of which contains two lists where inner lists are [1,2,3] and [4,5,6]. That is your 2d array will be stored as:- [ [1,2,3] , [4,5,6] ] ^ this one is outer list

      Let A=[ [1,2,3] , [4,5,6] ] ^ and ^ are inner lists

      here A[0][0]=1, A[0][1]=1, A[0][2]=3, A[1][0]=4 and so on... Coming to Zlobober's suggestion. What he is saying is that we should not use such list of lists to represent a 2d array, because a list of lists is very slow. Instead, he suggests us to use a single list to store a 2d matrix.

      for example, the matrix would be stored as [1,2,3,4,5,6]

      Now you would reference elements in this list as follows- A[0]=1,A[1]=2,A[2]=3,A[3]=4 and so on...

      but since you want to use a 2d array representation to store our matrix. you can create a function to map matrix indexes as follows:-

      def Index(i,j): return i*cols+j #cols =3 for our case

      now you would reference elements as follows:- A[Index(0,0)]=1 A[Index(0,1)]=2 A[Index(0,2)]=3 A[Index(1,0)]=4 A[Index(1,1)]=5 A[Index(1,2)]=6

»
5 years ago, # |
Rev. 2   Vote: I like it +6 Vote: I do not like it

If there are a lot of lines in input it will be faster to use sys.stdin.read()

»
6 months ago, # |
  Vote: I like it 0 Vote: I do not like it

I wrote a program where several lookups had to be made to know whether an element is present in the list or not, but this implementation gave me TLE, when I used dictionary for the same it gave me AC, didn't make any other changes other than just using dictionary instead of list, but lookup time in a list O(1), so what did made the code fasten up.

https://www.codechef.com/viewsolution/31188754

https://www.codechef.com/viewsolution/31192537

  • »
    »
    6 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    lookup time in a list O(1)

    No, accessing an element in a list is O(1), not searching !

    Your aim is different.

    You want to check whether an element is present in the list or not. When you use a list, you traverse whole list checking individual elements, whether that is equal to 'search_key'. Thus its O(N) not O(1).

    With dictionary, its O(1).

    In nutshell... Search in List- O(N) Search in Dict- O(1) !

    Hence dict fastened your code !

    • »
      »
      »
      6 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      My bad, I misinterpreted what he was trying to say, thanks.