Dart-Xeyter's blog

By Dart-Xeyter, history, 3 years ago, In English

I have a few friends who write rounds in Python, and I noticed that they don't use some very simple optimizations, so their programs end up getting TL. Meanwhile, in half or more of those cases these techniques are enough to get rid of the TL.

I will show everything using code by my friend I_am_Drew for problem 1413B - A New Technique. This code received TL (it ran longer than $$$1$$$ second).

for __ in range(int(input())):
    n, m = list(map(int, input().split()))
    kek = []
    for i in range(n):
        el = list(map(int, input().split()))
        el.append(0)
        kek.append(el)
    stolb = list(map(int, input().split()))
    ind = 0
    for i in range(m):
        if kek[0][i] in stolb:
            ind = i
            break
 
    for i in range(n):
        kek[i][m] = stolb.index(kek[i][ind])
 
    for j in range(m-1):
        stolb = list(map(int, input().split()))
 
    kek.sort(key=lambda x: x[m])
    for elem in kek:
        elem.pop()
        print(*elem)

First, as you know, reading input and writing output takes quite a long time. Fortunately, this can be fixed using the sys module. I usually write it this way because it's the quickest fix, though of course it's not exactly good code style :)

from sys import stdin, stdout
input, print = stdin.readline, stdout.write

The stdin.readline function reads a line like input, but faster (unlike input, it keeps the trailing newline, which split() and int() ignore). There are others too, for example stdin.read, which reads the whole input as one string (when testing by hand, you need to send ^D to mark the end of input), but I usually do not use them. Output is more complicated: stdout.write accepts only strings and does not append a line feed or any other separator. So you have to write it as in the example below; this is also quick to fix, the main thing is not to forget :) After the conversion, you get this code. (Note that the input code has not changed at all, but the output code at the end has changed quite a lot.)

from sys import stdin, stdout
input, print = stdin.readline, stdout.write
for __ in range(int(input())):
    n, m = list(map(int, input().split()))
    kek = []
    for i in range(n):
        el = list(map(int, input().split()))
        el.append(0)
        kek.append(el)
    stolb = list(map(int, input().split()))
    ind = 0
    for i in range(m):
        if kek[0][i] in stolb:
            ind = i
            break

    for i in range(n):
        kek[i][m] = stolb.index(kek[i][ind])

    for j in range(m - 1):
        stolb = list(map(int, input().split()))

    kek.sort(key=lambda x: x[m])
    for elem in kek:
        elem.pop()
        # print(' '.join(map(str, elem)))
        for q in elem:
            print(str(q)+' ')
        print('\n')
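For completeness, here is a minimal sketch of the stdin.read variant mentioned above: read the whole input once, split it into tokens, collect the answers in a list, and write them with a single stdout.write call. The task itself is hypothetical (reverse each test case's numbers), and the names solve and main are only illustrative, not part of the original solution.

```python
import sys


def solve(text):
    # Hypothetical task: the first token is t, then t test cases follow,
    # each given as n and then n integers; output each case reversed.
    tokens = text.split()
    pos = 0
    t = int(tokens[pos])
    pos += 1
    out = []
    for _ in range(t):
        n = int(tokens[pos])
        pos += 1
        row = tokens[pos:pos + n]
        pos += n
        # Tokens are already strings, so no str() conversion is needed.
        out.append(' '.join(reversed(row)))
    # One string at the end means one write call instead of many.
    return '\n'.join(out) + '\n'


def main():
    sys.stdout.write(solve(sys.stdin.read()))
```

When submitting, the file would end with a call to main(). Keeping the logic in solve makes it easy to try on a string without typing input by hand.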

It is also known that accessing global variables is slower than accessing local ones, so if you put all the code (apart from other functions, of course) into a function such as main, it will also run faster. The final version of the code looks like this:

from sys import stdin, stdout
input, print = stdin.readline, stdout.write


def main():
    for __ in range(int(input())):
        n, m = list(map(int, input().split()))
        kek = [list(map(int, input().split()))+[0] for _ in range(n)]
        stolb = list(map(int, input().split()))
        ind = 0
        for i in range(m):
            if kek[0][i] in stolb:
                ind = i
                break

        for i in range(n):
            kek[i][m] = stolb.index(kek[i][ind])

        for j in range(m - 1):
            stolb = list(map(int, input().split()))

        kek.sort(key=lambda x: x[m])
        for elem in kek:
            # A single write per row: the joined numbers plus the line feed.
            print(' '.join(map(str, elem[:-1])) + '\n')


main()
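If you want to check the global versus local difference yourself, a rough sketch like the one below can be used. The function names and the constant N are made up for illustration, and the measured times depend on the interpreter and the machine, so treat it as an experiment rather than a benchmark.

```python
from timeit import timeit

N = 200_000
total = 0


def sum_with_globals():
    # Each use of `total` and `N` here goes through a global name lookup.
    global total
    total = 0
    for i in range(N):
        total += i
    return total


def sum_with_locals():
    # `n` and `acc` are locals, which CPython typically accesses faster.
    n = N
    acc = 0
    for i in range(n):
        acc += i
    return acc


def compare(repeats=5):
    # Returns (seconds with globals, seconds with locals).
    return (timeit(sum_with_globals, number=repeats),
            timeit(sum_with_locals, number=repeats))
```

Calling compare() returns the two timings; both functions compute the same sum, so only the kind of variable access differs.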

Note that there were very few changes, but the program became at least $$$2$$$ times faster and now gets OK, running in $$$545$$$ milliseconds. Of course, you can come up with many more optimizations, but these are the main ones: they work on most problems and are easy to write. Of course, this is not a panacea: if, for example, a problem's input or output is a single number, fast input-output gains nothing. However, it comes in handy in many problems.

Also, keep in mind that although PyPy3 is usually much faster than Python3 (for example, in this problem it is $$$1.5$$$ times faster), there are situations when Python3 is faster, and I know problems where Python3 solutions get OK but PyPy3 solutions do not. This does not mean that every TL should be resubmitted in Python3 at the cost of a penalty; just keep it in mind. In my experience, Python3 is often faster on string problems, but of course it's different every time.

I hope this blog will help you use Python even more successfully :)


»
3 years ago, # |

Hey, I recently learned Mo's algorithm and I'm now applying it to the following problems, this and this, but I'm getting TLE. Can you please take a look at my code and tell me where to optimise it? Thanks, waiting for your response.

  • »
    »
    3 years ago, # ^ |

    First off, the order in which you sort the queries is not Mo's ordering. Are you sure you fully understood Mo's algorithm?

    But in general, I'm not sure you can without significant effort. In the ABC problem we have $$$n \leqslant 5 \cdot 10^5$$$, so an $$$O(n^{1.5})$$$ algorithm is somewhat questionable even in C++ (the official solution is linearithmic). The SPOJ problem has more lenient constraints. But despite various optimizations, Python will always be many times slower than C++, and solutions with sqrt-complexity tend to take time close to the limit.

    • »
      »
      »
      3 years ago, # ^ |

      I made a mistake in the code: I sorted the queries by L and, in case of a tie, by R, but we need to sort by L // (block size) and, in case of a tie, by R. This is my updated code
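The ordering described in this comment can be sketched roughly as follows. The function name mo_order, the 0-indexed (l, r) pairs, and the block size of about sqrt(n) are assumptions made for illustration; this is not the commenter's actual code.

```python
from math import isqrt


def mo_order(queries, n):
    # queries: list of (l, r) pairs, 0-indexed; n: length of the array.
    # Assumed block size: roughly sqrt(n), at least 1.
    block = max(1, isqrt(n))
    # Sort query indices by (block of l, then r), as in Mo's ordering.
    return sorted(range(len(queries)),
                  key=lambda i: (queries[i][0] // block, queries[i][1]))
```

For example, with n = 16 the block size is 4, so mo_order([(5, 9), (0, 3), (1, 2), (4, 8), (2, 15)], 16) gives [2, 1, 4, 3, 0]: the queries whose l lies in block 0 come first, ordered by r, followed by those in block 1.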