Shisuko's blog

By Shisuko, history, 3 years ago, In English

For almost a decade now, I've been told to avoid using += on strings in Java and Python. Because they are immutable, the += creates an entirely new string and has to copy over all the characters from the left-hand side. Thus, putting a += in a loop could have your program's complexity degrade into quadratic in some cases if you're not careful.

But then just last week, I learned that CPython actually hands string concatenation smartly! On the Custom Test tab here on Codeforces, the following code runs in just 280ms when submitted to Python 3.8.10.

s = ''
for _ in range(10**6):
    s += 'a'
print(len(s))

(In contrast, it badly TLEs when submitting to PyPy 3.7, so I suppose that's one thing Python has over PyPy, but it's not like join was too hard to do, though, anyway).

Was this a recent change? How long have I been living a lie?

  • Vote: I like it
  • +49
  • Vote: I do not like it

»
3 years ago, # |
  Vote: I like it +12 Vote: I do not like it

https://docs.python.org/2.4/lib/typesseq.html

If s and t are both strings, some Python implementations such as CPython can usually perform an in-place optimization for assignments of the form s=s+t or s+=t. When applicable, this optimization makes quadratic run-time much less likely. This optimization is both version and implementation dependent. For performance sensitive code, it is preferable to use the str.join() method which assures consistent linear concatenation performance across versions and implementations. Changed in version 2.4: Formerly, string concatenation never occurred in-place.

»
3 years ago, # |
  Vote: I like it +25 Vote: I do not like it

i get that this is irrelevant to the blog, but did you know that you can cin character arrays. i have also been living under a lie my whole life.

  • »
    »
    3 years ago, # ^ |
    Rev. 2   Vote: I like it +2 Vote: I do not like it

    you can also read from the k-th element of the char array

    for example:

    char s[maxn];
    std::cin >> (s + 1);
    

    will read from s[1] (which kinda makes sense since ig operator<< and operator>> are overloaded for char*)

    also, you can do the same with cout as well (printing from the k-th character of a char array)

»
3 years ago, # |
  Vote: I like it +71 Vote: I do not like it

You still need to assume it's quadratic, otherwise you're just gonna be badly surprised one day. Python and assessing critical performance is a match made in hell.

»
3 years ago, # |
  Vote: I like it +32 Vote: I do not like it

PyPy talks about it in its documentation.

On the subject of how brittle this feature can be in CPython. The following code will do a slow string concatination

A = ['abc']
A[0] += 'd'

The reason why CPython can't optimize it is because += for A[0] is equvivalent to

A = ['abc']
x = A[0]
x += 'd'
A[0] = x

Note that both x and A[0] contains references to the string here, so the concatination will be slow. It is fast iff there is only a single reference to the string. So in order to make it into a fast string concatination, you need to do something like

A = ['abc']
x = A[0]
A[0] = None
x += 'd'
A[0] = x
»
3 years ago, # |
  Vote: I like it 0 Vote: I do not like it