PyPy2 vs PyPy3 - Codeforces

Please subscribe to the official Codeforces channel in Telegram via the link https://t.me/codeforces_official. ×

→ Pay attention

Before contest
Codeforces Round (Div. 2)
38:13:51
Register now »

*has extra registration

→ Streams

Codeforces Round 1995 (Div 2) - Official Solution Discussion

By Shayan

Before stream 40:18:49

Codeforces Round 960 Solution Discussion

By aryanc403

Before stream 40:23:49

View all →

→ Top rated

#	User	Rating
1	tourist	3880
2	jiangly	3669
3	ecnerwala	3654
4	Benq	3627
5	orzdevinwang	3612
6	Geothermal	3569
6	cnnfls_csy	3569
8	jqdai0815	3532
9	Radewoosh	3522
10	gyh20	3447

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	awoo	161
2	maomao90	160
3	adamant	157
4	maroonrk	154
5	-is-this-fft-	148
5	atcoder_official	148
5	SecondThread	148
8	Petr	147
9	TheScrasse	145
9	nor	145

View all →

→ Find user

→ Recent actions

Detailed →

notTehlka's blog

PyPy2 vs PyPy3

By notTehlka, 3 years ago, In English

In yesterday's contest's problem D2, my PyPy3 code runs in 4150ms and the same PyPy2 code runs in 2700ms. Can someone please explain why? And in what cases, PyPy2 is faster than PyPy3?

pajenegod c1729 Kiri8128 conqueror_of_tourist can help maybe.

#python, #doubt

notTehlka
3 years ago
15

Comments (14)

Show archived | Write comment?

bjy

3 years ago, # |

-18

I mean, 'same' only applies if you treat the io templates as opaque black boxes (like I do by still not using them...?). Most py 3<2 performance issues can be attributed to 3 being unicode aware, whether it's the additional decode/encode calls or default int/str conversions.

A more direct work-on-the-bytes-like-c approach could maybe eek out some extra mileage, but my level of interest/desperation fell short of that (also during the contest I had a kid gnawing on my head telling me about her friend 'Buggy' who was an actual insect she had brought into the house -- so I commented out the input replacement as if it were a sync issue (ie cargo cult prayer) and it stupidly passed pretests, womp womp)

→ Reply

pajenegod

3 years ago, # ^ |

I do not think the reason between the performance this time around is because of unicode. Here is a submission that almost entirely skips unicode in PyPy3 121736481, working only with bytestrings, and it runs slow.

→ Reply

pajenegod

3 years ago, # |

← Rev. 2 →

+11

I've tested around a bit, and I found that only using os.write and os.read makes PyPy2 and PyPy3 run equally fast (121737546 and 121737672). So I think the culprit might be that PyPy3 has a slower implementation of BytesIO than PyPy2. But at the same time, it doesn't really make sense that the speed of BytesIO matters here (since the total amount of input is kind of tiny). My best guess is that something is badly optimized in PyPy3's implementation of BytesIO.

→ Reply

at_f

3 years ago, # ^ |

I ran into this problem before for 1535E - Gold Transfer (which is also interactive).

Using the pyrival template TLE'd on pypy3 but passes on pypy2. Switching to os.write made it barely pass but there was still a large gap between pypy2 and pypy3 (pypy2 was 3s, pypy3 was 4.3s, and time limit was 4.5s). Using your suggestion of replacing os.read too finally made the gap disappear: 118470981 121755890 (note: using a implementation of input that I stole from you from another problem)

I wonder what the actual cause of the bug is and whether the pypy devs can fix it. It kind of sucks that pypy3 needs extra work to achieve the same performance as pypy2 for interactives (but at least there's a known workaround for it now, thanks!).

→ Reply

pajenegod

3 years ago, # ^ |

In general PyPy should just have an overhaul of everything IO related. One of the big drawbacks of PyPy compared to CPython is that the built in IO is really not good. It is stupid that my fastIO code even speeds things up in the first place.

Here is an example of how PyPy really has unoptimized IO. Haven't you wondered why using input is much slower than readline() in PyPy (even though input should just be a wrapper to readline)? Well this kind of explains it:

Code example

import sys,os
class dummy:
    def flush(self):
        os.write(1, b'Flushing\n')
    def write(self, S):
        os.write(1, b'Writing: ' + str(S.encode('ascii')).encode('ascii') + b'\n')

sys.stdout = dummy()
os.write(1, b'Calling input\n')
input()
os.write(1, b'Done calling input\n')

Output CPython3

Calling input
Flushing
Done calling input
Flushing
Flushing

Output PyPy3

Calling input
Writing: b''
Writing: b''
Flushing
Done calling input
Flushing

→ Reply

at_f

3 years ago, # ^ |

In your example is the issue the extra empty writes? I wonder if they are actually expensive. I remember looking at the system calls generated by pypy vs cpython and didn't see a significant difference. Command I used:

strace -Tfe trace=open,close,read,write,fsync,file python io-test.py < io-test.in

→ Reply

notTehlka

3 years ago, # ^ |

To sum it all up, we should use PyRival template for normal problems and this IO for interactive. Am I right?

→ Reply

at_f

3 years ago, # ^ |

Yea and it seems like you now have the fastest time on 1543D2 with it. That IO should probably be cleaned up and wrapped a little nicer before people copy and paste it all over the place though. (like rename input() to something like getint() since it's semantically very different from input)

Personally I don't like long IO templates so I will stick to using sys.stdin.buffer.readline and os.write unless it seems like TL is going to be a problem.

→ Reply

FrozenKandy

3 years ago, # |

Hi, this may be out of topic but I am looking for a little help in python and this seems just the right place.

This problem Parsa's humongous tree, is throwing tle, and there is no accepted solution in python but in pypy3. Moreover, all accepted solutions in pypy uses stacks and not recursion (Is recursion slower?). My submission: Using recursion, Using stack.

Please correct me if I am doing something wrong, or is this because of the tight time limit? I would be really grateful for the help.

→ Reply

Chef_17

3 years ago, # ^ |

I had the same issue during the contest- pypy3 was giving WA (stack limit) , Python 3 gave me TLE/WA- Testcase 4 (tried threading too link) and then finally I implemented the same logic in c++ and got AC (link 10 mins after the contest lol :) ).

Can someone help in writing a bootstrap version of DFS ( bypass stack limit ref) — cuz I didn't had any return value in dfs function call , so what was supposed to be yield ?.

→ Reply

pajenegod

3 years ago, # ^ |

I've explained about no return value before here.

I tried adding bootstrap to your code 121758780. Unfortunately it gets TLE because the implementation is pretty slow, and also because Cf runs 32 bit PyPy (causing your implementation to use big integers everywhere). Hopefully Cf will be switching to 64 bit in the near future (link).

→ Reply

1zaid1

3 years ago, # |

-19

python is gay

→ Reply

MrDeboo

3 years ago, # ^ |

-31

you're gay

→ Reply

1zaid1

3 years ago, # ^ |

-24

am I python? then yes im very gay

→ Reply