For some strange reason, my code to compress a directed graph into a DAG and run some process on it gets TLE on this problem 894E - Ральф и грибы.

This is strange, because N = M = 1e6 at maximum. In addition, I used to have a log factor which I removed (using math instead), but it still gets TLE on test case 14.

My solution to the problem (41978175) is pretty much same idea as editorial, up until the last point. Instead of running toposort/dp on the DAG, I just traverse down and take the maximum path, but complexity of the function should still be O(N+M).

I am baffled by how this would TLE, can anybody give me some advice on constant optimizations or how the complexity could possibly be worse than imagined?

Umm, I think very large input is the problem. There is at max 3*10^6 input integers, you're using cin. Taking the input alone might be taking more than 2 seconds. You could try again using scanf.

I tried it, 41985820, still hitting the TLE on 14. I think it should be fine for cin/cout because I already unsynchronized it with ios::sync_with_stdio(0); cin.tie(0);

Probably, the problem is with the sqrt function. I think it is not always

O(1) orO(logN).I don't think so

https://ideone.com/yE9Arx

Yep, you are right. My bad.

You can use gprof to check which function uses much time.

I didn't read much code. But my submissions tells that problem in

dfs3. My first version was that you graph afterdfs2 is not DAG graph. But it is DAG.I think your comlexity isn't

O(N+M) since indfs3 each vertex can be visited much more than two times. 41988918 tells about this more clearly.To constant optimizations:

1)

vector.push_back({first,second}). It makes pair then allocate space in tail of vector then copy built pair. I recommend here usevector.emplace_back(first,second) for that version build pair direct in vector.2) You pushing unknown times to vector. During this your vector can be reallocated multiple times. This can be very slow. My recommendation here is use before pushing

vector.reverse(MAX) and after pushing dovector.shring_to_fit(). It in several compillers (!!!!!!!!) reallocates space to fit current size. And there will be just two reallocations.But not this is your problem here. Maybe in my submissions you find some ways to debug program without knowledge of input data.

Thank you, now looking at it dfs3 can run exponential in time. Realization is that if we don't memoize states code can revisit states, I think I was thinking as dag to look too much like a tree in my head.