CSES Coin Combinations II

#	User	Rating
1	tourist	3880
2	jiangly	3669
3	ecnerwala	3654
4	Benq	3627
5	orzdevinwang	3612
6	Geothermal	3569
6	cnnfls_csy	3569
8	jqdai0815	3532
9	Radewoosh	3522
10	gyh20	3447

#	User	Contrib.
1	awoo	161
1	maomao90	161
3	adamant	156
4	maroonrk	153
5	-is-this-fft-	148
5	atcoder_official	148
5	SecondThread	148
8	Petr	147
9	nor	144
10	TheScrasse	142

Can someone tell me what is the difference in logic between the two solutions?

The first solution is accepted by the CSES

Code 1

#include <bits/stdc++.h> 
using namespace std;
 
#define MOD1 1000000007

#define I int

#define V(x) vector<x>
 
#define fr(i, s, e) for(int i = s; i < e; i++)
#define fra(i, c) for(auto &i: c)
 
// CSES Coin Combinations II...
 
void solve() {
	I n, x;
	cin >> n >> x;
	V(I) c(n); 
	fra(i, c) cin >> i;
	V(V(I)) dp(n, V(I) (x+1, 0));
	// state: dp[j][i] = number of ordered ways to produce a sum of money i using the first j coins...
	// transition equation: dp[j][i] = dp[j-1][i]+dp[j][i-c[j]]...
	fr(j, 0, n)
		dp[j][0] = 1; // base cases...
	fr(j, 0, n) {
		fr(i, 1, x+1) {
			if(j > 0) dp[j][i] = dp[j-1][i];
			if(i-c[j] >= 0) dp[j][i] = (dp[j][i]+dp[j][i-c[j]])%MOD1;
		}
	}
	cout << dp[n-1][x]; // final subproblem...
	// time complexity: O(n*x)*O(1)...
}
 
int main() {
	// setIO("problem_name");
	fastio();
	I T = 1;
	// cin >> T;
	while(T--) solve();	
}

The second solution is giving TLE

Code 2

#include <bits/stdc++.h> 
using namespace std;
 
#define MOD1 1000000007

#define I int

#define V(x) vector<x>
 
#define fr(i, s, e) for(int i = s; i < e; i++)
#define fra(i, c) for(auto &i: c)
 
// CSES Coin Combinations II...
 
void solve() {
	I n, x;
	cin >> n >> x;
	V(I) c(n); 
	fra(i, c) cin >> i;
	V(V(I)) dp(x+1, V(I) (n, 0));
	// state: dp[i][j] = number of ordered ways to produce a sum of money i using the first j coins...
	// transition equation: dp[i][j] = dp[i][j-1]+dp[i-c[j]][j]...
	fr(j, 0, n)
		dp[0][j] = 1; // base cases...
	fr(i, 1, x+1) {
		fr(j, 0, n) {
			if(j > 0) dp[i][j] = dp[i][j-1];
			if(i-c[j] >= 0) dp[i][j] = (dp[i][j]+dp[i-c[j]][j])%MOD1;
		}
	}
	cout << dp[x][n-1]; // final subproblem...
	// time complexity: O(x*n)*O(1)...
}
 
int main() {
	// setIO("problem_name");
	fastio();
	I T = 1;
	// cin >> T;
	while(T--) solve();	
}

The only difference in code is the order of nested for loop

Someone please help!

Comments (10)

Write comment?

kotharirahil9

7 months ago, # |

Even I faced the same issue, I think the problem is that it is faster to make n vectors of size x than x vectors of size n, as n has goes up to 100 and x upto 1e6.

→ Reply

practice_has

7 months ago, # ^ |

I get you but even if you make n*x vector and use x loop before n loop then only 1 more test case is passing and 4 are still giving TLE, perhaps row-major order address calculation

maxrgby

even if it didn't TLE, the second one would be wrong because it counts the same ordered way multiple times

It's not true brother, since you need the previous state to be calculated before the current state, you can do any order, the answer will be correct! Suppose :- x = 7 and coins = {2, 5} According to you it should give 2+5 and 5+2 that is 2 ways but it's giving only 1

my bad, i did not notice that your vector was 2-dimensional

i think this is an issue with the CSES grader, on my device it runs at almost the same time and the if statements trigger the same amount of times

that being said, for this problem, i would recommend using a single 1D DP array instead of a 2D DP array; it saves a lot of time and memory.

h_squared

I also found the TL to be too tight on this one. I usually use arrays instead of vectors in such cases

I tried with arrays too, in both the order of the loops, it gave TLE for the code similar to 2nd and got accepted for the code similar to 1st

TL isn't that tight if you use a 1d DP

Medeali

there i a very easy solution using 1D dp i also had this problem when using 2d arrays so i just changed the space complexity to 1 dimension too (by the way the time complexity was O(n*k) still got tle with the 2d pretty expected cses grader is known for being slow)

yashnator

← Rev. 2 →

It's because your first implementation has more cache hits than the second one. In C++, when you make a std::vector<std::vector>>, the outer vector may look like (v[0]. v[1], v[2] ... , v[n-1]), but the std::vectors are not contiguously allocated in the memory. Try doing this a few times:

vector<vector<ll>> z(5, vector<ll> (6, 0));
cout << &(z[0].back()) + 1 << " " << &(z[1].front()) << endl;

You will not always get the same memory address for consecutive vectors. However, all its elements are allocated contiguously for one particular 1-d std::vector.

In the questions n < 100 and x < 1e5. So, it's faster to access x elements of a vector consecutively.

Hence, in the first implementation, you accessed longer contiguous memory segments and got more cache hits than in the second implementation, which is why you got TLE for the second case.

practice_has's blog