Summary: SOS DP may be misleadingly characterized as DP, better characterized as many-dimensional prefix sums, and can be applied more conveniently/flexibly as a result of this new characterization. Also, it's a decent introduction to many-dimensional prefix sums for those who have never encountered it before.

Disclaimer: this blog is more of a random thought than something that directly contributes to CP, but I thought it was interesting, so maybe you guys will as well.

If you're not familiar with SOS DP, you can learn about it at this blog. Or you can keep reading — I'll explain the important concepts along the way.

You start with an array of values $$$A[i]$$$ of length $$$2^N$$$, which represents a function of a subset of $$$N$$$ bits. Also, we want to calculate a new array $$$F$$$, where $$$F[mask] = \sum_{i \subseteq mask} A[i]$$$. For example, if $$$A[i]$$$ counts how many times the number $$$i$$$ appears in an array, $$$F[mask]$$$ would count how many times any subset of $$$mask$$$ appears in that array. The DP solution has time complexity $$$O(N \cdot 2^N)$$$. The initial DP implementation also uses $$$O(N \cdot 2^N)$$$ memory, but the blog also offers another way to code it as a "memory optimization" — if you're unfamiliar with this code, don't worry, I'll come back to it later.

```
// https://codeforces.com/blog/entry/45223
for(int i = 0; i < (1<<N); ++i)
F[i] = A[i];
for(int i = 0; i < N; ++i)
for(int mask = 0; mask < (1<<N); ++mask)
if( mask & (1<<i) )
F[mask] += F[mask^(1<<i)];
```

Now, I'll explain why this isn't just a memory optimization, but a natural way to solve the problem using prefix sums. To demonstrate this, let's work with the case where $$$N = 2$$$, and represent the values as a $$$2$$$x$$$2$$$ matrix $$$M$$$. So in this matrix form, we'll have an index for each bit: $$$A[0] = A[00_2] = M[0][0], A[1] = A[01_2] = M[0][1], A[2] = A[10_2] = M[1][0], A[3] = A[11_2] = M[1][1]$$$. Now, let's think about what would happen after performing the standard 2D prefix sum implementation on this matrix:

```
// add across rows
for (int i = 1; i < 2; i++)
for (int j = 0; j < 2; j++)
M[i][j] += M[i-1][j];
// add across columns
for (int i = 0; i < 2; i++)
for (int j = 1; j < 2; j++)
M[i][j] += M[i][j-1];
```

What have we achieved? Notice that after prefix sums, each the value at each position represents the sum of all values where the indices are at most the current set of indices. But we've defined these indices to correspond to bits, so we have summed all values where each bit is at most the current bit. And that exactly describes the original sum over subsets problem!

Now the original implementation of SOS DP is starting to look more like an implementation of prefix sums. First, the variable $$$i$$$ is iterated from $$$0$$$ to $$$N-1$$$ — this corresponds to how when working with many-dimensional prefix sums, we must pass through the values once per dimension (for example, the 2D prefix sum code makes two passes through the matrix). Next, iterating $$$mask$$$ from $$$0$$$ to $$$2^N-1$$$ represents addressing each value (for example, in 2D prefix sums we need to loop over $$$i$$$ and $$$j$$$ to address each entry). We check if the $$$i$$$'th bit exists in $$$mask$$$ because we only want to change values where the $$$i$$$'th "index" is greater than $$$0$$$. And finally, we add the value where the $$$i$$$'th index is removed as part of the prefix sum calculation.

Why is thinking about it this way useful? Because it can speed up implementation of other uses of SOS DP — instead of thinking about the DP first then performing the memory optimization, we can go straight to the code we want by thinking about it as prefix sums. For example, we can immediately solve the sum-over-supersets problem: instead of calculating $$$F[mask] = \sum_{i \subseteq mask}A[i]$$$, we want to calculate $$$F[mask] = \sum_{mask \subseteq i}A[i]$$$. In terms of prefix sums, we just want the sum of values where the indices are greater than our current index.

```
for(int i = 0; i < (1<<N); ++i)
F[i] = A[i];
for(int i = 0; i < N; ++i)
for(int mask = 0; mask < (1<<N); ++mask)
if( (mask & (1<<i)) == 0 ) // this is the only line that's changed!
F[mask] += F[mask^(1<<i)];
```

Notice that the only difference is that instead of wanting the current bit to be equal to $$$1$$$, we want it equal to $$$0$$$ so that we can add the next index.

Additionally, we can easily reverse this process by thinking of it through prefix sums: If we are given the sum-over-subset array $$$F$$$ and want to calculate the original values $$$A$$$, that's the same thing as if we are given the prefix sums and want to retrieve the initial values. In other words, we just need to calculate the differences along each dimension.

```
for(int i = 0; i < (1<<N); ++i)
A[i] = F[i];
for(int i = 0; i < N; ++i)
for(int mask = 0; mask < (1<<N); ++mask)
if( mask & (1<<i) )
F[mask] -= F[mask^(1<<i)]; // this is the only line that's changed!
```

This becomes useful in problems like 1620G - Subsequences Galore, where we can easily calculate the sum-over-superset array, and want to retrieve the initial values. The editorial first flips every bit of each index (turning the sum-over-superset into a sum-over-subsets) and then finds these differences, but with a more intuitive understanding of what's going on we can quickly implement what we need to do.

Finally, I'll end this off by discussing how many-dimensional prefix sums in general is not that far off from what we have already been implementing. Sometimes, in problems like this one, we need to optimize the solution with prefix sums, except we don't know how many dimensions there will be. Therefore, instead of storing the values in an n-dimensional array, we need to store them in a 1D array. The implementation will be very similar to what we have been doing, but the only difference is that each dimension can have length greater than 2, which makes it harder to access individual indices.

We can overcome this by calculating "block" sizes. For example, if we have three dimensions of size $$$n, m, o$$$, and we want to place $$$A[i][j][k]$$$ in a one-dimensional array, its new index will be $$$(1)i + (n)j + (n\cdot m)k$$$. Then these coefficients $$$1, n, n\cdot m$$$ will be the block sizes. The implementation will look something like this (notice the similarities to the code displayed already):

```
int N = 4 // number of dimensions
vector<int> dims = {6, 9, 4, 20}; // actual dimensions
vector<int> blocks(N+1);
blocks[0] = 1;
for (int i = 0; i < N; ++i)
blocks[i+1] = blocks[i] * dims[i];
for (int i = 0; i < blocks[N]; ++i)
F[i] = A[i];
for(int i = 0; i < N; ++i)
for(int j = 0; j < blocks[N]; ++j){
if ((j / blocks[i]) % dims[i] != 0) // ensuring current index is nonzero
F[j] += F[j - blocks[i]]; // subtracting current block to get previous index
}
```

And that's all I have! Hopefully some of the people here found it interesting :) plz upvote if you did!