*Prelude: this post assumes the reader already knows the concepts behind Binary Indexed Trees, as I will not be explaining them here.*

Let's talk about 2-dimensional Binary Indexed Trees. The implementation of a 2D BIT usually goes as follows:

- Implement a standard, 1D BIT, with update and query functions;
- Implement 2D update and query; these look exactly the same as the 1D case, except basic operations are replaced by calls to the 1D versions of these operations.

This is OK and works well for most cases. However, there are some issues with this implementation:

- 2D methods are usually implemented (for coding speed reasons) by copy-pasting the 1D code and changing the operations. Copy-pasting is always risky, since it's so easy to miss a change you're supposed to make.
- It doesn't feel "clean". There's duplicate code in there that makes the source harder to read (and maybe debug). Also, if you've messed up something in one of the dimensions (for example, you wrote `<` instead of `<=`), chances are you've also messed it up in the other one; and you have to remember to correct the mistake in both places.
- **It doesn't scale well**. Issues #1 and #2 are heavily amplified when more than 2 dimensions are concerned. It was a problem which required 3D BIT that motivated me to write this post, but it's not unreasonable to imagine problems with 4D or even 5D BITs.

There's a bunch of different approaches to "fixing" these issues, but I want to tell you about a very clean, performance-centered approach. It uses variadic templates, available in C++11 and up. We'll be able to write code like this:

```
BIT<N, M, K> b;
b.update(x, y, z, P);
b.query(x1, x2, y1, y2, z1, z2);
```

The first line declares an *N x M x K* (3-dimensional) BIT. The second line adds *P* to the point at *(x, y, z)*. The third line tells us the sum of values inside the box with corners *(x1, y1, z1)* and *(x2, y2, z2)*, both inclusive; note that the arguments are grouped per dimension, as (lower, upper) pairs. Let's now get into the code.

```
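// Base case: a 0-dimensional BIT is just a single value.
template <int... ArgsT> struct BIT {
    int val = 0;
    // Add val to the stored value.
    void update(int val) {
        this->val += val;
    }
    // Return the stored value.
    int query() {
        return val;
    }
};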
```

The declaration tells us that a BIT structure will be parametric in 0 or more integral arguments. As mentioned previously, these parameters will tell us the sizes of the dimensions of the BIT. The code inside the class block will be used for 0-dimension BITs. In other words, single values. It's obvious what the update and query functions should be for this structure.

Now it's time to specialize this template for the case in which we have at least one parameter (in other words, at least one dimension).

```
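// Specialization for at least one dimension; all indices are 1-based.
// lastbit() must already be declared at this point for the calls below to compile.
template <int N, int... Ns>
struct BIT<N, Ns...> {
    BIT<Ns...> bit[N + 1];  // slot 0 is unused
    template<typename... Args>
    void update(int pos, Args... args) {
        // Walk upward in this dimension; forward the remaining coordinates and the value.
        for (; pos <= N; bit[pos].update(args...), pos += lastbit(pos));
    }
    template<typename... Args>
    int query(int l, int r, Args... args) {
        int ans = 0;
        // Prefix sum up to r, minus prefix sum up to l - 1, in this dimension.
        for (; r >= 1; ans += bit[r].query(args...), r -= lastbit(r));
        for (--l; l >= 1; ans -= bit[l].query(args...), l -= lastbit(l));
        return ans;
    }
};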
```

Here we have a multi-dimensional BIT in which the first dimension is of size N. This tree contains N BITs similar to this one, but with the first dimension removed (in fact, it's N + 1 BITs, since we are indexing from 1). In other words a *BIT<10, 20, 30>* will contain 10 usable instances of *BIT<20, 30>* (11 if you count the unused slot at index 0). Makes sense!

The update and query functions take multiple arguments: the leading ones (one position for update, an *l, r* pair for query) are used for "walking along" the current dimension, as you would in a standard, 1D BIT; the rest are passed on to the functions called from the lower-tier BITs.
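To make the "walking along" picture concrete, here is a hand-written sketch of roughly what the recursion amounts to in two dimensions. The names `Bit2D`, `rowQuery` and `handWrittenDemo`, as well as the 4 x 5 size, are made up for illustration; this is not what the compiler literally generates, just an equivalent written by hand.

```cpp
#include <cassert>

inline int lastbit(int x) { return x & (-x); }

// Hand-written 2D counterpart of BIT<N, M> (illustrative names, not from the post).
template <int N, int M>
struct Bit2D {
    int tree[N + 1][M + 1] = {};  // 1-indexed; row 0 and column 0 are unused

    void update(int x, int y, int delta) {
        assert(x >= 1 && y >= 1);  // index 0 would never advance
        for (int i = x; i <= N; i += lastbit(i))      // walk the first dimension
            for (int j = y; j <= M; j += lastbit(j))  // walk the second dimension
                tree[i][j] += delta;
    }

    // Inner 1D range query over Fenwick row i, columns [y1, y2].
    int rowQuery(int i, int y1, int y2) const {
        int ans = 0;
        for (int j = y2; j >= 1; j -= lastbit(j)) ans += tree[i][j];
        for (int j = y1 - 1; j >= 1; j -= lastbit(j)) ans -= tree[i][j];
        return ans;
    }

    // Same argument order as the templated version: (x1, x2, y1, y2).
    int query(int x1, int x2, int y1, int y2) const {
        int ans = 0;
        for (int i = x2; i >= 1; i -= lastbit(i)) ans += rowQuery(i, y1, y2);
        for (int i = x1 - 1; i >= 1; i -= lastbit(i)) ans -= rowQuery(i, y1, y2);
        return ans;
    }
};

// Usage helper: two point updates on a 4 x 5 grid, then one range sum.
int handWrittenDemo(int x1, int x2, int y1, int y2) {
    Bit2D<4, 5> b;
    b.update(2, 3, 5);
    b.update(4, 1, 7);
    return b.query(x1, x2, y1, y2);
}
```

The outer loops play the role of the current dimension, and `rowQuery` plays the role of the lower-tier BIT that receives the remaining arguments.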

And with that we are done! With only about 20 lines of code, we have a fully-functional multi-dimensional BIT that will work naturally for however many dimensions we want it to have. Oh, you will also need the *lastbit* function, but you're probably familiar with that already if you have come this far. Just make sure it is declared above the templates, otherwise the calls inside them will not compile.

```
inline int lastbit(int x) {
    return x & (-x);
}
```
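For reference, here is one way the three snippets might be assembled into a single file, with `lastbit` placed first so the templates can see it. The loops are written with ordinary bodies but do the same work as the versions above. The `demoBoxSum` helper, the 4 x 5 x 6 size and the three sample points are assumptions added purely for illustration.

```cpp
#include <cassert>

// Lowest set bit of x; declared first because the templates below call it.
inline int lastbit(int x) { return x & (-x); }

// Base case: zero dimensions, a single value.
template <int... ArgsT>
struct BIT {
    int val = 0;
    void update(int delta) { val += delta; }
    int query() { return val; }
};

// At least one dimension: N + 1 lower-tier BITs, indexed from 1.
template <int N, int... Ns>
struct BIT<N, Ns...> {
    BIT<Ns...> bit[N + 1];

    template <typename... Args>
    void update(int pos, Args... args) {
        assert(pos >= 1);  // index 0 would loop forever
        for (; pos <= N; pos += lastbit(pos)) bit[pos].update(args...);
    }

    template <typename... Args>
    int query(int l, int r, Args... args) {
        int ans = 0;
        for (; r >= 1; r -= lastbit(r)) ans += bit[r].query(args...);
        for (--l; l >= 1; l -= lastbit(l)) ans -= bit[l].query(args...);
        return ans;
    }
};

// Illustrative helper: fill a small 3D tree, then sum over an inclusive box.
int demoBoxSum(int x1, int x2, int y1, int y2, int z1, int z2) {
    BIT<4, 5, 6> b;
    b.update(1, 1, 1, 3);    // +3 at (1, 1, 1)
    b.update(2, 3, 4, 7);    // +7 at (2, 3, 4)
    b.update(4, 5, 6, 10);   // +10 at (4, 5, 6)
    return b.query(x1, x2, y1, y2, z1, z2);
}
```

With these three points, `demoBoxSum(1, 4, 1, 5, 1, 6)` covers the whole tree and returns 20, while `demoBoxSum(1, 2, 1, 3, 1, 4)` only catches the first two points and returns 10.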

Why do I like this implementation? Not only is it short and clean, but it's also efficient. Template programming means that all the structures needed will be generated at **compile-time**. At runtime, the program will waste no operations managing the dimensions and the structure of the trees.
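One way to convince yourself that the layout is fixed at compile time is to ask the compiler for the size of the structure. The sketch below uses a storage-only mirror of the BIT templates (the names `Storage` and `cellCount` are hypothetical) and assumes the compiler adds no padding around an `int`, which holds on mainstream compilers but is not promised by the standard.

```cpp
#include <cassert>
#include <cstddef>

// Storage-only mirror of the BIT templates (hypothetical name).
template <int... Ns>
struct Storage { int val = 0; };

template <int N, int... Ns>
struct Storage<N, Ns...> { Storage<Ns...> cell[N + 1]; };

// Number of int cells in the nested structure, computed at compile time.
template <int... Ns>
constexpr std::size_t cellCount() {
    return sizeof(Storage<Ns...>) / sizeof(int);
}

// These checks run during compilation, not at runtime.
static_assert(cellCount<>() == 1, "zero dimensions: a single value");
static_assert(cellCount<10, 20, 30>() == 11 * 21 * 31, "(N+1)(M+1)(K+1) cells");
```

The same arithmetic is a quick way to estimate memory: a `BIT<1000, 1000>` holds 1001 * 1001 values, about 4 MB with 4-byte ints, so it is safer to declare large trees as globals or statics rather than as local variables on the stack.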

Closing remarks:

- Many times with 2D (and higher) BITs, due to memory constraints, you will not be able to store the array `BIT<Ns...> bit[N + 1];`. Rather, it is preferred to use something like `std::map< int, BIT<Ns...> > bit;`, since most indexes will not be accessed anyway. You can find tons of articles on the internet about 2D BITs and you can read more about that, and other optimizations you can apply.
- An example of a simple problem where you can try this out is problem 1470 on Timus.
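The `std::map` idea from the first remark could be sketched roughly as follows. This is only one possible variant, under a few assumptions: the name `SparseBIT` and the helper `sparseDemo` are made up, and the query uses `find` instead of `operator[]` so that reading an untouched index does not create an empty entry.

```cpp
#include <cassert>
#include <map>

inline int lastbit(int x) { return x & (-x); }

// Base case: a single value.
template <int... Ns>
struct SparseBIT {
    int val = 0;
    void update(int delta) { val += delta; }
    int query() const { return val; }
};

// Sparse variant: lower-tier trees are created only when first updated.
template <int N, int... Ns>
struct SparseBIT<N, Ns...> {
    std::map<int, SparseBIT<Ns...>> bit;

    template <typename... Args>
    void update(int pos, Args... args) {
        assert(pos >= 1);  // index 0 would loop forever
        for (; pos <= N; pos += lastbit(pos)) bit[pos].update(args...);
    }

    template <typename... Args>
    int query(int l, int r, Args... args) const {
        int ans = 0;
        for (; r >= 1; r -= lastbit(r)) {
            auto it = bit.find(r);  // missing key means an all-zero subtree
            if (it != bit.end()) ans += it->second.query(args...);
        }
        for (--l; l >= 1; l -= lastbit(l)) {
            auto it = bit.find(l);
            if (it != bit.end()) ans -= it->second.query(args...);
        }
        return ans;
    }
};

// Illustrative helper: coordinates up to 10^9, far too large for a dense array.
int sparseDemo(int x1, int x2, int y1, int y2) {
    SparseBIT<1000000000, 1000000000> b;
    b.update(5, 7, 2);
    b.update(999999999, 123456789, 4);
    return b.query(x1, x2, y1, y2);
}
```

The price is that every step of the walk now pays for a map lookup, which is logarithmic in the number of stored keys, so this trades speed for memory.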