Generic data structure templates via C++ mixins

#	User	Rating
1	ecnerwala	3649
2	Benq	3581
3	orzdevinwang	3570
4	Geothermal	3569
4	cnnfls_csy	3569
6	tourist	3565
7	maroonrk	3531
8	Radewoosh	3521
9	Um_nik	3482
10	jiangly	3468

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	163
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	151
8	SecondThread	147
9	orz	146
10	pajenegod	145

Hi Codeforces,

I've spent some time recently writing template code for common data structures for my own use, and I think I came across a really nice design for implementing data structures that have to be generic over both types and operations on those types---e.g. segtrees with their element type and the operation that combines them. Rather than modifying the template code itself itself, have the data structure inherit from a template parameter that contains the information of the element and operation.

That was a pretty bad explanation, but take for example this idea applied to the implementation of segtrees as per this classic blog.

template<class B> struct Segtree : public B {
    using T = typename B::T;

    size_t n; vector<T> s;
    Segtree(size_t n) : n(n), s(2*n,B::e) {};
    void update(int i, T val) {
        for (s[i += n] = val; i >>= 1;) s[i] = B::op(s[2*i],s[2*i+1]);
    }

    T query(int l, int r) {
        T la = B::e, ra = B::e;
        for (l += n, r += n; l < r; l >>= 1, r >>= 1) {
            if (l & 1) la = B::op(la, s[l++]);
            if (r & 1) ra = B::op(s[--r], ra);
        }
        return B::op(la, ra);
    }
};

which can be used like

struct MaxInt {
    using T = int;
    const T e = numeric_limits<int>::min();
    T op(T a, T b) { return max(a, b); }
};

using MaxIntST = Segtree<MaxInt>;

...

MaxIntST st(200);
st.update(2, 3);
cout << st.query(0, 200) << endl;
st.update(5, 6);
cout << st.query(0, 3) << endl;
cout << st.query(3, 100) << endl;

...

and so on. Standard segtrees are a rather simple example, but I think the main benefits come in when dealing with more intricate templates like lazy segtrees or HLD.

For reference, my (jank) lazy segtree using this

template<class B> struct LazySegtree : public B {
    using T = typename B::T;
    using F = typename B::F;
    typedef struct {
        uint8_t kind; // set, act, nop
        T set; F act;
    } A;

    A compose(A a, A b) {
        if (a.kind == 2) return b;
        else if (b.kind == 2) return a;
        else if (a.kind == 0) return a;
        else if (b.kind == 0) return A{0, B::apply(a.act, b.set, 1), F()};
        else return A{1, T(), B::compose(a.act, b.act)};
    }

    size_t n, h;
    vector<T> s; vector<A> m;
    LazySegtree(size_t x) : n(1), h(0) {
        while (n < x) n <<= 1, h++;
        s.resize(2*n, B::e);
        m.resize(2*n, A{2, T(), F()});
    }

    void push(int i, int w) {
        if (m[i].kind == 2) return;
        if (i < n) {
            m[2*i] = compose(m[i], m[2*i]);
            m[2*i+1] = compose(m[i], m[2*i+1]);
        }
        if (m[i].kind == 0) s[i] = B::mult(m[i].set, w);
        else if (m[i].kind == 1) s[i] = B::apply(m[i].act, s[i], w);
        m[i] = A{2, T(), F()};
    }

    T query(int l, int r, int node, int w) {
        push(node, w);
        if (r <= 0 || w <= l) return B::e;
        if (l <= 0 && w <= r) return s[node];
        w >>= 1;
        return B::op(query(l,r,2*node,w), query(l-w,r-w,2*node+1,w));
    }

    void apply(int l, int r, A val, int node, int w) {
        push(node, w);
        if (r <= 0 || w <= l) return;
        if (l <= 0 && w <= r) {
            m[node] = compose(val, m[node]);
            push(node, w);
        } else {
            w >>= 1;
            apply(l, r, val, 2*node, w);
            apply(l-w, r-w, val, 2*node+1, w);
            s[node] = B::op(s[2*node],s[2*node+1]);
        }
    }

    T query(int l, int r) { return query(l, r, 1, n); }
    void set(int l, int r, T val) { apply(l, r, A{0, val, F()}, 1, n); }
    void apply(int l, int r, F f) { apply(l, r, A{1, T(), f}, 1, n); }

    // tree traversing --- invalidated on update to node
    struct node {
        LazySegtree* p; int i, s, e;
        node(LazySegtree* p, int i, int s, int e) : p(p), i(i), s(s), e(e) { p->push(i, e - s); }
        bool leaf() const { return i >= p->n; }
        node left() const { return node(p, 2*i, s, (s + e) / 2); }
        node right() const { return node(p, 2*i+1, (s + e) / 2, e); }
        T& val() const { return p->s[i]; }
    };
    node root() { return node(this, 1, 0, n); }
};

and it can be used like

struct AddInt {
    /// standard segtree info
    using T = int;
    const T e = 0;
    T op(T a, T b) { return a + b; } //
    T mult(T a, int n) { return a * n; } // needed for range set
    /// operation to apply on ranges
    using F = int;
    F compose(F a, F b) { return a + b; }
    // want the result of op(f(x), f(y), f(z)) where a = op(x, y, z) and n = #args
    T apply(F f, T a, int n) { return a + n * f; }
};

using AddIntLST = LazySegtree<AddInt>;

...

AddIntLST t(200);
t.set(50, 150, 3);
cout << t.query(25, 125) << endl;
t.apply(100, 175, -20);
cout << t.query(75, 145) << endl;
...

I'm obviously not the first person with an attempt at doing this. There's a ton of template code out there where you just modify the code directly to add the function you want, some of it isolates the parts that you change into the first few declarations of the code, and notably AtCoder's ACL itself has a function-generic segtree template. But, here's some of the advantages I believe this has over those approaches:

The place where you have to modify code is obvious, self-contained, doesn't pollute the global namespace (much), and you can easily make multiple of the same data structure with different functions / types. When using Kactl a few times I've run into the problem modifying some parts of the code to the operation I wanted, but missed a subtle place and had a weird bug because of it. This completely avoids that.
Since you aren't modifying any existing code, you can use scripts to insert in your template code automatically before compiling without worrying about having to modify code after insertion. There's a few editor extensions I've seen that advertise this feature, so this would in theory make that more useful.
If the same operation has to be used for multiple data structures, there's no need to duplicate code. The main example I can think of is HLD, where (at least with how Kactl does it) both the HLD code and the underlying segtree have to be aware of the operation you are doing---multiple places to potentially forget to change something and make a bug versus having an HLD<base class> that uses the information in the base class for itself and passes it down to the underlying segtree just by using a LazySegtree<base class>.
Optimization. Obviously substituting in everything yourself will optimize just as well as this, but as it turns out gcc on -O2 will not inline function pointer template arguments, so ACL has a (slight) extra overhead. For reference, look at the generated code on compiler explorer between mine and ACL. (Also, imo the global functions and template instantiation for ACL just looks uglier than instantiating via mixins).
Black-boxing: while it's always useful to know what the code you're calling out to is actually doing, sometimes you just want to treat it like a black box and use what it gives you without thinking about, reviewing, or even looking at the code. This and ACL involve no modification to prewritten code so you can do just that.

Assorted remarks: There's no actual reason why the template has to inherit from the base struct, it just avoids the pain of having to write static a couple times. As far as I can tell, the type of data structure that is generic over an operation is known as "augmented" and inheriting from a base class that doesn't have use on its own is called a "mixin".

I'm curious if other people have ran into any of the above issues / how they dealt with it (or didn't). Any thoughts / suggestions are welcome.

Comments (8)

Write comment?

Felerius

3 years ago, # |

← Rev. 3 →

+47

While I use a similar pattern for data structures, I found that for HLD it's much cleaner to have decompose methods call a function argument with the index ranges for the subpaths. That way, your HLD is completely decoupled from your segment tree or whatever else you're using for storage. And by using lambdas you can still aggregate over all ranges. You end up with something like this for queries:

int sum = 0;
hld.decompose(u, v, [&](int l, int r) {
    sum += segtree.query(l, r);
});

For updates, you just do the same but call update on the segment tree instead. KACTLs HLD pretty much uses this pattern (its called process in their code), but then still wraps it in separate query and update methods.

→ Reply

ecnerwala

← Rev. 2 →

+52

This is generally known as "policy-based design" (as in GNU policy-based data structures). It's a C++-specific trick, and you can look up a good number of articles about it. Policy-based design is kind of the inverse of a mixin; a mixin is a generic base class that the user inherits from, whereas this is a generic class that inherits from the user's class. Both designs can work, and both can benefit from other template tricks like CRTP. I recently changed my top tree template to a Mixin, you can check it out here.

Personally, my biggest gripe about PBDS or Mixins are that the customization points (user-defined functions) have to be specified ahead of time and adding new ones on the fly is a big pain, e.g. if the problem requires something special like segment-tree beats. For segment trees in particular, I prefer a lower level library that leaves control of the data to the user, and only manages indices: it can split ranges into $$$O(\log n)$$$ node ids or list the parent ids or things like that.

EDIT: Actually, seems like my definition of Mixin is wrong for C++, this is pretty much a Mixin. What I used in my top tree template is more of just a CRTP-based abstract base class.

randomQuestionsAcc

3 years ago, # ^ |

Do you happen to have a collection of problems that break the common customization points? (stuff similar to segment tree beats that ACL's seg tree can't solve)

I wonder if it just means that we haven't found the "right" api design for these data structures yet. For example ACL's abstraction uses mathematical objects like monoids and semigroup actions(?) but that leaves optimizations on the table for cases when you have additional properties like idempotence or commutative inverses or whatever.

No, I don't think it means we haven't found the right API, I think it's indicative that algorithms and optimization are fundamentally about designing new extensions and thus breaking abstractions. I think you just have to choose an API which satisfies most of your uses, and then be ready to redo it when you learn more (also be ready to reimplement from scratch in a contest).

Geothermal

+26

This is a very nice trick; thanks for sharing! I’m curious: have you benchmarked this implementation against other implementations of segment trees? If so, does this version run as fast as efficient conventional implementations?

clam

+31

With the exception of non-inlined functions only being generated once for the base class, this should generate the same code as declaring these in the struct definition itself. From looking at generated code this does happen at all optimization levels, including all of the inlining that you'd expect at O1/2 and up (which is why I contrasted it to ACL which can't actually do those inlines because of function pointer rules). Really this is just a different way of organizing code that I was surprised to not have seen in Kactl/ACL---the difference I pointed out above being an extremely slight advantage to using mixins in terms of performance.

albjeno

Minor nitpick: While there may be no reason for the inheritance in your example, it is perfectly possible that your monoid is based on some input data, e. g. modulus for modular arithmetic. To this end, the constructor may be written such that the base class is also initialized:

template <typename... Args>
Segtree(std::size_t n, Args&&... args) : Base{std::forward<Args>(args)...}, n{n}, values(n, Base::identity) {}

In case the identity element is dynamic this is even necessary – even though I can't think of an example right now.

That's a good example and I did run into this once with having a modulus given at runtime (not sure why you're being downvoted :/). Forwarding arguments is definitely the more clean OOP approach, but this is CP so you can get away with using a global to store your mod and have whatever templates you're using just read from those.

clam's blog