Блог пользователя clam

Автор clam, история, 3 года назад, По-английски

Hi Codeforces,

I've spent some time recently writing template code for common data structures for my own use, and I think I came across a really nice design for implementing data structures that have to be generic over both types and operations on those types---e.g. segtrees with their element type and the operation that combines them. Rather than modifying the template code itself itself, have the data structure inherit from a template parameter that contains the information of the element and operation.

That was a pretty bad explanation, but take for example this idea applied to the implementation of segtrees as per this classic blog.

template<class B> struct Segtree : public B {
    using T = typename B::T;

    size_t n; vector<T> s;
    Segtree(size_t n) : n(n), s(2*n,B::e) {};
    void update(int i, T val) {
        for (s[i += n] = val; i >>= 1;) s[i] = B::op(s[2*i],s[2*i+1]);
    }

    T query(int l, int r) {
        T la = B::e, ra = B::e;
        for (l += n, r += n; l < r; l >>= 1, r >>= 1) {
            if (l & 1) la = B::op(la, s[l++]);
            if (r & 1) ra = B::op(s[--r], ra);
        }
        return B::op(la, ra);
    }
};

which can be used like

struct MaxInt {
    using T = int;
    const T e = numeric_limits<int>::min();
    T op(T a, T b) { return max(a, b); }
};

using MaxIntST = Segtree<MaxInt>;

...

MaxIntST st(200);
st.update(2, 3);
cout << st.query(0, 200) << endl;
st.update(5, 6);
cout << st.query(0, 3) << endl;
cout << st.query(3, 100) << endl;

...

and so on. Standard segtrees are a rather simple example, but I think the main benefits come in when dealing with more intricate templates like lazy segtrees or HLD.

For reference, my (jank) lazy segtree using this

I'm obviously not the first person with an attempt at doing this. There's a ton of template code out there where you just modify the code directly to add the function you want, some of it isolates the parts that you change into the first few declarations of the code, and notably AtCoder's ACL itself has a function-generic segtree template. But, here's some of the advantages I believe this has over those approaches:

  • The place where you have to modify code is obvious, self-contained, doesn't pollute the global namespace (much), and you can easily make multiple of the same data structure with different functions / types. When using Kactl a few times I've run into the problem modifying some parts of the code to the operation I wanted, but missed a subtle place and had a weird bug because of it. This completely avoids that.
  • Since you aren't modifying any existing code, you can use scripts to insert in your template code automatically before compiling without worrying about having to modify code after insertion. There's a few editor extensions I've seen that advertise this feature, so this would in theory make that more useful.
  • If the same operation has to be used for multiple data structures, there's no need to duplicate code. The main example I can think of is HLD, where (at least with how Kactl does it) both the HLD code and the underlying segtree have to be aware of the operation you are doing---multiple places to potentially forget to change something and make a bug versus having an HLD<base class> that uses the information in the base class for itself and passes it down to the underlying segtree just by using a LazySegtree<base class>.
  • Optimization. Obviously substituting in everything yourself will optimize just as well as this, but as it turns out gcc on -O2 will not inline function pointer template arguments, so ACL has a (slight) extra overhead. For reference, look at the generated code on compiler explorer between mine and ACL. (Also, imo the global functions and template instantiation for ACL just looks uglier than instantiating via mixins).
  • Black-boxing: while it's always useful to know what the code you're calling out to is actually doing, sometimes you just want to treat it like a black box and use what it gives you without thinking about, reviewing, or even looking at the code. This and ACL involve no modification to prewritten code so you can do just that.

Assorted remarks: There's no actual reason why the template has to inherit from the base struct, it just avoids the pain of having to write static a couple times. As far as I can tell, the type of data structure that is generic over an operation is known as "augmented" and inheriting from a base class that doesn't have use on its own is called a "mixin".

I'm curious if other people have ran into any of the above issues / how they dealt with it (or didn't). Any thoughts / suggestions are welcome.

  • Проголосовать: нравится
  • +119
  • Проголосовать: не нравится

»
3 года назад, # |
Rev. 3   Проголосовать: нравится +47 Проголосовать: не нравится

While I use a similar pattern for data structures, I found that for HLD it's much cleaner to have decompose methods call a function argument with the index ranges for the subpaths. That way, your HLD is completely decoupled from your segment tree or whatever else you're using for storage. And by using lambdas you can still aggregate over all ranges. You end up with something like this for queries:

int sum = 0;
hld.decompose(u, v, [&](int l, int r) {
    sum += segtree.query(l, r);
});

For updates, you just do the same but call update on the segment tree instead. KACTLs HLD pretty much uses this pattern (its called process in their code), but then still wraps it in separate query and update methods.

»
3 года назад, # |
Rev. 2   Проголосовать: нравится +52 Проголосовать: не нравится

This is generally known as "policy-based design" (as in GNU policy-based data structures). It's a C++-specific trick, and you can look up a good number of articles about it. Policy-based design is kind of the inverse of a mixin; a mixin is a generic base class that the user inherits from, whereas this is a generic class that inherits from the user's class. Both designs can work, and both can benefit from other template tricks like CRTP. I recently changed my top tree template to a Mixin, you can check it out here.

Personally, my biggest gripe about PBDS or Mixins are that the customization points (user-defined functions) have to be specified ahead of time and adding new ones on the fly is a big pain, e.g. if the problem requires something special like segment-tree beats. For segment trees in particular, I prefer a lower level library that leaves control of the data to the user, and only manages indices: it can split ranges into $$$O(\log n)$$$ node ids or list the parent ids or things like that.

EDIT: Actually, seems like my definition of Mixin is wrong for C++, this is pretty much a Mixin. What I used in my top tree template is more of just a CRTP-based abstract base class.

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Do you happen to have a collection of problems that break the common customization points? (stuff similar to segment tree beats that ACL's seg tree can't solve)

    I wonder if it just means that we haven't found the "right" api design for these data structures yet. For example ACL's abstraction uses mathematical objects like monoids and semigroup actions(?) but that leaves optimizations on the table for cases when you have additional properties like idempotence or commutative inverses or whatever.

    • »
      »
      »
      3 года назад, # ^ |
        Проголосовать: нравится +8 Проголосовать: не нравится

      No, I don't think it means we haven't found the right API, I think it's indicative that algorithms and optimization are fundamentally about designing new extensions and thus breaking abstractions. I think you just have to choose an API which satisfies most of your uses, and then be ready to redo it when you learn more (also be ready to reimplement from scratch in a contest).

»
3 года назад, # |
  Проголосовать: нравится +26 Проголосовать: не нравится

This is a very nice trick; thanks for sharing! I’m curious: have you benchmarked this implementation against other implementations of segment trees? If so, does this version run as fast as efficient conventional implementations?

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +31 Проголосовать: не нравится

    With the exception of non-inlined functions only being generated once for the base class, this should generate the same code as declaring these in the struct definition itself. From looking at generated code this does happen at all optimization levels, including all of the inlining that you'd expect at O1/2 and up (which is why I contrasted it to ACL which can't actually do those inlines because of function pointer rules). Really this is just a different way of organizing code that I was surprised to not have seen in Kactl/ACL---the difference I pointed out above being an extremely slight advantage to using mixins in terms of performance.

»
3 года назад, # |
  Проголосовать: нравится +8 Проголосовать: не нравится

Minor nitpick: While there may be no reason for the inheritance in your example, it is perfectly possible that your monoid is based on some input data, e. g. modulus for modular arithmetic. To this end, the constructor may be written such that the base class is also initialized:

template <typename... Args>
Segtree(std::size_t n, Args&&... args) : Base{std::forward<Args>(args)...}, n{n}, values(n, Base::identity) {}

In case the identity element is dynamic this is even necessary – even though I can't think of an example right now.

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +8 Проголосовать: не нравится

    That's a good example and I did run into this once with having a modulus given at runtime (not sure why you're being downvoted :/). Forwarding arguments is definitely the more clean OOP approach, but this is CP so you can get away with using a global to store your mod and have whatever templates you're using just read from those.