Generalizing Segment Trees in Rust

Правка en5, от EbTech, 2019-07-25 22:01:00

I recently completed a project, to solve a very general version of the associative range query problem using Rust's polymorphism. These are solved with some variation of segment trees; I prefer to call my generalized version an arq tree: "arq" is pronounced "arc" which has a similar meaning to segment, but also stands for "associative range query".

Associativity and Semigroups

You being with a finite array of elements a_1, a_2, a_3, ..., a_n. Each a_i belongs to a semigroup (S, +), where + stands for any associative binary operation on S.

Associative Law: +: S x S -> S such that (a + b) + c = a + (b + c) for all a, b, c in S.

Because + is associative, we can drop the parentheses and talk about range aggregates of the form a_l + a_{l+1} + ... a_r.

The ARQ Problem

In the Associative Range Query problem, we wish to support two types of operations: - Given bounds l and r, compute the aggregate a_l + a_{l+1} + ... a_r - Given bounds l and r, and a function f: S -> S, replace a_i with f(a_i) for all l <= i <= r.

In typical instances where computing a + b or f(a) take O(1) time, we wish to support each operation in O(log n) time.

Identity and Monoids

The segment tree is often presented as operating on monoids instead of semigroups. A monoid is simply a semigroup with a special identity element:

Identity Law: id in S such that a + a + id = id + a = a for all a in S.

Here's how you would represent Semigroup and Monoid as traits in Rust. The programmer must verify that each function satisfies its corresponding law:

trait Semigroup {
    fn compose(&self, other: &Self) -> Self;
}

trait Monoid: Semigroup {
    fn identity() -> Self;
}

In practice the trait bound Monoid turns out to be equivalent to Semigroup + Clone, and either of the two would suffice for our purposes. A semigroup can trivially be extended into a monoid by adding an identity element, and a monoid can implement Clone via composition with the identity element:

impl<T: Semigroup + Clone> Semigroup for Option<T> {
    fn compose(&self, other: &Self) -> Self {
        match self {
            Some(ref a) => match other {
                Some(ref b) => Some(a.compose(b)),
                None => self.clone()
            },
            None => other.clone()
        }
    }
}

impl<T: Semigroup + Clone> Monoid for Option<T> {
    fn identity() -> Self {
        None
    }
}

impl<T: Monoid> Clone for T {
    fn clone(&self) -> Self {
        self.compose(T::identity())
    }
}

Note that if a type implements Monoid, then Clone is not needed since a copy can be generated by composition with the indentity. We could have made our arqtree operate on elements of a type implementing Semigroup + Clone, but I think it's more convenient to work with Monoid. And so, we have our arqtree API:

pub struct ArqTree<T> {
    val: Vec<T>
}

impl<T: Monoid> ArqTree<T> {
    fn modify(&mut self, pos: usize, f: &dyn Fn(&T) -> T) {
        // implement modify
    }
    
    fn query(&self, l: usize, r: usize) -> T {
        // implement query
    }
}

However, this will not suffice when we need to do range updates! In order to update an entire range efficiently, we will need to apply f lazily, storing it in internal nodes of the tree. If multiple updates are performed, we may have to store a composition of updates for postponed application. While one may implement a composition operation f \circle g which simply calls applies g first and then f, this makes the cost of function application no longer O(1)!

Thus, we must switch from function pointers to an explicit, composable representation of our update operator. For instance, instead of storing a function that adds 10, we might store the number 10 to remember that we should add 10 later. The composition of an add-10 with an add-5 is an add-15, which still takes O(1) time.

pub struct ArqTree<T: ArqSpec> {
    app: Vec<Option<T::F>>,
    val: Vec<T::M>,
}

impl<T: ArqSpec> ArqTree<T>
where
    T::F: Clone,
{ ... }

pub trait ArqSpec {
    /// Type of data representing an endomorphism.
    // Note that while a Fn(M) -> M may seem like a more natural representation
    // for an endomorphism, compositions would then have to delegate to each of
    // their parts. This representation is more efficient.
    type F;
    /// Type of monoid elements.
    type M;

    /// For eager updates, compose() ho be unimplemented!(). For lazy updates:
    /// Require for all f,g,a: apply(compose(f, g), a) = apply(f, apply(g, a))
    fn compose(f: &Self::F, g: &Self::F) -> Self::F;
    /// For eager updates, apply() can assume to act on a leaf. For lazy updates:
    /// Require for all f,a,b: apply(f, op(a, b)) = op(apply(f, a), apply(f, b))
    fn apply(f: &Self::F, a: &Self::M) -> Self::M;
    /// Require for all a,b,c: op(a, op(b, c)) = op(op(a, b), c)
    fn op(a: &Self::M, b: &Self::M) -> Self::M;
    /// Require for all a: op(a, identity()) = op(identity(), a) = a
    fn identity() -> Self::M;
}
Теги rust, #segment tree, #lazy propagation, #persistence, monoid

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en30 Английский EbTech 2020-04-14 07:01:18 30 Tiny change: 's _segments trees_. H' -> 's _segment trees_. H'
en29 Английский EbTech 2020-04-14 05:55:59 4 Tiny change: 't as with _S_, we can c' -> 't as with $S$, we can c'
en28 Английский EbTech 2020-04-14 05:55:06 17 Tiny change: ', just as before, can choos' -> ', just as with _S_, we can choos'
en27 Английский EbTech 2020-04-14 05:46:13 5 Tiny change: 'push()`/`pop()` API ma' -> 'push()`/`pull()` API ma'
en26 Английский EbTech 2020-04-14 04:59:38 3732 Tiny change: 'ongs to a semigroup $(S, +)$;' -> 'ongs to a _semigroup_ $(S, +)$;'
en25 Английский EbTech 2020-04-14 03:10:53 63
en24 Английский EbTech 2020-04-14 03:09:03 169 Tiny change: 'pagation, based on ' -> 'pagation, which I like to call an ARQBIT. It's based on '
en23 Английский EbTech 2020-04-14 02:53:27 1126 Advanced Usage section
en22 Английский EbTech 2020-04-13 23:23:05 23 Tiny change: ' ARQ tree based on ' -> ' ARQ tree with lazy propagation, based on '
en21 Английский EbTech 2020-04-13 23:20:44 647
en20 Английский EbTech 2020-04-13 23:05:34 304
en19 Английский EbTech 2020-04-13 22:38:38 1708 Tiny change: '}\n~~~~~\nIn pract' -> '}\n~~~~~\n\n### Equivalence\n\nIn pract'
en18 Английский EbTech 2020-04-13 02:31:13 18
en17 Английский EbTech 2020-04-13 01:11:47 2 Tiny change: ' S \times X \rightarr' -> ' S \times S \rightarr'
en16 Английский EbTech 2020-04-13 01:10:47 81
en15 Английский EbTech 2020-04-13 01:08:33 1742 Tiny change: ')$, where + stands fo' -> ')$, where $+$ stands fo' (published)
en14 Английский EbTech 2020-04-13 00:43:22 5983
en13 Английский EbTech 2020-04-13 00:19:49 705 Tiny change: 'a_{l+1} + ... a_r$.\n\n' -> 'a_{l+1} + \ldots + a_r$.\n\n'
en12 Английский EbTech 2020-04-12 23:56:40 741 Tiny change: ' language.\n\nIn the' -> ' language. $\sqrt{3}$\n\nIn the'
en11 Английский EbTech 2019-07-25 22:44:43 67
en10 Английский EbTech 2019-07-25 22:43:58 6 Tiny change: 'sentation]\n//! (http://co' -> 'sentation](http://co'
en9 Английский EbTech 2019-07-25 22:43:12 969
en8 Английский EbTech 2019-07-25 22:38:33 821
en7 Английский EbTech 2019-07-25 22:31:36 710
en6 Английский EbTech 2019-07-25 22:19:38 2061
en5 Английский EbTech 2019-07-25 22:01:00 1493
en4 Английский EbTech 2019-07-16 10:30:56 2 Tiny change: 'ing law:\n~~~~~\nt' -> 'ing law:\n\n~~~~~\nt'
en3 Английский EbTech 2019-07-16 10:30:18 1217
en2 Английский EbTech 2019-07-16 06:20:45 1916 Tiny change: 'usize, f: fn(&T) -> T' -> 'usize, f: &dyn Fn(&T) -> T'
en1 Английский EbTech 2019-07-16 05:23:54 1106 Initial revision (saved to drafts)