Data structure to effectively insert elements into random places in a row

→ Pay attention

Before contest
Educational Codeforces Round 165 (Rated for Div. 2)
33:23:04
Register now »

→ Streams

CF Edu Round #165 Solution Discussion

By aryanc403

Before stream 35:33:04

Educational Round 165 Solution Discussion (with myself)

By Shayan

Before stream 35:33:04

View all →

→ Top rated

#	User	Rating
1	ecnerwala	3649
2	Benq	3581
3	jiangly	3578
4	orzdevinwang	3570
5	Geothermal	3569
5	cnnfls_csy	3569
7	tourist	3565
8	maroonrk	3531
9	Radewoosh	3521
10	Um_nik	3482

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	161
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	152
8	SecondThread	146
8	orz	146
10	pajenegod	145

View all →

→ Find user

→ Recent actions

Detailed →

RodionGork's blog

Data structure to effectively insert elements into random places in a row

By RodionGork, history, 7 months ago, In English

Regard a simple problem of adding elements to a "list", every time into random place, e.g. in Python:

arr = []
for i in range(1000000):
    pos = random.randint(0, len(arr))
    value = random.randint(1000000, 9999999)
    arr.insert(pos, value)

With what data structure it can be effective? Linked list will allow O(1) insertion, but to find a place where to insert, we need to run O(n) from start. Some variants of trees may allow O(log(N)) insertion but it seems finding the proper index is equally difficult.

Wikipedia suggest Skip List with "minor modification" is what we want. Are there another structures (e.g. perhaps some modifications to some type of tree, perhaps, cartesian) useful for this task?

list, data structures, efficiency

RodionGork
7 months ago
17

Comments (17)

Write comment?

pajenegod

7 months ago, # |

+16

There exists a data structure called "Sorted list". This is what I would normally use when I need a sorted data structure in Python. I have an implementation of it up on pyrival . It runs in O(n^(1/3) per operation, with a really small constant factor.

Here is a modified version of the sorted list data structure that works like a normal Python list, but with fast insert()s and pop()s.

FastInsertList

class FenwickTree:
    def __init__(self, x):
        bit = self.bit = list(x)
        size = self.size = len(bit)
        for i in range(size):
            j = i | (i + 1)
            if j < size:
                bit[j] += bit[i]

    def update(self, idx, x):
        """updates bit[idx] += x"""
        while idx < self.size:
            self.bit[idx] += x
            idx |= idx + 1

    def __call__(self, end):
        """calc sum(bit[:end])"""
        x = 0
        while end:
            x += self.bit[end - 1]
            end &= end - 1
        return x

    def find_kth(self, k):
        """Find largest idx such that sum(bit[:idx]) <= k"""
        idx = -1
        for d in reversed(range(self.size.bit_length())):
            right_idx = idx + (1 << d)
            if right_idx < self.size and self.bit[right_idx] <= k:
                idx = right_idx
                k -= self.bit[idx]
        return idx + 1, k


class FastInsertList:
    block_size = 700

    def __init__(self, iterable=()):
        self.micros = [[]]
        self.micro_size = [0]
        self.fenwick = FenwickTree([0])
        self.size = 0
        for i, item in enumerate(iterable):
            self.insert(i, item)

    def insert(self, k, x):
        i, j = self._find_kth(k)
        self.micros[i].insert(j, x)
        self.size += 1
        self.micro_size[i] += 1
        self.fenwick.update(i, 1)
        if len(self.micros[i]) >= self.block_size:
            self.micros[i:i + 1] = self.micros[i][:self.block_size >> 1], self.micros[i][self.block_size >> 1:]
            self.micro_size[i:i + 1] = self.block_size >> 1, self.block_size >> 1
            self.fenwick = FenwickTree(self.micro_size)

    def pop(self, k=-1):
        i, j = self._find_kth(k)
        self.size -= 1
        self.micro_size[i] -= 1
        self.fenwick.update(i, -1)
        return self.micros[i].pop(j)

    def __getitem__(self, k):
        i, j = self._find_kth(k)
        return self.micros[i][j]

    def __setitem__(self, k, x):
        i, j = self._find_kth(k)
        self.micros[i][j] = k

    def _find_kth(self, k):
        if k == len(self):
            return len(self.micros) - 1, len(self.micros[-1])
        return self.fenwick.find_kth(k + self.size if k < 0 else k)

    def __len__(self):
        return self.size

    def __iter__(self):
        return (x for micro in self.micros for x in micro)

    def __repr__(self):
        return str(list(self))

# Example
import random
arr = FastInsertList()
s = 0
for i in range(1000000):
    pos = random.randint(0, len(arr))
    value = random.randint(1000000, 9999999)
    arr.insert(pos, value)
    s += value

print(sum(arr), s)

→ Reply

RodionGork

7 months ago, # ^ |

Hi, and thanks a lot!

I have difficulty finding what exactly you call "Sorted List" — is it something like TreeMap in java or OrderedDict in Python?

Seemingly not exactly... I'll dive into the example you kindly created to try figure this out!

→ Reply

pajenegod

7 months ago, # ^ |

← Rev. 3 →

I have difficulty finding what exactly you call "Sorted List"

Sorted list is a data structure that behaves like a list. The exception is that if you insert elements into the sorted list, then they get automatically sorted. See this example

A = SortedList()
A.insert(30)
A.insert(50)
A.insert(20)
A.insert(30)
A.insert(30)

print(A) # prints [20, 30, 30, 30, 50]
print(A[-1]) # prints 50
print(A.pop(1)) # prints 30
print(A) # prints [20, 30, 30, 50]
print(A.count(30)) # prints 2

The underlying data structure is based on insertion sort on a list of lists. It stores the sorted data in blocks of size <= n^(1/3). When a block grow bigger than n^(1/3), it splits that block in two. This is what allows it to do inserts in amortized O(n^1/3) time.

The story behind this data structure is that I came up with the idea for it and implemented it couple of years ago. However, I later found out that I was not the first to invent it. For example, see this . They called it a SortedList, so that is why I call it sorted list too.

→ Reply

Aliyyiakbar

7 months ago, # ^ |

python users before c++ sets fr

→ Reply

gultai4ukr

7 months ago, # |

← Rev. 2 →

+20

Yeah, treap (modified Cartesian tree) can do both operations in $$$O(log(N))$$$ on average

→ Reply

RodionGork

7 months ago, # ^ |

But how to maintain indices? I'm aware treap allows O(log(N)) insertion, but after every insertion we need to update all X for elements after the inserted one, right?

→ Reply

micho

7 months ago, # ^ |

Correct me if I'm wrong, but Treaps will allow you to do exactly what you want, online, as well as when the queries are mixed:

Insert query in O(log(N)): Just insert a single node treap at position k. To get to position k you maintain the size of each subtree in the treap. And update them accordingly when you modify the treap.
Get by index query in O(log(N)): Just go down the treap and by knowing the subtree size of the left child, you know if the index is in the left child, or in the right. If it's in the right, don't forget that the new index is id - sz[left] to offset it.

Most of the time, if you can't come up with a modification of a treap, what you can do is for each sqrt(Q) queries, you can iterate across the whole treap (which takes O(N)) time and update proper indexes in some other array. And in the queries in between these sqrt(Q) events, you will remember the newly inserted indexes by brute force (at most sqrt(Q) of them) and thus solve the problem in something like O((N + Q) sqrt(Q) + Q log(N)), which will be fast enough for most problems.

→ Reply

RodionGork

7 months ago, # ^ |

Thank you, seemingly I'm just quite poorly acquainted with Treap idea and need to go and study it bit harder :)

→ Reply

gultai4ukr

7 months ago, # ^ |

+10

But how to maintain indices?

That's the thing, you don't have to maintain them, otherwise you indeed will need to do $$$O(N)$$$ updates. You have to deal with them implicitly and that's why the version of treap which you need is called implicit treap.

I'm aware treap allows O(log(N)) insertion, but after every insertion we need to update all X for elements after the inserted one, right?

Every tree allows $$$O(log(N))$$$ insertion but treap can do more than just that :)

→ Reply

chromate00

7 months ago, # |

+15

Think of a structure similar to a binary search tree, but with random distinct values as the key. By carefully translating this structure to use an "implicit" method of randomness, we can implement random insertion in expected $$$O(\log n)$$$ operations (including random sampling). It is done like this.

For each node, maintain how many candidates of leaf nodes exist for the left subtree and the right subtree. Let us assume $$$p$$$ candidates exist for the left subtree, and $$$q$$$ for the right subtree. Sample an integer $$$x$$$ from the $$$[1,p+q]$$$ range. If $$$x \le p$$$, move down to the left subtree. Otherwise, move down to the right subtree. Repeat until you meet a leaf node candidate. Insert your item there and update the number of candidates correspondingly.

This chooses $$$N!$$$ orders equiprobably, with expected $$$O(\log n)$$$ operations per insertion. The proof of expected time complexity is not very hard, it's basically similar to proving the time complexity of random pivot Quicksort.

→ Reply

chromate00

7 months ago, # ^ |

← Rev. 2 →

Besides the actual structure itself, doesn't the pseudocode you have provided produce the exact same results with the following procedure?

Generate $$$n$$$ random values in an array.
Shuffle the array.

Considering both procedures produce $$$N!$$$ different orders when the values are distinct, and each output is equiprobable as well, I would say that the two are equivalent.

→ Reply

Wielomian

7 months ago, # |

+10

If you have all insert queries before all get queries, you can just append them into an array, storing info at which position they sould've been added. Then, when insert phase is finished, you can reorder the elements properly. This can be easily done in reverse order in $$$O(n \log n)$$$: use a segment tree to track which positions are taken already and to compute into which place the current element should go.

→ Reply

RodionGork

7 months ago, # ^ |

Cunning trick, perhaps not that much for practical use, but ideal for a small puzzle :) thanks a lot!

→ Reply

nor

7 months ago, # |

There's a non-standard data structure called a rope that can do these and much more, which is implemented as a libstdc++ extension, but sadly seems to have bugs.

For more info, see this and this.

It can be implemented in Python too, but I'm guessing you're looking for something more ready-to-use. Just mentioned it here because it is quite a nice data structure to know about.

→ Reply

harryzhengipsum

7 months ago, # |

In C++, you can use a rope. It will work in O(log(n)) time for random insertion, albeit with a large constant factor. rope

If you want to implement a random insertion data structure yourself, you can use a splay tree for O(log(n)) amortized insertion. splay tree

A treap would also work. treap

Both treaps and splay trees are very versatile and can be extended to do much more than just random insertion.

→ Reply

nor

7 months ago, # ^ |

← Rev. 2 →

A good thread-unsafe implementation of rope is usually pretty fast. The main issue with the usual implementation is that it is required to be thread safe, which it does by involving mutexes.

→ Reply

harryzhengipsum

7 months ago, # ^ |

Ah, I was basing my observation off of personal experience with the SGI rope. I'm not sure if there's a thread-unsafe implementation that's easily dropped in for competitive programming.

→ Reply