bhikkhu's blog

By bhikkhu, 9 years ago, In English

(Actually this is a question) So I thought I knew the intuition behind the Manber and Myers algorithm. Here is what I understood.

Suppose the string is "banana"

We first partition the suffixes in terms of similar first character as

a, anana, ana => bucket 1

banana => bucket 2

na, nana => bucket 3

Then to get the partition by the next 2h characters, my algo is:

  1. scan each bucket one by one

  2. take the first bucket

  3. for each suffix in this bucket, find the position of sa + 2h, if we go out of bounds assign position = 0

So picture looks like this:

a = 0, anana = 3, ana = 3 (since a + 1 > n, nana is in 3rd bucket and na is also in third bucket)

  1. Now, sort the assigned indices of the bucket using counting sort.

  2. Scan the new indices one by one and create new partitions, here we get

[a], [anana, ana]

  1. Do this until buckets = n

My problem is in 4th part, where I use counting sort.

First I coded as I had thought that I had understood the algorithm. But then I ran into trouble. As the number of buckets goes on increasing during each iteration, my algorithm approaches O(n^2) (as I assign ranks during counting sort according to the location of s + 2h suffix). So with some modification to the algorithm can I get O(nlogn)? If not what should I do?

Ok. I removed the code. So please answer me now.

  • Vote: I like it
  • -14
  • Vote: I do not like it

»
9 years ago, # |
  Vote: I like it 0 Vote: I do not like it

OKAY. why downvote ? If it's due to the format then that's not because of me. I am not joking here.

»
9 years ago, # |
  Vote: I like it +18 Vote: I do not like it

If you downvote, please give the reason too.

»
9 years ago, # |
  Vote: I like it +1 Vote: I do not like it

When you have the current bucket for each suffix, you can compute new ones as follows:

For each i, consider the ordered pair ( bucket[i], bucket[i + (1<<k)] ). (here, bucket[index beyond the end] is a value larger than any valid bucket[i] )

Sort the suffixes with those pairs used as keys. This cannot be done by an ordinary countsort (there are about n^2 possible pairs (x,y)), but it can be done by a two-pass radix sort in O(n), or if you are lazy, by a standard sort in O(n log n). (The second approach then gives you O(n log^2 n) overall time complexity.)

After the sort, relabel the buckets in O(n) and you are ready to start a new iteration.

  • »
    »
    9 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I was sorting each bucket one after another then appending the buckets together. I had not thought of assigning pairwise ranks. very stupid of me. +1 and Thank you very much sir for your time.