Suffix Array / Manber and Myers Algo

(This is actually a question.) I thought I understood the intuition behind the Manber and Myers algorithm. Here is what I understood.

Suppose the string is "banana".

We first partition the suffixes into buckets by their first character:

a, anana, ana => bucket 1

banana => bucket 2

na, nana => bucket 3
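
A minimal sketch of this first partition, assuming 0-based suffix start indices and bucket numbers handed out in alphabetical order of the first character. The name `initial_buckets` is made up for illustration and is not part of the original post:

```python
def initial_buckets(s):
    """Return, for each suffix start index, its 1-based bucket by first character."""
    # Distinct characters in sorted order define the bucket numbering.
    order = {ch: rank for rank, ch in enumerate(sorted(set(s)), start=1)}
    return [order[ch] for ch in s]


buckets = initial_buckets("banana")
# Suffixes starting at 1, 3, 5 (anana, ana, a) share bucket 1,
# index 0 (banana) is bucket 2, indices 2 and 4 (nana, na) are bucket 3.
```

Storing the bucket per start index (rather than a list of suffixes per bucket) is a choice made here so that the bucket of any other suffix can be looked up in O(1) later.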

Then, to refine the partition by the first h characters into the partition by the first 2h characters, my algo is:

scan each bucket one by one
take the first bucket
for each suffix sa in this bucket, look up the bucket of the suffix sa + h (the one starting h positions later); if we go out of bounds, assign position = 0

So the picture looks like this (with h = 1):

a = 0, anana = 3, ana = 3 (a + 1 runs past the end of the string, so it gets 0; anana + 1 = nana and ana + 1 = na are both in the 3rd bucket)

Now, sort the suffixes of the bucket by their assigned indices using counting sort.
Scan the sorted indices one by one and start a new partition whenever the index changes; here we get

[a], [anana, ana]

Repeat this, doubling h each time, until the number of buckets equals n.

My problem is in the fourth step, where I use counting sort.

I first coded it up, thinking I had understood the algorithm, but then I ran into trouble. As the number of buckets grows with each iteration, my algorithm approaches O(n^2), because I run a separate counting sort for every bucket, with ranks taken from the bucket of the suffix s + h. Can I get O(n log n) with some modification to the algorithm? If not, what should I do?

Ok. I removed the code. So please answer me now.

Comments (5)

bhikkhu

10 years ago, # |

Okay, why the downvotes? If it's due to the formatting, that's not my doing. I am not joking here.

10 years ago, # ^ |

Why does the post appear so dirty?

If you downvote, please give the reason too.

misof

When you have the current bucket for each suffix, you can compute new ones as follows:

For each i, consider the ordered pair ( bucket[i], bucket[i + (1<<k)] ). (Here, bucket[index beyond the end] is a value smaller than any valid bucket[i], so that a suffix which is a prefix of another suffix sorts before it.)

Sort the suffixes with those pairs used as keys. This cannot be done by an ordinary countsort (there are about n^2 possible pairs (x,y)), but it can be done by a two-pass radix sort in O(n), or if you are lazy, by a standard sort in O(n log n). (The second approach then gives you O(n log^2 n) overall time complexity.)
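
The two-pass radix sort could be sketched roughly as follows: one stable counting sort on the second component, then one on the first. This assumes `bucket` is a non-empty list of non-negative integers no larger than about n, and that positions past the end get a key below every valid bucket. The names `sort_by_pairs` and `counting_pass` are hypothetical, chosen only for this sketch:

```python
def sort_by_pairs(bucket, step):
    """Return suffix indices sorted by (bucket[i], bucket[i + step])."""
    n = len(bucket)
    size = max(bucket) + 2  # keys are shifted by 1; key 0 means "past the end"

    def second(i):
        # Shifted bucket of the suffix `step` positions later, or 0 if none.
        return bucket[i + step] + 1 if i + step < n else 0

    def counting_pass(items, key):
        # Stable counting sort of items by key(item), keys in [0, size).
        count = [0] * (size + 1)
        for i in items:
            count[key(i) + 1] += 1
        for v in range(size):
            count[v + 1] += count[v]  # count[v] = number of keys below v
        out = [0] * len(items)
        for i in items:
            out[count[key(i)]] = i
            count[key(i)] += 1
        return out

    # Least significant component first, then the most significant one.
    by_second = counting_pass(range(n), second)
    return counting_pass(by_second, lambda i: bucket[i] + 1)
```

Because each pass is stable, the order produced by the first pass survives among suffixes whose first components are equal, which is what makes two passes enough.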

After the sort, relabel the buckets in O(n) and you are ready to start a new iteration.
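
Putting the three steps together, here is a minimal sketch of the "lazy" variant that uses a standard sort, so it runs in O(n log^2 n) overall. It assumes 0-based indices and uses -1 as the key for positions past the end, so that a shorter suffix sorts before a longer one it is a prefix of. The function name `suffix_array` is illustrative only:

```python
def suffix_array(s):
    """Prefix doubling with a comparison sort; returns suffix start indices in sorted order."""
    n = len(s)
    if n == 0:
        return []
    # Initial buckets: order suffixes by their first character.
    bucket = [ord(c) for c in s]
    sa = list(range(n))
    k = 0
    while True:
        step = 1 << k

        def key(i):
            # Pair (bucket[i], bucket[i + step]); -1 stands for "past the end".
            second = bucket[i + step] if i + step < n else -1
            return (bucket[i], second)

        sa.sort(key=key)
        # Relabel in O(n): suffixes with equal pairs stay in the same bucket.
        new_bucket = [0] * n
        for j in range(1, n):
            prev, cur = sa[j - 1], sa[j]
            new_bucket[cur] = new_bucket[prev] + (key(cur) != key(prev))
        bucket = new_bucket
        if bucket[sa[-1]] == n - 1:  # every suffix has its own bucket
            return sa
        k += 1


print(suffix_array("banana"))
```

Swapping the call to `sort` for a two-pass radix sort on the same pairs would bring each iteration down to O(n), and the whole thing to O(n log n).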

I was sorting each bucket one after another and then appending the buckets together. I had not thought of assigning pairwise ranks. Very stupid of me. +1, and thank you very much, sir, for your time.