buGMaster's blog

By buGMaster, 9 years ago, In English

Hi again...
I studied the tutorial about the Data Structure by PrinceOfPersia. I had a problem with Suffix Array (the deterministic version), and I couldn't get it properly (or maybe it has bugs in it!). So I asked for help via comment, but no one answered (even the author). Then I asked the PrinceOfPersia (the author of tutorial) for double checking and give me more description about it by private message. Because he was busy and don't have enough time to help, I decided to make a blog for it.

Here's the the deterministic version of Suffix Array (used in the tutorial):

/*
Suffix array O(n lg^2 n)
LCP table O(n)
*/
#include <cstdio>
#include <algorithm>
#include <cstring>

using namespace std;

#define REP(i, n) for (int i = 0; i < (int)(n); ++i)

namespace SuffixArray
{
	const int MAXN = 1 << 21;
	char * S;
	int N, gap;
	int sa[MAXN], pos[MAXN], tmp[MAXN], lcp[MAXN];

	bool sufCmp(int i, int j)
	{
		if (pos[i] != pos[j])
			return pos[i] < pos[j];
		i += gap;
		j += gap;
		return (i < N && j < N) ? pos[i] < pos[j] : i > j;
	}

	void buildSA()
	{
		N = strlen(S);
		REP(i, N) sa[i] = i, pos[i] = S[i];
		for (gap = 1;; gap *= 2)
		{
			sort(sa, sa + N, sufCmp);
			REP(i, N - 1) tmp[i + 1] = tmp[i] + sufCmp(sa[i], sa[i + 1]);
			REP(i, N) pos[sa[i]] = tmp[i];
			if (tmp[N - 1] == N - 1) break;
		}
	}

	void buildLCP()
	{
		for (int i = 0, k = 0; i < N; ++i) if (pos[i] != N - 1)
		{
			for (int j = sa[pos[i] + 1]; S[i + k] == S[j + k];)
			++k;
			lcp[pos[i]] = k;
			if (k)--k;
		}
	}
} // end namespace SuffixArray

I don't get how the code works. Could you please give me some more clear description about it?
What does "tmp" store? It seems it contains something like [0,1,...,N-1], doesn't it?!
What about "pos"? What's the initialization of "tmp"?
Any help would be appreciated...

  • Vote: I like it
  • +6
  • Vote: I do not like it

»
9 years ago, # |
  Vote: I like it -13 Vote: I do not like it

This comment is just for making the post visible in the recent actions, maybe it's answered!