Shade100's blog

By Shade100, history, 7 years ago, In English

Hi! I'm having trouble understanding this O(NlogN) implementation of suffix array construction :

Code

In particular:

for (i = 0; i < n; i++) { // count the frequency of each integer rank
	c[i + k < n ? RA[i + k] : 0]++;
}

0 is used as the rank for an empty string when it is also the rank of the smallest suffix, as seen below.

tempRA[SA[0]] = r = 0; // re-ranking; start from rank r = 0

// compare adjacent suffixes
for (i = 1; i < n; i++) {
	// if same pair => same rank r; otherwise,increase r
	tempRA[SA[i]] = (RA[SA[i]] == RA[SA[i - 1]] && RA[SA[i] + k] == RA[SA[i - 1] + k]) ? r : ++r;
}

SA[i] + k can go out of bounds.

These seem like bugs, but the algorithm works, but strangely only when a sentinel character such as $ is appended to the end. Why?

Thanks in advance!

Full text and comments »

  • Vote: I like it
  • 0
  • Vote: I do not like it