Hacker Rank Isomorphic Substrings solution

→ Pay attention

Before contest
Codeforces Round 941 (Div. 1)
15:10:21
Register now »

*has extra registration

Before contest
Codeforces Round 941 (Div. 2)
15:10:21
Register now »

*has extra registration

→ Streams

Atcoder ABC #351 Short Solution Discussion

By aryanc403

Before stream 14:15:19

View all →

→ Top rated

#	User	Rating
1	ecnerwala	3649
2	Benq	3581
3	orzdevinwang	3570
4	Geothermal	3569
4	cnnfls_csy	3569
6	tourist	3565
7	maroonrk	3531
8	Radewoosh	3521
9	Um_nik	3482
10	jiangly	3468

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	161
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	152
8	SecondThread	147
9	orz	146
10	pajenegod	145

View all →

→ Find user

→ Recent actions

Detailed →

CherryTree's blog

Hacker Rank Isomorphic Substrings solution

By CherryTree, 10 years ago, In English

I am the writer of this problem, I hope that somebody liked it ^^
I have described the solution here, but it seems to me that my solution is not the optimal one.
So it's very interesting to me to hear about your ideas and approaches to this problem, especially, about 80-100 point ones ^^

hackerrank, isomorphic_substrings

CherryTree
10 years ago
6

Comments (5)

Show archived | Write comment?

iscsi

10 years ago, # |

← Rev. 10 →

I liked this problem very much, and my solution gets 100 point, so I was very happy.

Here my solution sketch:

I will denote the original string with str. (n=length(str))

Firsty I calculate every index the order of the alphabet from this index, I mean e.g: str=abbca a=02210 b=100** c=2110* this running time in my code is O(n*26*log(26))
I realized if I could calculate every index i a number v[i] which conatins a number, and that means str.substr(i,v[i]) pattern !=str.substr(j,v[i]) pattern if j<i, and v[i] is the smallest number with this property, than we can calculate the answer easily (O(n)).
I've used hashing. At every step I will have a (sub)string, so If I store the string hash separetaly for every character I can reuse this information when I drop the first character (and adding new ones) e.g. substr=abbca subst with hash(a(substr))=q^0+q^4, hash(b(substr))=q^1+q^2 etc, and from this we can easily calculate : hash(substr)=ord[current_index][a]*hash(a(substr))+ord[current_index][b]*hash(b(substr))+... Of course now we can easily drop the first character from this hash, and calculate the new hash. (we should just decrease by one the first character hash and multiply every character's hashes with q inverse). Intuitively I've build a tree every node contains a hash, index, length tripplets (this is pointing a str.substr(index,length). Now maybe more simply if I write the code snippet:

len=0
actual_hash=0
for(i=0,i<n,i++)
{
  while(tree.contain(actual_hash,len))))
  {//while we have an isomorph pattern
    similar_index=tree(actual_hash,len)
    while(ord[similary_index][str[similar_index+len]]==ord[i][str[i+len]]) 
    {//find the end of the isomorph
      update(actual_hash)..//the hashes are equal
      tree.insert(actual_hash,len,similar_index)
      len++;
    }
  }
  update(actual_hash) //this will a new node in the tree
  tree.insert(actual_hash,len,i) 
  //here we drop the first character from the hash
  update(actual_hash) //drop first character
  v[i]=len
  len--;
  //if len>0 then tree contain hash(str.substr(i+1,len-1)) here this will provide the tree will be connected here
}

So here the len is <=n and every step we decrease just by one so the len increasing maximum 2*n, because I've used for the tree/(hash storing) simply map, this part running time is O(n*log n) for the tree and O(n*26) for the hash update. I think this is a robust solution because I think we can solve it faster because I feel unnecessary to store every node in the inner while loop in the tree (because of the last comment of the code snippet, it would be anough to store that, but I'm not 100% sure:) ). My code running time was 0.8s on the slowest test case.

→ Reply

iscsi

10 years ago, # ^ |

I've tested my feelings, in the inner loop tree.insert not necessary indeed. My code running time is become faster 0.7s now, and I've updated my code too.

→ Reply

mugurelionut

10 years ago, # |

I also got 100 points for this problem in the Hackerrank contest. My solution seems very similar to the one you described, but its theoretical time complexity is worse.

First of all, there was this problem at the Codechef September 2013 contest: http://www.codechef.com/SEPT13/problems/TMP01 The problem asks for the number of distinct substrings of a string after each string modification. The string can be modified by adding characters at the end or removing characters from the front (i.e. in a queue-like manner). My solution for that problem involved building a suffix array for the whole string (considering all the additions) and then maintaining the sum of the LCPs of the suffixes within the current "window" of the string ("window" = the current state of the string, which is a substring of the large substring).

Your problem can be formulated in terms of that problem (we can assume that the string is built by adding characters at the end, thus considering all of its prefixes). So I already had a solution which, after having the suffix array, could solve the problem in O(N * log(N)) time.

The difficult part in your problem (at least for me) was building the suffix array properly (according to the definition of isomorphism). For each step of the suffix array construction (corresponding to suffixes having a length equal to a power of 2), I would compute their LCPs (upper bounded by the current power of 2) and then a RMQ array over the LCPs. Then I used merge sort for sorting the suffixes according to the current power of two length. In order to compare two suffixes I used my own comparator which needed the LCP of the two suffixes s1 and s2, upper bounded by the current power of two length.

First of all, if the suffixes were part of different classes at the previous step then I can find their LCP in O(1) time (by using the current LCP array). Otherwise, I would need to find the LCP of the suffixes s1' and s2' (the ones located after s1 and s2 at the appropriate power of two length), restricted to the fact that s1 and s2 are already matched. Obviously, the LCP of s1' and s2' is upper bounded by their LCP in the current LCP array, but may be shorter due to some character mismatches — which can be checked in O(alphabet) time (we only need to check the first occurrence of each character of the alphabet starting from s1' and s2').

So, to summarize, my suffix array construction needs O(N * log^2 (N)) suffix comparisons (because I use merge sort O(log(N)) times). Some comparisons take only O(1) time (if the two suffixes are already part of different classes), but others may take O(alphabet) time. Since it seems difficult to analyze the details completely, I would have to say that the time complexity of my solution is O(N * log^2 (N) * |alphabet|). I guess the worst case for my solution would be a periodic string with period abcde....xyz (i.e. having the whole alphabet as a period) — did you have a similar test case during the contest ?

And to conclude, your problem could have also been solved under the same conditions from the Codechef problem I mentioned (with adding characters at the end and removing characters from the front).

→ Reply

CherryTree

10 years ago, # ^ |

Actually I was a tester of that CodeChef contest, so I know that problem. That time, O(Nlog^2N) solution was implemented by me, but it was far too slow, so it was needed to change the idea completely. I was surprised to see accepted solutions of O(Nlog^2N) complexity.
Yes, you are right, the problem can be solved under the conditions of the problem that you've mentioned, because after we have suffix array built, generally nothing needs to be changed in the solution. But the key idea of the problem was about building suffix array, not about adding/removing characters.
There was the test case that you had mentioned in the data. Regardless of the fact that my solution's complexity is O(N*logN*alphabet), I haven't managed to implement LCP function with a low constant factor, so perhaps, our timings should not differ much..

→ Reply

winger

10 years ago, # |

+16

My 100pt solution for this problem is O(n) and is based on the following idea.

First, lets define a string transform f(s), such that f(s)[i] = i — (index of previous occurrence of s[i]), assuming that (index of previous occurrence of s[i]) is -1 if there was no previous occurrence. Example: f("abacaba")=[1,2,2,4,2,4,2]. It is easy to show that strings u and v are isomorphic iff f(u) = f(v). Moreover, given f(s) it is easy to compute the transform of any suffix of s.

Next, I use slightly modified Ukkonen algorithm to incrementally build suffix tree of transformed suffixes of original string. Number of unique positions in the tree on each step is the answer for the problem.

→ Reply