404Error's blog

By 404Error, 10 years ago, In English

Let T be a String of length m. And S be a multiset of n characters. I need to find all substring of T in length n that are formed by characters of S.

example — S = {a,b,c,d,a} T = aabcdaah answer- {aabcd,abcda,bcdaa}

It should be running in O(m) times. It should also state, for each position i, the length of the longest substring in T starting at i that can be formed from S.

I was not sure of two pointer(not exactly) solution correctness. If possible can someone know any question similar to this on training site. Can It be find by some specific algorithm or data stricture or its just simple iteration?

  • Vote: I like it
  • +3
  • Vote: I do not like it

»
10 years ago, # |
Rev. 3   Vote: I like it +4 Vote: I do not like it

what about making dp[i][j] i max will be 26 and j max will be m and make partial sum for each character so at your example
S={a,b,c,d,a}
frequency of numbers are
a=2,b=1,c=1,d=1
and your text aabcdaah
for each character calculate frequency of it till now and compare it with your S
to get freqency of specific character as I mentioned you can use partial sum

freqofA=dp[0][j]-dp[0][j-n]
freqofB=dp[1][j]-dp[1][j-n]
freqofC=dp[2][j]-dp[2][j-n]

if freqofA == number of 'a' in S and freqOfB==number of 'b' in S and the same for all remaining characters
(I mean from a to z)
so now you can count it as a substring can be build with characters in S.
so complexity will be O(26*M)

  • »
    »
    10 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thanks for solution. Actually I am looking for O(m) solution precisely. It is book problem. I solved the same but using two pointer. taking left as 0(0 based index) and right as 0 in start and a map G containing characters of S. So keep moving right, if character does not matches character in S reset left and right to next Index and also reinitialize G with S, otherwise keep moving right and update T[i] = right — left + 1 and deleting that's char value in G. If character is in S but not in G then insert character at left in G and increment left.
    if right — left + 1 = n, then update G by putting left index character and incrementing left by 1.

    I am not sure whether my solution will pass on corner cases or not.

    • »
      »
      »
      10 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      solution is:

      Helper h;
      for (int i = 0; i < n; ++i)
         h.add(a[i]);
      if (h.Ok())
          ++res;
      for (int i = n; i < m; ++i)
      {
          h.add(a[i]);
          h.del(a[i - n]);
          if (h.Ok())
              ++res;
      }
      

      It's easy to implement Helper with time O(size of the alphabet). To implement it with O(1) working time you can add counter to count number of characters X : number of occurences in h is equal to or greater than number of occurences in S

      • »
        »
        »
        »
        10 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Is Ok() method is to compare the content with S?

»
10 years ago, # |
Rev. 2   Vote: I like it +3 Vote: I do not like it

The following Java-like pseudocode "should work" and runs in O(m+z) with O(z) memory where z is size of alphabet, but if z is smaller than m then the runtime is O(m) (we assume n<=m).

Input: T, S, m, n.
Output: The indices in T where a valid substring starts.

final int[] f = new int[26];
for(int i = 0; i<n; i++) f[S[i]-'a']++;

for(int i = 0; i<n; i++) f[T[i]-'a']--;

int cnt = 0;
for(int i = 0; i<26; i++) if(f[i]!=0) ++cnt;

if(cnt==0) output 0;

for(int i = 0; i+n<m; i++)
{
    final int prv = T[i]-'a', nxt = T[i+n]-'a';
    f[prv]++;
    if(f[prv]==0) --cnt;
    else if(f[prv]==1) ++cnt;
    f[nxt]--;
    if(f[nxt]==0) --cnt;
    else if(f[nxt]==-1) ++cnt;

    if(cnt==0) output i+1;
}
  • »
    »
    10 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I was looking for this kind of solution. Thanks.