g_49's blog

By g_49, 3 years ago, In English

Hi, everyone. This problem was asked in Augnito Backend Developer Hiring Challenge on HackerEarth. This contest is now over. I've no idea how to solve this problem.

Statement

Given a string $$$A$$$ of length $$$N$$$ having lowercase English alphabets.

Let $$$S_T$$$ be the set of characters of a string $$$T$$$. For example, if $$$T = aabeec$$$ then $$$S_T = [a,b,c,e]$$$.

Two sets are different if one of the sets has a character that is not present in the other set.

Find the number of different sets that can be obtained from all the non-empty substrings of the given string $$$A$$$.

Constraints

Number of test cases $$$\leq 10$$$

$$$1 \leq N \leq 10^5$$$

Sample Input

3
4
abbc
3
bcb
5
abbca 

Sample Output

6
3
7

UPD: tiasmondal's approach seems to be correct. I've added the code based on it.

C++ code
  • Vote: I like it
  • +11
  • Vote: I do not like it

| Write comment?
»
3 years ago, # |
  Vote: I like it +4 Vote: I do not like it

I think the main crux here is that duplicate letters don't matter but order matters, for example.. If you have a string something like aabcba and you want to find the number of unique substring sets starting from last letter i.e a, is same as number of unique substrings in this string acba.. (omitting the duplicates) and this modified string size can be at most 26. so just traverse the string, maintain a vector of unique letters and if you have a previous occurrence of the current letter already in the vector, remove it and push_back this letter, and find different sets using bitmask by traversing this 26 size(at most) vector in order and insert in set. Finally return the answer. Hope it helps.

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +1 Vote: I do not like it

    Surely this isn't correct. For example, consider string $$$ abcda $$$, then omitting one of the $$$ a $$$ will cause you to get less sets from substrings. Because, in particular, in the original string, there is a set $$$ \{ d, a \} $$$, but if you remove the last occurence of character $$$ a $$$, then you won't have that set.

    • »
      »
      »
      3 years ago, # ^ |
      Rev. 2   Vote: I like it +1 Vote: I do not like it

      I am not telling to modify the original string.. but modify the vector as you traverse and calculate this for every index..., for example in the string you mentioned abcda... for i=0; "a", for i=1 "ab", for i=2 "abc", for i=3 "abcd", for i=4 "bcda", For each index calculate bitmask traversing vector from end.. in this way you will not miss {d,a} as you can see you will incorporate that in the index i=4.

      • »
        »
        »
        »
        3 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Hmm okay, your language was a bit confusing. I understand what you're saying.

»
3 years ago, # |
Rev. 2   Vote: I like it +3 Vote: I do not like it

Precompute $$$last[i][j]$$$ over all lowercase alphabets $$$j+'a'$$$ where $$$i$$$ is the position we are at currently and $$$last[i][j]$$$ is the last position where alphabet $$$j+'a$$$' occured uptil $$$i$$$. Now iterate over all alphabets and those which have $$$last[i][current]!=-1$$$ push them in a vector. Note that the size of this vector can be atmost $$$26$$$.

Also maintain a set $$$seen$$$ of binary masks of length $$$26$$$ in a set. For all set bits of mask we say that those bits(alphabets here) are valid and have occured.

Now create a $$$NULL$$$ mask of $$$26$$$ bits $$$(all 0's)$$$

Sort this vector of alphabets by their positions and iterate on it from the end as we want our current $$$s[i]$$$ to be included in all the substrings. Also set the bits in this mask as and when you iterate and check if it is contained in our set "seen". If not insert it and increment answer by 1.

We have solved the problem in $$$O(26log(26)N)$$$ time and $$$O(2^{26})$$$ space. It should pass. You can further reduce the space by $$$32$$$ or $$$64$$$ times by storing the masks in a $$$bitset$$$.