Augnito Coding Test Problem | Character Set Count

Hi, everyone. This problem was asked in Augnito Backend Developer Hiring Challenge on HackerEarth. This contest is now over. I've no idea how to solve this problem.

Statement

Given a string $$$A$$$ of length $$$N$$$ having lowercase English alphabets.

Let $$$S_T$$$ be the set of characters of a string $$$T$$$. For example, if $$$T = aabeec$$$ then $$$S_T = [a,b,c,e]$$$.

Two sets are different if one of the sets has a character that is not present in the other set.

Find the number of different sets that can be obtained from all the non-empty substrings of the given string $$$A$$$.

Constraints

Number of test cases $$$\leq 10$$$

$$$1 \leq N \leq 10^5$$$

Sample Input

3
4
abbc
3
bcb
5
abbca

Sample Output

6
3
7

UPD: tiasmondal's approach seems to be correct. I've added the code based on it.

C++ code

#include <bits/stdc++.h>
using namespace std;

const int N = 1<<26;

int main() {
    int T;
    cin >> T;

    while (T--) {
        int n;
        string s;
        cin >> n >> s;

        vector<char> v;
        bitset<N> vis;

        for (char c: s) {
            auto it = find(v.begin(), v.end(), c);
            if (it != v.end())
                v.erase(it);

            v.push_back(c);

            for (int i = 0; i < v.size(); i++) {
                int mask = 0;
                for (int j = i; j < v.size(); j++) {
                    int bit = v[j] - 'a';
                    mask |= (1<<bit);
                }
                vis.set(mask);
            }
        }

        cout << vis.count() << "\n";
    }    
}

Comments (6)

Show archived | Write comment?

tiasmondal

3 years ago, # |

I think the main crux here is that duplicate letters don't matter but order matters, for example.. If you have a string something like aabcba and you want to find the number of unique substring sets starting from last letter i.e a, is same as number of unique substrings in this string acba.. (omitting the duplicates) and this modified string size can be at most 26. so just traverse the string, maintain a vector of unique letters and if you have a previous occurrence of the current letter already in the vector, remove it and push_back this letter, and find different sets using bitmask by traversing this 26 size(at most) vector in order and insert in set. Finally return the answer. Hope it helps.

→ Reply

gupta_samarth

3 years ago, # ^ |

Surely this isn't correct. For example, consider string $$$ abcda $$$, then omitting one of the $$$ a $$$ will cause you to get less sets from substrings. Because, in particular, in the original string, there is a set $$$ \{ d, a \} $$$, but if you remove the last occurence of character $$$ a $$$, then you won't have that set.

← Rev. 2 →

I am not telling to modify the original string.. but modify the vector as you traverse and calculate this for every index..., for example in the string you mentioned abcda... for i=0; "a", for i=1 "ab", for i=2 "abc", for i=3 "abcd", for i=4 "bcda", For each index calculate bitmask traversing vector from end.. in this way you will not miss {d,a} as you can see you will incorporate that in the index i=4.

Hmm okay, your language was a bit confusing. I understand what you're saying.

Codeforcer

Precompute $$$last[i][j]$$$ over all lowercase alphabets $$$j+'a'$$$ where $$$i$$$ is the position we are at currently and $$$last[i][j]$$$ is the last position where alphabet $$$j+'a$$$' occured uptil $$$i$$$. Now iterate over all alphabets and those which have $$$last[i][current]!=-1$$$ push them in a vector. Note that the size of this vector can be atmost $$$26$$$.

Also maintain a set $$$seen$$$ of binary masks of length $$$26$$$ in a set. For all set bits of mask we say that those bits(alphabets here) are valid and have occured.

Now create a $$$NULL$$$ mask of $$$26$$$ bits $$$(all 0's)$$$

Sort this vector of alphabets by their positions and iterate on it from the end as we want our current $$$s[i]$$$ to be included in all the substrings. Also set the bits in this mask as and when you iterate and check if it is contained in our set "seen". If not insert it and increment answer by 1.

We have solved the problem in $$$O(26log(26)N)$$$ time and $$$O(2^{26})$$$ space. It should pass. You can further reduce the space by $$$32$$$ or $$$64$$$ times by storing the masks in a $$$bitset$$$.

DontLookBack

Well explained!

#	User	Rating
1	tourist	3690
2	jiangly	3647
3	Benq	3581
4	orzdevinwang	3570
5	Geothermal	3569
5	cnnfls_csy	3569
7	Radewoosh	3509
8	ecnerwala	3486
9	jqdai0815	3474
10	gyh20	3447

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	163
4	TheScrasse	159
5	nor	157
6	maroonrk	155
7	-is-this-fft-	152
8	Petr	146
8	orz	146
10	BledDest	145

g_49's blog

Statement

Constraints

Sample Input

Sample Output