Minimization of DFA — different approach ideas?

№	Пользователь	Рейтинг
1	tourist	3947
2	ecnerwala	3654
3	jiangly	3627
4	jqdai0815	3620
5	orzdevinwang	3612
6	Benq	3586
7	Radewoosh	3582
8	Geothermal	3569
8	cnnfls_csy	3569
10	ksun48	3474

№	Пользователь	Вклад
1	awoo	163
2	maomao90	160
3	adamant	156
4	atcoder_official	155
5	maroonrk	152
6	-is-this-fft-	148
6	SecondThread	148
8	Petr	147
9	nor	145
10	cry	144

I got this problem from a friend: return the minimum number of states needed to produce a DFA that recognizes a provided language L (link; http://poj.org/problem?id=3576). So I see online there are some algorithms for it like Hopcroft's, but I was trying to think about it a different way that I haven't gotten to work.

So I was thinking about it by starting with an automaton with a start and a sink node and (|w|-1)-long chains for each word w. And so in the first pass we repeatedly merge nodes with the same first character until we can't anymore. And then we repeat the same process from the suffix end of the words. I've been trying to implement this for a while but it doesn't seem to work. This is what I've written so far:

t = int(input())

def remove_prefixes(group, completed):
    compressed = 0
    group.sort()
    i = 0
    while i < len(group):
        j = i - 1
        subgroup = []
        while j + 1 < len(group) and group[j + 1][0] == group[i][0]:
            if len(group[j + 1]) >= 2:
                subgroup.append(group[j + 1][1:])  # remove 1st char
            j += 1
        if j - i == 0:
            completed.append(group[i][1:])
        else:
            compressed += len(subgroup) - 1  # When merging n nodes, we decrease by n-1
            compressed += remove_prefixes(subgroup, completed)  # Continue recursively removing from this group 
        i = j + 1
    return compressed

for test in range(t):
    n = int(input())
    language = [input() for i in range(n)]
    nodes = 2 + sum(len(word) - 1 for word in language)
    completed = []
    nodes -= remove_prefixes(language, completed)
    # Remove the prefixes from the reversed strings
    remaining = [word[::-1] for word in completed if len(word) > 0]  # Exclude empty strings
    nodes -= remove_prefixes(remaining, [])
    print(nodes)

But a cleaner solution my friend showed me was the following:

    n = int(input())
    language = [input() for i in range(n)]
    suffix_sets = set()
    prefix_suffixes = {}
    for word in language:
        for i in range(0, len(word) + 1):
            prefix = word[:i]
            suffix = word[i:]
            if prefix in prefix_suffixes:
                prefix_suffixes[prefix].add(suffix)
            else:
                prefix_suffixes[prefix] = {suffix}
    suffix_sets = set(str(val) for val in prefix_suffixes.values())
    print(len(suffix_sets))

Which I see the idea for, but I'm wondering if my approach could be correct as well.

Rev.	Кто	Когда	Δ	Комментарий
en4	johnkim3104	2022-10-23 01:03:21	207
en3	johnkim3104	2022-10-23 01:01:08	2	(published)
en2	johnkim3104	2022-10-23 01:00:54	28	(saved to drafts)
en1	johnkim3104	2022-10-23 00:59:40	2591	Initial revision (published)

Rev.

Язык

Кто

Когда

Комментарий

en4