Minimization of DFA — different approach ideas?

Revision en2, by johnkim3104, 2022-10-23 01:00:54

I got this problem from a friend: return the minimum number of states needed to produce a DFA that recognizes a provided language L (link; http://poj.org/problem?id=3576). So I see online there are some algorithms for it like Hopcroft's, but I was trying to think about it a different way that I haven't gotten to work.

So I was thinking about it by starting with an automaton with a start and a sink node and (|w|-1)-long chains for each word w. And so in the first pass we repeatedly merge nodes with the same first character until we can't anymore. And then we repeat the same process from the suffix end of the words. I've been trying to implement this for a while but it doesn't seem to work. This is what I've written so far:

t = int(input())

def remove_prefixes(group, completed):
    compressed = 0
    group.sort()
    i = 0
    while i < len(group):
        j = i - 1
        subgroup = []
        while j + 1 < len(group) and group[j + 1][0] == group[i][0]:
            if len(group[j + 1]) >= 2:
                subgroup.append(group[j + 1][1:])  # remove 1st char
            j += 1
        if j - i == 0:
            completed.append(group[i][1:])
        else:
            compressed += len(subgroup) - 1  # When merging n nodes, we decrease by n-1
            compressed += remove_prefixes(subgroup, completed)  # Continue recursively removing from this group 
        i = j + 1
    return compressed

for test in range(t):
    n = int(input())
    language = [input() for i in range(n)]
    nodes = 2 + sum(len(word) - 1 for word in language)
    completed = []
    nodes -= remove_prefixes(language, completed)
    # Remove the prefixes from the reversed strings
    remaining = [word[::-1] for word in completed if len(word) > 0]  # Exclude empty strings
    nodes -= remove_prefixes(remaining, [])
    print(nodes)

But a cleaner solution my friend showed me was the following:

    n = int(input())
    language = [input() for i in range(n)]
    suffix_sets = set()
    prefix_suffixes = {}
    for word in language:
        for i in range(0, len(word) + 1):
            prefix = word[:i]
            suffix = word[i:]
            if prefix in prefix_suffixes:
                prefix_suffixes[prefix].add(suffix)
            else:
                prefix_suffixes[prefix] = {suffix}
    suffix_sets = set(str(val) for val in prefix_suffixes.values())
    print(len(suffix_sets))

Which I see the idea for, but I'm wondering if my approach could be correct as well.

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en4 English johnkim3104 2022-10-23 01:03:21 207
en3 English johnkim3104 2022-10-23 01:01:08 2 (published)
en2 English johnkim3104 2022-10-23 01:00:54 28 (saved to drafts)
en1 English johnkim3104 2022-10-23 00:59:40 2591 Initial revision (published)