Minimization of DFA — different approach ideas?

Правка en3, от johnkim3104, 2022-10-23 01:01:08

I got this problem from a friend: return the minimum number of states needed to produce a DFA that recognizes a provided language L (link; http://poj.org/problem?id=3576). So I see online there are some algorithms for it like Hopcroft's, but I was trying to think about it a different way that I haven't gotten to work.

So I was thinking about it by starting with an automaton with a start and a sink node and (|w|-1)-long chains for each word w. And so in the first pass we repeatedly merge nodes with the same first character until we can't anymore. And then we repeat the same process from the suffix end of the words. I've been trying to implement this for a while but it doesn't seem to work. This is what I've written so far:

t = int(input())

def remove_prefixes(group, completed):
    compressed = 0
    group.sort()
    i = 0
    while i < len(group):
        j = i - 1
        subgroup = []
        while j + 1 < len(group) and group[j + 1][0] == group[i][0]:
            if len(group[j + 1]) >= 2:
                subgroup.append(group[j + 1][1:])  # remove 1st char
            j += 1
        if j - i == 0:
            completed.append(group[i][1:])
        else:
            compressed += len(subgroup) - 1  # When merging n nodes, we decrease by n-1
            compressed += remove_prefixes(subgroup, completed)  # Continue recursively removing from this group 
        i = j + 1
    return compressed

for test in range(t):
    n = int(input())
    language = [input() for i in range(n)]
    nodes = 2 + sum(len(word) - 1 for word in language)
    completed = []
    nodes -= remove_prefixes(language, completed)
    # Remove the prefixes from the reversed strings
    remaining = [word[::-1] for word in completed if len(word) > 0]  # Exclude empty strings
    nodes -= remove_prefixes(remaining, [])
    print(nodes)

But a cleaner solution my friend showed me was the following:

    n = int(input())
    language = [input() for i in range(n)]
    suffix_sets = set()
    prefix_suffixes = {}
    for word in language:
        for i in range(0, len(word) + 1):
            prefix = word[:i]
            suffix = word[i:]
            if prefix in prefix_suffixes:
                prefix_suffixes[prefix].add(suffix)
            else:
                prefix_suffixes[prefix] = {suffix}
    suffix_sets = set(str(val) for val in prefix_suffixes.values())
    print(len(suffix_sets))

Which I see the idea for, but I'm wondering if my approach could be correct as well.

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en4 Английский johnkim3104 2022-10-23 01:03:21 207
en3 Английский johnkim3104 2022-10-23 01:01:08 2 (published)
en2 Английский johnkim3104 2022-10-23 01:00:54 28 (saved to drafts)
en1 Английский johnkim3104 2022-10-23 00:59:40 2591 Initial revision (published)