LLI_E_P_JI_O_K's blog

By LLI_E_P_JI_O_K, 15 months ago (translated into English)

There is a well-known algorithm for finding the longest common subsequence (LCS) of two sequences of lengths N and M in O(N·M) time: Link to the algorithm description

But I've recently heard there is a technique that reduces the running time when one of the sequences is short enough, working in O(min(N, M)² + max(N, M)). Does anybody know how to do it?

 
 
 
 

»
15 months ago, # |

Seems pretty obvious: dp[i][j] stores the least k such that the LCS of the i-th prefix of the small sequence and the k-th prefix of the large sequence is j.
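A minimal sketch of this dp (names are mine, not from the comment), using binary search over precomputed occurrence lists for the "first occurrence at or after position k" query. Note this naive version runs in O(m² log n); the replies below discuss removing the log factor.

```python
from bisect import bisect_left
from collections import defaultdict

def lcs_length(short, long):
    """dp[i][j] = least k such that LCS(short[:i], long[:k]) equals j."""
    m, n = len(short), len(long)
    INF = n + 1  # marks unreachable states
    # occurrence lists: positions of each symbol in the longer sequence
    pos = defaultdict(list)
    for k, c in enumerate(long):
        pos[c].append(k)
    dp = [[INF] * (m + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = 0  # an empty LCS needs no characters of `long`
    best = 0
    for i in range(m):
        for j in range(i + 1):
            if dp[i][j] >= INF:
                continue
            # skip short[i]: LCS value j stays reachable with the same prefix of long
            dp[i + 1][j] = min(dp[i + 1][j], dp[i][j])
            # match short[i] with its first occurrence in long at position >= dp[i][j]
            lst = pos[short[i]]
            t = bisect_left(lst, dp[i][j])
            if t < len(lst):
                dp[i + 1][j + 1] = min(dp[i + 1][j + 1], lst[t] + 1)
                best = max(best, j + 1)
    return best
```

The dp table has O(m²) states, so only the "first occurrence" query keeps this from the target bound.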

  • »
    »
    15 months ago, # ^ |

    Wow, definitely, it's quite easy, thanks a lot! :)

  • »
    »
    15 months ago, # ^ |

To move to the state dp[i + 1][j + 1] we need to find the first occurrence of the element s[i + 1] in the big sequence starting from position dp[i][j] + 1. How can this be done in O(1) if the alphabet is big? Won't there be an additional log₂(max(N, M)) factor in the time estimate?

    • »
      »
      »
      15 months ago, # ^ |

      (Kind of overkill probably.)

Let n be the length of the longer sequence and m the length of the shorter one.

The first thing we do is delete from the longer sequence, in O(n + m) time using a hashmap, all characters that don't appear in the shorter sequence. Now there are at most m symbols left in the "effective alphabet". This construction also lets us assume that the alphabet is just [0, m - 1] (this part seems unnecessary if the alphabet is already something like [0, n - 1], but it should reduce the hidden constant if the alphabet is big enough).
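This filtering and renumbering step might look like the following (a sketch; the names are mine):

```python
def compress(short, long):
    """Renumber the symbols of `short` as 0..m-1 and drop from `long`
    every symbol that never appears in `short` (the "effective alphabet")."""
    ids = {}
    for c in short:
        if c not in ids:
            ids[c] = len(ids)  # first-seen order gives ids 0, 1, 2, ...
    new_short = [ids[c] for c in short]
    new_long = [ids[c] for c in long if c in ids]
    return new_short, new_long
```

Dropped symbols of the longer sequence can never participate in a common subsequence, so the LCS is unchanged.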

One thing we may notice is that iterating over (i, j) in any kind of lexicographical order is unnecessary: we may iterate over them in order of increasing dp[i][j], which turns out to be more convenient. All dp recalculations still happen in a valid order this way.

To achieve this, we will use n + 1 queues: queue number k holds all (i, j) such that dp[i][j] = k. Then we just iterate k from 0 to n - 1 and consider all pairs in the k-th queue. There are two types of transitions: (i, j, k) -> (i + 1, j, k) simply adds a new pair to the current queue for later consideration, while (i, j, k) -> (i + 1, j + 1, nk > k) is the interesting one, but it affects only later queues.

The difference between this and the naive update order is that now the queries "find the first symbol equal to c in the longer string at position p or later" arrive in increasing order of p. Suppose we have an array of answers to all such queries for the current position p. The transition to p + 1 is simple, because the answer changes for only one symbol, namely the symbol at the p-th position of the longer string, and for that symbol the answer changes to the next position of the same symbol in the longer string. The next position of each symbol can be precalculated in O(n + m), so after this precalculation we can answer these queries in O(1), because p only increases.

TL;DR: process states in order of increasing dp[i][j]; this way the starts of the query segments only move right, which makes recalculating the answers in O(1) possible.
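Putting the pieces together, the whole O(n + m²) procedure might be sketched as follows (my own names; a sketch of the approach described above, not a definitive implementation):

```python
def lcs_fast(short, long):
    """LCS in O(n + m^2): states (i, j) are processed in increasing dp[i][j]."""
    m, n = len(short), len(long)
    # nxt_same[k] = next position > k holding the same symbol as long[k];
    # first[c]    = smallest position >= p holding symbol c (pointer p starts at 0)
    nxt_same = [n] * n
    first = {}
    for k in range(n - 1, -1, -1):
        nxt_same[k] = first.get(long[k], n)
        first[long[k]] = k
    # queues[k] holds all states (i, j) with dp[i][j] == k
    queues = [[] for _ in range(n + 1)]
    seen = [[False] * (m + 1) for _ in range(m + 1)]
    queues[0].append((0, 0))
    seen[0][0] = True
    best, p = 0, 0
    for k in range(n + 1):
        while p < k:  # advance the pointer: one symbol's answer changes per step
            first[long[p]] = nxt_same[p]
            p += 1
        for i, j in queues[k]:  # pairs appended during iteration are visited too
            best = max(best, j)
            if i == m:
                continue
            # transition 1: skip short[i]; dp value stays k (same queue)
            if not seen[i + 1][j]:
                seen[i + 1][j] = True
                queues[k].append((i + 1, j))
            # transition 2: match short[i] at its first occurrence >= k
            q = first.get(short[i], n)
            if q < n and not seen[i + 1][j + 1]:
                seen[i + 1][j + 1] = True
                queues[q + 1].append((i + 1, j + 1))
    return best
```

Each state is enqueued at most once, so the work over states is O(m²), while the pointer p sweeps the longer string exactly once, giving the O(n) term.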

      • »
        »
        »
        »
        15 months ago, # ^ |

        Ok, thanks a lot, interesting approach :)

      • »
        »
        »
        »
        15 months ago, # ^ |

It doesn't seem to be "overkill": an additional log₂(max(N, M)) factor can seriously affect the running time.