Find strings containing a certain substring dynamically with queries

→ Обратите внимание

До соревнования
Codeforces Round 941 (Div. 1)
2 дня
Зарегистрироваться »

*есть доп. регистрация

До соревнования
Codeforces Round 941 (Div. 2)
2 дня
Зарегистрироваться »

*есть доп. регистрация

→ Трансляции

AMA: TheOneYouWant

aryanc403

До начала 26:09:34

Всё →

→ Лидеры (рейтинг)

№	Пользователь	Рейтинг
1	ecnerwala	3649
2	Benq	3581
3	orzdevinwang	3570
4	Geothermal	3569
4	cnnfls_csy	3569
6	tourist	3565
7	maroonrk	3531
8	Radewoosh	3521
9	Um_nik	3482
10	jiangly	3468

Страны | Города | Организации

Всё →

→ Лидеры (вклад)

№	Пользователь	Вклад
1	maomao90	174
2	awoo	164
3	adamant	163
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	151
8	SecondThread	147
9	orz	146
10	pajenegod	145

Всё →

→ Найти пользователя

→ Прямой эфир

Детальнее →

Блог пользователя Robbinb1993

Find strings containing a certain substring dynamically with queries

Автор Robbinb1993, история, 3 года назад, По-английски

Hi,

For this problem I have a set S with initially 0 strings.

There are three types of queries:

Add a string to set S with id = queryId.
Remove the string from set S with id = removeId.
Read a string M and print all strings in set S that have M as substring.

Let N be the size of S.

How would I solve this efficiently? I hope to reach a complexity of O((length of search string) + N) for query 3 and O(length of string) for queries 1 and 2. Is this possible, with for example a suffix tree?

I would like to use this for a fast search engine for a (big) data set that is changing dynamically.

strings

Robbinb1993
3 года назад
13

Комментарии (13)

Написать комментарий?

Robbinb1993

3 года назад, # |

Auto comment: topic has been updated by Robbinb1993 (previous revision, new revision, compare).

→ Ответить

Robbinb1993

3 года назад, # |

Auto comment: topic has been updated by Robbinb1993 (previous revision, new revision, compare).

→ Ответить

Lezendary_sandwich

3 года назад, # |

You can do it with an extra log overhead using suffix automaton.

→ Ответить

Robbinb1993

3 года назад, # ^ |

Cool, do you have any sources how to support the removal of a string from the suffix automaton?

→ Ответить

Lezendary_sandwich

3 года назад, # ^ |

https://codeforces.com/edu/course/2/lesson/2

→ Ответить

Robbinb1993

3 года назад, # ^ |

Thanks, I see this is a suffix array instead of suffix automaton though. But both should be possible. I don't see the mentioning of deleting an added string from the suffix array, I am not sure if it is possible to do this after creation.

→ Ответить

Robbinb1993

3 года назад, # ^ |

For suffix automaton and suffix array it is built from only one string, unlike suffix tree. So we should use a delimiter to support multiple strings to be searched. I am not sure if this would support a deletion operation of one of the substrings contained in the concatenated string from which the data structure was built.

→ Ответить

Lezendary_sandwich

3 года назад, # ^ |

Create suffix arrays for strings being added $$$O(S\log(s))$$$. When deleting remove their suffix arrays.

→ Ответить

Robbinb1993

3 года назад, # ^ |

← Rev. 2 →

I see. But I think the complexity for a search will be more efficient when using a Suffix tree. It will only be a traversal of M characters (the search pattern string length) and the vertices in the suffix tree will store all ID's of the strings that are below it in the tree (which are added when the string was added to the suffix tree, on its path). If we are at our final character of the search pattern, we will return the set of strings with endpoints below this vertex in the tree and this is our result. Which is only O(M + N) for a search.

→ Ответить

radoslav11

3 года назад, # ^ |

← Rev. 3 →

You can actually generalise the suffix automaton for multiple strings too (without adding the annoying delimiter) with a very small modification to the construction algorithm. Basically, whenever you're adding a new string, you reset the last node to 0/the root. You then need to also handle the case of already having a transition with the letter you're adding, but that's pretty similar to the normal SA cases. Here's the implementation I use. When you add a new string, you simply set last=0 and add the letter as normal.

Note that for the specific problem you have, this approach needs some tweaking as if the queries come online, you'll might change the structure of the SA, meaning that you need to be careful if you use the internal suffix links that form a tree (which is probably the easiest way to perform what you described). The easiest solution is probably to solve everything offline, by pre-building the generalised SA, and maintaining the suffix tree with a flag on every node. Now every addition/removal of a string is equivalent to adding/removing 1 on a path to the root of a tree, and the query is equivalent to counting the number of nonzero elements. All of those can be handled easily in $$$O(\log)$$$.

→ Ответить

Robbinb1993

3 года назад, # ^ |

Thank you, that is a nice modification. In my case it has to be used in practice where you don't know in front what queries and what input will be given, similar to the interactive problems. So unfortunately an offline solution is not possible. The use case is a big product table of a retailer in which data can be modified, added and deleted at any time. And users should be able to search quickly for search pattern matches in this list of data.

→ Ответить

radoslav11

3 года назад, # ^ |

If you need it for production, maybe a relatively good idea would be to use bucketing + the idea I mentioned, say by rebuilding the SA on every $$$B=500$$$ updates. Then you'll be able to query the latest version to get the matches and after that you'll only cross check with at most $$$B$$$ strings (which should be fast enough).

→ Ответить

Robbinb1993

3 года назад, # ^ |

Yes I think this is the way to go. Seems like a deletion operation is not so easy to implement for suffix data structures. Not a lot can be found about this.

→ Ответить

Соревнования по программированию 2.0

Время на сервере: 25.04.2024 15:20:26 (l2).

Десктопная версия, переключиться на мобильную.

При поддержке