[#strings #ICPC #LATAM #2006] Jukebox — can we do better than backtracking + suffix array + sparse table + binsearch?

Revision en8, by pabloskimg, 2021-04-21 15:59:18

I'm trying solve problem Jukebox from ICPC Latin America Regional 2006. You can read the problem statement and submit solutions here. You can also find the official inputs and outputs here and here. Please, read the problem statement to understand all the details, but here is an attempt of a summary:

Basically you are given N songs, each song with a title and an artist. There is at most 30 songs, 6 artists and all strings are at most 30 characters long each. For each song, you can decide if you keep or hide the artist. Your task then is to find the optimal way of keeping/hiding artists such that the sum total of the lengths of the songs' golden strings is minimized. A song's golden string is any shortest substring of either the song's title or artist (if the artist is not hidden) such that it is not a substring of any other song. In other words, when you type a song's golden string in a search box, only that song is matched, and there is no other shorter substring that accomplishes the same. There is always at least one valid solution.

My approach. You can find my code below.

Code

The idea: Since showing two or more times the same artist doesn't help, basically we have two options for each artist: either we hide it completely or reveal it in just one song. Given that there is only 6 artists in the worst case, I thought that backtracking would be good enough to explore all these possibilities. Then, to find out the golden string for each song, I came up with the following idea: we can concatenate all songs and titles in a single string, build a suffix array from the concatenation, and then for each song we can simulate that we do a search by iteratively typing each suffix of the title and each suffix of the artist. As we type a suffix, we can progressively narrow the scope of matches in the suffix array using two binary searches (lower bound and upper bound). We can check if in the current range only a single song is matched by doing a range query with a sparse table that computes the bitwise or over a range (we can encode each song as 1 << song_id, since there is at most 30 songs). This solution is correct, you can check that it gets all the outputs right (see official inputs and outputs at the begginning of this post). Unfortunately, it is too slow :(, because it does backtracking, which in the worst case involves checking 6^6 = 46,656 cases, and for each case I concatenate strings, build a suffix array, build a sparse table, try all suffixes for each song and do binary searches and queries to the sparse stable. So it's just way too slow.

Does anyone have any idea how to do better?

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en11 English pabloskimg 2021-04-22 07:38:46 1 Tiny change: 'at the begginning of' -> 'at the beginning of'
en10 English pabloskimg 2021-04-21 21:43:47 3 Tiny change: ''m trying solve pro' -> ''m trying to solve pro'
en9 English pabloskimg 2021-04-21 16:25:38 5 Tiny change: 'hat there is only 6 ar' -> 'hat there are only 6 ar'
en8 English pabloskimg 2021-04-21 15:59:18 3 Tiny change: 'bstring the accomplis' -> 'bstring that accomplis'
en7 English pabloskimg 2021-04-21 15:57:04 0 (published)
en6 English pabloskimg 2021-04-21 15:53:30 562
en5 English pabloskimg 2021-04-21 15:47:21 464
en4 English pabloskimg 2021-04-21 15:42:36 1123
en3 English pabloskimg 2021-04-21 15:32:26 578 Tiny change: 'a summary: basically y' -> 'a summary:\n\nBasically y'
en2 English pabloskimg 2021-04-21 15:13:08 940
en1 English pabloskimg 2021-04-21 14:53:44 6123 Initial revision (saved to drafts)