pabloskimg's blog

By pabloskimg, history, 3 years ago, In English

I'm trying to solve problem Jukebox from ICPC Latin America Regional 2006. You can read the problem statement and submit solutions here. You can also find the official inputs and outputs here and here. Please, read the problem statement to understand all the details, but here is an attempt of a summary:

Basically you are given N songs, each song with a title and an artist. There is at most 30 songs, 6 artists and all strings are at most 30 characters long each. For each song, you can decide if you keep or hide the artist. Your task then is to find the optimal way of keeping/hiding artists such that the sum total of the lengths of the songs' golden strings is minimized. A song's golden string is any shortest substring of either the song's title or artist (if the artist is not hidden) such that it is not a substring of any other song. In other words, when you type a song's golden string in a search box, only that song is matched, and there is no other shorter substring that accomplishes the same. There is always at least one valid solution.

My approach. You can find my code below.

Code

The idea: Since showing two or more times the same artist doesn't help, basically we have two options for each artist: either we hide it completely or reveal it in just one song. Given that there are only 6 artists in the worst case, I thought that backtracking would be good enough to explore all these possibilities. Then, to find out the golden string for each song, I came up with the following idea: we can concatenate all songs and titles in a single string, build a suffix array from the concatenation, and then for each song we can simulate that we do a search by iteratively typing each suffix of the title and each suffix of the artist. As we type a suffix, we can progressively narrow the scope of matches in the suffix array using two binary searches (lower bound and upper bound). We can check if in the current range only a single song is matched by doing a range query with a sparse table that computes the bitwise or over a range (we can encode each song as 1 << song_id, since there is at most 30 songs). This solution is correct, you can check that it gets all the outputs right (see official inputs and outputs at the beginning of this post). Unfortunately, it is too slow :(, because it does backtracking, which in the worst case involves checking 6^6 = 46,656 cases, and for each case I concatenate strings, build a suffix array, build a sparse table, try all suffixes for each song and do binary searches and queries to the sparse stable. So it's just way too slow.

Does anyone have any idea how to do better?

  • Vote: I like it
  • +14
  • Vote: I do not like it