How to merge a set of 3-grams?

Revision en4, by egor.okhterov, 2016-12-03 18:36:17

We have all the possible 3-grams of lower case letters (A = {a, b, c, ..., z}):
A3 = A × A × A = 
{
  (a, a, a), 
  (a, a, b), 
  (a, a, c), 
    ..., 
  (a, a, z), 
  (b, a, a), 
    ..., 
  (z, z, y), 
  (z, z, z), 
}

|A3| = 263 = 17576

  • If we try to merge aaa with bbb we have to append bbb to the end of aaa, because they don't overlap:
    merge(aaa, bbb) = aaabbb

  • If we merge aaa with aab we can create a smaller string:
    merge(aaa, aab) = aaab

In the best case, if we manage to always do a second kind of merge, we will be extending the resulting string by 1 character. There are 17576 triplets, so there can be at most 17575 single character extensions. This will result in 17575 + 3 = 17578 characters string.

In the worst case, we will append all of the 3-grams to each other, thus creating a 17576 * 3 = 52728 characters string.

How to combine all of them to get the total string of the minimum length?


Picture to attract people :)

Tags de bruijn sequence, de bruijn graph, strings

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
ru3 Russian egor.okhterov 2016-12-03 18:40:02 135 Мелкая правка: '*\n\n---\n![ ](htt' -
en4 English egor.okhterov 2016-12-03 18:36:17 117 Tiny change: ' length?**' -
en3 English egor.okhterov 2016-12-03 17:47:10 6 Tiny change: '-grams of smaller case le' -> '-grams of lower case le'
ru2 Russian egor.okhterov 2016-12-03 17:10:40 2 (опубликовано)
ru1 Russian egor.okhterov 2016-12-03 17:08:48 1380 Первая редакция перевода на Русский (сохранено в черновиках)
en2 English egor.okhterov 2016-12-03 15:28:51 948 Tiny change: 'e letters:\n$A_3 = $' - (published)
en1 English egor.okhterov 2016-12-03 15:05:57 182 Initial revision (saved to drafts)