1000000 strings problem!

By rezzaque, 8 years ago

Hello everyone! I ran into a problem at work. I have 1000000 strings made up only of 0's and 1's, and I need to check whether any of them is at least 80% similar to an arbitrary query string, where similarity is measured by Hamming distance. A linear search is not fast enough: I need an answer within 0.2 seconds. All strings have the same length of 64 characters. Any help or suggestions are welcome, have a nice day!
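Assuming "80% similar" means that at least 80% of the 64 positions agree (this is a reading of the question, not something it states), the threshold becomes a fixed bound on the Hamming distance d_H, and since d_H is an integer the cutoff is 12:

```latex
\text{similarity}(a,b) = 1 - \frac{d_H(a,b)}{64} \ge 0.8
\iff d_H(a,b) \le 0.2 \cdot 64 = 12.8
\iff d_H(a,b) \le 12
```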

Tags: hamming, searching

Comments (4)

CountZero, 8 years ago

Store the strings as 64-bit numbers. The Hamming distance between strings a and b is then popcount(a xor b), i.e. the number of set bits in their XOR. I believe a linear scan using this should fit in 0.2 s.
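A minimal C++ sketch of this suggestion, assuming the million strings have already been packed into a std::vector<std::uint64_t>. The names hamming64 and findSimilar are hypothetical, and passing maxDist = 12 assumes "80% similar" means at most 12 of the 64 bits differ. std::bitset<64>::count() is standard C++; GCC/Clang's __builtin_popcountll or C++20's std::popcount would do the same job.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hamming distance between two 64-bit strings packed into integers:
// XOR leaves a 1 exactly where the bits differ, and count() tallies them.
inline int hamming64(std::uint64_t a, std::uint64_t b) {
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}

// Linear scan over the packed strings (hypothetical helper).
// Returns the index of the first value within maxDist bits of query,
// or -1 if no stored value is close enough.
long long findSimilar(const std::vector<std::uint64_t>& db,
                      std::uint64_t query, int maxDist) {
    for (std::size_t i = 0; i < db.size(); ++i) {
        if (hamming64(db[i], query) <= maxDist) {
            return static_cast<long long>(i);
        }
    }
    return -1;
}
```

For example, findSimilar(db, query, 12) returns the index of the first stored string that differs from query in at most 12 positions, or -1. It is still a linear scan, but each comparison is one XOR plus one popcount instead of 64 character comparisons, so 10^6 entries typically take only a few milliseconds.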

rezzaque, 8 years ago (reply)

But the strings are given as text made of 0's and 1's; how do I store them as 64-bit numbers?

Aemon, 8 years ago (reply)

I mean, if the strings are just 1s and 0s, then each one is simply a number written in binary. Convert each string into a 64-bit integer by reading it as a base-2 number, which takes only basic math. For example, say your dictionary of a million strings contains this string: "0011111000100011001100001110100011001000110011101000011110100111"

This converts to 4477476230895929255. Every 64-bit pattern fits in an unsigned long long in C/C++; use the unsigned type, because a string that starts with 1 would exceed the range of a signed long long. Then just do as CountZero said and use the XOR trick to sweep through all million numbers. It should be very fast.
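A minimal sketch of this conversion, assuming each input string has exactly 64 characters with the leftmost character as the most significant bit (ordinary binary notation). parseBits64 is a hypothetical helper name; the standard library can also do this with std::stoull(s, nullptr, 2) or std::bitset<64>(s).to_ullong().

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Parse exactly 64 '0'/'1' characters into an unsigned 64-bit integer.
// Each step shifts the accumulated bits left and appends the next digit,
// so the first character ends up as the most significant bit.
std::uint64_t parseBits64(const std::string& s) {
    if (s.size() != 64) {
        throw std::invalid_argument("expected exactly 64 characters");
    }
    std::uint64_t value = 0;
    for (char c : s) {
        if (c != '0' && c != '1') {
            throw std::invalid_argument("expected only '0' or '1'");
        }
        value = (value << 1) | static_cast<std::uint64_t>(c - '0');
    }
    return value;
}
```

Applied to the example string above, parseBits64 returns 4477476230895929255.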

P_Nyagolov, 8 years ago (reply)

http://wikihow.com/Convert-from-Binary-to-Decimal

→ Reply