2 sum problem extended to distributed systems / map-reduce

So recently I was interviewing for a big data engineer role. I had cleared the first round, which was just DSA, SQL, and a discussion of the big data tech I had worked on. In the second round I came across an interesting problem, so I wanted to share it with you all. It was the same as 2 sum, but I was expected to use map-reduce:

Problem statement:

There are a large number of numbers residing on multiple nodes (storage + commodity compute). You have one main compute server where you can perform your computation. Find all distinct pairs that sum to "target", along with the count of those pairs. You can't load all the numbers onto your main compute, as it does not have enough storage to do so.

Find all pairs (x, y) with x + y = target.

The solution the interviewer expected:

Approach: Map-Reduce

Send a map program to all the nodes, where the map function calculates the following for each distinct number stored on that node:

(number,count(number))
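
A rough sketch of what that map step might look like in Python. The function name `map_node` and the idea that each node's data arrives as a plain iterable of integers are my assumptions for illustration, not something the interviewer specified:

```python
from collections import Counter

def map_node(numbers):
    """Hypothetical map step run on a single storage node.

    Emits one (number, count) tuple per distinct number held locally,
    so only the distinct values travel over the network.
    """
    return list(Counter(numbers).items())
```

For example, a node holding `[3, 1, 3]` would emit `(3, 2)` and `(1, 1)`. Note that the same number can still show up in the output of several nodes, so the counts have to be summed later.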

Next is the reduce phase, where we use the following hashing scheme to assign the (number, count) info computed on the various nodes into buckets on our main compute.

The bucket a given number is assigned to is calculated as follows: min(number, target-number)

That way, two numbers x and y with x + y = target are always assigned to the same bucket, because y = target-x gives min(x, target-x) = min(y, target-y), and we can compute our answers easily. (There is an edge case when number = target-number, where the number pairs with itself and needs a count of at least 2, but that can be handled.)
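
Here is a minimal sketch of the bucketing and reduce step in Python, under a few assumptions of mine: the names `bucket_key` and `reduce_pairs` are made up, "count of those pairs" is read as count(x) * count(y) for x != y and c * (c - 1) / 2 when x = y, and all buckets are kept in one dict for brevity. In a real setup each bucket (or range of buckets) would presumably be handled separately so the main compute never holds everything at once.

```python
from collections import defaultdict

def bucket_key(number, target):
    # Both members of a valid pair map to the same key.
    return min(number, target - number)

def reduce_pairs(mapped, target):
    """mapped: iterable of (number, count) tuples collected from all nodes.

    Returns {(x, y): pair_count} with x <= y and x + y == target.
    """
    # bucket key -> {number: total count across nodes}
    buckets = defaultdict(lambda: defaultdict(int))
    for number, count in mapped:
        buckets[bucket_key(number, target)][number] += count

    result = {}
    for key, counts in buckets.items():
        x, y = key, target - key
        if x == y:
            # Edge case: the number pairs with itself, needs at least two copies.
            c = counts[x]
            if c >= 2:
                result[(x, y)] = c * (c - 1) // 2
        elif x in counts and y in counts:
            result[(x, y)] = counts[x] * counts[y]
    return result
```

With `target = 10` and mapped output `[(1, 2), (9, 1), (5, 3), (5, 1), (4, 1)]`, this gives `{(1, 9): 2, (5, 5): 6}`: the two 5-entries from different nodes are merged into a count of 4 before the pairs are counted, and 4 has no partner 6.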

Now I know this question has a bunch of loopholes, but I hope the intent of the question was understood.

This was the first time I came across such a problem. Can someone recommend resources to learn/practice such problems?
