MahmoudSayed's blog

By MahmoudSayed, 9 years ago, In English

There are 1 billion stars and you are standing at earth. Form the earth you know the distance of all 1 billion stars. Find the nearest 1 million stars from the earth.

Thanks in advance.

  • Vote: I like it
  • +2
  • Vote: I do not like it

»
9 years ago, # |
Rev. 2   Vote: I like it +9 Vote: I do not like it

If all stars are in an array, we assign each star a value equal to the distance of that star to earth. Then the median of medians algorithm finds the k-smallest element in O(n) time worst-case, and also partitions the array such that all elements smaller than the k-smallest element are left of it, and the elements larger are right of it (i.e. the k-smallest element is in position k in a 0-based array).

Note that this is asymptotically optimal, not necessarily faster in practice. An easier approach would be to just sort the array in O(n lg n).

  • »
    »
    9 years ago, # ^ |
    Rev. 2   Vote: I like it +1 Vote: I do not like it

    Firstly, I wanna thank you for your comment.

    Secondly, 1 billion is still a large number to store it in an array So i thought about another solution and I found this one out Using Binary Search Tree.

    While reading the input I will insert the current number(distance) to the tree.
    when the size of the tree is larger than 1 million I will delete the largest one and insert the current element(only if the current element is smaller than the largest one in the tree) otherwise ignore this number.
    Finally the tree has the the nearest 1 million stars and I can print them.

    while(cin >> n ) {
       if(NumberOfElementsInTheTree > int(1e6)) {
           if(n > TheLargestElementInTheTree) continue; 
           else Delete(TheLargestElementInTheTree); 
       }
       Insert(n); 
    }
    

    Cause I'm newbie with the Binary Search Tree I think this algorithm's complexity will be O(nlog(n)) (Without printing)
    I dunno if This algorithm and its complexity are right or not.

    • »
      »
      »
      9 years ago, # ^ |
        Vote: I like it +4 Vote: I do not like it

      O(m log(n)) where m is 1 billion, n is 1 million. Also, it's better to use a priority queue because it has O(1) access to the largest element, so some loop iterations will take O(1) instead of O(log n). Best case time (when all n largest elements are already in the beginning) will be O(m + n log(n)).

»
9 years ago, # |
Rev. 2   Vote: I like it +2 Vote: I do not like it

Have a sliding window of size 1 million and imagine it to be a max-heap. The problem boils down to finding K smallest elements from N elements where K is 1e6 and N is 1e9. When we slide the max heap over the billion distances, at the end of scan we will have nearest 1 million distances.

Steps : a) Add first K(1 million) distances into max-heap. b) For K+1 element, to add this into the heap, it needs to be smaller than the max heap root. If yes, extract-max and add this (K+1)th element into the heap and heapify. Else, continue scanning. c) At the end of scan, max heap has the answer.

Please correct me if I am wrong.

Complexity(worst case) : (N-K) extracts from Max heap and heapify = (N-K) log(K) plus O(K) for build Heap with K elements to begin with. => O(K+ (N-K) logK)

»
9 years ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

You may do it in O(n). Add elements one by one. When there's 2k elements run nth_element and remove k of them

  • »
    »
    9 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I can't understand what you want to say.
    Could you write a Pseudocode for your idea, please?! riadwaw

    • »
      »
      »
      9 years ago, # ^ |
        Vote: I like it +3 Vote: I do not like it
      vector v
      for value in input:
         v.push_back(value)
         if v.size >= 2 * k:
             nth_element(v)
             //remove all elements except first k
      nth_element(v)
      //remove all elements except first k
      
      //v contains k nearest points here
      

      Rearranges the elements in the range [first,last), in such a way that the element at the nth position is the element that would be in that position in a sorted sequence.

      The other elements are left without any specific order, except that none of the elements preceding nth are greater than it, and none of the elements following it are less.

      http://www.cplusplus.com/reference/algorithm/nth_element/

      • »
        »
        »
        »
        9 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        nth_element itself only o(n) in average but afaik it's possible to find median in O(n) in worst case too (and if you have a median you may easily rearrange elements as nth_elements do.