Блог пользователя OneClickAC

Автор OneClickAC, история, 7 лет назад, По-английски

Hello codeforces community, I was solving this problem (http://codeforces.com/blog/entry/46425) on codeforces and came across an interesting doubt (at least according to me) . I just wanted to know that is there any algorithm for finding frequency of any number in a particular range in an array?

We use Mo`s Algorithm for finding max frequency in a range but is there any other one which is faster for this particular problem of mine (calculating frequency of any number)?

I tried writing an algorithm for it but I believe its too slow to pass in one second. Although I think that my build takes O (n log n) time and queries come out in O (log n) time but I believe it's the uncertainty of hashing that`s making the algorithm slower.

Here is my code — https://pastebin.com/Ypj90749

Note- Please do not tell me the solutions of the codeforces problem. I have already read the editorial and their implementation is pretty simple. I just want to get an answer for my curiosity to be able to learn a new beautiful algorithm (if there is) and help some other learners like me.

  • Проголосовать: нравится
  • +5
  • Проголосовать: не нравится

»
7 лет назад, # |
  Проголосовать: нравится +1 Проголосовать: не нравится

Read about merge sort tree it works if there are no updates.

  • »
    »
    7 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    I even thought of this too but I believe it will be even slower than that ( on average ) as it will take o ( log n * log n ) time for answering each query.

    log n for first finding node while traversing and other log n for getting the count.

    That`s why I used unordered_map so that I am atleast free of the other log n .

»
7 лет назад, # |
  Проголосовать: нравится +3 Проголосовать: не нравится

I thought of a solution using persistent segment tree. It is also O(nlogn) preprocessing and O(logn) to answer queries.

To build the segment tree, we compress the numbers in the array (using rank). Then, we can loop through the range of ranks and add the numbers in increasing rank to the segment tree (each iteration will add elements of one rank only). Now we have a persistent segment tree, with the kth version being able to answer queries specifying what is the number of elements less than or equal to an element of rank k (in a range). Now to obtain our desired answer, we can simply take:

number of elements in the kth version of the tree — number of elements in the (k-1)th version of the tree.

This will give us the number of elements that are exactly equal to that number (i.e. the number of occurences of that rank k element in the entire array). Now you can just limit the query range to find the frequency of the element within a certain range.

  • »
    »
    7 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Thanks for taking time for writing this Lance_HAOH but can you please explain a bit more? Persistent segment tree is a completely new data structure for me and I would love if you can explain your solution in a little more easy way.

»
7 лет назад, # |
  Проголосовать: нравится +26 Проголосовать: не нравится

You can use a vector to store the positions where a number occurs, for each number!
Then just do Binary Search on the the list of a number to count it's number of occurrences in a range. O(n) Preprocessing and for query. I mean, do upperbound(r) - lowerbound(l) on the list of the positions of the number. If there are updates you can use Policy based Data structures (http://codeforces.com/blog/entry/11080). Then Preprocess and query .

  • »
    »
    7 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Your solution is good but I believe if I am having a[i] <= 1e9 , then I will have to use map instead of a vector and things will become O(log n * log n) again.

    It will work according to your time complexities only if I have say a[i] <= 1e6.

    Am I correct?

    • »
      »
      »
      7 лет назад, # ^ |
        Проголосовать: нравится +10 Проголосовать: не нравится

      Compress the numbers so it will become O(logn) again.

      • »
        »
        »
        »
        7 лет назад, # ^ |
          Проголосовать: нравится 0 Проголосовать: не нравится

        What do you mean by compressing?

        I didn`t understand this.

        • »
          »
          »
          »
          »
          7 лет назад, # ^ |
            Проголосовать: нравится 0 Проголосовать: не нравится

          It means each number turnes into his position in the sorted array built from all unique numbers of the given array. So if non-compressed array is defined by A and compressed by B, then for each i, j holds: if A[i] < A[j] (= or >) then B[i] < B[j] (= or > respectively).

          As far as the maximum element can't be more than n you can use array B more efficiently and without loss of the relations between elements.

          As an example, if A = [38, 100, 100500, 42, 100, 100, 100500], then B = [1, 3, 4, 2, 3, 3, 4].

    • »
      »
      »
      7 лет назад, # ^ |
      Rev. 2   Проголосовать: нравится +7 Проголосовать: не нравится

      Why ? One is for access to the map and the next is for binary search, not binary searches, isn't it?

  • »
    »
    7 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    My implementation according to Rezwan.Arefin01:

    Code:
»
7 лет назад, # |
  Проголосовать: нравится +34 Проголосовать: не нравится

If you are allowed to solve it offline (and since you are using Mo's algorithm, I guess you are allowed) then you can do the following:

  1. split each query into freq(r,x)-freq(l-1,x)
  2. iterate over the input and store the counts in an array (or map if the range is big)
  3. if there is a query on your position, add/subtract the count from the array

This should work in O(n + q) if the elements are small enough for array or O(n + q log n) if you need to use map.

  • »
    »
    7 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Thanks for taking time to help me out ania.

    Can you explain point number 2 again or I should say, can you please elaborate it?

    I believe explaining things with a live example will be even better :)

    • »
      »
      »
      7 лет назад, # ^ |
        Проголосовать: нравится +20 Проголосовать: не нравится

      Let's say input array is 1 1 2 1 3 and we have queries: X (range=1-2, element=1), Y (range=2-5 element=1) and Z (range=3-5, element=3). Let +X be the upper part and -X the lower part of decomposed query X. In total, we have queries:

      pos=0, queries={-X}
      pos=1, queries={-Y}
      pos=2, queries={-Z, +X}
      pos=3, queries={}
      pos=4, queries={}
      pos=5, queries={+Y, +Z}
      

      The iteration goes as follows:

      pos=0, counts=0 0 0, result=0 0 0
      process query -X: subtract value counts[1]=0 from result of X
      pos=1, counts=0 0 0, result=0 0 0
      add element A[1]=1
      pos=1, counts=1 0 0, result=0 0 0
      process query -Y: subtract value counts[1]=1 from result of Y
      pos=2, counts=1 0 0, result=0 -1 0
      add element A[2]=1
      pos=2, counts=2 0 0, result=0 -1 0
      process query -Z: subtract value counts[3]=0 from result of Z
      process query +X: add value counts[1]=2 to result of X
      pos=3, counts=2 0 0, result=2 -1 0
      add element A[3]=2
      pos=4, counts=2 1 0, result=2 -1 0
      add element A[4]=1
      pos=5, counts=3 1 0, result=2 -1 0
      add element A[5]=3
      pos=5, counts=3 1 1, result=2 -1 0
      process query +Y: add value counts[1]=3 to result of Y
      pos=5, counts=3 1 1, result=2 2 0
      process query +Z: add value counts[3]=1 to result of Z
      pos=5, counts=3 1 1, result=2 2 1
      
      • »
        »
        »
        »
        7 лет назад, # ^ |
          Проголосовать: нравится +10 Проголосовать: не нравится

        I believe there is a reason why your contribution in codeforces community is 100 and I have got that :)

        This is a beauty, such an amazing explaination :)

        Although I will just disturb you for one more thing. You have divided queries into -X and X, what if there are 1e5 queries?

        I mean to ask that how to keep a note of it? I think we can use a map < int , set < int > > for this by storing position as key and query number in value (set of that key).

        We can probably make two maps for this , with one storing positions to add value and other storing positions to subtract.

        I just want to ask if there is any more effective way to do this, because my approach will increase complexity by O(n log n).

        • »
          »
          »
          »
          »
          7 лет назад, # ^ |
            Проголосовать: нравится +1 Проголосовать: не нравится

          I would use array of vectors of queries, where query type is up to your preference. It may be just integer +x or -x, it can be a pair (x,+1) or (x,-1) or even a triple like (x,+1,element). Or you can have two separate arrays of vector for + and for -, like you suggested.

          The memory taken is linear because you store only the needed values even though it seems you have nxn array.

          That's quite similar to what you want, but you don't need the log from map because keys are up to n. You also don't need the log from set because you process the whole vector/set at once.

  • »
    »
    7 лет назад, # ^ |
    Rev. 3   Проголосовать: нравится +11 Проголосовать: не нравится

    Here is my implementation, according to reply of ania..

    Code:
»
4 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Wavelet Tree.