i_love_emilia_clarke's blog

By i_love_emilia_clarke, history, 6 years ago, In English

Given an array A of N integers answer Q queries of type (L, R). Answer to a query (L, R) is the sum of distinct integers in the range A[L...R]

Constraints: N, Q <= 100000

0 <= L <= R < N

abs(A[i]) <= 1 000 000 000

I read somewhere that this can be done using persistence segment tree.

So, it would be helpful if one can explain that approach.

Other approach are welcome too.

 
 
 
 
  • Vote: I like it
  • +11
  • Vote: I do not like it

| Write comment?
»
6 years ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

I don't now the persistent tree solution,but I can tell you a nice offline solution. Let's use B[]=an array in which we can add/delete elemnts/query the sum in O(logN).So B can be basically a standard Fenwick Tree or Segment Tree. It is easy to understand that the sum of the distinct integers in the range [L,R] is the sum of the integers that appear for the last time in the sequence(we simply add each element the last time it occurs in the subsegment).We sort the queries by R and solve them offline.At each i (1<=i<=N) we insert on the position i of B[] the value of A[i].Let ist(1<=lst<i) the last occurence of A[i] until i.We delete the number for B[lst](we need only the last occurence before R for each query).After that,we solve all queries with R=i.If we query the sum on (L,i) each integer in the subsegment appears exactly once,so if we take the sum B[L,i],we have the sum of the distinct integers in the subsegment [L,i] of A.We answer each query in the sorted order like this then we output them in the right order. Note:For finding lst fast you can use map/unordered_map. PS.You can also come up with an O(NsqrtN+QsrtN) solution using Mo's algorithm,but I feel the first solution is easier to write and is definetely faster.

»
6 years ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

Imagine if you created a segment tree for every range of the array. To answer a query you just pick the segment tree corresponding to the range [L,R] and get your answer from the root of the tree. Query is O(1) but the construction would be O(N^3) which isn't feasible. What you can do is build a segment tree for every prefix of the array which can be done in O(NlogN). (This is what persistent segment tree is basically). Answer for a query [L,R] would just be answer for ans_for(R) — ans_for(L-1). Use coordinate compression initially to map the values in the range [1,N].
EDIT : ^This wouldn't work. Instead of just storing counts of values in segment tree node, store count of their rightmost occurences.

  • »
    »
    6 years ago, # ^ |
      Vote: I like it +1 Vote: I do not like it

    He wants the sum of distinct integers in the specified.range. Simply doing ans(R) — ans(L — 1) won't work here.

    • »
      »
      »
      6 years ago, # ^ |
      Rev. 2   Vote: I like it -9 Vote: I do not like it

      I dont see why not , we can traverse both the trees simultaneously like we do to find the Kth smallest number in a range. EDIT : My bad!

  • »
    »
    6 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    When you remove the contribution of distinct integer by subtracting ans_for(L-1), you are not simultaneously adding(if there are any) integers which are in the range L to R but were not counted because of ans_for(L-1)

    • »
      »
      »
      6 years ago, # ^ |
        Vote: I like it -9 Vote: I do not like it

      say the array is A = [1,2,2,3,3,3,4,4,4,4] , first store the rightmost occurence of every index , for this array right_most_occur = [11,3,11,5,6,11,8,9,10,11] , where 11 is a way of storing that the right most occurence does not exist. Now the distinct integers in a range [L,R] are all those indexes i where right_most_occur[i] > R. Should be easy to figure out the persistent segment tree solution now.

»
6 years ago, # |
  Vote: I like it +30 Vote: I do not like it

A persistent segment tree is able to support queries of the form : Count the number of values <= X in the range [L, R]. Using it, we can solve the given problem online (i.e without preprocessing the queries)

Let's solve a different problem first using persistent segment tree: Number of distinct integers in a range. DQUERY

Let prev[i] denote the previous occurrence of arr[i]. Replace every element of your array by the location of it's previous index (i.e by prev[i]).

For, example

1 2 1 2 3 

(1-based indexing) will now become

0 0 1 2 0

Now the answer for the number of distinct values in range [L, R] is simply the number of values in this new array <= L - 1 in the range [L, R]. Why? Because if a value occurs more than once in the range, all its occurrences except the first one will have prev[i] >= L. So, by restricting ourselves to values <= L - 1, we are only considering the first occurrence in the range.

When adding index i of the array to your persistent segment tree, you need to add 1 to index prev[i] of the persistent segment tree.

Now, lets come back to your original problem. It can be solved in a similar fashion. You will need to create a persistent segment tree which supports weighted sum for all values <= X in range [L, R]. (This is a trivial modification to the usual persistent segment tree. You may practice on this problem if you wish.)

When adding index i of the array to your persistent segment tree, you need to add a weight of arr[i] to index prev[i] of the persistent segment tree.

Time complexity : O(N * log(N) + Q * log(N) )