SirRembocodina's blog

By SirRembocodina, history, 5 weeks ago, translation, In English

Hi. I want to share a simple method for coordinate compression. Even noob can write it.

vector<int> a(n);
// read the vector
vector<int> b = a;
sort(b.begin(), b.end());
map<int, int> m;
for (int i = 0; i < n; i++) {
    m[b[i]] = i;
}
for (int i = 0; i < n; i++) {
    a[i] = m[a[i]];
}

Now every value of an array lies in [0, n). The most convineint it that if you need the original value for a[i], you can just write b[a[i]].

And how do you write it?

 
 
 
 
  • Vote: I like it
  • +21
  • Vote: I do not like it

»
5 weeks ago, # |
  Vote: I like it +15 Vote: I do not like it

I think when you are initializing the map, you need to use m[b[i]] instead of m[a[i]], right?

»
5 weeks ago, # |
  Vote: I like it +301 Vote: I do not like it
vector<int> d = a;
sort(d.begin(), d.end());
d.resize(unique(d.begin(), d.end()) - d.begin());
for (int i = 0; i < n; ++i) {
  a[i] = lower_bound(d.begin(), d.end(), a[i]) - d.begin();
}
//original value of a[i] can be obtained through d[a[i]]
  • »
    »
    5 weeks ago, # ^ |
    Rev. 5   Vote: I like it +87 Vote: I do not like it

    +1, this method is faster because it doesn't use sets/maps

    The one from the blog doesn't map to values 0,1,2,3,... but instead, it leaves some gaps. This might be confusing while debugging.

    EDIT: It might be even faster to sort pairs (value, index) to avoid binary search. This requires iterating the sorted sequence and doing nxt++ when we have value different than previous value. Longer implementation, better runtime (I think).

    int n = a.size();
    vector<pair<int, int>> pairs(n);
    for(int i = 0; i < n; ++i) {
    	pairs[i] = {a[i], i};
    }
    sort(pairs.begin(), pairs.end());
    int nxt = 0;
    for(int i = 0; i < n; ++i) {
    	if(i > 0 && pairs[i-1].first != pairs[i].first) nxt++;
    	a[pairs[i].second] = nxt;
    }
    
    • »
      »
      »
      5 weeks ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it
      template<class T>
      struct compress {
          vector<T> &src;
          vector<T> comped;//change this to reference to mutate inplace
          vector<T> helper;
          int n;
      
          explicit compress(vector<T> &src) : src(src), comped(src), helper(comped), n(src.size()) {
              sort(helper.begin(), helper.end());
              helper.resize(unique(helper.begin(), helper.end()) - helper.begin());
              for (int i = 0; i < n; ++i) {
                  comped[i] = lower_bound(helper.begin(), helper.end(), src[i]) - helper.begin();
              }
          }
      
          T &operator[](int i) {
              assert(i >= 0 && i < n);
              return comped[i];
          }//returns compressed val
      
          T &at(int i) {
              assert(i >= 0 && i < n);
              return helper[comped[i]];
          }//returns original val before compression
      };
      

      A more generic ,OOP implementation of this great snippet :)

    • »
      »
      »
      5 weeks ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      would you please give a small snippet of it? I'm not getting it

      • »
        »
        »
        »
        5 weeks ago, # ^ |
          Vote: I like it 0 Vote: I do not like it
        void solve() {
            int n = 5;
            vector src = {5, 4, 3, 2, 5};
            compress<int> obj(src);
            deb(obj.helper); //2 3 4 5 -> sorted non-duplicate val list
            deb(obj.comped); // 3 2 1 0 3 -> compressed list where each val<n
            deb(obj.at(4));  // 5 -> this is same as src[4] 
            deb(obj[4]);     // 3 -> the hash-key which src[4] is mapped to 
        }
        
    • »
      »
      »
      5 weeks ago, # ^ |
        Vote: I like it +1 Vote: I do not like it

      I know my method isn't fast. I just think it's very easy to comprehend for the beginners.

    • »
      »
      »
      5 weeks ago, # ^ |
      Rev. 4   Vote: I like it 0 Vote: I do not like it

      Thanks for your idea, it is greate, here is my "short" template implementation from your code UwU

      #define fi first
      #define se second
      template<typename T>
      void compress(vector<T> &a)
      {
          vector<pair<T, int> > M;
          for (int i = 0; i < a.size(); ++i)
              M.emplace_back(a[i], i);
          sort(M.begin(), M.end());
      
          a[M[0].se] = 0; /// Assign first value with zero
          for(int i = 1; i < a.size(); ++i) /// Iterate the rest and increase when adj are different
              a[M[i].se] = a[M[i - 1].se] + (M[i - 1].fi != M[i].fi); 
      }
      
      • »
        »
        »
        »
        5 weeks ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        I don't claim that [N lower bounds] are slower than [sort]. I claim that sort + lower bounds are together slower than sort (even if it's sort on pairs).

        Also, it's faster to choose the size of vector M while creating it, instead of doing emplace_back many times.

        • »
          »
          »
          »
          »
          5 weeks ago, # ^ |
          Rev. 8   Vote: I like it +10 Vote: I do not like it

          Yes sir, thanks for pointing my mistakes

          And yes sir, but I just afraid that some class might not have constructor.

          Here are my fixed versions

          Reserve + Emplace back
          Assign directly
          When it is not good to create clone value of array
          • »
            »
            »
            »
            »
            »
            5 weeks ago, # ^ |
            Rev. 2   Vote: I like it 0 Vote: I do not like it

            Is there anything in C++ that can be compared with < and != and doesn't have a constructor? Even if you create your own class with some int/string attributes, they are 0/empty by default. So it's only if you create a class with some custom constructor with arguments? Seems very rare in CP.

            Btw. I don't think something so simple should be templated and copy-pasted to use during a contest. In the long run, you are better off by just implementing it every time. This way you get to know an algorithm well and you can modify it.

            • »
              »
              »
              »
              »
              »
              »
              5 weeks ago, # ^ |
                Vote: I like it 0 Vote: I do not like it

              Yes sir, it is really just very small chances for random cases when I read other solutions with non-constructor custome class/struct, but/and all of them also dont have comparators

            • »
              »
              »
              »
              »
              »
              »
              5 weeks ago, # ^ |
              Rev. 2   Vote: I like it 0 Vote: I do not like it

              Can I ask for a question related to the implementations ?

              Yesterday I use priority_queue with value comparator and it is amolst TLE while using position comparator give TLE. Why using position are slower

              struct value_comparator {
                  bool operator < (const myClass &A, const myClass &B) const {
                      return A > B;
                  }  
              };
              
              vector<myClass> A;
              struct position_comparator {
                  bool operator < (int i, int j) const {
                      return A[i] > A[j];
                  }  
              };
              

              And is it good to use my third implementation of coordinate compression than my other two ? (comparing using position instead of directly compare the values)

              • »
                »
                »
                »
                »
                »
                »
                »
                5 weeks ago, # ^ |
                  Vote: I like it +3 Vote: I do not like it

                Accessing something in a separate array costs time and it isn't cache-efficient. It's better to have a value here.

                And is it good to use my third implementation of coordinate compression than my other two?

                Just run some tests and see which version is faster.

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  5 weeks ago, # ^ |
                    Vote: I like it 0 Vote: I do not like it

                  Oh thanks sir, I thought using index will reduce the amount of extra memory but now I know sometimes cache miss is really a problem

            • »
              »
              »
              »
              »
              »
              »
              5 weeks ago, # ^ |
                Vote: I like it 0 Vote: I do not like it

              Yes sir, but instead of copying the template to every problems, I usually only write the template<> when I face such problem, since the algorithm is simple but it is still a bunch of lines of code if we use many times :V

»
5 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

Better if(!m.count(b[i]) m[b[i]]=i;

»
5 weeks ago, # |
Rev. 3   Vote: I like it +4 Vote: I do not like it

My way

template<typename T>
void compress(vector<T> &a)
{
    int order = -1;
    set<T> S;
    map<T, int> M;
    for (T  x : a) S.insert(x);    /// Taking all unique elements
    for (T  x : S) M[x] = ++order; /// Assign value for unique elements
    for (T &x : a) x = M[x];       /// Assign value for array
}

Notice that by using set, this would work faster when the number of duplicates higher, but will work slower when all array are unique

  • »
    »
    5 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Notice that by using set, this would work faster when the number of duplicates higher, but will work slower when all array are unique

    I came across a problem on CSES Salary Queries where Code using Set+Map for compression resulted in TLE, and Code using Vector+Sort+Map gave AC.

    PS: The Time-Limit of the problem is 1s.

  • »
    »
    5 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    you don't need sets

    map<T, int> mp;
    int cnt = 0;
    for (auto v : a) mp[v];
    for (auto& e : mp) e.second = cnt++;
    for (auto& v : a) v = mp[v]; 
    
    • »
      »
      »
      5 weeks ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      Thanks for giving me a new idea

      I forgot auto method so I thought it would be long to use something like

      range based loop
      iterator

      but also I think I can use unordered_map to assigning value faster (if hash function for specific class then it would be more worse)

      unordered_map
»
5 weeks ago, # |
  Vote: I like it +11 Vote: I do not like it

I write it the same way as you

»
5 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

Can you suggest some problems based on this cool trick?

  • »
    »
    5 weeks ago, # ^ |
    Rev. 3   Vote: I like it 0 Vote: I do not like it

    Many algorithms only need the order of the number compare to each other and not they value, so something like {$$$1, 4, 9, 16$$$} can be use as {$$$1, 2, 3, 4$$$} with return the same result (smaller value, faster to do). So when you face some problem like that, we can compress the array down to smaller (since the limitation for $$$n$$$ usually much smaller than maximum value of $$$a_i$$$). One algorithm usually use this trick is Sweep Line

  • »
    »
    5 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Well, for example 61E - Enemy is weak. And in general it is used when you want to store a set of numbers with a segment tree.

»
5 weeks ago, # |
Rev. 2   Vote: I like it -13 Vote: I do not like it
set<int> co;
// Put all vals in the set
map<int,int> cocom;
int cocnt=1;
for(int x:co)
  cocom[x] = cocnt++;
// cocom[x] = compressed coordinate of x
  • »
    »
    5 weeks ago, # ^ |
      Vote: I like it -8 Vote: I do not like it

    Thanks, it's easy to understand

    Sorry, but why not unordered_map instead of map?

»
5 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it
vector<int> coordinate_compression(vector<int> x, int start = 0, int step = 1)
{
	set<int>unique(all(x));
	map<int, int>valpos;
	int idx = 0;
	for (auto it : unique)
	{
		valpos[it] = start + idx * step;
		idx++;

	}
	for (auto& pos : x)pos = valpos[pos];
	return x;
}