My code for coordinate compression

→ Pay attention

Before contest
Codeforces Round 968 (Div. 2)
32:18:31
Register now »

*has extra registration

→ Streams

Atcoder ABC368 Solution Discussion

By aryanc403

Before stream 07:28:30

Codeforces Round 968 Solution Discussion

By aryanc403

Before stream 34:23:30

View all →

→ Top rated

#	User	Rating
1	tourist	3947
2	jiangly	3734
3	Radewoosh	3646
4	jqdai0815	3620
4	Benq	3620
6	orzdevinwang	3612
7	ecnerwala	3581
8	Geothermal	3569
8	cnnfls_csy	3569
10	ksun48	3479

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	awoo	162
2	maomao90	160
3	nor	157
4	adamant	156
5	cry	155
5	atcoder_official	155
5	-is-this-fft-	155
8	maroonrk	153
9	SecondThread	147
10	Petr	146

View all →

→ Find user

→ Recent actions

Detailed →

rembocoder's blog

My code for coordinate compression

By rembocoder, history, 4 years ago, translation, In English

Hi. I want to share a simple method for coordinate compression. Even noob can write it.

vector<int> a(n);
// read the vector
vector<int> b = a;
sort(b.begin(), b.end());
map<int, int> m;
for (int i = 0; i < n; i++) {
    m[b[i]] = i;
}
for (int i = 0; i < n; i++) {
    a[i] = m[a[i]];
}

Now every value of an array lies in [0, n). The most convineint it that if you need the original value for a[i], you can just write b[a[i]].

And how do you write it?

rembocoder
4 years ago
32

Comments (28)

Show archived | Write comment?

vrooooom

4 years ago, # |

+15

I think when you are initializing the map, you need to use m[b[i]] instead of m[a[i]], right?

→ Reply

rembocoder

4 years ago, # ^ |

Yes, thank you.

→ Reply

penguinhacker

4 years ago, # |

+301

vector<int> d = a;
sort(d.begin(), d.end());
d.resize(unique(d.begin(), d.end()) - d.begin());
for (int i = 0; i < n; ++i) {
  a[i] = lower_bound(d.begin(), d.end(), a[i]) - d.begin();
}
//original value of a[i] can be obtained through d[a[i]]

→ Reply

Errichto

4 years ago, # ^ |

← Rev. 5 →

+87

+1, this method is faster because it doesn't use sets/maps

The one from the blog doesn't map to values 0,1,2,3,... but instead, it leaves some gaps. This might be confusing while debugging.

EDIT: It might be even faster to sort pairs (value, index) to avoid binary search. This requires iterating the sorted sequence and doing nxt++ when we have value different than previous value. Longer implementation, better runtime (I think).

int n = a.size();
vector<pair<int, int>> pairs(n);
for(int i = 0; i < n; ++i) {
	pairs[i] = {a[i], i};
}
sort(pairs.begin(), pairs.end());
int nxt = 0;
for(int i = 0; i < n; ++i) {
	if(i > 0 && pairs[i-1].first != pairs[i].first) nxt++;
	a[pairs[i].second] = nxt;
}

→ Reply

rembocoder

4 years ago, # ^ |

← Rev. 2 →

I know my method isn't fast. I just think it's very easy to comprehend for beginners.

→ Reply

SPyofgame

4 years ago, # ^ |

← Rev. 4 →

Thanks for your idea, it is greate, here is my "short" template implementation from your code UwU

#define fi first
#define se second
template<typename T>
void compress(vector<T> &a)
{
    vector<pair<T, int> > M;
    for (int i = 0; i < a.size(); ++i)
        M.emplace_back(a[i], i);
    sort(M.begin(), M.end());

    a[M[0].se] = 0; /// Assign first value with zero
    for(int i = 1; i < a.size(); ++i) /// Iterate the rest and increase when adj are different
        a[M[i].se] = a[M[i - 1].se] + (M[i - 1].fi != M[i].fi); 
}

→ Reply

Errichto

4 years ago, # ^ |

I don't claim that [N lower bounds] are slower than [sort]. I claim that sort + lower bounds are together slower than sort (even if it's sort on pairs).

Also, it's faster to choose the size of vector M while creating it, instead of doing emplace_back many times.

→ Reply

SPyofgame

4 years ago, # ^ |

← Rev. 8 →

+10

Yes sir, thanks for pointing my mistakes

And yes sir, but I just afraid that some class might not have constructor.

Here are my fixed versions

Reserve + Emplace back

#define fi first
#define se second
template<typename T>
void compress(vector<T> &a)
{
    vector<pair<T, int> > M;
    M.reserve(a.size());
    for (int i = 0; i < a.size(); ++i)
        M.emplace_back(a[i], i);
    sort(M.begin(), M.end());

    a[M[0].se] = 0; /// Assign first value with zero
    for(int i = 1; i < a.size(); ++i) /// Iterate the rest and increase when adj are different
        a[M[i].se] = a[M[i - 1].se] + (M[i - 1].fi != M[i].fi); 
}

Assign directly

#define fi first
#define se second
template<typename T>
void compress(vector<T> &a)
{
    vector<pair<T, int> > M(a.size());
    for (int i = 0; i < a.size(); ++i)
    {
        M[i].first = a[i];
        M[i].second = i;
    }
    sort(M.begin(), M.end());

    a[M[0].se] = 0;
    for(int i = 1; i < a.size(); ++i)
        a[M[i].se] = a[M[i - 1].se] + (M[i - 1].fi != M[i].fi);
}

When it is not good to create clone value of array

template<typename T>
void compress(vector<T> &a)
{
    vector<int> M(a.size());
    iota(M.begin(), M.end(), 0);
    sort(M.begin(), M.end(), [&](int i, int j) { return a[i] < a[j]; } );

    bool diff[a.size()];
    for (int i = 1; i < a.size(); ++i) diff[i] = (a[M[i - 1]] < a[M[i]]);

    a[M[0]] = T(0);
    for (int i = 1; i < a.size(); ++i)
    {
        a[M[i]] = a[M[i - 1]];
        if (diff[i]) ++a[M[i]];
    }
}

→ Reply

Errichto

4 years ago, # ^ |

← Rev. 2 →

Is there anything in C++ that can be compared with < and != and doesn't have a constructor? Even if you create your own class with some int/string attributes, they are 0/empty by default. So it's only if you create a class with some custom constructor with arguments? Seems very rare in CP.

Btw. I don't think something so simple should be templated and copy-pasted to use during a contest. In the long run, you are better off by just implementing it every time. This way you get to know an algorithm well and you can modify it.

→ Reply

SPyofgame

4 years ago, # ^ |

Yes sir, it is really just very small chances for random cases when I read other solutions with non-constructor custome class/struct, but/and all of them also dont have comparators

→ Reply

SPyofgame

4 years ago, # ^ |

← Rev. 2 →

Can I ask for a question related to the implementations ?

Yesterday I use priority_queue with value comparator and it is amolst TLE while using position comparator give TLE. Why using position are slower

struct value_comparator {
    bool operator < (const myClass &A, const myClass &B) const {
        return A > B;
    }  
};

vector<myClass> A;
struct position_comparator {
    bool operator < (int i, int j) const {
        return A[i] > A[j];
    }  
};

And is it good to use my third implementation of coordinate compression than my other two ? (comparing using position instead of directly compare the values)

→ Reply

Errichto

4 years ago, # ^ |

Accessing something in a separate array costs time and it isn't cache-efficient. It's better to have a value here.

And is it good to use my third implementation of coordinate compression than my other two?

Just run some tests and see which version is faster.

→ Reply

SPyofgame

4 years ago, # ^ |

Oh thanks sir, I thought using index will reduce the amount of extra memory but now I know sometimes cache miss is really a problem

→ Reply

SPyofgame

4 years ago, # ^ |

Yes sir, but instead of copying the template to every problems, I usually only write the template<> when I face such problem, since the algorithm is simple but it is still a bunch of lines of code if we use many times :V

→ Reply

BorisAngelov1

3 years ago, # ^ |

That's nice code. It help me. Thank you!

→ Reply

ayush_nishad

4 years ago, # |

Better if(!m.count(b[i]) m[b[i]]=i;

→ Reply

SPyofgame

4 years ago, # |

← Rev. 3 →

My way

template<typename T>
void compress(vector<T> &a)
{
    int order = -1;
    set<T> S;
    map<T, int> M;
    for (T  x : a) S.insert(x);    /// Taking all unique elements
    for (T  x : S) M[x] = ++order; /// Assign value for unique elements
    for (T &x : a) x = M[x];       /// Assign value for array
}

Notice that by using set, this would work faster when the number of duplicates higher, but will work slower when all array are unique

→ Reply

_dp95

4 years ago, # ^ |

Notice that by using set, this would work faster when the number of duplicates higher, but will work slower when all array are unique

I came across a problem on CSES Salary Queries where Code using Set+Map for compression resulted in TLE, and Code using Vector+Sort+Map gave AC.

PS: The Time-Limit of the problem is 1s.

→ Reply

Bashca

4 years ago, # ^ |

you don't need sets

map<T, int> mp;
int cnt = 0;
for (auto v : a) mp[v];
for (auto& e : mp) e.second = cnt++;
for (auto& v : a) v = mp[v];

→ Reply

SPyofgame

4 years ago, # ^ |

← Rev. 2 →

Thanks for giving me a new idea

I forgot auto method so I thought it would be long to use something like

range based loop

template<typename T>
void compress(vector<T> &a)
{
    int order = -1;
    map<T, int> M;
    for (T  x : a) M[x];
    for (pair<T, int> &e : M) e.second = ++order;
    for (T &x : a) x = M[x]; 
}

iterator

template<class T, class ITR>
void compress(vector<T> &a)
{
    int order = -1;
    map<T, int> M;
    for (T  x : a) M[x];
    for (ITR it = M.begin(); it != M.end(); ++it) it->second = ++order;
    for (T &x : a) x = M[x];
}

but also I think I can use unordered_map to assigning value faster (if hash function for specific class then it would be more worse)

unordered_map

template<typename T>
void compress(vector<T> &a)
{
    int order = -1;
    set<T> S;
    unordered_map<T, int> M;
    for (T  x : a) S.insert(x);    /// Taking all unique elements
    for (T  x : S) M[x] = ++order; /// Assign value for unique elements
    for (T &x : a) x = M[x];       /// Assign value for array
}

→ Reply

TwentyOneHundredOrBust

4 years ago, # |

+11

I write it the same way as you

→ Reply

dv.jakhar

4 years ago, # |

Can you suggest some problems based on this cool trick?

→ Reply

SPyofgame

4 years ago, # ^ |

← Rev. 3 →

Many algorithms only need the order of the number compare to each other and not they value, so something like {$$$1, 4, 9, 16$$$} can be use as {$$$1, 2, 3, 4$$$} with return the same result (smaller value, faster to do). So when you face some problem like that, we can compress the array down to smaller (since the limitation for $$$n$$$ usually much smaller than maximum value of $$$a_i$$$). One algorithm usually use this trick is Sweep Line

→ Reply

rembocoder

4 years ago, # ^ |

Well, for example 61E - Enemy is weak. And in general it is used when you want to store a set of numbers with a segment tree.

→ Reply

demoralizer

4 years ago, # |

← Rev. 2 →

-13

set<int> co;
// Put all vals in the set
map<int,int> cocom;
int cocnt=1;
for(int x:co)
  cocom[x] = cocnt++;
// cocom[x] = compressed coordinate of x

→ Reply

dv.jakhar

4 years ago, # ^ |

← Rev. 2 →

-8

[Deleted]

→ Reply

traviscott

4 years ago, # ^ |

In unordered_map the keys are not sorted

→ Reply

BrayanD

3 years ago, # |

-20

I use this template:

struct COORDINATE_COMPRESSION
{
    typedef long long ll;

    vector<ll> vect;
    bool is_compress = true;

    int size()
    {
        if(!is_compress)compress();
        return vect.size();
    }

    void clear()
    {
        vect.clear();
        is_compress = true;
    }

    void insert(ll x)
    {
        vect.push_back(x);
        is_compress = false;
    }

    void compress()
    {
        sort(vect.begin(), vect.end());
        vect.resize(unique(vect.begin(), vect.end()) - vect.begin());
        is_compress = true;
    }

    vector<ll> compress_offline(vector<ll> vect)
    {
        if(!vect.size())return vect;

        vector<pair<ll,int>> vvv;

        for(int i = 0 ; i < vect.size() ; i++)
        {
            vvv.push_back({vect[i],i});
        }

        sort(vvv.begin(), vvv.end());

        int cont = 0;
        ll last = vvv[0].first;

        vect[vvv[0].second] = 0;

        for(int i = 1 ; i < vvv.size() ; i++)
        {
            if(vvv[i].first != last)
            {
                cont++;
                last = vvv[i].first;
            }
            vect[vvv[i].second] = cont;
        }

        return vect;
    }

    int get(ll x)
    {
        if(!is_compress)compress();
        int pos = lower_bound(vect.begin(), vect.end(), x) - vect.begin();
        assert(pos != vect.size() && vect[pos] == x);
        return pos;
    }

    ll iget(int x)
    {
        if(!is_compress)compress();
        assert(0 <= x && x < vect.size());
        return vect[x];
    }
};

→ Reply