### Bekh's blog

By Bekh, history, 4 weeks ago,

Hello,

I was implementing a simple Trie when I found a behavior that I didn't understand. I followed the program using a debugger, the issue seems to be in the insert function. I explained the issue in code comments.

struct Trie{
struct Node{
int id, nxt[26];
Node(int id) : id(id)
{
memset(nxt, -1, sizeof(nxt));
}
};
vector<Node> v;
vector<bool> leaf;
void init()
{
v.clear();
leaf.clear();
}

{
v.push_back(v.size());
leaf.push_back(false);
return (int)(v.size()) - 1;
}

int getSizeMinusOne()
{
return (int)(v.size()) - 1;
}

void insert(const string &s)
{
int cur = 0;
for (char c : s)
{
c -= 'a';
if (v[cur].nxt[c] == -1)
{
// Note I only use one of the following NOT both at the same time

// Approach (1)
// v[cur].nxt[c] doesn't get updated. Stays -1 and RTEs later on due to out of boundary access

// Approach (2)
// works fine
v[cur].nxt[c] = getSizeMinusOne();
}
cur = v[cur].nxt[c];
}
leaf[cur] = true;
}
};


I also found out that both approaches seem to work find when changing the array nxt to a vector instead of an array. Could someone explain why this happens when nxt is an array? And why approach (2) works fine even if nxt is an array?

• 0

 » 4 weeks ago, # |   0 Auto comment: topic has been updated by Bekh (previous revision, new revision, compare).
 » 4 weeks ago, # |   +16 Bad stuff happens when a vector resizes. Saving addNode() to an intermediate variable and then setting v[cur].nxt[c] equal to that variable suffices (or what you did in approach 2).
•  » » 4 weeks ago, # ^ |   0 Could you provide more insight on what kind of bad stuff happen? When do the bad stuff effects fade and I can safely use the vector again? Why is the program not affected by the bad stuff if nxt is a vector instead of an array (Or is it just a coincidence?)?
•  » » » 4 weeks ago, # ^ | ← Rev. 2 →   +16 I asked ecnerwala about something similar a while back, I think the same reasoning applies (although I can't say that I understand it that well). Unfortunately, before c++17, the index could get resolved firstSo it first computes v[cur].nxt[c] and stores that pointerAnd then calls addNode()But then that pointer is now invalidUbsan should be able to catch thisPretty much in c++, evaluation order is essentially entirely up to the compilerIt is true that in c++17, they guarantee rhs is evaluated before lhs, but i assume you want this code to work without that
•  » » » » 4 weeks ago, # ^ |   0 Aha, got you. And yes, I'm using C++14. Calling v.reserve() with a big amount before running the code works fine so this is indeed the issue.And it seems to evaluate the rhs first when nxt is a vector for some reason lol (Tried generating many random strings and inserting them to make sure pointers are indeed getting invalidated).Thank you for the explanation.