karamkontar's blog

By karamkontar, 6 years ago, In English

Hello,

As you might know, the Palindromic Tree is a new data structure that has been introduced recently and can solve some queries related to palindromes in linear time O(length of string).

First, I will explain the basic idea of the data structure and then the code.

The basic idea of the Palindromic Tree is that a palindrome is actually a palindrome with the same character added before and after it. For example, “ababa” is a palindrome since “bab” is a palindrome and we add the character a before and after it. The same applies for “bab” and so on. Therefore, to add a new palindrome of length L, there should be some other palindrome of length L-2 that we add a character before and after.

Note: The palindromic tree implementation below is not limited to English alphabets, so it accepts any character.

The palindromic tree (not actually a tree) is made up of several vertices or nodes. Each node stores the string that it represents. So the nodes of the palindromic tree store their corresponding palindromes. Each node also stores a list of its children (edges to children are weighted according to character). Let’s say we are initially on a node representing a palindrome X, then the child on edge with weight ‘a’ represents the palindrome aXa. The child on edge with weight ‘b’ from aXa means that it represents the palindrome baXab. All this is based on our knowledge that X is a palindrome, so adding ‘a’ or ‘b’ to it keeps it a palindrome. In addition, every node has a link (edge but unweighted and different from the weighted edges to children). This link points to the longest palindromic proper suffix of the current node palindrome. I will explain the usefulness of such a link later. We don’t consider the node itself as the longest palindromic suffix simply because every palindrome is the longest palindromic suffix of itself. By excluding the node itself, we also avoid self loops.

So, we said that every palindrome is made up of a smaller palindrome of length – 2. But what about “a” or “aa”? We do the same thing! “aa” is actually an empty string with ‘a’ added on both sides. For “a”, however, the case is a bit different. We consider that there is some imaginary string of length = -1. When we add ‘a’ to both sides of this imaginary string, the length becomes -1 + 2 = 1 which is the length of “a” and this makes sense. Therefore, we initially have 2 root nodes in the palindromic tree. The first node has length = -1 and is the imaginary node mentioned above (I’ll call it the imaginary root). The second node is for an empty string and has length = 0 (I’ll call it the real root). The longest palindromic proper suffix of the real root is the imaginary root since it cannot be the node itself. The longest palindromic proper suffix of the imaginary root is itself since we cannot go any higher, and it is imaginary. The is also explained later.

Throughout the program, we store a node called the current node. This node represents the last inserted node. Initially, the current node is the real root.

To insert a character ‘a’ for example, we need to find the palindrome X such that aXa will be the current palindrome. X should exist in the tree since it came before inserting ‘a’. Remember that ‘a’ is being inserted now at the end of the string, so the last inserted character is the last character of X, whatever X is. So, it makes sense to start checking from the stored current node, since it contains the last characters in the string (the longest palindromic suffix of the string inserted until now). It is important to see that the current node is the longest palindromic suffix of the string inserted until the last character before ‘a’, and that’s why we start checking from it. If we find an ‘a’ before the substring of the current node in the main string, then great! We make a new child for the current node with an edge weight of ‘a’. What if we don’t find ‘a’? It is optimal to move to the longest palindromic proper suffix of the current node. To make the palindrome aXa, X should be a palindrome, and it is also best for X to be of maximal length. Also, X should be a suffix of the string inserted until now. That is why we move to the suffix node linked to the current node. We keep on doing so until we reach a node that has ‘a’ before it. We might reach the real node (“aa” is the palindrome), or continue to reach the imaginary node, and it works for sure (“a” alone is the palindrome). Let’s denote the found parent by a temporary node “temp” and refer to it like that.

Note that the palindromic tree only stores unique palindromes, so it temp already has a child with edge ‘a’ or the inserted character, we just make this child the current node and terminate the method to avoid overwriting any of the previous data. (we don’t create a new node)

Now that we have inserted the new character (new node), we need to find its longest palindromic suffix. The longest palindromic suffix of the new node will have the form of aYa where Y is a palindrome already inserted in the tree before inserting ‘a’. All we need to do is to find Y and get its child at the edge with weight ‘a’ or the inserted character and make this child the suffix palindrome of the new node that we inserted earlier. It can be proven that Y and its desired child always exist.

How do we do that efficiently? We use temp! We know that temp stores a node X such that aXa is the current palindrome. However, we can’t use temp since the child at edge weighted ‘a’ from temp is the currently inserted node. This will create a self loop and will not be a proper suffix. So, we start, not from temp, but from the longest palindromic suffix of temp. We keep on going up through the longest palindromic suffixes until we reached the desired one (the one which has ‘a’ before it). Since we are traversing the longest palindromic suffix, we guarantee that our result will be a suffix and will be the longest possible. In the extreme case, we will reach the imaginary node. When we find our desired node, we make its child at edge weighted ‘a’ the longest palindromic suffix of the currently inserted node.

If for some reason we are at the imaginary root and try to go to its longest suffix, we will return to the imaginary root, and this is the purpose of linking it to itself.

Don’t forget to make the newly inserted node the current node saved in the program. If the new node was created, set the current node to it. If it was found to be a duplicate (case discussed above), set the current node to the node that was present first and don’t create a new node.

Now, for the coding part…

We have a problem. How do we store all the palindrome strings without affecting the memory used? The real question is: Do we actually need to store the palindromic strings? It is enough for each node to keep track of the length of the palindrome that it holds and the end index of this palindrome in the main string (or the start index).

Also, instead of making edges with weights, we can make a map. This maps the character to its corresponding child node, and makes adding children or finding them easy and fast.

First, we’ll create a class Node

class Node{
	HashMap<Character, Node> next = new HashMap<>(); // the node's children and their weights
	Node suffix; // the longest palindromix proper suffix
	int length, endIndex; // the length of the palindrome and its end's index in the main string
	public Node(int length, int endIndex) {
		this.length = length;
		this.endIndex = endIndex;
	}
}

Second, we'll declare and initialize the 3 main nodes and the character arraylist which will be the inserted string

Node imaginaryRoot = new Node(-1, -1), realRoot = new Node(0, -1), currNode = realRoot;
// imaginary and real roots don't have an end index since they do not represent palindromes in the string
ArrayList<Character> str = new ArrayList<>();

void init(){ // to initialize the palindromic tree
	imaginaryRoot.suffix = imaginaryRoot;
	realRoot.suffix = imaginaryRoot;
}		

Third, we'll implement the method that adds characters to the inserted string and to the palindromic tree

void addChar(char ch) {
	int index = str.size(); // the index where the new character will be inserted
	str.add(ch); // adding the character to the main string
	Node temp = currNode; // making use of the previously inserted node
	while(index - temp.length - 1 < 0 || str.get(index - temp.length - 1) != str.get(index))
		temp = temp.suffix;
// first condition is just to check bounds, second condition is to check where the palindrome has the current character ch before it
// it is important to see that this loop should terminate at the imaginary root since it will compare the character to itself
	if(temp.next.containsKey(ch)) { // node already exists, don't overwrite
		currNode = temp.next.get(ch); // set current node to the required node
		return; // exit the method to avoid overwriting data
	}
	currNode = new Node(temp.length + 2, index); // new node has size of parent + 2, and ends at the index where the new character ch was inserted
	temp.next.put(ch, currNode); // make the current node a child of temp with edge weight = ch
	if(currNode.length == 1) { // if the current palindrome has a length of 1 (only ch)
		currNode.suffix = realRoot; // the longest palindromic suffix will be the empty string
		return; // exit to avoid finding suffix
	}
	temp = temp.suffix; // start checking from the suffix of temp
	while(index - temp.length - 1 < 0 || str.get(index - temp.length - 1) != str.get(index))
		temp = temp.suffix;
// first condition is just to check bounds, second condition is to check where the palindrome has the current character ch before it
// it is important to see that this loop should terminate at the imaginary root since it will compare the character to itself
	currNode.suffix = temp.next.get(ch); // set the suffix of the current node as the child of temp at edge weighted ch
}

Note: when inserting the nodes, you can keep track of the longest palindrome, the palindrome count, or even an array or list of all palindromes You can even dfs the tree at any time to find these values

  • Vote: I like it
  • +28
  • Vote: I do not like it

| Write comment?
»
6 years ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

.