Skip to content
  • by
Longest proper prefix table

1. Overview

We have many String matching algorithms available and we can choose the algorithm based on our requirements. Among those algorithms, let’s look at the KMP algorithm. Before moving on to the algorithm, we must have a thorough knowledge of the LPS prefix table.

The KMP Algorithm efficiently identifies patterns within the text by employing a prefix table. Alternatively known as the LPS (Longest Proper Prefix which is also a suffix) table, and helps the KMP algorithm avoid redundant comparisons, enhancing performance.

By comparing characters sequentially and leveraging the preprocessed table, KMP minimizes backtracking.

2. KMP Proper Prefix

Before we look into generating the longest proper prefix table, let’s learn some key terms.

If this is your first time hearing the word Proper prefix, you might wonder what it is. A proper prefix of the pattern will be a subset of the pattern using only the beginning portion (the first index) or the first few indices of the pattern. You can take as many as you want except the last element.

A proper prefix: All the characters in a string with one or more cut off the end. “s”, “sn”, “sna”, “snap” are all the proper prefixes of “Snape”.

3. KMP Proper Suffix

A proper suffix of any pattern would be a subset of the pattern with elements taken only from the right end of the pattern except the first character of the string.

A proper suffix: All the characters in a string with one or more cut off the beginning. “agrid”, “grid”, “rid”, “id” and “d” are all proper suffixes of “Hagrid”.

4. Longest proper prefix table

The length of the longest proper prefix in the subpattern matches a proper suffix in the same subpattern. Let’s take an example “abababca” and manually calculate the LPS longest proper prefix table. Each value in the LPS table denotes the length of the longest proper prefix.

4.1. Step-by-step LPS calculation

“aba” – Proper prefixes are “a” and “ab” whereas proper suffixes are “a” and “ba”. The proper prefix “a” matches the proper suffix “a”. The length of the longest proper prefix that matches a proper suffix is 1.

Xaba
Index012
LPS001
“aba” – Longest proper prefix table

“abab” – Proper prefixes: a, ab, aba. Proper suffixes are b, ab, bab. The proper prefix “ab” matches the proper suffix “ab”. So, the length is 2.

Xabab
Index0123
LPS0012
“abab” – Longest proper prefix

“ababa” – Proper prefixes: a, ab, aba, abab. Proper suffixes are a, ba, aba, baba. aba matches here. So the length of the longest proper prefix is 3.

Xababa
Index01234
LPS00123
“ababa” – Longest proper prefix

“ababab” – Proper prefixes: a, ab, aba, abab, ababa. Proper suffixes are b, ab, bab, abab, babab. “abab” is the longest proper prefix with length 4.

Xababab
Index012345
LPS001234
“ababab” – Longest proper prefix

“abababc” – Proper prefixes are a, ab, aba, abab, ababa, ababab. Proper suffixes are c, bc, abc, babc, ababc, bababc. No matching longest proper prefix to be found.

Xabababc
Index0123456
LPS0012340
“abababc” – Longest proper prefix

4.2. Final LPS table

Now, we are about to complete our LPS table.

“abababca” – Proper prefixes are a, ab, aba, abab, ababa, abababa, abababc. Proper suffixes are a, ca, bca, abca, babca, ababca, bababca. The longest proper prefix is a of length 1.

Xabababca
Index01234567
LPS00123401
“abababca” – Longest proper prefix

4. Algorithm to compute LPS – proper prefix table

This longest proper prefix algorithm intends to compute the array in O(n) complexity.

  1. Initialize lps[0] to 0 and assume that the variable P holds our string or pattern.
  2. Maintain 2 pointers: len and index. The len maintains the length of the longest prefix and the index is used to iterate over and compute all the elements of the lps array. Initialize len to 0 and index to 1.
  3. If p[index] = p[len], it means that we found a new and longer prefix that matches the suffix. So p[index] = len(length of the last prefix) + 1.
  4. If p[index] != p[len], it means that for the string p[0…index], the length of the longest prefix must be lesser than len, so we try to find a lower value for len that might satisfy the prefix criteria. For now, let’s say that the next smaller possible value of len that could be a potential match would be p[len-1] if len > 0 otherwise p[index]=0.

6. LPS Algorithm in Java

void lpsArr(String pattern, int M, int lps[]) {
   int len = 0;
   int i = 1;
   lps[0] = 0;
   while (i < M) {
     if (p.charAt(i) == p.charAt(len) {
         len = len + 1;
         lps[i] = len;
         i++;
     } else {
         if (len == 0) {
             lps[i] = 0;
             i++;
         else 
             lps = lps[len-1];
     }
   }
}

7. Conclusion

In this article, we have seen the purpose of the longest proper prefix table and its usage in the KMP algorithm. To learn more about other DS algorithms, refer to our articles and GitHub repository.

Leave a Reply

Your email address will not be published. Required fields are marked *

Longest Proper Prefix Table