1. Overview
We have many String matching algorithms available and we can choose the algorithm based on our requirements. Among those algorithms, let’s look at the KMP algorithm. Before moving on to the algorithm, we must have a thorough knowledge of the LPS prefix table.
The KMP Algorithm efficiently identifies patterns within the text by employing a prefix table. Alternatively known as the LPS (Longest Proper Prefix which is also a suffix) table, and helps the KMP algorithm avoid redundant comparisons, enhancing performance.
By comparing characters sequentially and leveraging the preprocessed table, KMP minimizes backtracking.
2. KMP Proper Prefix
Before we look into generating the longest proper prefix table, let’s learn some key terms.
If this is your first time hearing the word Proper prefix, you might wonder what it is. A proper prefix of the pattern will be a subset of the pattern using only the beginning portion (the first index) or the first few indices of the pattern. You can take as many as you want except the last element.
A proper prefix: All the characters in a string with one or more cut off the end. “s”, “sn”, “sna”, “snap” are all the proper prefixes of “Snape”.
3. KMP Proper Suffix
A proper suffix of any pattern would be a subset of the pattern with elements taken only from the right end of the pattern except the first character of the string.
A proper suffix: All the characters in a string with one or more cut off the beginning. “agrid”, “grid”, “rid”, “id” and “d” are all proper suffixes of “Hagrid”.
4. Longest proper prefix table
The length of the longest proper prefix in the subpattern matches a proper suffix in the same subpattern. Let’s take an example “abababca” and manually calculate the LPS longest proper prefix table. Each value in the LPS table denotes the length of the longest proper prefix.
4.1. Step-by-step LPS calculation
“aba” – Proper prefixes are “a” and “ab” whereas proper suffixes are “a” and “ba”. The proper prefix “a” matches the proper suffix “a”. The length of the longest proper prefix that matches a proper suffix is 1.
X | a | b | a |
Index | 0 | 1 | 2 |
LPS | 0 | 0 | 1 |
“abab” – Proper prefixes: a, ab, aba. Proper suffixes are b, ab, bab. The proper prefix “ab” matches the proper suffix “ab”. So, the length is 2.
X | a | b | a | b |
Index | 0 | 1 | 2 | 3 |
LPS | 0 | 0 | 1 | 2 |
“ababa” – Proper prefixes: a, ab, aba, abab. Proper suffixes are a, ba, aba, baba. aba matches here. So the length of the longest proper prefix is 3.
X | a | b | a | b | a |
Index | 0 | 1 | 2 | 3 | 4 |
LPS | 0 | 0 | 1 | 2 | 3 |
“ababab” – Proper prefixes: a, ab, aba, abab, ababa. Proper suffixes are b, ab, bab, abab, babab. “abab” is the longest proper prefix with length 4.
X | a | b | a | b | a | b |
Index | 0 | 1 | 2 | 3 | 4 | 5 |
LPS | 0 | 0 | 1 | 2 | 3 | 4 |
“abababc” – Proper prefixes are a, ab, aba, abab, ababa, ababab. Proper suffixes are c, bc, abc, babc, ababc, bababc. No matching longest proper prefix to be found.
X | a | b | a | b | a | b | c |
Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
LPS | 0 | 0 | 1 | 2 | 3 | 4 | 0 |
4.2. Final LPS table
Now, we are about to complete our LPS table.
“abababca” – Proper prefixes are a, ab, aba, abab, ababa, abababa, abababc. Proper suffixes are a, ca, bca, abca, babca, ababca, bababca. The longest proper prefix is a of length 1.
X | a | b | a | b | a | b | c | a |
Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
LPS | 0 | 0 | 1 | 2 | 3 | 4 | 0 | 1 |
4. Algorithm to compute LPS – proper prefix table
This longest proper prefix algorithm intends to compute the array in O(n) complexity.
- Initialize lps[0] to 0 and assume that the variable P holds our string or pattern.
- Maintain 2 pointers: len and index. The len maintains the length of the longest prefix and the index is used to iterate over and compute all the elements of the lps array. Initialize len to 0 and index to 1.
- If p[index] = p[len], it means that we found a new and longer prefix that matches the suffix. So p[index] = len(length of the last prefix) + 1.
- If p[index] != p[len], it means that for the string p[0…index], the length of the longest prefix must be lesser than len, so we try to find a lower value for len that might satisfy the prefix criteria. For now, let’s say that the next smaller possible value of len that could be a potential match would be p[len-1] if len > 0 otherwise p[index]=0.
6. LPS Algorithm in Java
void lpsArr(String pattern, int M, int lps[]) { int len = 0; int i = 1; lps[0] = 0; while (i < M) { if (p.charAt(i) == p.charAt(len) { len = len + 1; lps[i] = len; i++; } else { if (len == 0) { lps[i] = 0; i++; else lps = lps[len-1]; } } }
7. Conclusion
In this article, we have seen the purpose of the longest proper prefix table and its usage in the KMP algorithm. To learn more about other DS algorithms, refer to our articles and GitHub repository.