CSES - String Matching

Author: Dustin Miao

Solution - Knuth-Morris-Prat Algorithm

Define an array πS\pi_S on some string SS such that πS[i]\pi_S[i] stores the length of the longest nontrivial prefix of the entire string that is equivalent to a suffix ending at position ii. Formally,

πS[i]=max{k1k<i and S[0:k1]S[i(k1):i]}\pi_S[i] = \max\{k \: | \: 1 \leq k < i \text{ and } S[0:k - 1] \equiv S[i - (k - 1): i]\}

If we are searching for string PP inside of string TT, create a new string S=P+#+TS = P + \# + T, where #\# is any arbitrary character that does not occur in either string. Then, create an array πS\pi_S using the KMP algorithm. The answer is simply the number of indicies inside of πS\pi_S that is equal to P|P| (the length of PP).

C++

Time Complexity: O(n+m)\mathcal{O}(n + m)

#include <bits/stdc++.h>
using namespace std;
namespace str {
/** Computes the Pi array of s. */
vector<int> pi(const string &s) {
int n = (int)s.size();
vector<int> pi_s(n);
for (int i = 1, j = 0; i < n; i++) {
while (j > 0 && s[j] != s[i]) { j = pi_s[j - 1]; }

Python

from typing import List
def pi(s: str) -> List[int]:
"""Computes the Pi array of s."""
n = len(s)
pi_s = [0] * n
j = 0
for i in range(1, n):
while j > 0 and s[j] != s[i]:

Solution - Z-Algorithm

Like the previous solution, we now define an array zSz_S such that zS[i]z_S[i] is the length of the longest prefix beginning at index ii that is equivalent to a prefix of the entire string. Formally,

zS[i]=max{k1k and S[0:k1]S[i:i+k1]}z_S[i] = \max\{k \: | \: 1 \leq k\text{ and } S[0:k - 1] \equiv S[i:i+k-1]\}

As before, for pattern string PP and text string TT, create a new string S=P+#+TS = P + '\#' + T, and create the array zSz_S using the z-algorithm. The answer is the number of indices inside of zSz_S that is equal to P|P|.

C++

Time Complexity: O(n+m)\mathcal{O}(n + m)

#include <bits/stdc++.h>
using namespace std;
namespace str {
// Computes the Z-array of s
vector<int> z(const string &s) {
int n = (int)s.size();
vector<int> z_S(n);
for (int i = 1, l = 0, r = 0; i < n; i++) {
if (i <= r) { z_S[i] = min(z_S[i - l], r - i + 1); }

Solution - Rabin-Karp Algorithm

Precompute the rolling hash of both PP and TT. Each substring of length P|P| can be compared for equality in O(1)\mathcal{O}(1) time. Since there is a relatively small number of comparisons, using a single set of hash values is fine (although it doesn't hurt to add more!). Using hash base 99739973 and modulo 109+710^9 + 7 suffices (see the string hashing module for more details on this choice).

C++

Time Complexity: O(n+m)\mathcal{O}(n + m)

#include <bits/stdc++.h>
using namespace std;
namespace str {
// Computes the rolling hash of s with power P and modulo M
vector<long long> rhash(const string &s, const long long P, const long long M) {
int n = (int)s.size();
vector<long long> rhash_S(n);
for (int i = 0; i < n; i++) {
if (i != 0) { rhash_S[i] = rhash_S[i - 1] * P % M; }

Join the USACO Forum!

Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!