You're not signed in!

Sign in to save your progress and sync your settings across devices.

Somewhat Frequent
 0/19

Prefix Sums

Authors: Darren Yao, Eric Wei

Computing range sum queries in constant time over a fixed array.

Focus Problem – read through this problem before continuing!

Resources

Resources: Additional
IUSACOmodule is based off this
CPHrather brief
PAPSalso rather brief

Introduction

Let's say we have a one-indexed integer array arr\texttt{arr} of size NN and we want to compute the value of

arr[a]+arr[a+1]++arr[b]\texttt{arr}[a]+\texttt{arr}[a+1]+\cdots+\texttt{arr}[b]

for QQ different pairs (a,b)(a,b) satisfying 1abN1\le a\le b\le N. We'll use the following example with N=6N = 6:

Index ii123456
arr[i]\texttt{arr}[i]164253

Naively, for every query, we can iterate through all entries from index aa to index bb to add them up. Since we have QQ queries and each query requires a maximum of O(N)O(N) operations to calculate the sum, our total time complexity is O(NQ)O(NQ). For most problems of this nature, the constraints will be N,Q105N, Q \leq 10^5, so NQNQ is on the order of 101010^{10}. This is not acceptable; it will almost certainly exceed the time limit.

Instead, we can use prefix sums to process these array sum queries. We designate a prefix sum array prefix\texttt{prefix}. First, because we're 1-indexing the array, set prefix[0]=0\texttt{prefix}[0]=0, then for indices kk such that 1kn1 \leq k \leq n, define the prefix sum array as follows:

prefix[k]=i=1karr[i]\texttt{prefix}[k]=\sum_{i=1}^{k} \texttt{arr}[i]

Basically, what this means is that the element at index kk of the prefix sum array stores the sum of all the elements in the original array from index 11 up to kk. This can be calculated easily in O(N)O(N) by the following formula for each 1kn1\le k\le n:

prefix[k]=prefix[k1]+arr[k]\texttt{prefix}[k]=\texttt{prefix}[k-1]+\texttt{arr}[k]

For the example case, our prefix sum array looks like this:

Index ii0123456
prefix[i]\texttt{prefix}[i]01711131821

Now, when we want to query for the sum of the elements of arr\texttt{arr} between (1-indexed) indices aa and bb inclusive, we can use the following formula:

i=LRarr[i]=i=1Rarr[i]i=1L1arr[i]\sum_{i=L}^{R} \texttt{arr}[i] = \sum_{i=1}^{R} \texttt{arr}[i] - \sum_{i=1}^{L-1} \texttt{arr}[i]

Using our definition of the elements in the prefix sum array, we have

i=LRarr[i]=prefix[R]prefix[L1]\sum_{i=L}^{R} \texttt{arr}[i]= \texttt{prefix}[R]-\texttt{prefix}[L-1]

Since we are only querying two elements in the prefix sum array, we can calculate subarray sums in O(1)O(1) per query, which is much better than the O(N)O(N) per query that we had before. Now, after an O(N)O(N) preprocessing to calculate the prefix sum array, each of the QQ queries takes O(1)O(1) time. Thus, our total time complexity is O(N+Q)O(N+Q), which should now pass the time limit.

Let's do an example query and find the subarray sum between indices a=2a = 2 and b=5b = 5, inclusive, in the 1-indexed arr\texttt{arr}. From looking at the original array, we see that this is

i=25arr[i]=6+4+2+5=17.\sum_{i=2}^{5} \texttt{arr}[i] = 6 + 4 + 2 + 5 = 17.

Index ii

123456

arr[i]\texttt{arr}[i]

164253

Using prefix sums:

prefix[5]prefix[1]=181=17.\texttt{prefix}[5] - \texttt{prefix}[1] = 18 - 1 = 17.

Index ii

0123456

prefix[i]\texttt{prefix}[i]

01711131821

These are also known as partial sums.

Problems

StatusSourceProblem NameDifficultyTagsSolution
LCVery EasyShow Sketch
SilverVery Easy
SilverEasy
SilverEasy
CSESEasyShow Sketch
CSESEasyShow Sketch
SilverNormal
Old BronzeNormal

Warning!

The last problem isn't submittable on USACO. Download the test data and use the script provided here to test your solution.

Max Subarray Sum

Now we'll look at some extensions.

Focus Problem – read through this problem before continuing!

This problem has a solution known as Kadane's Algorithm. Please don't use that solution; try to solve it with prefix sums.

Why are the two methods equivalent?

Prefix Minimum, XOR, etc.

Resources: Additional
CPHdefines XOR

Similar to prefix sums, you can also take prefix minimum or maximum; but you cannot answer min queries over an arbitrary range with prefix minimum. (This is because minimum doesn't have an inverse operation, the way subtraction is to addition.) On the other hand, XOR is its own inverse operation, meaning that the XOR of any number with itself is zero.

StatusSourceProblem NameDifficultyTagsSolution
SilverEasy
Show Tags

Prefix Sums

External Sol
CSESEasy
Show Tags

Prefix Sums

Show Sketch

2D Prefix Sums

Focus Problem – read through this problem before continuing!

Now, what if we wanted to process QQ queries for the sum over a subrectangle of a NN rows by MM columns matrix in two dimensions? Let's assume both rows and columns are 1-indexed, and we use the following matrix as an example:

000000
0156118
0171194
046132
075423

Naively, each sum query would then take O(NM)O(NM) time, for a total of O(QNM)O(QNM). This is too slow.

Let's take the following example region, which we want to sum:

000000
0156118
0171194
046132
075423

Manually summing all the cells, we have a submatrix sum of 7+11+9+6+1+3=377+11+9+6+1+3 = 37.

The first logical optimization would be to do one-dimensional prefix sums of each row. Then, we'd have the following row-prefix sum matrix. The desired subarray sum of each row in our desired region is simply the green cell minus the red cell in that respective row. We do this for each row to get (281)+(144)=37(28-1) + (14-4) = 37.

000000
016122331
018192832
0410111416
0712161821

Now, if we wanted to find a submatrix sum, we could break up the submatrix into a subarray for each row, and then add their sums, which would be calculated using the prefix sums method described earlier. Since the matrix has NN rows, the time complexity of this is O(QN)O(QN). This might be fast enough for Q=105Q=10^5 and N=103N=10^3, but we can do better.

In fact, we can do two-dimensional prefix sums. In our two dimensional prefix sum array, we have

prefix[a][b]=i=1aj=1barr[i][j].\texttt{prefix}[a][b]=\sum_{i=1}^{a} \sum_{j=1}^{b} \texttt{arr}[i][j].

This can be calculated as follows for row index 1in1 \leq i \leq n and column index 1jm1 \leq j \leq m:

prefix[i][j]=prefix[i1][j]+prefix[i][j1]prefix[i1][j1]+arr[i][j]\begin{aligned} \texttt{prefix}[i][j] =& \, \texttt{prefix}[i-1][j]+ \texttt{prefix}[i][j-1] \\ &- \texttt{prefix}[i-1][j-1]+ \texttt{arr}[i][j] \end{aligned}

The submatrix sum between rows aa and AA and columns bb and BB, can thus be expressed as follows:

i=aAj=bBarr[i][j]=prefix[A][B]prefix[a1][B]prefix[A][b1]+prefix[a1][b1]\begin{aligned} \sum_{i=a}^{A} \sum_{j=b}^{B} \texttt{arr}[i][j]=&\,\texttt{prefix}[A][B] - \texttt{prefix}[a-1][B] \\ &- \texttt{prefix}[A][b-1] + \texttt{prefix}[a-1][b-1] \end{aligned}

Summing the blue region from above using the 2D prefix sums method, we add the value of the green square, subtract the values of the red squares, and then add the value of the gray square. In this example, we have

65236+1=37,65-23-6+1 = 37,

as expected.

000000
016122331
0214315163
0624426579
013365883100

Since no matter the size of the submatrix we are summing, we only need to access four values of the 2D prefix sum array, this runs in O(1)O(1) per query after an O(NM)O(NM) preprocessing.

StatusSourceProblem NameDifficultyTagsSolution
SilverEasy
Show Tags

Prefix Sums

External Sol
GoldHard
Show Tags

Prefix Sums, Max Subarray Sum

External Sol
PlatVery HardExternal Sol

More Complex Applications

Instead of storing just the values themselves, you can also take a prefix sum over iaii\cdot a_i, or 10iai10^i \cdot a_i, for instance. Some math is usually helpful for these problems; don't be afraid to get dirty with algebra!

For instance, we can quickly answer the following type of query:

Find 1al+2al+1+3al+2++(rl+1)ar1\cdot a_l+2\cdot a_{l+1}+3\cdot a_{l+2}+\cdots+(r-l+1)\cdot a_{r}.

First, define the following:

ps[i]=a1+a2+a3+a4++ai\texttt{ps}[i] = a_1+a_2+a_3+a_4+\cdots+a_iips[i]=1a1+2a2++iai\texttt{ips}[i] = 1\cdot a_1+2\cdot a_2+\cdots+i\cdot a_i

Then, we have:

lal+(l+1)al+1++rar=ips[r]ips[l1]l\cdot a_l + (l+1) \cdot a_{l+1} + \cdots + r \cdot a_r = \texttt{ips}[r]-\texttt{ips}[l-1](l1)al+(l1)al+1++(l1)ar=(l1)(ps[r]ps[l1])(l-1) \cdot a_l + (l-1) \cdot a_{l+1} + \cdot + (l-1) \cdot a_r = (l-1)(\texttt{ps}[r]-\texttt{ps}[l-1])

And so,

1al+2al+1++(rl+1)ar=ips[r]ips[l1](l1)(ps[r]ps[l1])1\cdot a_l + 2 \cdot a_{l+1} + \cdots + (r-l+1) \cdot a_r = \texttt{ips}[r]-\texttt{ips}[l-1]-(l-1)(\texttt{ps}[r]-\texttt{ps}[l-1])

Which is what we were looking for!

StatusSourceProblem NameDifficultyTagsSolution
KSEasy
Show Tags

Prefix Sums

ACHard
Show Tags

Prefix Sums

Check AC
PlatInsaneExternal Sol

Module Progress:

Give Us Feedback on Prefix Sums!