# Persistent Data Structures

Authors: Andi Qu, Benjamin Qi

What if data structures could time travel?

### Prerequisites

## Overview

A persistent data structure is a data structure that preserves the previous versions of itself when modified, allowing access to any historical version. In other words, once a change is made to the structure, both the original and modified versions remain accessible. This is particularly useful in scenarios where you need to keep track of the history of updates or backtrack to previous states of the data structure.

## Persistent Array

Persistent arrays are one of the simplest persistent data structures. A persistent array should be able to access and update its elements at given times.

### Fat Nodes

C++

In C++, one can implement this to run in $\mathcal O(\log N)$ time per query and
$\mathcal O(1)$ time per update by using an array of `vector`

s.

vector<pair<int, int>> arr[100001]; // The persistent arrayint get_item(int index, int time) {// Gets the array item at a given index and timeauto ub =upper_bound(arr[index].begin(), arr[index].end(), make_pair(time, INT_MAX));return prev(ub)->second;}void update_item(int index, int value, int time) {

This approach (i.e. storing multiple values at each index without erasing old
values) is known as **fat nodes**.

Although easy to implement, fat nodes are only **partially persistent**, meaning
that only the latest version of the data structure can be modified.

For most competitive programming problems involving persistent data structures,
we use **path copying** instead.

### Path Copying

One can implement path copying to run in $\mathcal O(\log N)$ time per query and update by using a binary-tree-like structure where array elements are the leaves.

This is very similar to a sparse segment tree. The key differences are that we have multiple roots and every time we "update" a node, we actually create a new node in its place (hence persistence).

C++

struct Node {int val;Node *l, *r;Node(ll x) : val(x), l(nullptr), r(nullptr) {}Node(Node *ll, Node *rr) : val(0), l(ll), r(rr) {}};int n, a[100001]; // The initial array and its sizeNode *roots[100001]; // The persistent array's roots

Path copying is **fully persistent**.

## Persistent Segment Tree

Since persistent arrays with path copying are so similar to sparse segment trees, it's relatively straightforward to convert one into a persistent segment tree - just add range queries!

Focus Problem – try your best to solve this problem before continuing!

### Resources

Resources | ||||
---|---|---|---|---|

oml1111 | ||||

Anudeep2011 | not great formatting |

### This section is not complete.

### Solution

Since this problem involves range queries, we'll use some type of segment tree to solve it. (We can also use a Fenwick tree, but that's much harder to make persistent.)

When dealing with problems involving multiple dimensions, it's often helpful to view one of those dimensions as time. In this problem, we'll view the index of each array as its time dimension.

Using a persistent segment tree, we can then turn the problem into the following:

- Type 1 queries involve a point update on the segment tree at some time.
- Type 2 queries involve a range query on the segment tree at some time.
- Type 3 queries involve copying the root of the segment tree at some time and appending it to the array of segment tree roots.

C++

#include <bits/stdc++.h>typedef long long ll;using namespace std;struct Node {ll val;Node *l, *r;Node(ll x) : val(x), l(nullptr), r(nullptr) {}Node(Node *ll, Node *rr) {

### Application 1 - Static 2D Range Sums on Large Grids

Persistent segment trees can be used for online 2D static range sum queries in $\mathcal O(\log N)$ time (think of it like prefix sums).

Note that 2D Fenwick trees with coordinate compression often also work for this (and are easier to implement), but it's still good to know this application.

### Application 2 - Largest Interval Completely Inside a Range

Consider the following problem:

Given $N$ intervals on the number line, answer $Q$ queries of the form "what is the largest interval completely contained inside the range $[x, y]$?"

$N, Q \leq 10^5$.

Since each interval has two dimensions (i.e. left and right endpoints $l_i$ and
$r_i$), we can view it as a *point* on the number line at $l_i$ with "value"
$r_i - l_i$ that was inserted at time $r_i$.

Now, each query becomes "what is the most valuable point in the range $[x, y]$ that was inserted at or before time $y$?" This is much easier to handle, so we can solve this problem in $\mathcal O(Q \log N)$ time.

### Problems

Status | Source | Problem Name | Difficulty | Tags | |
---|---|---|---|---|---|

CF | Easy | ## Show TagsPersistent Segtree | |||

COCI | Normal | ## Show TagsPersistent Segtree | |||

NOI | Normal | ## Show TagsPersistent Segtree | |||

CEOI | Hard | ## Show TagsPersistent Segtree, Sqrt | |||

APIO | Hard | ## Show Tags2DRQ, Euler's Formula, Persistent Segtree | |||

COCI | Very Hard | ## Show TagsPersistent Segtree | |||

IOI | Very Hard | ## Show Tags2DRQ, Persistent Segtree |

## Persistent Heap

Resources | ||||
---|---|---|---|---|

Benq |

Status | Source | Problem Name | Difficulty | Tags | |
---|---|---|---|---|---|

Wesley's Anger Contest | Very Hard | ||||

YS | Very Hard |

### Module Progress:

### Join the USACO Forum!

Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!