Rare
0/9

# Small-To-Large Merging

Authors: Michael Cao, Benjamin Qi

Contributor: Neo Wang

A way to merge two sets efficiently.

## Merging Data Structures

Obviously linked lists can be merged in $\mathcal{O}(1)$ time. But what about sets or vectors?

Focus Problem – read through this problem before continuing!

Let's consider a tree rooted at node $1$, where each node has a color.

For each node, let's store a set containing only that node, and we want to merge the sets in the nodes subtree together such that each node has a set consisting of all colors in the nodes subtree. Doing this allows us to solve a variety of problems, such as query the number of distinct colors in each subtree.

### Naive Solution

Suppose that we want merge two sets $a$ and $b$ of sizes $n$ and $m$, respectively. One possiblility is the following:

for (int x: b) a.insert(x);

which runs in $\mathcal{O}(m\log (n+m))$ time, yielding a runtime of $\mathcal{O}(N^2\log N)$ in the worst case. If we instead maintain $a$ and $b$ as sorted vectors, we can merge them in $\mathcal{O}(n+m)$ time, but $\mathcal{O}(N^2)$ is also too slow.

### Better Solution

With just one additional line of code, we can significantly speed this up.

if (a.size() < b.size()) swap(a,b);for (int x: b) a.insert(x);

Note that swap exchanges two sets in $\mathcal{O}(1)$ time. Thus, merging a smaller set of size $m$ into the larger one of size $n$ takes $\mathcal{O}(m\log n)$ time.

Claim: The solution runs in $\mathcal{O}(N\log^2N)$ time.

Proof: When merging two sets, you move from the smaller set to the larger set. If the size of the smaller set is $X$, then the size of the resulting set is at least $2X$. Thus, an element that has been moved $Y$ times will be in a set of size at least $2^Y$, and since the maximum size of a set is $N$ (the root), each element will be moved at most $\mathcal{O}(\log N$) times.

Full Code

## Generalizing

We can also merge other standard library data structures such as std::map or std:unordered_map in the same way. However, std::swap does not always run in $\mathcal{O}(1)$ time. For example, swapping std::arrays takes time linear in the sum of the sizes of the arrays, and the same goes for GCC policy-based data structures such as __gnu_pbds::tree or __gnu_pbds::gp_hash_table.

To swap two policy-based data structures a and b in $\mathcal{O}(1)$ time, use a.swap(b) instead. Note that for standard library data structures, swap(a,b) is equivalent to a.swap(b).

## Problems

StatusSourceProblem NameDifficultyTags
CFNormal
Show TagsMerging
PlatNormal
Show TagsIndexed Set, Merging
PlatNormal
Show TagsMerging
POINormal
Show TagsIndexed Set, Merging
IOINormal
Show TagsCentroid, Merging
JOIHard
Show TagsMerging
COIHard
Show TagsMerging
JOIVery Hard
Show TagsMerging, SCC

### Optional: Faster Merging

It's easy to merge two sets of sizes $n\ge m$ in $\mathcal{O}(n+m)$ or $(m\log n)$ time, but sometimes $O\left(m\log \left(1+\frac{n}{m}\right)\right)$ can be significantly better than both of these. Check "Advanced - Treaps" for more details. Also see this link regarding merging segment trees.

### Join the USACO Forum!

Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!