# Small-To-Large Merging

Authors: Michael Cao, Benjamin Qi

A way to merge two sets efficiently.

## Merging Data Structures

Obviously linked lists can be merged in $\mathcal{O}(1)$ time. But what about sets or vectors?

Focus Problem – read through this problem before continuing!

Let's consider a tree rooted at node $1$, where each node has a color.

For each node, let's store a set containing only that node, and we want to merge the sets in the nodes subtree together such that each node has a set consisting of all colors in the nodes subtree. Doing this allows us to solve a variety of problems, such as query the number of distinct colors in each subtree.

### Naive Solution

Suppose that we want merge two sets $a$ and $b$ of sizes $n$ and $m$, respectively. One possiblility is the following:

for (int x: b) a.insert(x);

which runs in $\mathcal{O}(m\log (n+m))$ time, yielding a runtime of $\mathcal{O}(N^2\log N)$ in the worst case. If we instead maintain $a$ and $b$ as sorted vectors, we can merge them in $\mathcal{O}(n+m)$ time, but $\mathcal{O}(N^2)$ is also too slow.

### Better Solution

With just one additional line of code, we can significantly speed this up.

if (a.size() < b.size()) swap(a,b);for (int x: b) a.insert(x);

Note that swap exchanges two sets in $\mathcal{O}(1)$ time. Thus, merging a smaller set of size $m$ into the larger one of size $n$ takes $\mathcal{O}(m\log n)$ time.

**Claim:** The solution runs in $\mathcal{O}(N\log^2N)$ time.

**Proof:** When merging two sets, you move from the smaller set to the larger
set. If the size of the smaller set is $X$, then the size of the resulting set
is at least $2X$. Thus, an element that has been moved $Y$ times will be in a
set of size at least $2^Y$, and since the maximum size of a set is $N$ (the
root), each element will be moved at most $\mathcal{O}(\log N$) times.

Full Code

## Generalizing

We can also merge other standard library data structures such as `std::map`

or
`std:unordered_map`

in the same way. However,
`std::swap`

does not always
run in $\mathcal{O}(1)$ time. For example, swapping
`std::array`

s takes time
linear in the sum of the sizes of the arrays, and the same goes for
GCC policy-based data structures such
as `__gnu_pbds::tree`

or `__gnu_pbds::gp_hash_table`

.

To swap two policy-based data structures `a`

and `b`

in $\mathcal{O}(1)$ time,
use `a.swap(b)`

instead. Note that for standard library data structures,
`swap(a,b)`

is equivalent to `a.swap(b)`

.

## Problems

Status | Source | Problem Name | Difficulty | Tags | |||||
---|---|---|---|---|---|---|---|---|---|

CF | Normal | ## Show TagsMerging | |||||||

Plat | Normal | ## Show TagsIndexed Set, Merging | |||||||

Plat | Normal | ## Show TagsMerging | |||||||

POI | Normal | ## Show TagsIndexed Set, Merging | |||||||

IOI | Normal | ## Show TagsCentroid, Merging | |||||||

JOI | Hard | ## Show TagsMerging | |||||||

COI | Hard | ## Show TagsMerging | |||||||

JOI | Very Hard | ## Show TagsMerging, SCC | |||||||

### Optional: Faster Merging

It's easy to merge two sets of sizes $n\ge m$ in $\mathcal{O}(n+m)$ or $(m\log n)$ time, but sometimes $O\left(m\log \left(1+\frac{n}{m}\right)\right)$ can be significantly better than both of these. Check "Advanced - Treaps" for more details. Also see this link regarding merging segment trees.

### Module Progress:

### Join the USACO Forum!

Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!