Small-To-Large Merging

Authors: Michael Cao, Benjamin Qi

A way to merge two sets efficiently.

Merging Data Structures

Obviously, linked lists can be merged in $O(1)$ time. But what about sets or vectors?

Focus Problem – read through this problem before continuing!

Let's consider a tree rooted at node $1$, where each node has a color.

For each node, let's store a set containing only that node's color. We want to merge the sets in each node's subtree together so that every node ends up with a set consisting of all colors in its subtree. Doing this allows us to solve a variety of problems, such as querying the number of distinct colors in each subtree.

Naive Solution

Suppose that we want to merge two sets $a$ and $b$ of sizes $n$ and $m$, respectively. One possibility is the following:

for (int x: b) a.insert(x);

which runs in $O(m \log(n+m))$ time, yielding a runtime of $O(N^2 \log N)$ in the worst case, where $N$ is the number of nodes (for example, when the tree is a path and every merge reinserts the whole accumulated set). If we instead maintain $a$ and $b$ as sorted vectors, we can merge them in $O(n+m)$ time, but $O(N^2)$ is also too slow.
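The sorted-vector alternative mentioned above could look roughly like the following. This is only a sketch: the helper name `merge_sorted` is made up for illustration, and it assumes both inputs are already sorted and free of duplicates.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper (name not from the original text).
// Returns the union of two sorted, duplicate-free vectors in O(n + m) time.
std::vector<int> merge_sorted(const std::vector<int>& a, const std::vector<int>& b) {
    // Precondition: std::set_union requires sorted input ranges.
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    // Walks both ranges once; elements present in both are written a single time.
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

Each call is linear in the combined size, so the cost per merge is lower than with repeated `insert`, but a path-shaped tree still forces about $N$ merges of size up to $N$.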

Better Solution

With just one additional line of code, we can significantly speed this up.

if (a.size() < b.size()) swap(a,b);
for (int x: b) a.insert(x);

Note that swap exchanges two sets in $O(1)$ time. Thus, merging a smaller set of size $m$ into the larger one of size $n$ takes $O(m \log n)$ time.

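Claim: The solution runs in $O(N \log^2 N)$ time.

Proof: When merging two sets, you move elements from the smaller set to the larger set. If the size of the smaller set is $X$, then the size of the resulting set is at least $2X$. Thus, an element that has been moved $Y$ times will be in a set of size at least $2^Y$, and since the maximum size of a set is $N$ (the root), each element will be moved at most $O(\log N)$ times. Each move is one insertion costing $O(\log N)$, so the $N$ elements cost $O(N \log^2 N)$ in total. If duplicates collapse (as with colors), a merged set can be smaller than $2X$, but the number of insertions in a merge is still at most the number of nodes behind the smaller side, so the same bound follows by applying the argument to nodes instead of colors.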
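To see the bound from the proof concretely, one can count insertions while merging singleton sets pairwise, round by round. This is an illustrative experiment rather than part of any solution; `count_moves` is a hypothetical helper name, and the balanced pairing is an assumption chosen because it makes every merge as expensive as small-to-large allows.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original text).
// Merges the singleton sets {0}, {1}, ..., {n-1} pairwise in rounds and
// returns the total number of insertions performed by small-to-large merging.
long long count_moves(int n) {
    assert(n >= 0);
    std::vector<std::set<int>> sets(n);
    for (int i = 0; i < n; ++i) sets[i].insert(i);
    long long moves = 0;
    while (sets.size() > 1) {
        std::vector<std::set<int>> next;
        for (std::size_t i = 0; i + 1 < sets.size(); i += 2) {
            std::set<int>& a = sets[i];
            std::set<int>& b = sets[i + 1];
            if (a.size() < b.size()) std::swap(a, b);  // keep the larger set in a
            for (int x : b) { a.insert(x); ++moves; }  // move the smaller set
            next.push_back(std::move(a));
        }
        // An unpaired last set is carried over to the next round unchanged.
        if (sets.size() % 2 == 1) next.push_back(std::move(sets.back()));
        sets = std::move(next);
    }
    return moves;
}
```

For $n = 1024$ there are $10$ rounds, and each round moves exactly $512$ elements, so `count_moves(1024)` returns $5120 = \frac{N}{2}\log_2 N$, within the $N \log_2 N$ bound on the number of moves.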

Full Code
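A possible full solution for counting distinct colors in every subtree is sketched below. It is not the authors' code: the function name `distinct_colors`, the 0-indexed nodes with the root at node 0, and the adjacency-list input are all assumptions made for this example, and input parsing is left out.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical interface (assumed, not from the original text):
// color[v] is the color of node v, adj is the adjacency list of a tree
// rooted at node 0. Returns the number of distinct colors in each subtree.
std::vector<int> distinct_colors(const std::vector<int>& color,
                                 const std::vector<std::vector<int>>& adj) {
    assert(color.size() == adj.size());
    int n = (int)color.size();
    if (n == 0) return {};
    std::vector<int> parent(n, -1), order, answer(n, 0);
    std::vector<std::set<int>> colors(n);

    // Iterative DFS to get a preorder; avoids deep recursion on path-like trees.
    std::vector<int> stack = {0};
    while (!stack.empty()) {
        int v = stack.back();
        stack.pop_back();
        order.push_back(v);
        for (int u : adj[v]) {
            if (u != parent[v]) {
                parent[u] = v;
                stack.push_back(u);
            }
        }
    }

    // Reverse preorder: every node is processed after all of its descendants.
    for (int i = n - 1; i >= 0; --i) {
        int v = order[i];
        colors[v].insert(color[v]);
        answer[v] = (int)colors[v].size();
        int p = parent[v];
        if (p != -1) {
            // Small-to-large: always insert the smaller set into the larger one.
            if (colors[p].size() < colors[v].size()) std::swap(colors[p], colors[v]);
            for (int c : colors[v]) colors[p].insert(c);
            colors[v].clear();  // the child's set is no longer needed
        }
    }
    return answer;
}
```

Recording `answer[v]` before merging into the parent matters, because the swap may leave `colors[v]` holding the parent's old contents afterwards.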

Generalizing

A set doesn't have to be an std::set. Many data structures can be merged, such as std::map or std::unordered_map. However, std::swap doesn't necessarily work in $O(1)$ time; for example, swapping two arrays takes time linear in the sum of the sizes of the arrays, and the same goes for indexed sets. For two indexed sets a and b we can use a.swap(b) in place of swap(a,b).
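As one illustration of merging something other than an std::set, the sketch below merges two frequency maps, adding the counts of shared keys. The helper name `merge_counts` is hypothetical. It uses the member function `swap`, which is $O(1)$ for std::map, and it relies on the combining operation (addition) being symmetric, so it does not matter which map ends up as the target.

```cpp
#include <cassert>
#include <map>

// Hypothetical helper (name not from the original text).
// Merges frequency map b into a, summing counts of keys present in both.
// Afterwards a holds the combined counts and b is empty.
void merge_counts(std::map<int, int>& a, std::map<int, int>& b) {
    assert(&a != &b);                    // merging a map into itself is not supported
    if (a.size() < b.size()) a.swap(b);  // member swap: constant time for std::map
    for (const auto& kv : b) a[kv.first] += kv.second;  // O(|b| log |a|)
    b.clear();
}
```

If the combining operation were not symmetric, swapping the two containers would change the result, and the merge would need to account for which side was originally which.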

Problems

Source | Difficulty | Tags | Solution
CF | Normal | Merging | Check CF
Plat | Normal | Merging, Indexed Set |
Plat | Normal | Merging | External Sol
POI | Normal | Merging, Indexed Set | External Sol

Optional: Faster Merging

It's easy to merge two sets of sizes $n \ge m$ in $O(n+m)$ or $O(m \log n)$ time, but sometimes $O\left(m \log\left(1 + \frac{n}{m}\right)\right)$ can be significantly better than both of these. Check "Advanced - Treaps" for more details. Also see this link regarding merging segment trees.
