CF - Connected Components?

Author: Daniel Qiu

Official Editorial

Solution 1 - Faster DFS

Doing a standard DFS would be too slow. There can be at most N(N1)2\frac{N(N-1)}2 edges, which would result in O(N2)\mathcal{O}(N^2) time complexity. To speed it up, we have to do two optimizations.

Optimization 1

We can store whether edges are non-existent using an array of hashsets instead of an adjacency list. This way we can find if there is no edge between two nodes in O(1)\mathcal{O}(1) time. Using an adjacency matrix would take too much memory.

Optimization 2

Rather than using a standard boolean array to check if a node has not yet been visited, we can use a set of unvisited nodes. This way we only loop through the nodes that have not been visited yet and can find the next unvisited node in O(logN)\mathcal{O}(\log N). We need to use upper_bound to find the next unvisited node after a DFS - an iterator may be invalidated due to the node it is pointing to being destroyed when we remove it from the unvisited set.

Proof

We know that a node can either be skipped or visited. We also know that once a node is visited, it cannot be revisited. Therefore we will always visit NN nodes once. We also know that the sum of skips can be at most 2M2M (each non-edge skips two nodes). Therefore the overall complexity will be O(NlogN+M)O(N\log N+M).

Implementation

Time Complexity: O(NlogN+M)\mathcal{O}(N\log N+M)

C++

#include <bits/stdc++.h>
using namespace std;
const int MAX_N = 200000;
unordered_set<int> adj[MAX_N];
set<int> unvis; // the set of nodes that have not been visited
int sz[MAX_N]; // sz[i] = size of the ith connected component
int cur = 0; // current amount of groups
void dfs(int x) {
sz[cur]++;

Solution 2 - Selective DFS

Let N=N = the number of nodes in the graph, n=n = an arbitrary node, and degree(n)=\texttt{degree}(n)= the number of neighboring nodes (nodes that are directly connected to nn).

We know that the size of the component containing node nn must be at least degree(n)+1\texttt{degree}(n)+1 (degree(n)\texttt{degree}(n) doesn't include nn).

If degree(n1)+1>N2\texttt{degree}(n_1)+1>\dfrac{N}2 and degree(n2)+1>N2\texttt{degree}(n_2)+1>\dfrac{N}2, n1n_1 and n2n_2 must intersect, otherwise the graph must have more than NN nodes.

Given this, we can just put all nodes with degree+1>N2\texttt{degree} +1 > \dfrac{N}2 into the same group in O(N)O(N) time.

Now we can just process all nodes with degree+1N2\texttt{degree}+1\le\dfrac{N}2, with DFS.

Proof

We can handle all nodes with degree+1>N2\texttt{degree}+1>\dfrac{N}2 in O(N)O(N) time.

Let there be NN' nodes that have at most N21N2\dfrac{N}2 -1 ≈ \dfrac{N}2 edges. Then, they also must have at least N2\dfrac{N}2 non-edges. Therefore, there must be at least NN22=NN4\dfrac{N'\cdot\dfrac{N}2}2 = \dfrac{N\cdot N'}4 non-edges (each edge accounts for two nodes).

The input states that the number of edges is less than 21052 \cdot 10^5, therefore NN42105\dfrac{N \cdot N'}4\le2\cdot10^5, therefore NN8105N \cdot N' \le8\cdot10^5.

Doing a simple DFS and checking whether two nodes intersect would be O(NN+logM)\mathcal{O}(NN'+ \log M), which would fit under the time limit. The logM\log M comes from processing input. Therefore, this runs under the time limit.

Implementation

Time Complexity: O(NN+logM)\mathcal{O}(NN'+ \log M)

C++

#include <bits/stdc++.h>
using namespace std;
using vi = vector<int>;
#define pb push_back
const int MAX_N = 200000;
unordered_set<int> adj[MAX_N];
// group[i] = the group of the ith node
// group[i] = 0 if the ith node hasn't been assigned a group yet
int group[MAX_N];

Join the USACO Forum!

Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!