Solution: Genetics | Hashing

Explanation

A naive method would be to iterate through each pair of clones and calculate the number of positions with different genomes in near $\mathcal{O}(1)$ time. This seems to be virtually impossible, so we should probably try a different method.

What if we somehow managed to check each clone against all the other clones at the same time? This way we could iterate through each character and maintain an $\mathcal{O}(N^2)$ time complexity.

We can precalculate for each position the number of As, Ts, Cs, and Gs that all the clones have. Then, when checking a single subject, we see how many of the other genomes have been found at that position and sum up all the differences at the end.

However, this brings up another problem: there's no way to check which genome belongs to which clone! If our subject differs from one clone by $k-1$ positions and from another clone by $k+1$ positions, our current algorithm would have no idea that they aren't the actual villain, as the total difference of the two is still $2k$ , the same as if the subject differed from both by $k$ positions.

To address this problem, we can assign each clone a random weight $R_i$ . Then, instead of just calculating the number of clones with a certain genome at a certain position, we calculate the sum of the weights of these clones. This allows us to differentiate between a genome difference due to clone $A$ versus a genome difference due to clone $B$ .

Implementation

Time Complexity: $\mathcal{O}(N^2)$

C++

#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

using std::cout;
using std::endl;
using std::vector;

const vector<char> DNA{'A', 'T', 'C', 'G'};

Join the USACO Forum!

Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!

Join Forum

Table of Contents

Table of Contents

Explanation

Implementation

Join the USACO Forum!