Table of Contents

ExplanationImplementation

Official Analysis

Explanation

A naive method would be to iterate through each pair of clones and calculate the number of positions with different genomes in near O(1)\mathcal{O}(1) time. This seems to be virtually impossible, so we should probably try a different method.

What if we somehow managed to check each clone against all the other clones at the same time? This way we could iterate through each character and maintain an O(N2)\mathcal{O}(N^2) time complexity.

We can precalculate for each position the number of As, Ts, Cs, and Gs that all the clones have. Then, when checking a single subject, we see how many of the other genomes have been found at that position and sum up all the differences at the end.

However, this brings up another problem: there's no way to check which genome belongs to which clone! If our subject differs from one clone by k1k-1 positions and from another clone by k+1k+1 positions, our current algorithm would have no idea that they aren't the actual villain, as the total difference of the two is still 2k2k, the same as if the subject differed from both by kk positions.

To address this problem, we can assign each clone a random weight RiR_i. Then, instead of just calculating the number of clones with a certain genome at a certain position, we calculate the sum of the weights of these clones. This allows us to differentiate between a genome difference due to clone AA versus a genome difference due to clone BB.

Implementation

Time Complexity: O(N2)\mathcal{O}(N^2)

C++

#include <algorithm>
#include <iostream>
#include <random>
#include <vector>
using std::cout;
using std::endl;
using std::vector;
const vector<char> DNA{'A', 'T', 'C', 'G'};

Join the USACO Forum!

Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!