Hashing Custom Hashing Anti-Hash Tests A Faster HashMap in C++Custom Hashing Resizing Solving 3SUM Problems

Rare

0/2

(Optional) Hashmaps

Authors: Darren Yao, Benjamin Qi

Contributors: Neo Wang, Nathan Gong

Maintaining collections of distinct elements with hashing.

Edit This Page

Prerequisites

Hashing Custom Hashing Anti-Hash Tests A Faster HashMap in C++Custom Hashing Resizing Solving 3SUM Problems

Warning!

You can almost always use ordered sets and maps instead, but it's good to know that these exist.

Resources
		IUSACO	4.4 - Sets & Maps	module is based off this

Hashing

Hashing refers to assigning a unique code to every variable/object which allows insertions, deletions, and searches in $\mathcal{O}(1)$ time, albeit with a high constant factor, as hashing requires a large constant number of operations. However, as the name implies, elements are not ordered in any meaningful way, so traversals of an unordered set will return elements in some arbitrary order.

Custom Hashing

C++

There is no built in method for hashing pairs or vectors. Namely, unordered_set<vector<int>> does not work. In this case, we can use an sorted map (which supports all of the functions used in the code above) or declare our own hash function.

Resources
		Mark Nelson	Hash Functions for C++ Unordered Containers	How to create user-defined hash function for `unordered_map`.

The link provides an example of hashing pairs of strings, as well as other data structures like pairs.

#include <bits/stdc++.h>
using namespace std;

typedef pair<int, int> pi;
#define f first
#define s second

struct hashPi {
	size_t operator()(const pi &p) const { return p.f ^ p.s; }
};

int main() { unordered_map<pi, int, hashPi> um; }

#include <bits/stdc++.h>
using namespace std;

typedef pair<int, int> pi;
#define f first
#define s second

namespace std {
template <> struct hash<pi> {
	size_t operator()(const pi &p) const { return p.f ^ p.s; }
};
}  // namespace std

int main() { unordered_map<pi, int> um; }

Java

Java has its own hash functions for pre-defined objects like ArrayList's. However, a custom hash function is still needed for user-defined objects. In order to create one, we can implement the hashCode method.

Additionally, in order for HashSet's and HashMap's to work with a custom class, we must also implement the equals method.

import java.io.*;
import java.util.*;

public class HashTest {
	public static void main(String[] args) {
		// uses custom hash function in class Pair
		Set<Pair> set = new HashSet<>();
	}

	static class Pair {

However, this hash function is quite bad; if we insert $(0,0), (1,1), (2,2) \ldots$ then they will all be mapped to the same bucket (so it would easily be hacked in a Codeforces contest).

C++

Java

In Java, an easy way to hash custom objects is to use the builtin Objects.hash() method. This method takes in multiple objects and uses them to create a hash code. However, this is easily hackable, as the code below shows.

import java.io.*;
import java.util.*;

public class HashTest {
	public static void main(String[] args) {
		// uses custom hash function in class Pair
		Set<Pair> set = new HashSet<>();
		for (int i = 0; i < 100000; ++i) {
			// Pair p = new Pair(i, 0); // ~0.2s on ide.usaco.guide
			Pair p = new Pair(i, -31 * i);  // >5s

Python

A better way to hash pairs would be to use polynomial hashing with a randomized base (as described in this module).

Anti-Hash Tests

The built-in hashing algorithm for integers in C++ is vulnerable to pathological tests, causing abnormally slow runtimes. We describe the issue below and how to fix it. Java users are not affected (see this comment).

Should I worry about anti-hash tests in USACO?

No - historically, no USACO problem has included an anti-hash test. However, these sorts of tests often appear in Codeforces, especially in educational rounds, where open hacking is allowed.

Resources
		CF	Blowing up Unordered Map	Explanation of this problem and how to fix it.

Hack Case Generator

In order to make your unordered_map unhackable, the hash function $H$ you select must have the following property: it is hard to find integers $x$ and $y$ such that $x\neq y$ but $H(x)\equiv H(y) \pmod{P}$ , where $P$ is the prime modulus corresponding to the number of buckets in your unordered map. Note that:

The default hash function, $H(x)=x$ , obviously does not satisfy this property, because you can simply choose $x=0$ and $y=P$ .
If open hacking is allowed, then any deterministic hash function does not satisfy this property, because a hacker could generate a test case specifically to break your solution. You should introduce randomness to ensure that your hash function changes from run to run.

Here is a drop-in replacement for unordered_map that seems to work well in practice:

Resources
		Benq (from KACTL)	HashMap

struct chash {
	// any random-ish large odd number will do
	const uint64_t C = uint64_t(2e18 * PI) + 71;
	// random 32-bit number
	const uint32_t RANDOM = chrono::steady_clock::now().time_since_epoch().count();
	size_t operator()(uint64_t x) const {
		// see https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
		return __builtin_bswap64((x ^ RANDOM) * C);
	}
};
template <class K, class V> using cmap = unordered_map<K, V, chash>;
// example usage: cmap<int, int>

A Faster HashMap in C++

In C++, it's usually faster to use gp_hash_table<K, V> instead of unordered_map<K, V>. Read / writes are much faster than unordered_map. The documentation is rather confusing, so I'll just summarize the most useful functions here. If you need a replacement for unordered_set<K>, use gp_hash_table<K, null_type>.


CF	Chilli - Order of magnitude faster hash tables	Introduces gp_hash_table
GCC	gp_hash_table Interface	documentation
Benq (from KACTL)	HashMap

Custom Hashing

Like unordered_map, gp_hash_table is vulnerable to hash collisions if a custom hash function isn't used. We recommend using the same custom hash we used for unordered map above:

#include <ext/pb_ds/assoc_container.hpp>
using namespace __gnu_pbds;

gp_hash_table<int, int, chash> table;

Resizing

Unordered map has reserve. Calling this function before inserting any elements can result in a constant factor speedup.

We can modify the declaration of gp_hash_table so that it supports the resize function, which operates similarly.

template <class K, class V>
using ht =
    gp_hash_table<K, V, hash<K>, equal_to<K>, direct_mask_range_hashing<>,
                  linear_probe_fn<>,
                  hash_standard_resize_policy<hash_exponential_size_policy<>,
                                              hash_load_check_resize_trigger<>, true>>;

These are the same template arguments as the default gp_hash_table, except false has been changed to true. This modification allows us to change the actual size of the hash table.

int main() {
	ht<int, null_type> g;
	g.resize(5);
	cout << g.get_actual_size() << "\n";  // 8
	cout << g.size() << "\n";             // 0
}

When calling g.resize(x), x is rounded up to the nearest power of 2. Then the actual size of g is changed to be equal to x (unless x < g.size(), in which case an error is thrown).

Resources
		GCC	Resize Policy	documentation

Furthermore, if we construct g with the following arguments:

ht<int, null_type> g({}, {}, {}, {}, {1 << 16});

then the actual size of g is always at least 1<<16 (regardless of calls to resize). The last argument must be a power of 2 (or else errors will be thrown).

Solving 3SUM

3SUM

Gold - Normal

Focus Problem – try your best to solve this problem before continuing!

View Internal Solution

Since all the values are quite small, you can use an array instead of a hashmap. But if you didn't read the constraints carefully enough, you're in luck!

Warning!

This code passes in 1180ms when initial capacity is specified, and TLEs when it is not.

#include <bits/stdc++.h>  // see C++ Tips & Tricks
using namespace std;

using ll = long long;

using vi = vector<int>;
#define pb push_back
#define rsz resize
#define all(x) begin(x), end(x)
#define sz(x) (int)(x).size()

Problems

Status		Source	Problem Name	Difficulty	Tags
		CSES	Sum of Four Values	Normal	Show Tags Set

Module Progress:

Join the USACO Forum!

Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!

Join Forum

Table of Contents

(Optional) Hashmaps

Prerequisites

Table of Contents

Warning!

Hashing

Custom Hashing

Anti-Hash Tests

Should I worry about anti-hash tests in USACO?

A Faster HashMap in C++

Custom Hashing

Resizing

Solving 3SUM

Warning!

Problems

Module Progress:

Join the USACO Forum!

Table of Contents

(Optional) Hashmaps

Prerequisites

Table of Contents

Warning!

Hashing

Custom Hashing

Anti-Hash Tests

Should I worry about anti-hash tests in USACO?

A Faster HashMap in C++

Custom Hashing

Resizing

Solving 3SUM

Warning!

Problems

Module Progress:Not Started

Join the USACO Forum!

Module Progress: