# (Optional) Hashmaps

Authors: Darren Yao, Benjamin Qi

Contributors: Neo Wang, Nathan Gong

Maintaining collections of distinct elements with hashing.

### Warning!

You can almost always use ordered sets and maps instead, but it's good to know that these exist.

Resources | ||||
---|---|---|---|---|

IUSACO | module is based off this |

## Hashing

**Hashing** refers to assigning a unique code to every variable/object which
allows insertions, deletions, and searches in $\mathcal{O}(1)$ time, albeit with
a high constant factor, as hashing requires a large constant number of
operations. However, as the name implies, elements are not ordered in any
meaningful way, so traversals of an unordered set will return elements in some
arbitrary order.

## Custom Hashing

C++

There is no built in method for hashing pairs or vectors. Namely,
`unordered_set<vector<int>>`

does not work. In this case, we can use an
sorted map (which supports all of the functions used
in the code above) or declare our own hash function.

Resources | ||||
---|---|---|---|---|

Mark Nelson | How to create user-defined hash function for |

The link provides an example of hashing pairs of strings, as well as other data structures like pairs.

#include <bits/stdc++.h>using namespace std;typedef pair<int, int> pi;#define f first#define s secondstruct hashPi {size_t operator()(const pi &p) const { return p.f ^ p.s; }};int main() { unordered_map<pi, int, hashPi> um; }

#include <bits/stdc++.h>using namespace std;typedef pair<int, int> pi;#define f first#define s secondnamespace std {template <> struct hash<pi> {size_t operator()(const pi &p) const { return p.f ^ p.s; }};} // namespace stdint main() { unordered_map<pi, int> um; }

Java

Java has its own hash functions for pre-defined objects like `ArrayList`

's.
However, a custom hash function is still needed for user-defined objects.
In order to create one, we can implement the `hashCode`

method.

Additionally, in order for `HashSet`

's and `HashMap`

's to work with
a custom class, we must also implement the `equals`

method.

import java.io.*;import java.util.*;public class HashTest {public static void main(String[] args) {// uses custom hash function in class PairSet<Pair> set = new HashSet<>();}static class Pair {

However, this hash function is quite bad; if we insert $(0,0), (1,1), (2,2) \ldots$ then they will all be mapped to the same bucket (so it would easily be hacked in a Codeforces contest).

C++

Java

In Java, an easy way to hash custom objects is to use the builtin
`Objects.hash()`

method. This method takes in multiple objects and uses them
to create a hash code. However, this is easily hackable, as the code
below shows.

import java.io.*;import java.util.*;public class HashTest {public static void main(String[] args) {// uses custom hash function in class PairSet<Pair> set = new HashSet<>();for (int i = 0; i < 100000; ++i) {// Pair p = new Pair(i, 0); // ~0.2s on ide.usaco.guidePair p = new Pair(i, -31 * i); // >5s

Python

A better way to hash pairs would be to use polynomial hashing with a randomized base (as described in this module).

## Anti-Hash Tests

The built-in hashing algorithm for integers in C++ is vulnerable to pathological tests, causing abnormally slow runtimes. We describe the issue below and how to fix it. Java users are not affected (see this comment).

### Should I worry about anti-hash tests in USACO?

No - historically, no USACO problem has included an anti-hash test. However, these sorts of tests often appear in Codeforces, especially in educational rounds, where open hacking is allowed.

Resources | ||||
---|---|---|---|---|

CF | Explanation of this problem and how to fix it. |

Hack Case Generator

In order to make your `unordered_map`

unhackable, the hash function $H$ you select must have the following property: it is hard to find integers $x$ and $y$ such that $x\neq y$ but $H(x)\equiv H(y) \pmod{P}$, where $P$ is the prime modulus corresponding to the number of buckets in your unordered map. Note that:

- The default hash function, $H(x)=x$, obviously does not satisfy this property, because you can simply choose $x=0$ and $y=P$.
- If open hacking is allowed, then any deterministic hash function does not satisfy this property, because a hacker could generate a test case specifically to break your solution. You should introduce randomness to ensure that your hash function changes from run to run.

Here is a drop-in replacement for `unordered_map`

that seems to work well in practice:

Resources | ||||
---|---|---|---|---|

Benq (from KACTL) |

struct chash {// any random-ish large odd number will doconst uint64_t C = uint64_t(2e18 * PI) + 71;// random 32-bit numberconst uint32_t RANDOM =chrono::steady_clock::now().time_since_epoch().count();size_t operator()(uint64_t x) const {// see https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.htmlreturn __builtin_bswap64((x ^ RANDOM) * C);}};template <class K, class V> using cmap = unordered_map<K, V, chash>;// example usage: cmap<int, int>

## A Faster HashMap in C++

In C++, it's usually faster to use `gp_hash_table<K, V>`

instead of
`unordered_map<K, V>`

. Read / writes are much faster than `unordered_map`

.
The documentation is rather confusing, so I'll just summarize the most useful
functions here. If you need a replacement for `unordered_set<K>`

, use
`gp_hash_table<K, null_type>`

.

Resources | ||||
---|---|---|---|---|

CF | Introduces gp_hash_table | |||

GCC | documentation | |||

Benq (from KACTL) |

### Custom Hashing

Like `unordered_map`

, `gp_hash_table`

is vulernable to hash collisions if a
custom hash function isn't used. We recommend using the same custom hash
we used for unordered map above:

#include <ext/pb_ds/assoc_container.hpp>using namespace __gnu_pbds;gp_hash_table<int, int, chash> table;

### Resizing

Unordered map has
`reserve`

.
Calling this function before inserting any elements can result in a constant
factor speedup.

We can modify the declaration of `gp_hash_table`

so that it supports the
`resize`

function, which operates similarly.

template <class K, class V>using ht = gp_hash_table<K, V, hash<K>, equal_to<K>, direct_mask_range_hashing<>, linear_probe_fn<>,hash_standard_resize_policy<hash_exponential_size_policy<>,hash_load_check_resize_trigger<>, true>>;

These are the same template arguments as the default `gp_hash_table`

, except
`false`

has been changed to `true`

. This modification allows us to change the
actual size of the hash table.

int main() {ht<int, null_type> g;g.resize(5);cout << g.get_actual_size() << "\n"; // 8cout << g.size() << "\n"; // 0}

When calling `g.resize(x)`

, `x`

is rounded up to the nearest power of 2. Then
the actual size of `g`

is changed to be equal to `x`

(unless `x < g.size()`

, in
which case an error is thrown).

Resources | ||||
---|---|---|---|---|

GCC | documentation |

Furthermore, if we construct `g`

with the following arguments:

ht<int, null_type> g({}, {}, {}, {}, {1 << 16});

then the actual size of `g`

is always at least `1<<16`

(regardless of calls to
`resize`

). The last argument **must** be a power of 2 (or else errors will be
thrown).

### Solving 3SUM

Focus Problem – try your best to solve this problem before continuing!

View Internal SolutionSince all the values are quite small, you can use an array instead of a hashmap. But if you didn't read the constraints carefully enough, you're in luck!

### Warning!

This code passes in 1180ms when initial capacity is specified, and TLEs when it is not.

#include <bits/stdc++.h> // see C++ Tips & Tricksusing namespace std;using ll = long long;using vi = vector<int>;#define pb push_back#define rsz resize#define all(x) begin(x), end(x)#define sz(x) (int)(x).size()

## Problems

Status | Source | Problem Name | Difficulty | Tags | |
---|---|---|---|---|---|

CSES | Normal | ## Show TagsSet |

### Module Progress:

### Join the USACO Forum!

Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!