Introduction Sets Sorted Sets Hashsets Solution - Distinct Numbers Method 1 - Sorted Set Method 2 - Hashset Method 3 - Sorting Maps Iterating Over Maps Solution - Associative Array Problems Quiz

Not Frequent

0/11

Introduction to Sets & Maps

Authors: Darren Yao, Benjamin Qi, Allen Li, Jesse Choe, Nathan Gong

Contributor: Elliott Harper

Maintaining collections of distinct elements/keys with sets and maps.

Edit This Page

Prerequisites

Bronze - Introduction to Sorting

Introduction Sets Sorted Sets Hashsets Solution - Distinct Numbers Method 1 - Sorted Set Method 2 - Hashset Method 3 - Sorting Maps Iterating Over Maps Solution - Associative Array Problems Quiz

Introduction

Resources
		IUSACO	4.4 - Sets & Maps	module is based off this
		CPH	4.2, 4.3 - Sets, Maps	covers similar material

A set is a collection of unique elements. Sets have three primary methods:

one to add an element
one to remove an element
one to check whether an element is present

A map is a collection of entries, each consisting of a key and a value. In a map, all keys are required to be unique (i.e., they will form a set), but values can be repeated. Maps have three primary methods:

one to add a specified key-value pairing
one to remove a key-value pairing
one to retrieve the value for a given key

C++ and Java both have two implementations of sets and maps; one uses sorting while the other uses hashing. Python's implementation of sets and maps uses hashing.

Sets

Distinct Numbers

CSES - Easy

Focus Problem – try your best to solve this problem before continuing!

View Internal Solution

Sorted Sets

Sorted sets store elements in sorted order. All primary methods (adding, removing, and checking) run in $\mathcal{O}(\log N)$ worst-case time, where $N$ is the number of elements in the set.

C++

Sorted sets are implemented in C++ by std::set in the <set> header. Some basic operations on an std::set named s include:

s.insert(x), which adds the element x to s if not already present.
s.erase(x), which removes the element x from s if present.
s.count(x), which returns 1 if s contains x and 0 if it doesn't.

You can also iterate through a set in sorted order using a for-each loop.

#include <iostream>
#include <set>

using namespace std;

void demo() {
	set<int> s;
	s.insert(1);                 // [1]
	s.insert(4);                 // [1, 4]
	s.insert(2);                 // [1, 2, 4]

Java

Sorted sets are implemented in Java by the TreeSet class.

TreeSet includes all the operations that HashSet has, but also includes some additional ones. Refer to the More Operations on Sorted Sets module for more detail.

You can iterate through a TreeSet in sorted order using a for-each loop.

Set<Integer> set = new TreeSet<>();
set.add(1);  // {1}
set.add(4);  // {1, 4}
set.add(2);  // {1, 2, 4}

/*
 * Output:
 * 1
 * 2
 * 4
 */

for (int element : set) { System.out.println(element); }

Python

Warning!

Ignore this section, as Python doesn't implement sorted sets.

Hashsets

Hashsets store elements using hashing. Roughly, a hashset consists of some number of buckets $B$ , and each element is mapped to a bucket via a hash function. If $B\approx N$ and the hash function independently maps each distinct element to a uniformly random bucket, then no bucket is expected to contain many elements, and all primary methods will all run in $\mathcal O(1)$ expected time.

C++

Warning!

In the worst case, hashsets in C++ may take proportional to $N$ time per operation. This will be demonstrated later in the module.

Hashsets are implemented in C++ by std::unordered_set in the <unordered_set> header.

#include <iostream>
#include <unordered_set>

using namespace std;

void demo() {
	unordered_set<int> s;
	s.insert(1);                 // {1}
	s.insert(4);                 // {1, 4}
	s.insert(2);                 // {1, 2, 4}

Hashsets work with primitive types, but require a custom hash function for structures/classes like vectors and pairs.

Java

Warning!

In the worst case, hashsets in Java may take proportional to $\mathcal O(\log N)$ time per operation, same as sorted sets (ref).

Hashsets are implemented in Java by the HashSet class (which comes from the java.util library).

Some operations on a HashSet named set include:

set.add(x), which adds the element x to set if not already present.
set.remove(x), which removes the element x from set if present.
set.contains(x), which checks whether set contains the element x.

Hashsets work with primitive types and their object wrappers, but require a custom hash function for custom classes.

Set<Integer> set = new HashSet<>();
set.add(1);  // {1}
set.add(4);  // {1, 4}
set.add(2);  // {1, 4, 2}
set.add(1);  // {1, 4, 2}
// the add method did nothing because 1 was already in the set
System.out.println(set.contains(1));  // true
set.remove(1);                        // {4, 2}
System.out.println(set.contains(5));  // false
set.remove(0);                        // does nothing because 0 wasn't in the set

Python

Warning!

In the worst case, hashsets in Python may take proportional to $N$ time per operation. This will be demonstrated later in the module.

Python's built-in set uses hashing to support $\mathcal{O}(1)$ insertion, deletion, and searches. Some operations on a Python set named s include:

s.add(x): adds element x to s if not already present
s.remove(x): removes an element x from set if present
x in s: checks whether s contains the element x

s = set()
s.add(1)  # {1}
s.add(4)  # {1, 4}
s.add(2)  # {1, 4, 2}
s.add(1)  # {1, 4, 2}
# the add method did nothing because 1 was already in the set
print(1 in s)  # True
s.remove(1)  # {4, 2}
print(5 in s)  # False
s.remove(0)  # {4, 2}
# if the element to be removed does not exist, nothing happens

Solution - Distinct Numbers

This problem asks us to calculate the number of distinct values in a given list.

Method 1 - Sorted Set

Because sets only store one copy of each value, we can insert all the numbers into a set, and then print out the size of the set.

C++

#include <bits/stdc++.h>

using namespace std;

int main() {
	int n;
	cin >> n;

	set<int> distinctNumbers;
	for (int i = 0; i < n; i++) {

Java

// Source: Daniel

import java.io.*;
import java.util.*;

public class DistinctNumbers {
	public static void main(String[] args) throws IOException {
		Kattio io = new Kattio();
		int n = io.nextInt();
		Set<Integer> set = new TreeSet<>();

Python

Warning!

Ignore this section, as Python doesn't implement sorted sets.

Method 2 - Hashset

C++

Same as method 1, but with sorted set replaced by a hashset.

#include <bits/stdc++.h>

using namespace std;

int main() {
	int n;
	cin >> n;

	unordered_set<int> distinctNumbers;
	for (int i = 0; i < n; i++) {

However, this fails on one test designed specifically to cause unordered_set to run in $\Theta(N^2)$ time. To get around this, you can either switch to set or use a custom hash function.

Should I worry about anti-hash tests in USACO?

No - historically, no USACO problem has included an anti-hash test. However, these sorts of tests often appear in Codeforces, especially in educational rounds, where open hacking is allowed.

Java

Same as method 1, but with sorted set replaced by a hashset.

// Source: Daniel

import java.io.*;
import java.util.*;

public class DistinctNumbers {
	public static void main(String[] args) throws IOException {
		Kattio io = new Kattio();
		int n = io.nextInt();
		Set<Integer> set = new HashSet<>();

Python

n = int(input())  # unused
nums = [int(x) for x in input().split()]
distinct_nums = set(nums)
print(len(distinct_nums))

Or we can do this more concisely by skipping the creation of the list, and use a set comprehension directly:

n = int(input())  # unused
distinct_nums = {int(x) for x in input().split()}
print(len(distinct_nums))

However, it is possible to construct a test case that causes the above solution to run in $\Theta(N^2)$ time, so this solution does not receive full credit.

Hack Case Generator

One way to get around this with high probability is to incorporate randomness; see this comment for more information.

import random

RANDOM = random.randrange(2**62)


def Wrapper(x):
	return x ^ RANDOM


n = int(input())  # unused
distinct_nums = {Wrapper(int(x)) for x in input().split()}
print(len(distinct_nums))

Another (less efficient) way is to use strings instead of integers since the hash function for strings is randomized.

n = int(input())  # unused
distinct_nums = set(input().split())
print(len(distinct_nums))

Should I worry about anti-hash tests in USACO?

No - historically, no USACO problem has included an anti-hash test. However, these sorts of tests often appear in Codeforces, especially in educational rounds, where open hacking is allowed.

Method 3 - Sorting

Check out the solution involving sorting.

Maps

Associative Array

YS - Easy

Focus Problem – try your best to solve this problem before continuing!

In sorted maps, the pairs are sorted in order of key. As with sorted sets, all primary methods run in $\mathcal{O}(\log N)$ worst-case time, where $N$ is the number of pairs in the map.

In hashmaps, the pairs are hashed to buckets by the key, and as with hashsets, all primary methods run in $\mathcal O(1)$ expected time under some assumptions about the hash function.

C++

In C++, sorted maps are implemented with std::map and hashmaps are implemented with std::unordered_map.

Some operations on an std::map and std::unordered_map named m include:

m[key], which returns a reference to the value associated with the key key.
- If key is not present in the map, then the value associated with key is constructed using the default constructor of the value type. For example, if the value type is int, then calling m[key] for a key not within the map sets the value associated with that key to 0. As another example, if the value type is std::string, then calling m[key] for a key not within the map sets the value associated with that key to the empty string. More discussion regarding what happens in this case can be found here.
- Alternatively, m.at(key) behaves the same as m[key] if key is contained within m but throws an exception otherwise.
- m[key] = value will assign the value value to the key key.
m.count(key), which returns the number of times the key is in the map (either one or zero), and therefore checks whether a key exists in the map.
m.erase(key), which removes the map entry associated with the specified key if the key was present in the map.

#include <iostream>
#include <map>

using namespace std;

void demo() {
	map<int, int> m;
	m[1] = 5;                    // [(1, 5)]
	m[3] = 14;                   // [(1, 5); (3, 14)]
	m[2] = 7;                    // [(1, 5); (2, 7); (3, 14)]

Java

In Java, sorted maps are implemented with TreeMap and hashmaps are implemented with HashMap.

In both TreeMap and HashMap, the put(key, value) method assigns a value to a key and places the key and value pair into the map. The get(key) method returns the value associated with the key. The containsKey(key) method checks whether a key exists in the map. Lastly, remove(key) removes the map entry associated with the specified key.

Map<Integer, Integer> map = new TreeMap<Integer, Integer>();
map.put(1, 5);                           // [(1, 5)]
map.put(3, 14);                          // [(1, 5); (3, 14)]
map.put(2, 7);                           // [(1, 5); (2, 7); (3, 14)]
map.remove(2);                           // [(1, 5); (3, 14)]
System.out.println(map.get(1));          // 5
System.out.println(map.containsKey(7));  // false
System.out.println(map.containsKey(1));  // true

Python

Colloquially, hashmaps are referred to as dicts in python.

d = {}
d[1] = 5  # {1: 5}
d[3] = 14  # {1: 5, 3: 14}
d[2] = 7  # {1: 5, 2: 7, 3: 14}
del d[2]  # {1: 5, 3: 14}
print(d[1])  # 5
print(7 in d)  # False
print(1 in d)  # True

Iterating Over Maps

C++

An std::map stores entries as pairs in the form {key, value}. To iterate over maps, you can use a for loop. The auto keyword suffices to iterate over any type of pair (here, auto substitutes for pair<int, int>).

// Both of these output the same thing
for (const auto &x : m) { cout << x.first << " " << x.second << endl; }
for (auto x : m) { cout << x.first << " " << x.second << endl; }

The first method (iterating over const references) is generally preferred over the second because the second will make a copy of each element that it iterates over. Additionally, you can pass by reference when iterating over a map, allowing you to modify the values (but not the keys) of the pairs stored in the map:

for (auto &x : m) {
	x.second = 1234;  // Change all values to 1234
}

Java

To iterate over maps, you can use a for-each loop over the keys:

for (int k : m.keySet()) { System.out.println(k + " " + m.get(k)); }

You can also use a for-each loop over the entries:

for (Map.Entry entry : m.entrySet()) {
	System.out.println(entry.getKey() + " " + entry.getValue());
}

It's also possible to change the values while iterating over the keys (or over the values themselves, if they're mutable):

for (int k : m.keySet()) {
	m.put(k, 1234);  // Change all values to 1234
}

Python

To iterate over dicts, there are three options, all of which involve for loops. Dicts will be returned in the same order of insertion in Python 3.6+. You can iterate over the keys:

for key in d:
	print(key)

Over the values:

for value in d.values():
	print(value)

And even over key-value pairs:

for key, value in d.items():
	print(key, value)

It's also possible to change the values while iterating over the keys (or over the values themselves, if they're mutable):

for key in d:
	d[key] = 1234  # Change all values to 1234

While you are free to change the values in a map when iterating over it (as demonstrated above), it is generally a bad idea to insert or remove elements of a map while iterating over it.

Python

For example, the following code attempts to remove every entry from a map, but results in a runtime error.

d = {i: i for i in range(10)}

for i in d:
	del d[i]

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    for i in d:
RuntimeError: dictionary changed size during iteration

One way is to get around this is to create a new map.

d = {i: i for i in range(10)}

# only includes every third element
d_new = dict(item for i, item in enumerate(d.items()) if i % 3 == 0)

print("new dict:", d_new)  # new dict: {0: 0, 3: 3, 6: 6, 9: 9}

Another is to maintain a list of all the keys you want to remove and remove them after the iteration finishes:

d = {i: i for i in range(10)}

# removes every third element
to_remove = {key for i, key in enumerate(d) if i % 3 == 0}

for key in to_remove:
	del d[key]

print("new dict:", d)  # new dict: {1: 1, 2: 2, 4: 4, 5: 5, 7: 7, 8: 8}

C++

For example, the following code attempts to remove every entry from a map, but results in a segmentation fault.

map<int, int> m;

for (int i = 0; i < 10; ++i) m[i] = i;

for (auto &it : m) {
	cout << "Current Key: " << it.first << endl;

	m.erase(it.first);
}

The reason is due to "iterators, pointers and references referring to elements removed by the function [being] invalidated" (as stated in the documentation for erase), though iterators are beyond the scope of this module.

One way to get around this is to just create a new map instead of removing from the old one.

map<int, int> m, M;

for (int i = 0; i < 10; ++i) m[i] = i;

int current_iteration = 0;

for (const auto &it : m) {
	// only includes every third element
	if (current_iteration % 3 == 0) { M[it.first] = it.second; }

Another is to maintain a list of all the keys you want to erase and erase them after the iteration finishes.

map<int, int> m;
for (int i = 0; i < 10; ++i) { m[i] = i; }

vector<int> to_erase;
int current_iteration = 0;

for (const auto &it : m) {
	// removes every third element
	if (current_iteration % 3 == 0) { to_erase.push_back(it.first); }

Java

Modifying a Collection (Set, Map, etc.) in the middle of a for-each loop will cause a ConcurrentModificationException. See the following snippet for an example:

Map<Integer, Integer> m = new TreeMap<>();
// m starts as {0: 0, 1: 1, 2: 2}
m.put(0, 0);
m.put(1, 1);
m.put(2, 2);

for (int key : m.keySet()) {
	m.remove(key);  // ConcurrentModificationException thrown!!
}

One work-around is to use Iterator and the .remove() method to remove elements while looping over them, like in the next code snippet:

Map<Integer, Integer> m = new TreeMap<>();
// m starts as {0: 0, 1: 1, 2: 2}
m.put(0, 0);
m.put(1, 1);
m.put(2, 2);

Iterator<Map.Entry<Integer, Integer>> iter = m.entrySet().iterator();
while (iter.hasNext()) {
	int key = iter.next().getKey();
	if (key == 0 || key == 2) { iter.remove(); }

However, Iterator is outside the scope of this module.

The easiest option (in most cases) if you want to remove/insert multiple entries at once is to use your Container's .addAll(c) or .removeAll(c) methods. That means that you should put all the elements you want to remove (or add) in a new Collection, and then use that new Collection as the parameter of the .addAll(c) or .removeAll(c) method that you call on your original Collection. See the following code snippet for an example (it works equivalently to the code above):

Map<Integer, Integer> m = new TreeMap<>();
// m starts as {0: 0, 1: 1, 2: 2}
m.put(0, 0);
m.put(1, 1);
m.put(2, 2);

Set<Integer> keysToRemove = new TreeSet<>();
for (Map.Entry<Integer, Integer> entry : m.entrySet()) {
	int key = entry.getKey();
	if (key == 0 || key == 2) { keysToRemove.add(key); }

Solution - Associative Array

To solve this problem efficiently, we need a data structure that can:

Assign a value to any index k (where k can be as large as $10^{18}$ ).
Retrieve the value at any index k quickly.

A regular array won't work because the indices can be extremely large, making it impossible to allocate enough memory. However, since all values are initially 0 and only a small subset of indices will ever be set or queried, we can use a map (also called an associative array or dictionary) to store only the indices that have been assigned a value.

When we receive a 0 k v query, we set a[k] = v in the map.
When we receive a 1 k query, we print a[k] if it exists in the map, or 0 otherwise.

This approach is efficient because both operations are fast in maps, and we only store the keys that are actually used.

Note that we use 64-bit integers since $k$ and $v$ may be large.

C++

#include <iostream>
#include <map>

using namespace std;

int main() {
	int query_num;
	cin >> query_num;
	map<long long, long long> a;  // Map to store only assigned indices
	for (int q = 0; q < query_num; q++) {

Java

import java.io.*;
import java.util.*;

public class Main {
	public static void main(String[] args) throws IOException {
		BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
		int queryNum = Integer.parseInt(br.readLine());
		// Map to store assigned indices
		HashMap<Long, Long> a = new HashMap<Long, Long>();
		for (int i = 0; i < queryNum; i++) {

Python

Unfortunately, the straightforward solution fails a few tests specifically designed to make Python dicts run slowly:

a = dict()  # Dictionary to store assigned indices
for _ in range(int(input())):
	nums = list(map(int, input().split()))
	if nums[0] == 0:
		a[nums[1]] = nums[2]
	elif nums[0] == 1:
		# Print a[k] if present, else 0
		print(a.get(nums[1], 0))

To pass all the tests, we can use one of the workarounds mentioned for set.

import random

RANDOM = random.randrange(2**62)


def wrap(x):
	return x ^ RANDOM


a = dict()  # Dictionary to store assigned indices

Problems

Some of these problems can be solved by sorting alone, though sets or maps could make their implementation easier.

Source	Problem Name	Difficulty	Tags
CSES	Sum of Two Values	Easy	Show Tags Map
Bronze	Where Am I?	Easy	Show Tags Set
Bronze	Team Tic Tac Toe	Normal	Show Tags Set, Simulation
Bronze	Year of the Cow	Normal	Show Tags Map
Bronze	Don't Be Last!	Normal	Show Tags Map, Sorting
Silver	Cities & States	Normal	Show Tags Map
CF	Jury Marks	Normal	Show Tags Prefix Sums, Set
AC	Made Up	Hard	Show Tags Map
CF	Into Blocks	Hard	Show Tags Map, Set

Quiz

C++

What is the worst-case time complexity of insertions, deletions, and searches in an std::set of size $N$ ?

Question 1 of 7

Java

What is the worst-case time complexity of insertions, deletions, and searches in an TreeSet of size $N$ ?

Question 1 of 7

Python

What is the worst-case time complexity of insertions, deletions, and searches in a set of size $N$ ?

Question 1 of 7

Module Progress:

Join the USACO Forum!

Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!

Join Forum

Table of Contents

Introduction to Sets & Maps

Prerequisites

Table of Contents

Introduction

Sets

Sorted Sets

Warning!

Hashsets

Warning!

Warning!

Warning!

Solution - Distinct Numbers

Method 1 - Sorted Set

Warning!

Method 2 - Hashset

Should I worry about anti-hash tests in USACO?

Should I worry about anti-hash tests in USACO?

Method 3 - Sorting

Maps

Iterating Over Maps

Solution - Associative Array

Problems

Quiz

Module Progress:

Join the USACO Forum!

Table of Contents

Introduction to Sets & Maps

Prerequisites

Table of Contents

Introduction

Sets

Sorted Sets

Warning!

Hashsets

Warning!

Warning!

Warning!

Solution - Distinct Numbers

Method 1 - Sorted Set

Warning!

Method 2 - Hashset

Should I worry about anti-hash tests in USACO?

Should I worry about anti-hash tests in USACO?

Method 3 - Sorting

Maps

Iterating Over Maps

Solution - Associative Array

Problems

Quiz

Module Progress:Not Started

Join the USACO Forum!

Module Progress: