31251 - Data Structures and Algorithms


Fundamentals

Model of computation

Model that describes the computational process

This subject implements data structures and runs algorithms on a random-access machine: a model of computation with infinite discrete memory addresses,

such that basic operations (memory access, arithmetic, comparison) can be executed in \(O(1)\) on a random-access machine.

Abstract Data Type (ADT)

Description of the permitted logical operations of a data type; mathematical abstraction of data type's behaviour

Data structure

Description of the physical implementation of a data type; concrete implementation of data type within a model of computation

Algorithmic time-space continuum

Technique of using extra memory to reduce time complexity

Iteration

Repeated process of an algorithm

Invariant

Condition that holds true for every iteration of an algorithm

Big-O Analysis

Big-\(O\) Grande-\(O\) 大\(O\)

\(O(g)\) represents the set of all functions that some multiple \(k \cdot g\) eventually dominates

Formal definition

\( f \in O(g) \iff \exists k,n_0 \gt 0 ( \forall n \geq n_0 [ |f(n)| \leq k g(n) ] ) \)

Limit definition

\( f \in O(g) \iff \limsup_{n \to \infty} \frac{f(n)}{g(n)} \lt \infty \)

Big-\(\Omega\) Grande-\(\Omega\) 大\(\Omega\)

\(\Omega (g)\) represents the set of all functions that \(k \cdot g\) is dominated by

Formal definition

\( f \in \Omega (g) \iff \exists k,n_0 \gt 0 ( \forall n \geq n_0 [ f(n) \geq k g(n) ] ) \)

\( g \in O(f) \iff f \in \Omega (g)\)

Limit definition

\( \liminf_{n \to \infty} \frac{f(n)}{g(n)} \gt 0 \iff f \in \Omega (g)\)

Big-\(\Theta\) Grande-\(\Theta\) 大\(\Theta\)

\(\Theta (g)\) represents the set of all functions that can be bound by two multiples of function \(g\), say \(k_1 \cdot g\) and \(k_2 \cdot g\)

Formal definition

\( f \in \Theta (g) \iff \exists k_1,k_2,n_0 \gt 0 ( \forall n \geq n_0 [ k_1 g(n) \leq f(n) \leq k_2 g(n) ] ) \)

\( f \in O(g) \land f \in \Omega (g) \iff f \in \Theta (g)\)

Limit definition

\( \limsup_{n \to \infty} \frac{f(n)}{g(n)} \lt \infty \land \liminf_{n \to \infty} \frac{f(n)}{g(n)} \gt 0 \iff f \in \Theta (g)\)
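As a worked example of these definitions (an illustration with \(f,g\) chosen here, not taken from the course), \(f(n) = 3n^2 + 2n \in \Theta(n^2)\) with \(k_1 = 3\), \(k_2 = 5\), \(n_0 = 1\):

```latex
\forall n \geq 1:\quad
\underbrace{3n^2}_{k_1 g(n)} \;\leq\; 3n^2 + 2n \;\leq\; 3n^2 + 2n^2 \;=\; \underbrace{5n^2}_{k_2 g(n)},
\qquad
\lim_{n \to \infty} \frac{3n^2 + 2n}{n^2} = 3 \in (0, \infty)
```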

Amortized analysis Analisi ammortizzata 償却解析

The technique of averaging the cost of an operation over a worst-case sequence of operations, rather than charging each operation its individual worst case; this is often a more practical result (e.g. appending to a dynamic array is amortized \(O(1)\) despite occasional \(O(n)\) resizes)

Sorting algorithms Algoritmi d'ordinamento ソート

Sorting algorithm

Algorithm that permutes the objects in a sequence container such that the container follows a monotonic order (some property of the objects increases or decreases as the container index increases)

Orders

Monotonic orders have two types: non-decreasing (each value at least the previous) and non-increasing (each value at most the previous)

Properties

Sorting algorithms are commonly compared by whether they are stable (equal data values keep their relative order), in place (using only \(O(1)\) auxiliary memory), and comparison based (using only pairwise comparisons of data values)

Insertion sort

Sorting algorithm that, for iteration \(i\), has the invariant that the first \(i-1\) data values are sorted. From this, it makes comparisons backwards through the sequence container to conclude where data value \(i\) will be placed (this subroutine in the worst case runs in \(\Theta (i)\)). The whole algorithm runs in \( \Theta (n^2) \) in the worst case and \( \Theta (n)\) in the best case

  1. Begin with index \(j\) at the end of the sorted prefix; if the value \(v\) being inserted satisfies:
    • \(v \geq A[j]\), then shift the sequence container as appropriate and set \(v\) at \(A[j+1]\)
    • \(v < A[j]\), then decrement \(j\) and repeat step 1

Invariant

First \(i\) data values of array sorted

Binary search

Searching algorithm on a sorted list to find a data value with a certain value. It initially reads the centre index, compares it with the desired data value, and repeatedly halves the interval between the bounds. It has a worst-case runtime of \(\Theta (\log(n) ) \)

  1. Set the upper and lower indexes \(u,\ell\) to the borders of the sorted list and the middle index \(m = \lfloor (u+\ell)/2 \rfloor\)
  2. if the value \(v\) is:
    • \(v > A[m]\), then set \(\ell = m\), calculate the new \(m\) and repeat step 2
    • \(v < A[m]\), then set \(u = m\), calculate the new \(m\) and repeat step 2
    • \(v = A[m]\), then return \(m\)

Invariant

For data value \(a\), there is an upper \(u\) and lower bound \(\ell\) index known

Counting sort

Sorting algorithm for \(n\) natural numbers in the discrete interval \([0,k]\) with worst-case running time \( O(n + k)\). It is stable but not in place.

Let \(A\) be the array to sort:

  1. Create a counting array \(C\) of size \(k+2\) and record the frequency of the number \(i\) at \(C[i+1]\) (note that \( \sum^{k+1}_{i=0} C[i] = n\))
  2. Reassign \(C\) to be cumulative via a running prefix sum \(C[i] = C[i] + C[i-1]\), so that \(C[i]\) counts how many values in \(A\) are less than \(i\)
  3. Create a new array \(B\) of size \(n\); for each \(i\) from \(0\) to \(n-1\), place \(A[i]\) at \(B[C[A[i]]]\) and then increment \(C[A[i]]\), so a repeated value lands one slot further right (this is what makes the sort stable)
  4. Copy new array as original array, so make \(A=B\)

Radix sort

Sorting algorithm for \(n\) natural numbers in the interval \([0,10^{d})\) that employs a stable sorting method to sort each digit place over \(d\) passes. It has a worst-case runtime of \( O(dN) \), where \(N\) is the worst-case runtime of the chosen stable sorting algorithm (e.g. \(O(d(n+10))\) with counting sort on decimal digits)

Recursion Ricorsione 再帰

The concept of solving a subproblem within some problem, calling a function within itself

Divide and Conquer algorithm Algoritmo dividi e conquista 分割統治法

Algorithm paradigm that recursively partitions a problem into sub-problems (partitioning half of an array etc.) and applies some algorithm independently to both partitions. Note that in some contexts the solution to the problem may be relative to two different elements in different sub-problems, hence the standard approach also combines the sub-problem results as the recursion returns

Recursive time complexity

Algorithms that employ recursive measures will have their time complexity bound by a recurrence relation, where \(T(n)\) is expressed in terms of smaller instances (e.g. \(T(n) = 2T(n/2) + O(n)\) for mergesort) together with some base case for a reduced case like \(T(1) = O(1)\)

Mergesort

Sorting algorithm with worst-case running time \( O(n \log(n))\). It is stable but not in place.

Let \(A\) be the array to sort:

  1. If the partition has size \(1\) then exit the recursion layer, or if there is no recursion layer, the algorithm is complete
  2. Partition \(A\) into two halves, 'mergesort' both halves, and then 'merge' both halves together

Note the use of divide and conquer strategies

Quicksort

Sorting algorithm with worst-case running time \( \Theta(n^2)\) and average-case running time \(O(n \log (n))\). It is in place and comparison based but not stable.

Let \(A\) be the array to sort:

  1. Pivot \(A[0]\) to a position \(i\) such that \(j \lt i \iff A[j] \leq A[i] \); note that \(A[i]\) forms two partitions
  2. 'Quicksort' these two partitions

Lomuto loop

A pivoting algorithm that can be implemented in quicksort. The general idea is to partition a vector: one part has all values less than or equal to the pivot, the other all values greater than the pivot. The partitioning index is then where the pivot is inserted.

Let \(A\) be the array, \(i=1\) be an iterator over \(A\) and \(\ell=1\) be the partitioning index:

  1. For every iteration of \(i\), if \(A[i] \leq A[0]\), swap \(A[\ell]\) and \(A[i]\) and increment \(\ell\)
  2. \(\ell-1\) is the index at which to insert the pivot

Funny sort

Bubble sort

Bogo sort

Hashing

Hash table ハッシュテーブル

Data structure comprising a static array, with each slot containing a linked list of buckets. A hashing function is used to allocate key-value pairs to designated buckets

Bucket Secchio バケツ

Memory store in a hash table labelled by key values, storing data values

Hashing function

Function that assigns data values to buckets

Collision Collisione 衝突

When two distinct values map to the same bucket, since the hashing function lacks a 1:1 correspondence between values and buckets

Load factor

Ratio of the number of stored entries \(|A|\) to the number of hash table buckets \(|B|\): \(\frac{|A|}{|B|}\)

Separate chaining

Designating each bucket with a sequence container (typically a linked list) so that colliding entries can share a bucket, resolving collisions

Simple Uniform Hashing Assumption (SUHA)

Assumption that for some hash table \(H\) and some bucket \(B\) chosen for some value, then treating the bucket as a random variable, \(B \sim \text{U}(0,|H|-1)\).

In other words, this assumption is that when resolving some value with the hashing function as a black box, all buckets have an equal probability of being resolved to. This means there is no bias in the hashing function that favours certain buckets.

Graphs

See Discrete Mathematics for an introduction to graph theory concepts

Binary tree Albero binario 二分木

Tree data structure such that each parent has at most two children

Complete binary tree

Binary tree such that each layer has maximum vertexes, except possibly the last layer which is filled from the left

A few properties of this tree become visible:

\(G \text{ is a complete binary tree} \implies h = \lfloor \log(|V|) \rfloor\)

\(G \text{ is a complete binary tree} \implies |V| \gt \sum^{h-1}_{i=0} 2^i\)

Binary heap

Binary tree data structure typically employing a vector with the invariant that it represents a complete binary tree and satisfies the heap property

Heap property

The key stored in each vertex is less than or equal to the keys its children hold

Swimming

Technique where a key in a binary heap (violating the heap property) swaps keys with its parent vertex until the heap property is satisfied. The need for this arises when appending new vertexes to the binary heap.

Sinking

Technique where a key in a binary heap (violating the heap property) swaps keys with its smallest child vertex until the heap property is satisfied. The need for this arises when removing the minimum element from a binary heap, which is done by replacing the root vertex with the rightmost (last) vertex and sinking this data value to restore the heap property.

Edge weight

Value associated with an edge to represent some quantity for comparison of edges

Negative weights

Weighted graph Grafo pesato 重量グラフ

A graph \(G\) is a triple \( (V,E,W) \) where:

Weighted adjacency matrix Matrice delle adiacenze pesata 重量隣接行列

A matrix \(A\) with dimensions \(|V| \times |V|\) such that:

\( A_{ij} = \begin{cases} 0 & \{i,j\} \notin E \\ w( \{i,j \}) & \{i,j\} \in E \end{cases}\)

Undirected graphs have symmetric matrixes

Directed

\( A_{ij} = \begin{cases} 0 & (i,j) \notin E \\ 1 & (i,j) \in E \end{cases}\)

Graph Implementations

Adjacency matrix Matrice delle adiacenze 隣接行列

Adjacency list Lista delle adiacenze 隣接リスト

Adjacency set Insieme delle adiacenze 隣接集合

For each vertex, a set container of its adjacent vertexes (with associated edge weights, if weighted), so that edge membership can be tested directly

Depth first search (DFS)

Searching algorithm on a graph that works by marking the starting vertex, then recursively visiting and marking each unmarked neighbour, exploring as deeply as possible before backtracking:

It has running time \( O(|V| + |E|) \)

Recursive

DFS recursively operating on each node

Iterative

DFS can emulate recursion using a stack data structure, pushing child nodes and popping them to visit, skipping any that have already been marked.

Breadth first search (BFS)

Graph searching algorithm based on iterative DFS with the stack replaced by a queue, so that the FIFO structure explores vertexes in order of increasing distance (edge count) from the source

It has a running time of \( O( |V| + |E| ) \) with the adjacency list model

Distance function

For weighted graphs, \(d : \mathbb{N}^2 \to \mathbb{R}_{+} \) is a distance function denoting the weight of the shortest path between two vertexes.

Unconnected case

If two vertexes are not connected, then the distance is \(\infty\).

Negative weight case

If there exists a negative weight that can be infinitely abused in a cycle, the distance is \(-\infty\). This is the only case that a cycle occurs in a shortest path

Arborescence

A directed rooted tree such that there is a directed path from the root vertex \(v\) to all other vertexes

Single-Source Shortest Path

Shortest paths from a particular vertex \(v\) to any other vertex on the graph

Single-Pair Shortest Path

Shortest path between two vertexes \((v_1, v_2)\)

Path relaxation

Checking, for an edge \(e = (v_0, v_1)\), if \( \text{dist_to}[v_0] + e.\text{weight} \lt \text{dist_to}[v_1] \); if so, then let \( \text{dist_to}[v_1] = \text{dist_to}[v_0] + e.\text{weight} \) and \( \text{edge_to}[v_1] = v_0 \)

Generic Shortest Path Algorithm

Shortest path algorithms note the length of the known path to some vertex \(v\) and the previous vertex encountered when following this path, say \(u\). When exploring vertexes, these paths are relaxed by comparing the shortest known path with the newly found path, and updating information as required.

Let vertex \(0\) be the source:

  1. Initialize \(\text{dist_to}[i] = \begin{cases} 0 & i = 0 \\ \infty & i \neq 0 \end{cases} \)
  2. Initialize \(\text{edge_to}[i] = -1 \)
  3. Employing some form of iterative edge relaxation

Invariants

Bellman-Ford Algorithm Algoritmo Bellman-Ford ベルマンフォード法

Algorithm that, starting from the source vertex, relaxes all the graph's edges \(|V|-1\) times, with a running time of \(O( |V| \cdot |E| )\). This works since the pigeonhole principle implies that a shortest path uses at most \(|V|-1\) edges, so relaxing the edges more than \(|V|-1\) times would have no effect, provided there are no negative cycles

Negative cycle

A graph has a negative cycle reachable from the source iff a \(|V|\)th iteration changes the \(\text{dist_to}\) array, meaning Bellman-Ford can detect negative cycles with an extra loop over all edges

Algorithm

  1. Repeat \(|V|-1\) times: relax every edge of the graph

Dijkstra's Algorithm Algoritmo di Dijkstra ダイクストラ法

Algorithm that, starting from the source vertex, repeatedly relaxes the outgoing edges of the unexplored vertex closest to the source, with running time \(O( |V| + |E|\log (|E|) )\) using a priority queue. It requires non-negative edge weights; correctness follows inductively, since a shortest path to \(v_n\) extends an already-known shortest path to some \(v_{n-1}\)

Invariants

Algorithm

While the priority queue is not empty, pop vertex \(v\) and do the following:

  1. Record all edges and edge weights from vertex \(v\) to its neighbours \(N\)
  2. Relax all vertexes in \(N\)
  3. Push each vertex in \(N\) onto the priority queue, keyed by its tentative distance to the source

Binary Search Tree (BST)

Binary tree data structure with the BST property

BST property

Invariant that for every subtree, the left branches of the subtree are less than the subtree's root's value and the right branches of the subtree are greater than the subtree's root's value

Job scheduling

Topological order

A linear order of vertexes such that no vertex has a path to a vertex earlier in the order

Directed Acyclic Graph (DAG)

A directed graph with no cycles; a graph has a topological order iff it is a DAG. To find one, run a DFS and output the vertexes in reverse order of completion

Unmarked path property

For an unmarked path \( (v_1,v_k) \), the recursive depth visit of \(v_k\) is embedded within the depth visit of \( v_1\)

DAG DFS

DFS with the following modifications:

Back edge

A directed edge to a vertex already on the stack. A graph has a back edge iff it has a cycle (i.e. exactly when it is not a DAG)

Proof

To show \( \text{Back edge} \implies \text{Cycle}\), we assume there is a back edge \( (v_i, v_j) \); this means vertex \(v_j\) is already on the stack, so the current recursion traces a path from \(v_j\) to \(v_i\), and together with the edge \( (v_i, v_j) \) a cycle has occurred.

To show \( \text{Cycle} \implies \text{Back edge} \), we assume there is some cycle through vertex \( v_i\), or in other words, \(\text{dfs}(v_i)\) finds \(v_i\) again; let \(v_j\) be the cycle's predecessor of \(v_i\). The unmarked path property implies that \(\text{dfs}(v_j)\) is embedded within the call \(\text{dfs}(v_i) \). When \( \text{dfs}(v_j)\) examines its edge to \(v_i\), it finds \(v_i\) on the stack, since this call is embedded within \(\text{dfs}(v_i)\). Therefore there is a back edge \( (v_j, v_i) \)

Shortest path for DAG

  1. Use DAG DFS to find a topological sort
  2. Relax the outgoing edges of each vertex once, following the topological order (a single Bellman-Ford-style pass, giving \(O(|V| + |E|)\))

Memoization

Storing previously calculated values in a sequence container called a memo for future reference

Dynamic programming

Splitting a problem into subproblems and using the information from overlapping subproblems

ADTs

Iterator

Generalized pointer object for a sequence container (note that some sequence containers are not contiguous; iterators are useful for handling the complexities of data structures like linked lists)

std::iterator

Static array

Fixed size array

std::array

Dynamic array

Variable size array

std::vector

As a data structure, it can be initialized as a static array and is replaced with a static array of double the size once new indexes become necessary; this is array doubling, which makes appends amortized \(O(1)\)

Linked list Lista concatenata 連結リスト

Data structure containing nodes that hold both data values and a pointer to the subsequent node. This is the data structure used to implement the idea of a list in C++.

std::forward_list

Doubly linked list

Linked list where nodes also hold pointer to the previous node

std::list

List Lista リスト

Sequence container holding a finite number of ordered values where repetition of data values may occur

std::list

Common implementations include linked lists such that each data value also has a pointer to the next data value, allowing for non-contiguous storage.

Deque 両端キュー

Sequence container with push and pop access at both ends of the sequence container (a double-ended queue).

One can construct a deque by establishing a front dynamic array and back dynamic array, where one can 'push and pop' the heads of the front array and back array. Deques are rebalanced when one of the arrays is exhausted.

std::deque

Set セット

Sequence container that holds unique, fixed size key values

std::set std::unordered_set

As for data structure implementation, consider an array \(A : \forall i, A[i] \in [0,n]\); see that a direct table \(B : |B| = n\) is not feasible memory-wise, hence some mapping function (for instance, the modulus) is used and the bucket array is often made to be about double the number of stored elements.

Map Mappa マップ

Sequence container that maps fixed size key values to data values

std::map std::unordered_map

Stack Pila スタック

Sequence container with LIFO (Last-In-First-Out) access

std::stack

Queue Coda キュー

Sequence container with FIFO (First-In-First-Out) access

std::queue

Priority queue Coda di priorità 優先度付キュー

Sequence container that allows access to a data value with some highest attribute

std::priority_queue

Graph Grafo グラフ

A sequence container representing the vertexes of a graph and the edges (with edge weights) between them

Tree Albero 木

A sequence container containing the values of a tree