31251 - Data Structures and Algorithms


Fundamentals

Model of computation

Model that describes the computational process

This subject implements data structures and runs algorithms on a random-access machine: a model of computation with infinite discrete memory addresses,

such that basic operations (memory access, arithmetic, comparison) can be executed in \(O(1)\) on a random-access machine.

Abstract Data Type (ADT)

Description of the permitted logical operations of a data type; mathematical abstraction of data type's behaviour

Data structure

Description of the physical implementation of a data type; concrete implementation of data type within a model of computation

Algorithmic time-space continuum

Technique of using extra memory to reduce time complexity

Iteration

Repeated process of an algorithm

Invariant

Condition that holds true for every iteration of an algorithm

Big-O Analysis

Big-\(O\) Grande-\(O\) 大\(O\)

\(O(g)\) represents the set of all functions that some multiple \(k \cdot g\) eventually dominates

Formal definition

\( f \in O(g) \iff \exists k,n_0 \gt 0 ( \forall n \geq n_0 [ |f(n)| \leq k g(n) ] ) \)

Limit definition

\( f \in O(g) \iff \limsup_{n \to \infty} \frac{f(n)}{g(n)} \lt \infty \)

Big-\(\Omega\) Grande-\(\Omega\) 大\(\Omega\)

\(\Omega (g)\) represents the set of all functions that \(k \cdot g\) is dominated by

Formal definition

\( f \in \Omega (g) \iff \exists k,n_0 \gt 0 ( \forall n \geq n_0 [ f(n) \geq k g(n) ] ) \)

\( g \in O(f) \iff f \in \Omega (g)\)

Limit definition

\( \liminf_{n \to \infty} \frac{f(n)}{g(n)} \gt 0 \iff f \in \Omega (g)\)

Big-\(\Theta\) Grande-\(\Theta\) 大\(\Theta\)

\(\Theta (g)\) represents the set of all functions that can be bound by two multiples of function \(g\), say \(k_1 \cdot g\) and \(k_2 \cdot g\)

Formal definition

\( f \in \Theta (g) \iff \exists k_1,k_2,n_0 \gt 0 ( \forall n \geq n_0 [ k_1 g(n) \leq f(n) \leq k_2 g(n) ] ) \)

\( f \in O(g) \land f \in \Omega (g) \iff f \in \Theta (g)\)

Limit definition

\( \limsup_{n \to \infty} \frac{f(n)}{g(n)} \lt \infty \land \liminf_{n \to \infty} \frac{f(n)}{g(n)} \gt 0 \iff f \in \Theta (g)\)
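As a worked example of these definitions (an illustration with \(f,g\) chosen here, not taken from the course), \(f(n) = 3n^2 + 2n \in \Theta(n^2)\) with \(k_1 = 3\), \(k_2 = 5\), \(n_0 = 1\):

```latex
\forall n \geq 1:\quad
\underbrace{3n^2}_{k_1 g(n)} \;\leq\; 3n^2 + 2n \;\leq\; 3n^2 + 2n^2 \;=\; \underbrace{5n^2}_{k_2 g(n)},
\qquad
\lim_{n \to \infty} \frac{3n^2 + 2n}{n^2} = 3 \in (0, \infty)
```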

Amortized analysis Analisi ammortizzata 償却解析

The technique of averaging the cost of an operation over a worst-case sequence of operations, rather than charging each operation its individual worst case; this is often a more practical result (e.g. appending to a dynamic array is amortized \(O(1)\) despite occasional \(O(n)\) resizes)

Sorting algorithms Algoritmi d'ordinamento ソート

Sorting algorithm

Algorithm that permutes the objects in a sequence container such that the container follows a monotonic order (some property of the objects increases or decreases as the container index increases)

Orders

Monotonic orders have two types: non-decreasing (each value at least the previous) and non-increasing (each value at most the previous)

Properties

Sorting algorithms are commonly compared by whether they are stable (equal data values keep their relative order), in place (using only \(O(1)\) auxiliary memory), and comparison based (using only pairwise comparisons of data values)

Insertion sort

Sorting algorithm that, for iteration \(i\), has the invariant that the first \(i-1\) data values are sorted. From this, it makes comparisons backwards through the sequence container to conclude where data value \(i\) will be placed (this subroutine in the worst case runs in \(\Theta (i)\)). The whole algorithm runs in \( \Theta (n^2) \) in the worst case and \( \Theta (n)\) in the best case

  1. Begin with index \(j\) at the end of the sorted prefix; if the value \(v\) being inserted satisfies:
    • \(v \geq A[j]\), then shift the sequence container as appropriate and set \(v\) at \(A[j+1]\)
    • \(v < A[j]\), then decrement \(j\) and repeat step 1

Invariant

First \(i\) data values of array sorted

Binary search

Searching algorithm on a sorted list to find a data value with a certain value. It initially reads the centre index, compares it with the desired data value, and repeatedly halves the interval between the bounds. It has a worst-case runtime of \(\Theta (\log(n) ) \)

  1. Set the upper and lower indexes \(u,\ell\) to the borders of the sorted list and the middle index \(m = \lfloor (u+\ell)/2 \rfloor\)
  2. if the value \(v\) is:
    • \(v > A[m]\), then set \(\ell = m\), calculate the new \(m\) and repeat step 2
    • \(v < A[m]\), then set \(u = m\), calculate the new \(m\) and repeat step 2
    • \(v = A[m]\), then return \(m\)

Invariant

For data value \(a\), there is an upper \(u\) and lower bound \(\ell\) index known

Counting sort

Sorting algorithm for \(n\) natural numbers in the discrete interval \([0,k]\) with worst-case running time \( O(n + k)\). It is stable but not in place.

Let \(A\) be the array to sort:

  1. Create a counting array \(C\) of size \(k+2\) and record the frequency of the number \(i\) at \(C[i+1]\) (note that \( \sum^{k+1}_{i=0} C[i] = n\))
  2. Reassign \(C\) to be cumulative via a running prefix sum \(C[i] = C[i] + C[i-1]\), so that \(C[i]\) counts how many values in \(A\) are less than \(i\)
  3. Create a new array \(B\) of size \(n\); for each \(i\) from \(0\) to \(n-1\), place \(A[i]\) at \(B[C[A[i]]]\) and then increment \(C[A[i]]\), so a repeated value lands one slot further right (this is what makes the sort stable)
  4. Copy new array as original array, so make \(A=B\)

Radix sort

Sorting algorithm for \(n\) natural numbers in the interval \([0,10^{d})\) that employs a stable sorting method to sort each digit place over \(d\) passes. It has a worst-case runtime of \( O(dN) \), where \(N\) is the worst-case runtime of the chosen stable sorting algorithm (e.g. \(O(d(n+10))\) with counting sort on decimal digits)

Recursion Ricorsione 再帰

The concept of solving a subproblem within some problem, calling a function within itself

Divide and Conquer algorithm Algoritmo dividi e conquista 分割統治法

Algorithm paradigm that recursively partitions a problem into sub-problems (partitioning half of an array etc.) and applies some algorithm independently to both partitions. Note that in some contexts the solution to the problem may be relative to two different elements in different sub-problems, hence the standard approach also combines the sub-problem results as the recursion returns

Recursive time complexity

Algorithms that employ recursive measures will have their time complexity bound by a recurrence relation, where \(T(n)\) is expressed in terms of smaller instances (e.g. \(T(n) = 2T(n/2) + O(n)\) for mergesort) together with some base case for a reduced case like \(T(1) = O(1)\)

Mergesort

Sorting algorithm with worst-case running time \( O(n \log(n))\). It is stable but not in place.

Let \(A\) be the array to sort:

  1. If the partition has size \(1\) then exit the recursion layer, or if there is no recursion layer, the algorithm is complete
  2. Partition \(A\) into two halves, 'mergesort' both halves, and then 'merge' both halves together

Note the use of divide and conquer strategies

Quicksort

Sorting algorithm with worst-case running time \( \Theta(n^2)\) and average-case running time \(O(n \log (n))\). It is in place and comparison based but not stable.

Let \(A\) be the array to sort:

  1. Pivot \(A[0]\) to a position \(i\) such that \(j \lt i \iff A[j] \leq A[i] \); note that \(A[i]\) forms two partitions
  2. 'Quicksort' these two partitions

Lomuto loop

A pivoting algorithm that can be implemented in quicksort. The general idea is to partition a vector: one part has all values less than or equal to the pivot, the other all values greater than the pivot. The partitioning index is then where the pivot is inserted.

Let \(A\) be the array, \(i=1\) be an iterator over \(A\) and \(\ell=1\) be the partitioning index:

  1. For every iteration of \(i\), if \(A[i] \leq A[0]\), swap \(A[\ell]\) and \(A[i]\) and increment \(\ell\)
  2. \(\ell-1\) is the index at which to insert the pivot

Funny sort

Bubble sort

Bogo sort

Hashing

Hash table ハッシュテーブル

Data structure comprising a static array, with each slot containing a linked list of buckets. A hashing function is used to allocate key-value pairs to designated buckets

Bucket Secchio バケツ

Memory store in a hash table labelled by key values, storing data values

Hashing function

Function that assigns data values to buckets

Collision Collisione 衝突

When two distinct values map to the same bucket, since the hashing function lacks a 1:1 correspondence between values and buckets

Load factor

Ratio of the number of stored entries \(|A|\) to the number of hash table buckets \(|B|\): \(\frac{|A|}{|B|}\)

Separate chaining

Designating each bucket with a sequence container (typically a linked list) so that colliding entries can share a bucket, resolving collisions

Simple Uniform Hashing Assumption (SUHA)

Assumption that for some hash table \(H\) and some bucket \(B\) chosen for some value, then treating the bucket as a random variable, \(B \sim \text{U}(0,|H|-1)\).

In other words, this assumption is that when resolving some value with the hashing function as a black box, all buckets have an equal probability of being resolved to. This means there is no bias in the hashing function that favours certain buckets.

Graphs

See Discrete Mathematics for an introduction to graph theory concepts

Binary tree Albero binario 二分木

Tree data structure such that each parent has at most two children

Complete binary tree

Binary tree such that each layer has maximum vertexes, except possibly the last layer which is filled from the left

A few properties of this tree become visible:

\(G \text{ is a complete binary tree} \implies h = \lfloor \log(|V|) \rfloor\)

\(G \text{ is a complete binary tree} \implies |V| \gt \sum^{h-1}_{i=0} 2^i\)

Binary heap

Binary tree data structure typically employing a vector with the invariant that it represents a complete binary tree and satisfies the heap property

Heap property

The key stored in each vertex is less than or equal to the keys its children hold

Swimming

Technique where a key in a binary heap (violating the heap property) swaps keys with its parent vertex until the heap property is satisfied. The need for this arises when appending new vertexes to the binary heap.

Sinking

Technique where a key in a binary heap (violating the heap property) swaps keys with its smallest child vertex until the heap property is satisfied. The need for this arises when removing the minimum element from a binary heap, which is done by replacing the root vertex with the rightmost (last) vertex and sinking this data value to restore the heap property.

Edge weight

Value associated with an edge to represent some quantity for comparison of edges

Negative weights

Weighted graph Grafo pesato 重量グラフ

A graph \(G\) is a triple \( (V,E,W) \) where:

Weighted adjacency matrix Matrice delle adiacenze pesata 重量隣接行列

A matrix \(A\) with dimensions \(|V| \times |V|\) such that:

\( A_{ij} = \begin{cases} 0 & \{i,j\} \notin E \\ w( \{i,j \}) & \{i,j\} \in E \end{cases}\)

Undirected graphs have symmetric matrixes

Directed

\( A_{ij} = \begin{cases} 0 & (i,j) \notin E \\ 1 & (i,j) \in E \end{cases}\)

Graph Implementations

Adjacency matrix Matrice delle adiacenze 隣接行列

Adjacency list Lista delle adiacenze 隣接リスト

Adjacency set Insieme delle adiacenze 隣接集合

For each vertex, a set container of its adjacent vertexes (with associated edge weights, if weighted), so that edge membership can be tested directly

Depth first search (DFS)

Searching algorithm on a graph that works by marking the starting vertex, then recursively visiting and marking each unmarked neighbour, exploring as deeply as possible before backtracking:

It has running time \( O(|V| + |E|) \)

Recursive

DFS recursively operating on each node

Iterative

DFS can emulate recursion using a stack data structure, pushing child nodes and popping them to visit, skipping any that have already been marked.

Breadth first search (BFS)

Graph searching algorithm based on iterative DFS with the stack replaced by a queue, so that the FIFO structure explores vertexes in order of increasing distance (edge count) from the source

It has a running time of \( O( |V| + |E| ) \) with the adjacency list model

Distance function

For weighted graphs, \(d : \mathbb{N}^2 \to \mathbb{R}_{+} \) is a distance function denoting the weight of the shortest path between two vertexes.

Unconnected case

If two vertexes are not connected, then the distance is \(\infty\).

Negative weight case

If there exists a negative weight that can be infinitely abused in a cycle, the distance is \(-\infty\). This is the only case that a cycle occurs in a shortest path

Arborescence

A directed rooted tree such that there is a directed path from the root vertex \(v\) to all other vertexes

Single-Source Shortest Path

Shortest paths from a particular vertex \(v\) to any other vertex on the graph

Single-Pair Shortest Path

Shortest path between two vertexes \((v_1, v_2)\)

Path relaxation

Checking, for an edge \(e = (v_0, v_1)\), if \( \text{dist_to}[v_0] + e.\text{weight} \lt \text{dist_to}[v_1] \); if so, then let \( \text{dist_to}[v_1] = \text{dist_to}[v_0] + e.\text{weight} \) and \( \text{edge_to}[v_1] = v_0 \)

Generic Shortest Path Algorithm

Shortest path algorithms note the length of the known path to some vertex \(v\) and the previous vertex encountered when following this path, say \(u\). When exploring vertexes, these paths are relaxed by comparing the shortest known path with the newly found path, and updating information as required.

Let vertex \(0\) be the source:

  1. Initialize \(\text{dist_to}[i] = \begin{cases} 0 & i = 0 \\ \infty & i \neq 0 \end{cases} \)
  2. Initialize \(\text{edge_to}[i] = -1 \)
  3. Employing some form of iterative edge relaxation

Invariants

Bellman-Ford Algorithm Algoritmo Bellman-Ford ベルマンフォード法

Algorithm that, starting from the source vertex, relaxes all the graph's edges \(|V|-1\) times, with a running time of \(O( |V| \cdot |E| )\). This works since the pigeonhole principle implies that a shortest path uses at most \(|V|-1\) edges, so relaxing the edges more than \(|V|-1\) times would have no effect, provided there are no negative cycles

Negative cycle

A graph has a negative cycle reachable from the source iff a \(|V|\)th iteration changes the \(\text{dist_to}\) array, meaning Bellman-Ford can detect negative cycles with an extra loop over all edges

Algorithm

  1. Repeat \(|V|-1\) times: relax every edge of the graph

Dijkstra's Algorithm Algoritmo di Dijkstra ダイクストラ法

Algorithm that, starting from the source vertex, repeatedly relaxes the outgoing edges of the unexplored vertex closest to the source, with running time \(O( |V| + |E|\log (|E|) )\) using a priority queue. It requires non-negative edge weights; correctness follows inductively, since a shortest path to \(v_n\) extends an already-known shortest path to some \(v_{n-1}\)

Invariants

Algorithm

While the priority queue is not empty, pop vertex \(v\) and do the following:

  1. Record all edges and edge weights from vertex \(v\) to its neighbours \(N\)
  2. Relax all vertexes in \(N\)
  3. Push each vertex in \(N\) onto the priority queue, keyed by its tentative distance to the source

Binary Search Tree (BST)

Binary tree data structure with the BST property

BST property

Invariant that for every subtree, the left branches of the subtree are less than the subtree's root's value and the right branches of the subtree are greater than the subtree's root's value

Job scheduling

Topological order

A linear order of vertexes such that no vertex has a path to a vertex earlier in the order

Directed Acyclic Graph (DAG)

A directed graph with no cycles; a graph has a topological order iff it is a DAG. To find one, run a DFS and output the vertexes in reverse order of completion

Unmarked path property

For an unmarked path \( (v_1,v_k) \), the recursive depth visit of \(v_k\) is embedded within the depth visit of \( v_1\)

DAG DFS

DFS with the following modifications:

Back edge

A directed edge to a vertex already on the stack. A graph has a back edge iff it has a cycle (i.e. exactly when it is not a DAG)

Proof

To show \( \text{Back edge} \implies \text{Cycle}\), we assume there is a back edge \( (v_i, v_j) \); this means vertex \(v_j\) is already on the stack, so the current recursion traces a path from \(v_j\) to \(v_i\), and together with the edge \( (v_i, v_j) \) a cycle has occurred.

To show \( \text{Cycle} \implies \text{Back edge} \), we assume there is some cycle through vertex \( v_i\), or in other words, \(\text{dfs}(v_i)\) finds \(v_i\) again; let \(v_j\) be the cycle's predecessor of \(v_i\). The unmarked path property implies that \(\text{dfs}(v_j)\) is embedded within the call \(\text{dfs}(v_i) \). When \( \text{dfs}(v_j)\) examines its edge to \(v_i\), it finds \(v_i\) on the stack, since this call is embedded within \(\text{dfs}(v_i)\). Therefore there is a back edge \( (v_j, v_i) \)

Shortest path for DAG

  1. Use DAG DFS to find a topological sort
  2. Relax the outgoing edges of each vertex once, following the topological order (a single Bellman-Ford-style pass, giving \(O(|V| + |E|)\))

Memoization

Storing previously calculated values in a sequence container called a memo for future reference

Dynamic programming

Splitting a problem into subproblems and using the information from overlapping subproblems

ADTs

Iterator

Generalized pointer object for a sequence container (note that some sequence containers are not contiguous; iterators are useful for handling the complexities of data structures like linked lists)

std::iterator

Static array

Fixed size array

std::array

Dynamic array

Variable size array

std::vector

As a data structure, it can be initialized as a static array and is replaced with a static array of double the size once new indexes become necessary; this is array doubling, which makes appends amortized \(O(1)\)

Linked list Lista concatenata 連結リスト

Data structure containing nodes that hold both data values and a pointer to the subsequent node. This is the data structure used to implement the idea of a list in C++.

std::forward_list

Doubly linked list

Linked list where nodes also hold pointer to the previous node

std::list

List Lista リスト

Sequence container holding a finite number of ordered values where repetition of data values may occur

std::list

Common implementations include linked lists such that each data value also has a pointer to the next data value, allowing for non-contiguous storage.

Deque 両端キュー

Sequence container with push and pop access at both ends of the sequence container (a double-ended queue).

One can construct a deque by establishing a front dynamic array and back dynamic array, where one can 'push and pop' the heads of the front array and back array. Deques are rebalanced when one of the arrays is exhausted.

std::deque

Set セット

Sequence container that holds unique, fixed size key values

std::set std::unordered_set

As for data structure implementation, consider an array \(A : \forall i, A[i] \in [0,n]\); see that a direct table \(B : |B| = n\) is not feasible memory-wise, hence some mapping function (for instance, the modulus) is used and the bucket array is often made to be about double the number of stored elements.

Map Mappa マップ

Sequence container that maps fixed size key values to data values

std::map std::unordered_map

Stack Pila スタック

Sequence container with LIFO (Last-In-First-Out) access

std::stack

Queue Coda キュー

Sequence container with FIFO (First-In-First-Out) access

std::queue

Priority queue Coda di priorità 優先度付キュー

Sequence container that allows access to a data value with some highest attribute

std::priority_queue

Graph Grafo グラフ

A sequence container representing the vertexes of a graph and the edges (with edge weights) between them

Tree Albero 木

A sequence container containing the values of a tree