***Welcome to ashrafedu.blogspot.com * * * This website is maintained by ASHRAF***

    Monday, October 30, 2023

    Prim’s algorithm for finding MST (Minimum Spanning Tree)

    Prim's algorithm to find a minimum cost spanning tree uses the greedy approach. In contrast with Kruskal's algorithm, Prim's algorithm grows a single tree and keeps adding new nodes to the spanning tree from the given graph.

    Example:

    Step 1 - Remove all loops and parallel edges. 


    In case of parallel edges, keep the one which has the least cost associated and remove all others.

    Step 2 - Choose any arbitrary node as root node

    In this case, we choose S node as the root node of Prim's spanning tree. This node is arbitrarily chosen, so any node can be the root node. 

    Step 3 - Check outgoing edges and select the one with less cost

    After choosing the root node S, we see that S,A and S,C are two edges with weights 7 and 8, respectively. We choose the edge S,A as its weight is the lower of the two.

    We then check all edges going out from the tree built so far (S and A). We select the one which has the lowest cost and include it in the tree.

    This process is repeated until all the nodes in the graph are covered.
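    The steps above can be sketched in Python. The graph below is an illustrative assumption (the original figure is not shown), reconstructed so that S,A has weight 7 and S,C has weight 8; a heap keeps the cheapest outgoing edge on top.

```python
import heapq

def prim_mst(graph, root):
    """Grow an MST from `root`, always adding the cheapest edge
    that connects a new node to the tree (greedy choice)."""
    visited = {root}
    # (weight, u, v) for every edge leaving the root
    edges = [(w, root, v) for v, w in graph[root]]
    heapq.heapify(edges)
    mst = []
    while edges and len(visited) < len(graph):
        w, u, v = heapq.heappop(edges)
        if v in visited:
            continue  # this edge would not add a new node
        visited.add(v)
        mst.append((u, v, w))
        for nxt, nw in graph[v]:
            if nxt not in visited:
                heapq.heappush(edges, (nw, v, nxt))
    return mst

# Hypothetical weighted graph resembling the example: S-A = 7, S-C = 8, ...
graph = {
    'S': [('A', 7), ('C', 8)],
    'A': [('S', 7), ('B', 6), ('C', 3)],
    'B': [('A', 6), ('D', 5), ('T', 5)],
    'C': [('S', 8), ('A', 3), ('D', 4)],
    'D': [('C', 4), ('B', 5), ('T', 2)],
    'T': [('B', 5), ('D', 2)],
}
print(prim_mst(graph, 'S'))
```

    Starting from S, the cheapest outgoing edge S,A is taken first, exactly as in the walkthrough; each later step again picks the lowest-cost edge leaving the tree.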






    Kruskal’s algorithm for finding MST(Minimum Spanning Tree)

    Kruskal's algorithm to find the minimum cost spanning tree uses the greedy approach.

    To understand Kruskal's algorithm let us consider the following example –

    Step 1 - Remove all loops and Parallel Edges

    In case of parallel edges, keep the one which has the least cost associated and remove all others.

    Step 2 - Arrange all edges in their increasing order of weight

    Step 3 - Add the edge which has the least weight

    Now we start adding edges to the graph, beginning with the one that has the least weight. Throughout, we keep checking that the spanning properties remain intact. If adding an edge would violate the spanning tree property (that is, form a cycle), we do not include that edge in the graph.


    The least cost is 2 and edges involved are B,D and D,T. We add them.

    Next cost is 3, and associated edges are A,C and C,D. We add them again –


    Next cost in the table is 4, and we observe that adding it will create a circuit in the graph. We ignore it.

    We observe that edges with cost 5 and 6 also create circuits. We ignore them and move on.


    Now we are left with only one node to be added. Between the two least cost edges available, with costs 7 and 8, we add the edge with cost 7.
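    The walkthrough above can be sketched in Python with a disjoint-set (union-find) structure to detect cycles. The edge list is an assumption reconstructed from the step costs mentioned above, since the figure is not shown.

```python
def kruskal_mst(n_vertices, edges):
    """Sort edges by weight, then add each edge unless it would
    form a cycle (detected with a disjoint-set / union-find)."""
    parent = list(range(n_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:          # different trees, so no cycle is formed
            parent[ru] = rv
            mst.append((u, v, w))
    return mst

# Hypothetical graph matching the walkthrough, vertices labelled 0..5
A, B, C, D, T, S = range(6)
edges = [(2, B, D), (2, D, T), (3, A, C), (3, C, D),
         (4, A, B), (5, B, T), (6, A, D), (7, S, A), (8, S, C)]
print(kruskal_mst(6, edges))
```

    The run mirrors the steps: the cost-2 and cost-3 edges are accepted, the cost-4, 5 and 6 edges are rejected because both endpoints are already in the same tree, and the cost-7 edge finally connects S.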





    Minimum Spanning Tree (MST)

    Given a connected and undirected graph, a spanning tree of that graph is a subgraph that is a tree and connects all the vertices together. A single graph can have many different spanning trees.

    A minimum spanning tree (MST), or minimum weight spanning tree, for a weighted, connected and undirected graph is a spanning tree with weight less than or equal to the weight of every other spanning tree. The weight of a spanning tree is the sum of the weights of its edges.

    Following are a few properties of the spanning tree connected to graph G −

    • A connected graph G can have more than one spanning tree.
    • All possible spanning trees of graph G, have the same number of edges and vertices.
    • The spanning tree does not have any cycle (loops).
    • Removing one edge from the spanning tree will make the graph disconnected, i.e. the spanning tree is minimally connected.
    • Adding one edge to the spanning tree will create a circuit or loop, i.e. the spanning tree is maximally acyclic.

    Mathematical Properties of Spanning Tree

    • A spanning tree has n - 1 edges, where n is the number of nodes (vertices).
    • From a complete graph, by removing maximum e - n + 1 edges, we can construct a spanning tree.
    • A complete graph can have a maximum of n^(n-2) spanning trees.

    A complete undirected graph can have a maximum of n^(n-2) spanning trees, where n is the number of nodes. In the example below, n is 3, hence 3^(3-2) = 3 spanning trees are possible.

    A minimum spanning tree has (V – 1) edges where V is the number of vertices in the given graph.

    Two most important spanning tree algorithms are −

    • Kruskal's Algorithm
    • Prim's Algorithm

    Depth-first search

    The strategy followed by depth-first search is, as its name implies, to search "deeper" in the graph whenever possible.

    In depth-first search, edges are explored out of the most recently discovered vertex v that still has unexplored edges leaving it. When all of v's edges have been explored, the search "backtracks" to explore edges leaving the vertex from which v was discovered. This process continues until we have discovered all the vertices that are reachable from the original source vertex. If any undiscovered vertices remain, then one of them is selected as a new source and the search is repeated from that source. This entire process is repeated until all vertices are discovered.

     


    As C does not have any unvisited adjacent node, we keep popping the stack until we find a node that has one. In this case there is none, so we keep popping until the stack is empty.
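    The backtracking behaviour described above can be sketched with an explicit stack. The small triangle graph is an illustrative assumption (the original figure is not shown), chosen so that C is the node with no unvisited neighbour.

```python
def dfs(graph, start):
    """Iterative depth-first search with an explicit stack.
    A node is popped (backtracked past) only when it has no
    unvisited adjacent node left."""
    visited, order = set(), []
    stack = [start]
    while stack:
        node = stack[-1]            # peek at the top of the stack
        if node not in visited:
            visited.add(node)
            order.append(node)
        # find the first unvisited adjacent node, if any
        nxt = next((v for v in graph[node] if v not in visited), None)
        if nxt is None:
            stack.pop()             # backtrack
        else:
            stack.append(nxt)       # go deeper
    return order

# Hypothetical graph: A-B, A-C, B-C
graph = {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B']}
print(dfs(graph, 'A'))
```

    Once C is reached, every neighbour is already visited, so the loop pops C, then B, then A, emptying the stack exactly as the text describes.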


    Breadth-first search

    Breadth-first search is one of the simplest algorithms for searching a graph and the archetype for many important graph algorithms. Dijkstra's single-source shortest-paths algorithm and Prim's minimum-spanning-tree algorithm use ideas similar to those in breadth-first search.

    Given a graph G = (V, E) and a distinguished source vertex s, breadth-first search systematically explores the edges of G to "discover" every vertex that is reachable from s. It computes the distance (fewest number of edges) from s to all such reachable vertices. It also produces a "breadth-first tree" with root s that contains all such reachable vertices. For any vertex v reachable from s, the path in the breadth-first tree from s to v corresponds to a "shortest path" from s to v in G, that is, a path containing the fewest number of edges. The algorithm works on both directed and undirected graphs.

    A queue is used to perform breadth-first search. The algorithm starts from a source node, enqueues it, and then repeatedly dequeues a node and enqueues each of its adjacent nodes that has not been visited yet.

    The BFS sequence for the given graph is: s, w, r, t, x, v, u, y.
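    The sequence above can be reproduced with a short Python sketch. The adjacency lists below are assumptions reconstructed to match the stated order, since the figure is not shown; the function also returns the distance (fewest edges) of each reachable vertex from the source.

```python
from collections import deque

def bfs(graph, source):
    """Breadth-first search: visit nodes in order of their
    distance (fewest edges) from the source, using a queue."""
    dist = {source: 0}
    order = [source]
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:           # discovered for the first time
                dist[v] = dist[u] + 1
                order.append(v)
                queue.append(v)
    return order, dist

# Hypothetical adjacency lists consistent with the stated BFS sequence
graph = {
    's': ['w', 'r'],
    'w': ['s', 't', 'x'],
    'r': ['s', 'v'],
    't': ['w', 'x', 'u'],
    'x': ['w', 't', 'u', 'y'],
    'v': ['r'],
    'u': ['t', 'x', 'y'],
    'y': ['x', 'u'],
}
order, dist = bfs(graph, 's')
print(order)
```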


    Representations of graphs

    A Graph is a non-linear data structure consisting of nodes and edges. The nodes are sometimes also referred to as vertices and the edges are lines or arcs that connect any two nodes in the graph.

    A Graph consists of a finite set of vertices(or nodes) and set of Edges which connect a pair of nodes.

    The two most commonly used representations of a graph are:

    1. Adjacency Matrix

    2. Adjacency List

    The choice of the graph representation is situation specific. It totally depends on the type of operations to be performed and ease of use.

    Adjacency Matrix:

    Adjacency Matrix is a two-dimensional array of size V x V, where V is the number of vertices in the graph. Let the 2D array be adj[ ][ ]; a slot adj[i][j] = 1 indicates that there is an edge from vertex i to vertex j. The adjacency matrix of an undirected graph is always symmetric. An adjacency matrix can also represent a weighted graph: if adj[i][j] = w, then there is an edge from vertex i to vertex j with weight w.

    Adjacency List:

    An array of lists is used. Size of the array is equal to the number of vertices. Let the array be array[]. An entry array[i] represents the list of vertices adjacent to the ith vertex. This representation can also be used to represent a weighted graph. The weights of edges can be represented as lists of pairs. 

    Following is adjacency list representation of the above graph.
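    Both representations can be built in a few lines of Python. The 4-vertex graph below is an illustrative assumption, since the original figure is not shown.

```python
# Hypothetical undirected graph with 4 vertices and edges 0-1, 0-2, 1-2, 2-3
V = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: adj[i][j] = 1 iff there is an edge between i and j
adj = [[0] * V for _ in range(V)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1   # symmetric, since the graph is undirected

# Adjacency list: array[i] is the list of vertices adjacent to vertex i
array = [[] for _ in range(V)]
for i, j in edges:
    array[i].append(j)
    array[j].append(i)

print(adj)
print(array)
```

    The matrix costs O(V^2) space regardless of how many edges exist, while the list costs O(V + E); this is the usual trade-off behind the "situation specific" choice mentioned above.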



    AVL Tree

    The AVL tree is named after its inventors, Georgy Adelson-Velsky and Evgenii Landis.

    An AVL tree is defined as a self-balancing Binary Search Tree (BST) in which the difference between the heights of the left and right subtrees of any node cannot be more than one.

    The difference between the heights of the left subtree and the right subtree for any node is known as the balance factor of the node.


    The above tree is AVL because the differences between the heights of left and right subtrees for every node are less than or equal to 1.

    Rotating the subtrees in an AVL Tree:

    An AVL tree may rotate in one of the following four ways to keep itself balanced:

    Left Rotation:

    When a node is added into the right subtree of the right subtree, if the tree gets out of balance, a single left rotation is performed to balance the tree.


    Right Rotation:

    If a node is added to the left subtree of the left subtree and the AVL tree gets out of balance, a single right rotation is performed to balance the tree.

    Left-Right Rotation:

    A left-right rotation is a combination in which a left rotation is performed first, followed by a right rotation.

    Right-Left Rotation:

    A right-left rotation is a combination in which a right rotation is performed first, followed by a left rotation.
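    The two single rotations can be sketched on a bare node structure. The Node class and helper names below are illustrative assumptions, not from the original.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(n):
    return 0 if n is None else 1 + max(height(n.left), height(n.right))

def balance_factor(n):
    """Height of the left subtree minus height of the right subtree."""
    return height(n.left) - height(n.right)

def rotate_left(z):
    """Right-right case: z's right child becomes the new subtree root."""
    y = z.right
    z.right, y.left = y.left, z
    return y

def rotate_right(z):
    """Left-left case: z's left child becomes the new subtree root."""
    y = z.left
    z.left, y.right = y.right, z
    return y

# Right-right imbalance: keys 1, 2, 3 inserted in increasing order
root = Node(1, right=Node(2, right=Node(3)))
root = rotate_left(root)   # a single left rotation rebalances the tree
```

    The double rotations are just compositions of these: a left-right rotation is `rotate_left` on the left child followed by `rotate_right` on the node, and a right-left rotation is the mirror image.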

    Advantages of AVL Tree:

    1. AVL trees can self-balance themselves.
    2. An AVL tree can never become skewed.
    3. It provides faster lookups than Red-Black trees.
    4. Better search time complexity than unbalanced trees such as a plain binary search tree.
    5. Its height cannot exceed O(log N), where N is the total number of nodes in the tree.

    Disadvantages of AVL Tree:

    1. It is difficult to implement.
    2. It has high constant factors for some of the operations.
    3. It is less used compared to Red-Black trees.
    4. Due to its rather strict balance, AVL trees have more complicated insertion and removal operations, as more rotations are performed.
    5. It takes more processing to stay balanced.




    External Sorting(Merge Sort)

     

    Performance of merge sort:

    Regardless of best or worst case, the number of swaps in this algorithm is always 2(last - first + 1) per merge, where first and last are the indexes of the first and last items in the (sub)list, respectively.

    In the best case, where the list is sorted, the number of comparisons per pass is (first + last)/2.

    In the worst case, where the last item of one sublist precedes only the last item of the other sublist, the number of comparisons is last - first, which approaches N.

    Thus, for the worst and average cases, the number of comparisons and number of swaps for merge sort is O(N log N ). 

    In theory, merge sort is as fast as quicksort. In practice, however, merge sort performs a bit more slowly, because of all the data copying the algorithm requires.
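    The merge step discussed above can be sketched in Python. This is an illustrative out-of-place version; the element copying in the merge is exactly the extra data movement mentioned above.

```python
def merge_sort(a):
    """Split the list in half, sort each half recursively,
    then merge the two sorted halves back together."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    # merge: repeatedly take the smaller front element of the two halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])    # one of these two tails is already empty
    merged.extend(right[j:])
    return merged

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))
```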


    Shell Sort

     
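    The body of this section appears to be missing. As an illustration, here is a minimal sketch of Shell sort using the common halving gap sequence (n/2, n/4, ..., 1); the gap sequence is an assumption, as the original material is not shown.

```python
def shell_sort(a):
    """Insertion sort over elements `gap` positions apart; the gap
    shrinks each pass until it reaches 1, when this becomes a plain
    insertion sort over an almost-sorted list."""
    n = len(a)
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):
            temp = a[i]
            j = i
            while j >= gap and a[j - gap] > temp:
                a[j] = a[j - gap]   # shift the larger element right
                j -= gap
            a[j] = temp
        gap //= 2
    return a

print(shell_sort([35, 33, 42, 10, 14, 19, 27, 44]))
```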



    Heap Sort

     


    Complexity Analysis of Heap Sort:

        Worst Case Time Complexity: O(n*log n)
        Best Case Time Complexity: O(n*log n)
        Average Time Complexity: O(n*log n)
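    The O(n log n) bound above comes from the two phases sketched below: building a max-heap, then repeatedly moving the maximum to the end. This is an illustrative implementation, since the section's own code/figure is not shown.

```python
def heapify(a, n, i):
    """Sift a[i] down so the subtree rooted at index i is a max-heap
    (considering only the first n elements of the array)."""
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, n, largest)

def heap_sort(a):
    """Build a max-heap, then repeatedly swap the maximum to the end
    of the unsorted region and restore the heap: n - 1 sift-downs of
    O(log n) each, hence O(n log n) in every case."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # build phase
        heapify(a, n, i)
    for end in range(n - 1, 0, -1):       # extract phase
        a[0], a[end] = a[end], a[0]
        heapify(a, end, 0)
    return a

print(heap_sort([12, 11, 13, 5, 6, 7]))
```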

    Quick Sort

     
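    The body of this section also appears to be missing. As an illustration, here is a minimal sketch of quicksort; this simple out-of-place variant is an assumption about what was intended, chosen for clarity over the usual in-place partitioning.

```python
def quick_sort(a):
    """Pick a pivot, partition into smaller / equal / larger parts,
    and sort the two outer parts recursively."""
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]
    smaller = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    larger  = [x for x in a if x > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)

print(quick_sort([10, 7, 8, 9, 1, 5]))
```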





