GT

My Summary on Graph Algorithms

In this article, I will summarize the some topics on graph theories and algorithms.

In a nutshell, the topics on graph data structure include: traversal, connectivity, shortest path, minimum spanning tree, topology sorting.

Graph Traversal

DFS, BFS. Too boring to elaborate.

Connectivity

A typical description for this type of problem is finding the number of components in an undirected graph.

Given n nodes labeled from 0 to n - 1 and a list of undirected edges (each edge is a pair of nodes), write a function to find the number of connected components in an undirected graph.

Union-Find algorithm is a very common solution for this type of problems. Here I want to spent a few paragraphs to talk about union-find and disjoint set.

1. Disjoint set

We can just imagine a disjoint set data structure is a group of sets. This structure supports three major methods: union, find, parent. A very nice introduction can be found here on Youtube by Tushar Roy.

2. Methods: find, union, parent

Find method is finding the representative element of the given element’s set.

Union method is join the sets of the given two element together.

Parent is returning the parent of given elements. In most algorithm excercise, we just use a list and its index to perform it. A snippet for the quoted problem above:


def number_of_connected_graph(n, edges):
	def find(v):
		while v != parent[v]:
			# following line is path compression. 
			#A optimization for find
			parent[v] = parent[parent[v]]
			v = parent[v]
		return v
	def union(e):
		# e is an edge.
		parent_u, parent_v = map(find, e)
		parent[parent_u] = parent_v
	
	parent = list(range(n))
	for e in edges:
		union(e)
	return len(set(map(find,parent)))
	# Note the last line here:
	# One more find operation on nodes is to compress the path.

Shortest Path

This topic can be categorized into several sub-topics: directed edges vs undirected edges; weighted edges vs unweighted edges.

BFS for unweightededges. Dijkstra for weighted edges, complexity O(EV + V^2 * logV) Floyd for weighted edges, but time complexity is O(V^3). Choosing between Dijkstra and Floyd depends on the density of the graph.

Minimum Spanning Tree

Prim:

T <- spanning tree
S <- any node
Q <- minQueue for all neighbor of S
while T not complete:
	n = Q.heappop
	T.add(n)
	for all n.neighbors:
		if not in T:
			Q.heappush

Topology Sorting

Topology sort of a directed graph is a linear ordering of its vertices such that for every directed edge uv, u comes before v in the ordering.

DFS vs BFS (Kahn’s algorithm)

There are a total of n courses you have to take, labeled from 0 to n - 1.

Some courses may have prerequisites, for example to take course 0 you have to first take course 1, which is expressed as a pair: [0,1]

Given the total number of courses and a list of prerequisite pairs, is it possible for you to finish all courses?

Kahan’s Algorithm:

L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
    remove a node n from S
    add n to tail of L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S
if graph has edges then
    return error (graph has at least one cycle)
else
    return L (a topologically sorted order)