Breadth-first search

	Order in which the nodes are expanded
Class	Search algorithm
Data structure	Graph
Worst-case performance	$O(|V|+|E|)$
Worst-case space complexity	$O(|V|)$

Animated example of a breadth-first search

Breadth-first search (BFS) is an algorithm for traversing or searching tree or graph data structures. It starts at the tree root (or some arbitrary node of a graph, sometimes referred to as a 'search key'^[1]), and explores all of the neighbor nodes at the present depth prior to moving on to the nodes at the next depth level.

It uses the opposite strategy to depth-first search, which instead follows each branch as deep as possible before being forced to backtrack and expand shallower nodes.

BFS and its application in finding connected components of graphs were invented in 1945 by Konrad Zuse, in his (rejected) Ph.D. thesis on the Plankalkül programming language, but this was not published until 1972.^[2] It was reinvented in 1959 by Edward F. Moore, who used it to find the shortest path out of a maze,^[3]^[4] and later developed by C. Y. Lee into a wire routing algorithm (published 1961).^[5]

Pseudocode

Breadth-first traversal visits a tree level by level: whenever the root of a subtree is processed, its children are enqueued. The iterative algorithm has two cases.

Root case: The traversal queue is initially empty so the root node must be added before the general case.
General case: Process any items in the queue, while also expanding their children. Stop if the queue is empty. The general case will halt after processing the bottom level as leaf nodes have no children.
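
The two cases can be sketched for a plain tree as follows. This is a minimal illustration, not the listing used later in this section; the `Node` class and the `level_order` function are hypothetical names introduced only for this example.

```python
from collections import deque

class Node:
    """Hypothetical tree node used only for this illustration."""
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def level_order(root):
    """Return node values in breadth-first (level) order."""
    order = []
    queue = deque()
    # Root case: the queue starts empty, so seed it with the root.
    if root is not None:
        queue.append(root)
    # General case: process queued nodes and enqueue their children.
    while queue:
        node = queue.popleft()
        order.append(node.value)
        queue.extend(node.children)  # a leaf adds nothing, so the loop ends
    return order

tree = Node(1, [Node(2, [Node(4), Node(5)]), Node(3, [Node(6)])])
print(level_order(tree))  # [1, 2, 3, 4, 5, 6]
```

Because a tree has no cycles, no record of visited nodes is needed here; a general graph needs one.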

Input: A search problem. A search problem abstracts the problem-specific requirements away from the actual search algorithm.

Output: An ordered list of actions to be followed to get from the start state to the goal state.

Below is a Python listing for breadth-first search, where the exact nature of the problem is abstracted in the problem object.

from collections import deque

def breadth_first_search(problem):

  # a FIFO open_set (deque gives constant-time append and popleft)
  open_set = deque()

  # an empty set to maintain visited nodes
  closed_set = set()

  # a dictionary to maintain meta information (used for path formation)
  # key -> (parent state, action to reach child)
  meta = dict()

  # initialize
  root = problem.get_root()
  meta[root] = (None, None)
  open_set.append(root)

  # Process the frontier in FIFO order until it is exhausted
  while open_set:

    subtree_root = open_set.popleft()

    # We found the node we wanted so stop and emit a path.
    if problem.is_goal(subtree_root):
      return construct_path(subtree_root, meta)

    # For each child of the current node
    for (child, action) in problem.get_successors(subtree_root):

      # The node has already been processed, so skip over it
      if child in closed_set:
        continue

      # meta has an entry for every state that was ever enqueued, so this is
      # a constant-time test for "child is not already in open_set"
      if child not in meta:
        meta[child] = (subtree_root, action) # create metadata for this node
        open_set.append(child)               # enqueue this node

    # We finished processing the root of this subtree, so add it to the closed
    # set
    closed_set.add(subtree_root)

  # The frontier is empty and no goal was found
  return None

# Produce a backtrace of the actions taken to find the goal node, using the 
# recorded meta dictionary
def construct_path(state, meta):
  action_list = list()
  
  # Continue until you reach root meta data (i.e. (None, None))
  while meta[state][0] is not None:
    state, action = meta[state]
    action_list.append(action)
  
  action_list.reverse()
  return action_list
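
As a usage sketch, the listing above can be driven by any object that provides `get_root`, `is_goal` and `get_successors`. The `GridProblem` class below is a hypothetical example invented for illustration, and `bfs_actions` is only a condensed restatement of the listing so that the snippet runs on its own.

```python
from collections import deque

class GridProblem:
    """Hypothetical search problem: walk on a small grid, '#' marks a wall."""
    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, grid, start, goal):
        self.grid, self.start, self.goal = grid, start, goal

    def get_root(self):
        return self.start

    def is_goal(self, state):
        return state == self.goal

    def get_successors(self, state):
        r, c = state
        for action, (dr, dc) in self.MOVES.items():
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < len(self.grid) and 0 <= nc < len(self.grid[0])
            if inside and self.grid[nr][nc] != "#":
                yield (nr, nc), action

def bfs_actions(problem):
    """Condensed restatement of the listing above (same meta bookkeeping)."""
    root = problem.get_root()
    meta = {root: (None, None)}          # state -> (parent, action)
    frontier = deque([root])
    while frontier:
        state = frontier.popleft()
        if problem.is_goal(state):
            actions = []
            while meta[state][0] is not None:
                state, action = meta[state]
                actions.append(action)
            return actions[::-1]
        for child, action in problem.get_successors(state):
            if child not in meta:
                meta[child] = (state, action)
                frontier.append(child)
    return None                          # goal unreachable

grid = ["..#",
        ".#.",
        "..."]
print(bfs_actions(GridProblem(grid, (0, 0), (2, 2))))
# ['down', 'down', 'right', 'right']
```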

More details

This non-recursive implementation is similar to the non-recursive implementation of depth-first search, but differs from it in two ways:

it uses a queue (First In First Out) instead of a stack (Last In First Out) and
it checks whether a vertex has been discovered before enqueueing the vertex rather than delaying this check until the vertex is dequeued from the queue.
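
The two differences can be seen side by side in the following sketch. The graph and the function names `bfs_order` and `dfs_order` are assumptions made for this illustration.

```python
from collections import deque

# Hypothetical example graph as an adjacency list.
graph = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [], "E": [], "F": [],
}

def bfs_order(graph, start):
    discovered = {start}             # membership is checked before enqueueing
    queue = deque([start])
    order = []
    while queue:
        v = queue.popleft()          # First In First Out
        order.append(v)
        for w in graph[v]:
            if w not in discovered:
                discovered.add(w)
                queue.append(w)
    return order

def dfs_order(graph, start):
    visited = set()
    stack = [start]
    order = []
    while stack:
        v = stack.pop()              # Last In First Out
        if v in visited:             # the check is delayed until the pop
            continue
        visited.add(v)
        order.append(v)
        # Push in reverse so the first listed neighbor is explored first.
        stack.extend(reversed(graph[v]))
    return order

print(bfs_order(graph, "A"))  # ['A', 'B', 'C', 'D', 'E', 'F']
print(dfs_order(graph, "A"))  # ['A', 'B', 'D', 'E', 'C', 'F']
```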

The open_set queue contains the frontier along which the algorithm is currently searching.

The closed_set set is used to track which vertices have been visited (required for a general graph search, but not for a tree search). At the beginning of the algorithm, the set is empty. At the end of the algorithm, it contains every vertex whose distance from the root is less than the goal's distance, and possibly some vertices at the same distance as the goal.

Note that the word state is usually interchangeable with the word node or vertex.

Breadth-first search produces a so-called breadth-first tree. You can see how a breadth-first tree looks in the following example.

Example

The following is an example of the breadth-first tree obtained by running a BFS starting from Frankfurt:

An example map of Germany with some connections between cities

The breadth-first tree obtained when running BFS on the given map and starting in Frankfurt
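
A breadth-first tree can be recorded as a map from each vertex to the vertex that first discovered it. The sketch below uses a small hypothetical graph rather than the map above, whose connections are not reproduced in the text; `bfs_tree` is a name chosen for this illustration.

```python
from collections import deque

def bfs_tree(graph, source):
    """Return {vertex: parent} for the breadth-first tree rooted at source."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in parent:      # the first discovery fixes the tree edge
                parent[w] = v
                queue.append(w)
    return parent

# Hypothetical undirected graph; each edge is listed in both directions.
graph = {
    "s": ["a", "b"],
    "a": ["s", "b", "c"],
    "b": ["s", "a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c"],
}
print(bfs_tree(graph, "s"))
# {'s': None, 'a': 's', 'b': 's', 'c': 'a', 'd': 'b'}
```

The edges a-b and c-d are not tree edges: both endpoints had already been discovered when those edges were examined.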

Analysis

Time and space complexity

The time complexity can be expressed as $O(|V|+|E|)$, since every vertex and every edge will be explored in the worst case. $|V|$ is the number of vertices and $|E|$ is the number of edges in the graph. Note that $O(|E|)$ may vary between $O(1)$ and $O(|V|^{2})$, depending on how sparse the input graph is.^[6]
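
The bound can be observed by instrumenting a traversal. The following is a minimal sketch on a hypothetical graph; `bfs_work` is a name introduced here, and it assumes an adjacency-list representation in which each undirected edge is stored twice.

```python
from collections import deque

def bfs_work(graph, source):
    """Run BFS; count dequeued vertices and adjacency-list entries scanned."""
    discovered = {source}
    queue = deque([source])
    vertices_dequeued = 0
    entries_scanned = 0
    while queue:
        v = queue.popleft()
        vertices_dequeued += 1       # each vertex is dequeued at most once
        for w in graph[v]:
            entries_scanned += 1     # each adjacency entry is scanned once
            if w not in discovered:
                discovered.add(w)
                queue.append(w)
    return vertices_dequeued, entries_scanned

# Hypothetical connected undirected graph with 4 vertices and 4 edges.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(bfs_work(graph, 0))  # (4, 8)
```

Here the counts are $|V| = 4$ and $2|E| = 8$, so the total work is proportional to $|V|+|E|$.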

When the number of vertices in the graph is known ahead of time, and additional data structures are used to determine which vertices have already been added to the queue, the space complexity can be expressed as $O(|V|)$, where $|V|$ is the cardinality of the set of vertices. This is in addition to the space required for the graph itself, which may vary depending on the graph representation used by an implementation of the algorithm.

When working with graphs that are too large to store explicitly (or infinite), it is more practical to describe the complexity of breadth-first search in different terms: to find the nodes that are at distance $d$ from the start node (measured in number of edge traversals), BFS takes $O(b^{d+1})$ time and memory, where $b$ is the "branching factor" of the graph (the average out-degree).^[7]^:81
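
As a rough worked count, assume (these are assumptions of the sketch, not statements from the text above) a uniform tree in which every node has exactly $b$ successors, a goal at depth $d$ that happens to be the last node dequeued on its level, and a goal test applied only when a node is dequeued, as in the listing in the Pseudocode section. Every other node at depth $d$ is then expanded before the goal is recognized, so the number of nodes generated is

$$b + b^{2} + \dots + b^{d} + \left(b^{d+1} - b\right) = O\left(b^{d+1}\right).$$

For example, with $b = 10$ and $d = 2$ this gives $10 + 100 + (1000 - 10) = 1100$ nodes, a total dominated by the $b^{d+1}$ term.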

Completeness

In the analysis of algorithms, the input to breadth-first search is assumed to be a finite graph, represented explicitly as an adjacency list or similar representation. However, in the application of graph traversal methods in artificial intelligence the input may be an implicit representation of an infinite graph. In this context, a search method is described as being complete if it is guaranteed to find a goal state if one exists. Breadth-first search is complete, but depth-first search is not. When applied to infinite graphs represented implicitly, breadth-first search will eventually find the goal state, but depth-first search may get lost in parts of the graph that have no goal state and never return.^[8]
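
The contrast can be illustrated on a small implicit graph. The sketch below assumes an infinite binary tree on the positive integers, in which node $n$ has children $2n$ and $2n+1$; this graph and the function names are chosen only for illustration.

```python
from collections import deque
from itertools import islice

def successors(n):
    """Implicit infinite binary tree (an assumption of this sketch):
    node n has children 2n and 2n + 1."""
    return (2 * n, 2 * n + 1)

def bfs_visit(start):
    """Yield nodes in breadth-first order; the generator never ends."""
    queue = deque([start])
    while True:
        n = queue.popleft()
        yield n
        queue.extend(successors(n))

def dfs_visit(start):
    """Yield nodes in depth-first order, descending into the first child."""
    stack = [start]
    while True:
        n = stack.pop()
        yield n
        stack.extend(reversed(successors(n)))

goal = 5
bfs_prefix = list(islice(bfs_visit(1), 10))
dfs_prefix = list(islice(dfs_visit(1), 10))
print(bfs_prefix)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(dfs_prefix)  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
```

Breadth-first search reaches the goal 5 after five visits, while this depth-first search keeps descending the leftmost branch and never returns to it.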

BFS ordering

An enumeration of the vertices of a graph is said to be a BFS ordering if it is a possible output of the application of BFS to this graph.

Let $G=(V,E)$ be a graph with $n$ vertices. Recall that $N(v)$ is the set of neighbors of $v$. Let $\sigma =(v_{1},\dots ,v_{m})$ be a list of distinct elements of $V$. For $v\in V\setminus \{v_{1},\dots ,v_{m}\}$, let $\nu _{\sigma }(v)$ be the least $i$ such that $v_{i}$ is a neighbor of $v$, if such an $i$ exists, and $\infty$ otherwise.

Let $\sigma =(v_{1},\dots ,v_{n})$ be an enumeration of the vertices of $V$. The enumeration $\sigma$ is said to be a BFS ordering (with source $v_{1}$) if, for all $1<i\leq n$, $v_{i}$ is a vertex $w\in V\setminus \{v_{1},\dots ,v_{i-1}\}$ such that $\nu _{(v_{1},\dots ,v_{i-1})}(w)$ is minimal. Equivalently, $\sigma$ is a BFS ordering if, for all $1\leq i<j<k\leq n$ with $v_{i}\in N(v_{k})\setminus N(v_{j})$, there exists a neighbor $v_{m}$ of $v_{j}$ such that $m<i$.
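
The first definition can be checked directly. The following sketch favors clarity over speed; the function names `nu` and `is_bfs_ordering` and the example graph are assumptions made for this illustration.

```python
import math

def nu(prefix, v, graph):
    """Least 1-based index i such that prefix[i-1] is a neighbor of v,
    or infinity if no listed vertex is a neighbor of v."""
    for i, u in enumerate(prefix, start=1):
        if u in graph[v]:
            return i
    return math.inf

def is_bfs_ordering(order, graph):
    """Check that every v_i minimizes nu among the vertices not yet listed."""
    if len(order) != len(graph) or set(order) != set(graph):
        return False                 # not an enumeration of the vertices
    for i in range(1, len(order)):
        prefix, remaining = order[:i], order[i:]
        best = min(nu(prefix, w, graph) for w in remaining)
        if nu(prefix, order[i], graph) != best:
            return False
    return True

# Hypothetical undirected graph given as neighbor sets N(v).
graph = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b"}, "d": {"a"}}
print(is_bfs_ordering(["a", "b", "d", "c"], graph))  # True
print(is_bfs_ordering(["a", "b", "c", "d"], graph))  # False
```

In the second enumeration, after $(a, b)$ the vertex $d$ has $\nu = 1$ while $c$ has $\nu = 2$, so $c$ cannot come next.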

Applications

Breadth-first search can be used to solve many problems in graph theory, for example:

Copying garbage collection, Cheney's algorithm
Finding the shortest path between two nodes u and v, with path length measured by number of edges (an advantage over depth-first search)^[9]
(Reverse) Cuthill–McKee mesh numbering
Ford–Fulkerson method for computing the maximum flow in a flow network
Serialization and deserialization of a binary tree in breadth-first order rather than sorted order, which allows the tree to be reconstructed efficiently
Construction of the failure function of the Aho–Corasick pattern matcher
Testing bipartiteness of a graph
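
As one illustration of these applications, bipartiteness can be tested by two-coloring the graph level by level. This is a minimal sketch under the assumption of an undirected graph stored as an adjacency list; `is_bipartite` and the two example graphs are hypothetical.

```python
from collections import deque

def is_bipartite(graph):
    """Two-color an undirected graph with BFS; return False on a conflict."""
    color = {}
    for source in graph:                 # cover every connected component
        if source in color:
            continue
        color[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in color:
                    color[w] = 1 - color[v]   # next level gets the other color
                    queue.append(w)
                elif color[w] == color[v]:
                    return False         # edge within one color class
    return True

square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(is_bipartite(square), is_bipartite(triangle))  # True False
```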

References

[1] "Graph500 benchmark specification (supercomputer performance evaluation)". Graph500.org, 2010.
[2] Zuse, Konrad (1972), Der Plankalkül (in German), Konrad Zuse Internet Archive. See pp. 96–105 of the linked pdf file (internal numbering 2.47–2.56).
[3] Moore, Edward F. (1959). "The shortest path through a maze". Proceedings of the International Symposium on the Theory of Switching. Harvard University Press. pp. 285–292. As cited by Cormen, Leiserson, Rivest, and Stein.
[4] Skiena, Steven (2008). The Algorithm Design Manual. Springer. p. 480. doi:10.1007/978-1-84800-070-4_4.
[5] Lee, C. Y. (1961). "An Algorithm for Path Connections and Its Applications". IRE Transactions on Electronic Computers.
[6] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001) [1990]. "22.2 Breadth-first search". Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill. pp. 531–539. ISBN 0-262-03293-7.
[7] Russell, Stuart; Norvig, Peter (2003) [1995]. Artificial Intelligence: A Modern Approach (2nd ed.). Prentice Hall. ISBN 978-0137903955.
[8] Coppin, B. (2004). Artificial intelligence illuminated. Jones & Bartlett Learning. pp. 79–80.
[9] Aziz, Adnan; Prakash, Amit (2010). "4. Algorithms on Graphs". Algorithms for Interviews. p. 144. ISBN 1453792996.

Knuth, Donald E. (1997), The Art of Computer Programming Vol 1. 3rd ed., Boston: Addison-Wesley, ISBN 0-201-89683-4

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
