Big O Isn't Everything

At a previous technical interview the topic of Breadth-first search (BFS) vs Depth-first search (DFS) came up. This is an extremely common topic for interview questions and programming puzzles, and my interviewer was looking for me to regurgitate the usual lines about BFS finding the guaranteed shortest path and (presummably) that BFS is always preferred over DFS when shortest path is a requirement. I wanted to discuss the graph itself and whether it had any exploitable structure. With an extensive background in mathematics, I've been trained to think about worst case scenarios and extreme cases (which non-mathematicians often find puzzling since the focus is not immediately on the typical or most likely cases).

The reasoning given by the interviewer for preferring BFS over DFS was that BFS gives the shortest path and BFS and DFS have the same run time, which is true asymptotically since both have a time complexity of $\mathcal{O}(|V| + |E|))$ (worst case). To see this let the graph be a straight line of vertices (a linked list, essentially), and let the starting and ending points be the two boundary vertices. Then both DFS and BFS will visit every vertex between the start and end vertices. (My interviewer also insisted that BFS used less memory than DFS when tracking which nodes have been visited, which isn't true in all cases, for similar reasons.)

Actual Run Times

The true (actual) run time of BFS and DFS depend on:

  • the structure of the graph, and
  • the starting and ending points

The apparently harsh reality that many fail to consider is that two algorithms with the same Big O (asymptotic) run time can differ in true run time (in specific cases) by an arbitrary constant multiple. That's right, one algorithm could take 10 or 100 (or any constant factor) times longer and consume that same factor more power even though they have the same asymptotic runtime (average or worst case). When you operate at scale this can make a huge difference in the power bill and resource usage, so it's worth digging into the details.

Here's an example for BFS and DFS where BFS takes twice as long as DFS. Let the graph be a cycle of size $N = 2M$ and put the starting and ending vertices on opposite sides of the graph (at antipodal points). Then BFS will visit every single vertex before finding the shortest path while DFS will visit only half of the vertices, traveling around one half of the cycle and not visiting any of the vertices on the other side.

BFS vs DFS Figure

In the diagram above, a BFS between the green vertices (left) will visit every vertex (colored red) outward one step at a time from the starting vertex. A DFS between the same vertices will only visit half of the nodes (the blue nodes). Both find a shortest length path.

Of course, with different initial placements, BFS may be better. For example, if the starting and ending vertices are one-quarter cycle apart, then BFS will visit half the vertices and always find the shortest path. DFS will visit one-quarter or three quarters of the vertices depending on which side it tries first (for an average of one-half vertices visited, and the shortest path only half of the time, assuming the initial direction is random).

BFS can be Arbitrarily Worse than DFS

If we generalize from a cycle to a k-regular graph, we can find cases where DFS is arbitrarily better than BFS. There exists a $2M$-regular graph of size $N=4M$ (for all $M$) for which BFS visits at least $M$ times as many vertices as DFS.

Let the graph be an 2M-regular cycle of size $N=4M$ for $M \geq 1$ where each vertex is connected to its $2M$ spacially-closest neighbors (in Euclidean distance). Order the vertices in the graph data structure so that the furthest vertex to the left is always the first edge. Then there is a path from one point of the cycle to its antipodal point for DFS that takes just 2 moves, whereas BFS will visit at least $2M + 1$ vertices. So in this case the actual run times differ by a factor of at least $M$. That's arbitrarily worse in $M$.

Conclusions

  • Big O is great for comparing algorithms that have different asymptotic run times
  • But for algorithms that have the same asymptotic run times, it matters if we are discussing the average case or extreme cases
  • For specific cases, the asymptotic run time is not always useful: either algorithm could be a better choice (depending on the extreme case chosen)
  • So if you know the problem has some additional structure, it's worth taking a closer look
In [ ]: