SAT Encoding: Breadth First Traversal of Directed Graph
In this post, I will describe how to encode breadth-first traversal (search) (BFT/BFS) of a directed graph as a SAT problem. It will follow the same flow of the post describing how to solve a simple version of Sudoku by encoding it as a SAT problem.
Given a directed graph and a node n in the graph, breadth first traversal of the graph visits the nodes reachable from n in increasing order of levels where level i consists of all nodes that are i hops and no less than i hops away from n.
Starting the traversal from node A at level 0 in the above graph, we will visit nodes B and C at level 1 and nodes D, E, and F at level 2 (as denoted by the bracket notation in the diagram). In the same graph, starting the traversal from node B (level 0), we will visit nodes D and E at level 1, node A at level 2, node C at level 3, and node F at level 4.
How is this related to BFS?
Given a traversal in terms of which nodes were visited at what level, there can be multiple traversal orderings because there are numerous visiting orders of nodes at a level, e.g., ACBDEF and ABCDEF are both valid BFT/BFS orderings.
If you are thinking that both these orderings would not be generated by the queue-based BFS algorithm, then you are right. The reason for this is the design choices in the queue-based algorithm do not consider all possible orderings.
The input is a graph with N nodes and E directed edges. The output will be a collection of node-level pairs according to a BFT visiting order.
- Since there are N nodes, there are at most N levels (in case of path graph).
- Since each node should be visited exactly once, each node occurs at exactly one level.
- When a node m is visited at level g, there are three possibilities for each successor n of m: a) n has already been visited earlier as it has a predecessor visited at level h < g, b) n will be visited at level g as it has a predecessor visited at level g-1, or c) n will be visited at level g+1 as m is its first predecessor to be visited.
In other words, for every edge (m, n), if node m occurs at level g, then n does not occur at level h > g+1.
- Consider a mutation G of above graph without the edge (E, A). In graph G, if we start the traversal at node B. The traversal will end after visiting nodes D and E. To ensure all nodes are visited, the traversal should be “restarted” from an unvisited node. Such restarts can be admitted by associating the next level with the restart node. [Note: This association does not agree with the definition of level stated earlier but it can be retrofitted by subtracting the level of the restart node from the level of the visited nodes from the restart node.] So, in the mutation graph G, (B)(DE)(A)(C)(F) should be admitted as a valid BFT with the restart happening at A at level 3 while the traversal (B)(DEA)(C )(F) should be rejected. Based on the difference between these traversals, we observe a restart node should be the only node in its level.
Also, constraints 1–3 can be trivially satisfied by assigning the same level to all nodes. We want to prohibit such assignments by requiring nodes at a level have predecessors in the previous level.
Both these constraints can be captured as, for every pair of distinct nodes m and n occurring at level g, there exist two edges (p, m) and (q, n) such that p and q occur at level g-1.
- We want the levels to start from 1. In other words, at least one node occurs at level 1.
- There should be no gaps in level numbers, i.e., we do not want node A at level 1 and nodes B and C at level 3. In other words,
a) if there exists a node at level g, then there exists a node at level g-1, and
b) if there is no node at level g, then there are no nodes at level g+1.
We use one variable vNxLy to represent the occurrence of node Nx at level Ly. Since we have N nodes and N levels, we will have N*N variables.
Using the common constraint patterns, we can mechanically translate the constraints into CNF formulae as follows.
- Enforce constraint 1 by controlling the number of variables associated with levels.
- Use complex pattern X7 to encode constraint 2.
- Use simple pattern S6 to encode for every edge (m, n) part of constraint 3. For each edge, use compound pattern C6 to encode if node m occurs at level g, then n does not occur at level h > g+1 as it is equivalent to if node m occurs at level g, then, for all h > g+1, n does not occur at level h.
- Use simple pattern S6 to encode for every pair of distinct nodes m and n occurring at level g part of constraint 4. Use simple pattern S3 to encode there exist edges (p, m) and (q, n) part. Use Tseitin transformation to encode p and q occur at level g-1 part.
- Use simple pattern S5 to encode constraint 5.
- Use simple pattern S6 to encode for level g part and C5 to encode if ..., then there exists a node part of constraint 6a. Use simple pattern S6 to encode for level g part and C6 to encode if …, then there are not nodes part of constraint 6b (as there are no nodes can be state as for all nodes).
The above encoding scheme has some redundancy. For example, when encoding every pair of distinct nodes m and n in constraint 4, clauses generated for (m, n) will be the same as clauses generated for (n, m). This redundancy can be avoided at the time of encoding or filtered out after encoding by eliminating duplicates.
Now, we combine the formulae from encoding the constraints into one CNF formula and feed it to a SAT solver. If the solver provides a model, then we interpret each variable vNxLy that is true in the model as visiting node x at level y. From this information, we can easily identify a valid BFT/BFS ordering by arranging the nodes in the ascending order of levels and picking any ordering for nodes at the same level. That’s it :)
For You To Do
In the above code, BFT/BFS ordering is identified in the Groovy code without using the SAT solver. Extend the constraints and the encoding to identify the ordering using the SAT solver.
In the next blog, I will describe how to encode depth-first traversal (search) of a directed graph as a SAT problem.
Here’s the code snippet solves of the same problem by 1) calculating the ordering without calculating the levels and 2) using the solver to identify the ordering. However, this solution admits only those orderings that result from queue-based BFT/BFS.