Consider a logical formula. Variables combined with AND, OR, and NOT. You are asked: can you assign TRUE or FALSE to each variable such that the whole thing evaluates to TRUE?
This is the Boolean Satisfiability Problem, or SAT. It looks simple but it is, in fact, the problem that defines computational hardness.
In 1971, Stephen Cook proved that SAT is NP-Complete — the first problem ever shown to have this property. Every other problem in NP can be reduced to SAT. Solve SAT efficiently, and you solve everything. (You would also become very famous. And possibly break the internet. Literally.)
A formula in conjunctive normal form:
(x OR y) AND (NOT x OR z) AND (NOT y OR NOT z)
Can you make this TRUE?
Let’s try x = TRUE, y = FALSE, z = TRUE:
- (TRUE OR FALSE) = TRUE
- (NOT TRUE OR TRUE) = TRUE
- (NOT FALSE OR NOT TRUE) = TRUE
All clauses satisfied. The formula is satisfiable.
Now consider:
(x) AND (NOT x)
This requires x to be both TRUE and FALSE. It is not. The formula is unsatisfiable. There is nothing to be done.
The SAT problem: given any formula, determine if it’s satisfiable. If so, find an assignment. If not, prove impossibility.
Why This Is Difficult
For small formulas, SAT is trivial. Enumerate all combinations. Check each one.
The trouble is that the number of combinations is 2ⁿ.
| Variables | Assignments | Time at 1 billion/sec |
|---|---|---|
| 10 | 1,024 | Instant |
| 20 | ~1 million | 1 millisecond |
| 50 | 10¹⁵ | 13 days |
| 100 | 10³⁰ | 40 billion years |
| 300 | 10⁹⁰ | (no) |
For context: there are approximately 10⁸⁰ atoms in the observable universe. A 300-variable brute force would require more steps than there are atoms. By a factor of 10¹⁰.
Real-world SAT instances have millions of variables. Brute force is not a strategy. It is a cry for help.
NP-Completeness
Cook’s theorem established that SAT is NP-Complete. This means:
NP: If someone hands you a solution, you can verify it quickly. (Just plug in the values and evaluate. Polynomial time.)
Complete: Every problem in NP can be transformed into an equivalent SAT instance. Solve SAT, solve them all.
The implication: SAT is at least as hard as every other problem in NP. Graph coloring, travelling salesman, scheduling, cryptography — all of them reduce to SAT.
This is the P versus NP question in disguise. If P = NP, SAT has a polynomial-time solution and we rewrite civilization. If P ≠ NP (the prevailing belief), SAT is fundamentally intractable.
Most researchers have made peace with the latter.
How SAT Is Actually Solved
Despite being NP-Complete, modern SAT solvers routinely handle millions of variables. This seems contradictory. It is not.
Worst-case complexity is exponential. Average-case complexity, for structured instances, is often manageable. The algorithms are clever. The heuristics are ruthless.
DPLL
The Davis-Putnam-Logemann-Loveland algorithm (1962) is the foundation. It is backtracking, but with intelligence.
Unit propagation: If a clause has only one literal, that literal must be TRUE. Set it and Simplify.
Pure literal elimination: If a variable appears only positively (or only negatively) throughout the formula, set it to the polarity that helps.
Branching: Pick an unassigned variable. Assume TRUE. Recurse. If contradiction, backtrack and try FALSE.
This is smarter than enumeration because it detects failure early and avoids pointless work. It prunes.
CDCL
Conflict-Driven Clause Learning. The modern engine. Every serious SAT solver uses this.
The key innovation: when you reach a contradiction, don’t just backtrack. Understand why.
- Analyze the conflict: Which assignments conspired to produce this failure?
- Learn a clause: Create a new constraint that forbids this combination.
- Add it to the formula: Future search will not repeat this mistake.
It is, in essence, solving a maze while leaving notes. “Dead end. Don’t come this way if you have x=TRUE and y=FALSE.” The notes accumulate and it shrinks the search space.
VSIDS
Variable State Independent Decaying Sum. A heuristic for choosing which variable to branch on next.
- Variables involved in recent conflicts get higher scores
- Scores decay over time
- Always branch on the highest-scoring variable
The intuition: troublesome variables are likely to remain troublesome. Focus on them. (This heuristic was discovered empirically, not derived theoretically. It simply works.)
Preprocessing
Before the search begins, clean the formula:
- Remove duplicate clauses
- Subsumption: if (x) exists, (x OR y) is redundant
- Variable elimination: some variables can be substituted out
- Logical simplification
A smaller formula is a faster formula. Often dramatically so.
The Phase Transition
For random 3-SAT instances (each clause has exactly 3 literals), there is a critical ratio: m/n ≈ 4.27, where m is the number of clauses and n is the number of variables.
Below this ratio: almost all instances are satisfiable. Easy.
Above this ratio: almost all instances are unsatisfiable. Also easy (to prove).
At the ratio: maximum difficulty. The boundary between satisfiable and unsatisfiable.
Real-world instances, thankfully, are not random. They have structure which can be exploited. This is why SAT solvers work.
Wherein SAT Appears in Unexpected Places
Hardware verification: Intel uses SAT solvers to verify processor designs before manufacturing. A bug caught here saves millions. A bug missed costs more. (See: the Pentium FDIV bug, 1994.)
Software verification: Does this program ever crash? Can this security property be violated? Encode the question as SAT. Let the solver answer.
Package management: Your package manager (npm, pip, cargo) determines compatible dependency versions by solving SAT. Every successful install is a satisfying assignment. (Every failed one is proof of unsatisfiability, or user error.)
Cryptanalysis: Encoding cryptographic problems as SAT and solving. Finding hash collisions, attacking weak ciphers.
Sudoku: A classic. Each constraint becomes a clause.
Beyond SAT
SAT has variants for those who find Boolean unsatisfying.
SMT (Satisfiability Modulo Theories): SAT extended with arithmetic, arrays, bit-vectors. Tools like Z3. Used heavily in program verification. If you want to ask “can these two pointer expressions ever alias?” this is how.
MaxSAT: Find an assignment that satisfies the maximum number of clauses. When perfection is impossible, settle for optimization.
#SAT: Count the satisfying assignments. Harder than SAT (#P-complete). Used in probabilistic inference, model counting, and generating existential dread about computational complexity.
The Takeaway
SAT is the canonical hard problem. The first NP-Complete problem. The benchmark against which all others are measured.
And yet, despite theoretical intractability, we solve it daily. Millions of variables. Seconds of runtime. The gap between worst-case theory and average-case practice is vast.
This is either a triumph of engineering or a sign that we don’t fully understand what makes problems hard. Possibly both.