Reasoning with Computers


Brian Harvey
University of California, Berkeley
Computer Science Division,
Berkeley, CA 94720–1776, USA,
tel: 1 510 642-8311,


Logic puzzles are explored as an example of computer programs that reason. Two approaches are contrasted: rule-based logical inference and brute-force backtracking. Each approach works well for certain puzzles but can be unacceptably slow for others.


Logic puzzles, inference systems, backtracking

1 Introduction

"Intelligent agents" are said to be right around the corner; some products are already available. These programs are supposed to watch you work, make inferences about your preferred style, and automate repetitive tasks. They will sift through all the nonsense on the World Wide Web and find only those items you’ll really want to read. They’ll be teammates or opponents in computer games.

How can computer programs make inferences? I chose logic puzzles as a testbed for experimentation. Logic puzzles are a relatively easy test case because each puzzle forms a "closed world"; we know in advance all the possible names, ages, house colors, or whatever characteristics the puzzle asks us to match up. There is no new computer science here. I am more or less recapitulating the early history of computer inference systems. But educationally that may be more helpful than the complexities of the most modern attempts.

2 An inference system for one logic puzzle

I first worked on this project when I wrote the third volume of Computer Science Logo Style (MIT Press, 1987). In that volume the task I set for myself was to introduce some of the topics in the undergraduate computer science curriculum to a younger audience, and to illustrate the ideas with Logo programs rather than with formal proofs. I wantd to discuss logic as part of discrete mathematics, and thought of logic puzzles as the task for an illustrative Logo program.

The program I wrote solved only one logic puzzle, taken from Mind Benders Book B-2, by Anita Harnadek (Critical Thinking Press, 1978):

A cub reporter interviewed four people. He was very careless, however. Each statement he wrote was half right and half wrong. He went back and interviewed the people again. And again, each statement he wrote was half right and half wrong. From the inormation below, can you straighten out the mess?

The first names were Jane, Larry, Opal, and Perry. The last names were Irving, King, Mendle, and Nathan. The ages were 32, 38, 45, and 55. The occupations were drafter, pilot, police sergeant, and test car driver.

On the first interview, he wrote these statements, one from each person:

1. Jane: "My name is Irving, and I’m 45."
2. King: "I’m Perry and I drive test cars."
3. Larry: "I’m a police sergeant and I’m 45."
4. Nathan: "I’m a drafter, and I’m 38."

On the second interview, he wrote these statements, one from each person:

5. Mendle: "I’m a pilot, and my name is Larry."
6. Jane: "I’m a pilot, and I’m 45."
7. Opal: "I’m 55 and I drive test cars."
8. Nathan: "I’m 38 and I drive test cars."

This puzzle includes four categories: first name, last name, job, and age. In each category there are four individuals; for example, the first name individuals are Jane, Larry, Opal, and Perry.

For every possible pairing of individuals in different categories (for example, Jane and pilot), the program keeps track of what it knows about whether or not they go together. Initially it knows nothing, but, for example, after the first interview the program knows that Jane is not King, since the four statements are from different people.

For any given pairing, the program can know nothing, can know that the two individuals are the same, or can know that the two individuals are not the same. But there is also a fourth, perhaps more interesting, situation. After the first interview, the program does not know whether or not Jane and Irving are the same person, nor whether Jane and age 45 are the same person. But it does know that if one of these is true, the other must be false, and vice versa. This fact is represented as a link between the Jane-Irving pair and the Jane-45 pair.

The program works by making assertions based on the puzzle statement. From the first interview we get the following assertions:

Jane-King is false.
Jane-Nathan is false.
Larry-King is false.
Larry-Nathan is false.
Jane-Irving is linked to Jane-45.
King-Perry is linked to King-driver
.Larry-sergeant is linked to Larry-45.
Nathan-drafter is linked to Nathan-38.

As each assertion is recorded in the database, the program tries to use rules of inference to draw conclusions from the new assertion and the assertions already recorded. Here are the rules:

Elimination rule: If the X-Y pairs are false for all but one individual Y in a given category, then the remaining possibility must be true
Uniqueness rule: If some X-Y pair is true, then every other X-Z pair must be false if Z is in the same category as Y.
Transitive rules: If X-Y is true, and Y-Z is true, then X-Z must also be true. If X-Y is true, and Y-Z is false, then X-Z must be false.
Link falsification: If X-Y is linked to Z-W, and we learn that X-Y is true, then Z-W must be false.
Link verification: If X-Y is linked to Z-W, and we learn that X-Y is false, then Z-W must be true.

For each new assertion, there are only a finite number of possible inferences using these rules. If we assert that X-Y is true, then the program must check the uniqueness rule for each other individual in the same category as X, and for each other individual in the same category as Y. It must check the transitive rules for the other known pairs involving X or Y. And if there was already a link involving X-Y, then it must apply the link falsification rule.

Similarly, if we assert that X-Y is false, then the program must check the elimination rule, the second transitive rule, and the link verification rule.

3 An inference system for many logic puzzles

This first program worked fine for this particular puzzle, but couldn’t handle other puzzles. The most obvious problem was that the link verification and link falsification rules apply only to this puzzle. (The other three rules apply to any logic puzzle.)

In preparing the second edition of Computer Science Logo Style (MIT Press, 1997), I decided to generalize these link rules. In Anita Harnadek’s puzzle, two pairs are linked only through mutual exclusion; exactly one of the pairs must be true. More generally, two pairs might be linked by an implication:

If X-Y is true/false, then Z-W must be true/false.

So each link in the first program became two implications in the second version. For example:

If Jane-Irving is true, then Jane-45 must be false.
If Jane-Irving is false, then Jane-45 must be true.

These two implications are logically independent; one cannot be derived from the other.

Once the program (reproduced in appendix A) can record implications, we replace the link falsification and link verification rules with three new rules about implications:

Contrapositive rule:If P implies Q, then not-Q implies not-P.
Implication rule (modus ponens): If P implies Q, and P is true, then Q must be true.
Contradiction rule: If P implies Q, and P also implies not-Q, then P must be false.

In these rules, P and Q represent statements about the truth or falsehood of a pair, e.g., "Jane-Irving is false."

The modified program can solve not only Anita Harnadek’s puzzle but also several others that I tried, taken from a Dell puzzle book.


By coincidence, my work on the second edition of Computer Science Logo Style happened at about the same time that Harold Abelson, Gerald Jay Sussman, and Julie Sussman were working on the second edition of their brilliant text, Structure and Interpretation of Computer Programs (MIT Press, 1996). Their second edition introduced several new topics, one of which was -- here's the coincidence -- logic puzzles. I tried out my program on their example puzzles, such as this one:

Five schoolgirls sat for an examination. Their parents -- so they thought -- showed an undue degree of interest in the result. They therefore agreed that, in writing home about the examination, each girl should make one true statement and one untrue one. The following are the relevant passages from their letters:

What in fact was the order in which the five girls were placed?

Since this puzzle is very similar in form to the other one, with paired true and false statements, I thought my program would solve it easily. Here is how I represented the puzzle in Logo:

to exam
category "person [Betty Ethel Joan Kitty Mary]
category "place [1 2 3 4 5]
xor "Kitty 2 "Betty 3
xor "Ethel 1 "Joan 2
xor "Joan 3 "Ethel 5
xor "Kitty 2 "Mary 4
xor "Mary 4 "Betty 1
print []
To my dismay, my program was unable to discover any facts at all from this puzzle!

The program used by Abelson and Sussman does not work by making inferences from known facts. Instead, it works by backtracking: trying every possible combination of names with places, and rejecting the ones that lead to a contradiction. This is more of a ``brute force'' approach; many possible combinations must be tried. (In this example, there are 120 possibilities, five factorial.)

A Logo version of the backtracking program is in appendix B. Here is how the program can be used to solve the examination puzzle:

to exam
track [[Betty Ethel Joan Kitty Mary] [1 2 3 4 5]] ~
      [[not equalp (is "Kitty 2) (is "Betty 3)]
       [not equalp (is "Ethel 1) (is "Joan 2)]
       [not equalp (is "Joan 3) (is "Ethel 5)]
       [not equalp (is "Kitty 2) (is "Mary 4)]
       [not equalp (is "Mary 4) (is "Betty 1)]]

The general backtracking procedure TRACK takes two inputs. The first is a list of lists, one for each category, naming the individuals in that category. (The categories themselves don't have names in this program.) The second input is also a list of lists, each of which is a Logo expression whose value must be TRUE for a correct solution. The tests use a predicate procedure IS that takes two inputs and outputs true if they correspond to the same person in the particular proposed combination that the program is trying.

The backtracking procedure can also be used to solve the earlier puzzle about the cub reporter:

to cub.reporter
track [[Jane Larry Opal Perry]
       [Irving King Mendle Nathan]
       [32 38 45 55]
       [drafter pilot sergeant driver]] ~
      [[differ [Jane King Larry Nathan]]
       [says "Jane "Irving 45]
       [says "King "Perry "driver]
       [says "Larry "sergeant 45]
       [says "Nathan "drafter 38]
       [differ [Mendle Jane Opal Nathan]]
       [says "Mendle "pilot "Larry]
       [says "Jane "pilot 45]
       [says "Opal 55 "driver]
       [says "Nathan 38 "driver]]

to differ :things
if emptyp bf :things [op "true]
op and (differ1 first :things bf :things) (differ bf :things)

to differ1 :this :those
foreach :those [if is :this ? [output "false]]
output "true

to says :who :one :two
output not equalp (is :who :one) (is :who :two)

I wrote the backtracking solution to this puzzle using the same names DIFFER and SAYS for the procedures that embody the facts of the puzzle, but they are not the same DIFFER and SAYS that are used in the original version. The originals add assertions to a database; these are predicates that output TRUE if the current combination satisfies the condition.

The trouble with the backtracking solution to the cub reporter puzzle is that it's quite slow. There are 13,824 possible arrangements of first names, last names, jobs, and ages. (There are 24 possible combinations of first and last names, times 24 combinations of name and job, times 24 combinations of name and age.) The program might get lucky and find a solution on its first try, but on average it will have to examine half of the possible combinations before finding a solution.

Repairing the Inference System

Why couldn't my inference program solve the examination puzzle? One crucial difference between the two puzzles discussed here is that the first includes some direct assertions, such as the fact that Jane-King is false. The second puzzle tells us no actual facts; it's entirely implications. As a result, the implication rule (modus ponens) can't infer any facts.

If we don't have enough facts, we have to get more mileage out of the implications. Each of the inference rules for assertions gives rise to a corresponding rule for implications:

When the inference program is modified to include these new rules (appendix C), it can solve the examination puzzle as well as the cub reporter puzzle. The cost is that the solution is very slow, even for the original puzzle, because the new rules allow the program to infer many new implications, each of which must be tested in later steps to see if it, combined with new information, allows yet another implication to be inferred.

I discovered this example as I was working on the final draft of my books. Should I include the modified program? In the end, I decided not to change the printed version, because the modified version is so slow even for easy puzzles. The original version does work for most of the puzzles I found in puzzle books; the difficulty of published puzzles is limited by the fact that mere human beings must be able to solve them!

Implications Unleashed

Even the modified version of the program does not make every possible inference from implications. For example, I included these rules:

but I didn't include these:

Also, my program can only accept implications about basic assertions. That is, if P and Q are statements such as ``X-Y is true'' or ``Z-W is false'' then I can represent the implication ``P implies Q,'' but my program has no way to represent an implication such as ``(P implies Q) implies R.''

In fact, it's because of the limitation on the assertions that can be represented in this program that I need so many rules. A general inference system won't have transitive rules at all, meta- or not. Instead it will represent assertions in a more general way so that

for any x, y, and z, is(x,y) and is(y,z) implies is(x,z)
can be represented as an assertion, not as a rule. In such a system, the number of rules needed is much smaller. In effect, I've again fallen into the same trap that led me to have the link falsification and link verification rules in the first version of the program. I eliminated the need for those rules by allowing my program to represent implications as well as basic facts. But I'm still limited to implications tied to a particular X-Y pair. What I can't represent in an assertion is the ``for any x, y, and z'' part of this transitive property. A system like mine, in which assertions are about specific individuals, is a propositional logic. One in which I can say ``for any x'' is a predicate logic.

My original program, which could only record basic assertions except for one ad hoc kludge for links, could truly discover every possible inference from the facts it was given. But once we introduce the idea of implications, there is no bound on the number of possible inferences. To write a practical program, we must draw a line somewhere, and decline to make inferences that are too complicated.

Forward and Backward Chaining

My program works by starting with the known facts and inferring as many new facts as it can. This approach is called ``forward chaining.'' Most practical inference systems today use ``backward chaining'': The program starts with a question, such as ``What is Jane's last name,'' and looks for known facts that might help answer that question. In practice this can effectively limit the number of dead-end chains of inference that the program follows.

Inference Versus Backtracking

Backtracking works best for puzzles with few categories, because increasing the number of categories dramatically increases the number of possible combinations that must be tested. But a backtracking program is not much affected by the nature of the information given by the puzzle. By contrast, inference works best for puzzles that include plenty of basic facts in the information given, but an inference program is not much affected by the number of categories. Each approach has strengths and weaknesses.

How do people solve logic puzzles? We often use a combination of the two methods. We generally start by making inferences, but if we get stuck, we switch to a backtracking approach. Backtracking works well if inferences have already ruled out most of the possible solutions, so that there aren't as many left to test. Computer inference systems have also been written using this hybrid technique. Such a program is harder to write, because it's not easy to specify precise rules to decide when to switch from inference to backtracking, and because the program's data structures must accommodate both techniques. The advantage is that solutions can be found quickly for a wide range of problems.

Appendix A: The Inference Program

;; Establish categories

to category :members
print (list "category :members)
if not namep "categories [make "categories []]
make "categories lput :categories
make :members
foreach :members [pprop ? "category]

;; Verify and falsify matches

to verify :a :b
settruth :a :b "true

to falsify :a :b
settruth :a :b "false

to settruth :a :b :truth.value
if equalp (gprop :a "category) (gprop :b "category) [stop]
localmake "oldvalue get :a :b
if equalp :oldvalue :truth.value [stop]
if equalp :oldvalue (not :truth.value) ~
   [(throw "error (sentence [inconsistency in settruth]
                            :a :b :truth.value))]
print (list :a :b "-> :truth.value)
store :a :b :truth.value
settruth1 :a :b :truth.value
settruth1 :b :a :truth.value
if not emptyp :oldvalue ~
   [foreach (filter [equalp first ? :truth.value] :oldvalue)
            [apply "settruth butfirst ?]]

to settruth1 :a :b :truth.value
apply (word "find not :truth.value) (list :a :b)
foreach (gprop :a "true) [settruth ? :b :truth.value]
if :truth.value [foreach (gprop :a "false) [falsify ? :b]
                 pprop :a (gprop :b "category) :b]
pprop :a :truth.value (fput :b gprop :a :truth.value)

to findfalse :a :b
foreach (filter [not equalp get ? :b "true] peers :a) ~
        [falsify ? :b]

to findtrue :a :b
if equalp (count peers :a) (1+falses :a :b) ~
   [verify (find [not equalp get ? :b "false] peers :a)

to falses :a :b
output count filter [equalp "false get ? :b] peers :a

to peers :a
output thing gprop :a "category

;; Common types of clues

to differ :list
print (list "differ :list)
foreach :list [differ1 ? ?rest]

to differ1 :a :them
foreach :them [falsify :a ?]

to justbefore :this :that :lineup
falsify :this :that
falsify :this last :lineup
falsify :that first :lineup
justbefore1 :this :that :lineup

to justbefore1 :this :that :slotlist
if emptyp butfirst :slotlist [stop]
equiv :this (first :slotlist) :that (first butfirst :slotlist)
justbefore1 :this :that (butfirst :slotlist)

;; Remember conditional linkages

to implies :who1 :what1 :truth1 :who2 :what2 :truth2
implies1 :who1 :what1 :truth1 :who2 :what2 :truth2
implies1 :who2 :what2 (not :truth2) :who1 :what1 (not :truth1)

to implies1 :who1 :what1 :truth1 :who2 :what2 :truth2
localmake "old1 get :who1 :what1
if equalp :old1 :truth1 [settruth :who2 :what2 :truth2  stop]
if equalp :old1 (not :truth1) [stop]
if memberp (list :truth1 :who2 :what2 (not :truth2)) :old1 ~
   [settruth :who1 :what1 (not :truth1)  stop]
if memberp (list :truth1 :what2 :who2 (not :truth2)) :old1 ~
   [settruth :who1 :what1 (not :truth1)  stop]
store :who1 :what1 ~
      fput (list :truth1 :who2 :what2 :truth2) :old1

to equiv :who1 :what1 :who2 :what2
implies :who1 :what1 "true :who2 :what2 "true
implies :who2 :what2 "true :who1 :what1 "true

to xor :who1 :what1 :who2 :what2
implies :who1 :what1 "true :who2 :what2 "false
implies :who1 :what1 "false :who2 :what2 "true

;; Interface to property list mechanism

to get :a :b
output gprop :a :b

to store :a :b :val
pprop :a :b :val
pprop :b :a :val

;; Print the solution

to solution
foreach thing first :categories [solve1 ? butfirst :categories]

to solve1 :who :order
type :who
foreach :order [type "| |   type gprop :who ?]
print []

;; Get rid of old problem data

to cleanup
if not namep "categories [stop]
ern :categories
ern "categories

Appendix B: The Backtracking Program

to track :lists :tests
foreach first :lists [make ? array count bf :lists]
catch "tracked [track1 first :lists bf :lists 1]

to track1 :master :others :index
if emptyp :others [tracktest stop]
track2 :master first :others bf :others

to track2 :names :these :those
if emptyp :these [track1 :master :those :index+1 stop]
foreach :these [setitem :index thing first :names ?
                track2 bf :names remove ? :these :those]

to tracktest
foreach :tests [if not run ? [stop]]
foreach :master [pr se ? arraytolist thing ?]
throw "tracked

to is :this :that
if memberp :this :master [output memberp :that thing :this]
if memberp :that :master [output memberp :this thing :that]
localmake "who find [memberp :this thing ?] :master
output memberp :that thing :who

Appendix C: The Enhanced Inference System

Only the procedures changed from the version in appendix A are given here:

to implies :who1 :what1 :truth1 :who2 :what2 :truth2
if equalp (gprop :who1 "category) (gprop :what1 "category) [stop]
if equalp (gprop :who2 "category) (gprop :what2 "category) [stop]
implies1 :who1 :what1 :truth1 :who2 :what2 :truth2
implies1 :who2 :what2 (not :truth2) :who1 :what1 (not :truth1)

to implies1 :who1 :what1 :truth1 :who2 :what2 :truth2
localmake "old1 get :who1 :what1
if equalp :old1 :truth1 [settruth :who2 :what2 :truth2  stop]
if equalp :old1 (not :truth1) [stop]
if memberp (list :truth1 :who2 :what2 :truth2) :old1 [stop]
if memberp (list :truth1 :what2 :who2 :truth2) :old1 [stop]
if memberp (list :truth1 :who2 :what2 (not :truth2)) :old1 ~
   [settruth :who1 :what1 (not :truth1)  stop]
if memberp (list :truth1 :what2 :who2 (not :truth2)) :old1 ~
   [settruth :who1 :what1 (not :truth1)  stop]
store :who1 :what1 ~
      fput (list :truth1 :who2 :what2 :truth2) :old1
if :truth2 [foreach (remove :who2 peers :who2)
                    [implies :who1 :what1 :truth1 ? :what2 "false]
            foreach (remove :what2 peers :what2)
                    [implies :who1 :what1 :truth1 :who2 ? "false]]
if not :truth2 [implies2 :what2 (remove :who2 peers :who2)
                implies2 :who2 (remove :what2 peers :what2)]
foreach (gprop :who2 "true) ~
        [implies :who1 :what1 :truth1 ? :what2 :truth2]
foreach (gprop :what2 "true) ~
        [implies :who1 :what1 :truth1 :who2 ? :truth2]
if :truth2 ~
   [foreach (gprop :who2 "false)
            [implies :who1 :what1 :truth1 ? :what2 "false]
    foreach (gprop :what2 "false)
            [implies :who1 :what1 :truth1 :who2 ? "false]]

to implies2 :one :others
localmake "left filter [not (or memberp (list :truth1 :one ? "false) :old1
                                memberp (list :truth1 ? :one "false) :old1
                                (and :truth1
                                     (or (and equalp ? :who1
                                              equalp gprop :what1 "category