**Reasoning with Computers**

*Brian Harvey
University of California, Berkeley
Computer Science Division,
Berkeley, CA 94720–1776, USA,
tel: 1 510 642-8311,
email: *

**Abstract**

Logic puzzles are explored as an example of computer programs that reason. Two approaches are contrasted: rule-based logical inference and brute-force backtracking. Each approach works well for certain puzzles but can be unacceptably slow for others.

**Keywords**

Logic puzzles, inference systems, backtracking

**1 Introduction**

"Intelligent agents" are said to be right around the corner; some products are already available. These programs are supposed to watch you work, make inferences about your preferred style, and automate repetitive tasks. They will sift through all the nonsense on the World Wide Web and find only those items you’ll really want to read. They’ll be teammates or opponents in computer games.

How can computer programs make inferences? I chose logic puzzles as a testbed for experimentation. Logic puzzles are a relatively easy test case because each puzzle forms a "closed world"; we know in advance all the possible names, ages, house colors, or whatever characteristics the puzzle asks us to match up. There is no new computer science here. I am more or less recapitulating the early history of computer inference systems. But educationally that may be more helpful than the complexities of the most modern attempts.

**2 An inference system for one logic puzzle**

I first worked on this project when I wrote the third volume
of *Computer Science Logo Style *(MIT Press, 1987). In that
volume the task I set for myself was to introduce some of the
topics in the undergraduate computer science curriculum to a
younger audience, and to illustrate the ideas with Logo programs
rather than with formal proofs. I wantd to discuss logic as part
of discrete mathematics, and thought of logic puzzles as the task
for an illustrative Logo program.

The program I wrote solved only one logic puzzle, taken from *Mind
Benders *Book B-2, by Anita Harnadek (Critical Thinking Press,
1978):

A cub reporter interviewed four people. He was very careless, however. Each statement he wrote was half right and half wrong. He went back and interviewed the people again. And again, each statement he wrote was half right and half wrong. From the inormation below, can you straighten out the mess?

The first names were Jane, Larry, Opal, and Perry. The last names were Irving, King, Mendle, and Nathan. The ages were 32, 38, 45, and 55. The occupations were drafter, pilot, police sergeant, and test car driver.

On the first interview, he wrote these statements, one from each person:

1. Jane: "My name is Irving, and I’m 45."

2. King: "I’m Perry and I drive test cars."

3. Larry: "I’m a police sergeant and I’m 45."

4. Nathan: "I’m a drafter, and I’m 38."

On the second interview, he wrote these statements, one from each person:

5. Mendle: "I’m a pilot, and my name is Larry."

6. Jane: "I’m a pilot, and I’m 45."

7. Opal: "I’m 55 and I drive test cars."

8. Nathan: "I’m 38 and I drive test cars."

This puzzle includes four *categories: *first name, last
name, job, and age. In each category there are four *individuals;
*for example, the first name individuals are Jane, Larry,
Opal, and Perry.

For every possible pairing of individuals in different categories (for example, Jane and pilot), the program keeps track of what it knows about whether or not they go together. Initially it knows nothing, but, for example, after the first interview the program knows that Jane is not King, since the four statements are from different people.

For any given pairing, the program can know nothing, can know
that the two individuals *are* the same, or can know that
the two individuals *are not *the same. But there is also a
fourth, perhaps more interesting, situation. After the first
interview, the program does not know whether or not Jane and
Irving are the same person, nor whether Jane and age 45 are the
same person. But it does know that if one of these is true, the
other must be false, and vice versa. This fact is represented as
a *link* between the Jane-Irving pair and the Jane-45 pair.

The program works by making *assertions* based on the
puzzle statement. From the first interview we get the following
assertions:

Jane-King is false.

Jane-Nathan is false.

Larry-King is false.

Larry-Nathan is false.

Jane-Irving is linked to Jane-45.

King-Perry is linked to King-driver

.Larry-sergeant is linked to Larry-45.

Nathan-drafter is linked to Nathan-38.

As each assertion is recorded in the database, the program
tries to use *rules of inference *to draw conclusions from
the new assertion and the assertions already recorded. Here are
the rules:

*Elimination rule:* If the X-Y pairs are false for all
but one individual Y in a given category, then the remaining
possibility must be true

*Uniqueness rule:* If some X-Y pair is true, then every
other X-Z pair must be false if Z is in the same category as Y.

*Transitive rules:* If X-Y is true, and Y-Z is true, then
X-Z must also be true. If X-Y is true, and Y-Z is false, then X-Z
must be false.

*Link falsification*: If X-Y is linked to Z-W, and we learn
that X-Y is true, then Z-W must be false.

*Link verification:* If X-Y is linked to Z-W, and we learn
that X-Y is false, then Z-W must be true.

For each new assertion, there are only a finite number of possible inferences using these rules. If we assert that X-Y is true, then the program must check the uniqueness rule for each other individual in the same category as X, and for each other individual in the same category as Y. It must check the transitive rules for the other known pairs involving X or Y. And if there was already a link involving X-Y, then it must apply the link falsification rule.

Similarly, if we assert that X-Y is false, then the program must check the elimination rule, the second transitive rule, and the link verification rule.

**3 An inference system for many logic puzzles**

This first program worked fine for this particular puzzle, but couldn’t handle other puzzles. The most obvious problem was that the link verification and link falsification rules apply only to this puzzle. (The other three rules apply to any logic puzzle.)

In preparing the second edition of *Computer Science Logo
Style *(MIT Press, 1997), I decided to generalize these link
rules. In Anita Harnadek’s puzzle, two pairs are linked only
through mutual exclusion; exactly one of the pairs must be true.
More generally, two pairs might be linked by an *implication:*

If X-Y is true/false, then Z-W must be true/false.

So each link in the first program became two implications in the second version. For example:

If Jane-Irving is true, then Jane-45 must be false.

If Jane-Irving is false, then Jane-45 must be true.

These two implications are logically independent; one cannot be derived from the other.

Once the program (reproduced in appendix A) can record implications, we replace the link falsification and link verification rules with three new rules about implications:

Contrapositive rule:If P implies Q, then not-Q implies not-P.

Implication rule (modus ponens): If P implies Q, and P is true, then Q must be true.

Contradiction rule: If P implies Q, and P also implies not-Q, then P must be false.

In these rules, P and Q represent statements about the truth or falsehood of a pair, e.g., "Jane-Irving is false."

The modified program can solve not only Anita Harnadek’s puzzle but also several others that I tried, taken from a Dell puzzle book.

By coincidence, my work on the second edition of *Computer Science Logo
Style* happened at about the same time that Harold Abelson, Gerald Jay
Sussman, and Julie Sussman were working on the second edition of their
brilliant text, *Structure and Interpretation of Computer Programs* (MIT
Press, 1996). Their second edition introduced several new topics, one of
which was -- here's the coincidence -- logic puzzles. I tried out my
program on their example puzzles, such as this one:

Five schoolgirls sat for an examination. Their parents -- so they thought -- showed an undue degree of interest in the result. They therefore agreed that, in writing home about the examination, each girl should make one true statement and one untrue one. The following are the relevant passages from their letters:

- Betty: ``Kitty was second in the examination. I was only third.''
- Ethel: ``You'll be glad to hear that I was on top. Joan was second.''
- Joan: ``I was third, and poor old Ethel was bottom.''
- Kitty: ``I came out second. Mary was only fourth.''
- Mary: ``I was fourth. Top place was taken by Betty.''
What in fact was the order in which the five girls were placed?

Since this puzzle is very similar in form to the other one, with paired true and false statements, I thought my program would solve it easily. Here is how I represented the puzzle in Logo:

to exam cleanup category "person [Betty Ethel Joan Kitty Mary] category "place [1 2 3 4 5] xor "Kitty 2 "Betty 3 xor "Ethel 1 "Joan 2 xor "Joan 3 "Ethel 5 xor "Kitty 2 "Mary 4 xor "Mary 4 "Betty 1 print [] solution endTo my dismay, my program was unable to discover any facts at all from this puzzle!

The program used by Abelson and Sussman does not work by making inferences
from known facts. Instead, it works by *backtracking*: trying every
possible combination of names with places, and rejecting the ones that lead
to a contradiction. This is more of a ``brute force'' approach; many possible
combinations must be tried. (In this example, there are 120 possibilities,
five factorial.)

A Logo version of the backtracking program is in appendix B. Here is how the program can be used to solve the examination puzzle:

to exam track [[Betty Ethel Joan Kitty Mary] [1 2 3 4 5]] ~ [[not equalp (is "Kitty 2) (is "Betty 3)] [not equalp (is "Ethel 1) (is "Joan 2)] [not equalp (is "Joan 3) (is "Ethel 5)] [not equalp (is "Kitty 2) (is "Mary 4)] [not equalp (is "Mary 4) (is "Betty 1)]] end

The general backtracking procedure TRACK takes two inputs. The first is a list of lists, one for each category, naming the individuals in that category. (The categories themselves don't have names in this program.) The second input is also a list of lists, each of which is a Logo expression whose value must be TRUE for a correct solution. The tests use a predicate procedure IS that takes two inputs and outputs true if they correspond to the same person in the particular proposed combination that the program is trying.

The backtracking procedure can also be used to solve the earlier puzzle about the cub reporter:

to cub.reporter track [[Jane Larry Opal Perry] [Irving King Mendle Nathan] [32 38 45 55] [drafter pilot sergeant driver]] ~ [[differ [Jane King Larry Nathan]] [says "Jane "Irving 45] [says "King "Perry "driver] [says "Larry "sergeant 45] [says "Nathan "drafter 38] [differ [Mendle Jane Opal Nathan]] [says "Mendle "pilot "Larry] [says "Jane "pilot 45] [says "Opal 55 "driver] [says "Nathan 38 "driver]] end to differ :things if emptyp bf :things [op "true] op and (differ1 first :things bf :things) (differ bf :things) end to differ1 :this :those foreach :those [if is :this ? [output "false]] output "true end to says :who :one :two output not equalp (is :who :one) (is :who :two) end

I wrote the backtracking solution to this puzzle using the same names DIFFER and SAYS for the procedures that embody the facts of the puzzle, but they are not the same DIFFER and SAYS that are used in the original version. The originals add assertions to a database; these are predicates that output TRUE if the current combination satisfies the condition.

The trouble with the backtracking solution to the cub reporter puzzle is that it's quite slow. There are 13,824 possible arrangements of first names, last names, jobs, and ages. (There are 24 possible combinations of first and last names, times 24 combinations of name and job, times 24 combinations of name and age.) The program might get lucky and find a solution on its first try, but on average it will have to examine half of the possible combinations before finding a solution.

Why couldn't my inference program solve the examination puzzle? One crucial difference between the two puzzles discussed here is that the first includes some direct assertions, such as the fact that Jane-King is false. The second puzzle tells us no actual facts; it's entirely implications. As a result, the implication rule (modus ponens) can't infer any facts.

If we don't have enough facts, we have to get more mileage out of the implications. Each of the inference rules for assertions gives rise to a corresponding rule for implications:

- Meta-elimination rules: If P implies that the X-Y pair is false for
all but one individual Y in a given category, then P implies that
the remaining possibility must be true. If P implies that the X-Y
pair is false for
*every*Y in a given category, then P is false. - Meta-uniqueness rule: If P implies that some X-Y pair is true, then P implies that every other X-Z pair must be false if Z is in the same category as Y.
- Meta-transitive rules: If P implies that X-Y is true, and if Y-Z is true, then P also implies that X-Z is true. If P implies that X-Y is true, and if Y-Z is false, then P implies that X-Z is false.

When the inference program is modified to include these new rules (appendix
C), it can solve the examination puzzle as well as the cub reporter puzzle.
The cost is that the solution is *very* slow, even for the original puzzle,
because the new rules allow the program to infer many new implications, each
of which must be tested in later steps to see if it, combined with new
information, allows yet another implication to be inferred.

I discovered this example as I was working on the final draft of my books. Should I include the modified program? In the end, I decided not to change the printed version, because the modified version is so slow even for easy puzzles. The original version does work for most of the puzzles I found in puzzle books; the difficulty of published puzzles is limited by the fact that mere human beings must be able to solve them!

Even the modified version of the program does not make every possible inference from implications. For example, I included these rules:

- Meta-transitive rules: If P implies that X-Y is true, and if Y-Z is true, then P also implies that X-Z is true. If P implies that X-Y is true, and if Y-Z is false, then P implies that X-Z is false.

- Meta-meta-transitive rules: If P implies that X-Y is true, and if P implies that Y-Z is true, then P also implies that X-Z is true. If P implies that X-Y is true, and if P implies that Y-Z is false, then P implies that X-Z is false.
- All bases covered rule: If P implies Q, and not-P implies Q, then Q must be true.
- Meta-meta-meta-transitive rules: If P implies that X-Y is true, and if Q implies that Y-Z is true, then P and Q together imply that X-Z is true. If P implies that X-Y is true, and if Q implies that Y-Z is false, then P and Q together imply that X-Z is false.

Also, my program can only accept implications about basic assertions. That is, if P and Q are statements such as ``X-Y is true'' or ``Z-W is false'' then I can represent the implication ``P implies Q,'' but my program has no way to represent an implication such as ``(P implies Q) implies R.''

In fact, it's because of the limitation on the assertions that can be represented in this program that I need so many rules. A general inference system won't have transitive rules at all, meta- or not. Instead it will represent assertions in a more general way so that

for any x, y, and z, is(x,y) and is(y,z) implies is(x,z)can be represented as an assertion, not as a rule. In such a system, the number of rules needed is much smaller. In effect, I've again fallen into the same trap that led me to have the link falsification and link verification rules in the first version of the program. I eliminated the need for those rules by allowing my program to represent implications as well as basic facts. But I'm still limited to implications tied to a particular X-Y pair. What I can't represent in an assertion is the ``for any x, y, and z'' part of this transitive property. A system like mine, in which assertions are about specific individuals, is a

My original program, which could only record basic assertions except for one ad hoc kludge for links, could truly discover every possible inference from the facts it was given. But once we introduce the idea of implications, there is no bound on the number of possible inferences. To write a practical program, we must draw a line somewhere, and decline to make inferences that are too complicated.

My program works by starting with the known facts and inferring as many new facts as it can. This approach is called ``forward chaining.'' Most practical inference systems today use ``backward chaining'': The program starts with a question, such as ``What is Jane's last name,'' and looks for known facts that might help answer that question. In practice this can effectively limit the number of dead-end chains of inference that the program follows.

Backtracking works best for puzzles with few categories, because increasing the number of categories dramatically increases the number of possible combinations that must be tested. But a backtracking program is not much affected by the nature of the information given by the puzzle. By contrast, inference works best for puzzles that include plenty of basic facts in the information given, but an inference program is not much affected by the number of categories. Each approach has strengths and weaknesses.

How do people solve logic puzzles? We often use a combination of the two methods. We generally start by making inferences, but if we get stuck, we switch to a backtracking approach. Backtracking works well if inferences have already ruled out most of the possible solutions, so that there aren't as many left to test. Computer inference systems have also been written using this hybrid technique. Such a program is harder to write, because it's not easy to specify precise rules to decide when to switch from inference to backtracking, and because the program's data structures must accommodate both techniques. The advantage is that solutions can be found quickly for a wide range of problems.

;; Establish categories to category :category.name :members print (list "category :category.name :members) if not namep "categories [make "categories []] make "categories lput :category.name :categories make :category.name :members foreach :members [pprop ? "category :category.name] end ;; Verify and falsify matches to verify :a :b settruth :a :b "true end to falsify :a :b settruth :a :b "false end to settruth :a :b :truth.value if equalp (gprop :a "category) (gprop :b "category) [stop] localmake "oldvalue get :a :b if equalp :oldvalue :truth.value [stop] if equalp :oldvalue (not :truth.value) ~ [(throw "error (sentence [inconsistency in settruth] :a :b :truth.value))] print (list :a :b "-> :truth.value) store :a :b :truth.value settruth1 :a :b :truth.value settruth1 :b :a :truth.value if not emptyp :oldvalue ~ [foreach (filter [equalp first ? :truth.value] :oldvalue) [apply "settruth butfirst ?]] end to settruth1 :a :b :truth.value apply (word "find not :truth.value) (list :a :b) foreach (gprop :a "true) [settruth ? :b :truth.value] if :truth.value [foreach (gprop :a "false) [falsify ? :b] pprop :a (gprop :b "category) :b] pprop :a :truth.value (fput :b gprop :a :truth.value) end to findfalse :a :b foreach (filter [not equalp get ? :b "true] peers :a) ~ [falsify ? :b] end to findtrue :a :b if equalp (count peers :a) (1+falses :a :b) ~ [verify (find [not equalp get ? :b "false] peers :a) :b] end to falses :a :b output count filter [equalp "false get ? :b] peers :a end to peers :a output thing gprop :a "category end ;; Common types of clues to differ :list print (list "differ :list) foreach :list [differ1 ? ?rest] end to differ1 :a :them foreach :them [falsify :a ?] end to justbefore :this :that :lineup falsify :this :that falsify :this last :lineup falsify :that first :lineup justbefore1 :this :that :lineup end to justbefore1 :this :that :slotlist if emptyp butfirst :slotlist [stop] equiv :this (first :slotlist) :that (first butfirst :slotlist) justbefore1 :this :that (butfirst :slotlist) end ;; Remember conditional linkages to implies :who1 :what1 :truth1 :who2 :what2 :truth2 implies1 :who1 :what1 :truth1 :who2 :what2 :truth2 implies1 :who2 :what2 (not :truth2) :who1 :what1 (not :truth1) end to implies1 :who1 :what1 :truth1 :who2 :what2 :truth2 localmake "old1 get :who1 :what1 if equalp :old1 :truth1 [settruth :who2 :what2 :truth2 stop] if equalp :old1 (not :truth1) [stop] if memberp (list :truth1 :who2 :what2 (not :truth2)) :old1 ~ [settruth :who1 :what1 (not :truth1) stop] if memberp (list :truth1 :what2 :who2 (not :truth2)) :old1 ~ [settruth :who1 :what1 (not :truth1) stop] store :who1 :what1 ~ fput (list :truth1 :who2 :what2 :truth2) :old1 end to equiv :who1 :what1 :who2 :what2 implies :who1 :what1 "true :who2 :what2 "true implies :who2 :what2 "true :who1 :what1 "true end to xor :who1 :what1 :who2 :what2 implies :who1 :what1 "true :who2 :what2 "false implies :who1 :what1 "false :who2 :what2 "true end ;; Interface to property list mechanism to get :a :b output gprop :a :b end to store :a :b :val pprop :a :b :val pprop :b :a :val end ;; Print the solution to solution foreach thing first :categories [solve1 ? butfirst :categories] end to solve1 :who :order type :who foreach :order [type "| | type gprop :who ?] print [] end ;; Get rid of old problem data to cleanup if not namep "categories [stop] ern :categories ern "categories erpls end

to track :lists :tests foreach first :lists [make ? array count bf :lists] catch "tracked [track1 first :lists bf :lists 1] end to track1 :master :others :index if emptyp :others [tracktest stop] track2 :master first :others bf :others end to track2 :names :these :those if emptyp :these [track1 :master :those :index+1 stop] foreach :these [setitem :index thing first :names ? track2 bf :names remove ? :these :those] end to tracktest foreach :tests [if not run ? [stop]] foreach :master [pr se ? arraytolist thing ?] throw "tracked end to is :this :that if memberp :this :master [output memberp :that thing :this] if memberp :that :master [output memberp :this thing :that] localmake "who find [memberp :this thing ?] :master output memberp :that thing :who end

Only the procedures changed from the version in appendix A are given here:

to implies :who1 :what1 :truth1 :who2 :what2 :truth2 if equalp (gprop :who1 "category) (gprop :what1 "category) [stop] if equalp (gprop :who2 "category) (gprop :what2 "category) [stop] implies1 :who1 :what1 :truth1 :who2 :what2 :truth2 implies1 :who2 :what2 (not :truth2) :who1 :what1 (not :truth1) end to implies1 :who1 :what1 :truth1 :who2 :what2 :truth2 localmake "old1 get :who1 :what1 if equalp :old1 :truth1 [settruth :who2 :what2 :truth2 stop] if equalp :old1 (not :truth1) [stop] if memberp (list :truth1 :who2 :what2 :truth2) :old1 [stop] if memberp (list :truth1 :what2 :who2 :truth2) :old1 [stop] if memberp (list :truth1 :who2 :what2 (not :truth2)) :old1 ~ [settruth :who1 :what1 (not :truth1) stop] if memberp (list :truth1 :what2 :who2 (not :truth2)) :old1 ~ [settruth :who1 :what1 (not :truth1) stop] store :who1 :what1 ~ fput (list :truth1 :who2 :what2 :truth2) :old1 if :truth2 [foreach (remove :who2 peers :who2) [implies :who1 :what1 :truth1 ? :what2 "false] foreach (remove :what2 peers :what2) [implies :who1 :what1 :truth1 :who2 ? "false]] if not :truth2 [implies2 :what2 (remove :who2 peers :who2) implies2 :who2 (remove :what2 peers :what2)] foreach (gprop :who2 "true) ~ [implies :who1 :what1 :truth1 ? :what2 :truth2] foreach (gprop :what2 "true) ~ [implies :who1 :what1 :truth1 :who2 ? :truth2] if :truth2 ~ [foreach (gprop :who2 "false) [implies :who1 :what1 :truth1 ? :what2 "false] foreach (gprop :what2 "false) [implies :who1 :what1 :truth1 :who2 ? "false]] end to implies2 :one :others localmake "left filter [not (or memberp (list :truth1 :one ? "false) :old1 memberp (list :truth1 ? :one "false) :old1 (and :truth1 (or (and equalp ? :who1 equalp gprop :what1 "category