Gordon S. Novak Jr.

Department of Computer Sciences

University of Texas, Austin, TX 78712

William C. Bulko

IBM Corporation

11400 Burnett Road

Austin, Texas 78758

Copyright © 1992 by AAAI.

This article appears in 1992 AAAI Spring Symposium on Reasoning with Diagrammatic Representations,Stanford, CA, March 25-27, 1992 (AAAI Press).

This research was supported in part by the U.S. Army Research Office under contract DAAG29-84-K-0060. Computer equipment used in this research was donated by Hewlett Packard and Xerox Corporation.

Solving physics problems usually requires geometric reasoning; a computer problem solver must use a representation that is in some respects equivalent to the use of diagrams by human problem solvers. In this paper, we review the uses of diagrams and geometric reasoning in physics problem solving programs. Next, we consider the many roles that diagrams seem to play in human problem solving, the relative strengths and weaknesses of computers and human problem solvers for reasoning with diagram-like representations, and ways in which some of the uses of diagrams by humans might be implemented in computer programs.

The ISAAC program [novak:isaac,novak:ijcai] solves physics problems stated in English, in the area of rigid body statics. Figure 1 is the most geometrically complex of the 40 problems solved by ISAAC. ISAAC uses only English text as input; the diagram is produced from its understanding of the English and from calculated values. A geometric model, similar to the diagram, is used in problem solving; in the geometric model most objects are reduced to lines and points.

The program contains a geometric model for each type of object. This model gives the dimensions of a bounding box containing the object together with coordinates of named points on the object; there is also a program to draw a picture of the object corresponding to these sizes and coordinates. By scaling the geometric model in two dimensions by appropriate size factors, rotating it, and translating it to the proper position, the geometry of an instance of the object type is obtained. Of course, the scaling, rotation, and translation are done by vector operations on points, rather than by manipulation of an image.

Rigid body statics problems generally involve analysis of forces on a
single ``main'' object; in any case, all of the objects in a given
problem are attached to each other. This permits a simple algorithm to
construct a model of a situation involving multiple objects. A single
object is chosen and assigned the coordinates ` (0 0)`; the objects
that are attached to it are then scaled to the appropriate size, rotated
by the appropriate angle, and then translated so that the point of
attachment has the same coordinates as the point in the composite model to
which it is attached. Objects attached to the newly added objects are
then added in the same manner until all objects are positioned in the
composite model.

The foot of a ladder rests against a vertical wall and on a horizontal floor. The top of the ladder is supported from the wall by a horizontal rope 30 ft long. The ladder is 50 ft long, weighs 100 lb with its center of gravity 20 ft from the foot, and a 150 lb man is 10 ft from the top. Determine the tension in the rope.

**Fig. 1: ISAAC Problem (Schaum p. 25, no. 19)**

The above algorithm is sufficient provided that the attachments of
objects form a tree structure, which is the case for most of our
example problems. In the case of the ladder problem shown in Figure
1, however, a triangle must be solved in order to obtain the
sizes and rotations of the objects that form the triangle. The triangle is
detected by an * ad hoc* program that looks for triangles by checking
whether an object ` a` is attached to an object ` b` that is
attached to ` c`, which is attached to ` a`. If a triangle is
detected, the known parameters are abstracted and given to a triangle
solver, which returns the complete set of angles and sides of the
triangle. The returned parameters must be translated back to the form
needed for the definition of the object; in the example shown, the angle
between the ladder and the vertical wall is returned, but the angle from
the horizontal is needed as the rotation of the ladder.

The geometric model produced by ISAAC is entirely described by analytic geometry within a single planar coordinate system. Some lengths in the geometric model may be symbolic variables rather than numeric values; the solution of the equations produced for the physics problem in general produces values for these variables, so that numeric values will be available when the diagram is drawn. In general, however, this form of geometric model has weaknesses. Many geometric features of physics problems are unspecified in the problem statement. Although it would be possible to make a single, unified geometric model with symbolic values for the unspecified lengths, positions, and angles, to do so would complicate the algebra to such a degree that the equations would quickly become unsolvable, even with the help of symbolic algebra packages. Instead, we believe it would be better to have multiple local geometries that are locally precise regarding certain geometric features, with topological connections between the local geometries.

It is difficult to describe complex geometries in English.
The BEATRIX program [bulko:diss,novak-bulko90] understands physics
problems specified by a combination of English text and a diagram. In
general, neither modality provides a complete description of the
problem; it is necessary to produce a single, unified model that
incorporates information from both. This requires that
the understander establish * coreference* between parts of the text
and diagram that refer to the same object or feature. An example of a
problem understood by BEATRIX is shown in Figure 2.

Two masses are connected by a cable as shown in the figure. The strut is held in position by a cable. The incline is smooth, and the cable passes over a smooth peg. Find the tension in the cable for theta = 30 degrees and m1 = m2 = 20 kg. Neglect the weight of the strut.

**Fig. 2: Test Problem A2**

The input to BEATRIX is constructed by means of a user interface that allows drawing components to be selected and moved, scaled, and rotated as desired. The interface also allows entry of bits of text within the diagram, as well as entry and editing of the English problem statement. The drawing is displayed in a window as it is constructed. As a side effect, a symbolic description of the items in the diagram is constructed; this serves as input to the understanding program.

We have taken care to ensure that the diagram input consists of ``neutral'' components such as lines, circles, and rectangles -- a form of input that could reasonably be produced automatically from a printed diagram by a machine vision system [ballard-brown]. Thus, BEATRIX does not perform pixel-level analysis of diagrams, but is assumed to be one level above components, such as line detectors, that do pixel-level analysis.

Understanding of diagrams shares many of the difficulties of understanding natural language: ambiguity of the meanings of elements, ambiguity of combination of elements into groups, and underspecification. An element such as a line is ambiguous because it might represent an edge of an object, a shared boundary between two objects, or an object in itself (such as a cable). Groups of lines can often be combined in many possible ways, only a few of which are correct. Diagrams often omit things that can easily be inferred; for example, the attachment between a rope and an object that it supports is usually not represented by anything other than contact of the diagram elements. As in speech understanding [hearsay], ambiguity can be reduced by making use of various constraints:

- An item mentioned in the English text is expected to appear in the
diagram. Often, there will be only one diagram element that could match.
- As some objects are identified, the set of possible identifications
of the remaining objects is often reduced.
- Common-sense physical principles provide constraints. For example, a physical object is expected to be supported by something; a rope that terminates at an object is likely to be attached to it.

A person reading a physics problem probably will not read all of the
text, and only then look at the diagram, or * vice versa*; instead,
the person will typically look briefly at the picture, read some text,
look back at the picture, and so forth until the problem has been
understood. It is unlikely that any fixed order of processing would
suffice for a broad selection of problems, especially since a given
problem might be specified entirely by text, entirely by a diagram, or
by a combination of the two. For this reason, BEATRIX is organized
to allow * co-parsing* of the two input modalities, using the BB1
blackboard system [bb1-manual].

The input to BEATRIX consists of an analytic-geometry description
of the diagram in terms of points, lines, rectangles, and circles.
Although BEATRIX does not deal with a diagram represented as pixels,
it does perform some low-level analysis of details, * e.g.* to
determine whether one line terminates perpendicularly at another line,
or whether a line is approximately tangent to a circle.

In this section, we consider the various roles that a diagram may play in human problem solving. Some of these roles have previously been described in [larkin-simon:diag10k].

A central feature of human intelligence is limited short-term memory [miller:magic7]. By writing down intermediate results, a person frees limited short-term memory for other uses; writing and re-perceiving the intermediate results is much faster than memorizing them. Surely diagrams play such a role also. Consider the following problem: [This problem was taken from a precalculus midterm exam.]

It is known about a triangleThis is not a difficult problem, but it is hard to keep track of the components and their relationships mentally if no diagram (Figure 3) is available. The problem solver often will progressively annotate the diagram with derived results, making these results immediately available by inspection when needed. Such retrieval by inspection is often opportunistic,ABCthatAB = 1,angle C = 90,^{o}angle A = alpha.CDis the perpendicular fromCtoAB(see picture). (a) Find the angleBand the lengths ofAC,BC, andCD. (b) Findalphafor whichCD = 1/2.

In some cases, a diagram serves the function of a ``coordinate system'' or substrate, allowing the remainder of a problem to be described relative to a substrate that has been mentioned and which the reader is assumed to understand. For example,

A car leaves pointSince the natural language problem statement makes definite references to geometric features of the substrate, a detailed model of the substrate geometry is required to understand such problems.Aand drives north for 6 miles at 30 MPH ...

A punter located at his 40-yard line kicks a punt at an angle of 45 degrees to a receiver at the opposite 20-yard line ...

A fundamental problem faced by many A.I. systems is underspecification of a problem or situation. A diagram can help the problem solver to infer the correct context by encouraging the elaboration of elements normally associated with the diagram. A simple example of an underspecified problem is the following problem solved by ISAAC:

What force is required to lift one end of a pole?To a person, a drawing of a horizontal pole supported only by a force at one end ``looks wrong''; the exercise of drawing a free body diagram may help a human problem solver to consider all the relevant forces until the set of forces drawn on the body looks balanced. In the above problem, ISAAC assumes (by symbolic inference) that the other end of the pole is supported by a pivot; it also assumes that the pole is uniform (having its center of gravity at its geometric center) and that it has a nonzero length and weight. Physics problems often omit important geometric facts,

Larkin and Simon [larkin-simon:diag10k] describe ``perceptual''
inferences as the major advantage of the use of diagrams. While such
inferences (* e.g.* the fact that vertical angles formed by intersecting
lines are equal) can be made symbolically, they can be made at almost
no cost by perception.

Larkin and Simon describe perceptual inferences that are essentially
identical to symbolic inferences that can be made formally. While
perceptual inferences may suggest subproblems that can then be treated
formally (* e.g.*, the perception that vertical angles appear to be equal
may trigger the memory that this is indeed a theorem in geometry), we
believe that many perceptual inferences are made and accepted without
proof or even much thought. For example, in the problem of Figure
2 the problem solver will make the assumption that the
string is parallel to the inclined plane; this is unstated and cannot
be formally proved due to underspecification. Thus, we believe that
many perceptual inferences are of the form ``these things are
equal because they appear to be equal''.

A skilled problem solver will deliberately construct diagrams to
facilitate inference by recognition. For example, in the problem of
Figure 3, a skilled problem solver will draw the figure
so that the angle * * is clearly less than * 45 ^{o} *, thus
facilitating recognition that the angle

**Figure 3: Analytic Geometry Problem**

It appears that ``perceptual'' inferences are important in related domains, even when diagrams are not used. For example, a person skilled in performing mental arithmetic can perform the mental calculation:

by recognizing the problem as an instance of the pattern:4 / .97 ~= 4.12

The recognition that1 / (1 - epsilon) ~= (1 + epsilon), whereepsilonis small.

People seem to be able to perform at least the following kinds of recognition perceptually using diagrams:

- Parallel or perpendicular lines.
- Relative positions of objects (
*e.g.*above, below, left, right). - Objects that are similar under translation, scaling, rotation, or
some combination of these.
- Approximate equivalence of lengths, sizes, or angles.
- Relative sizes (smaller/larger) of lines or angles.
- Proportionality, especially division in half, of lines or angles.

Some inference rules used with diagrams seem almost to be ``plastic overlays'' that can be moved into position and added to a diagram. The right-hand rule of electromagnetic fields is such an operator, which may be invoked with actual movement of the hand. The rule that ``sine = opposite / hypotenuse'' can be thought of as a diagrammatic operator (Figure 4) that can be moved (mentally) into position and then used to add inferences directly to the diagram:

**Figure 4: Sine Rule Overlay**

An advantage of such diagrammatic operators is that they can be used
locally by making simple mental transformations such as translation,
rotation, and reflection to make the diagrammatic operator match the
existing diagram. Intermediate results that are written on the diagram
become available for subsequent use, so that a chain of such operations
can be used. For example, in the problem of Figure 3, the
sine rule can be applied to the large triangle to find that
* BC = sin(alpha) *; this value can then be used as input to a cosine
rule for the smaller triangle sharing the side * BC * to find
*CD = sin(alpha) * cos(alpha)*.

We have proposed [kook:diss,kook-novak-tkde] that the analysis of a physics problem should be represented not just as a set of equations, but as sets of correspondences between problem features and physical models. In more complex problems, a single object in the problem may have multiple views in different physical models. When represented as networks, the correspondence sets become large and complex. A diagram can serve as a compact representation of such correspondences, in which a particular part of the diagram carries (in one form or other) multiple labels.

One form of such multiple-view diagrams shows the formal geometric model drawn on a picture of the actual objects, as in the following example (Figure 5) of a claw hammer drawn as a lever problem ([miller], problem 64).

**Figure 5: Hammer Geometry**

Diagrams are often furnished with statements of physical laws
[gieck]. Such diagrams presumably facilitate retrieval
of the appropriate formula from memory when a similar problem diagram
is seen. In addition, the diagram facilitates matching between problem
features and corresponding features of the physical model because the
corresponding features appear in similar locations in each diagram.
Chi * et al.* [chi] found that novices sorted physics problems
primarily on the basis of diagrammatic similarities.

For example, consider a centrifugal force diagram, as given in [gieck], and a problem such as:

Given the gravitational constantGand the known facts about the orbit of the earth, calculate the mass of the sun.

**Figure 6: Centrifugal Force Law and Planet Problem**

These diagrams (Figure 6) immediately suggest the correct
correspondences: the sun corresponds to the center of the circle, the
earth to the mass (suggesting that the earth be ``coerced'' to a point
mass), the radius * r * to the earth-sun distance * d *, and the
velocity * v * to the velocity of the earth along its orbital path
(which then becomes a subproblem to be solved).

Larkin and Simon [larkin:cogsci] proposed the representation of problems and of physical situations as directed graphs and the use of graph-matching algorithms to find and invoke appropriate physical models. This may be difficult, both because graph matching is computationally intractable and because graphs are prevented from matching by missing or extra nodes. Diagram matching may be more useful because diagrams that represent physical principles can be indexed by their ``main'' features, such as circular motion, which are likely to have only one or a few matches in a given problem. In addition, a match between a diagram and a given problem need not be exact: extra elements in the problem do not matter, and missing elements can be ignored (if the corresponding variable is not used) or taken as subproblems to be found.

An interesting approach to teaching physics problem solving might be to
allow the student to select physical models for a given problem and to
instantiate those models by mouse clicks on corresponding points,
* e.g.* selecting the sun as the center of the circle, the earth
as the mass, etc. The result would be instantiated equations and
subproblems that remain to be solved. A solvable equation set could be
solved automatically by symbolic algebra. This would move the focus
of problem solving away from algebra and toward selection and
instantiation of physical models. Like the use of pocket calculators
(which seems to have caused students to lose the habit of mentally
checking the plausibility of answers that was required with the use of
a slide rule), such an approach might cause degradation of algebraic
skills along with improvement in modeling skills. However, we have
observed students rushing headlong into thickets of incorrect equations
(from which they never emerged) rather than reconsidering the sources
of the equations, so such a shift in emphasis may be beneficial.

A related research direction is machine learning of methods for analyzing particular kinds of problems based on correspondences, selected as described above, made by a physics expert. For example, the result of the correspondences described above between the centrifugal force law and the earth-sun system could be a general method for automatically producing the correspondences between a planetary system and the centrifugal force law; moreover, the fact that such a correspondence is known could suggest the use of this law in a new problem or even, by analogy, in a related problem such as the Bohr model of the hydrogen atom. Thus, learning of the method of application of physical principles could be a useful form of ``chunking'' that would allow future problems of a similar type to be solved more rapidly and automatically as a result of expert-guided practice.

Skilled problem solvers often use ``gedanken experiments'' involving diagrams to analyze problems; such analysis may be used to determine the direction of change in a system, equilibrium points, bounding points or extrema, connectivity between parts of a system, or how a change in one quantity will affect another. An excellent example is a method for determining whether a structural member of a bridge is under tension or compression: imagine the bridge with the member removed and imagine how the bridge would collapse. If the member would become shorter in the collapse (Figure 7), it must be under compression.

**Figure 7:** Removal of Bridge Member

The differences between human abilities to deal with diagrams and the
abilities of existing computer systems are striking:

Humans: | Computers: | |

Short-term memory: | Very limited | Large |

Perception: | Easy | Slow, difficult |

Calculation: | Poor; avoided | Easy, fast |

These differences, and especially the difficulty of perception of diagrams (still a research problem in itself), suggest that it would be unprofitable to try to duplicate human diagram processing directly. However, machine processing at a ``sketch'' level above the level of direct perception may be reasonable.

We believe that it is possible to identify a set of basic ``perceptual'' operators, analogous to those that people use with diagrams but implemented above the pixel level of an actual diagram, and to implement these to take advantage of the strengths of the computer.

A representation of geometric features such as lines, points, and circles by means of analytic geometry seems most appropriate for computer processing. Such a representation will not necessarily be exact; there may be small truncation errors, especially if the original input source was a pixel-based drawing program such as the one that provides input to BEATRIX. Nevertheless, such a representation should be sufficiently accurate to determine approximately such geometric features as a line terminating at another line, a line tangent to a circle, parallel lines, etc.

Geometric features such as lines should be connected, bidirectionally, with problem features that are represented symbolically. Sometimes geometric features represent actual objects, but in other cases they represent relationships (such as the earth-sun distance) or variable values. It must be possible to post values to the diagram representations; in this way, a diagram representation can serve the short-term memory function and allow opportunistic use of intermediate results that are ``read'' back from the diagram.

It should be possible to group geometric objects into (multiple) groups that are treated as units; for example, in the bridge problem above, two triangles formed of bridge members are treated as rigid bodies in analyzing how the bridge would collapse with a member removed.

A library of substrate models is essential if minimally specified problems, such as those found in textbooks, are to be understood. For example, a statement such as ``a ladder leans against a wall'' implies the existence not only of the wall, but also of a floor, and support of the bottom of the ladder by the floor. It is reasonable to assume that a prototypical diagrammatic representation of the typical spatial relationships of a ladder, wall, and floor is stored for use in understanding such problems. Careful reading of textbook problems shows that the reader is assumed to have and be able to employ such knowledge.

Perceptual operators (* e.g.* detection of parallel lines, triangles,
etc.) can operate at the analytic geometry level as special-purpose
programs distinct from the production rule or other symbolic analysis
system. We believe that ``noticing'' these features can be done
rapidly if done by special-purpose programs that perform only this
function. Noticing of features can be used to direct the attention
of the problem analysis system to make inferences based on observed
relationships. For example, the BEATRIX program uses the facts that
two lines are tangent to a circle to infer the existence of a pulley
system, resulting in the identification of the two lines as parts of
a single rope. In some cases, things that are noticed should be
assumed to be true (as when the rope appears to be parallel to the
inclined plane); in other cases, noticing an apparent equality could
trigger an attempt to prove equality by more rigorous methods.

Another kind of perceptual inference is relating of similar models. For example, in relating the earth-sun system to a circle, there are correspondences between the location of the sun and the center of the circle, between the earth-sun distance and the radius of the circle, etc. Here, a stored relationship between a physical relationship and a diagram of that relationship could be used to relate corresponding parts of two situations that have similar diagrams. For example, ``x orbits y'' corresponds to a circle diagram in which y is the center of the circle, x is a point on the circumference, and the distance between x and y is the radius. Given a similar diagram associated with a physical law (as in Figure 6), correspondences can be drawn between the elements that correspond to similar diagram elements. In this way, the diagrammatic representation becomes the basis for expressing the isomorphism between a problem situation and its physical model.

We have described uses of diagrams in existing programs that solve physics problems and considered additional ways in which diagrams appear to be used by humans. By implementing ``perceptual'' operations at a level below the operation of production rules or other symbolic reasoning, and by making use of relationships expressed as correspondences between diagrams, it may be possible to gain some of the advantages that humans derive from diagrams for computer problem-solving systems.

[ballard-brown]
Ballard, D. H. and Brown, C. M., * Computer Vision*,
Prentice-Hall, 1982.

[bulko:diss]
Bulko, W., * Understanding Coreference in a System for Solving
Physics Word Problems*, Ph.D. dissertation, Tech. Report
AI-89-102, A.I. Lab, CS Dept., Univ. of Texas at Austin, 1989.

[bb1-manual] Garvey, A., Hewett, M., Schulman, R., and Hayes-Roth, Barbara, ``BB1 User Manual -- Interlisp Verison'', working paper KSL 86-60, Knowledge Systems Lab, Stanford Univ., 1986.

[chi]
Chi, M., Feltovich, P., and Glaser, R., ``Categorization and
Representation of Physics Problems by Experts and Novices'',
* Cognitive Science*, vol. 5, no. 2 (April 1981), pp. 121-152.

[feynman:joking]
Feynman, R. P., * Surely You're Joking, Mr. Feynman!*,
New York: Norton, 1985.

[gieck]
Gieck, K., * Engineering Formulas*, 5th ed., McGraw-Hill, 1986.

[hearsay]
Erman, L. D., * et al.*, ``The Hearsay-II Speech-Understanding
System: Integrating Knowledge to Resolve Uncertainty'',
* ACM Computing Surveys*, vol 12, no. 2 (June 1980), pp. 213-253.

[kook:diss]
Kook, Hyung Joon, * A Model-Based Representational
Framework for Expert Physics Problem Solving*, Ph.D. dissertation,
Tech. Report AI-89-103,
A.I. Lab, C.S. Dept., Univ. of Texas at Austin, 1989.

[kook-novak-tkde]
Kook, Hyung Joon and Novak, G., ``Representation of Models for
Expert Problem Solving in Physics, * IEEE Trans. on Knowledge
and Data Engineering*, ** 3:**1, pp. 48-54, March 1991.

[larkin:cogsci]
Larkin, J. and Simon, H. A., ``Learning through Growth of Skill
in Mental Modeling'', * Proc. Cognitive Science Society*, 1981;
also in [simon:mot2].

[larkin:lmss]
Larkin, J., J. McDermott, D. Simon and H. A. Simon.
``Expert and Novice Performance in Solving Physics Problems'',
* Science*, 208 (20 June 1980), pp. 1335-1342.

[larkin-simon:diag10k]
Larkin, J. and Simon, H. A., ``Why a Diagram
is (Sometimes) Worth 10,000 Words'', * Cognitive Science*,
** 11:**65-99, 1987; also in [simon:mot2].

[miller]
Miller, F., * Progressive Problems in Physics*,
Boston: D.C. Heath, 1949.

[miller:magic7]
Miller, G. A., ``The Magical Number Seven, Plus or Minus Two'',
* Psychological Review*, ** 63:**81-97, 1956.

[novak:glisp]
Novak, G., ``GLISP: A LISP-Based
Programming System With Data Abstraction'',
* A.I. Magazine*, vol. 4, no. 3 (Fall 1983), pp. 37-47.

[novak:isaac]
Novak, G., ``Computer Understanding of Physics Problems Stated
in Natural Language'', * Am. J. Computational
Linguistics*, Microfiche 53, 1976.

[novak:ijcai]
Novak, G., ``Representations of Knowledge in a Program for
Solving Physics Problems'', * IJCAI*, 1977, pp. 286-291.

[novak-bulko90]
Novak, G. and Bulko, W., ``Understanding Natural Language with
Diagrams'', * Proc. Eighth National Conference on Artificial
Intelligence (AAAI-90)*, 1990, pp. 465-470.

[simon:mot2]
Simon, H. A., * Models of Thought*, vol. 2, Yale Univ. Press, 1989.