Gordon S. Novak Jr.

Department of Computer Sciences

University of Texas, Austin, TX 78712

Copyright © 1994 by AAAI.

Permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the AAAI.

This article appears in Diagrammatic Reasoning: Cognitive and Computational Perspectives, Janice Glasgow, N. Hari Narayanan, and B. Chandrasekaran, eds., AAAI Press / MIT Press, 1995, pp. 753-774.

Humans often use diagrams when solving physical problems; diagrams appear in physics books and serve as a means of formal communication in engineering. Diagrams are used because physical problems require the solution of geometric subproblems, but they serve many other roles. People find it easy to interpret diagrams; this is not the case for computer programs, where vision is an unsolved problem. The challenge for AI is to give programs the ability to reason with diagrams as humans do.

This paper describes three computer programs that use diagrams in solving physical problems. ISAAC, which understands and solves physics problems stated in English, constructs a geometric model that is equivalent to a free-body diagram for problem solving; it also constructs a diagram that serves to illustrate its understanding of the problem. BEATRIX understands physics problems specified by both English text and a diagram. The focus of this program is on understanding; the diagram and text must be understood together, and each helps to disambiguate the other. The VIP program allows a program to be specified by connections between diagrams of physical models; here, diagrams serve as a medium of communication that is natural to the user.

Section 5 discusses ways in which humans use diagrams. The final section proposes ways in which some of these uses of diagrams might be implemented in computer programs.

ISAAC [novak:isaac,novak:ijcai77] solves rigid body statics problems stated in English. Fig. 1 shows a problem that ISAAC can read, understand, and solve in less than one second. The diagram is produced from ISAAC's understanding of the English and from calculated values. A geometric model, similar to the diagram, is used in problem solving; in the geometric model most objects are reduced to lines and points.

The foot of a ladder rests against a vertical wall and on a horizontal floor. The top of the ladder is supported from the wall by a horizontal rope 30 ft long. The ladder is 50 ft long, weighs 100 lb with its center of gravity 20 ft from the foot, and a 150 lb man is 10 ft from the top. Determine the tension in the rope.

**Figure 1:** ISAAC Problem

A problem statement in natural language is not a complete
description of the problem; it is only a minimal outline,
requiring the reader to fill in details. To construct a geometric
model sufficient for problem solving, many inferences must be
made. Consider the problem statement of Fig.
1 [schaum]: it says ``a 150 lb man *is 10 ft from
the top*''. The definite noun phrase *the top* denotes the top of
the ladder. *10 ft from the top* must be a location on the
ladder, not just any location that is 10 ft from the top of the
ladder. Finally, the man *is* at this
location; this must be interpreted as an attachment by contact between
the feet of the man and the ladder, with the ladder supporting the
man. Some of these inferences may be viewed as linguistic, but
others must be based on geometric knowledge about the objects,
knowledge about typical spatial relationships, and
common-sense physics.
ISAAC makes these inferences in several steps. A statement that an
object ``is'' at a location on another object is interpreted as an
attachment. A location relative to a point on an object is assumed
to be toward the center of the object. When ISAAC
writes physics equations, it finds that the ladder is
supported at two points and that the man has a specified weight; it
therefore assumes that the ladder supports the man. Finally, in
drawing the diagram, ISAAC assumes that a person is supported at the
feet.
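Once these inferences have placed the man 40 ft from the foot of the ladder, the rope tension follows from a moment balance about the foot. The following Python sketch is an illustration only, not ISAAC's code. It assumes that the rope pulls the top of the ladder horizontally toward the wall and that the floor and wall reactions all act at the foot, so that they contribute no moment there; the variable names are hypothetical.

```python
import math

ladder_len = 50.0   # ft, from the problem statement
rope_len = 30.0     # ft, horizontal rope from the wall to the top of the ladder

# Height of the top of the ladder above the floor (right triangle: wall, rope, ladder).
height = math.sqrt(ladder_len ** 2 - rope_len ** 2)

# Horizontal distance from the wall gained per foot travelled along the ladder.
run_per_ft = rope_len / ladder_len

# (weight in lb, distance from the foot measured along the ladder in ft)
loads = [
    (100.0, 20.0),               # ladder weight at its center of gravity
    (150.0, ladder_len - 10.0),  # man: "10 ft from the top" read as a point on the ladder
]

# Moments about the foot: tension * height = sum of weight * horizontal arm.
load_moment = sum(weight * dist * run_per_ft for weight, dist in loads)
tension = load_moment / height
print(f"rope tension = {tension:.1f} lb")
```

Under these assumptions the balance reads T * 40 = 100 * 12 + 150 * 24, so the tension is 120 lb.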

ISAAC generates a symbolic geometric model for problem solving and a symbolic diagram model for drawing the diagram. For each type of object, ISAAC has a geometric model, including dimensions of a bounding box, names and coordinates of interesting points on the object, a program to draw a picture of the object, the name of a parameter of the object that indicates its size, and a program to estimate the size for the drawing if no size is specified.

The geometry of an individual object within a model is specified by its object geometry, the location of a reference point, the rotation of the object about the reference point, and the vector size of the object. These data are sufficient for calculation of the location of any named point on the object and for drawing it. To make a geometric or diagram model, these data must be determined for each object. The geometric and diagram models are similar, except for the following features:

- The geometric model is a ``skeleton'' model, in which most objects
are represented by lines or points.
The diagram model requires actual sizes and points of attachment for
each object.
- The geometric model may contain symbolic variables and algebraic expressions in its coordinate values; the desired solution to a problem may be the value of a geometric variable. In the diagram, all coordinates must be numeric. The solution to the physics problem often provides numeric values for variables; when it does not, default values are assigned.
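The calculation of a named point from the object geometry, reference point, rotation, and vector size can be sketched in a few lines of Python. This is a minimal illustration under assumed conventions (named points stored in unit coordinates, rotation in degrees, counterclockwise); the function `named_point` and the ladder data are hypothetical and are not taken from ISAAC.

```python
import math

def named_point(local_xy, reference, rotation_deg, size):
    """World coordinates of a point given in the object's unit coordinates."""
    x = local_xy[0] * size[0]          # scale by the vector size
    y = local_xy[1] * size[1]
    t = math.radians(rotation_deg)     # rotate about the reference point
    xr = x * math.cos(t) - y * math.sin(t)
    yr = x * math.sin(t) + y * math.cos(t)
    return (reference[0] + xr, reference[1] + yr)   # translate

# Hypothetical skeleton model of a ladder: a unit line from "foot" to "top".
LADDER_POINTS = {"foot": (0.0, 0.0), "top": (1.0, 0.0)}

# A 50 ft ladder whose top is 30 ft out and 40 ft up from its foot.
angle = math.degrees(math.atan2(40.0, 30.0))
top = named_point(LADDER_POINTS["top"], (0.0, 0.0), angle, (50.0, 1.0))
print(top)
```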

Before the diagram and geometric models are made, the model of the
problem is a semantic network containing symbolic
descriptions of objects, their properties, and their relationships.
The diagram model is constructed from this network in
the following way. In most rigid body statics problems, all of the
objects are attached to each other; thus, the objects and attachments
form a connected graph. A single object is chosen and assigned the
coordinates `(0 0)`; the objects that are attached to it are then
scaled to the appropriate size, rotated by the appropriate angle, and
translated to the point of attachment to the composite model. Of
course, these are vector operations on points, rather than
manipulation of an image. Objects that are attached to a newly
added object are then added to the diagram in the same manner.
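The construction just described amounts to a breadth-first traversal of the attachment graph. The Python sketch below is one possible reading of it, not ISAAC's implementation: it assumes that each object's rotation and a scalar size are already known, and the data layout and the name `place_objects` are hypothetical.

```python
import math
from collections import deque

def rotate(p, deg):
    t = math.radians(deg)
    return (p[0] * math.cos(t) - p[1] * math.sin(t),
            p[0] * math.sin(t) + p[1] * math.cos(t))

def place_objects(objects, attachments, root):
    """Return the world location of each object's reference point.

    objects:     name -> {"points": {point: (x, y)}, "size": s, "rotation": deg}
    attachments: list of (object_a, point_a, object_b, point_b)
    """
    def offset(name, point):
        obj = objects[name]
        x, y = obj["points"][point]
        return rotate((x * obj["size"], y * obj["size"]), obj["rotation"])

    origin = {root: (0.0, 0.0)}          # the first object gets coordinates (0 0)
    queue = deque([root])
    while queue:
        cur = queue.popleft()
        for a, pa, b, pb in attachments:
            # An attachment can be followed in either direction.
            for placed, pp, new, pn in ((a, pa, b, pb), (b, pb, a, pa)):
                if placed == cur and new not in origin:
                    ox, oy = offset(placed, pp)
                    wx, wy = origin[placed][0] + ox, origin[placed][1] + oy
                    nx, ny = offset(new, pn)
                    origin[new] = (wx - nx, wy - ny)   # make the two points coincide
                    queue.append(new)
    return origin

# Hypothetical example: a weight hanging by a rope from the end of a beam.
objects = {
    "beam":   {"points": {"left": (0, 0), "right": (1, 0)}, "size": 10.0, "rotation": 0.0},
    "rope":   {"points": {"top": (0, 0), "bottom": (1, 0)}, "size": 4.0, "rotation": -90.0},
    "weight": {"points": {"top": (0, 0)}, "size": 1.0, "rotation": 0.0},
}
attachments = [("beam", "right", "rope", "top"), ("rope", "bottom", "weight", "top")]
placed = place_objects(objects, attachments, "beam")
print(placed)
```

As in the text, the sketch operates on points rather than on an image, and it handles only attachment graphs that form a tree.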

This algorithm is sufficient if the attachments of objects form a tree
structure, which is the case for most of our example problems. In the
case of the ladder problem shown in Figure 1, however, a
triangle must be solved. The triangle is detected by an *ad hoc*
program that tests whether some object `a` is attached to an
object `b` that is attached to `c`, which is attached to
`a`. If a triangle is detected, the known parameters are abstracted
and given to a triangle solver, which returns the complete set of
angles and sides of the triangle. The returned parameters must then
be translated back to the form needed for the definition of the
object.
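A test of this kind, together with a very restricted triangle solver, might look as follows in Python. This is a sketch under stated assumptions rather than ISAAC's program: attachments are given as unordered pairs of object names, and the solver handles only a right triangle with a known hypotenuse and one known leg, which is the case that arises when the rope is horizontal and the wall is vertical. All names are hypothetical.

```python
import math
from itertools import combinations

def find_triangles(attachments):
    """Return every set of three objects that are attached pairwise."""
    linked = {frozenset(pair) for pair in attachments}
    names = sorted({name for pair in attachments for name in pair})
    return [set(trio) for trio in combinations(names, 3)
            if all(frozenset(edge) in linked for edge in combinations(trio, 2))]

def solve_right_triangle(hypotenuse, leg):
    """Complete a right triangle from its hypotenuse and one leg."""
    other_leg = math.sqrt(hypotenuse ** 2 - leg ** 2)
    # Angle between the hypotenuse and the known leg, in degrees.
    angle = math.degrees(math.acos(leg / hypotenuse))
    return {"hypotenuse": hypotenuse, "leg": leg, "other_leg": other_leg,
            "angle_at_leg": angle, "angle_at_other_leg": 90.0 - angle}

attachments = [("ladder", "wall"), ("ladder", "rope"), ("rope", "wall"), ("man", "ladder")]
triangles = find_triangles(attachments)
sides = solve_right_triangle(hypotenuse=50.0, leg=30.0)
print(triangles, sides)
```

For the ladder problem, the only triangle found is the one formed by the ladder, the rope, and the wall, and the missing side along the wall is 40 ft.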

ISAAC's geometric model is based on analytic geometry in a single planar coordinate system. For more general application, this form of geometric model may be inadequate. Many geometric features of a physics problem may be unspecified in the problem statement. Although it would be possible to make a single, unified geometric model with symbolic values for all unspecified lengths, positions, and angles, to do so would greatly complicate the algebra. It would be better to have multiple locally precise geometries. Connections between local geometries could be topological rather than exact; exact geometries are needed only when relative distances between locations within separate local geometries are required, and such cases are unusual.

It is difficult to describe geometry using natural language.
BEATRIX [novak-bulko90,novak-bulko93]
understands physics problems specified by English text and a diagram.
Often neither text nor diagram is a complete
description; a unified model must be produced from both.
*Coreference* must be established between parts of the text and the
diagram that refer to the same object or feature. Fig. 2
shows an example understood by BEATRIX.

Two masses are connected by a light string as shown in the figure. The incline and peg are smooth. Find the acceleration of the masses and the tension in the string for theta = 30 degrees and m1 = m2 = 5 kg.

**Figure 2:** BEATRIX Example.

Two masses are connected by a cable as shown in the figure. The strut is held in position by a cable. The incline is smooth, and the cable passes over a smooth peg. Find the tension in the cable for theta = 30 degrees and m1 = m2 = 20 kg. Neglect the weight of the strut.

**Figure 3:** BEATRIX Example.

BEATRIX's user interface allows diagram elements to be selected, moved, scaled, and rotated as desired; it also allows entry of text within the diagram. A symbolic description of the diagram is constructed for input to the understanding program. The diagram input consists of ``neutral'' components such as lines, circles, and rectangles -- input that could be produced from a printed diagram by a machine vision system [ballard-brown].

Many difficulties of understanding natural language are also
present with diagrams: ambiguity of meanings of
elements, ambiguity of combination of elements, and
underspecification. An element such as a line is ambiguous
because it might represent an edge of an object, or an object itself
(*e.g.*, a cable). Lines
may be combined in many ways, only a few of which are meaningful.
Diagrams often omit things that can be inferred by the reader:
the attachment between a rope and an object that it supports is often
represented only by contact. As in speech
understanding [hearsay], ambiguity can be reduced by using
several kinds of constraints:

- An object mentioned in the text is expected to
appear in the diagram.
- As objects are identified, identifications
of other objects are constrained.
- Common-sense physics provides constraints: an object is expected to be supported; a rope terminating at an object is probably attached to it.

A person reading a physics problem will
alternate attention between the diagram and text.
No fixed order of processing suffices for all problems, since a
problem might be specified entirely by text, entirely by a diagram, or
by some combination. For this reason, BEATRIX performs
*co-parsing* of the two modalities, using the BB1
blackboard system [bb1-manual].

The diagram input consists of points, lines, rectangles, and circles
described by analytic geometry. BEATRIX performs low-level analysis
of details, *e.g.* to determine whether a line is approximately
tangent to a circle. The diagram is parsed by knowledge
sources (KS's) that recognize special combinations of picture elements,
as in a picture grammar [ksfu74]. A diagram is inherently
ambiguous: it may omit objects or details,
exaggerate features, or include descriptive elements that
are not objects (*e.g.*, arrows used to show dimensions of
objects). BEATRIX opportunistically
combines related elements based on expectations of typical
combinations; for example, if two lines meet at an acute
angle, and there is a variable name that typically denotes an angle
(such as `theta`) inside and near the vertex, then these elements
will be grouped as an `angle`.

**Figure 4:** `theta` is part of angle, but `N` is not.
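The grouping heuristic can be approximated by a simple geometric test: the two lines must form an acute angle, and the label must lie inside the wedge and close to the vertex. The Python sketch below is illustrative only; the function name, the coordinates, and the distance threshold are assumptions, not details of BEATRIX.

```python
import math

def is_angle_label(vertex, end_a, end_b, label_pos, max_dist):
    """True if the lines vertex-end_a and vertex-end_b form an acute angle
    and label_pos lies inside that angle, within max_dist of the vertex."""
    def direction(p):
        return math.atan2(p[1] - vertex[1], p[0] - vertex[0])

    def diff(u, v):
        # Signed smallest rotation from direction u to direction v.
        return math.atan2(math.sin(v - u), math.cos(v - u))

    a, b, lab = direction(end_a), direction(end_b), direction(label_pos)
    spread = diff(a, b)
    if not 0 < abs(spread) < math.pi / 2:
        return False                        # not an acute angle
    inside = 0 < diff(a, lab) / spread < 1  # label direction lies between the lines
    near = math.dist(vertex, label_pos) <= max_dist
    return inside and near

# Hypothetical layout: an incline of about 30 degrees with two text labels.
vertex, base_end, slope_end = (0.0, 0.0), (10.0, 0.0), (10.0, 5.77)
theta_ok = is_angle_label(vertex, base_end, slope_end, (3.0, 0.8), max_dist=5.0)
n_ok = is_angle_label(vertex, base_end, slope_end, (3.0, 4.0), max_dist=5.0)
print(theta_ok, n_ok)
```

In this layout the label placed between the two lines is accepted, while a label above the incline is rejected even though it is near the vertex.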

As parts of the diagram are interpreted, they trigger other KS's.
For example, after a small circle with a line to its center has been
interpreted as a `pulley`, a KS is triggered to look for lines
tangent to the pulley that represent a rope; the two lines that
represent the rope are grouped into a single `rope` object, with
the distal endpoints identified as its ends.
This, in turn, triggers additional inferences: the ends of a rope are
expected to be attached to objects or surfaces. When a KS can
interpret part of the diagram, it *obviates*
(removes from the execution queue) other KS's that might
attempt alternative interpretations.
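The low-level test for a line that is approximately tangent to a circle, on which the pulley and rope KS's depend, can be written directly in analytic geometry. The following Python sketch is an assumption about how such a test might be coded: it treats the line as infinite and uses a relative tolerance of 5 percent, neither of which is taken from BEATRIX.

```python
import math

def is_tangent(p1, p2, center, radius, tol=0.05):
    """True if the line through p1 and p2 passes at a distance from center
    that is within tol * radius of the radius."""
    (x1, y1), (x2, y2) = p1, p2
    length = math.hypot(x2 - x1, y2 - y1)
    # Perpendicular distance from the circle center to the line.
    dist = abs((x2 - x1) * (y1 - center[1]) - (x1 - center[0]) * (y2 - y1)) / length
    return abs(dist - radius) <= tol * radius

# Hypothetical pulley of radius 1 at the origin, with two candidate rope lines.
touching = is_tangent((1.02, 0.0), (1.02, -5.0), (0.0, 0.0), 1.0)
distant = is_tangent((2.0, 0.0), (2.0, -5.0), (0.0, 0.0), 1.0)
print(touching, distant)
```

A fuller version would also check that the point of tangency lies on or near the drawn segment.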

The diagram parsing KS's also trigger expectations
for natural language processing. For example, identification of a
`contact` between a mass and a surface triggers an expectation
that a normal force and a coefficient of friction may be specified in
the English text. Such an expectation is necessary to
interpret a definite noun phrase such as ``the coefficient of
friction'' if the `contact` appears only in the diagram.

Diagram parsing continues until no further interpretations can be
made. Fig. 5 illustrates the features that are identified
by diagram parsing in an example problem; most of the `touch`
relations and some `contact` relations are omitted for
readability.

**Figure 5:** Interpretation after Diagram Parsing

The understanding module of BEATRIX combines the parsed English text and the parsed diagram, establishing coreference between them to produce a unified model. For example, the text might say ``the coefficient of friction is 0.25'', referring to a contact between a block and an inclined plane shown in the diagram. The friction value must be associated with the contact relation that was derived from the diagram. The KS's of the understanding module also make inferences based on common-sense physics. For example, BEATRIX infers that the rotation of an edge of an object is the same as that of a surface on which it rests, or that an object hanging from a rope hangs directly below it. Contact between an object and a surface is assumed to be a frictional touch contact, while contact between a rope and an object that it supports is assumed to be an attachment. Such inferences are important, since both text and diagrams often omit things that an intelligent reader can infer.

Priority ratings cause KS's with the best input data to
execute first. For example, *Identify-Masses* gives itself a
high rating if there is only one mass object it could match. Default
KS's are triggered at a low priority to provide
default values or to move objects that are mentioned in only one input
modality to the unified level. Low-level KS's are triggered by the
problem statement and diagram, while the higher-level KS's are
triggered by the output of the low-level KS's.
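The interaction of priority ratings and obviation can be illustrated with a small agenda. This Python sketch does not reproduce the BB1 interface; the class `Agenda`, its methods, and the KS names other than *Identify-Masses* are hypothetical, and the ratings are arbitrary.

```python
import heapq

class Agenda:
    """Knowledge sources run in order of rating; an obviated KS is skipped."""

    def __init__(self):
        self._heap = []
        self._obviated = set()
        self._count = 0           # tie-breaker keeps insertion order stable

    def trigger(self, rating, name, action):
        heapq.heappush(self._heap, (-rating, self._count, name, action))
        self._count += 1

    def obviate(self, name):
        self._obviated.add(name)

    def run(self):
        executed = []
        while self._heap:
            _, _, name, action = heapq.heappop(self._heap)
            if name in self._obviated:
                continue          # an earlier KS made this one unnecessary
            action(self)
            executed.append(name)
        return executed

agenda = Agenda()
agenda.trigger(10, "default-mass", lambda ag: None)
agenda.trigger(90, "identify-masses", lambda ag: ag.obviate("default-mass"))
agenda.trigger(50, "find-rope", lambda ag: None)
order = agenda.run()
print(order)
```

Here the highly rated KS runs first and obviates the low-priority default KS, which therefore never executes.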

A diagram represents much more information than is shown explicitly.
Understanding a diagram is not a passive process of absorbing what is
plainly in the diagram, but is an active process of model construction
and inference, using the diagram as an outline of the model to be
constructed. ISAAC demonstrated a similar finding with English text.
Brevity gives diagrams their power but also presents a challenge for
diagram understanding by computer. If much of the
understanding of a diagram must be inferred from the reader's
knowledge, then that knowledge and the procedures to use it must be
part of a diagram understanding program.
It must be possible to resolve ambiguities to produce
the most likely interpretation. Opportunistic identification is
based not only on syntactic relationships, but also on
world knowledge or common-sense physics (*e.g.*, identification
of a square as a mass implies that a line
coincident with the bottom of the square must be a surface,
not a rope).

A program called VIP (View Interactive Programming) [novak:caia94] allows a user to construct a computer program by making connections between diagrams that represent physical and geometric principles. The user can select physical laws, geometric principles, and physical constants and add them to a workspace. Connections between variable buttons in the diagrams can be made by clicking on each button with the mouse; a connection signifies that the variables are equal.

**Figure 6:** Calculating the Mass of the Sun

Fig. 6 shows how VIP can be used to calculate the mass of
the sun. The initial workspace contains only a default `output`
variable. The user follows Newton's
reasoning: the gravitational attraction of the earth by the sun is
equal to the force required to keep the earth in its
orbit. The user selects a `gravitation` principle and a
`centrifugal-force` principle from the `physics` menu and adds
them to the window. The user clicks the mouse on the `f` button of
each diagram, which causes a line to be drawn between them
and signifies that the forces are equal. The user
selects constants for the mass of the earth and the earth-sun distance
and connects these to the two diagrams. The `output` box is
connected to the other mass in the gravitation diagram. After these
actions, only the velocity `v` of the earth in its orbit remains
unspecified. This can be found by noting that the earth travels around
the sun in one year. The user selects a `circle` diagram from the
`geometry` menu, connects its radius to the earth-sun distance,
and divides its circumference by a time constant of one year.
This gives a fully specified diagram.

A program is derived from the diagram by data flow. Initially, input
variables and constants are assumed to be defined. A variable that is
defined is propagated into boxes to which it is
connected. When a value is propagated into a box, equations
associated with the box are examined to see if any can
be solved. Solutions to equations produce the values of other variables,
which are also propagated. When a value is propagated into the
`output`, the program is complete. Compilation
of this program (in the GLISP language [novak:glisp]) produces an
executable program.
In this case, the compiler reduces all the equations to the numeric
answer (in kilograms): `(LAMBDA () 1.9660057E30)`
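The propagation of defined values through the boxes can be sketched as a simple fixed-point loop. The Python below is a simplified illustration, not VIP or GLISP: each equation is written already solved for one variable, whereas VIP solves equations algebraically, and connected variables are represented by sharing a name. The constant values are rounded textbook figures chosen for this sketch, so the result differs slightly from the value quoted above.

```python
import math

# (output variable, input variables, function), one entry per solved equation.
EQUATIONS = [
    ("circumference", ("r",), lambda r: 2 * math.pi * r),            # circle
    ("v", ("circumference", "period"), lambda c, t: c / t),          # speed in orbit
    ("f", ("m_earth", "v", "r"), lambda m, v, r: m * v * v / r),     # centrifugal force
    ("m_sun", ("f", "r", "m_earth", "G"),
     lambda f, r, m, g: f * r * r / (g * m)),                        # gravitation
]

def propagate(known, equations, goal):
    """Repeatedly fire any equation whose inputs are all defined."""
    values = dict(known)
    progress = True
    while goal not in values and progress:
        progress = False
        for out, ins, fn in equations:
            if out not in values and all(name in values for name in ins):
                values[out] = fn(*(values[name] for name in ins))
                progress = True
    return values

constants = {
    "G": 6.674e-11,        # m^3 kg^-1 s^-2
    "m_earth": 5.972e24,   # kg
    "r": 1.496e11,         # m, earth-sun distance
    "period": 3.156e7,     # s, one year
}
values = propagate(constants, EQUATIONS, "m_sun")
print(f"mass of the sun = {values['m_sun']:.3e} kg")
```

The order in which the equations fire is determined by the data, not by the order in which they are listed, which is the essential property of the data-flow derivation.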

VIP can also be used to construct new physical principles that are
combinations of existing ones; for example, the above analysis can
be abstracted as an `orbital-system` principle.

VIP allows problems to be specified by correspondences of features of diagrams. Although equations and algebraic manipulation are involved, they are hidden and are performed automatically. The equations do not have to be memorized. Units of measurement are converted automatically as needed. Subproblems, such as finding the velocity of the earth in its orbit, can be solved using the system itself. Using VIP is clearly faster than doing algebra by hand; however, VIP is much easier to use if the diagrams ``fit'' a given problem than if they do not. For example, consider the problem:

A block rests on a horizontal board. The board is gradually tilted upward and the block just begins to slide down the board when the angle of inclination theta is 21 degrees ... Find the coefficient of static friction u_s. [schaum]

A diagram by a human problem solver will depict forces so that they can easily be related to the physical situation. In Fig. 7, it is clear that the weight force can be viewed as a normal force and a force acting to move the block down the board.
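The decomposition of the weight into a normal component and a component along the board leads directly to the answer. The short Python check below is a sketch using the standard condition for impending slip (friction force equal to the coefficient times the normal force); the variable names are illustrative.

```python
import math

theta = math.radians(21.0)   # angle of inclination at which the block begins to slide
weight = 1.0                 # arbitrary; it cancels in the ratio

normal = weight * math.cos(theta)      # component pressing the block against the board
downhill = weight * math.sin(theta)    # component acting to move the block down the board

# At the point of slipping, friction just balances the downhill component.
mu_s = downhill / normal               # equals tan(theta)
print(f"coefficient of static friction = {mu_s:.3f}")
```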

**Figure 7:** Friction Diagram

**Figure 8:** Friction Problem using VIP

This problem can be solved using VIP, as shown in Fig. 8.
However, the correspondence between the VIP diagrams and
the geometry of the problem is poor. The triangle shown in the
VIP diagram is the same triangle shown in Fig. 7, but its
orientation does not match the physical situation. It is difficult
for the user to determine whether `x` and `y` in the triangle
diagram should respectively match `n` and `f` in the friction
model, or *vice versa*; the user might have to draw a
diagram on paper.
Simply having
diagrams is not enough: if the diagrams do not correspond well to the
actual geometry of the problem, then diagrammatic inferences cannot be
performed, and the diagrams will be as disconnected from the problem
as a symbolic representation would be. Larkin and Simon
[larkin-simon:diag10k] note that in humans, a production is
easily triggered only if there is a close match between stimulus
conditions and its triggering conditions.

VIP would be more useful if its diagrams were more like those drawn by humans. Several improvements can be identified:

- The orientation and size of a diagram should be variable so
that the diagram can match the problem geometry.
VIP should have multiple ways to draw a triangle, or
better, an ability to adapt the triangle in size and orientation to
parts of an existing diagram.
- It should be possible to overlay diagrams. In the diagram of
Fig. 6, the three diagrams shown (circle, centrifugal
force, and gravitation) all refer to the same physical space.
Correspondences are shown as lines between them, but it would be
better to overlay these diagrams so that the corresponding parts
would be identical.
- Human problem solvers often replace variables in equations and on diagrams so that the number of variables used is minimized.

We have used VIP to develop small but realistic scientific programs
[novak:caia94]. Abelson *et al.* [abelson89] envision
an automatic engineering assistant; surely such a system should use
diagrams to communicate with its user. It would be interesting to try
teaching physics problem solving using VIP or a similar system. This
would move the focus of problem solving away from algebra and toward
conceptualization of the problem by selection and instantiation of
physical models. VIP could also be used to investigate the
effectiveness of different kinds of diagrams for human problem
solvers: experience quickly demonstrated that diagrams are
much less useful if they are not isomorphic to the problem
geometry.

Another research direction is machine learning of methods for analyzing problems based on correspondences selected using VIP by a physics expert. Learning of the method of application of physical principles could be a useful form of ``chunking'' that would allow future problems of a similar type to be solved automatically as a result of practice [araya].

Diagrams play many roles in human problem solving. Larkin and Simon [larkin-simon:diag10k] describe psychological and computational advantages of diagrammatic reasoning for human problem solvers:

- Diagrams guide attention from one element to related elements;
they reduce search because related elements are usually close together.
- Diagrams minimize labeling: information about an element is
near it.
- Diagrams facilitate perceptual inferences and recognition of
problem-solving methods that may be applicable.
- Diagrams allow quick checks that the analysis is proceeding correctly.

A central feature of human intelligence is limited short-term memory [miller:magic7]. By writing down intermediate results, a person releases limited short-term memory for other uses. Writing and re-perceiving intermediate results is much faster and more reliable than memorizing them; pencil and paper serve a role analogous to that of a paging disk in a computer [larkin:lmss]. Surely diagrams also play such a role. Because people find it easy to perceive diagrams, a diagram can serve as short-term memory for intermediate geometric results. A human problem solver progressively annotates the diagram with results, making those results available by inspection when needed. Retrieval by inspection is often opportunistic, without prior planning to use the retrieved values. Indeed, one strategy for solving a problem is to perform forward reasoning, deducing geometric results that can be derived easily and adding them to the diagram, until the diagram contains the desired answer.

A mental picture can serve as a ``coordinate system'' or geometric substrate, allowing the remainder of a problem to be described relative to the substrate. For example,

A car leaves point A and drives north for 6 miles ...

A punter located at his 40-yard line kicks a punt at an angle of 45 degrees to a receiver at the opposite 20-yard line ...

Since the natural language problem statement refers to geometric features of the substrate, a mental model of the substrate is required to understand such a problem.

A difficulty faced both by humans and by AI systems is understanding an underspecified problem. Physics problems are often underspecified both geometrically and in terms of the physical principles needed for solution. A diagram can help the problem solver to infer the correct context by encouraging elaboration of elements normally associated with the diagram. An underspecified problem that is solved by ISAAC is shown in Fig. 9.

What force is required to lift one end of a pole?

**Figure 9:** Underspecified Problem

To a person, a drawing of a horizontal pole supported only by a force
at one end ``looks wrong''; the exercise of drawing a free body
diagram may help a human problem solver to consider all the relevant
forces until the set of forces drawn on the body appears to be
balanced. In this problem, ISAAC introduces (by symbolic
inference) a pivot to support the other end of the pole.
Physics problems often omit important geometric facts, *e.g.* that
objects rest on the surface of the earth, or that walls are vertical
planes that are bounded below by horizontal floors.

Larkin and Simon [larkin-simon:diag10k] describe ``perceptual''
inferences as a major advantage of the use of diagrams. While such
inferences (*e.g.* the fact that vertical angles formed by
intersecting lines are equal) can be made symbolically, they can be
made at almost no cost by perception. [larkin-simon:diag10k]
describes perceptual inferences that are identical to
symbolic inferences that can be made formally. While perceptual
inferences may suggest subproblems to be treated formally
(*e.g.*, the perception that vertical angles appear to be equal
may trigger the memory that this is indeed a theorem in geometry),
humans often make perceptual inferences without proof or
even much thought. For example, in the problem of Fig.
2, the problem solver will make the assumption that the
string is parallel to the inclined plane; this is unstated and thus
*cannot* be proved.

A skilled problem solver deliberately constructs diagrams that
facilitate inference by recognition. In the problem of
Fig. 10, a skilled problem solver will draw the figure so
that the angle *alpha* is clearly less than *45 degrees*; this
will increase the size contrast between angle *alpha* and angle
ABC, facilitating recognition that the angle *alpha'* is the
same as *alpha*. While this can be proved, the problem solver
will probably assume that angles that appear to be equal are in fact
equal.

**Figure 10:** Analytic Geometry Problem

It appears that perceptual inferences are important in other domains, even when diagrams are not used. For example, a person skilled in performing mental arithmetic can perform the mental calculation:

4 / .97 ~= 4.12

by recognizing this problem as an instance of the pattern:

1 / (1 - epsilon) ~= (1 + epsilon), where epsilon is small.

The recognition that

If somebody comes along and wants to divide 1 by 1.73, you can tell them immediately that it's .577, *because you notice* that 1.73 is nearly the square root of 3, so 1 / 1.73 must be one-third of the square root of 3. [feynman:joking] [emphasis added.]
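Both shortcuts are easy to check numerically. The following Python lines are only a verification of the arithmetic, with epsilon taken as 0.03 for the first example.

```python
import math

# Pattern 1 / (1 - epsilon) ~= 1 + epsilon, applied to 4 / .97.
epsilon = 0.03
approx = 4 * (1 + epsilon)
exact = 4 / 0.97
print(approx, exact)

# Feynman's example: 1.73 is close to sqrt(3), and 1 / sqrt(3) = sqrt(3) / 3.
third_of_root3 = math.sqrt(3) / 3
print(third_of_root3, 1 / 1.73)
```

The error of the first approximation is of order epsilon squared, which is why it is good enough for mental arithmetic when epsilon is small.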

People seem to be able to recognize at least the following relationships from diagrams:

- Parallel or perpendicular lines.
- Relative positions of objects (*e.g.*, above, below, left, right).
- Objects that are similar under translation, scaling and/or
rotation.
- Approximate equivalence of lengths, sizes, or angles.
- Relative sizes (smaller/larger) of lines or angles.
- Proportionality, especially division in half, of lines or angles.
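Several of these relationships have simple counterparts over an analytic-geometry representation, provided that a tolerance stands in for the approximate nature of perception. The Python sketch below is illustrative; the function names and the tolerances (3 degrees, 5 percent) are assumptions rather than measured properties of human perception.

```python
import math

def direction(seg):
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1)

def angle_between(s1, s2):
    """Undirected angle between two segments, in the range 0 to pi/2."""
    d = abs(direction(s1) - direction(s2)) % math.pi
    return min(d, math.pi - d)

def parallel(s1, s2, tol=math.radians(3)):
    return angle_between(s1, s2) <= tol

def perpendicular(s1, s2, tol=math.radians(3)):
    return abs(angle_between(s1, s2) - math.pi / 2) <= tol

def about_equal(a, b, rel=0.05):
    """Approximate equivalence of two lengths, sizes, or angles."""
    return abs(a - b) <= rel * max(abs(a), abs(b))

floor = ((0.0, 0.0), (10.0, 0.0))
shelf = ((0.0, 2.0), (10.0, 2.2))     # drawn slightly off horizontal
wall = ((5.0, 0.0), (5.0, 8.0))
print(parallel(floor, shelf), perpendicular(floor, wall), about_equal(10.0, 10.3))
```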

Abelson *et al.* [abelson89] describe the use of machine
vision algorithms to recognize partitions of phase space in
simulations of dynamical systems. Because such a simulation produces
point values rather than trajectories, partitions cannot be derived
directly. However, given a large number of points, the lines can be
recognized by machine vision algorithms. This is especially
interesting as a case where even a computer needs a ``mind's eye'' to
recognize the qualitative structure of a problem.

Some inference rules seem almost to be ``plastic overlays'' that can be moved into position and added to a diagram. The right-hand rule of electromagnetic fields often is invoked with actual movement of the hand. The rule that ``sine = opposite / hypotenuse'' can be thought of as a diagrammatic operator (Fig. 11) that can be mentally moved into position and then used to add inferences directly to a diagram.

**Figure 11:** Sine Rule Overlay

An advantage of such diagrammatic operators is that they can be used
locally by making simple mental transformations such as translation,
rotation, and reflection to make the diagrammatic operator match the
existing diagram. Intermediate results that are written on the diagram
become available for subsequent use. For example, in the problem of
Fig. 10, the sine rule can be applied to the large triangle
to find that *BC = sin(alpha)*; this value can then be used with a
cosine rule for the smaller triangle to find
*CD = sin(alpha) * cos(alpha)*.
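These two results can be checked with coordinates. Because the figure is not reproduced here, the layout in the following Python sketch is an assumption: a right triangle ABC with the right angle at C, hypotenuse AB of length 1, angle alpha at A, and D the foot of the perpendicular from C to AB. Under that assumed layout the angle BCD equals alpha, and the two rules give the stated values.

```python
import math

alpha = math.radians(25.0)            # any angle below 45 degrees will do

# Assumed layout, not taken from the figure.
A = (0.0, 0.0)
B = (1.0, 0.0)
C = (math.cos(alpha) ** 2, math.cos(alpha) * math.sin(alpha))
D = (C[0], 0.0)                       # foot of the perpendicular from C to AB

bc = math.dist(B, C)                  # sine rule on the large triangle
cd = math.dist(C, D)                  # cosine rule on the smaller triangle

# The angle at C is a right angle if CA and CB are perpendicular.
dot_at_c = (A[0] - C[0]) * (B[0] - C[0]) + (A[1] - C[1]) * (B[1] - C[1])
print(bc, math.sin(alpha), cd, math.sin(alpha) * math.cos(alpha))
```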

We have proposed [kook:diss,kook-novak-tkde] that the analysis of
a physics problem should be represented not just as a set of
equations, but as sets of correspondences between problem features and
physical models. Solving a physics problem is not simply a matter of
logical deduction (in which necessarily true results are derived from
given premises), but a constructive process in which the given facts
are elaborated by additional assumptions and physical models. In some
problems, a single object will have multiple
views as parts of different physical models. When represented
symbolically, the correspondence sets become large and complex; a
diagram can serve as a compact representation of such correspondences.
Larkin and Simon [larkin-simon:diag10k] note *minimizing
labeling* as an advantage of diagrams. Human problem solvers also
strive to minimize the number of variables used in equations. By
transferring variable names from one part of a diagram to another, the
same variable name can play a role in
multiple physical models. A diagram may thus represent an overlaying
of diagrams for physical models and actual objects.

Diagrams are often included with statements of physical laws [gieck]; they presumably facilitate retrieval of the appropriate formulas from memory when a similar problem diagram is seen. In addition, the diagram facilitates matching between problem features and corresponding features of the physical model because the corresponding features appear in similar locations in each diagram. Consider the problem:

Given the gravitational constantGand the known facts about the orbit of the earth, calculate the mass of the sun.

**Figure 12:** Centrifugal Force Law and Planet Problem

In Fig. 12, the diagram on the left is as shown in
[gieck], while the diagram on the right is drawn to correspond to
it. These diagrams immediately suggest that
the sun corresponds to the center of the circle, the earth to the mass
(suggesting that the earth be ``coerced'' to a point mass), the radius
*r * to the earth-sun distance *d *, and the velocity *v * to the
velocity of the earth (which then becomes a subproblem).

Larkin and Simon [larkin:cogsci] proposed the representation of problems and of physical situations as directed graphs and the use of graph-matching algorithms to find and instantiate appropriate physical models. This may be difficult, both because graph matching is computationally intractable and because missing or extra nodes prevent graphs from matching. Diagram matching may be more useful because diagrams that represent physical principles can be indexed by major features such as circular motion, which are likely to have only a few matches in a given problem. A match between a diagram and a given problem need not be exact: extra elements in the problem do not matter, and missing elements can be ignored (if not used) or taken as subproblems.

Skilled problem solvers often use *gedanken* (thought) experiments
involving actual or imagined diagrams to determine:

- the direction of change in a system,
- equilibrium points, bounding points or extrema,
- connectivity, by tracing connecting paths on the diagram,
- how a change in one quantity will affect another, and in what direction.

**Figure 13:** Removal of Bridge Member

The preceding sections have described uses of diagrams in programs that solve physical problems, as well as uses of diagrams by humans. The power that diagrams give to human problem solving motivates consideration of how similar uses of diagrams could be incorporated into computer programs. The difficulty of machine perception of diagrams suggests that it would be unprofitable to try to duplicate human diagram processing directly. However, machine processing at a ``sketch'' level above the level of direct perception may be reasonable.

A set of basic perceptual operators, analogous to those that people use with diagrams but implemented above the pixel level of an actual diagram, might be implemented to take advantage of the strengths of the computer. A representation of geometric features such as lines, points, and circles by means of analytic geometry seems most appropriate for computer processing. Such a representation should be sufficiently accurate to determine such features as a line terminating at another line, a line tangent to a circle, parallel lines, etc.
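As a rough illustration of such operators, the sketch below represents segments and circles by coordinates and tests the three relations just mentioned within a tolerance. The class names, the tolerance value, and the choice to treat tangency against the infinite line through a segment are assumptions made for this example.

```python
import math
from dataclasses import dataclass

TOL = 1e-6  # assumed tolerance; a sketch-level system would tune this


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Segment:  # assumed to have nonzero length
    a: Point
    b: Point


@dataclass(frozen=True)
class Circle:
    center: Point
    radius: float


def _direction(seg):
    return seg.b.x - seg.a.x, seg.b.y - seg.a.y


def distance_to_line(p, seg):
    """Perpendicular distance from p to the infinite line through seg."""
    dx, dy = _direction(seg)
    cross = dx * (p.y - seg.a.y) - dy * (p.x - seg.a.x)
    return abs(cross) / math.hypot(dx, dy)


def parallel(s1, s2, tol=TOL):
    """True if the direction vectors have (nearly) zero cross product."""
    (ux, uy), (vx, vy) = _direction(s1), _direction(s2)
    return abs(ux * vy - uy * vx) <= tol * math.hypot(ux, uy) * math.hypot(vx, vy)


def tangent(seg, circle, tol=TOL):
    """True if the line through seg lies one radius from the center."""
    return abs(distance_to_line(circle.center, seg) - circle.radius) <= tol


def on_segment(p, seg, tol=TOL):
    """True if p lies on seg, between its endpoints."""
    if distance_to_line(p, seg) > tol:
        return False
    dx, dy = _direction(seg)
    t = ((p.x - seg.a.x) * dx + (p.y - seg.a.y) * dy) / (dx * dx + dy * dy)
    return -tol <= t <= 1 + tol


def terminates_at(s1, s2, tol=TOL):
    """True if an endpoint of s1 lies on s2."""
    return on_segment(s1.a, s2, tol) or on_segment(s1.b, s2, tol)


floor = Segment(Point(0, 0), Point(4, 0))
shelf = Segment(Point(0, 2), Point(4, 2))
post = Segment(Point(1, 0), Point(1, 3))
wheel = Circle(Point(2, 1), 1.0)
```

Each operator returns a symbolic (true or false) result from numeric coordinates, so its output could be handed directly to a symbolic reasoner.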

Geometric features should be connected, bilaterally, with problem features that are represented symbolically. Sometimes geometric features represent objects, but in other cases they represent relationships (such as the earth-sun distance) or variable values. It must be possible to post values to the diagram representation; in this way, the diagram can serve the short-term memory function and allow opportunistic use of intermediate results that are ``read'' back from the diagram. The propagation of results by VIP is an example of posting results to a diagrammatic model.
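One simple way to realize the bilateral connection and the posting of values is a pair of dictionaries, sketched below. The `DiagramMemory` class and the feature identifiers are hypothetical; this is not a description of how VIP stores its results.

```python
class DiagramMemory:
    """Two-way links between symbolic problem features and geometric features."""

    def __init__(self):
        self._feature_of = {}  # symbolic name -> geometric feature id
        self._symbol_of = {}   # geometric feature id -> symbolic name
        self._values = {}      # geometric feature id -> posted value

    def link(self, symbol, feature):
        """Connect a symbolic feature and a geometric feature in both directions."""
        self._feature_of[symbol] = feature
        self._symbol_of[feature] = symbol

    def post(self, symbol, value):
        """Post an intermediate result onto the diagram."""
        self._values[self._feature_of[symbol]] = value

    def read(self, feature):
        """``Read'' a feature back: its symbol and its value (None if unknown)."""
        return self._symbol_of[feature], self._values.get(feature)

    def unknowns(self):
        """Symbols whose geometric features have no value yet."""
        return sorted(s for s, f in self._feature_of.items()
                      if f not in self._values)


memory = DiagramMemory()
memory.link("earth-sun distance", "segment-1")
memory.link("earth velocity", "arrow-1")
memory.post("earth-sun distance", 1.496e11)  # assumed value, in meters
```

After the post, reading `segment-1` yields the distance, while `unknowns()` still lists the earth's velocity, which is the kind of cue that would let a solver use intermediate results opportunistically.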

It should be possible to group geometric objects into larger units; for example, in the bridge problem of Fig. 13, two triangles formed from bridge members are treated as rigid bodies in visualizing how the bridge would collapse. The VIP model of the earth-sun system in Fig. sunfig shows that aspects of the geometry of the system are used in several separate models. These separate models are needed for the analysis; however, it would be better to have only a single diagram that unifies all the models rather than three diagrams with connections between them.

A library of geometric models is essential if minimally specified problems are to be understood. The statement ``a ladder leans against a wall'' implies the existence of a floor that supports the bottom of the ladder. It is reasonable to assume that a prototypical representation of the spatial relationships of a ladder, wall, and floor is stored; textbook problems show that a reader is assumed to have such knowledge.

Perceptual operators (*e.g.* detection of parallel lines) can
operate at the analytic geometry level as special-purpose programs
distinct from production rules or other symbolic analysis.
``Noticing'' these features can be done rapidly by special-purpose
programs that perform only this function. Such noticing is a
*signal-to-symbol transformation* [nii:hasp] that converts analog
values into symbolic values that can trigger productions. When
Feynman noticed that 1.73 is almost the square root of 3, this
triggered a production for problems involving a square root; 1.73 is
an analog or ``signal'' value, while the concept of ``square root'' is
symbolic. Noticing can direct attention to inferences based on
observed relationships. For example, BEATRIX notices that two lines
are tangent to a circle and infers the existence of a pulley system.
Some things that are noticed can be assumed to be true, while others
can trigger an attempt to prove what was noticed by more rigorous
methods.
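A special-purpose ``noticer'' of the kind described here can be very small. The sketch below checks whether a numeric value is close to the square root of a small integer; the function name, the candidate range, and the tolerance are assumptions chosen for illustration.

```python
import math


def notice_square_root(value, candidates=range(2, 11), rel_tol=0.005):
    """Return n if value is close to sqrt(n) for a small non-square n, else None."""
    for n in candidates:
        root = math.sqrt(n)
        if root.is_integer():
            continue  # 4 and 9 would only report that the value is an integer
        if math.isclose(value, root, rel_tol=rel_tol):
            return n  # symbolic result: ``value is the square root of n''
    return None


noticed = notice_square_root(1.73)
```

The numeric input is the ``signal''; the returned integer, or the absence of one, is a symbol that could trigger a production. In keeping with the last sentence above, the result is only a hypothesis that a more rigorous method might then confirm.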

Perceptual inference also includes relating of similar models. In relating the earth-sun system to a circle, there are correspondences between the location of the sun and the center of the circle, between the earth-sun distance and the radius of the circle, etc. A stored relationship between a physical principle and a diagram could be used to relate corresponding parts of two situations that have similar diagrams. In this way, the diagrammatic representation becomes the basis for expressing the isomorphism between a problem situation and its physical model.

We have described uses of diagrams in programs that solve physics problems and have considered ways in which diagrams are used by humans. By implementing perceptual operations at a level below the operation of symbolic reasoning and by making use of correspondences between diagrams, it may be possible to gain the advantages that humans derive from diagrams for computer problem-solving systems.

This research was supported in part by the U.S. Army Research Office under contract DAAG29-84-K-0060. Computer equipment used in this research was donated by Hewlett Packard and Xerox Corporation.

[abelson89]
Abelson, H., *et al.*, ``Intelligence in Scientific Computing'',
*Communications of the ACM*, vol. 32, no. 5 (May 1989),
pp. 546-562.

[araya] Araya, A., ``Learning by Practice using Experimentation and Generalization Techniques'', Ph.D. dissertation, Univ. of Texas at Austin, Dec. 1984.

[ballard-brown]
Ballard, D. H. and Brown, C. M., *Computer Vision*,
Prentice-Hall, 1982.

[bulko:diss]
Bulko, W., *Understanding Coreference in a System for Solving
Physics Word Problems*, Ph.D. dissertation, Tech. Report
AI-89-102, A.I. Lab, CS Dept., Univ. of Texas at Austin, 1989.

[bb1-manual] Garvey, A., Hewett, M., Schulman, R., and Hayes-Roth, Barbara, ``BB1 User Manual -- Interlisp Version'', working paper KSL 86-60, Knowledge Systems Lab, Stanford Univ., 1986.

[chi]
Chi, M., Feltovich, P., and Glaser, R., ``Categorization and
Representation of Physics Problems by Experts and Novices'',
*Cognitive Science*, vol. 5, no. 2 (April 1981), pp. 121-152.

[feynman:joking]
Feynman, R. P., *Surely You're Joking, Mr. Feynman!*,
New York: Norton, 1985.

[formulae]
*Over 1000 Physics Formulae*, New York: Kampmann & Co, 1984.

[ksfu74]
Fu, K. S., *Syntactic Methods in Pattern Recognition*,
Academic Press, 1974.

[gieck]
Gieck, K., *Engineering Formulas*, 5th ed., McGraw-Hill, 1986.

[hearsay]
Erman, L. D., *et al.*, ``The Hearsay-II Speech-Understanding
System: Integrating Knowledge to Resolve Uncertainty'',
*ACM Computing Surveys*, vol. 12, no. 2 (June 1980), pp. 213-253.

[kook:diss]
Kook, Hyung Joon, *A Model-Based Representational
Framework for Expert Physics Problem Solving*, Ph.D. dissertation,
Tech. Report AI-89-103,
A.I. Lab, C.S. Dept., Univ. of Texas at Austin, 1989.

[kook-novak-tkde]
Kook, Hyung Joon and Novak, G., ``Representation of Models for
Expert Problem Solving in Physics'', *IEEE Trans. on Knowledge
and Data Engineering*, vol. 3, no. 1 (March 1991), pp. 48-54.

[larkin:cogsci]
Larkin, J. and Simon, H. A., ``Learning through Growth of Skill
in Mental Modeling'', *Proc. Cognitive Science Society*, 1981;
also in [simon:mot2].

[larkin:lmss]
Larkin, J., McDermott, J., Simon, D., and Simon, H. A.,
``Expert and Novice Performance in Solving Physics Problems'',
*Science*, 208 (20 June 1980), pp. 1335-1342.

[larkin-simon:diag10k]
Larkin, J. and Simon, H. A., ``Why a Diagram
is (Sometimes) Worth 10,000 Words'', *Cognitive Science*,
vol. 11 (1987), pp. 65-99; also in [simon:mot2].

[miller]
Miller, F., *Progressive Problems in Physics*,
Boston: D.C. Heath, 1949.

[miller:magic7]
Miller, G. A., ``The Magical Number Seven, Plus or Minus Two'',
*Psychological Review*, vol. 63 (1956), pp. 81-97.

[nii:hasp]
Nii, H., Feigenbaum, E., Anton, J., and Rockmore, A.,
``Signal-to-Symbol Transformation: HASP/SIAP Case Study'',
*AI Magazine*, vol. 3, no. 2 (Spring 1982), pp. 23-35.

[novak:glisp]
Novak, G., ``GLISP: A LISP-Based
Programming System With Data Abstraction'',
*AI Magazine*, vol. 4, no. 3 (Fall 1983), pp. 37-47.

[novak:isaac]
Novak, G., ``Computer Understanding of Physics Problems Stated
in Natural Language'', *Am. J. Computational
Linguistics*, Microfiche 53, 1976.

[novak:ijcai77]
Novak, G., ``Representations of Knowledge in a Program for
Solving Physics Problems'', *IJCAI*, 1977, pp. 286-291.

[novak-bulko90]
Novak, G. and Bulko, W., ``Understanding Natural Language with
Diagrams'', *Proc. Eighth National Conference on Artificial
Intelligence (AAAI-90)*, 1990, pp. 465-470.

[novak-bulko93]
Novak, G. and Bulko, W., ``Diagrams and Text as Computer Input'',
*Journal of Visual Languages and Computing*, vol. 4 (1993),
pp. 161-175.

[novak:caia94]
Novak, G., ``Generating Programs from Connections of Physical Models'',
*Proc. 10th Conf. on Artificial Intelligence for Applications*
(CAIA-94), March 1994, pp. 224-230 (IEEE Computer Society Press).

[simon:mot2]
Simon, H. A., *Models of Thought*, vol. 2, Yale Univ. Press, 1989.

[schaum]
van der Merwe, C. W., *Schaum's Outline of Theory and Problems
of College Physics*, McGraw-Hill, 1961.