Diagrams for Solving Physical Problems

Gordon S. Novak Jr.

Department of Computer Sciences
University of Texas, Austin, TX 78712

Copyright © 1994 by AAAI.

Permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the AAAI.

This article appears in Diagrammatic Reasoning: Cognitive and Computational Perspectives, Janice Glasgow, N. Hari Narayanan, and B. Chandrasekaran, eds., AAAI Press / MIT Press, 1995, pp. 753-774.


Humans often use diagrams when solving physical problems; diagrams appear in physics books and serve as a means of formal communication in engineering. Diagrams are used because physical problems require the solution of geometric subproblems, but they serve many other roles. People find it easy to interpret diagrams; this is not the case for computer programs, where vision is an unsolved problem. The challenge for AI is to give programs the ability to reason with diagrams as humans do.

This paper describes three computer programs that use diagrams in solving physical problems. ISAAC, which understands and solves physics problems stated in English, constructs a geometric model that is equivalent to a free-body diagram for problem solving; it also constructs a diagram that serves to illustrate its understanding of the problem. BEATRIX understands physics problems specified by both English text and a diagram. The focus of this program is on understanding; the diagram and text must be understood together, and each helps to disambiguate the other. The VIP program allows a program to be specified by connections between diagrams of physical models; here, diagrams serve as a medium of communication that is natural to the user.

The section ``Uses of Diagrams in Problem Solving'' discusses ways in which humans use diagrams. The final section proposes ways in which some of these uses of diagrams might be implemented in computer programs.

Geometric Reasoning in ISAAC

ISAAC [novak:isaac,novak:ijcai77] solves rigid body statics problems stated in English. Fig. 1 shows a problem that ISAAC can read, understand, and solve in less than one second. The diagram is produced from ISAAC's understanding of the English and from calculated values. A geometric model, similar to the diagram, is used in problem solving; in the geometric model most objects are reduced to lines and points.

The foot of a ladder rests against a vertical wall and on a horizontal floor. The top of the ladder is supported from the wall by a horizontal rope 30 ft long. The ladder is 50 ft long, weighs 100 lb with its center of gravity 20 ft from the foot, and a 150 lb man is 10 ft from the top. Determine the tension in the rope.

Fig. 1: ISAAC Problem

A problem statement in natural language is not a complete description of the problem; it is only a minimal outline, requiring the reader to fill in details. To construct a geometric model sufficient for problem solving, many inferences must be made. Consider the problem statement of Fig. 1 [schaum]: it says ``a 150 lb man is 10 ft from the top''. The definite noun phrase the top denotes the top of the ladder. 10 ft from the top must be a location on the ladder, not just any location that is 10 ft from the top of the ladder. Finally, the man is at this location; this must be interpreted as an attachment by contact between the feet of the man and the ladder, with the ladder supporting the man. Some of these inferences may be viewed as linguistic, but others must be based on geometric knowledge about the objects, knowledge about typical spatial relationships, and common-sense physics. ISAAC makes these inferences in several steps. A statement that an object ``is'' at a location on another object is interpreted as an attachment. A location relative to a point on an object is assumed to be toward the center of the object. When ISAAC writes physics equations, it finds that the ladder is supported at two points and that the man has a specified weight; it therefore assumes that the ladder supports the man. Finally, in drawing the diagram, ISAAC assumes that a person is supported at the feet.

ISAAC generates a symbolic geometric model for problem solving and a symbolic diagram model for drawing the diagram. For each type of object, ISAAC has a geometric model, including dimensions of a bounding box, names and coordinates of interesting points on the object, a program to draw a picture of the object, the name of a parameter of the object that indicates its size, and a program to estimate the size for the drawing if no size is specified.

The geometry of an individual object within a model is specified by its object geometry, the location of a reference point, the rotation of the object about the reference point, and the vector size of the object. These data are sufficient for calculation of the location of any named point on the object and for drawing it. To make a geometric or diagram model, these data must be determined for each object. The geometric and diagram models are similar, except for the following features:

  1. The geometric model is a ``skeleton'' model, in which most objects are represented by lines or points. The diagram model requires actual sizes and points of attachment for each object.

  2. The geometric model may contain symbolic variables and algebraic expressions in its coordinate values; the desired solution to a problem may be the value of a geometric variable. In the diagram, all coordinates must be numeric. The solution to the physics problem often provides numeric values for variables; when it does not, default values are assigned.

Before the diagram and geometric models are made, the model of the problem is a semantic network containing symbolic descriptions of objects, their properties, and their relationships. The diagram model is constructed from this network in the following way. In most rigid body statics problems, all of the objects are attached to each other; thus, the objects and attachments form a connected graph. A single object is chosen and assigned the coordinates (0 0); the objects that are attached to it are then scaled to the appropriate size, rotated by the appropriate angle, and translated to the point of attachment to the composite model. Of course, these are vector operations on points, rather than manipulation of an image. Objects that are attached to a newly added object are then added to the diagram in the same manner.
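The placement procedure described above can be sketched as a breadth-first walk over the attachment graph. This is only an illustration under stated assumptions, not ISAAC's code: the `GEOMETRY` table, the function names, and the tuple layouts are all hypothetical, and each object's reference point is assumed to be the origin of its local frame.

```python
import math
from collections import deque

# Hypothetical object geometries: named points in a unit-size local frame.
GEOMETRY = {
    "ladder": {"foot": (0.0, 0.0), "top": (1.0, 0.0)},
    "rope": {"end1": (0.0, 0.0), "end2": (1.0, 0.0)},
}

def place_point(local, ref, rotation_deg, size):
    """Scale, rotate, then translate a local point into diagram coordinates."""
    x, y = local[0] * size, local[1] * size
    a = math.radians(rotation_deg)
    return (ref[0] + x * math.cos(a) - y * math.sin(a),
            ref[1] + x * math.sin(a) + y * math.cos(a))

def point_of(objects, refs, name, point):
    """Diagram coordinates of a named point on an already placed object."""
    kind, rot, size = objects[name]
    return place_point(GEOMETRY[kind][point], refs[name], rot, size)

def build_diagram(objects, attachments, root):
    """Place objects breadth-first, starting with the root at (0, 0).

    objects: name -> (kind, rotation_deg, size)
    attachments: list of (obj_a, point_a, obj_b, point_b)
    Returns name -> coordinates of the object's reference point.
    """
    refs = {root: (0.0, 0.0)}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for o1, p1, o2, p2 in attachments:
            # Attachments are treated as undirected.
            for src, sp, dst, dp in ((o1, p1, o2, p2), (o2, p2, o1, p1)):
                if src != current or dst in refs:
                    continue
                target = point_of(objects, refs, src, sp)
                kind, rot, size = objects[dst]
                # Offset of the attached point from the new object's reference point.
                off = place_point(GEOMETRY[kind][dp], (0.0, 0.0), rot, size)
                refs[dst] = (target[0] - off[0], target[1] - off[1])
                queue.append(dst)
    return refs

# Example: a 50 ft ladder leaning so that its top is 30 ft out and 40 ft up,
# with a 30 ft rope running horizontally from the top back toward the wall.
objects = {
    "ladder": ("ladder", math.degrees(math.atan2(40, 30)), 50.0),
    "rope": ("rope", 180.0, 30.0),
}
attachments = [("ladder", "top", "rope", "end1")]
refs = build_diagram(objects, attachments, "ladder")
rope_far_end = point_of(objects, refs, "rope", "end2")
```

As in the text, only points are transformed; no image is manipulated. The walk terminates on tree-structured attachments, and on cyclic ones it simply ignores the closing attachment, which is why closed loops need separate treatment.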

This algorithm is sufficient if the attachments of objects form a tree structure, which is the case for most of our example problems. In the case of the ladder problem shown in Figure 1, however, a triangle must be solved. The triangle is detected by an ad hoc program that tests whether some object a is attached to an object b that is attached to c, which is attached to a. If a triangle is detected, the known parameters are abstracted and given to a triangle solver, which returns the complete set of angles and sides of the triangle. The returned parameters must then be translated back to the form needed for the definition of the object.
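A minimal sketch of triangle detection and solving follows. It is not ISAAC's implementation: the function names and the data layout are hypothetical, the solver handles only the case of two sides plus the angle opposite the longer one, and the reading of the ladder problem (foot at the base of the wall, rope perpendicular to the wall) is an assumption. The final function uses the solved triangle in a moment balance about the foot of the ladder.

```python
import math
from itertools import combinations

def find_triangle(attached):
    """Return the first triple of mutually attached objects, or None.

    attached: set of frozenset({x, y}) pairs.
    """
    names = sorted({n for pair in attached for n in pair})
    for a, b, c in combinations(names, 3):
        if all(frozenset(p) in attached for p in ((a, b), (b, c), (c, a))):
            return (a, b, c)
    return None

def solve_ssa(a, b, angle_a_deg):
    """Solve a triangle from side a, side b, and the angle opposite side a.

    Only the unambiguous case a >= b is handled in this sketch.
    Sides a, b, c are opposite angles A, B, C (in degrees).
    """
    if a < b:
        raise ValueError("ambiguous or impossible case not handled")
    A = math.radians(angle_a_deg)
    B = math.asin(b * math.sin(A) / a)   # law of sines
    C = math.pi - A - B                  # angles sum to 180 degrees
    c = a * math.sin(C) / math.sin(A)
    return {"a": a, "b": b, "c": c,
            "A": math.degrees(A), "B": math.degrees(B), "C": math.degrees(C)}

def rope_tension(tri, loads):
    """Moment balance about the foot of the ladder.

    tri: solved triangle with a = ladder length, b = rope length, c = wall height.
    loads: list of (weight, distance from the foot along the ladder).
    """
    horizontal_per_unit = tri["b"] / tri["a"]   # horizontal offset per unit of ladder
    moment = sum(w * d * horizontal_per_unit for w, d in loads)
    return moment / tri["c"]                    # the rope acts at height c

attached = {frozenset(p) for p in
            (("ladder", "rope"), ("rope", "wall"), ("wall", "ladder"))}
cycle = find_triangle(attached)
# Ladder (50 ft) is opposite the right angle between wall and rope (30 ft).
tri = solve_ssa(50.0, 30.0, 90.0)
# 100 lb at 20 ft from the foot; 150 lb man at 10 ft from the top (40 ft from the foot).
tension = rope_tension(tri, [(100.0, 20.0), (150.0, 40.0)])
```

Under these assumptions the wall side of the triangle comes out as 40 ft and the rope tension as 120 lb.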

ISAAC's geometric model is based on analytic geometry in a single planar coordinate system. For more general application, this form of geometric model may be inadequate. Many geometric features of a physics problem may be unspecified in the problem statement. Although it would be possible to make a single, unified geometric model with symbolic values for all unspecified lengths, positions, and angles, to do so would greatly complicate the algebra. It would be better to have multiple locally precise geometries. Connections between local geometries could be topological rather than exact; exact geometries are needed only when relative distances between locations within separate local geometries are required, and such cases are unusual.

Diagram Understanding

It is difficult to describe geometry using natural language. BEATRIX [novak-bulko90,novak-bulko93] understands physics problems specified by English text and a diagram. Often neither text nor diagram is a complete description; a unified model must be produced from both. Coreference must be established between parts of the text and the diagram that refer to the same object or feature. Fig. 2 shows an example understood by BEATRIX.

Two masses are connected by a light string as shown in the figure. The incline and peg are smooth. Find the acceleration of the masses and the tension in the string for theta = 30 degrees and m1 = m2 = 5 kg.

Figure 2: BEATRIX Example.

Two masses are connected by a cable as shown in the figure. The strut is held in position by a cable. The incline is smooth, and the cable passes over a smooth peg. Find the tension in the cable for theta = 30 degrees and m1 = m2 = 20 kg. Neglect the weight of the strut.

Figure 3: BEATRIX Example.

Diagram Input

BEATRIX's user interface allows diagram elements to be selected, moved, scaled, and rotated as desired; it also allows entry of text within the diagram. A symbolic description of the diagram is constructed for input to the understanding program. The diagram input consists of ``neutral'' components such as lines, circles, and rectangles -- input that could be produced from a printed diagram by a machine vision system [ballard-brown].

Many difficulties of understanding natural language are also present with diagrams: ambiguity of meanings of elements, ambiguity of combination of elements, and underspecification. An element such as a line is ambiguous because it might represent an edge of an object, or an object itself (e.g., a cable). Lines may be combined in many ways, only a few of which are meaningful. Diagrams often omit things that can be inferred by the reader: the attachment between a rope and an object that it supports is often represented only by contact. As in speech understanding [hearsay], ambiguity can be reduced by using several kinds of constraints:

  1. An object mentioned in the text is expected to appear in the diagram.

  2. As objects are identified, identifications of other objects are constrained.

  3. Common-sense physics provides constraints: an object is expected to be supported; a rope terminating at an object is probably attached to it.

The diagram can also reduce ambiguity in interpreting the English text.

A person reading a physics problem will alternate attention between the diagram and text. No fixed order of processing suffices for all problems, since a problem might be specified entirely by text, entirely by a diagram, or by some combination. For this reason, BEATRIX performs co-parsing of the two modalities, using the BB1 blackboard system [bb1-manual].

Diagram Parsing

The diagram input consists of points, lines, rectangles, and circles described by analytic geometry. BEATRIX performs low-level analysis of details, e.g. to determine whether a line is approximately tangent to a circle. The diagram is parsed by knowledge sources (KS's) that recognize special combinations of picture elements, as in a picture grammar [ksfu74]. A diagram is inherently ambiguous: it may omit objects or details, exaggerate features, or include descriptive elements that are not objects (e.g., arrows used to show dimensions of objects). BEATRIX opportunistically combines related elements based on expectations of typical combinations; for example, if two lines meet at an acute angle, and there is a variable name that typically denotes an angle (such as theta) inside and near the vertex, then these elements will be grouped as an angle.
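One of the low-level checks mentioned above, approximate tangency of a line to a circle, might look like the following sketch. The function names and the tolerance value are assumptions made for illustration, and the test uses the infinite line through the two endpoints rather than the segment itself.

```python
import math

def distance_point_to_line(p, a, b):
    """Perpendicular distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("a and b must be distinct points")
    return abs(dx * (py - ay) - dy * (px - ax)) / length

def is_approximately_tangent(center, radius, a, b, tolerance=0.05):
    """True if the line passes within tolerance * radius of the circle's edge."""
    gap = abs(distance_point_to_line(center, a, b) - radius)
    return gap <= tolerance * radius

# A hand-drawn rope line that nearly grazes a unit circle, and one that misses it.
near = is_approximately_tangent((0.0, 0.0), 1.0, (-2.0, 1.02), (2.0, 1.02))
far = is_approximately_tangent((0.0, 0.0), 1.0, (-2.0, 1.5), (2.0, 1.5))
```

A relative tolerance is used because a drawn diagram is rarely exact; a fuller check would also confirm that the point of closest approach lies on the segment.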

Figure 4: theta is part of angle, but N is not.

As parts of the diagram are interpreted, they trigger other KS's. For example, after a small circle with a line to its center has been interpreted as a pulley, a KS is triggered to look for lines tangent to the pulley that represent a rope; the two lines that represent the rope are grouped into a single rope object, with the distal endpoints identified as its ends. This, in turn, triggers additional inferences: the ends of a rope are expected to be attached to objects or surfaces. When a KS can interpret part of the diagram, it obviates (removes from the execution queue) other KS's that might attempt alternative interpretations.

The diagram parsing KS's also trigger expectations for natural language processing. For example, identification of a contact between a mass and a surface triggers an expectation that a normal force and a coefficient of friction may be specified in the English text. Such an expectation is necessary to interpret a definite noun phrase such as ``the coefficient of friction'' if the contact appears only in the diagram.

Diagram parsing continues until no further interpretations can be made. Fig. 5 illustrates the features that are identified by diagram parsing in an example problem; most of the touch relations and some contact relations are omitted for readability.

Figure 5: Interpretation after Diagram Parsing

Establishing Coreference

The understanding module of BEATRIX combines the parsed English text and the parsed diagram, establishing coreference between them to produce a unified model. For example, the text might say ``the coefficient of friction is 0.25'', referring to a contact between a block and an inclined plane shown in the diagram. The friction value must be associated with the contact relation that was derived from the diagram. The KS's of the understanding module also make inferences based on common-sense physics. For example, BEATRIX infers that the rotation of an edge of an object is the same as that of a surface on which it rests, or that an object hanging from a rope hangs directly below it. Contact between an object and a surface is assumed to be a frictional touch contact, while contact between a rope and an object that it supports is assumed to be an attachment. Such inferences are important, since both text and diagrams often omit things that an intelligent reader can infer.

Priority ratings cause KS's with the best input data to execute first. For example, Identify-Masses gives itself a high rating if there is only one mass object it could match. Default KS's are triggered at a low priority to provide default values or to move objects that are mentioned in only one input modality to the unified level. Low-level KS's are triggered by the problem statement and diagram, while the higher-level KS's are triggered by the output of the low-level KS's.
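The scheduling behavior described here (highest rating first, with obviated KS's skipped) can be illustrated with a small priority queue. This is not the BB1 interface; the `Agenda` class and its methods are hypothetical, as are all KS names other than Identify-Masses.

```python
import heapq
import itertools

class Agenda:
    """A toy execution queue for knowledge sources (KS's)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker: first triggered, first run
        self._obviated = set()

    def trigger(self, name, rating, action):
        """Queue a KS; higher ratings execute earlier."""
        heapq.heappush(self._heap, (-rating, next(self._counter), name, action))

    def obviate(self, name):
        """Mark a queued KS so that it is skipped when it reaches the front."""
        self._obviated.add(name)

    def run(self):
        """Execute queued KS's in priority order; return the names executed."""
        executed = []
        while self._heap:
            _, _, name, action = heapq.heappop(self._heap)
            if name in self._obviated:
                continue
            action(self)
            executed.append(name)
        return executed

agenda = Agenda()
agenda.trigger("Default-Mass-Value", 1, lambda ag: None)
# A confident identification removes a competing interpretation from the queue.
agenda.trigger("Identify-Masses", 9,
               lambda ag: ag.obviate("Interpret-Square-As-Box"))
agenda.trigger("Interpret-Square-As-Box", 5, lambda ag: None)
executed = agenda.run()
```

In this run the default KS executes last, after the higher-rated identification, and the obviated alternative never executes.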

Conclusions about Diagram Understanding

A diagram represents much more information than is shown explicitly. Understanding a diagram is not a passive process of absorbing what is plainly in the diagram, but is an active process of model construction and inference, using the diagram as an outline of the model to be constructed. ISAAC demonstrated a similar finding with English text. Brevity gives diagrams their power but also presents a challenge for diagram understanding by computer. If much of the understanding of a diagram must be inferred from the reader's knowledge, then that knowledge and the procedures to use it must be part of a diagram understanding program. It must be possible to resolve ambiguities to produce the most likely interpretation. Opportunistic identification is based not only on syntactic relationships, but also on world knowledge or common-sense physics (e.g., identification of a square as a mass implies that a line coincident with the bottom of the square must be a surface, not a rope).

Problem Solving by Diagram Connections

A program called VIP (View Interactive Programming) [novak:caia94] allows a user to construct a computer program by making connections between diagrams that represent physical and geometric principles. The user can select physical laws, geometric principles, and physical constants and add them to a workspace. Connections between variable buttons in the diagrams can be made by clicking on each button with the mouse; a connection signifies that the variables are equal.

Figure 6: Calculating the Mass of the Sun

Fig. 6 shows how VIP can be used to calculate the mass of the sun. The initial workspace contains only a default output variable. The user follows Newton's reasoning: the gravitational attraction of the earth by the sun is equal to the force required to keep the earth in its orbit. The user selects a gravitation principle and a centrifugal-force principle from the physics menu and adds them to the window. The user clicks the mouse on the f button of each diagram, which causes a line to be drawn between them and signifies that the forces are equal. The user selects constants for the mass of the earth and the earth-sun distance and connects these to the two diagrams. The output box is connected to the other mass in the gravitation diagram. After these actions, only the velocity v of the earth in its orbit remains unspecified. This can be found by noting that the earth travels around the sun in one year. The user selects a circle diagram from the geometry menu, connects its radius to the earth-sun distance, and divides its circumference by a time constant of one year. This gives a fully specified diagram.

A program is derived from the diagram by data flow. Initially, input variables and constants are assumed to be defined. A variable that is defined is propagated into boxes to which it is connected. When a value is propagated into a box, equations associated with the box are examined to see if any can be solved. Solutions to equations produce the values of other variables, which are also propagated. When a value is propagated into the output, the program is complete. Compilation of this program (in the GLISP language [novak:glisp]) produces an executable program. In this case, the compiler reduces all the equations to the numeric answer (in kilograms): (LAMBDA () 1.9660057E30)
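The data-flow derivation can be sketched as repeated propagation of defined values through the boxes. The sketch below is an assumption-laden simplification, not VIP or GLISP: each box's equation is written already solved for the variable it produces, the rule encoding is hypothetical, and the physical constants are assumed SI values rather than those used in the paper, so the result need not match the figure above digit for digit.

```python
import math

# Assumed constants in SI units (not taken from the paper).
KNOWN = {
    "G": 6.674e-11,        # gravitational constant, m^3 kg^-1 s^-2
    "m_earth": 5.972e24,   # mass of the earth, kg
    "r": 1.496e11,         # earth-sun distance, m
    "year": 3.156e7,       # one year, s
}

# Each rule is (output variable, input variables, function of the inputs).
RULES = [
    ("circumference", ("r",), lambda r: 2 * math.pi * r),          # circle
    ("v", ("circumference", "year"), lambda c, t: c / t),          # division
    ("f", ("m_earth", "v", "r"), lambda m, v, r: m * v * v / r),   # centrifugal force
    ("m_sun", ("f", "m_earth", "r", "G"),
     lambda f, m, r, g: f * r * r / (g * m)),                      # gravitation
]

def propagate(known, rules, goal):
    """Fire any rule whose inputs are defined until the goal has a value."""
    values = dict(known)
    order = []
    while goal not in values:
        progressed = False
        for out, inputs, fn in rules:
            if out not in values and all(i in values for i in inputs):
                values[out] = fn(*(values[i] for i in inputs))
                order.append(out)
                progressed = True
        if not progressed:
            raise ValueError("diagram is not fully specified")
    return values[goal], order

mass_of_sun, order = propagate(KNOWN, RULES, "m_sun")
```

The recorded `order` is the sequence of assignments that a compiler could emit as a straight-line program; with constant inputs, the whole sequence folds to a single number.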

VIP can also be used to construct new physical principles that are combinations of existing ones; for example, the above analysis can be abstracted as an orbital-system principle.

VIP allows problems to be specified by correspondences of features of diagrams. Although equations and algebraic manipulation are involved, they are hidden and are performed automatically. The equations do not have to be memorized. Units of measurement are converted automatically as needed. Subproblems, such as finding the velocity of the earth in its orbit, can be solved using the system itself. Using VIP is clearly faster than doing algebra by hand; however, VIP is much easier to use if the diagrams ``fit'' a given problem than if they do not. For example, consider the problem:

A block rests on a horizontal board. The board is gradually tilted upward and the block just begins to slide down the board when the angle of inclination theta is 21 degrees ... Find the coefficient of static friction mu_s. [schaum]

A diagram by a human problem solver will depict forces so that they can easily be related to the physical situation. In Fig. 7, it is clear that the weight force can be viewed as a normal force and a force acting to move the block down the board.
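The decomposition of the weight leads directly to the answer. As a short worked derivation, writing W for the weight of the block (a symbol introduced here for illustration), n for the normal force, and f for the friction force at the moment the block begins to slide:

```latex
\begin{align*}
n &= W\cos\theta, \qquad f = W\sin\theta \\
f &= \mu_s\, n \quad \text{(block on the verge of sliding)} \\
\mu_s &= \frac{W\sin\theta}{W\cos\theta} = \tan\theta = \tan 21^\circ \approx 0.384
\end{align*}
```

The weight cancels, so the result depends only on the angle.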

Figure 7: Friction Diagram

Figure 8: Friction Problem using VIP

This problem can be solved using VIP, as shown in Fig. 8. However, the correspondence between the VIP diagrams and the geometry of the problem is poor. The triangle shown in the VIP diagram is the same triangle shown in Fig. 7, but its orientation does not match the physical situation. It is difficult for the user to determine whether x and y in the triangle diagram should respectively match n and f in the friction model, or vice versa; the user might have to draw a diagram on paper. Simply having diagrams is not enough: if the diagrams do not correspond well to the actual geometry of the problem, then diagrammatic inferences cannot be performed, and the diagrams will be as disconnected from the problem as a symbolic representation would be. Larkin and Simon [larkin-simon:diag10k] note that in humans, a production is easily triggered only if there is a close match between stimulus conditions and its triggering conditions.

VIP would be more useful if its diagrams were more like those drawn by humans. Several improvements can be identified:

  1. The orientation and size of a diagram should be variable so that the diagram can match the problem geometry. VIP should have multiple ways to draw a triangle, or better, an ability to adapt the triangle in size and orientation to parts of an existing diagram.

  2. It should be possible to overlay diagrams. In the diagram of Fig. 6, the three diagrams shown (circle, centrifugal force, and gravitation) all refer to the same physical space. Correspondences are shown as lines between them, but it would be better to overlay these diagrams so that the corresponding parts would be identical.

  3. Human problem solvers often replace variables in equations and on diagrams so that the number of variables used is minimized.

We have used VIP to develop small but realistic scientific programs [novak:caia94]. Abelson et al. [abelson89] envision an automatic engineering assistant; surely such a system should use diagrams to communicate with its user. It would be interesting to try teaching physics problem solving using VIP or a similar system. This would move the focus of problem solving away from algebra and toward conceptualization of the problem by selection and instantiation of physical models. VIP could also be used to investigate the effectiveness of different kinds of diagrams for human problem solvers: experience quickly demonstrated that diagrams are much less useful if they are not isomorphic to the problem geometry.

Another research direction is machine learning of methods for analyzing problems based on correspondences selected using VIP by a physics expert. Learning of the method of application of physical principles could be a useful form of ``chunking'' that would allow future problems of a similar type to be solved automatically as a result of practice [araya].

Uses of Diagrams in Problem Solving

Diagrams play many roles in human problem solving. Larkin and Simon [larkin-simon:diag10k] describe psychological and computational advantages of diagrammatic reasoning for human problem solvers:

  1. Diagrams guide attention from one element to related elements; they reduce search because related elements are usually close together.

  2. Diagrams minimize labeling: information about an element is near it.

  3. Diagrams facilitate perceptual inferences and recognition of problem-solving methods that may be applicable.

  4. Diagrams allow quick checks that the analysis is proceeding correctly.

This section elaborates these and other benefits of diagrams.

Short-term Memory

A central feature of human intelligence is limited short-term memory [miller:magic7]. By writing down intermediate results, a person releases limited short-term memory for other uses. Writing and re-perceiving intermediate results is much faster and more reliable than memorizing them; pencil and paper serve a role analogous to that of a paging disk in a computer [larkin:lmss]. Surely diagrams also play such a role. Because people find it easy to perceive diagrams, a diagram can serve as short-term memory for intermediate geometric results. A human problem solver progressively annotates the diagram with results, making those results available by inspection when needed. Retrieval by inspection is often opportunistic, without prior planning to use the retrieved values. Indeed, one strategy for solving a problem is to perform forward reasoning, deducing geometric results that can be derived easily and adding them to the diagram, until the diagram contains the desired answer.


A mental picture can serve as a ``coordinate system'' or geometric substrate, allowing the remainder of a problem to be described relative to the substrate. For example,

A car leaves point A and drives north for 6 miles ...

A punter located at his 40-yard line kicks a punt at an angle of 45 degrees to a receiver at the opposite 20-yard line ...

Since the natural language problem statement refers to geometric features of the substrate, a mental model of the substrate is required to understand such a problem.

Inference of Context

A difficulty faced both by humans and by AI systems is understanding an underspecified problem. Physics problems are often underspecified both geometrically and in terms of the physical principles needed for solution. A diagram can help the problem solver to infer the correct context by encouraging elaboration of elements normally associated with the diagram. An underspecified problem that is solved by ISAAC is shown in Fig. 9.

What force is required to lift one end of a pole?

Figure 9: Underspecified Problem

To a person, a drawing of a horizontal pole supported only by a force at one end ``looks wrong''; the exercise of drawing a free body diagram may help a human problem solver to consider all the relevant forces until the set of forces drawn on the body appears to be balanced. In this problem, ISAAC introduces (by symbolic inference) a pivot to support the other end of the pole. Physics problems often omit important geometric facts, e.g. that objects rest on the surface of the earth, or that walls are vertical planes that are bounded below by horizontal floors.

Inference by Recognition

Larkin and Simon [larkin-simon:diag10k] describe ``perceptual'' inferences as a major advantage of the use of diagrams. While such inferences (e.g. the fact that vertical angles formed by intersecting lines are equal) can be made symbolically, they can be made at almost no cost by perception. The perceptual inferences that Larkin and Simon describe are identical to symbolic inferences that can be made formally. While perceptual inferences may suggest subproblems to be treated formally (e.g., the perception that vertical angles appear to be equal may trigger the memory that this is indeed a theorem in geometry), humans often make perceptual inferences without proof or even much thought. For example, in the problem of Fig. 2 the problem solver will make the assumption that the string is parallel to the inclined plane; this is unstated and thus cannot be proved.

A skilled problem solver deliberately constructs diagrams that facilitate inference by recognition. In the problem of Fig. 10, a skilled problem solver will draw the figure so that the angle alpha is clearly less than 45 degrees; this will increase the size contrast between angle alpha and angle ABC, facilitating recognition that the angle alpha' is the same as alpha. While this can be proved, the problem solver will probably assume that angles that appear to be equal are in fact equal.

Figure 10: Analytic Geometry Problem

It appears that perceptual inferences are important in other domains, even when diagrams are not used. For example, a person skilled in performing mental arithmetic can perform the mental calculation:

4 / .97 ~= 4.12

by recognizing this problem as an instance of the pattern:

1 / (1 - epsilon) ~= (1 + epsilon), where epsilon is small.

The recognition that .97 is ``almost 1'' is a perceptual inference that must be made in order to trigger a production rule for this pattern. There is evidence that experts can make large numbers of such perceptual inferences, which may be an important component of their expertise. For example, Feynman [feynman:joking] boasted (falsely) that he could compute exponentials in his head; he confounded his friends' attempts to expose his deception because he was able to recognize so many special cases that he could do every example they presented to him.

If somebody comes along and wants to divide 1 by 1.73, you can tell them immediately that it's .577, because you notice that 1.73 is nearly the square root of 3, so 1 / 1.73 must be one-third of the square root of 3. [feynman:joking] [emphasis added.]
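Both shortcuts are easy to check numerically. The sketch below compares each mental estimate with the exact quotient; the function and variable names are illustrative only.

```python
import math

def approx_reciprocal(x):
    """First-order estimate of 1/x for x near 1, from 1/(1 - eps) ~= 1 + eps."""
    eps = 1.0 - x
    return 1.0 + eps

# 4 / .97: recognize .97 as 1 - .03, so the quotient is about 4 * 1.03.
estimate = 4 * approx_reciprocal(0.97)
exact = 4 / 0.97
error = exact - estimate            # second-order term, about 4 * eps**2

# 1 / 1.73: recognize 1.73 as nearly sqrt(3), so the quotient is about sqrt(3) / 3.
feynman_estimate = math.sqrt(3) / 3
feynman_exact = 1 / 1.73
```

The error of the first-order pattern grows with the square of epsilon, which is why the pattern is triggered only when the divisor is recognized as ``almost 1''.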

People seem to be able to recognize at least the following relationships from diagrams:

  1. Parallel or perpendicular lines.

  2. Relative positions of objects (e.g. above, below, left, right).

  3. Objects that are similar under translation, scaling and/or rotation.

  4. Approximate equivalence of lengths, sizes, or angles.

  5. Relative sizes (smaller/larger) of lines or angles.

  6. Proportionality, especially division in half, of lines or angles.
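The first relationship in this list, recognition of parallel or perpendicular lines, can be approximated in a program by comparing directions within a tolerance. This is a minimal sketch with hypothetical names; the 3-degree tolerance is an arbitrary assumption, not a measured property of human perception.

```python
import math

def direction(seg):
    """Direction of a segment ((x1, y1), (x2, y2)) in radians."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1)

def angle_between(seg_a, seg_b):
    """Smallest angle between the two undirected lines, in degrees (0 to 90)."""
    d = abs(math.degrees(direction(seg_a) - direction(seg_b))) % 180.0
    return min(d, 180.0 - d)

def relation(seg_a, seg_b, tolerance_deg=3.0):
    """Classify two segments as 'parallel', 'perpendicular', or None."""
    angle = angle_between(seg_a, seg_b)
    if angle <= tolerance_deg:
        return "parallel"
    if angle >= 90.0 - tolerance_deg:
        return "perpendicular"
    return None

floor = ((0.0, 0.0), (10.0, 0.0))
shelf = ((0.0, 1.0), (10.0, 1.2))    # drawn slightly off horizontal
wall = ((5.0, 0.0), (5.1, 8.0))      # drawn slightly off vertical
ramp = ((0.0, 0.0), (1.0, 1.0))
```

Here `relation(floor, shelf)` yields "parallel" and `relation(floor, wall)` yields "perpendicular", while the ramp is classified as neither. Like a human reader, the check accepts lines that merely appear parallel.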

Abelson et al. [abelson89] describe the use of machine vision algorithms to recognize partitions of phase space in simulations of dynamical systems. Because such a simulation produces point values rather than trajectories, partitions cannot be derived directly. However, given a large number of points, the lines can be recognized by machine vision algorithms. This is especially interesting as a case where even a computer needs a ``mind's eye'' to recognize the qualitative structure of a problem.

Diagrammatic Operators

Some inference rules seem almost to be ``plastic overlays'' that can be moved into position and added to a diagram. The right-hand rule of electromagnetic fields often is invoked with actual movement of the hand. The rule that ``sine = opposite / hypotenuse'' can be thought of as a diagrammatic operator (Fig. 11) that can be mentally moved into position and then used to add inferences directly to a diagram.

Figure 11: Sine Rule Overlay

An advantage of such diagrammatic operators is that they can be used locally by making simple mental transformations such as translation, rotation, and reflection to make the diagrammatic operator match the existing diagram. Intermediate results that are written on the diagram become available for subsequent use. For example, in the problem of Fig. 10, the sine rule can be applied to the large triangle to find that BC = sin(alpha); this value can then be used with a cosine rule for the smaller triangle to find CD = sin(alpha) * cos(alpha).
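The two steps can be written out as follows. The figure is not reproduced here, so the configuration is an assumption chosen to be consistent with the stated results: triangle ABC has its right angle at C, hypotenuse AB of length 1, and angle alpha at A, and D is the foot of the perpendicular from C to AB.

```latex
\begin{align*}
\sin\alpha &= \frac{BC}{AB} = \frac{BC}{1}
  &&\Rightarrow\quad BC = \sin\alpha \\
\angle BCD &= 90^\circ - \angle ABC = 90^\circ - (90^\circ - \alpha) = \alpha \\
\cos\alpha &= \frac{CD}{BC}
  &&\Rightarrow\quad CD = BC\cos\alpha = \sin\alpha\,\cos\alpha
\end{align*}
```

The middle line is the step that a skilled problem solver takes by recognition rather than by proof.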

Relating Actual Situation to Canonical Model

We have proposed [kook:diss,kook-novak-tkde] that the analysis of a physics problem should be represented not just as a set of equations, but as sets of correspondences between problem features and physical models. Solving a physics problem is not simply a matter of logical deduction (in which necessarily true results are derived from given premises), but a constructive process in which the given facts are elaborated by additional assumptions and physical models. In some problems, a single object will have multiple views as parts of different physical models. When represented symbolically, the correspondence sets become large and complex; a diagram can serve as a compact representation of such correspondences. Larkin and Simon [larkin-simon:diag10k] note minimizing labeling as an advantage of diagrams. Human problem solvers also strive to minimize the number of variables used in equations. By transferring variable names from one part of a diagram to another, the same variable name can play a role in multiple physical models. A diagram may thus represent an overlaying of diagrams for physical models and actual objects.

Diagrams are often included with statements of physical laws [gieck]; they presumably facilitate retrieval of the appropriate formulas from memory when a similar problem diagram is seen. In addition, the diagram facilitates matching between problem features and corresponding features of the physical model because the corresponding features appear in similar locations in each diagram. Consider the problem:

Given the gravitational constant G and the known facts about the orbit of the earth, calculate the mass of the sun.

Figure 12: Centrifugal Force Law and Planet Problem

In Fig. 12, the diagram on the left is as shown in [gieck], while the diagram on the right is drawn to correspond to it. These diagrams immediately suggest that the sun corresponds to the center of the circle, the earth to the mass (suggesting that the earth be ``coerced'' to a point mass), the radius r to the earth-sun distance d , and the velocity v to the velocity of the earth (which then becomes a subproblem).

Larkin and Simon [larkin:cogsci] proposed the representation of problems and of physical situations as directed graphs and the use of graph-matching algorithms to find and instantiate appropriate physical models. This may be difficult, both because graph matching is computationally intractable and because missing or extra nodes prevent graphs from matching. Diagram matching may be more useful because diagrams that represent physical principles can be indexed by major features such as circular motion, which are likely to have only a few matches in a given problem. A match between a diagram and a given problem need not be exact: extra elements in the problem do not matter, and missing elements can be ignored (if not used) or taken as subproblems.
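The inexact matching described above might be sketched as follows. This is only an illustration under simplifying assumptions: the library entries, the feature names, and the idea that elements of the problem diagram are already labeled with candidate roles are all hypothetical, and no existing program is claimed to work this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelDiagram:
    name: str
    index_feature: str  # major feature used to retrieve the model
    roles: tuple        # diagram elements that the model's formula refers to


# Hypothetical library of diagrams for physical principles.
LIBRARY = [
    ModelDiagram("circular-motion", "circular motion",
                 ("center", "mass", "radius", "velocity")),
    ModelDiagram("inclined-plane", "incline", ("plane", "mass", "angle")),
]


def match(problem_features, problem_elements, library=LIBRARY):
    """Return (model name, bound roles, subproblem roles) per indexed model.

    Extra problem elements are ignored; roles with no corresponding
    element are returned as subproblems rather than blocking the match.
    """
    results = []
    for model in library:
        if model.index_feature not in problem_features:
            continue  # the index rules out most models cheaply
        bound = {role: problem_elements[role]
                 for role in model.roles if role in problem_elements}
        missing = [role for role in model.roles
                   if role not in problem_elements]
        results.append((model.name, bound, missing))
    return results


matches = match(
    problem_features={"circular motion", "gravity"},
    problem_elements={"center": "sun", "mass": "earth",
                      "radius": "earth-sun distance", "extra": "moon"},
)
print(matches)
```

For the planet problem, only the circular-motion model is retrieved; the moon is ignored as an extra element, and the unbound velocity is reported as a subproblem.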

Making Predictions

Skilled problem solvers often use gedanken (thought) experiments involving actual or imagined diagrams to determine:

  1. the direction of change in a system,

  2. equilibrium points, bounding points or extrema,

  3. connectivity, by tracing connecting paths on the diagram,

  4. how a change in one quantity will affect another, and in what direction.

An excellent example is a method for determining whether a structural member of a bridge is under tension or compression: imagine the bridge collapsing with the member removed. If the member would become shorter in the collapse (Fig. 13), it must be under compression; conversely, if it would become longer, it must be under tension.
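The bridge thought experiment reduces to comparing the distance between the two joints of the removed member before and after the imagined collapse. The sketch below assumes that the joint positions in both states are already available (for example, read from the diagram); the function name and the coordinates are hypothetical.

```python
import math


def member_state(before, after, tolerance=1e-9):
    """Classify a removed member from the motion of its two end joints.

    before, after: ((x1, y1), (x2, y2)) joint positions with the member
    in place and during the imagined collapse with the member removed.
    """
    length_before = math.dist(*before)
    length_after = math.dist(*after)
    if length_after < length_before - tolerance:
        return "compression"   # joints approach: member was holding them apart
    if length_after > length_before + tolerance:
        return "tension"       # joints separate: member was holding them together
    return "undetermined"      # no change in length to within tolerance


# Hypothetical coordinates for one member of a collapsing truss.
print(member_state(((0.0, 0.0), (4.0, 0.0)), ((0.0, 0.0), (3.5, -1.0))))
```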

Figure 13: Removal of Bridge Member

Concluding Discussion

The preceding sections have described uses of diagrams in programs that solve physical problems, as well as uses of diagrams by humans. The power that diagrams give to human problem solving motivates consideration of how similar uses of diagrams could be incorporated into computer programs. The difficulty of machine perception of diagrams suggests that it would be unprofitable to try to duplicate human diagram processing directly. However, machine processing at a ``sketch'' level above the level of direct perception may be reasonable.

A set of basic perceptual operators, analogous to those that people use with diagrams but implemented above the pixel level of an actual diagram, might be implemented to take advantage of the strengths of the computer. A representation of geometric features such as lines, points, and circles by means of analytic geometry seems most appropriate for computer processing. Such a representation should be sufficiently accurate to determine such features as a line terminating at another line, a line tangent to a circle, parallel lines, etc.
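As an illustration of such operators, the following sketch represents line segments and circles analytically and tests the three relationships mentioned above to within a tolerance. The class names, function names, and the tolerance value are assumptions made for this example.

```python
import math
from dataclasses import dataclass

TOL = 1e-6  # assumed tolerance for "sufficiently accurate"


@dataclass(frozen=True)
class Segment:
    x1: float
    y1: float
    x2: float
    y2: float


@dataclass(frozen=True)
class Circle:
    cx: float
    cy: float
    r: float


def parallel(a, b, tol=TOL):
    """True if the direction vectors have (nearly) zero cross product."""
    ax, ay = a.x2 - a.x1, a.y2 - a.y1
    bx, by = b.x2 - b.x1, b.y2 - b.y1
    cross = ax * by - ay * bx
    return abs(cross) <= tol * math.hypot(ax, ay) * math.hypot(bx, by)


def point_segment_distance(px, py, s):
    """Distance from a point to the nearest point of segment s."""
    dx, dy = s.x2 - s.x1, s.y2 - s.y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - s.x1, py - s.y1)
    t = ((px - s.x1) * dx + (py - s.y1) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (s.x1 + t * dx), py - (s.y1 + t * dy))


def terminates_at(a, b, tol=TOL):
    """True if an endpoint of segment a lies on segment b."""
    return min(point_segment_distance(a.x1, a.y1, b),
               point_segment_distance(a.x2, a.y2, b)) <= tol


def tangent(s, c, tol=TOL):
    """True if the foot of the perpendicular from the circle's center
    falls within the segment at a distance equal to the radius."""
    dx, dy = s.x2 - s.x1, s.y2 - s.y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return False
    t = ((c.cx - s.x1) * dx + (c.cy - s.y1) * dy) / length_sq
    if not 0.0 <= t <= 1.0:
        return False
    dist = math.hypot(c.cx - (s.x1 + t * dx), c.cy - (s.y1 + t * dy))
    return abs(dist - c.r) <= tol
```

Each predicate works directly on coordinates, so none of them requires access to the pixels of an actual diagram.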

Geometric features should be connected, bilaterally, with problem features that are represented symbolically. Sometimes geometric features represent objects, but in other cases they represent relationships (such as the earth-sun distance) or variable values. It must be possible to post values to the diagram representation; in this way, the diagram can serve the short-term memory function and allow opportunistic use of intermediate results that are ``read'' back from the diagram. The propagation of results by VIP is an example of posting results to a diagrammatic model.
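One simple way to realize such two-way connections is a pair of lookup tables together with a table of posted values. The sketch below is hypothetical (the class and method names are invented for illustration and do not describe VIP's implementation); geometric features are represented by hashable tuples.

```python
class DiagramModel:
    """Two-way links between symbolic problem features and geometry."""

    def __init__(self):
        self.feature_of = {}  # symbolic name -> geometric feature
        self.symbol_of = {}   # geometric feature -> symbolic name
        self.values = {}      # symbolic name -> value posted so far

    def link(self, symbol, feature):
        self.feature_of[symbol] = feature
        self.symbol_of[feature] = symbol

    def post(self, symbol, value):
        """Record an intermediate result on the diagram."""
        if symbol not in self.feature_of:
            raise KeyError(f"{symbol!r} is not linked to the diagram")
        self.values[symbol] = value

    def read(self, feature):
        """``Read'' back the symbol and value (or None) for a feature."""
        symbol = self.symbol_of[feature]
        return symbol, self.values.get(symbol)


diagram = DiagramModel()
# A line segment that represents a relationship rather than an object.
distance_line = ("segment", "sun", "earth")
diagram.link("earth-sun distance", distance_line)
diagram.post("earth-sun distance", 1.496e11)  # assumed value in meters
print(diagram.read(distance_line))
```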

It should be possible to group geometric objects into larger units; for example, in the bridge problem of Fig. 13, two triangles formed from bridge members are treated as rigid bodies in visualizing how the bridge would collapse. The VIP model of the earth-sun system in Fig. sunfig shows that aspects of the geometry of the system are used in several separate models. These separate models are needed for the analysis; however, it would be better to have only a single diagram that unifies all the models rather than three diagrams with connections between them.

A library of geometric models is essential if minimally specified problems are to be understood. The statement ``a ladder leans against a wall'' implies the existence of a floor that supports the bottom of the ladder. It is reasonable to assume that a prototypical representation of the spatial relationships of a ladder, wall, and floor is stored; textbook problems show that a reader is assumed to have such knowledge.

Perceptual operators (e.g. detection of parallel lines) can operate at the analytic geometry level as special-purpose programs distinct from production rules or other symbolic analysis. ``Noticing'' these features can be done rapidly by special-purpose programs that perform only this function. Such noticing is a signal-to-symbol transformation [nii:hasp] that converts analog values into symbolic values that can trigger productions. When Feynman noticed that 1.73 is almost the square root of 3, this triggered a production for problems involving a square root; 1.73 is an analog or ``signal'' value, while the concept of ``square root'' is symbolic. Noticing can direct attention to inferences based on observed relationships. For example, BEATRIX notices that two lines are tangent to a circle and infers the existence of a pulley system. Some things that are noticed can be assumed to be true, while others can trigger an attempt to prove what was noticed by more rigorous methods.
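The square-root example can be sketched as a small special-purpose ``noticer'' that compares an analog value with a table of familiar constants and emits symbolic names that a production system could test. The table of constants, the tolerance, and the function name are assumptions made for illustration.

```python
import math

# Assumed table of constants that a problem solver might recognize.
KNOWN_CONSTANTS = {
    "sqrt(2)": math.sqrt(2),
    "sqrt(3)": math.sqrt(3),
    "pi": math.pi,
    "e": math.e,
}


def notice(value, relative_tolerance=0.005):
    """Signal-to-symbol step: names of constants that value nearly equals."""
    return [name for name, constant in KNOWN_CONSTANTS.items()
            if math.isclose(value, constant, rel_tol=relative_tolerance)]


symbols = notice(1.73)
print(symbols)
```

The result is only a suggestion: as noted above, what is noticed may either be assumed true or handed to a more rigorous method for confirmation.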

Perceptual inference also includes relating of similar models. In relating the earth-sun system to a circle, there are correspondences between the location of the sun and the center of the circle, between the earth-sun distance and the radius of the circle, etc. A stored relationship between a physical principle and a diagram could be used to relate corresponding parts of two situations that have similar diagrams. In this way, the diagrammatic representation becomes the basis for expressing the isomorphism between a problem situation and its physical model.

We have described uses of diagrams in programs that solve physics problems and have considered ways in which diagrams are used by humans. By implementing perceptual operations at a level below that of symbolic reasoning and by making use of correspondences between diagrams, it may be possible to give computer problem-solving systems the advantages that humans derive from diagrams.


This research was supported in part by the U.S. Army Research Office under contract DAAG29-84-K-0060. Computer equipment used in this research was donated by Hewlett Packard and Xerox Corporation.


[abelson89] Abelson, H., et al., ``Intelligence in Scientific Computing'', Communications of the ACM, vol. 32, no. 5 (May 1989), pp. 546-562.

[araya] Araya, A., ``Learning by Practice using Experimentation and Generalization Techniques'', Ph.D. dissertation, Univ. of Texas at Austin, Dec. 1984.

[ballard-brown] Ballard, D. H. and Brown, C. M., Computer Vision, Prentice-Hall, 1982.

[bulko:diss] Bulko, W., Understanding Coreference in a System for Solving Physics Word Problems, Ph.D. dissertation, Tech. Report AI-89-102, A.I. Lab, CS Dept., Univ. of Texas at Austin, 1989.

[bb1-manual] Garvey, A., Hewett, M., Schulman, R., and Hayes-Roth, Barbara, ``BB1 User Manual -- Interlisp Version'', working paper KSL 86-60, Knowledge Systems Lab, Stanford Univ., 1986.

[chi] Chi, M., Feltovich, P., and Glaser, R., ``Categorization and Representation of Physics Problems by Experts and Novices'', Cognitive Science, vol. 5, no. 2 (April 1981), pp. 121-152.

[feynman:joking] Feynman, R. P., Surely You're Joking, Mr. Feynman!, New York: Norton, 1985.

[formulae] Over 1000 Physics Formulae, New York: Kampmann & Co, 1984.

[ksfu74] Fu, K. S., Syntactic Methods in Pattern Recognition, Academic Press, 1974.

[gieck] Gieck, K., Engineering Formulas, 5th ed., McGraw-Hill, 1986.

[hearsay] Erman, L. D., et al., ``The Hearsay-II Speech-Understanding System: Integrating Knowledge to Resolve Uncertainty'', ACM Computing Surveys, vol. 12, no. 2 (June 1980), pp. 213-253.

[kook:diss] Kook, Hyung Joon, A Model-Based Representational Framework for Expert Physics Problem Solving, Ph.D. dissertation, Tech. Report AI-89-103, A.I. Lab, C.S. Dept., Univ. of Texas at Austin, 1989.

[kook-novak-tkde] Kook, Hyung Joon and Novak, G., ``Representation of Models for Expert Problem Solving in Physics'', IEEE Trans. on Knowledge and Data Engineering, 3:1, pp. 48-54, March 1991.

[larkin:cogsci] Larkin, J. and Simon, H. A., ``Learning through Growth of Skill in Mental Modeling'', Proc. Cognitive Science Society, 1981; also in [simon:mot2].

[larkin:lmss] Larkin, J., J. McDermott, D. Simon and H. A. Simon. ``Expert and Novice Performance in Solving Physics Problems'', Science, 208 (20 June 1980), pp. 1335-1342.

[larkin-simon:diag10k] Larkin, J. and Simon, H. A., ``Why a Diagram is (Sometimes) Worth 10,000 Words'', Cognitive Science, 11:65-99, 1987; also in [simon:mot2].

[miller] Miller, F., Progressive Problems in Physics, Boston: D.C. Heath, 1949.

[miller:magic7] Miller, G. A., ``The Magical Number Seven, Plus or Minus Two'', Psychological Review, 63:81-97, 1956.

[nii:hasp] Nii, H., E. Feigenbaum, J. Anton, and A. Rockmore, ``Signal-to-Symbol Transformation: HASP/SIAP Case Study,'' AI Magazine, 3:2, Spring 1982, pp. 23-35.

[novak:glisp] Novak, G., ``GLISP: A LISP-Based Programming System With Data Abstraction'', AI Magazine, vol. 4, no. 3 (Fall 1983), pp. 37-47.

[novak:isaac] Novak, G., ``Computer Understanding of Physics Problems Stated in Natural Language'', Am. J. Computational Linguistics, Microfiche 53, 1976.

[novak:ijcai77] Novak, G., ``Representations of Knowledge in a Program for Solving Physics Problems'', IJCAI, 1977, pp. 286-291.

[novak-bulko90] Novak, G. and Bulko, W., ``Understanding Natural Language with Diagrams'', Proc. Eighth National Conference on Artificial Intelligence (AAAI-90), 1990, pp. 465-470.

[novak-bulko93] Novak, G. and Bulko, W., ``Diagrams and Text as Computer Input'', Journal of Visual Languages and Computing (1993) 4, 161-175.

[novak:caia94] Novak, G., ``Generating Programs from Connections of Physical Models'', Proc. 10th Conf. on Artificial Intelligence for Applications (CAIA-94), March 1994, pp. 224-230 (IEEE Computer Society Press).

[simon:mot2] Simon, H. A., Models of Thought, vol. 2, Yale Univ. Press, 1989.

[schaum] van der Merwe, C. W., Schaum's Outline of Theory and Problems of College Physics, McGraw-Hill, 1961.