An Introduction to Scheme and its Implementation

Go to the first, previous, next, last section, table of contents.

Making Some Objects (Hunk D)

==================================================================
Hunk D starts here:
==================================================================

I've been talking about "objects," but most of the objects we've seen don't have interesting structure.

One of the most important kinds object in Scheme is the pair, which you can create with the built-in procedure cons. A pair is a simple kind of structured object, like a Pascal record or a C struct. It has two fields, called the car and the cdr, and you can extract their values with the standard procedures car and cdr.

cons takes two arguments, which it uses as the initial values of the car and cdr fields of the pair it creates. (cons is called that because it constructs a pair; the name is short because it's a common operation. In Lisp, pairs are called "cons cells" because you make them with cons.)

I'll show you some simple examples of playing with pairs, just to show you what they are. Be warned that these are bad examples, in that there are usually cleaner ways to do things, which we'll discuss later when we get to lists. (Lists are made of pairs.)

Scheme>(cons 1 2)
(1 . 2)

What happened here was that the call to cons created a pair, and returned (a pointer to) it. Scheme printed out a textual representation of the pair, showing the values of its car and cdr fields, separated by a dot and enclosed in parentheses.

(The printed representation looks sort of like a Scheme expression, because of the parentheses, but it's not--it's just the way Scheme prints this kind of data structure. We're looking at the value returned by the expression (cons 1 2). Don't be confused by the similarity between written Scheme expressions and the textual representation of data structures--they're very different things.)

We didn't do anything with the pair except let Scheme print it, so we've lost it--we didn't save a pointer to it, so we can't refer to it. (The garbage collector will take back its space, so we don't have to worry that we've lost storage space.)

Let's try again, defining (and binding) a variable, and initializing it with the pointer that cons returns.

Scheme>(define my-pair (cons 1 2))
#void

Scheme>my-pair
(1 . 2)

Now try extracting the values of the pair's fields, using car and cdr.

(In Scheme, (car foo) is equivalent to C's foo->car, dereferencing a pointer to an object and extracting the value of the car field. Likewise, (cdr foo) is like foo->cdr. The operators that access fields of a pair are just procedures.)

Scheme>(car my-pair)
1

Scheme>(cdr my-pair)
2

We don't need to use any special pointer syntax to dereference the pointer to the pair---car and cdr expect a pointer, and return the field values of the pair it points to.

car and cdr only work on pairs. If you try to take the car or cdr of anything else, you'll get a runtime type error.

Try it:

Scheme>(car #t)
ERROR: attempt to take the car of a non-pair #t
break>,top
Scheme>

The messages you'll see vary from system to system, but the basic idea is the same. We tried to take the car of the boolean #f, which makes no sense because it has no car field--it doesn't have any fields. Scheme told us it didn't work, and gave us a break prompt for sorting it out. Then we just used the ,top command (or whatever works on your system) to tell Scheme to give up on evaluating that expression and go back to normal interaction.

car and cdr don't work on the empty list. The empty list doesn't have car and cdr fields. (This may be surprising to Lisp programmers, who expect the empty list to behave like Lisp's nil. It doesn't, in this respect.)

Scheme also supplies procedures to change the values of a pair's fields, called set-car! and set-cdr!. They take two arguments, a pair and a value for the field being set.

Scheme>(set-car! my-pair 4)
#void

Scheme>my-pair
(4 . 2)

Scheme>(set-cdr! my-pair 5)
#void

Scheme>my-pair
(4 . 5)

The value of the variable my-pair hasn't actually changed, even though it prints differently. my-pair still holds a pointer to the same object, the pair we created with cons. What has changed is the contents of that object. Its fields are like variable bindings, in that they can hold (pointers to) any kind of object, and we've assigned new values to them. (They're value cells.)

We can refer to the same object by another name if we just define another variable and initialize it with my-pair's value.

Scheme> (define same-pair my-pair)
#void

Scheme>same-pair
(4 . 5)

Now suppose we assign a new value to the car of the pair, referring to it via my-pair

Scheme>(set-car! my-pair 6)
#void

Scheme>my-pair
(6 . 5)

Scheme>same-pair
(6 . 5)

Notice that the change is visible through same-pair as well as my-pair, because we've changed the object that both of them point to.

Now let's make another pair with the same field values.

Scheme>(define different-pair (cons 6 5))
different-pair

Scheme>different-pair
(6 . 5)

Scheme>my-pair
(6 . 5)

Scheme>same-pair
(6 . 5)

Notice that we have two different pairs, but Scheme prints them out the same way, because it just shows us the structure of data structures. We can't tell that they're different just by looking at the printed output. From the printed representation, we can't tell whether or not my-pair, same-pair, and different-pair hold the same values.

Scheme provides a predicate procedure, eq?, to tell whether two objects are the exact same object.

Scheme>(eq? my-pair same-pair)
#t
Scheme>(eq? my-pair different-pair)
#f
Scheme>(eq? same-pair different-pair)
#f

eq? tests object identity, like pointer comparisons in C (using ==) or Pascal (using =).

It may be confusing, but in programming language terminology, two objects are called identical only if they are the very same object, not just two objects that look alike, like "identical" twins. When the government issues "identity" cards, this is the kind of "identity" we're talking about. Two so-called identical twins have different identities, because they're actually different people. A pointer is like a a social security number, because it uniquely identifies a particular individual object.

Scheme also has a test to see whether objects "look the same," that is, have the same structure. It's called equal?. We call this a structural equivalence test.

Scheme>(equal? my-pair same-pair)
#t
Scheme>(equal? my-pair different-pair)
#t
Scheme>(equal? same-pair different-pair)
#t

different-pair is equal? to my-pair and same-pair because it refers to the same kind of object, and its field values are equal?. Notice that that's a recursive definition, which we'll discuss more when we get to lists.

If we didn't have eq?, we could still figure out whether two objects were exactly the same object, by changing one and seeing if the other changed, too.

Scheme>(set-car! my-pair 4)
#void

Scheme>my-pair
(4 . 5)

Scheme>same-pair
(4 . 5)

Scheme>different-pair
(6 . 5)

Now I should warn you about set-car! and set-cdr!. The reason we put an exclamation point in the name of a procedure that side-effects data is because it's dangerous. If you have two pointers to the same data from different places, i.e., different variable bindings or data structures, it's hard to reason about how changes from one of those places affect things at the other place.

In normal Scheme programming style, it is very common to create new data structures that have pointers to other data structures, or parts of data structures. If you modify a shared part of one data structure, it will affect the other data structure that shares that part. This can be very confusing, and leads to subtle bugs.

You should only use side effects when you have a very good reason to, and make it clear that that's what you're doing. Later examples will show how to program in a style that uses very few side effects, and only where they make sense.

Notice that cons is not considered a side-effecting operation, because it returns a new object that has never been seen before. Somewhere in the implementation of the language, cons side-effects memory to initialize it, but you don't see that--from your program's point of view, you're getting a new piece of memory that magically has values in place.

Creating a pair doesn't modify any data structure that already exists, so the installation of its initial values is not considered a side-effect.

==================================================================
This is the end of Hunk D.

At this point, you should go back to the previous chapter and
read Hunk E before returning here and continuing this tutorial.
==================================================================

(Go BACK to read Hunk E, which starts at section The Empty List (Hunk E).)

Go to the first, previous, next, last section, table of contents.