Chapter 32
Scanning Visual Mental Images: The First Phase of the Debate
Stephen Kosslyn
The modern debate about mental imagery has gone through two distinct phases. The first began in 1973 with the publication of Pylyshyn's paper "What the Mind's Eye Tells the Mind's Brain: A Critique of Mental Imagery" and Anderson and Bower's book Human Associative Memory.
Pylyshyn's critique of mental imagery focused on arguments that the very idea of imagery was paradoxical (Who looks at the images?) or muddled (In what ways are images like pictures? Why can't you see the number of stripes on an imaged tiger?). The thrust of the critique was that a depictive representation does not occur in the brain when we experience mental images; instead, propositional representations are used for all forms of cognition, including imagery. The depictive features of images that are evident to introspection were thus taken to be "epiphenomenal": these features have nothing to do with the representation used to perform the task, just as the lights flashing on the outside of a mainframe computer have nothing to do with carrying out the internal processing (the lights could be removed and the computer would keep working just as well).
By their very nature, depictions embody space (recall that "distance" is an intrinsic part of the representation). Thus, if depictive representations underlie the experience of "having an image," then the spatial nature of the representation should affect how images are processed. On the other hand, if the underlying representation is propositional, we have no reason to expect distance to affect processing times, because the description of an object's appearance would be stored in a list or network of some kind, just as in language.
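To make the contrast concrete, here is a minimal Python sketch under deliberately simplified assumptions: a one-dimensional "depiction" of the speedboat in which every cell stands for a location (empty cells included), and a propositional store modeled as a dictionary keyed by part name. The layout, the contents of the store, and the one-step-per-cell cost are all hypothetical; the sketch only illustrates why distance should matter for the first format and not for the second.

```python
# Hypothetical one-dimensional "depiction": each cell stands for a location,
# and empty space (None) is represented just as explicitly as filled space.
DEPICTION = ["motor", None, None, "porthole", None, "anchor"]

# Hypothetical propositional store: descriptions keyed by part name. Nothing
# in this structure encodes how far apart the parts are.
PROPOSITIONS = {
    "motor": "description of the motor",
    "porthole": "description of the porthole",
    "anchor": "description of the anchor",
}


def depictive_scan_steps(depiction, focus, target):
    """Shift attention one cell at a time from focus to target; count cells crossed."""
    position = depiction.index(focus)
    goal = depiction.index(target)
    step = 1 if goal > position else -1
    steps = 0
    while position != goal:
        position += step
        steps += 1
    return steps


def propositional_lookup_steps(store, target):
    """Retrieve a part by name: one step regardless of layout (0 if absent)."""
    return 1 if target in store else 0


for part in ("porthole", "anchor"):
    print(part,
          depictive_scan_steps(DEPICTION, "motor", part),
          propositional_lookup_steps(PROPOSITIONS, part))
```

Scanning from the motor costs three steps to reach the porthole and five to reach the anchor in the toy depiction, whereas the keyed lookup costs one step for either part.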
Different Mechanisms? The First Phase of the Debate
In this section we will consider a series of experiments that were carried out largely by my colleagues and me; these experiments represent a kind of "case study," illustrating how one can make abstract ideas concrete and how one can grasp a conceptual issue by the horns, so to speak.
We reasoned that one way to discover whether image representations embody space is to see whether it takes more time to shift attention greater distances across an imaged object. If subjects took more time to scan a long distance across an imaged object than to scan a short one, we would have evidence that distance was indeed embodied in the representation of the object.
The first experiment began by asking subjects to memorize a set of drawings (Kosslyn 1973). Half of these drawings were vertical and half were horizontal, as illustrated in figure 32.1. After the subjects had memorized the drawings, they closed their eyes, heard the name of one (say, "speedboat"), and visualized it. Once it was imaged, the subjects were asked to mentally focus ("stare" with the "mind's eye") at one end of the object in the image. Then the name of a possible component of the object (say, "motor") was presented on tape. On half the trials the name labeled part of the drawing, and on the other half it did not. The subjects were asked to "look for" the named component on the imaged object.
Figure 32.1
Examples of the drawings used by Kosslyn (1973) to study image scanning
Figure 32.2
A propositional representation of the drawing of a speedboat illustrated in figure 32.1. The greater the distance between two parts on the drawing, the larger the number of links between them in the network.
An important aspect of this experiment was that the probed parts were either at one end or the other of a drawing or in the middle. The subjects were told that we were interested in how long it took to "see" a feature on an imaged object (the word scan was never mentioned in the instructions), and they pressed the "true" button only after "seeing" the named component and the "false" button only after "looking" but failing to find it.
We reasoned that if image representations depict information, then it ought to take more time to locate the representations of parts located farther from the point of focus. And in fact this is exactly what occurred.
At first glance, the results from this experiment seemed to show that depictive representations are used in imagery. But it soon became clear that a propositional explanation could easily be formulated. Bobrow (personal communication) suggested that the visual appearance of an object is stored in a propositional structure like that illustrated in figure 32.2. This representation is a series of linked hierarchies of propositions, with each hierarchy describing a part of the object. Note that we could rewrite the propositions illustrated here as BOTTOM-OF (PROPELLER, MOTOR), REAR-OF (MOTOR, REAR DECK), and so on. That is, each link is a relation that combines the symbols at the connected nodes into a proposition.
According to Bobrow's theory, people automatically (and unconsciously) construct these sorts of propositional descriptions when asked to memorize the appearance of drawings. When the subjects were asked to focus on one end of the drawing, they would then activate one part of the representation (for instance, for speedboat, the node for motor). When subsequently asked about a part, they then searched the network for its name. The more links they had to traverse through the network before locating the name, the more time it took to respond. For example, for speedboat it took more time to find "anchor" than "porthole" after having been focused on the motor because four links had to be traversed from motor to anchor but only three from motor to porthole. Thus, the effect of "distance" on scanning time may have nothing to do with distance being embodied in an underlying depictive representation but may instead simply reflect the organization of a propositional network (see also Lea 1975). The conscious experience of scanning a pictorial mental image may somehow be produced by processing this network, and the depictive aspects of images open to introspection may simply be epiphenomenal.
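Bobrow's link-counting account can be sketched as a breadth-first search over a small relational network. This is an illustrative reconstruction, not the network in figure 32.2: apart from the two propositions quoted in the text, the nodes, relations, and per-link timing parameters are assumptions, chosen so that the motor is three links from the porthole and four from the anchor, as in the example just given. Breadth-first search is likewise an assumption about how the search proceeds outward from the focused node.

```python
from collections import deque

# Hypothetical propositional network for the speedboat drawing, written as
# RELATION(part, part) triples. Only BOTTOM-OF(PROPELLER, MOTOR) and
# REAR-OF(MOTOR, REAR DECK) come from the text; the "cabin" node, the
# HAS-PART relation, and the remaining links are assumptions chosen so that
# motor -> anchor spans four links and motor -> porthole three.
PROPOSITIONS = [
    ("BOTTOM-OF", "propeller", "motor"),
    ("REAR-OF", "motor", "rear deck"),
    ("REAR-OF", "rear deck", "cabin"),
    ("HAS-PART", "cabin", "porthole"),
    ("REAR-OF", "cabin", "front deck"),
    ("HAS-PART", "front deck", "anchor"),
]


def build_network(propositions):
    """Turn relation triples into an undirected adjacency map (links work both ways)."""
    network = {}
    for _relation, a, b in propositions:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network


def links_to(network, focus, target):
    """Breadth-first search outward from the focused node; return links traversed, or None."""
    frontier = deque([(focus, 0)])
    seen = {focus}
    while frontier:
        node, depth = frontier.popleft()
        if node == target:
            return depth
        for neighbor in network.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return None


def predicted_rt_ms(n_links, base_ms=500.0, per_link_ms=50.0):
    """Hypothetical linear cost: a fixed base time plus a constant time per link."""
    return base_ms + per_link_ms * n_links


NETWORK = build_network(PROPOSITIONS)
for part in ("porthole", "anchor"):
    n = links_to(NETWORK, "motor", part)
    print(part, n, predicted_rt_ms(n))
```

With a constant cost per link, the predicted time for "anchor" exceeds that for "porthole" even though nothing in the network represents distance, which is the heart of the counterexplanation.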
It should now be clear why it was necessary to go into so much detail in characterizing the differences between the types of representations: we need a reasonably precise characterization of the two representations if we are to perform experiments to discriminate between them. According to our characterization, although propositional structures can be formulated to capture the spatial arrangement of the drawings, they are not depictions. Recall that in depictions, in contrast to this sort of propositional representation, the shape of empty space is represented as clearly as the shape of filled space and there is no explicit representation of relations (such as REAR-OF).
The next experiment was designed to eliminate the problem with the first one: there, parts that were farther apart also had more parts between them, so distance was confounded with the number of links that would have to be traversed in a propositional network. In this experiment we independently varied the distance scanned across and the number of items scanned over. The results were straightforward: both distance and amount of material scanned over affected the reaction times. Time increased linearly with increasing distance scanned over, even when the amount of material scanned over was kept constant (for details, see Kosslyn, Ball, and Reiser 1978), as expected if images depict.
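The logic of this design can be sketched by writing each account as a simple linear model. Both models and all parameter values below are hypothetical illustrations, not fitted to the data: the link-count account charges time only per intervening item, while the depictive account also charges time per unit of distance. Holding the number of items constant and varying distance is then the comparison that separates them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    distance_cm: float  # distance scanned across on the image (hypothetical units)
    items_crossed: int  # number of intervening items scanned over


def link_count_rt(trial, base_ms=500.0, per_item_ms=40.0):
    """Propositional link-count account: only intervening items cost time."""
    return base_ms + per_item_ms * trial.items_crossed


def depictive_rt(trial, base_ms=500.0, per_item_ms=40.0, per_cm_ms=15.0):
    """Depictive account: distance costs time even when items are held constant."""
    return (base_ms + per_item_ms * trial.items_crossed
            + per_cm_ms * trial.distance_cm)


# Diagnostic comparison: same number of items, different distances.
near = Trial(distance_cm=2.0, items_crossed=1)
far = Trial(distance_cm=8.0, items_crossed=1)
print("link-count difference:", link_count_rt(far) - link_count_rt(near))
print("depictive difference:", depictive_rt(far) - depictive_rt(near))
```

With these made-up parameters the link-count model predicts no difference between the near and far trials (both cross one item), while the depictive model predicts a 90 ms difference from distance alone.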
The notion of depiction leads us to expect that image representations embody distance in at least two dimensions. To test this idea, we asked subjects to memorize the map illustrated in figure 32.3. On this map were seven objects, which can be combined into 21 distinct pairs. The subjects learned to draw the locations of each of the seven objects on the map. These objects were positioned in such a way that the members of each of the 21 pairs were a different distance apart.
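The combinatorics of the map are easy to check: seven objects yield 7 choose 2 = 21 unordered pairs. The sketch below uses invented integer coordinates and placeholder names "A" to "G" (not the layout or objects of figure 32.3) and verifies that all 21 pairwise distances differ, which is the property the map was designed to have.

```python
import itertools
import math

# Hypothetical map: seven landmarks at integer grid coordinates. This is not
# the layout used in the original study; it was chosen so that no two pairs
# of landmarks are the same distance apart.
LANDMARKS = {
    "A": (0, 0), "B": (3, 1), "C": (7, 0), "D": (2, 6),
    "E": (9, 5), "F": (5, 10), "G": (14, 2),
}


def squared_distance(p, q):
    # Integer arithmetic keeps equality comparisons between distances exact.
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


pairs = list(itertools.combinations(LANDMARKS, 2))  # all unordered pairs
sq_distances = {
    (a, b): squared_distance(LANDMARKS[a], LANDMARKS[b]) for a, b in pairs
}

print(len(pairs), "pairs;", len(set(sq_distances.values())), "distinct distances")
for (a, b), sq in sorted(sq_distances.items(), key=lambda item: item[1]):
    print(f"{a}-{b}: {math.sqrt(sq):.2f} units")
```

Comparing squared distances avoids floating-point ties; the square root is taken only for display.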
As is evident in figure 32.4, time to scan the image increased linearly with increasing distance scanned across. This result is exactly as predicted by the idea that image representations embody distance. But it is possible to create a propositional counterexplanation even here. Now the network contains "dummy nodes" that mark off distance. That is, these nodes convey no information other than the fact that an increment of distance (say, 5 centimeters) exists between one object and another; hence, there would be more nodes between nodes representing parts separated by greater distances on the map.
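The dummy-node proposal can be sketched as follows, assuming (hypothetically) one dummy node for each whole 5-centimeter increment lying strictly between two objects, using the increment mentioned in the text. The number of links then grows with distance, so a search that pays a fixed cost per link would mimic a distance effect without any depictive representation.

```python
import math


def dummy_chain(distance_cm, increment_cm=5.0):
    """Return (dummy_nodes, links) for a hypothetical chain spanning distance_cm.

    One dummy node marks each whole increment lying strictly between the two
    objects, so the number of links is distance / increment rounded up.
    """
    if distance_cm <= 0:
        raise ValueError("distance must be positive")
    links = math.ceil(distance_cm / increment_cm)
    return links - 1, links


for d in (3.0, 10.0, 12.0, 23.0):
    dummies, links = dummy_chain(d)
    print(f"{d:5.1f} cm: {dummies} dummy nodes, {links} links")
```

Note that link counts rise in steps rather than continuously; the finer the assumed increment, the more closely a link-counting search would approximate a linear function of distance.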
By putting enough dummy nodes into a network, the propositional theory developed for the original results can be extended to these results as well.
To attempt to rule out this propositional counterexplanation, we conducted a control experiment, which involved a variation on the map-scanning task. In this experiment subjects again imaged the map and focused their attention on a particular point, but now they were told simply to decide as quickly as possible whether the probe named an object on the map. If the propositional theory is correct, we reasoned, then we should find effects of distance here too; after all, we asked the subjects to form the image (which corresponds to accessing the appropriate network). However, there were absolutely no effects of the distance from the focus point to the target object on response times.
Figure 32.3
A map that was memorized and later imaged and scanned. The seven objects were placed in such a way that the members of each of the 21 pairs were a different distance apart.
Figure 32.4
The time to scan between pairs of objects on an image of the map illustrated in figure 32.3
In other experiments we varied the size of the imaged objects being scanned, asking subjects to adjust the size of an object in the image after they memorized it. Not only did time increase with the distance scanned, but more time was required to scan across larger images. The finding of effects of size on scanning time allows us to eliminate yet another nondepictive explanation for the effects of distance on response times. One could argue that the closer two parts are on an object or drawing, the more likely it is that they will be grouped into a single perceptual "chunk" and stored as a single unit, and hence the easier it will later be to look up two parts in succession. Because the size of the image was not manipulated until after the actual drawing was removed, any such chunking would already have been fixed when the drawing was encoded; this explanation therefore cannot account for the effects of size on scanning time.
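The diagnostic value of the size manipulation can be sketched with two toy models, again with hypothetical parameters. Under a depictive account the distance actually scanned is the map distance multiplied by the image's scale; under the chunking account the number of chunks crossed was fixed when the drawing was encoded, so rescaling the image afterward should leave predicted time unchanged.

```python
def depictive_scan_rt(map_distance, image_scale, base_ms=500.0, per_unit_ms=20.0):
    """Depictive account: the distance actually scanned grows with image size."""
    return base_ms + per_unit_ms * map_distance * image_scale


def chunking_rt(chunks_crossed, _image_scale, base_ms=500.0, per_chunk_ms=60.0):
    """Chunking account: chunks were fixed at encoding, so image scale is ignored."""
    return base_ms + per_chunk_ms * chunks_crossed


for scale in (0.5, 1.0, 2.0):
    print(f"scale {scale}: depictive {depictive_scan_rt(4.0, scale)} ms, "
          f"chunking {chunking_rt(2, scale)} ms")
```

Only the depictive model predicts that the same scan takes longer on a larger image, which is the pattern the size experiments revealed.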