Chapter 33 Tacit Knowledge and "Mental Scanning" Zenon W. Pylyshyn
The Empirical Phenomena: Mental Scanning
In the following I examine some specific claims made about the phenomenon of reasoning with the aid of images. Since the study of mental imagery came back into fashion in the 1960s, hundreds of studies have been published, purporting to show that theories of imagery must make allowances for some fairly special properties, properties not shared by other modes of reasoning.
Beginning in the 1970s, these studies have concentrated on the role of imagery in reasoning and problem solving rather than on imagery as a form of memory or imagery as an intervening variable in experiments on learning. Among the best-known research on imaginal reasoning is that of Roger Shepard and his students (Shepard 1978, Shepard and Cooper 1982) and Steve Kosslyn and his associates. Kosslyn's work has been extensively reported—in numerous papers, in a summary in a review paper by Kosslyn et al. (1979), and in a book (Kosslyn 1980). Because Kosslyn, having developed a detailed computer model of imagery, takes a more theoretical approach than most writers, and because his work is among the most influential of the "pictorialists"—to use Block's (1981) term—most of what follows is directed specifically at claims made by Kosslyn. My intention, however, is not to single out this one piece of research; everything I say applies equally to those "pictorialists" who feel that a special form of representation (often called an analogue medium) is needed to account for various experimental results in imaginal reasoning. It is just that Kosslyn's productivity and the explicitness of his claims make him an excellent spokesman for that approach.
The finding that became the basis for much of Kosslyn's theorizing is the "mental scanning result," used not only to argue that "images preserve distances"1 and that they "depict information in a spatial medium" but also as a way to calibrate "imaginal distance" for such purposes as measuring the visual angle of the "mind's eye" (Kosslyn 1978).
Kosslyn's work has also been cited by Attneave (1974) as one of two results that most clearly demonstrate the analogue nature of the representational medium (the other is the "mental rotation" result that will be mentioned here only in passing). Hence, it seems a good place to start.
The scanning experiment (for example, Kosslyn, Ball, and Reiser 1978) has been done many times, so there are quite a few variants. Here is a typical one. Subjects are asked to memorize a simple map of a fictitious island containing about seven visually distinct places (a beach, church, lighthouse, bridge, and so on), until they can reproduce the map to within a specified tolerance. The subjects are then asked to image the map "in their mind's eye" and focus their attention on one of the places, for example, the church. Then they are told the name of a second place, which might or might not be on the map. They are asked to imagine a spot moving from the first to the second place named (or, in some variants, to "move their attention" to the second place). When the subjects can clearly see the second place on their image, they are to press a "yes" button, or, if the named place is not on the map, the "no" button. The latter condition is usually there only as a foil. The result, which is quite robust when the experiment is conducted over hundreds of trials, shows that the time it takes to make the decision is a linear function of the distance traversed on the map. Because, to go from one point on an imagined map to a second point, the mind's eye apparently scans through intermediate points, theorists conclude that all the intermediate points are on the mental map; hence, the representation is said to be a form of analogue.
This description, though accurate, does justice neither to the range of experiments carried out nor to the extremely intricate, detailed, highly interconnected model used to explain these and other results. I do not take the reader through the details of the model (they are summarized in the review papers and the book already cited), principally because I do not believe the details matter, neither for the main point of my criticism nor for what makes the model attractive to "pictorialists." The important point is that the explanation of the "scanning result" is to be found in the intrinsic properties of the representational medium rather than the tacit knowledge subjects have of the situation they are imagining.
Therefore, it is an instance of explicitly positing a property of the functional architecture to account for a generalization.
Some Preliminary Considerations
In examining what occurs in studies such as the scanning experiment and those discussed in Kosslyn (1980), it is crucial that we note the difference between the following two tasks:
1a. Solve a problem by using a certain prescribed form of representation or a certain medium or mechanism; and
1b. Attempt to re-create as accurately as possible the sequence of perceptual events that would occur if you actually observed a certain real event happening.
The reason this difference is crucial is that substantially different criteria of success apply in the two cases. For example, solving a problem by using a certain representational format does not necessarily entail that various incidental properties of a known situation be considered, let alone simulated. On the other hand, this is precisely what is required of someone solving task 1b. Here, failure to duplicate such conditions as the speed at which an event occurs constitutes failure to perform the task correctly. Take the case of imagining. The task of imagining that something is the case, or considering an imagined situation in order to answer questions about it, does not entail (as part of the specification of the task itself) that it take a particular length of time. On the other hand, the task of imagining that an event actually happens before your eyes does entail, for a successful realization of this task, consideration of as many characteristics of the event as possible, even if they are irrelevant to the discrimination task itself, as well as entailing that you attempt to place them in the correct time relationships.
In discussing how he imaged his music, Mozart claimed: "Nor do I hear in my imagination, the parts successively, but I hear them, as it were, all at once...." (See Mozart's letter, reproduced in Ghiselin 1952, p. 45.) Mozart felt that he could "hear a whole symphony" in his imagination "all at once" and apprehend its structure and beauty.
He must have had in mind a task best described in terms of task 1a. Even the word hear, taken in the sense of having an auditory-like imaginal experience, need entail nothing about the duration of the experience. We can be reasonably certain Mozart did not intend the sense of imagining implied in task 1b, simply because, if what he claimed to be doing was that he imagined witnessing the real event of, say, sitting in the Odeon Conservatoire in Munich and hearing his Symphony Number 40 in G Minor being played with impeccable precision by the resident orchestra under the veteran Kapellmeister, and if he imagined that it was actually happening before him in real time and in complete detail—including the most minute flourishes of the horns and the trills of the flute and oboe, all in the correct temporal relations and durations—he would have taken nearly 22 minutes for the task. If he had taken less time, it would signify only that Mozart had not been doing exactly what he said he was doing; that is, he would not have been imagining that he witnessed the actual event in which every note was being played at its proper duration—or we might conclude that what he had, in fact, been imagining was not a good performance of his symphony. In other words, if it takes n seconds to witness a certain event, then an accurate mental simulation of the act of witnessing the same event should also take n seconds, simply because how well the latter task is performed, by definition, depends on the accuracy with which it mimics various properties of the former task. On the other hand, the same need not apply merely to the act of imagining that the event has a certain set of properties, that is, imagining a situation to be the case but without the additional requirements specified in the 1b version of the task.
These are not empirical assertions about how people imagine and think; they are merely claims about the existence of two distinct, natural interpretations of the specification of a certain task.
In applying this to the case of mental scanning, we must be careful to distinguish between the following two tasks, which subjects might set themselves:
2a. Using a mental image, and focusing your attention on a certain object in the image, decide as quickly as possible whether a second named object is present elsewhere in that image; or
2b. Imagine yourself in a certain real situation in which you are viewing a certain scene and are focusing directly on a particular object in that scene. Now imagine that you are looking for (scanning toward, glancing up at, seeing a speck moving across the scene toward) a second named object in the scene. When you succeed in imagining yourself finding (and seeing) the object (or when you see the speck arrive at the object), press the button.
The relevant differences between 2a and 2b should be obvious. As in the preceding examples, the criteria of successful completion of the task are different in the two cases. In particular, task 2b includes, as part of its specification, such requirements as that subjects attempt to imagine various intermediate states (corresponding to those they believe would be passed through in actually carrying out the corresponding real task), and that they spend more time visualizing those episodes they believe (or infer) would take more time in the corresponding, real task (perhaps because they recall how long it once took, or because they have some basis for predicting how long it would take). Clearly, the latter conditions are not part of the specification of task 2a, as there is nothing about task 2a which requires that such incidental features of the visual task be considered in answering the question. In the words of Newell and Simon (1972), the two tasks have quite different "task demands."
To demonstrate that subjects actually carry out task 2b in the various studies reported by Kosslyn (and, therefore, that the proper explanation of the findings should appeal to subjects' tacit knowledge of the depicted situation rather than to properties of their imaginal medium), I shall attempt to establish several independent points.
First, it is independently plausible that the methods used in experiments reported in the literature should be inviting subjects to carry out task 2b rather than task 2a, and that, in fact, this explanation has considerable generality and can account for a variety of imaginal phenomena. Second, independent experimental evidence exists showing that subjects can, indeed, be led to carry out task 2a rather than 2b, and that when they do, the increase in reaction time with increase in imagined distance disappears. Finally, I consider several objections raised to the "tacit-knowledge" explanation, principally, cases in which subjects appear to have no knowledge of how the results would have turned out in the visual case. I then consider a number of interesting, important cases, possibly not explained by the tacit-knowledge view, in which subjects combine visual and imaginal information by, for example, superimposing images on the scene they are examining visually. I argue that these do not bear on the question under debate—namely, the necessity of postulating a special, noninferential (and noncomputational) mechanism in order to deal with the imagistic mode of reasoning.
Task Demands of Imagery Experiments
With respect to the first point, all published studies of which I am aware, in which larger image distances led to longer reaction times, used instructions that explicitly required subjects to imagine witnessing the occurrence of a real physical event. In most scanning experiments subjects are asked to imagine a spot moving from one point to another, although, in a few experiments (for example, Kosslyn 1973; Kosslyn, Ball, and Reiser 1978, experiment 4), they are asked to imagine "shifting their attention" or their "glance" from one imagined object to another in the same imagined scene. In each case, what subjects were required to imagine was a real, physical event (because such terms as move and shift refer to physical processes) about whose duration they would clearly have some reasonable, though sometimes only tacit, knowledge. For example, the subjects would know implicitly that, for instance, it takes a moving object longer to move through a greater distance, that it takes longer to shift one's attention through greater distances (both transversely and in depth).
It is important to see that what is at issue is not a contamination of results by the sort of experimental artifact psychologists refer to as "experimenter demand characteristics" (see, for example, Rosenthal and Rosnow 1969) but simply a case of subjects solving a task as they interpret it (or as they choose to interpret it, for one reason or another) by bringing to bear everything they know about a class of physical events, events they take to be those they are to imagine witnessing. If the subjects take the task to be that characterized as 2b, they will naturally attempt to reproduce a temporal sequence of representations corresponding to the sequence they believe will arise from actually viewing the event of scanning across a scene or seeing a spot move across it. Thus, beginning with the representation corresponding to "imagining seeing the initial point of focus," the process continues until a representation is reached that corresponds to "imagining seeing the named point." According to this view there is no need to assume that what is happening is that the imaging process continues until the occurrence of a certain imagined state is independently detected (by the mind's eye), say, because a certain "visual" property is noticed. The process could just as plausibly proceed according to a rhythm established by some independent psychophysical mechanism that paces the time between each viewpoint imagined, according to the speed the subject sets for the mental scanning. (We know such mechanisms exist, since subjects can generate time intervals corresponding to known magnitudes with even greater reliability than they can estimate them; see Fraisse 1963.) Neither is it required that the process consist of a discrete sequence—all that is required is that there be psychophysical mechanisms for estimating and creating both speeds and time intervals. My point here
is simply that the skill involved does not necessarily have anything to do with properties specific to a medium of visual imagery.
For the purpose of this account of the scanning results, we need assume little or nothing about intrinsic constraints on the process or about the form of the sequence of representations generated. It could be that the situation here is like that where a sequence of numbers is computed in conventional digital manner and displayed in analogue form. In that example, I claim that positing an analogue representation is theoretically irrelevant. A similar point applies here. We might, for instance, simply have a sequence consisting of a series of representations of the scene, each with a different location singled out in some manner. In that case, the representation's form is immaterial as far as the data at hand are concerned. For example, we could view the representations as a sequence of beliefs whose contents are something like that the spot is now here, and now it is there—where the locative demonstratives are pointers to parts of the symbolic representations being constructed and updated.
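The point about a sequence of beliefs can be made concrete with a small sketch. It is offered only as an illustration under stated assumptions, not as a model anyone has proposed: the function name `imagined_scan`, the parameter `believed_speed`, and the step size are all hypothetical. The representation here is purely symbolic (a list of "the spot is now here" records), and the elapsed time is set by what the simulated subject believes about how long a moving spot would take, not by any property of a spatial medium.

```python
import math

def imagined_scan(start, goal, believed_speed, step=1.0):
    """Generate a symbolic 'the spot is now here' sequence from start to goal.

    believed_speed is a hypothetical stand-in for the subject's tacit
    knowledge of how fast the imagined spot moves (map units per second).
    Returns the list of (location, believed_time) records and the total time.
    """
    distance = math.dist(start, goal)
    n_steps = max(1, math.ceil(distance / step))
    beliefs = []
    elapsed = 0.0
    for i in range(1, n_steps + 1):
        frac = i / n_steps
        here = (start[0] + frac * (goal[0] - start[0]),
                start[1] + frac * (goal[1] - start[1]))
        # The time stamp comes from belief about duration, not from the format.
        elapsed = frac * distance / believed_speed
        beliefs.append((here, elapsed))
    return beliefs, elapsed

# Two scans from the same origin: the farther place takes proportionally longer.
_, near_time = imagined_scan((0.0, 0.0), (3.0, 4.0), believed_speed=5.0)
_, far_time = imagined_scan((0.0, 0.0), (6.0, 8.0), believed_speed=5.0)
```

In this sketch the total time grows linearly with distance simply because the belief about speed says it should; changing `believed_speed` changes the slope without changing the form of the representation.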
Although the sequence almost certainly is more complex than I have described it, we need not assume that it is constrained by a special property of the representational medium—as opposed simply to being governed by what subjects believe or infer about likely intermediate stages of the event being imagined and about the relative times at which they will occur. Now, such beliefs and inferences obviously can depend on anything the subject might tacitly know or believe concerning what usually happens in corresponding perceptual situations. Thus the sequence could, in one case, depend on tacit knowledge of the dynamics of physical objects, and, in another, on tacit knowledge of some aspects of eye movements or what happens when one must "glance up" or refocus on an object more distant, or even on tacit knowledge of the time required to notice or recognize certain kinds of visual patterns. For example, I would not be surprised, for this reason, to find that it took subjects longer to imagine trying to see something in dim light or against a camouflage background.
The Generality of the "Tacit Knowledge" View
The sort of "tacit knowledge" view I have been discussing has considerable generality in explaining various imagery research findings, especially when we take into account the plausibility that subjects are actually attempting to solve a problem of type 1b. For instance, the list of illustrative examples presented at the beginning of this chapter clearly shows that, to imagine the episode of "seeing" certain physical events, one must have access to tacit knowledge of physical regularities. In some cases, it even seems reasonable that one needs an implicit theory, since a variety of related generalizations must be brought to bear to correctly predict what some imagined process will do (for example, the sugar solution or the color filter case). In other cases, the mere knowledge or recollection that certain things typically happen in certain ways, and that they take certain relative lengths of time, suffices.
Several other findings, allegedly revealing properties of the mind's eye, might also be explainable on this basis, including the finding (Kosslyn 1975) that it takes longer to report properties of objects when the objects are imagined as being small. Consider that the usual way to inspect an object is to take up a viewing position at some convenient distance from the object that depends on the object's size and, in certain cases, other things as well (for example, consider imagining a deadly snake or a raging fire). So long as we have a reasonably good idea of the object's true size, we can imagine viewing it at the appropriate distance. Now, if someone told me to imagine an object as especially small, I might perhaps think of myself as being farther away or as seeing it through, say, the wrong end of a telescope. If I were then asked to do something, such as report some properties of the object, and if the instructions were to imagine that I could see the property I was reporting (which was the case in the experiments reported), or even if, for some obscure reason, I simply chose to make that my task, I would naturally try to imagine the occurrence of some real sequence of events in which I went from seeing the object as small to seeing it as large enough for me to easily discern details (that is, I probably would take the instructions as indicating I should carry out task 1b). In that case, I probably would imagine something that, in fact, is a plausible visual event, such as a zooming-in sequence (indeed, that is what many of Kosslyn's subjects reported). If such were the case, we would expect the time relations to be as they are actually observed.
Although this account may sound similar to the one given by some analogue theorists (for example, Kosslyn 1975), from a theoretical standpoint, there is one critical difference. In my account, no appeals need be made to knowledge-independent properties of the functional architecture, especially not to geometrical properties. No doubt, the architecture—what I have been calling the representational medium—has some relevant, intrinsic properties that restrict how things can be represented. These properties, however, appear to play no role in accounting for any phenomena we are considering. These phenomena can be viewed as arising from (a) subjects' tacit knowledge of how, in reality, things typically happen, and (b) subjects' ability to carry out such psychophysical tasks as generating time intervals that correspond to inferred durations of certain possible, physical events. This is not to deny the importance of different forms of representation, of certain inferential capacities, or of the nature of the underlying mechanisms; I am merely suggesting that these findings do not necessarily tell us anything about such matters.
Everyone intuitively feels that the visual image modality (format, or medium) severely constrains both the form and the content of potential representations; at the same time, it is no easy matter to state exactly what these constraints are (the informal examples already given should at least cast suspicion on the validity of such intuitions in general). For instance, it seems clear that we cannot image every object whose properties we can describe; this lends credence to the view that images are more constrained than descriptions. While it is doubtless true that imagery, in some sense, is not as flexible as such discursive symbol systems as language, it is crucial that we know the nature of this constraint if we are to determine whether it is a constraint imposed by the medium or is merely a habitual way of doing things or is related to our understanding of what it means to image something. It might even be a limitation attributable to the absence of certain knowledge or a failure to draw certain inferences. Once again, I would argue that we cannot tell a priori whether certain patterns which arise when we use imagery ought to be attributed to the character of the biological medium of representation (the analogue view), or whether they should be attributed to the subject's possession and use, either voluntary or habitual, of certain tacit knowledge.
Consider the following proposals made by Kosslyn et al. (1979) concerning the nature of the constraints on imagery. The authors take such constraints to be given by the intrinsic nature of the representational medium, suggesting that what they call the "surface display" (a reference to their cathode ray tube proto-model) gives imagery certain fixed characteristics. For example, they state,
We predict that this component will not allow cognitive penetration: that a person's knowledge, beliefs, intentions, and so on will not alter the spatial structure that we believe the display has. Thus we predict that a person cannot at will make his surface display four-dimensional, or non-Euclidean.... (Kosslyn et al. 1979, p. 549)
It does seem true that one cannot image a four-dimensional or non-Euclidean space; yet the very oddness of the supposition that we could do so should make us suspicious as to the reason. To understand why little can be concluded from this, let us suppose a subject insists that he or she could image a non-Euclidean space. Suppose further that mental scanning experiments are consistent with the subject's claim (for example, the scan time conforms to, say, a city block metric). Do we believe this subject, or do we conclude that what the subject really does is "simulate such properties in imagery by filling in the surface display with patterns of a certain sort, in the same way that projections of non-Euclidean surfaces can be depicted on two-dimensional Euclidean paper" (Kosslyn et al. 1979, p. 547)?
We, of course, conclude the latter. The reason we do so is exactly the same as that given for discounting a possible interpretation of what Mozart meant in claiming to be able to imagine a whole symphony "at once." That reason has to do solely with the implications of a particular sense of the phrase "imagine a symphony"—namely, that the task-1b sense demands that certain conditions be fulfilled. If we transpose this to the case of the spatial property of visual imagery, we can see that it is also the reason why the notion of imagining four-dimensional space in the sense of task 1b is incoherent. The point is sufficiently central that it merits a brief elaboration.
Let us first distinguish, as I have been insisting we should, between two senses of "imagining." The first sense of imagining (call it "imagine_think X") means to think of X or to consider the hypothetical situation that X is the case or to mentally construct a symbolic model or a "description" of a "possible world" in which X is the case. The second sense of imagining (call this "imagine_see X") means to imagine that you are seeing X or that you observe the actual event X as it occurs. Then the reason for the inadmissibility of four-dimensional or non-Euclidean imaginal space becomes clear, as does its irrelevance to the question of what properties an imaginal medium has. The reason we cannot imagine_see such spaces is that they are not the sort of thing that can be seen. Our inability to imagine_see such things has nothing to do with intrinsic properties of a "surface display" but, instead, with lack of a certain kind of knowledge: We do not know what it is like to see such a thing. For example, we have no idea what configuration of light and dark contours would be necessary, what visual features would need to appear, and so on. Presumably for similar reasons, congenitally color-blind people cannot imagine_see a colored scene, in which case, it would hardly seem appropriate to attribute this failure to a defect in their "surface display." On the other hand, we do know, in nonvisual (that is, nonoptical) terms, what a non-Euclidean space is like, hence we might still be able to imagine_think there being such a space in reality (certainly, Einstein did) and thus solve problems about it. Perhaps, given sufficient familiarity with the facts of such spaces, we could even produce mental scanning results in conformity with non-Euclidean geometries.
There have been frequent reports of people who claim to have an intuitive grasp of four-dimensional space in the sense that they can, for instance, mentally rotate a four-dimensional tesseract and imagine_see its three-dimensional projection from a new four-dimensional orientation (Hinton 1906 provides an interesting discussion of what is involved). If this were true, these people might be able to do a four-dimensional version of the Shepard mental rotation task.
If one drops all talk about the geometry (that is, the "spatial character") of the display and considers the general point regarding the common conceptual constraints imposed on vision and imagery, there can be no argument: something is responsible for the way
in which we cognize the world. Whatever that something is probably also explains both the way we see the world and how we image it. But that's as far as we can go. From this, we can no more draw conclusions about the geometry, topology, or other structural property of a representational medium than we can about the structure of a language by considering the structure of things that can be described in that language. There is no reason for believing that the relation is anything but conventional—which is precisely what the doctrine of functionalism claims (and what most of us implicitly believe).
The distinction between the two senses of imagine we discussed also serves to clarify why various empirical findings involving imagery tend to occur together. For example, Kosslyn et al. (1979), in their response section, provide a brief report on a study by Kosslyn, Jolicoeur, and Fliegel which shows that when stimuli are sorted according to whether subjects tend to visualize them in reporting certain of their properties, that is, whether subjects typically imagine_see them in such tasks, then it is only those stimulus-property pairs that are classified as mental image evokers that yield the characteristic reaction time functions in mental scanning experiments. This is hardly surprising, since anything that leads certain stimuli habitually to be processed in the imagine_see mode will naturally tend to exhibit numerous other characteristics associated with imagine_see processing, including scanning time results and such phenomena as the "visual angle of the mind's eye" or the relation between latency and imagined size of objects (see the summary in Kosslyn et al. 1979). Of course, nobody knows why certain features of a stimulus or a task tend to elicit the imagine_see habit, nor why some stimuli should do so more than others; but that is not a problem that distinguishes the analogue from the tacit knowledge view.
Some Empirical Evidence
Finally, it may be useful to consider some provisional evidence suggesting that subjects can be induced to use their visual image to perform a task such as 2a in a way that does not entail imagining oneself observing an actual sequence of events. Recall that the question is whether mental scanning effects (that is, the linear relation between time and distance) should be viewed as evidence for an intrinsic property of a representational medium or as evidence for, say, people's tacit knowledge of geometry and dynamics, as well as their understanding of the task. If the former interpretation is the correct one, then it must not merely be the case that people usually take longer to retrieve information about more distant objects in an imagined scene. That could arise, as already noted, merely from some habitual or preferred way of imagining or from a preferred interpretation of task demands. If the phenomenon is due to an intrinsic property of the imaginal medium, it must be a necessary consequence of using this medium; that is, the linear (or, at least, the monotonic) relation between time and distance represented must hold whenever information is accessed through the medium of imagery.
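To fix ideas about what the linear claim amounts to, the relation can be written as reaction time = intercept + slope × distance, and the slope can be estimated by ordinary least squares. The sketch below is illustrative only: the function name `fit_line` is hypothetical, and both sets of numbers are invented to show the two patterns at issue (a distance effect and its absence); they are not data from any study cited here.

```python
def fit_line(distances, times):
    """Ordinary least-squares fit of times = intercept + slope * distances."""
    n = len(distances)
    mean_d = sum(distances) / n
    mean_t = sum(times) / n
    sxx = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((d - mean_d) * (t - mean_t) for d, t in zip(distances, times))
    slope = sxy / sxx
    intercept = mean_t - slope * mean_d
    return intercept, slope

# Invented, illustrative numbers only (map distance in cm, time in seconds).
distances_cm = [2.0, 4.0, 6.0, 8.0, 10.0]
scan_like_times = [1.0 + 0.1 * d for d in distances_cm]  # time grows with distance
flat_times = [1.5 for _ in distances_cm]                 # time unrelated to distance

scan_intercept, scan_slope = fit_line(distances_cm, scan_like_times)
flat_intercept, flat_slope = fit_line(distances_cm, flat_times)
```

On the analogue view the slope should be positive whenever the image is consulted; on the tacit knowledge view a slope near zero is possible when the task does not call for imagining a physical traversal.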
As it happens, there exists a strong preference for interpreting tasks involving doing something imaginally as tasks of type 1b—that is, as requiring one to imagine_see an actual, physically realizable event happening over time. In most mental scanning cases it is the event of moving one's attention from place to place, or of witnessing something moving between two points. It could also involve imagining such episodes as drawing or extrapolating a line, and watching its progression. The question remains, however: Must a subject imagine such a physically realizable event in order to access information from an image or, more precisely, to produce an answer which the subject claims is based on an examination of the image?
A number of studies have been carried out in my laboratory suggesting that conditions can be set up which enable a subject to use an image to access information without having to imagine the occurrence of a particular, real-life, temporal event. That is, the subject can be induced to imagine_think rather than imagine_see. For purposes of illustration, I mention two of these studies. The design of the experiments follows closely that of experiments reported in Kosslyn, Ball, and Reiser (1978). (See Pylyshyn 1981, for additional details, and Bannon 1981, for all the details of the design and analysis.) The subjects were required to memorize a map containing approximately seven visually distinct places (a church, castle, beach, and so on). Then they were asked to image the map in front of them and focus their attention on a particular named place, while keeping the rest of the map in view in their mind's eye. We then investigated various conditions under which the subjects were given different instructions concerning what to do next, all of which (a) emphasized that the task was to be carried out exclusively by consulting their image, and (b) required them to notice, on cue, a second named place on the map and to make some discriminatory response with respect to that place as quickly and as accurately as possible.
So far this description of the method is identical to that of the experiments by Kosslyn, Ball, and Reiser (1978). Indeed, when we instructed subjects to imagine a speck moving from the place of initial focus to another, named place, we obtained the same strongly linear relation between distance and reaction time as did Kosslyn, Ball, and Reiser. When, however, the instructions specified merely that subjects give the compass bearing of the second place (that is, state whether the second place was north, northeast, east, southeast, and so on of the first), there was no relation between distance and reaction time. Similar results have also been obtained since by Finke and Pinker (1982).
These results suggest that it is possible to arrange a situation in which subjects use their image to retrieve information, yet where they do not feel compelled to imagine the event of scanning their attention between the two points, that is, to imagine_see. While this result is suggestive, it is by no means compelling, since it lacks controls for a number of alternative explanations. In particular, because a subject must, in any case, know the bearing of the second place on the map before scanning to it (even in Kosslyn's experiments), we might, for independent reasons, wish to claim that in this experiment the relative bearing of pairs of points on the map was retrieved from a symbolic, as opposed to an imaginal, representation, despite subjects' insistence that they did use their images in making judgments. Whereas this tends to weaken the imagery story somewhat (because it allows a crucial spatial property to be represented off the display and thus raises the question, Why not represent other spatial properties this way?, and because it discounts subjects' reports of how they were carrying out the task in this case while accepting such reports in other comparable situations), nonetheless, it is a possible avenue of retreat.
Consequently, another instructional condition was investigated, one aimed at making it more plausible to believe that subjects had to consult their image in order to make the response, while at the same time making it more compelling that they be focused on the second place and mentally "see" both the original and the second place at the time of the response. The only change in instructions made for this purpose was explicitly to require subjects to focus on the second place after they heard its name (for example, church) and, using it as the new origin, give the orientation of the first place (the place initially focused on) relative to the second. Thus the instructions strongly emphasized the necessity of focusing on the second place and the need actually to see both places before making an orientation judgment. Subjects were not told how to get to the second place from the first, only to keep the image before their mind's eye and use the image to read off the correct answer. In addition, for reasons to be mentioned, the same experiment was run (using a different group of subjects) entirely in the visual modality; thus, instead of having to image the map, subjects could actually examine the map in front of them.
What we found was that in the visual condition there is a significant correlation between response time (measured from the presentation of the name of the second place) and the distance between places, whereas in the imaginal condition no such relation holds. These results indicate clearly that even though the linear relation between distance and time (the "scanning phenomenon") is a frequent concomitant of imaging a transition between "seeing" two places on an image, it is not a necessary consequence of using the visual imagery modality, as it is in the case of actual visual perception; consequently, the linear reaction-time function is not due to an intrinsic (hence, knowledge- and goal-independent) property of the representational medium for visual images.
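The logic of this comparison can be made concrete with a minimal sketch in Python. It assumes that each condition yields one mean reaction time per inter-place distance, and it computes the Pearson correlation and least-squares slope of reaction time on distance. The function names and all of the numbers are invented purely for illustration; they are not data from the experiments described here or from Kosslyn, Ball, and Reiser (1978).

```python
from math import sqrt


def _sums(xs, ys):
    """Return the centered sums of squares and cross-products."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def pearson_r(xs, ys):
    """Pearson correlation between distance and reaction time."""
    sxx, syy, sxy = _sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def slope(xs, ys):
    """Least-squares slope: added reaction time per unit of distance."""
    sxx, _, sxy = _sums(xs, ys)
    if sxx == 0:
        raise ValueError("slope undefined when all distances are equal")
    return sxy / sxx


# HYPOTHETICAL values, chosen only to show the two patterns.
distances_cm = [2, 4, 6, 8, 10]
rt_speck_s = [1.1, 1.3, 1.5, 1.7, 1.9]    # "imagine a moving speck" pattern
rt_bearing_s = [1.4, 1.2, 1.5, 1.2, 1.4]  # "report compass bearing" pattern

if __name__ == "__main__":
    print("speck:   r =", round(pearson_r(distances_cm, rt_speck_s), 3))
    print("bearing: r =", round(pearson_r(distances_cm, rt_bearing_s), 3))
```

On the argument of the text, a medium with intrinsic spatial extent would force the first pattern (a reliable positive slope) under every instruction, so observing the second pattern under some instructions is what counts against that reading.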
Such experiments demonstrate that image examination is unencumbered by at least one putative constraint of the "surface display" postulated by Kosslyn and others. Further, it is reasonable to expect other systematic relations between reaction time and image properties to disappear when appropriate instructions are given that are designed to encourage subjects to interpret the task as in 1a instead of 1b. For example, if subjects could be induced to generate what they consider small but highly detailed, clear images, the effect of image size on time to report the presence of features (Kosslyn, 1975) might disappear as well. There is evidence from one of Kosslyn's own studies that this might be the case. In a study reported in Kosslyn, Reiser, Farah, and Fliegel (1983), the time to retrieve information from images was found to be independent of image size. From the description of this experiment, it seems that a critical difference between it and earlier experiments (Kosslyn, 1975), in which an effect of image size was found, is that, here, subjects had time to study the actual objects, with instructions to practice generating equally clear images of each object. The subjects were also tested with the same instructions, which, I assume, encouraged them to entertain equally detailed images at all sizes.
Thus it seems possible that, when subjects are encouraged to make available detailed information, they can put as fine a grain of detail as desired into their imaginal constructions, though, presumably, the total amount of information in the image remains limited along some dimension, if not the dimension of resolution. Unlike the case of real vision, such imaginal vision need not be limited by problems of grain or resolution or any other difficulty associated with making visual discriminations. As I have remarked, subjects can exhibit some of the behavioral characteristics associated with such limitations (for example, taking longer to recall fine details); but that may be because the subjects know what real vision is like and are simulating it as best they can rather than because of the intrinsic nature of the imaginal medium.
Note
1. This claim is worded differently at different times, depending on how careful Kosslyn is. Thus, in Kosslyn et al. (1979), it is put two different ways in two consecutive sentences. In the first, the authors claim that "these results seem to indicate that images do represent metrical distance"; in the second, they take the more radical approach, claiming that "images have spatial extent" (Kosslyn et al., p. 537). I contend that this vacillation between representing and having is no accident. Indeed, the attraction of the theory (what appears to give it a principled explanation) is the strong version, that images "have spatial extent"; but the only one that can be defended is the weaker version, a version, in fact, indistinguishable from the tacit knowledge view I have been advocating. Computerization of the theory does not remove the equivocation: There are still two options on how to interpret the simulation, either as a simulation of an analogue or a surface with "spatial extent," or as a simulation of the knowledge the subject possesses about space.
References
Attneave, F. 1974. "How Do You Know?", American Psychologist 29:493-499.
Bannon, L. J. 1981. "An Investigation of Image Scanning: Theoretical Claims and Empirical Evidence," Ph.D. diss., University of Western Ontario. Ann Arbor, Mich.: University Microfilms, no. 81-50,599.
Block, N. J., ed. 1981. Imagery. Cambridge, Mass.: MIT Press, a Bradford Book.
Finke, R. A., and S. Pinker. 1982. "Spontaneous Imagery Scanning in Mental Extrapolation," Journal of Experimental Psychology: Learning, Memory, and Cognition 8:2:142-147.
Fraisse, P. 1963. The Psychology of Time. New York: Harper & Row.
Ghiselin, B. 1952. The Creative Process. New York: New American Library.
Kosslyn, S. M. 1973. "Scanning Visual Images: Some Structural Implications," Perception and Psychophysics 14:90-94.
Kosslyn, S. M. 1975. "The Information Represented in Visual Images," Cognitive Psychology 7:341-370.
Kosslyn, S. M. 1978. "Measuring the Visual Angle of the Mind's Eye," Cognitive Psychology 10:356-389.
Kosslyn, S. M. 1980. Image and Mind. Cambridge, Mass.: Harvard Univ. Press.
Kosslyn, S. M., B. J. Reiser, M. J. Farah, and L. Fliegel. 1983. "Generating Visual Images: Units and Relations," Journal of Experimental Psychology: General 112:2:278-303.
Kosslyn, S. M., T. M. Ball, and B. J. Reiser. 1978. "Visual Images Preserve Metric Spatial Information: Evidence from Studies of Image Scanning," Journal of Experimental Psychology: Human Perception and Performance 4:46-60.
Kosslyn, S. M., S. Pinker, G. Smith, and S. P. Shwartz. 1979. "On the Demystification of Mental Imagery," Behavioral and Brain Sciences 2:4:535-548.
Newell, A., and H. A. Simon. 1972. Human Problem Solving. Englewood Cliffs, N.J.: Prentice-Hall.
Pylyshyn, Z. W. 1981. "The Imagery Debate: Analogue Media versus Tacit Knowledge," Psychological Review 88:16-45.
Rosenthal, R., and R. L. Rosnow, eds. 1969. Artifact in Behavioral Research. New York: Academic Press.
Shepard, R. N., and L. A. Cooper. 1982. Mental Images and Their Transformations. Cambridge, Mass.: MIT Press, a Bradford Book.