Chapter 41 Connectionism and Cognitive Architecture: A Critical Analysis Jerry A. Fodor and Zenon W. Pylyshyn
1 Introduction
Connectionist or PDP models are catching on. There are conferences and new books nearly every day, and the popular science press hails this new wave of theorizing as a breakthrough in understanding the mind (a typical example is the article in the May issue of Science 86, called "How we think: A new theory").
There are also, inevitably, descriptions of the emergence of Connectionism as a Kuhnian "paradigm shift". (See Schneider, 1987, for an example of this and for further evidence of the tendency to view Connectionism as the "new wave" of Cognitive Science.)
The fan club includes the most unlikely collection of people. Connectionism gives solace both to philosophers who think that relying on the pseudoscientific intentional or semantic notions of folk psychology (like goals and beliefs) misleads psychologists into taking the computational approach (e.g., P. M. Churchland, 1981; P. S. Churchland, 1986; Dennett, 1986); and to those with nearly the opposite perspective, who think that computational psychology is bankrupt because it doesn't address issues of intentionality or meaning (e.g., Dreyfus & Dreyfus, in press). On the computer science side, Connectionism appeals to theorists who think that serial machines are too weak and must be replaced by radically new parallel machines (Fahlman & Hinton, 1986), while on the biological side it appeals to those who believe that cognition can only be understood if we study it as neuroscience (e.g., Arbib, 1975; Sejnowski, 1981). It is also attractive to psychologists who think that much of the mind (including the part involved in using imagery) is not discrete (e.g., Kosslyn & Hatfield, 1984), or who think that cognitive science has not paid enough attention to stochastic mechanisms or to "holistic" mechanisms (e.g., Lakoff, 1986), and so on and on.
It also appeals to many young cognitive scientists who view the approach as not only anti-establishment (and therefore desirable) but also rigorous and mathematical (see, however, note 2). Almost everyone who is discontent with contemporary cognitive psychology and current "information processing" models of the mind has rushed to embrace "the Connectionist alternative".
When taken as a way of modeling cognitive architecture, Connectionism really does represent an approach that is quite different from that of the Classical cognitive science that it seeks to replace. Classical models of the mind were derived from the structure of Turing and Von Neumann machines. They are not, of course, committed to the details of these machines as exemplified in Turing's original formulation or in typical commercial computers; only to the basic idea that the kind of computing that is relevant to understanding cognition involves operations on symbols (see Fodor, 1976, 1987; Newell, 1980, 1982; Pylyshyn, 1980, 1984a, b). In contrast, Connectionists propose to design systems that can exhibit intelligent behavior without storing, retrieving, or otherwise operating on structured symbolic expressions. The style of processing carried out in such models is thus strikingly unlike what goes on when conventional machines are computing some function.
Connectionist systems are networks consisting of very large numbers of simple but highly interconnected "units". Certain assumptions are generally made both about the units and the connections: Each unit is assumed to receive real-valued activity (either excitatory or inhibitory or both) along its input lines. Typically the units do little more than sum this activity and change their state as a function (usually a threshold function) of this sum. Each connection is allowed to modulate the activity it transmits as a function of an intrinsic (but modifiable) property called its "weight". Hence the activity on an input line is typically some non-linear function of the state of activity of its sources.
The behavior of the network as a whole is a function of the initial state of activation of the units and of the weights on its connections, which serve as its only form of memory.
Numerous elaborations of this basic Connectionist architecture are possible. For example, Connectionist models often have stochastic mechanisms for determining the level of activity or the state of a unit. Moreover, units may be connected to outside environments. In this case the units are sometimes assumed to respond to a narrow range of combinations of parameter values and are said to have a certain "receptive field" in parameter-space. These are called "value units" (Ballard, 1986). In some versions of Connectionist architecture, environmental properties are encoded by the pattern of states of entire populations of units. Such "coarse coding" techniques are among the ways of achieving what Connectionists call "distributed representation".1 The term 'Connectionist model' (like 'Turing Machine' or 'Von Neumann machine') is thus applied to a family of mechanisms that differ in details but share a galaxy of architectural commitments. We shall return to the characterization of these commitments below.
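The description of a unit given above can be made concrete with a minimal sketch. It is not drawn from any particular Connectionist model: it assumes binary unit states, a hard threshold, and synchronous updating, and the names unit_state and step, together with the convention that weight_matrix[i][j] is the weight on the connection from unit j to unit i, are illustrative assumptions of ours.

```python
from typing import List, Sequence


def unit_state(inputs: Sequence[float], weights: Sequence[float],
               threshold: float = 0.0) -> int:
    """Sum the weighted activity on the input lines and apply a threshold.

    Positive weights act as excitatory connections, negative weights as
    inhibitory ones. Returns 1 if the sum exceeds the threshold, else 0.
    """
    if len(inputs) != len(weights):
        raise ValueError("each input line needs exactly one weight")
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


def step(states: Sequence[int], weight_matrix: Sequence[Sequence[float]],
         thresholds: Sequence[float]) -> List[int]:
    """Update every unit at once from the current states (assumed synchronous)."""
    return [unit_state(states, row, t)
            for row, t in zip(weight_matrix, thresholds)]


# Hypothetical two-unit network: unit 0 sustains itself and excites unit 1.
W = [[1.0, 0.0],
     [1.0, 0.0]]
print(step([1, 0], W, [0.5, 0.5]))
```

Nothing persists from one step to the next except the unit states and the weights, which is the sense in which the weights serve as the network's only form of memory.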
Connectionist networks have been analysed extensively, in some cases using advanced mathematical techniques.2 They have also been simulated on computers and shown to exhibit interesting aggregate properties. For example, they can be "wired" to recognize patterns, to exhibit rule-like behavioral regularities, and to realize virtually any mapping from patterns of (input) parameters to patterns of (output) parameters, though in most cases multiparameter, multi-valued mappings require very large numbers of units. Of even greater interest is the fact that such networks can be made to learn; this is achieved by modifying the weights on the connections as a function of certain kinds of feedback (the exact way in which this is done constitutes a preoccupation of Connectionist research and has led to the development of such important techniques as "back propagation").
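As an illustration of what it means to learn by modifying weights as a function of feedback, here is a minimal sketch of a single threshold unit trained with the simple perceptron error-correction rule. This is deliberately not back propagation, which applies to multi-layer networks; the function names, the learning rate, and the choice of the OR mapping as training data are our assumptions.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]


def predict(weights: Sequence[float], bias: float,
            inputs: Sequence[float]) -> int:
    """Threshold unit: output 1 if the weighted sum plus bias is positive."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train(samples: Sequence[Sample], epochs: int = 10,
          rate: float = 1.0) -> Tuple[List[float], float]:
    """Adjust weights in proportion to the error signal (the feedback)."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Each weight moves only if its input line was active.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Hypothetical training set: the logical OR of two binary inputs.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train(OR_SAMPLES)
print([predict(weights, bias, x) for x, _ in OR_SAMPLES])
```

What the unit has learned is stored nowhere except in the final values of its weights and bias.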
In short, the study of Connectionist machines has led to a number of striking and unanticipated findings; it's surprising how much computing can be done with a uniform network of simple interconnected elements. Moreover, these models have an appearance of neural plausibility that Classical architectures are sometimes said to lack. Perhaps, then, a new Cognitive Science based on Connectionist networks should replace the old Cognitive Science based on Classical computers. Surely this is a proposal that ought to be taken seriously: if it is warranted, it implies a major redirection of research.
Unfortunately, however, discussions of the relative merits of the two architectures have thus far been marked by a variety of confusions and irrelevances. It's our view that when you clear away these misconceptions what's left is a real disagreement about the nature of mental processes and mental representations. But it seems to us that it is a matter that was substantially put to rest about thirty years ago; and the arguments that then appeared to militate decisively in favor of the Classical view appear to us to do so still.
In the present paper we will proceed as follows. First, we discuss some methodological questions about levels of explanation that have become enmeshed in the substantive controversy over Connectionism. Second, we try to say what it is that makes Connectionist and Classical theories of mental structure incompatible. Third, we review and extend some of the traditional arguments for the Classical architecture. Though these arguments have been somewhat recast, very little that we'll have to say here is entirely new. But we hope to make it clear how various aspects of the Classical doctrine cohere and why rejecting the Classical picture of reasoning leads Connectionists to say the very implausible things they do about logic and semantics.
1.1 Levels of Explanation
There are two major traditions in modern theorizing about the mind, one that we'll call 'Representationalist' and one that we'll call 'Eliminativist'.
Representationalists hold that postulating representational (or 'intentional' or 'semantic') states is essential to a theory of cognition; according to Representationalists, there are states of the mind which function to encode states of the world. Eliminativists, by contrast, think that psychological theories can dispense with such semantic notions as representation. According to Eliminativists the appropriate vocabulary for psychological theorizing is neurological or, perhaps, behavioral, or perhaps syntactic; in any event, not a vocabulary that characterizes mental states in terms of what they represent. (For a neurological version of eliminativism, see P. S. Churchland, 1986; for a behavioral version, see Watson, 1930; for a syntactic version, see Stich, 1983.)
Connectionists are on the Representationalist side of this issue. As Rumelhart and McClelland (1986a, p. 121) say, PDPs "are explicitly concerned with the problem of internal representation". Correspondingly, the specification of what the states of a network represent is an essential part of a Connectionist model. Consider, for example, the well-known Connectionist account of the bistability of the Necker cube (Feldman & Ballard, 1982). "Simple units representing the visual features of the two alternatives are arranged in competing coalitions, with inhibitory... links between rival features and positive links within each coalition.... The result is a network that has two dominant stable states". Notice that, in this as in all other such Connectionist models, the commitment to mental representation is explicit: the label of a node is taken to express the representational content of the state that the device is in when the node is excited, and there are nodes corresponding to monadic and to relational properties of the reversible cube when it is seen in one way or the other.
There are, to be sure, times when Connectionists appear to vacillate between Representationalism and the claim that the "cognitive level" is dispensable in favor of a more precise and biologically-motivated level of theory.
In particular, there is a lot of talk in the Connectionist literature about processes that are "sub-symbolic" and therefore presumably not representational. But this is misleading: Connectionist modeling is consistently Representationalist in practice, and Representationalism is generally endorsed by the very theorists who also like the idea of cognition 'emerging from the sub-symbolic'. Thus, Rumelhart and McClelland (1986a, p. 121) insist that PDP models are "... strongly committed to the study of representation and process". Similarly, though Smolensky (1988, p. 2) takes Connectionism to articulate regularities at the "sub-symbolic level" of analysis, it turns out that sub-symbolic states do have a semantics, though it's not the semantics of representations at the "conceptual level". According to Smolensky, the semantical distinction between symbolic and sub-symbolic theories is just that "entities that are typically represented in the symbolic paradigm by [single] symbols are typically represented in the sub-symbolic paradigm by a large number of sub-symbols".3 Both the conceptual and the sub-symbolic levels thus postulate representational states, but sub-symbolic theories slice them thinner.
We are stressing the Representationalist character of Connectionist theorizing because much Connectionist methodological writing has been preoccupied with the question 'What level of explanation is appropriate for theories of cognitive architecture?' (see, for example, the exchange between Broadbent, 1985, and Rumelhart & McClelland, 1985). And, as we're about to see, what one says about the levels question depends a lot on what stand one takes about whether there are representational states.
It seems certain that the world has causal structure at very many different levels of analysis, with the individuals recognized at the lowest levels being, in general, very small and the individuals recognized at the highest levels being, in general, very large. Thus there is a scientific story to be told about quarks; and a scientific story to be told about atoms; and a scientific story to be told about molecules... ditto rocks and stones and rivers... ditto galaxies. And the story that scientists tell about the causal structure that the world has at any one of these levels may be quite different from the story that they tell about its causal structure at the next level up or down. The methodological implication for psychology is this: If you want to have an argument about cognitive architecture, you have to specify the level of analysis that's supposed to be at issue.
If you're not a Representationalist, this is quite tricky since it is then not obvious what makes a phenomenon cognitive. But specifying the level of analysis relevant for theories of cognitive architecture is no problem for either Classicists or Connectionists. Since Classicists and Connectionists are both Representationalists, for them any level at which states of the system are taken to encode properties of the world counts as a cognitive level; and no other levels do. (Representations of "the world" include, of course, representations of symbols; for example, the concept WORD is a construct at the cognitive level because it represents something, namely words.) Correspondingly, it's the architecture of representational states and processes that discussions of cognitive architecture are about. Put differently, the architecture of the cognitive system consists of the set of basic operations, resources, functions, principles, etc. (generally the sorts of properties that would be described in a "user's manual" for that architecture if it were available on a computer), whose domain and range are the representational states of the organism.4
It follows that, if you want to make good the Connectionist theory as a theory of cognitive architecture, you have to show that the processes which operate on the representational states of an organism are those which are specified by a Connectionist architecture. It is, for example, no use at all, from the cognitive psychologist's point of view, to show that the nonrepresentational (e.g., neurological, or molecular, or quantum mechanical) states of an organism constitute a Connectionist network, because that would leave open the question whether the mind is such a network at the psychological level. It is, in particular, perfectly possible that nonrepresentational neurological states are interconnected in the ways described by Connectionist models but that the representational states themselves are not. This is because, just as it is possible to implement a Connectionist cognitive architecture in a network of causally interacting nonrepresentational elements, so too it is perfectly possible to implement a Classical cognitive architecture in such a network.5 In fact, the question whether Connectionist networks should be treated as models at some level of implementation is moot.
It is important to be clear about this matter of levels on pain of simply trivializing the issues about cognitive architecture. Consider, for example, the following remark of Rumelhart's: "It has seemed to me for some years now that there must be a unified account in which the so-called rule-governed and [the] exceptional cases were dealt with by a unified underlying process, a process which produces rule-like and rule-exception behavior through the application of a single process... [In this process]... both the rule-like and non-rule-like behavior is a product of the interaction of a very large number of 'sub-symbolic' processes." (Rumelhart, 1984, p. 60). It's clear from the context that Rumelhart takes this idea to be very tendentious; one of the Connectionist claims that Classical theories are required to deny.
But in fact it's not. For, of course there are 'sub-symbolic' interactions that implement both rule like and rule violating behavior; for example, quantum mechanical processes do. That's not what Classical theorists deny; indeed, it's not denied by anybody who is even vaguely a materialist. Nor does a Classical theorist deny that rule-following and rule-violating behaviors are both implemented by the very same neurological machinery. For a Classical theorist, neurons implement all cognitive processes in precisely the same way: viz., by supporting the basic operations that are required for symbol-processing.
What would be an interesting and tendentious claim is that there's no distinction between rule-following and rule-violating mentation at the cognitive or representational or symbolic level; specifically, that it is not the case that the etiology of rule-following behavior is mediated by the representation of explicit rules.6 We will argue that it too is not what divides Classical from Connectionist architecture; Classical models permit a principled distinction between the etiologies of mental processes that are explicitly rule-governed and mental processes that aren't; but they don't demand one.
In short, the issue between Classical and Connectionist architecture is not about the explicitness of rules; as we'll presently see, Classical architecture is not, per se, committed to the idea that explicit rules mediate the etiology of behavior. And it is not about the reality of representational states; Classicists and Connectionists are all Representational Realists. And it is not about nonrepresentational architecture; a Connectionist neural network can perfectly well implement a Classical architecture at the cognitive level.
So, then, what is the disagreement between Classical and Connectionist architecture about?
2 The Nature of the Dispute
Classicists and Connectionists all assign semantic content to something. Roughly, Connectionists assign semantic content to 'nodes' (that is, to units or aggregates of units; see note 1), i.e., to the sorts of things that are typically labeled in Connectionist diagrams; whereas Classicists assign semantic content to expressions, i.e., to the sorts of things that get written on the tapes of Turing machines and stored at addresses in Von Neumann machines.7 But Classical theories disagree with Connectionist theories about what primitive relations hold among these content-bearing entities. Connectionist theories acknowledge only causal connectedness as a primitive relation among nodes: when you know how activation and inhibition flow among them, you know everything there is to know about how the nodes in a network are related. By contrast, Classical theories acknowledge not only causal relations among the semantically evaluable objects that they posit, but also a range of structural relations, of which constituency is paradigmatic.
This difference has far-reaching consequences for the ways that the two kinds of theories treat a variety of cognitive phenomena, some of which we will presently examine at length. But, underlying the disagreements about details are two architectural differences between the theories:
(1) Combinatorial syntax and semantics for mental representations. Classical theories, but not Connectionist theories, postulate a 'language of thought' (see, for example, Fodor, 1975); they take mental representations to have a combinatorial syntax and semantics, in which (a) there is a distinction between structurally atomic and structurally molecular representations; (b) structurally molecular representations have syntactic constituents that are themselves either structurally molecular or structurally atomic; and (c) the semantic content of a (molecular) representation is a function of the semantic contents of its syntactic parts, together with its constituent structure. For purposes of convenience, we'll sometimes abbreviate (a)-(c) by speaking of Classical theories as committed to "complex" mental representations or to "symbol structures".8
(2) Structure sensitivity of processes. In Classical models, the principles by which mental states are transformed, or by which an input selects the corresponding output, are defined over structural properties of mental representations. Because Classical mental representations have combinatorial structure, it is possible for Classical mental operations to apply to them by reference to their form. The result is that a paradigmatic Classical mental process operates upon any mental representation that satisfies a given structural description, and transforms it into a mental representation that satisfies another structural description. (So, for example, in a model of inference one might recognize an operation that applies to any representation of the form P&Q and transforms it into a representation of the form P.) Notice that since formal properties can be defined at a variety of levels of abstraction, such an operation can apply equally to representations that differ widely in their structural complexity. The operation that applies to representations of the form P&Q to produce P is satisfied by, for example, an expression like "(AvBvC) & (DvEvF)", from which it derives the expression "(AvBvC)".
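The operation described in (2) can be sketched as follows. We assume, purely for illustration, that symbol structures are encoded as nested Python tuples whose first element names the connective; the function name simplify_conjunction and this encoding are ours and are not part of any Classical theory.

```python
from typing import Optional, Union

# An expression is either an atomic symbol (a string) or a tuple
# (connective, constituent, constituent, ...). Assumed encoding.
Expr = Union[str, tuple]


def simplify_conjunction(expr: Expr) -> Optional[Expr]:
    """Apply to any representation of the form P&Q and return P.

    The rule inspects only the form of the expression: a binary '&' at
    the top level. It returns None when the structural description is
    not satisfied.
    """
    if isinstance(expr, tuple) and len(expr) == 3 and expr[0] == "&":
        return expr[1]
    return None


simple = ("&", "A", "B")
complex_case = ("&", ("v", "A", "B", "C"), ("v", "D", "E", "F"))

print(simplify_conjunction(simple))
print(simplify_conjunction(complex_case))
```

The same rule applies to both inputs although they differ widely in complexity, because the structural description it tests mentions only the top-level form P&Q and is indifferent to what P and Q themselves contain.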
We take (1) and (2) as the claims that define Classical models, and we take these claims quite literally; they constrain the physical realizations of symbol structures. In particular, the symbol structures in a Classical model are assumed to correspond to real physical structures in the brain and the combinatorial structure of a representation is supposed to have a counterpart in structural relations among physical properties of the brain. For example, the relation 'part of', which holds between a relatively simple symbol and a more complex one, is assumed to correspond to some physical relation among brain states.9 This is why Newell (1980) speaks of computational systems such as brains and Classical computers as “physical symbol systems”.
This bears emphasis because the Classical theory is committed not only to there being a system of physically instantiated symbols, but also to the claim that the physical properties onto which the structure of the symbols is mapped are the very properties that cause the system to behave as it does. In other words the physical counterparts of the symbols, and their structural properties, cause the system's behavior. A system which has symbolic expressions, but whose operation does not depend upon the structure of these expressions, does not qualify as a Classical machine since it fails to satisfy condition (2). In this respect, a Classical model is very different from one in which behavior is caused by mechanisms, such as energy minimization, that are not responsive to the physical encoding of the structure of representations.
From now on, when we speak of 'Classical' models, we will have in mind any model that has complex mental representations, as characterized in (1) and structure-sensitive mental processes, as characterized in (2). Our account of Classical architecture is therefore neutral with respect to such issues as whether or not there is a separate executive. For example, Classical machines can have an "object-oriented" architecture, like that of the computer language Smalltalk, or a "message passing" architecture, like that of Hewett's (1977) Actors, so long as the objects or the messages have a combinatorial structure which is causally implicated in the processing. Classical architecture is also neutral on the question whether the operations on the symbols are constrained to occur one at a time or whether many operations can occur at the same time.
Here, then, is the plan for what follows. In the rest of this section, we will sketch the Connectionist proposal for a computational architecture that does away with complex mental representations and structure sensitive operations. (Although our purpose here is merely expository, it turns out that describing exactly what Connectionists are committed to requires substantial reconstruction of their remarks and practices. Since there is a great variety of points of view within the Connectionist community, we are prepared to find that some Connectionists in good standing may not fully endorse the program when it is laid out in what we take to be its bare essentials.) Following this general expository (or reconstructive) discussion, Section 3 provides a series of arguments favoring the Classical story. Then the remainder of the paper considers some of the reasons why Connectionism appears attractive to many people and offers further general comments on the relation between the Classical and the Connectionist enterprise.
2.1 Complex Mental Representations
To begin with, consider a case of the most trivial sort; two machines, one Classical in spirit and one Connectionist.10 Here is how the Connectionist machine might reason. There is a network of labelled nodes as in figure 41.1. Paths between the nodes indicate the routes along which activation can spread (that is, they indicate the consequences that exciting one of the nodes has for determining the level of excitation of others). Drawing an inference from A&B to A thus corresponds to an excitation of node 2 being caused by an excitation of node 1 (alternatively, if the system is in a state in which node 1 is excited, it eventually settles into a state in which node 2 is excited; see note 7).
Now consider a Classical machine. This machine has a tape on which it writes expressions. Among the expressions that can appear on this tape are:
'A', 'B', 'A&B', 'C', 'D', 'C&D', 'A&C&D'... etc.
The machine's causal constitution is as follows: whenever a token of the form P&Q appears on the tape, the machine writes a token of the form P. An inference from A&B to A thus corresponds to a tokening of type 'A&B' on the tape causing a tokening of type 'A'.
So then, what does the architectural difference between the machines consist in? In the Classical machine, the objects to which the content A&B is ascribed (viz., tokens of the expression 'A&B') literally contain, as proper parts, objects to which the content A is ascribed (viz., tokens of the expression 'A'.) Moreover, the semantics (e.g., the satisfaction conditions) of the expression 'A&B' is determined in a uniform way by the semantics of its constituents.11 By contrast, in the Connectionist machine none of this is true; the object to which the content A&B is ascribed (viz., node 1) is causally connected to the object to which the content A is ascribed (viz., node 2); but there is no structural (e.g., no part/whole) relation that holds between them. In short, it is characteristic of Classical systems, but not of Connectionist systems, to exploit arrays of symbols some of which are atomic (e.g., expressions like 'A') but indefinitely many of which have other symbols as syntactic and semantic parts (e.g., expressions like 'A&B').
Figure 41.1 A possible Connectionist network for drawing inferences from A & B to A or to B
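The contrast between the two machines can be put in the form of a small sketch. The tuple encoding of tape expressions, the function names, and the assignment of node 3 to the content B (the figure itself is not reproduced here) are assumptions made for illustration only.

```python
# Classical machine: tape tokens are nested tuples, so a token of 'A&B'
# literally contains a token of 'A' as a proper part.
def classical_step(tape):
    """Whenever a token of the form P&Q is on the tape, write a token of P."""
    written = list(tape)
    for token in tape:
        is_conjunction = (isinstance(token, tuple) and len(token) == 3
                          and token[0] == "&")
        if is_conjunction and token[1] not in written:
            written.append(token[1])
    return written


# Connectionist machine: nodes are bare identifiers with no parts; the
# links say only where activation spreads.
# Assumed wiring: node 1 ('A&B') excites node 2 ('A') and node 3 ('B').
LINKS = {1: [2, 3], 2: [], 3: []}


def connectionist_step(active):
    """Return the set of nodes excited after one round of spreading."""
    spread = set(active)
    for node in active:
        spread.update(LINKS[node])
    return spread


a_and_b = ("&", "A", "B")
print(classical_step([a_and_b]))   # the new token is a part of the old one
print("A" in a_and_b)              # part/whole relation holds: True
print(connectionist_step({1}))     # node 2 is excited, but is no part of node 1
```

In both machines a state with content A&B causes a state with content A. Only in the first is there anything to inspect inside the A&B state; the integer 1 has no constituents for a process to be sensitive to.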
It is easy to overlook this difference between Classical and Connectionist architectures when reading the Connectionist polemical literature or examining a Connectionist model. There are at least four ways in which one might be led to do so: (1) by failing to understand the difference between what arrays of symbols do in Classical machines and what node labels do in Connectionist machines; (2) by confusing the question whether the nodes in Connectionist networks have constituent structure with the question whether they are neurologically distributed; (3) by failing to distinguish between a representation having semantic and syntactic constituents and a concept being encoded in terms of microfeatures, and (4) by assuming that since representations of Connectionist networks have a graph structure, it follows that the nodes in the networks have a corresponding constituent structure. We shall now need rather a long digression to clear up these misunderstandings.
2.1.1 The role of labels in Connectionist theories
In the course of setting out a Connectionist model, intentional content will be assigned to machine states, and the expressions of some language or other will, of course, be used to express this assignment; for example, nodes may be labelled to indicate their representational content. Such labels often have a combinatorial syntax and semantics; in this respect, they can look a lot like Classical mental representations. The point to emphasize, however, is that it doesn't follow (and it isn't true) that the nodes to which these labels are assigned have a combinatorial syntax and semantics. 'A&B', for example, can be tokened on the tape of the Classical machine and can also appear as a label in a Connectionist machine as it does in figure 41.1. And, of course, the expression 'A&B' is syntactically and semantically complex: it has a token of 'A' as one of its syntactic constituents, and the semantics of the expression 'A&B' is a function of the semantics of the expression 'A'. But it isn't part of the intended reading of the diagram that node 1 itself has constituents; the node, unlike its label, has no semantically interpreted parts.
It is, in short, important to understand the difference between Connectionist labels and the symbols over which Classical computations are defined. The difference is this: Strictly speaking, the labels play no role at all in determining the operation of a Connectionist machine; in particular, the operation of the machine is unaffected by the syntactic and semantic relations that hold among the expressions that are used as labels. To put this another way, the node labels in a Connectionist machine are not part of the causal structure of the machine. Thus, the machine depicted in figure 41.1 will continue to make the same state transitions regardless of what labels we assign to the nodes. Whereas, by contrast, the state transitions of Classical machines are causally determined by the structure (including the constituent structure) of the symbol arrays that the machines transform: change the symbols and the system behaves quite differently. (In fact, since the behavior of a Classical machine is sensitive to the syntax of the representations it computes on, even interchanging synonymous (semantically equivalent) representations affects the course of computation). So, although the Connectionist's labels and the Classicist's data structures both constitute languages, only the latter language constitutes a medium of computation.12
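The point about labels can be illustrated with a short sketch under assumed encodings: the network is a dictionary of links between integer node identifiers, labels are kept in a separate dictionary, and Classical expressions are nested tuples. All names here (settle, report, first_conjunct) are hypothetical.

```python
def settle(links, start):
    """Spread activation from the start node until nothing new is excited.

    Note that the labels are not an argument: they cannot affect the result.
    """
    active = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for target in links.get(node, []):
            if target not in active:
                active.add(target)
                frontier.append(target)
    return active


def report(active, labels):
    """Describe the settled state to an observer using a given labelling."""
    return sorted(labels[node] for node in active)


def first_conjunct(expr):
    """Classical rule: from any expression of the form P&Q, derive P."""
    if isinstance(expr, tuple) and len(expr) == 3 and expr[0] == "&":
        return expr[1]
    return None


LINKS = {1: [2, 3]}
labels_one = {1: "A&B", 2: "A", 3: "B"}
labels_two = {1: "X", 2: "Y&Z", 3: "W"}

final_state = settle(LINKS, 1)           # identical under either labelling
print(report(final_state, labels_one))
print(report(final_state, labels_two))

# Interchanging logically equivalent symbol structures changes the output.
print(first_conjunct(("&", "A", "B")))
print(first_conjunct(("&", "B", "A")))
```

Relabelling changes only what report prints, never what settle computes, whereas swapping the conjuncts of the Classical expression changes what the rule derives.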
2.1.2 Connectionist networks and graph structures
The second reason that the lack of syntactic and semantic structure in Connectionist representations has largely been ignored may be that Connectionist networks look like general graphs; and it is, of course, perfectly possible to use graphs to describe the internal structure of a complex symbol. That's precisely what linguists do when they use 'trees' to exhibit the constituent structure of sentences. Correspondingly, one could imagine a graph notation that expresses the internal structure of mental representations by using arcs and labelled nodes. So, for example, you might express the syntax of the mental representation that corresponds to the thought that John loves the girl like this:
John → loves → the girl
Under the intended interpretation, this would be the structural description of a mental representation whose content is that John loves the girl, and whose constituents are: a mental representation that refers to John, a mental representation that refers to the girl, and a mental representation that expresses the two-place relation represented by '→loves→'.
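Read as a structural description, such a graph specifies a data structure with identifiable constituents. A minimal sketch follows, assuming a two-place relational representation with hypothetical field names first and second for its argument positions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Relational:
    """A complex representation built from a relation and two arguments."""
    relation: str
    first: str
    second: str

    def constituents(self) -> List[str]:
        """List the parts in the order shown in the graph notation."""
        return [self.first, self.relation, self.second]


thought = Relational(relation="loves", first="John", second="the girl")
print(thought.constituents())

# The same constituents in different structural positions make a
# different representation.
reversed_thought = Relational(relation="loves", first="the girl", second="John")
print(thought == reversed_thought)
```

On this reading the arcs encode which constituent occupies which position in the whole, which is why rearranging the same three parts yields a representation with a different content.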
But although graphs can sustain an interpretation as specifying the logical syntax of a complex mental representation, this interpretation is inappropriate for graphs of Connectionist networks. Connectionist graphs are not structural descriptions of mental representations; they're specifications of causal relations. All that a Connectionist can mean by a graph of the form X → Y is: states of node X causally affect states of node Y. In particular, the graph can't mean 'X is a constituent of Y' or 'X is grammatically related to Y', etc., since these sorts of relations are, in general, not defined for the kinds of mental representations that Connectionists recognize.
Another way to put this is that the links in Connectionist diagrams are not generalized pointers that can be made to take on different functional significance by an independent interpreter, but are confined to meaning something like "sends activation to". The intended interpretation of the links as causal connections is intrinsic to the theory. If you ignore this point, you are likely to take Connectionism to offer a much richer notion of mental representation than it actually does.
2.1.3 Distributed representations The third mistake that can lead to a failure to notice that the mental representations in Connectionist models lack combinatorial syntactic and semantic structure is the fact that many Connectionists view representations as being neurologically distributed; and, presumably, whatever is distributed must have parts. It doesn't fo∏ow, however, that whatever is distributed must have constituents; being neurologicaUy distributed is very different from having semantic or syntactic constituent structure..
You have constituent structure when (and only when) the parts of SemanticaUy evaluable entities are themselves SemanticaUy evaluable. Constituency relations thus hold among objects aU of which are at the representational level; they are, in that sense, within level relations.13 By contrast, neural distributedness—the sort of relation that is assumed to hold between 'nodes' and the 'units' by which they are realized—is a between level relation: The nodes, but not the units, count as representations. To claim that a node is neuraUy distributed is presumably to claim that its states of activation correspond to patterns of neural activity—to aggregates of neural 'units'—rather than to activations of single neurons. The important point is that nodes that are distributed in this sense can perfectly weU be SyntacticaUy and SemanticaUy atomic: Complex spatiaUy-distributed implementation in no way implies constituent structure.
There is, however, a different sense in which the representational states in a network might be distributed, and this sort of distribution also raises questions relevant to the constituency issue.
2.1.4 Representations as 'distributed' over microfeatures Many Connectionists hold that the mental representations that correspond to commonsense concepts (CHAIR, JOHN, CUP, etc.) are 'distributed' over galaxies of lower level units which themselves have representational content. To use common Connectionist terminology (see Smolensky, 1988), the higher or "conceptual level" units correspond to vectors in a "sub-conceptual" space of microfeatures. The model here is something like the relation between a defined expression and its defining feature analysis: thus, the concept BACHELOR might be thought to correspond to a vector in a space of features that includes ADULT, HUMAN, MALE, and MARRIED; i.e., as an assignment of the value + to the first two features and − to the last. Notice that distribution over microfeatures (unlike distribution over neural units) is a relation among representations, hence a relation at the cognitive level.
Since microfeatures are frequently assumed to be derived automatically (i.e., via learning procedures) from the statistical properties of samples of stimuli, we can think of them as expressing the sorts of properties that are revealed by multivariate analysis of sets of stimuli (e.g., by multidimensional scaling of similarity judgments). In particular, they need not correspond to English words; they can be finer-grained than, or otherwise atypical of, the terms for which a non-specialist needs to have a word. Other than that, however, they are perfectly ordinary semantic features, much like those that lexicographers have traditionally used to represent the meanings of words.
On the most frequent Connectionist accounts, theories articulated in terms of microfeature vectors are supposed to show how concepts are actually encoded, hence the feature vectors are intended to replace 'less precise' specifications of macrolevel concepts. For example, where a Classical theorist might recognize a psychological state of entertaining the concept CUP, a Connectionist may acknowledge only a roughly analogous state of tokening the corresponding feature vector. (One reason that the analogy is only rough is that which feature vector 'corresponds' to a given concept may be viewed as heavily context dependent.) The generalizations that 'concept level' theories frame are thus taken to be only approximately true, the exact truth being stateable only in the vocabulary of the microfeatures. Smolensky, for example (p. 11), is explicit in endorsing this picture: "Precise, formal descriptions of the intuitive processor are generally tractable not at the conceptual level, but only at the subconceptual level."14 This treatment of the relation between commonsense concepts and microfeatures is exactly analogous to the standard Connectionist treatment of rules; in both cases, macrolevel theory is said to provide a vocabulary adequate for formulating generalizations that roughly approximate the facts about behavioral regularities. But the constructs of the macrotheory do not correspond to the causal mechanisms that generate these regularities. If you want a theory of these mechanisms, you need to replace talk about rules and concepts with talk about nodes, connections, microfeatures, vectors and the like.15
Now, it is among the major misfortunes of the Connectionist literature that the issue about whether commonsense concepts should be represented by sets of microfeatures has gotten thoroughly mixed up with the issue about combinatorial structure in mental representations. The crux of the mixup is the fact that sets of microfeatures can overlap, so that, for example, if a microfeature corresponding to '+ has-a-handle' is part of the array of nodes over which the commonsense concept CUP is distributed, then you might think of the theory as representing '+ has-a-handle' as a constituent of the concept CUP; from which you might conclude that Connectionists have a notion of constituency after all, contrary to the claim that Connectionism is not a language-of-thought architecture (see Smolensky, 1988).
A moment's consideration will make it clear, however, that even on the assumption that concepts are distributed over microfeatures, '+ has-a-handle' is not a constituent of CUP in anything like the sense that 'Mary' (the word) is a constituent of (the sentence) 'John loves Mary'. In the former case, "constituency" is being (mis)used to refer to a semantic relation between predicates; roughly, the idea is that macrolevel predicates like CUP are defined by sets of microfeatures like 'has-a-handle', so that it's some sort of semantic truth that CUP applies to a subset of what 'has-a-handle' applies to. Notice that while the extensions of these predicates are in a set/subset relation, the predicates themselves are not in any sort of part-to-whole relation. The expression 'has-a-handle' isn't part of the expression CUP any more than the English phrase 'is an unmarried man' is part of the English phrase 'is a bachelor'.
Real constituency does have to do with parts and wholes; the symbol 'Mary' is literally a part of the symbol 'John loves Mary'. It is because their symbols enter into real-constituency relations that natural languages have both atomic symbols and complex ones. By contrast, the definition relation can hold in a language where all the symbols are syntactically atomic; e.g., a language which contains both 'cup' and 'has-a-handle' as atomic predicates. This point is worth stressing. The question whether a representational system has real-constituency is independent of the question of microfeature analysis; it arises both for systems in which you have CUP as semantically primitive, and for systems in which the semantic primitives are things like '+ has-a-handle' and CUP and the like are defined in terms of these primitives. It really is very important not to confuse the semantic distinction between primitive expressions and defined expressions with the syntactic distinction between atomic symbols and complex symbols.
So far as we know, there are no worked out attempts in the Connectionist literature to deal with the syntactic and semantical issues raised by relations of real-constituency. There is, however, a proposal that comes up from time to time: viz., that what are traditionally treated as complex symbols should actually be viewed as just sets of units, with the role relations that traditionally get coded by constituent structure represented by units belonging to these sets. So, for example, the mental representation corresponding to the belief that John loves Mary might be the feature vector {+John-subject; +loves; +Mary-object}. Here 'John-subject', 'Mary-object' and the like are the labels of units; that is, they are atomic (i.e., micro-) features, whose status is analogous to 'has-a-handle'. In particular, they have no internal syntactic analysis, and there is no structural relation (except the orthographic one) between the feature 'Mary-object' that occurs in the set {John-subject; loves; Mary-object} and the feature 'Mary-subject' that occurs in the set {Mary-subject; loves; John-object}. (See, for example, the discussion in Hinton, 1987 of "role-specific descriptors that represent the conjunction of an identity and a role [by the use of which] we can implement part-whole hierarchies using set intersection as the composition rule." See also McClelland, Rumelhart and Hinton, 1986, pp. 82-85, where what appears to be the same treatment is proposed in somewhat different terms.)
Since, as we remarked, these sorts of ideas aren't elaborated in the Connectionist literature, detailed discussion is probably not warranted here. But it's worth a word to make clear what sort of trouble you would get into if you were to take them seriously.
As we understand it, the proposal really has two parts: On the one hand, it's suggested that although Connectionist representations cannot exhibit real-constituency, nevertheless the Classical distinction between complex symbols and their constituents can be replaced by the distinction between feature sets and their subsets; and, on the other hand, it's suggested that role relations can be captured by features. We'll consider these ideas in turn.
(1) Instead of having complex symbols like "John loves Mary" in the representational system, you have feature sets like {+John-subject; +loves; +Mary-object}. Since this set has {+John-subject}, {+loves; +Mary-object} and so forth as subsets, it may be supposed that the force of the constituency relation has been captured by employing the subset relation.
However, it's clear that this idea won't work since not all subsets of features correspond to genuine constituents. For example, among the subsets of {+John-subject; +loves; +Mary-object} are the set {+John-subject; +Mary-object} and the set {+John-subject; +loves}, which do not, of course, correspond to constituents of the complex symbol "John loves Mary".
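The mismatch between subsets and constituents can be checked mechanically. The following is a minimal sketch under assumptions of our own: features are modeled as strings in a `frozenset`, the Classical symbol as a nested tuple with the bracketing (John (loves Mary)), and the helper names are hypothetical.

```python
from itertools import combinations

# Assumption: features are plain strings in a frozenset, and the Classical
# representation is a nested tuple. Both encodings are hypothetical.
features = frozenset({"+John-subject", "+loves", "+Mary-object"})

def nonempty_proper_subsets(s):
    """List every subset of s except the empty set and s itself."""
    items = sorted(s)
    return [
        frozenset(combo)
        for size in range(1, len(items))
        for combo in combinations(items, size)
    ]

# A toy constituent structure for "John loves Mary": (NP (V NP)).
tree = ("John", ("loves", "Mary"))

def constituents(node):
    """Return every subtree of a nested-tuple tree, including the leaves."""
    found = [node]
    if isinstance(node, tuple):
        for child in node:
            found.extend(constituents(child))
    return found

subsets = nonempty_proper_subsets(features)
```

The feature set has six non-empty proper subsets, among them `{+John-subject, +Mary-object}`, whereas `constituents(tree)` yields only the whole sentence, "John", "loves Mary", "loves" and "Mary". Nothing in the tree answers to a "John ... Mary" grouping.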
(2) Instead of defining roles in terms of relations among constituents, as one does in Classical architecture, introduce them as microfeatures.
Consider a system in which the mental representation that is entertained when one believes that John loves Mary is the feature set {+John-subject; +loves; +Mary-object}. What representation corresponds to the belief that John loves Mary and Bill hates Sally? Suppose, pursuant to the present proposal, that it's the set {+John-subject; +loves; +Mary-object; +Bill-subject; +hates; +Sally-object}. We now have the problem of distinguishing that belief from the belief that John loves Sally and Bill hates Mary; and from the belief that John hates Mary and Bill loves Sally; and from the belief that John hates Mary and Sally and Bill loves Mary; etc., since these other beliefs will all correspond to precisely the same set of features. The problem is, of course, that nothing in the representation of Mary as +Mary-object specifies whether it's the loving or the hating that she is the object of; similarly, mutatis mutandis, for the representation of John as +John-subject.
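A short sketch shows the collapse directly. It assumes, purely for illustration, that a belief is given as a list of (subject, verb, object) clauses and that the proposal under discussion amounts to pooling role-tagged atomic features into one set; `feature_set` is a hypothetical name.

```python
def feature_set(clauses):
    """Encode a conjunction of (subject, verb, object) clauses as a flat set
    of role-tagged atomic features (an assumed reading of the proposal)."""
    features = set()
    for subject, verb, obj in clauses:
        features |= {f"+{subject}-subject", f"+{verb}", f"+{obj}-object"}
    return frozenset(features)

belief_1 = [("John", "loves", "Mary"), ("Bill", "hates", "Sally")]
belief_2 = [("John", "loves", "Sally"), ("Bill", "hates", "Mary")]
belief_3 = [("John", "hates", "Mary"), ("Bill", "loves", "Sally")]

# All three beliefs receive the same six-feature encoding.
collapsed = feature_set(belief_1) == feature_set(belief_2) == feature_set(belief_3)

# As structured objects, the three beliefs remain pairwise distinct.
distinct_as_structures = len({tuple(b) for b in (belief_1, belief_2, belief_3)}) == 3
```

Both `collapsed` and `distinct_as_structures` come out `True`: the clause structure that distinguishes the beliefs is exactly what the flat set discards.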
What has gone wrong isn't disastrous (yet). All that's required is to enrich the system of representations by recognizing features that correspond not to (for example) just being a subject, but rather to being the subject of a loving of Mary (the property that John has when John loves Mary) and being the subject of a hating of Sally (the property that Bill has when Bill hates Sally). So, the representation of John that's entertained when one believes that John loves Mary and Bill hates Sally might be something like +John-subject-loves-Mary-object.
The disadvantage of this proposal is that it requires rather a lot of microfeatures.16 How many? Well, a number of the order of magnitude of the sentences of a natural language (whereas one might have hoped to get by with a vocabulary of basic expressions that is not vastly larger than the lexicon of a natural language; after all, natural languages do). We leave it to the reader to estimate the number of microfeatures you would need, assuming that there is a distinct belief corresponding to every grammatical sentence of English of up to, say, fifteen words of length, and assuming that there is an average of, say, five roles associated with each belief. (Hint: George Miller once estimated that the number of well-formed 20-word sentences of English is of the order of magnitude of the number of seconds in the history of the universe.)
The alternative to this grotesque explosion of atomic symbols would be to have a combinatorial syntax and semantics for the features. But, of course, this is just to give up the game since the syntactic and semantic relations that hold among the parts of the complex feature +((John subject) loves (Mary object)) are the very same ones that Classically hold among the constituents of the complex symbol "John loves Mary"; these include the role relations which Connectionists had proposed to reconstruct using just sets of atomic features. It is, of course, no accident that the Connectionist proposal for dealing with role relations runs into these sorts of problems. Subject, object and the rest are Classically defined with respect to the geometry of constituent structure trees. And Connectionist representations don't have constituents.
The idea that we should capture role relations by allowing features like John-subject thus turns out to be bankrupt; and there doesn't seem to be any other way to get the force of structured symbols in a Connectionist architecture. Or, if there is, nobody has given any indication of how to do it. This becomes clear once the crucial issue about structure in mental representations is disentangled from the relatively secondary (and orthogonal) issue about whether the representation of commonsense concepts is 'distributed' (i.e., from questions like whether it's CUP or 'has-a-handle' or both that is semantically primitive in the language of thought).
It's worth adding that these problems about expressing the role relations are actually just a symptom of a more pervasive difficulty: A consequence of restricting the vehicles of mental representation to sets of atomic symbols is a notation that fails quite generally to express the way that concepts group into propositions. To see this, let's continue to suppose that we have a network in which the nodes represent concepts rather than propositions (so that what corresponds to the thought that John loves Mary is a distribution of activation over the set of nodes {JOHN; LOVES; MARY} rather than the activation of a single node labelled JOHN LOVES MARY). Notice that it cannot plausibly be assumed that all the nodes that happen to be active at a given time will correspond to concepts that are constituents of the same proposition; least of all if the architecture is "massively parallel" so that many things are allowed to go on (many concepts are allowed to be entertained) simultaneously in a given mind. Imagine, then, the following situation: at time t, a man is looking at the sky (so the nodes corresponding to SKY and BLUE are active) and thinking that John loves Fido (so the nodes corresponding to JOHN, LOVES, and FIDO are active), and the node FIDO is connected to the node DOG (which is in turn connected to the node ANIMAL) in such fashion that DOG and ANIMAL are active too. We can, if you like, throw it in that the man has got an itch, so ITCH is also on.
According to the current theory of mental representation, this man's mind at t is specified by the vector {+JOHN, +LOVES, +FIDO, +DOG, +SKY, +BLUE, +ITCH, +ANIMAL}. And the question is: which subvectors of this vector correspond to thoughts that the man is thinking? Specifically, what is it about the man's representational state that determines that the simultaneous activation of the nodes {JOHN, LOVES, FIDO} constitutes his thinking that John loves Fido, but the simultaneous activation of FIDO, ANIMAL and BLUE does not constitute his thinking that Fido is a blue animal? It seems that we made it too easy for ourselves when we identified the thought that John loves Mary with the vector {+JOHN, +LOVES, +MARY}; at best that works only on the assumption that JOHN, LOVES and MARY are the only nodes active when someone has that thought. And that's an assumption to which no theory of mental representation is entitled.
It's important to see that this problem arises precisely because the theory is trying to use sets of atomic representations to do a job that you really need complex representations for. Thus, the question we're wanting to answer is: Given the total set of nodes active at a time, what distinguishes the subvectors that correspond to propositions from the subvectors that don't? This question has a straightforward answer if, contrary to the present proposal, complex representations are assumed: When representations express concepts that belong to the same proposition, they are not merely simultaneously active, but also in construction with each other. By contrast, representations that express concepts that don't belong to the same proposition may be simultaneously active; but, they are ipso facto not in construction with each other.
In short, you need two degrees of freedom to specify the thoughts that an intentional system is entertaining at a time: one parameter (active vs inactive) picks out the nodes that express concepts that the system has in mind; the other (in construction vs not) determines how the concepts that the system has in mind are distributed in the propositions that it entertains. For symbols to be "in construction" in this sense is just for them to be constituents of a complex symbol. Representations that are in construction form parts of a geometrical whole, where the geometrical relations are themselves semantically significant. Thus the representation that corresponds to the thought that John loves Fido is not a set of concepts but something like a tree of concepts, and it's the geometrical relations in this tree that mark (for example) the difference between the thought that John loves Fido and the thought that Fido loves John.
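The two parameters can be separated in a small sketch. The encodings are assumptions made for illustration: the activity parameter is modeled as a set of node labels, and being "in construction" as membership in a nested tuple of the form (predicate, subject, object); the function names are hypothetical.

```python
# Assumption: a thought is modeled either as a set of active node labels or
# as a nested tuple (predicate, subject, object). Both encodings are ours.

active_now = {"JOHN", "LOVES", "FIDO", "DOG", "ANIMAL", "SKY", "BLUE", "ITCH"}

# Set-based encoding: order and grouping are lost.
john_loves_fido_set = frozenset({"JOHN", "LOVES", "FIDO"})
fido_loves_john_set = frozenset({"FIDO", "LOVES", "JOHN"})

# Structured encoding: position in the tuple marks the role.
john_loves_fido = ("LOVES", "JOHN", "FIDO")
fido_loves_john = ("LOVES", "FIDO", "JOHN")

def concepts(expr):
    """Collect the atomic concepts occurring in a structured thought."""
    if isinstance(expr, tuple):
        return set().union(*(concepts(part) for part in expr))
    return {expr}

def is_active(expr, active):
    """First parameter only: are all the concepts in this thought active?"""
    return concepts(expr) <= active

# A grouping that the set of active nodes alone cannot rule out.
spurious = frozenset({"FIDO", "ANIMAL", "BLUE"})
```

The two set encodings are equal, while the two tuples are not; and `spurious` is a subset of `active_now` just as `john_loves_fido_set` is. Activity picks out which concepts are in play, but only the structured encoding records which of them are in construction with which.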
We've occasionally heard it suggested that you could solve the present problem consonant with the restriction against complex representations if you allow networks like this:
FIDO
SUBJECT-OF
BITES
The intended interpretation is that the thought that Fido bites corresponds to the simultaneous activation of these nodes; that is, to the vector {+FIDO, +SUBJECT-OF, +BITES}, with similar though longer vectors for more complex role relations.
But, on second thought, this proposal merely begs the question that it set out to solve. For, if there's a problem about what justifies assigning the proposition John loves Fido as the content of the set {JOHN, LOVES, FIDO}, there is surely the same problem about what justifies assigning the proposition Fido is the subject of bites to the set {FIDO, SUBJECT-OF, BITES}. If this is not immediately clear, consider the case where the simultaneously active nodes are {FIDO, SUBJECT-OF, BITES, JOHN}. Is the propositional content that Fido bites or that John does?17
Strikingly enough, the point that we've been making in the past several paragraphs is very close to one that Kant made against the Associationists of his day. In "Transcendental Deduction (B)" of The First Critique, Kant remarks that:
... if I investigate... the relation of the given modes of knowledge in any judgement, and distinguish it, as belonging to the understanding, from the relation according to laws of the reproductive imagination [e.g., according to the principles of association], which has only subjective validity, I find that a judgement is nothing but the manner in which given modes of knowledge are brought to the objective unity of apperception. This is what is intended by the copula "is". It is employed to distinguish the objective unity of given representations from the subjective.... Only in this way does there arise from the relation a judgement, that is a relation which is objectively valid, and so can be adequately distinguished from a relation of the same representations that would have only subjective validity, as when they are connected according to laws of association. In the latter case, all that I could say would be 'If I support a body, I feel an impression of weight'; I could not say, 'It, the body, is heavy'. Thus to say 'The body is heavy' is not merely to state that the two representations have always been conjoined in my perception,... what we are asserting is that they are combined in the object... (CPR, p. 159; emphasis Kant's)
A modern paraphrase might be: A theory of mental representation must distinguish the case when two concepts (e.g., THIS BODY, HEAVY) are merely simultaneously entertained from the case where, to put it roughly, the property that one of the concepts expresses is predicated of the thing that the other concept denotes (as in the thought: THIS BODY IS HEAVY). The relevant distinction is that while both concepts are "active" in both cases, in the latter case but not in the former the active concepts are in construction. Kant thinks that "this is what is intended by the copula 'is'". But of course there are other notational devices that can serve to specify that concepts are in construction; notably the bracketing structure of constituency trees.
There are, to reiterate, two questions that you need to answer to specify the content of a mental state: "Which concepts are 'active'" and "Which of the active concepts are in construction with which others?" Identifying mental states with sets of active nodes provides resources to answer the first of these questions but not the second. That's why the version of network theory that acknowledges sets of atomic representations but no complex representations fails, in indefinitely many cases, to distinguish mental states that are in fact distinct.
But we are not claiming that you can't reconcile a Connectionist architecture with an adequate theory of mental representation (specifically with a combinatorial syntax and semantics for mental representations). On the contrary, of course you can: All that's required is that you use your network to implement a Turing machine, and specify a combinatorial structure for its computational language. What it appears that you can't do, however, is have both a combinatorial representational system and a Connectionist architecture at the cognitive level.
So much, then, for our long digression. We have now reviewed one of the major respects in which Connectionist and Classical theories differ; viz., their accounts of mental representations. We turn to the second major difference, which concerns their accounts of mental processes.
2.2 Structure-sensitive operations Classicists and Connectionists both offer accounts of mental processes, but their theories differ sharply. In particular, the Classical theory relies heavily on the notion of the logico/syntactic form of mental representations to define the ranges and domains of mental operations. This notion is, however, unavailable to orthodox Connectionists since it presupposes that there are nonatomic mental representations.
The Classical treatment of mental processes rests on two ideas, each of which corresponds to an aspect of the Classical theory of computation. Together they explain why the Classical view postulates at least three distinct levels of organization in computational systems: not just a physical level and a semantic (or "knowledge") level, but a syntactic level as well.
The first idea is that it is possible to construct languages in which certain features of the syntactic structures of formulas correspond systematically to certain of their semantic features. Intuitively, the idea is that in such languages the syntax of a formula encodes its meaning; most especially, those aspects of its meaning that determine its role in inference. All the artificial languages that are used for logic have this property and English has it more or less. Classicists believe that it is a crucial property of the Language of Thought.
A simple example of how a language can use syntactic structure to encode inferential roles and relations among meanings may help to illustrate this point. Thus, consider the relation between the following two sentences:
(1) John went to the store and Mary went to the store.
(2) Mary went to the store.
On the one hand, from the semantic point of view, (1) entails (2) (so, of course, inferences from (1) to (2) are truth preserving). On the other hand, from the syntactic point of view, (2) is a constituent of (1). These two facts can be brought into phase by exploiting the principle that sentences with the syntactic structure '(S1 and S2)S' entail their sentential constituents. Notice that this principle connects the syntax of these sentences with their inferential roles. Notice too that the trick relies on facts about the grammar of English; it wouldn't work in a language where the formula that expresses the conjunctive content John went to the store and Mary went to the store is syntactically atomic.18
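The principle lends itself to a direct sketch. The representation is an assumption made for illustration: sentences are atomic strings, a conjunction is the tuple `("and", S1, S2)`, and the function name and the atomic symbol `"P17"` are hypothetical.

```python
# Assumption: sentences are atomic strings, and a conjunction is the tuple
# ("and", S1, S2). The function name is hypothetical.

def conjunction_elimination(formula):
    """Return the sentential constituents entailed by a formula of the
    form ("and", S1, S2); return an empty list for any other form."""
    if isinstance(formula, tuple) and len(formula) == 3 and formula[0] == "and":
        return [formula[1], formula[2]]
    return []

s1 = "John went to the store"
s2 = "Mary went to the store"
complex_formula = ("and", s1, s2)

# A syntactically atomic symbol stipulated to have the same conjunctive content.
atomic_formula = "P17"

derived_from_complex = conjunction_elimination(complex_formula)
derived_from_atomic = conjunction_elimination(atomic_formula)
```

The rule recovers both conjuncts from `complex_formula` because it can see them as parts. Applied to `atomic_formula` it derives nothing, since an atomic symbol gives a syntactic rule nothing to operate on, whatever its content.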
Here is another example. We can reconstruct such truth preserving inferences as if Rover bites then something bites on the assumption that (a) the sentence 'Rover bites' is of the syntactic type Fa, (b) the sentence 'something bites' is of the syntactic type ∃x (Fx) and (c) every formula of the first type entails a corresponding formula of the second type (where the notion 'corresponding formula' is cashed syntactically; roughly the two formulas must differ only in that the one has an existentially bound variable at the syntactic position that is occupied by a constant in the other). Once again the point to notice is the blending of syntactical and semantical notions: The rule of existential generalization applies to formulas in virtue of their syntactic form. But the salient property that's preserved under applications of the rule is semantical: What's claimed for the transformation that the rule performs is that it is truth preserving.19
There are, as it turns out, examples that are quite a lot more complicated than these. The whole of the branch of logic known as proof theory is devoted to exploring them.20 It would not be unreasonable to describe Classical Cognitive Science as an extended attempt to apply the methods of proof theory to the modeling of thought (and similarly, of whatever other mental processes are plausibly viewed as involving inferences; preeminently learning and perception). Classical theory construction rests on the hope that syntactic analogues can be constructed for nondemonstrative inferences (or informal, commonsense reasoning) in something like the way that proof theory has provided syntactic analogues for validity.
The second main idea underlying the Classical treatment of mental processes is that it is possible to devise machines whose function is the transformation of symbols, and whose operations are sensitive to the syntactical structure of the symbols that they operate upon. This is the Classical conception of a computer: it's what the various architectures that derive from Turing and Von Neumann machines all have in common.
Perhaps it's obvious how the two 'main ideas' fit together. If, in principle, syntactic relations can be made to parallel semantic relations, and if, in principle, you can have a mechanism whose operations on formulas are sensitive to their syntax, then it may be possible to construct a syntactically driven machine whose state transitions satisfy semantical criteria of coherence. Such a machine would be just what's required for a mechanical model of the semantical coherence of thought; correspondingly, the idea that the brain is such a machine is the foundational hypothesis of Classical cognitive science.
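Existential generalization can be sketched in the same spirit. The tuple encodings are assumptions made for illustration: a formula of type Fa is `(predicate, constant)`, a formula of type ∃x (Fx) is `("exists", variable, (predicate, variable))`, and the function name is hypothetical.

```python
# Assumption: an atomic formula Fa is the tuple (predicate, constant), and
# an existential formula is ("exists", variable, (predicate, variable)).

def existential_generalization(formula, variable="x"):
    """Map a formula of type Fa to the corresponding formula of type ∃x (Fx).

    The rule looks only at the shape of the formula: it replaces whatever
    occupies the constant position with a bound variable.
    """
    if not (isinstance(formula, tuple) and len(formula) == 2):
        raise ValueError("rule applies only to formulas of the form Fa")
    predicate, _constant = formula
    return ("exists", variable, (predicate, variable))

rover_bites = ("bites", "Rover")
something_bites = existential_generalization(rover_bites)
```

The function never consults what 'bites' or 'Rover' mean; it applies in virtue of syntactic form alone. That its output is true whenever its input is true is a semantic fact about the rule, not something the code inspects.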
So much for the Classical story about mental processes. The Connectionist story must, of course, be quite different: Since Connectionists eschew postulating mental representations with combinatorial syntactic/semantic structure, they are precluded from postulating mental processes that operate on mental representations in a way that is sensitive to their structure. The sorts of operations that Connectionist models do have are of two sorts, depending on whether the process under examination is learning or reasoning.
2.2.1 Learning If a Connectionist model is intended to leam, there will be processes that determine the weights of the connections among its units as a function of the character of its training. Typically in a Connectionist machine (such as a 'Boltzman Machine') the weights among connections are adjusted until the system's behavior comes to model the statistical properties of its inputs. In the limit, the stochastic relations among machine states recapitulates the stochastic relations among the environmental events that they represent.
This should bring to mind the old Assodationist prindple that the strength of assodation between Ideas' is a function of the frequency with which they are paired 'in experience' and the Learning Theoretic prindple that the strength of a stimulusresponse connection is a function of the frequency with which the response is rewarded in the presence of the stimulus. But though Connectionists, like other Assodationists, are committed to learning processes that model statistical properties of inputs and outputs, the simple mechanisms based on co-occurrence statistics that were the hallmarks of old-fashioned Assodationism have been augmented in Connectionist models by a number of technical devices. (Hence the 'new' in New Connectionism'.) For example, some of the earlier limitations of assodative mechanisms are overcome by allowing the network to contain zhidden' units (or aggregates) that are not directly connected to the environment and whose purpose is, in effect, to deted statistical patterns in the activity of the 'visible' units induding, perhaps, patterns that are more abstrad or more 'global' than the ones that could be detected by old-fashioned perceptrons.21
In short, sophisticated versions of the assodative prindples for weight-setting are on offer in the Connectionist literature. The point of present concern, however, is what all versions of these prindples have in common with one another and with older kinds of Assodationism: viz., these processes are all frequency-sensitive. To return to the example discussed above: if a Connectionist learning machine converges on a state where it is prepared to infer A from A&B (i.e., to a state in which when the 'A&B' node is exdted it tends to settle into a state in which the 'A' node is exdted) the convergence will typically be caused by statistical properties of the machine's training experience: e.g., by correlation between firing of the 'A&B' node and firing of the 'A' node, or by correlations of the firing of both with some feedback signal. Like traditional Assoda- tionism, Connectionism treats learning as basically a sort of statistical modeling.
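Frequency-sensitivity can be illustrated with the barest possible learning rule. The sketch below is an assumption-laden toy, not the Boltzmann Machine procedure or any published algorithm: it uses a plain co-occurrence (Hebbian-style) update, an arbitrary learning rate, and made-up training episodes.

```python
# Assumption: a bare co-occurrence (Hebbian-style) update with a made-up
# learning rate; this is not the Boltzmann Machine learning procedure.

def train(episodes, rate=0.1):
    """Return the learned weight from the 'A&B' node to the 'A' node.

    Each episode is a pair (a_and_b_fired, a_fired) of 0/1 activations.
    The weight grows only on episodes in which both nodes fire together.
    """
    weight = 0.0
    for a_and_b, a in episodes:
        weight += rate * a_and_b * a
    return weight

# Hypothetical training histories.
correlated = [(1, 1)] * 8 + [(0, 0)] * 2    # the two nodes fire together
uncorrelated = [(1, 0)] * 5 + [(0, 1)] * 5  # they never fire together

w_correlated = train(correlated)
w_uncorrelated = train(uncorrelated)
```

The weight that disposes the network to excite 'A' when 'A&B' is excited depends only on how often the two nodes were active together. Nothing in `train` registers that 'A' is a constituent of 'A&B'; with the second history the same pair of labels ends up with no connection at all.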
2.2.2 Reasoning
Association operates to alter the structure of a network diachronically as a function of its training. Connectionist models also contain a variety of types of 'relaxation' processes which determine the synchronic behavior of a network; specifically, they determine what output the device provides for a given pattern of inputs. In this respect, one can think of a Connectionist model as a species of analog machine constructed to realize a certain function. The inputs to the function are (i) a specification of the connectedness of the machine (of which nodes are connected to which); (ii) a specification of the weights along the connections; (iii) a specification of the values of a variety of idiosyncratic parameters of the nodes (e.g., intrinsic thresholds; time since last firing, etc.); (iv) a specification of a pattern of excitation over the input nodes. The output of the function is a specification of a pattern of excitation over the output nodes; intuitively, the machine chooses the output pattern that is most highly associated to its input.
Much of the mathematical sophistication of Connectionist theorizing has been devoted to devising analog solutions to this problem of finding a 'most highly associated' output corresponding to an arbitrary input; but, once again, the details needn't concern us. What is important, for our purposes, is another property that Connectionist theories share with other forms of Associationism. In traditional Associationism, the probability that one Idea will elicit another is sensitive to the strength of the association between them (including 'mediating' associations, if any). And the strength of this association is in turn sensitive to the extent to which the Ideas have previously been correlated. Associative strength was not, however, presumed to be sensitive to features of the content or the structure of representations per se. Similarly, in Connectionist models, the selection of an output corresponding to a given input is a function of properties of the paths that connect them (including the weights, the states of intermediate units, etc.). And the weights, in turn, are a function of the statistical properties of events in the environment (or of relations between patterns of events in the environment and implicit 'predictions' made by the network, etc.). But the syntactic/semantic structure of the representation of an input is not presumed to be a factor in determining the selection of a corresponding output since, as we have seen, syntactic/semantic structure is not defined for the sorts of representations that Connectionist models acknowledge.
To summarize: Classical and Connectionist theories disagree about the nature of mental representation; for the former, but not for the latter, mental representations characteristically exhibit a combinatorial constituent structure and a combinatorial semantics. Classical and Connectionist theories also disagree about the nature of mental processes; for the former, but not for the latter, mental processes are characteristically sensitive to the combinatorial structure of the representations on which they operate.
We take it that these two issues define the present dispute about the nature of cognitive architecture. We now propose to argue that the Connectionists are on the wrong side of both.
3 The Need for Symbol Systems: Productivity, Systematicity, Compositionality and Inferential Coherence
Classical psychological theories appeal to the constituent structure of mental representations to explain three closely related features of cognition: its productivity, its compositionality and its inferential coherence. The traditional argument has been that these features of cognition are, on the one hand, pervasive and, on the other hand, explicable only on the assumption that mental representations have internal structure. This argument, familiar in more or less explicit versions for the last thirty years or so, is still intact, so far as we can tell. It appears to offer something close to a demonstration that an empirically adequate cognitive theory must recognize not just causal relations among representational states but also relations of syntactic and semantic constituency; hence that the mind cannot be, in its general structure, a Connectionist network.
3.1 Productivity of Thought
There is a classical productivity argument for the existence of combinatorial structure in any rich representational system (including natural languages and the language of thought). The representational capacities of such a system are, by assumption, unbounded under appropriate idealization; in particular, there are indefinitely many propositions which the system can encode.22 However, this unbounded expressive power must presumably be achieved by finite means. The way to do this is to treat the system of representations as consisting of expressions belonging to a generated set. More precisely, the correspondence between a representation and the proposition it expresses is, in arbitrarily many cases, built up recursively out of correspondences between parts of the expression and parts of the proposition. But, of course, this strategy can operate only when an unbounded number of the expressions are non-atomic. So linguistic (and mental) representations must constitute symbol systems (in the sense of note 8). So the mind cannot be a PDP.
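The idea of a generated set, in which finite means yield unboundedly many non-atomic expressions, can be sketched with a toy recursive rule. The vocabulary, the function name and the right-branching rule below are illustrative assumptions only; they are not offered as a grammar of English.

```python
# Hypothetical toy rule: a noun phrase may contain another noun phrase.
NOUNS = ["the dog", "the cat", "the rat"]
VERBS = ["chased", "ate"]


def noun_phrase(depth, position=0):
    """Build a right-branching noun phrase with `depth` relative clauses.

    The rule is finite (two word lists and one recursive clause), yet every
    value of `depth` yields a distinct expression whose parts are themselves
    expressions generated by the same rule.
    """
    head = NOUNS[position % len(NOUNS)]
    if depth == 0:
        return head  # atomic case: a bare noun phrase
    verb = VERBS[position % len(VERBS)]
    # Recursive case: the phrase contains a smaller phrase as a constituent.
    return f"{head} that {verb} {noun_phrase(depth - 1, position + 1)}"


example = noun_phrase(2)  # "the dog that chased the cat that ate the rat"
```

The set of outputs grows without bound as `depth` grows, although the specification of the rule never changes; only the atomic case yields an expression with no constituents.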
Very often, when people reject this sort of reasoning, it is because they doubt that human cognitive capacities are correctly viewed as productive. In the long run there can be no a priori arguments for (or against) idealizing to productive capacities; whether you accept the idealization depends on whether you believe that the inference from finite performance to finite capacity is justified, or whether you think that finite performance is typically a result of the interaction of an unbounded competence with resource constraints. Classicists have traditionally offered a mixture of methodological and empirical considerations in favor of the latter view.
From a methodological perspective, the least that can be said for assuming productivity is that it precludes solutions that rest on inappropriate tricks (such as storing all the pairs that define a function); tricks that would be unreasonable in practical terms even for solving finite tasks that place sufficiently large demands on memory. The idealization to unbounded productive capacity forces the theorist to separate the finite specification of a method for solving a computational problem from such factors as the resources that the system (or person) brings to bear on the problem at any given moment.
The empirical arguments for productivity have been made most frequently in connection with linguistic competence. They are familiar from the work of Chomsky (1968) who has claimed (convincingly, in our view) that the knowledge underlying linguistic competence is generative, i.e., that it allows us in principle to generate (/understand) an unbounded number of sentences. It goes without saying that no one does, or could, in fact utter or understand tokens of more than a finite number of sentence types; this is a trivial consequence of the fact that nobody can utter or understand more than a finite number of sentence tokens. But there are a number of considerations which suggest that, despite de facto constraints on performance, one's knowledge of one's language supports an unbounded productive capacity in much the same way that one's knowledge of addition supports an unbounded number of sums. Among these considerations are, for example, the fact that a speaker/hearer's performance can often be improved by relaxing time constraints, increasing motivation, or supplying pencil and paper. It seems very natural to treat such manipulations as affecting the transient state of the speaker's memory and attention rather than what he knows about, or how he represents, his language. But this treatment is available only on the assumption that the character of the subject's performance is determined by interactions between the available knowledge base and the available computational resources.
Classical theories are able to accommodate these sorts of considerations because they assume architectures in which there is a functional distinction between memory and program. In a system such as a Turing machine, where the length of the tape is not fixed in advance, changes in the amount of available memory can be effected without changing the computational structure of the machine; viz., by making more tape available. By contrast, in a finite state automaton or a Connectionist machine, adding to the memory (e.g., by adding units to a network) alters the connectivity relations among nodes and thus does affect the machine's computational structure. Connectionist cognitive architectures cannot, by their very nature, support an expandable memory, so they cannot support productive cognitive capacities. The long and short is that if productivity arguments are sound, then they show that the architecture of the mind can't be Connectionist. Connectionists have, by and large, acknowledged this; so they are forced to reject productivity arguments.
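The functional distinction between memory and program can be illustrated with a toy sketch. It assumes a recognizer for nested brackets whose procedure is fixed while the amount of memory it may use is passed in separately; the function name and the `memory_limit` parameter are hypothetical, introduced only for this illustration.

```python
# Hypothetical sketch: one fixed procedure, with memory supplied as a resource.

def accepts(string, memory_limit=None):
    """Recognize properly nested brackets, e.g. '(())'.

    memory_limit: maximum nesting depth the recognizer may track, or None for
    no bound. Changing it changes what the recognizer manages to do, but not
    the rule it follows.
    """
    depth = 0
    for symbol in string:
        if symbol == "(":
            depth += 1
            if memory_limit is not None and depth > memory_limit:
                return False  # resource failure: the rule itself is unchanged
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                return False  # genuine violation of the nesting rule
        else:
            return False  # symbol outside the toy alphabet
    return depth == 0


nested = "((()))"
fails_when_short_of_memory = accepts(nested, memory_limit=2)
succeeds_with_more_memory = accepts(nested, memory_limit=3)
```

Raising `memory_limit` is the analogue of making more tape available: performance improves although no line of the procedure is altered. A device whose memory could be increased only by rewiring the procedure itself would have no such separation.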
The test of a good scientific idealization is simply and solely whether it produces successful science in the long term. It seems to us that the productivity idealization has more than earned its keep, especially in linguistics and in theories of reasoning. Connectionists, however, have not been persuaded. For example, Rumelhart and McClelland (1986a, p. 119) say that they "... do not agree that [productive] capabilities are of the essence of human computation. As anyone who has ever attempted to process sentences like 'The man the boy the girl hit kissed moved' can attest, our ability to process even moderate degrees of center-embedded structure is grossly impaired relative to an ATN [Augmented Transition Network] parser.... What is needed, then, is not a mechanism for flawless and effortless processing of embedded constructions... The challenge is to explain how those processes that others have chosen to explain in terms of recursive mechanisms can be better explained by the kinds of processes natural for PDP networks."
These remarks suggest that Rumelhart and McClelland think that the fact that center-embedding sentences are hard is somehow an embarrassment for theories that view linguistic capacities as productive. But of course it's not since, according to such theories, performance is an effect of interactions between a productive competence and restricted resources. There are, in fact, quite plausible Classical accounts of why center-embeddings ought to impose especially heavy demands on resources, and there is a reasonable amount of experimental support for these models (see, for example, Wanner & Maratsos, 1978).
In any event, it should be obvious that the difficulty of parsing center-embeddings can't be a consequence of their recursiveness per se since there are many recursive structures that are strikingly easy to understand. Consider: 'this is the dog that chased the cat that ate the rat that lived in the house that Jack built.' The Classicist's case for productive capacities in parsing rests on the transparency of sentences like these.23 In short, the fact that center-embedded sentences are hard perhaps shows that there are some recursive structures that we can't parse. But what Rumelhart and McClelland need if they are to deny the productivity of linguistic capacities is the much stronger claim that there are no recursive structures that we can parse; and this stronger claim would appear to be simply false.
Rumelhart and McClelland's discussion of recursion (pp. 119-120) nevertheless repays close attention. They are apparently prepared to concede that PDPs can model recursive capacities only indirectly, viz., by implementing Classical architectures like ATNs; so that if human cognition exhibited recursive capacities, that would suffice to show that minds have Classical rather than Connectionist architecture at the psychological level. "We have not dwelt on PDP implementations of Turing machines and recursive processing engines because we do not agree with those who would argue that such capacities are of the essence of human computation" (p. 119, our emphasis). Their argument that recursive capacities aren't "of the essence of human computation" is, however, just the unconvincing stuff about center-embedding quoted above.
So the Rumelhart and McClelland view is apparently that if you take it to be independently obvious that some cognitive capacities are productive, then you should take the existence of such capacities to argue for Classical cognitive architecture and hence for treating Connectionism as at best an implementation theory. We think that this is quite a plausible understanding of the bearing that the issues about productivity and recursion have on the issues about cognitive architecture....
In the meantime, however, we propose to view the status of productivity arguments for Classical architectures as moot; we're about to present a different sort of argument for the claim that mental representations need an articulated internal structure. It is closely related to the productivity argument, but it doesn't require the idealization to unbounded competence. Its assumptions should thus be acceptable even to theorists who, like Connectionists, hold that the finitistic character of cognitive capacities is intrinsic to their architecture.
3.2 Systematicity of Cognitive Representation
The form of the argument is this: Whether or not cognitive capacities are really productive, it seems indubitable that they are what we shall call 'systematic'. And we'll see that the systematicity of cognition provides as good a reason for postulating combinatorial structure in mental representation as the productivity of cognition does: You get, in effect, the same conclusion, but from a weaker premise.
The easiest way to understand what the systematicity of cognitive capacities amounts to is to focus on the systematicity of language comprehension and production. In fact, the systematicity argument for combinatorial structure in thought exactly recapitulates the traditional Structuralist argument for constituent structure in sentences. But we pause to remark upon a point that we'll re-emphasize later; linguistic capacity is a paradigm of systematic cognition, but it's wildly unlikely that it's the only example. On the contrary, there's every reason to believe that systematicity is a thoroughly pervasive feature of human and infrahuman mentation.
What we mean when we say that linguistic capacities are systematic is that the ability to produce/understand some sentences is intrinsically connected to the ability to produce/understand certain others. You can see the force of this if you compare learning languages the way we really do learn them with learning a language by memorizing an enormous phrase book. The point isn't that phrase books are finite and can therefore exhaustively specify only non-productive languages; that's true, but we've agreed not to rely on productivity arguments for our present purposes. Our point is rather that you can learn any part of a phrase book without learning the rest. Hence, on the phrase book model, it would be perfectly possible to learn that uttering the form of words 'Granny's cat is on Uncle Arthur's mat' is the way to say (in English) that Granny's cat is on Uncle Arthur's mat, and yet have no idea at all how to say that it's raining (or, for that matter, how to say that Uncle Arthur's cat is on Granny's mat). Perhaps it's self-evident that the phrase book story must be wrong about language acquisition because a speaker's knowledge of his native language is never like that. You don't, for example, find native speakers who know how to say in English that John loves the girl but don't know how to say in English that the girl loves John.
Notice, in passing, that systematicity is a property of the mastery of the syntax of a language, not of its lexicon. The phrase book model really does fit what it's like to learn the vocabulary of English since when you learn English vocabulary you acquire a lot of basically independent capacities. So you might perfectly well learn that using the expression 'cat' is the way to refer to cats and yet have no idea that using the expression 'deciduous conifer' is the way to refer to deciduous conifers. Systematicity, like productivity, is the sort of property of cognitive capacities that you're likely to miss if you concentrate on the psychology of learning and searching lists.
There is, as we remarked, a straightforward (and quite traditional) argument from the systematicity of language capacity to the conclusion that sentences must have syntactic and semantic structure: If you assume that sentences are constructed out of words and phrases, and that many different sequences of words can be phrases of the same type, the very fact that one formula is a sentence of the language will often imply that other formulas must be too: in effect, systematicity follows from the postulation of constituent structure.
Suppose, for example, that it's a fact about English that formulas with the constituent analysis 'NP Vt NP' are well formed; and suppose that 'John' and 'the girl' are NPs and 'loves' is a Vt. It follows from these assumptions that 'John loves the girl,' 'John loves John,' 'the girl loves the girl,' and 'the girl loves John' must all be sentences. It follows too that anybody who has mastered the grammar of English must have linguistic capacities that are systematic in respect of these sentences; he can't but assume that all of them are sentences if he assumes that any of them are. Compare the situation on the view that the sentences of English are all atomic. There is then no structural analogy between 'John loves the girl' and 'the girl loves John' and hence no reason why understanding one sentence should imply understanding the other; no more than understanding 'rabbit' implies understanding 'tree'.24
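The contrast between the two views can be put in a minimal sketch. It assumes only the lexicon and the 'NP Vt NP' analysis given in the example; the function name and the one-entry 'phrase book' are hypothetical devices for illustration.

```python
from itertools import product

NPS = ["John", "the girl"]  # noun phrases from the example
VTS = ["loves"]             # transitive verbs from the example


def structured_sentences(nps, vts):
    """All strings licensed by the constituent analysis 'NP Vt NP'."""
    return {f"{subj} {verb} {obj}" for subj, verb, obj in product(nps, vts, nps)}


# On the structured view, licensing one sentence licenses its relatives.
grammar_set = structured_sentences(NPS, VTS)

# Hypothetical 'phrase book' view: each sentence is an unanalyzed atomic entry,
# so holding one entry implies nothing about any other.
phrase_book = {"John loves the girl"}
```

With two NPs and one Vt the rule licenses exactly the four sentences listed above, and 'the girl loves John' comes for free with 'John loves the girl'. In the phrase book, by contrast, the second sentence is simply absent unless it is entered separately.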
On the view that the sentences are atomic, the systematicity of linguistic capacities is a mystery; on the view that they have constituent structure, the systematicity of linguistic capacities is what you would predict. So we should prefer the latter view to the former.
Notice that you can make this argument for constituent structure in sentences without idealizing to astronomical computational capacities. There are productivity arguments for constituent structure, but they're concerned with our ability, in principle, to understand sentences that are arbitrarily long. Systematicity, by contrast, appeals to premises that are much nearer home; such considerations as the ones mentioned above, that no speaker understands the form of words 'John loves the girl' except as he also understands the form of words 'the girl loves John'. The assumption that linguistic capacities are productive "in principle" is one that a Connectionist might refuse to grant. But that they are systematic in fact no one can plausibly deny.
We can now, finally, come to the point: the argument from the systematicity of linguistic capacities to constituent structure in sentences is quite clear. But thought is systematic too, so there is a precisely parallel argument from the systematicity of thought to syntactic and semantic structure in mental representations.
What does it mean to say that thought is systematic? Well, just as you don't find people who can understand the sentence 'John loves the girl' but not the sentence 'the girl loves John,' so too you don't find people who can think the thought that John loves the girl but can't think the thought that the girl loves John. Indeed, in the case of verbal organisms the systematicity of thought follows from the systematicity of language if you assume (as most psychologists do) that understanding a sentence involves entertaining the thought that it expresses; on that assumption, nobody could understand both the sentences about John and the girl unless he were able to think both the thoughts about John and the girl.
But now if the ability to think that John loves the girl is intrinsically connected to the ability to think that the girl loves John, that fact will somehow have to be explained. For a Representationalist (which, as we have seen, Connectionists are), the explanation is obvious: Entertaining thoughts requires being in representational states (i.e., it requires tokening mental representations). And, just as the systematicity of language shows that there must be structural relations between the sentence 'John loves the girl' and the sentence 'the girl loves John,' so the systematicity of thought shows that there must be structural relations between the mental representation that corresponds to the thought that John loves the girl and the mental representation that corresponds to the thought that the girl loves John;25 namely, the two mental representations, like the two sentences, must be made of the same parts. But if this explanation is right (and there don't seem to be any others on offer), then mental representations have internal structure and there is a language of thought. So the architecture of the mind is not a Connectionist network.26
To summarize the discussion so far: Productivity arguments infer the internal structure of mental representations from the presumed fact that nobody has a finite intellectual competence. By contrast, systematicity arguments infer the internal structure of mental representations from the patent fact that nobody has a punctate intellectual competence. Just as you don't find linguistic capacities that consist of the ability to understand sixty-seven unrelated sentences, so too you don't find cognitive capacities that consist of the ability to think seventy-four unrelated thoughts. Our claim is that this isn't, in either case, an accident: a linguistic theory that allowed for the possibility of punctate languages would have gone not just wrong, but very profoundly wrong. And similarly for a cognitive theory that allowed for the possibility of punctate minds.
But perhaps not being punctate is a property only of the minds of language users; perhaps the representational capacities of infraverbal organisms do have just the kind of gaps that Connectionist models permit? A Connectionist might then claim that he can do everything "up to language" on the assumption that mental representations lack combinatorial syntactic and semantic structure. Everything up to language may not be everything, but it's a lot. (On the other hand, a lot may be a lot, but it isn't everything. Infraverbal cognitive architecture mustn't be so represented as to make the eventual acquisition of language in phylogeny and in ontogeny require a miracle.)
It is not, however, plausible that only the minds of verbal organisms are systematic. Think what it would mean for this to be the case. It would have to be quite usual to find, for example, animals capable of representing the state of affairs aRb, but incapable of representing the state of affairs bRa. Such animals would be, as it were, aRb sighted but bRa blind since, presumably, the representational capacities of its mind affect not just what an organism can think, but also what it can perceive. In consequence, such animals would be able to learn to respond selectively to aRb situations but quite unable to learn to respond selectively to bRa situations. (So that, though you could teach the creature to choose the picture with the square larger than the triangle, you couldn't for the life of you teach it to choose the picture with the triangle larger than the square.)
It is, to be sure, an empirical question whether the cognitive capacities of infraverbal organisms are often structured that way, but we're prepared to bet that they are not. Ethological cases are the exceptions that prove the rule. There are examples where salient environmental configurations act as 'gestalten'; and in such cases it's reasonable to doubt that the mental representation of the stimulus is complex. But the point is precisely that these cases are exceptional; they're exactly the ones where you expect that there will be some special story to tell about the ecological significance of the stimulus: that it's the shape of a predator, or the song of a conspecific... etc. Conversely, when there is no such story to tell you expect structurally similar stimuli to elicit correspondingly similar cognitive capacities. That, surely, is the least that a respectable principle of stimulus generalization has got to require.
That infraverbal cognition is pretty generally systematic seems, in short, to be about as secure as any empirical premise in this area can be. And, as we've just seen, it's a premise from which the inadequacy of Connectionist models as cognitive theories follows quite straightforwardly; as straightforwardly, in any event, as it would from the assumption that such capacities are generally productive.
3.3 Compositionality of Representations
Compositionality is closely related to systematicity; perhaps they're best viewed as aspects of a single phenomenon. We will therefore follow much the same course here as in the preceding discussion: first we introduce the concept by recalling the standard arguments for the compositionality of natural languages. We then suggest that parallel arguments secure the compositionality of mental representations. Since compositionality requires combinatorial syntactic and semantic structure, the compositionality of thought is evidence that the mind is not a Connectionist network.
We said that the systematicity of linguistic competence consists in the fact that "the ability to produce/understand some of the sentences is intrinsically connected to the ability to produce/understand certain of the others". We now add that which sentences are systematically related is not arbitrary from a semantic point of view. For example, being able to understand 'John loves the girl' goes along with being able to understand 'the girl loves John', and there are correspondingly close semantic relations between these sentences: in order for the first to be true, John must bear to the girl the very same relation that the truth of the second requires the girl to bear to John. By contrast, there is no intrinsic connection between understanding either of the John/girl sentences and understanding semantically unrelated formulas like 'quarks are made of gluons' or 'the cat is on the mat' or '2 + 2 = 4'; it looks as though semantical relatedness and systematicity keep quite close company.
You might suppose that this covariance is covered by the same explanation that accounts for systematicity per se; roughly, that sentences that are systematically related are composed from the same syntactic constituents. But, in fact, you need a further assumption, which we'll call the 'principle of compositionality': insofar as a language is systematic, a lexical item must make approximately the same semantic contribution to each expression in which it occurs. It is, for example, only insofar as 'the', 'girl', 'loves' and 'John' make the same semantic contribution to 'John loves the girl' that they make to 'the girl loves John' that understanding the one sentence implies understanding the other. Similarity of constituent structure accounts for the semantic relatedness between systematically related sentences only to the extent that the semantical properties of the shared constituents are context-independent.
Here it's idioms that prove the rule: being able to understand 'the', 'man', 'kicked' and 'bucket' isn't much help with understanding 'the man kicked the bucket', since 'kicked' and 'bucket' don't bear their standard meanings in this context. And, just as you'd expect, 'the man kicked the bucket' is not systematic even with respect to syntactically closely related sentences like 'the man kicked over the bucket' (for that matter, it's not systematic with respect to 'the man kicked the bucket' read literally).
It's uncertain exactly how compositional natural languages actually are (just as it's uncertain exactly how systematic they are). We suspect that the amount of context-induced variation of lexical meaning is often overestimated because other sorts of context sensitivity are misconstrued as violations of compositionality. For example, the difference between 'feed the chicken' and 'chicken to eat' must involve an animal/food ambiguity in 'chicken' rather than a violation of compositionality since if the context 'feed the...' could induce (rather than select) the meaning animal, you would expect 'feed the veal', 'feed the pork' and the like.27 Similarly, the difference between 'good book', 'good rest' and 'good fight' is probably not meaning shift but syncategorematicity. 'Good NP' means something like NP that answers to the relevant interest in NPs: a good book is one that answers to our interest in books (viz., it's good to read); a good rest is one that answers to our interest in rests (viz., it leaves one refreshed); a good fight is one that answers to our interest in fights (viz., it's fun to watch or to be in, or it clears the air); and so on. It's because the meaning of 'good' is syncategorematic and has a variable in it for relevant interests, that you can know that a good flurg is a flurg that answers to the relevant interest in flurgs without knowing what flurgs are or what the relevant interest in flurgs is (see Ziff, 1960).
In any event, the main argument stands: Systematicity depends on compositionality, so to the extent that a natural language is systematic it must be compositional too. This illustrates another respect in which systematicity arguments can do the work for which productivity arguments have previously been employed. The traditional argument for compositionality is that it is required to explain how a finitely representable language can contain infinitely many nonsynonymous expressions.
Considerations about systematicity offer one argument for compositionality; considerations about entailment offer another. Consider predicates like '... is a brown cow'. This expression bears a straightforward semantical relation to the predicates '... is a cow' and '... is brown'; viz., that the first predicate is true of a thing if and only if both of the others are. That is, '... is a brown cow' severally entails '... is brown' and '... is a cow' and is entailed by their conjunction. Moreover, and this is important, this semantical pattern is not peculiar to the cases cited. On the contrary, it holds for a very large range of predicates (see '... is a red square,' '... is a funny old German soldier,' '... is a child prodigy;' and so forth).
How are we to account for these sorts of regularities? The answer seems clear enough; '... is a brown cow' entails '... is brown' because (a) the second expression is a constituent of the first; (b) the syntactical form '(adjective noun)N' has (in many cases) the semantic force of a conjunction, and (c) 'brown' retains its semantical value under simplification of conjunction. Notice that you need (c) to rule out the possibility that 'brown' means brown when it modifies a noun but (as it might be) dead when it's a predicate adjective; in which case '... is a brown cow' wouldn't entail '... is brown' after all. Notice too that (c) is just an application of the principle of composition.
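The entailment pattern can be sketched by treating each predicate as the set of things it is true of and reading the adjective-noun form as a conjunction. The toy domain, the individuals' names and the function name are illustrative assumptions; the sketch covers only the conjunctive cases at issue.

```python
# Hypothetical toy domain: each word is paired with the set of things it is
# true of. The individuals named here are invented for illustration.
EXTENSION = {
    "brown": {"bessie", "rex"},
    "cow": {"bessie", "daisy"},
}


def adjective_noun(adjective, noun):
    """Extension of an adjective-noun phrase read as a conjunction.

    The phrase is true of exactly the things that both constituents are true
    of, and each word contributes the same set it contributes on its own.
    """
    return EXTENSION[adjective] & EXTENSION[noun]


brown_cows = adjective_noun("brown", "cow")

# Entailment as set inclusion: whatever is a brown cow is brown, and is a cow.
entails_brown = brown_cows <= EXTENSION["brown"]
entails_cow = brown_cows <= EXTENSION["cow"]
```

The inclusions hold for any choice of sets, but only because 'brown' is looked up in the same table whether it occurs inside the phrase or alone. If the function consulted a different entry for 'brown' in modifier position, the entailment to '... is brown' would no longer be guaranteed, which is the work that condition (c) does.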
So, here's the argument so far: you need to assume some degree of compositionality of English sentences to account for the fact that systematically related sentences are always semantically related; and to account for certain regular parallelisms between the syntactical structure of sentences and their entailments. So, beyond any serious doubt, the sentences of English must be compositional to some serious extent. But the principle of compositionality governs the semantic relations between words and the expressions of which they are constituents. So compositionality implies that (some) expressions have constituents. So compositionality argues for (specifically, presupposes) syntactic/semantic structure in sentences.
Now what about the compositionality of mental representations? There is, as you'd expect, a bridging argument based on the usual psycholinguistic premise that one uses language to express one's thoughts: Sentences are used to express thoughts; so if the ability to use some sentences is connected with the ability to use certain other, semantically related sentences, then the ability to think some thoughts must be correspondingly connected with the ability to think certain other, semantically related thoughts. But you can only think the thoughts that your mental representations can express. So, if the ability to think certain thoughts is interconnected, then the corresponding representational capacities must be interconnected too; specifically, the ability to be in some representational states must imply the ability to be in certain other, semantically related representational states.
But then the question arises: how could the mind be so arranged that the ability to be in one representational state is connected with the ability to be in others that are semantically nearby? What account of mental representation would have this consequence? The answer is just what you'd expect from the discussion of the linguistic material. Mental representations must have internal structure, just the way that sentences do. In particular, it must be that the mental representation that corresponds to the thought that John loves the girl contains, as its parts, the same constituents as the mental representation that corresponds to the thought that the girl loves John. That would explain why these thoughts are systematically related; and, to the extent that the semantic value of these parts is context-independent, that would explain why these systematically related thoughts are also semantically related. So, by this chain of argument, evidence for the compositionality of sentences is evidence for the compositionality of the representational states of speaker/hearers.
Finally, what about the compositionality of infraverbal thought? The argument isn't much different from the one that we've just run through. We assume that animal thought is largely systematic: the organism that can perceive (hence learn) that aRb can generally perceive (/learn) that bRa. But systematically related thoughts (just like systematically related sentences) are generally semantically related too. It's no surprise that being able to learn that the triangle is above the square implies being able to learn that the square is above the triangle; whereas it would be very surprising if being able to learn the square/triangle facts implied being able to learn that quarks are made of gluons or that Washington was the first President of America.
So, then, what explains the correlation between systematic relations and semantic relations in infraverbal thought? Clearly, Connectionist models don't address this question; the fact that a network contains a node labelled X has, so far as the constraints imposed by Connectionist architecture are concerned, no implications at all for the labels of the other nodes in the network; in particular, it doesn't imply that there will be nodes that represent thoughts that are semantically close to X. This is just the semantical side of the fact that network architectures permit arbitrarily punctate mental lives.
But if, on the other hand, we make the usual Classicist assumptions (viz., that systematically related thoughts share constituents and that the semantic values of these shared constituents are context independent) the correlation between systematicity and semantic relatedness follows immediately. For a Classicist, this correlation is an 'architectural' property of minds; it couldn't but hold if mental representations have the general properties that Classical models suppose them to.
What have Connectionists to say about these matters? There is some textual evidence that they are tempted to deny the facts of compositionality wholesale. For example, Smolensky (1988) claims that: "Surely... we would get quite a different representation of 'coffee' if we examined the difference between 'can with coffee' and 'can without coffee' or 'tree with coffee' and 'tree without coffee'; or 'man with coffee' and 'man without coffee'... context insensitivity is not something we expect to be reflected in Connectionist representations...".
It's certainly true that compositionality is not generally a feature of Connectionist representations. Connectionists can't acknowledge the facts of compositionality because they are committed to mental representations that don't have combinatorial structure. But to give up on compositionality is to take 'kick the bucket' as a model for the relation between syntax and semantics; and the consequence is, as we've seen, that you make the systematicity of language (and of thought) a mystery. On the other hand, to say that 'kick the bucket' is aberrant, and that the right model for the syntax/semantics relation is (e.g.) 'brown cow', is to start down a trail which leads, pretty inevitably, to acknowledging combinatorial structure in mental representation, hence to the rejection of Connectionist networks as cognitive models.
We don't think there's any way out of the need to acknowledge the compositionality of natural languages and of mental representations. However, it's been suggested (see Smolensky, op. cit.) that while the principle of compositionality is false (because content isn't context invariant) there is nevertheless a "family resemblance" between the various meanings that a symbol has in the various contexts in which it occurs. Since such proposals generally aren't elaborated, it's unclear how they're supposed to handle the salient facts about systematicity and inference. But surely there are going to be serious problems. Consider, for example, such inferences as
(i) Turtles are slower than rabbits.
(ii) Rabbits are slower than Ferraris.
(iii) Turtles are slower than Ferraris.
The soundness of this inference appears to depend upon (a) the fact that the same relation (viz., slower than) holds between turtles and rabbits on the one hand, and rabbits and Ferraris on the other; and (b) the fact that that relation is transitive. If, however, it's assumed (contrary to the principle of compositionality) that 'slower than' means something different in premises (i) and (ii) (and presumably in (iii) as well), so that, strictly speaking, the relation that holds between turtles and rabbits is not the same one that holds between rabbits and Ferraris, then it's hard to see why the inference should be valid.
Talk about the relations being 'similar' only papers over the difficulty, since the problem is then to provide a notion of similarity that will guarantee that if (i) and (ii) are true, so too is (iii). And, so far at least, no such notion of similarity has been forthcoming. Notice that it won't do to require just that the relations all be similar in respect of their transitivity, i.e., that they all be transitive. On that account, the argument from 'turtles are slower than rabbits' and 'rabbits are furrier than Ferraris' to 'turtles are slower than Ferraris' would be valid since 'furrier than' is transitive too.
Until these sorts of issues are attended to, the proposal to replace the compositional principle of context invariance with a notion of "approximate equivalence... across contexts" (Smolensky, 1988) doesn't seem to be much more than hand waving.
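The dependence of the turtle/rabbit/Ferrari inference on the identity of the relation can be sketched as follows. This is a minimal illustration, not a proposal about how inference is implemented: the `Claim` type, the `TRANSITIVE` table, and the `chain` function are names assumed here for the example.

```python
from typing import NamedTuple, Optional


class Claim(NamedTuple):
    left: str
    relation: str  # the relation symbol is a reusable constituent
    right: str


# Assumption for illustration: which relation symbols count as transitive.
TRANSITIVE = {"slower than", "furrier than"}


def chain(p1: Claim, p2: Claim) -> Optional[Claim]:
    """Return the conclusion licensed by transitivity, or None.

    The rule applies only when both premises contain the very same relation
    symbol, that relation is transitive, and the middle terms match.
    Transitivity of each relation taken separately is not enough.
    """
    same_relation = p1.relation == p2.relation
    if same_relation and p1.relation in TRANSITIVE and p1.right == p2.left:
        return Claim(p1.left, p1.relation, p2.right)
    return None


i = Claim("turtles", "slower than", "rabbits")
ii = Claim("rabbits", "slower than", "Ferraris")
odd = Claim("rabbits", "furrier than", "Ferraris")

print(chain(i, ii))   # the conclusion (iii) is licensed
print(chain(i, odd))  # None: both relations are transitive, but they differ
```

Replacing the identity test `p1.relation == p2.relation` with an unspecified similarity test is exactly the step for which, according to the argument above, no adequate account has been offered.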
3.4 The Systematicity of Inference
In Section 2 we saw that, according to Classical theories, the syntax of mental representations mediates between their semantic properties and their causal role in mental processes. Take a simple case: It's a 'logical' principle that conjunctions entail their constituents (so the argument from P&Q to P and to Q is valid). Correspondingly, it's a psychological law that thoughts that P&Q tend to cause thoughts that P and thoughts that Q, all else being equal. Classical theory exploits the constituent structure of mental representations to account for both these facts, the first by assuming that the combinatorial semantics of mental representations is sensitive to their syntax and the second by assuming that mental processes apply to mental representations in virtue of their constituent structure.
A consequence of these assumptions is that Classical theories are committed to the following striking prediction: inferences that are of similar logical type ought, pretty generally,28 to elicit correspondingly similar cognitive capacities. You shouldn't, for example, find a kind of mental life in which you get inferences from P&Q&R to P but you don't get inferences from P&Q to P. This is because, according to the Classical account, this logically homogeneous class of inferences is carried out by a correspondingly homogeneous class of psychological mechanisms: The premises of both inferences are expressed by mental representations that satisfy the same syntactic analysis (viz., S1 & S2 & S3 & ... Sn); and the process of drawing the inference corresponds, in both cases, to the same formal operation of detaching the constituent that expresses the conclusion.
Figure 41.2 A possible Connectionist network which draws inferences from P & Q & R to P and also draws inferences from P & Q to P.
The idea that organisms should exhibit similar cognitive capacities in respect of logically similar inferences is so natural that it may seem unavoidable. But, on the contrary: there's nothing in principle to preclude a kind of cognitive model in which inferences that are quite similar from the logician's point of view are nevertheless computed by quite different mechanisms; or in which some inferences of a given logical type are computed and other inferences of the same logical type are not. Consider, in particular, the Connectionist account. A Connectionist can certainly model a mental life in which, if you can reason from P&Q&R to P, then you can also reason from P&Q to P. For example, the network in Figure 41.2 would do.
But notice that a Connectionist can equally model a mental life in which you get one of these inferences and not the other. In the present case, since there is no structural relation between the P&Q&R node and the P&Q node (remember, all nodes are atomic; don't be misled by the node labels) there's no reason why a mind that contains the first should also contain the second, or vice versa. Analogously, there's no reason why you shouldn't get minds that simplify the premise 'John loves Mary and Bill hates Mary' but no others; or minds that simplify premises with 1, 3, or 5 conjuncts, but don't simplify premises with 2, 4, or 6 conjuncts; or, for that matter, minds that simplify only premises that were acquired on Tuesdays... etc.
In fact, the Connectionist architecture is utterly indifferent as among these possibilities. That's because it recognizes no notion of syntax according to which thoughts that are alike in inferential role (e.g., thoughts that are all subject to simplification of conjunction) are expressed by mental representations of correspondingly similar syntactic form (e.g., by mental representations that are all syntactically conjunctive). So, the Connectionist architecture tolerates gaps in cognitive capacities; it has no mechanism to enforce the requirement that logically homogeneous inferences should be executed by correspondingly homogeneous computational processes.
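The contrast drawn here can be sketched in a few lines. The sketch is a deliberate caricature under stated assumptions: `simplify`, `CONNECTIONS`, and `network_infers` are hypothetical names, and the lookup table stands in only for the kind of network described in this section, in which nodes are atomic and labels play no causal role.

```python
from typing import Dict, Set, Tuple

Conjunction = Tuple[str, ...]  # a structured representation: its conjuncts


def simplify(premise: Conjunction) -> Set[str]:
    """Classical-style rule: detach any constituent of a conjunction.

    A single operation, defined over syntactic form, covers premises with
    any number of conjuncts.
    """
    return set(premise)


# Network-style sketch: each node is atomic and its label is just a name.
# An inference is available only where a connection happens to exist.
CONNECTIONS: Dict[str, Set[str]] = {
    "P&Q&R": {"P"},  # this node excites the P node
    "P&Q": set(),    # nothing in the architecture forces this one to
}


def network_infers(node: str, conclusion: str) -> bool:
    """True if the node labelled `node` is connected to `conclusion`."""
    return conclusion in CONNECTIONS.get(node, set())


print("P" in simplify(("P", "Q", "R")), "P" in simplify(("P", "Q")))
print(network_infers("P&Q&R", "P"), network_infers("P&Q", "P"))
```

Adding the missing entry to `CONNECTIONS` would close this particular gap, but nothing in the table's format requires it, whereas `simplify` cannot be made to cover one conjunction and skip the other without special-casing.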
But, we claim, you don't find cognitive capacities that have these sorts of gaps. You don't, for example, get minds that are prepared to infer 'John went to the store' from 'John and Mary and Susan and Sally went to the store' and from 'John and Mary went to the store' but not from 'John and Mary and Susan went to the store'. Given a notion of logical syntax (the very notion that the Classical theory of mentation requires to get its account of mental processes off the ground) it is a truism that you don't get such minds. Lacking a notion of logical syntax, it is a mystery that you don't.
3.5 Summary
It is perhaps obvious by now that all the arguments that we've been reviewing (the argument from systematicity, the argument from compositionality, and the argument from inferential coherence) are really much the same: If you hold the kind of theory that acknowledges structured representations, it must perforce acknowledge representations with similar or identical structures. In the linguistic cases, constituent analysis implies a taxonomy of sentences by their syntactic form, and in the inferential cases, it implies a taxonomy of arguments by their logical form. So, if your theory also acknowledges mental processes that are structure sensitive, then it will predict that similarly structured representations will generally play similar roles in thought. A theory that says that the sentence 'John loves the girl' is made out of the same parts as the sentence 'the girl loves John', and made by applications of the same rules of composition, will have to go out of its way to explain a linguistic competence which embraces one sentence but not the other. And similarly, if a theory says that the mental representation that corresponds to the thought that P&Q&R has the same (conjunctive) syntax as the mental representation that corresponds to the thought that P&Q and that mental processes of drawing inferences subsume mental representations in virtue of their syntax, it will have to go out of its way to explain inferential capacities which embrace the one thought but not the other. Such a competence would be, at best, an embarrassment for the theory, and at worst a refutation.
By contrast, since the Connectionist architecture recognizes no combinatorial structure in mental representations, gaps in cognitive competence should proliferate arbitrarily. It's not just that you'd expect to get them from time to time; it's that, on the 'no-structure' story, gaps are the unmarked case. It's the systematic competence that the theory is required to treat as an embarrassment. But, as a matter of fact, inferential competences are blatantly systematic. So there must be something deeply wrong with Connectionist architecture.
What's deeply wrong with Connectionist architecture is this: Because it acknowledges neither syntactic nor semantic structure in mental representations, it perforce treats them not as a generated set but as a list. But lists, qua lists, have no structure; any collection of items is a possible list. And, correspondingly, on Connectionist principles, any collection of (causally connected) representational states is a possible mind. So, as far as Connectionist architecture is concerned, there is nothing to prevent minds that are arbitrarily unsystematic. But that result is preposterous. Cognitive capacities come in structurally related clusters; their systematicity is pervasive. All the evidence suggests that punctate minds can't happen. This argument seemed conclusive against the Connectionism of Hebb, Osgood and Hull twenty or thirty years ago. So far as we can tell, nothing of any importance has happened to change the situation in the meantime.29
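The difference between a generated set and a list can be illustrated with a small sketch. The names `generate` and `is_systematic`, and the two-term vocabulary, are assumptions made for the example; the sketch shows only the formal point, not a model of any actual mind.

```python
from itertools import permutations
from typing import List, Set, Tuple

Thought = Tuple[str, str, str]  # (a, R, b), read as "a R b"


def generate(terms: List[str], relations: List[str]) -> Set[Thought]:
    """Generated set: every well-formed combination of the constituents."""
    return {(a, r, b) for r in relations for a, b in permutations(terms, 2)}


def is_systematic(thoughts: Set[Thought]) -> bool:
    """True if having aRb always comes with having bRa."""
    return all((b, r, a) in thoughts for (a, r, b) in thoughts)


# A set produced by combinatorial rules over shared constituents.
generated = generate(["John", "the girl"], ["loves"])

# A mere collection: any set of items qualifies, including a punctate one.
arbitrary_list = {("John", "loves", "the girl")}

print(is_systematic(generated))       # closed under swapping the arguments
print(is_systematic(arbitrary_list))  # nothing requires the reverse item
```

Systematicity falls out of `generate` by construction, while for `arbitrary_list` it would have to be imposed item by item.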
A final comment to round off this part of the discussion. It's possible to imagine a Connectionist being prepared to admit that while systematicity doesn't follow from (and hence is not explained by) Connectionist architecture, it is nonetheless compatible with that architecture. It is, after all, perfectly possible to follow a policy of building networks that have aRb nodes only if they have bRa nodes... etc. There is therefore nothing to stop a Connectionist from stipulating, as an independent postulate of his theory of mind, that all biologically instantiated networks are, de facto, systematic.
But this misses a crucial point: It's not enough just to stipulate systematicity; one is also required to specify a mechanism that is able to enforce the stipulation. To put it another way, it's not enough for a Connectionist to agree that all minds are systematic; he must also explain how nature contrives to produce only systematic minds. Presumably there would have to be some sort of mechanism, over and above the ones that Connectionism per se posits, the functioning of which ensures the systematicity of biologically instantiated networks; a mechanism such that, in virtue of its operation, every network that has an aRb node also has a bRa node... and so forth. There are, however, no proposals for such a mechanism. Or, rather, there is just one: The only mechanism that is known to be able to produce pervasive systematicity is Classical architecture. And, as we have seen, Classical architecture is not compatible with Connectionism since it requires internally structured representations.