that conventions have not yet arisen? This hardly seems likely since, as has been
argued elsewhere (Neilson and Lee 1994), discourse is probably more often and
more primitively multimodal than purely linguistic. If the reason is not lack of
opportunity, then what is it?
We believe that an outline answer to this question is quite revealing of fun-
damental characteristics of the two modalities of graphics and language. We now
attempt to sketch an explanation.
One fundamental point is that, in establishing reference, anaphora is involved with
relating the content of a communicative expression to an existing state of knowl-
edge. We tend to take this view even of a situation where the existing knowledge
is perhaps only accepted very tentatively and perhaps has been only very recently
acquired, e.g. in the previous sentence of a discourse. The view is not uncommon
in linguistic treatments, being essentially that processing of discourse involves
the incremental construction of a discourse model representing the current state
of knowledge about the immediate context. (Note that this model is distinct
from, though in various ways perhaps related to, 'background knowledge' about
the general domain of discourse and other things.) Hence establishing corefer-
ence for an expression is typically concerned with establishing whether it refers
to an entity already in the model, and if so which, or whether it introduces a
new entity. If an existing entity is being referred to, this is usually for a spe-
cific purpose such as to add new information about it, or give a new instruction
concerning it, etc.; this much is in common with, e.g., Singer.
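The incremental view just described can be illustrated with a minimal sketch. It is a toy under stated assumptions, not a claim about any particular linguistic treatment: the names `Entity`, `DiscourseModel`, `resolve` and `refer` are hypothetical, descriptions are reduced to sets of property labels, and compatibility is taken to be simple set inclusion, with ties broken in favour of the most recently introduced entity.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A discourse referent plus the properties asserted of it so far."""
    ident: int
    properties: set = field(default_factory=set)


class DiscourseModel:
    """Toy discourse model: a list of entities built up incrementally."""

    def __init__(self):
        self.entities = []

    def introduce(self, properties):
        """Add a new entity, as for an indefinite such as 'a red circle'."""
        entity = Entity(len(self.entities), set(properties))
        self.entities.append(entity)
        return entity

    def resolve(self, properties):
        """Return the most recently introduced entity compatible with the
        description, or None if no existing entity fits (assumption:
        compatibility is set inclusion)."""
        wanted = set(properties)
        for entity in reversed(self.entities):
            if wanted <= entity.properties:
                return entity
        return None

    def refer(self, properties, new_info=()):
        """Establish coreference if possible, otherwise introduce a new
        entity; then record any new information about the referent."""
        entity = self.resolve(properties)
        if entity is None:
            entity = self.introduce(properties)
        entity.properties.update(new_info)
        return entity


model = DiscourseModel()
first = model.refer({"circle", "red"})        # "a red circle": new entity
second = model.refer({"circle"}, {"large"})   # "the circle is large": coreference
third = model.refer({"square"})               # "a square": new entity
```

Here the second expression is resolved to the entity introduced by the first and adds new information about it, whereas the third finds no compatible entity and so extends the model. Real coreference resolution would of course need far richer matching than set inclusion.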
We can see that in practical terms the establishing of coreference depends a
good deal on what kind of access can be supported to the current discourse model
state. Where a dialogue, say, is being conducted entirely over the telephone, or a
discourse entirely in unillustrated text, there is no way to establish the reference
of an expression other than through its relations to previous expressions, or
to things in the world which are directly named or described. Where, on the
other hand, a system for external representation is available to both participants
in a dialogue, it may be used to maintain persistent information about their
discourse, which in turn may function as an auxiliary representation of (part of)
the discourse model.
Obviously, any dynamic use of graphics is likely to play this kind of role.
While a discourse model in language is a purely conceptual structure, it may
have in graphical or other physical contexts a partial physical counterpart, and
hence reference can be secured by a combination of physical and linguistic means.
This gives rise to the familiar use of deixis, which secures reference for language
in graphical and other contexts; many phenomena which establish reference are,
however, much more subtle than the paradigm cases of rather overt deixis. As
observed by Neilson and Lee (1994), a complex process of inference, often bring-
ing in background knowledge, seems to be required to account for many of the
referential relations that arise in multimodal dialogue.
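A rough sketch may help to fix the idea of reference secured by a combination of physical and linguistic means. It covers only the overt cases and makes no attempt at the inference over background knowledge mentioned above. The function `resolve_reference`, the ordering of the three strategies, and the representation of objects as sets of property labels are all assumptions made for illustration.

```python
def resolve_reference(description, mentioned, visible, pointed_at=None):
    """Return a (source, referent) pair for a description.

    description: set of property labels, e.g. {"circle"}
    mentioned:   list of (name, properties) pairs in order of mention
    visible:     dict mapping names of displayed objects to their properties
    pointed_at:  name of an object picked out by a gesture, if any
    """
    wanted = set(description)

    # Overt deixis: a pointing gesture picks out a visible object directly.
    if pointed_at is not None and pointed_at in visible:
        return "deixis", pointed_at

    # Graphical context: the description fits exactly one visible object.
    matches = [name for name, props in visible.items() if wanted <= props]
    if len(matches) == 1:
        return "graphic", matches[0]

    # Linguistic context: the most recently mentioned compatible entity.
    for name, props in reversed(mentioned):
        if wanted <= props:
            return "discourse", name

    return "unresolved", None


visible = {
    "obj1": {"circle", "red"},
    "obj2": {"circle", "blue"},
    "obj3": {"square"},
}
mentioned = [("obj2", {"circle", "blue"})]

by_graphic = resolve_reference({"square"}, mentioned, visible)
by_discourse = resolve_reference({"circle"}, mentioned, visible)
by_deixis = resolve_reference({"circle"}, mentioned, visible, pointed_at="obj1")
```

In this toy setting "the square" is settled by the display alone, "the circle" is ambiguous on the display and falls back on what was last mentioned, and a pointing gesture overrides both.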
Given a view of this sort, it is natural to suppose that anaphora as such (i.e.
the fairly precise phenomenon considered in linguistics) fails to arise in many
cases where graphics is present, simply because it isn't necessary to establish