308 Tsuneaki Kato and Yukiko I. Nakano
conveyed and the communicative mode employed. Our claim is that these two
constituents of information are inherently associated with each other and cannot
be divided. In the model, each piece of information, whether linguistic or not,
is assigned two measures: its effectiveness in activating a given concept and its
cost, the amount of effort required to produce it. In any situation, the speaker
chooses and combines the pieces of information that achieve concept activation
at minimal cost.
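The selection step described above can be sketched as a small search over
combinations of information pieces. This is only an illustration under stated
assumptions, not the authors' formalization: it assumes that effectiveness
values simply add up, that activation succeeds once the total reaches a
threshold, and that exhaustive search is acceptable for a handful of pieces.
The names Piece and cheapest_activation, the piece labels, and all numbers are
hypothetical.

```python
from itertools import combinations
from typing import NamedTuple, Optional, Sequence, Tuple


class Piece(NamedTuple):
    """One piece of information (hypothetical representation)."""
    name: str
    mode: str           # "linguistic" or "visual"; labels are illustrative
    effectiveness: int  # invented activation units
    cost: int           # invented effort units


def cheapest_activation(
    pieces: Sequence[Piece], threshold: int
) -> Optional[Tuple[Piece, ...]]:
    """Return the lowest-cost combination whose summed effectiveness
    reaches the threshold, or None if no combination does.

    Assumption: effectiveness is additive across pieces.
    """
    best_cost = None
    best_combo = None
    for size in range(1, len(pieces) + 1):
        for combo in combinations(pieces, size):
            if sum(p.effectiveness for p in combo) < threshold:
                continue  # this combination does not activate the concept
            cost = sum(p.cost for p in combo)
            if best_cost is None or cost < best_cost:
                best_cost, best_combo = cost, combo
    return best_combo


# Invented values for illustration only.
SPOKEN = [
    Piece("general name", "linguistic", effectiveness=3, cost=1),
    Piece("position", "linguistic", effectiveness=3, cost=2),
    Piece("shape", "linguistic", effectiveness=4, cost=4),
]
MULTIMODAL = SPOKEN + [Piece("pointing", "visual", effectiveness=6, cost=1)]

spoken_choice = cheapest_activation(SPOKEN, threshold=10)
multimodal_choice = cheapest_activation(MULTIMODAL, threshold=10)
print([p.name for p in spoken_choice])
print([p.name for p in multimodal_choice])
```

With these invented numbers, adding a cheap visual piece changes which
linguistic pieces are selected, which is the kind of interaction between
modes that the model is meant to capture.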
It is natural to assume that the amount of information needed for activation
is influenced by several contextual factors such as saliency (Alshawi, 1987) and
attention stack position (Grosz and Sidner, 1986). That is, initial identification
of an object needs a large amount of information to introduce and activate its
concept. A smaller amount of information is needed to reactivate an object in-
troduced recently. With time or the introduction of different discourse segments,
more information is needed to reactivate the original concept. Relatedly, the
observed result that proper names do not suffice to reactivate an object
already introduced suggests the validity of context models, as discussed in
Walker (1992).
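The context dependence described above can be pictured as an activation
requirement that varies with discourse history. The following is a minimal
sketch, assuming a linear growth with elapsed turns and intervening discourse
segments and a cap at the initial-identification level; the function name,
parameters, and constants are all hypothetical and are not taken from the
cited context models.

```python
def required_activation(
    introduced: bool,
    turns_since_mention: int = 0,
    intervening_segments: int = 0,
    initial: int = 10,      # invented: amount needed for initial identification
    floor: int = 3,         # invented: amount needed right after introduction
    per_turn: int = 1,      # invented: growth per elapsed turn
    per_segment: int = 2,   # invented: growth per intervening discourse segment
) -> int:
    """Amount of information needed to (re)activate a concept.

    Assumptions: growth is linear, and reactivation never requires more
    than initial identification does.
    """
    if not introduced:
        return initial
    need = (
        floor
        + per_turn * turns_since_mention
        + per_segment * intervening_segments
    )
    return min(initial, need)


first_mention = required_activation(introduced=False)
just_mentioned = required_activation(introduced=True)
after_a_while = required_activation(
    introduced=True, turns_since_mention=2, intervening_segments=1
)
long_ago = required_activation(
    introduced=True, turns_since_mention=20, intervening_segments=5
)
print(first_mention, just_mentioned, after_a_while, long_ago)
```

Any monotone function of recency and segment structure would serve the same
illustrative purpose; the linear form is chosen only for simplicity.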
The communicative mode dependency reported here is derived naturally from
our model because each identification request was expressed using a combination
of information pieces. If the speaker can use visual information, such as
pointing, and thereby activates the concept to some extent, his/her use of
linguistic information decreases, since it is needed only for the remainder of
the activation. This means that pointing is not just supplemental: there must be
some relationship among pieces of information that are conveyed via different
communicative modes. The next question is what kind of relationship it is.
COMET (Feiner and McKeown, 1990) decides which portion of a given semantic
content should be realized in which communicative mode, so only how contents
are communicated depends on the modes available, while the contents
communicated do not. Within the COMET framework, pointing is regarded as an
alternative means of communication, and it must convey some of the contents that
would be conveyed linguistically in a situation where pointing could not be used.
According to our results, in initial identification such contents included those
conveyed by information on shape/size, characters/marks, and related objects. In
third-time identification, they were related to position information. That is,
under the COMET framework, we have to attribute several roles to pointing
actions depending on contextual factors. This is complicated and unnatural.
On the other hand, our model claims that the semantic content conveyed also
depends on the available modes, since information cannot be divided into the
semantic content conveyed and the communicative mode(s) employed. Our model
interprets the role of pointing as follows. Consider an object, for example.
In a spoken-mode situation, the combination of three pieces of information,
say descriptions of its general name, position, and shape, is enough to activate
this object at the least expense. We also assume that shape information is the
most costly of the three. When this object is identified in a multi-modal
situation, our model predicts that the speaker will first choose to use