
Support the Range of Representational Systems Required by the Task
The structural complexity and linguistic variability of user-generated input
are important sources of processing difficulty. A primary technique for eliciting
simpler, easier-to-process language lies in the choice of modalities that an
interface supports. Users will naturally choose the modalities that are most
appropriate for conveying content. For example, users typically select pen input
to provide location and spatially oriented information, as well as digits, symbols,
and graphic content (Oviatt, 1997; Oviatt & Olsen, 1994; Suhm, 1998). In contrast,
they use speech for describing objects and events and for issuing commands for
actions (Cohen & Oviatt, 1995; Oviatt & Cohen, 1991).
A primary guideline is therefore to support modalities so that the
representational systems required by users are available. The language that results
when adequate complementary modalities are available tends to be linguistically
simplified: briefer, syntactically simpler, and less disfluent (Oviatt, 1997). It
also contains less linguistic indirection and fewer co-referring expressions
(Oviatt & Kuhn, 1998). One implication is that the fundamental language models
needed to design a multimodal system are not the same as those used in the past
for processing textual language.
Structure the Interface to Elicit Simpler Language
A key insight in designing multimodal interfaces that lead to simpler, more
processable language is that the language employed by users can be shaped very
strongly by system presentation features. Adding structure, as opposed to having
an unconstrained interface, has been demonstrated to be highly effective in
simplifying the language produced by users, resulting in more processable language
and fewer errors. A forms-based interface that guides users through the steps
required to complete a task can reduce the length of spoken utterances and
eliminate up to 80 percent of hard-to-process speech disfluencies (Oviatt, 1995).
Similar benefits have been identified in map-based domains. A more detailed map
that displays the full network of roads, buildings, and labels can reduce
disfluencies compared to a minimalist map containing one-third of the roads
(Oviatt, 1997).
Other techniques that may lead users toward expected language are guided
dialogs and context-sensitive cues (Bourguet, 2006). These provide additional
information that helps users determine what their input options are at each point
of an interaction, leading to more targeted production of the terms that the
interface expects in a given state. This is usually implemented as a prompt
that explicitly lists the options the user can choose from.
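
As a minimal sketch of such a prompt, assuming a dialog modeled as a small set
of named states, the following Python fragment builds a prompt that lists the
terms accepted in the current state and advances only when one of them is
given. The names DialogState, STATES, build_prompt, and advance, as well as the
map-task flow itself, are hypothetical and chosen only for illustration; they
do not come from the cited work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogState:
    """One step of a guided dialog (hypothetical structure)."""
    question: str
    options: dict[str, str]  # accepted term -> name of the next state


# Hypothetical two-step flow for a map-based task.
STATES = {
    "start": DialogState("What would you like to do?",
                         {"zoom": "zoom", "search": "done"}),
    "zoom": DialogState("Which direction?", {"in": "done", "out": "done"}),
}


def build_prompt(state_name: str) -> str:
    """Return a prompt that explicitly lists the options valid in this state."""
    state = STATES[state_name]
    return f"{state.question} Say one of: {', '.join(state.options)}."


def advance(state_name: str, user_input: str) -> str:
    """Move to the next state, or stay put if the term is not expected here."""
    term = user_input.strip().lower()
    return STATES[state_name].options.get(term, state_name)
```

Because each prompt is derived from the same table that drives the state
transitions, the options shown to the user cannot drift out of step with the
terms the interface accepts in that state.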
Exploit Natural Adaptation
A powerful mechanism for transparently shaping user input relies on the
tendency of users to adapt to the linguistic style of their conversational
12.6 Design Guidelines