Ergonomics and HMI concerns in Mule:
towards intelligent input methods
Didier Verna
The XEmacs Project
<didier@xemacs.org>
April 19, 1999
Abstract
This paper presents some ergonomics and Human Machine Interaction problems that several input methods (notably the French ones) introduce in Emacs¹. First, a general overview of the available input methods is given. Then, some ergonomics problems are described. Finally, some ideas are proposed to improve the quality of these input methods.
Introduction
Working with an internationalized text editor means not only that several languages can be displayed, possibly at the same time, but also that those languages can be typed in. To ease the process of typing in different languages with a standard keyboard, Mule offers the concept of “input method”, which allows you to enter characters from different alphabets. The process of inputting characters from a traditional keyboard can be particularly complex, notably for large character sets like kanji. Without going that far, we can already identify ergonomics problems with simpler methods, like the ones used for inputting French. In this paper, we would like to demonstrate those problems, and propose ideas that could be used to make those input methods more “intelligent”, and thus simpler to use.
Section 1 gives an overview of the different available input methods. Section 2 demonstrates the ergonomics problems that French input methods suffer from. Section 3 proposes some ideas for improving the quality and the ergonomics of these input methods, and Section 4 discusses how far such improvements should be pushed.
¹ The term “Emacs” refers to any flavor of the software, notably GNU Emacs or XEmacs.
1 Different Kinds of Input Methods
This section provides a short overview of the different kinds of input methods available in Mule for inputting different languages, and gives more details on the Quail French input methods.
1.1 Input methods classification
Class 1: key bindings
The first class of input methods works by rebinding the keyboard keys to characters from another language. This is possible for small alphabets that can be represented on a normal keyboard. Russian input methods belong to this category. Each character can also be printed on the corresponding key to help find it on the keyboard. As far as French is concerned, note that an input method emulating an AZERTY keyboard on a QWERTY one would belong to this category.
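A class 1 method needs no state at all: each keystroke is looked up in a table and replaced by another character. The minimal Python sketch below illustrates this with an AZERTY layout emulated on a QWERTY keyboard. It is an illustration under stated assumptions: the table covers only a few letter keys, and the names `AZERTY_ON_QWERTY`, `translate_key` and `type_keys` are hypothetical, not taken from Mule.

```python
# Hypothetical sketch of a class-1 input method: every physical key is simply
# rebound to another character, with no state kept between keystrokes.
# The table is a small illustrative subset of AZERTY emulated on a QWERTY
# keyboard (letter keys that sit in different places on the two layouts).
AZERTY_ON_QWERTY = {
    "q": "a", "a": "q",
    "w": "z", "z": "w",
    ";": "m",
}


def translate_key(key):
    """Return the character produced by one keystroke (unbound keys pass through)."""
    return AZERTY_ON_QWERTY.get(key, key)


def type_keys(keys):
    """Translate a string of keystrokes, one key at a time."""
    return "".join(translate_key(k) for k in keys)


print(type_keys("wero"))  # zero
print(type_keys(";ot"))   # mot
```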
Class 2: key sequences
When there are not enough keys on the keyboard, or when you don't want to rebind them, some input methods use key sequences to input characters. Latin-1 (notably for French) and Latin-2 input methods belong to this category. The next subsection gives a more detailed overview of the French input methods.
Combination of class 1 and class 2
Some languages, such as Thai, use a combination of the first and second classes to input characters. In a first step, the keys are bound to vowels, consonants, etc., and a sequence of such characters actually produces a composite one.
Class 3: external help
Finally, in more complex cases, some help from an external program might be required. Japanese input methods belong to this category: after using a combination of the preceding classes to input, say, hiragana phonetically, an external program like canna-server proposes possible kanji translations for the resulting word.
1.2 Quail French input methods
Quail provides several French input methods belonging to class 2 as de-
scribed previously. “french-postfix” and “french-prefix” are the most widely
used. Those input methods typically let you enter a character with an ac-
cent or a cedilla in two keystrokes. In french-prefix, you type the symbol
first, while in french-postfix, you type the letter first. Figure 1 shows some
examples of key sequences from those input methods.
french-prefix        french-postfix
` + a  → à           a + `  → à
' + e  → é           e + '  → é
, + c  → ç           c + ,  → ç
< + <  → «           < + <  → «
/ + c  → ©           c + /  → ©
/ + o  → ø           o + /  → ø
Figure 1: Examples of Quail French key sequences

Obviously, it is also possible to cancel the key sequence, if you really need the two characters one after the other. Under a french-postfix input method, this is usually achieved by typing the symbol twice. Under a french-prefix input method, the space bar terminates the sequence after the symbol has been typed in. This mechanism is illustrated in Figure 2.
french-prefix                     french-postfix
` + <space> → ` + a → `a          a + ` → à + ` → a`
' + <space> → ' + e → 'e          e + ' → é + ' → e'
/ + <space> → / + c → /c          c + / → © + / → c/
/ + <space> → / + o → /o          o + / → ø + / → o/
Figure 2: Cancelling Quail French key sequences
As you can see, those key sequences are very easy to remember. The symbols used in french-postfix do look like the corresponding accents or cedilla, which makes their use very intuitive. Doubling the symbol in order to cancel the sequence is also rather natural.
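The postfix mechanism, including cancelation by doubling the symbol, can be sketched as a small state machine that remembers the last composition so that a repeated symbol can undo it. This is a minimal Python illustration under stated assumptions: the rule table is a subset of Figure 1, and `postfix_input` is a hypothetical name that does not correspond to Quail's actual implementation.

```python
# Hypothetical sketch of a class-2 "postfix" input method in the spirit of
# Quail's french-postfix: the letter is typed first, then a symbol that turns
# it into an accented character. Typing the symbol a second time cancels the
# sequence and yields the letter followed by the literal symbol.
# Rule table: a small subset of Figure 1 (not Quail's real data structure).
POSTFIX_RULES = {
    ("a", "`"): "à",
    ("e", "'"): "é",
    ("c", ","): "ç",
    ("<", "<"): "«",
}


def postfix_input(keys):
    out = []
    # (letter, symbol) of the last composition, so that a repeated symbol can
    # undo it; None when the previous key did not complete a sequence.
    last = None
    for key in keys:
        if last is not None and key == last[1]:
            # Cancelation: replace the composed character by letter + symbol.
            out[-1] = last[0] + last[1]
            last = None
        elif out and (out[-1], key) in POSTFIX_RULES:
            letter = out[-1]
            out[-1] = POSTFIX_RULES[(letter, key)]
            last = (letter, key)
        else:
            out.append(key)
            last = None
    return "".join(out)


print(postfix_input("e'"))         # é
print(postfix_input("e''"))        # e'
print(postfix_input("franc, ca"))  # franç ca
```

Forgetting the second comma in the last example produces `franç ca`, which is precisely the kind of mistake examined in Section 2.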
2 Cognitive problems in Quail input methods
Despite their apparent simplicity, those input methods actually introduce some cognitive problems that, from time to time, can make them more difficult to use than it seems at first glance. Figure 3 shows some common mistakes made with those methods. The first one occurs in a french-prefix context, the others in french-postfix. Each case shows the sentence that was obtained, and the one that was expected.
Obtained                          Expected
Lémpire contre attaque            L’empire contre attaque
Sois franç ça ne marche pas...    Sois franc, ça ne marche pas...
Utilisez plutôt ‘setlocalé        Utilisez plutôt ‘setlocale’
/us®local/sr©xemacs               /usr/local/src/xemacs
Figure 3: Common mistakes under Quail French input methods
As you can see, those mistakes are always related to key sequence cancelation. What happens is that the user forgets to cancel the key sequence, either by typing <space> (in french-prefix) or by doubling the symbol (in french-postfix). We think that the reason for this breakage is that the concept of key sequence cancelation is counter-intuitive. As we have already said, mapping symbols to accents is very easy to remember and doesn't raise any problem. However, cancelling a key sequence means that you actually input something wrong and you must be aware of it, because you will have to correct it afterwards. For instance, if you want to input a “c” followed by a comma in french-postfix, you must keep in mind that you will first input a “c” cedilla and then correct it to what you want by doubling the comma.
Given the way Quail was designed, doubling the symbol to cancel a key sequence is probably the best choice that could be made. What is arguable is not the cancelation method, but really the fact that cancelation is needed at all. The same holds for the french-prefix method. As a result, we should try to determine how an input method could avoid cancelation. This is the purpose of the next section.
3 Proposed solutions
In order to eliminate the need for a cancelation method, we must accept the fact that the user can type the same two keys in different circumstances with different ideas in mind. For instance, the key sequence <c comma> sometimes means “give me a c cedilla”, and sometimes means “give me a c and a comma”. We should consequently find ways to make the input methods understand the different cases without requiring anything special from the user.
3.1 Static solution: using the context
In the second example of Figure 3, the situation happens to be rather simple to correct: in French, a word cannot end with a c cedilla. Consequently, if a space immediately follows the c cedilla, we know for sure that the user actually wanted a c and a comma. This example shows that we could benefit from the “context” of the key sequence, in other words, the characters already present around the insertion point. Quail blindly uses key sequences, without knowing anything about the current context.
Consequently, the first solution we can think of is to define an input method by character-key sequences rather than by key sequences only. Consider the sample specification given in Figure 4, where a character in braces denotes a character already present in the buffer before the insertion point. This specification means that typing a comma when there is a c in the buffer should produce a c cedilla. However, typing a space when there is a c cedilla in the buffer should turn it into a c followed by a comma and a space.
{c} + ,        → ç
{ç} + <space>  → c , <space>
Figure 4: character-key specification example
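The two rules of Figure 4 can be sketched as a function that looks at the character before the insertion point as well as at the typed key. In this hypothetical Python illustration, the buffer is a plain string, the rules are a dictionary keyed by (context, key), and the names `RULES`, `insert_key` and `type_keys` are invented for the example; none of this reflects an existing Mule or Quail data structure.

```python
# Hypothetical sketch of the character-key specification of Figure 4: a rule
# fires when a key is typed while a given character precedes the insertion
# point. Unlike a pure key-sequence method, the decision looks at the buffer
# contents (the "context"), not at the keys that produced them.
RULES = {
    ("c", ","): "ç",     # {c} + ,       -> ç
    ("ç", " "): "c, ",   # {ç} + <space> -> c , <space>
}


def insert_key(buffer, key):
    """Return the buffer after typing `key` at its end (one-character context)."""
    if buffer and (buffer[-1], key) in RULES:
        return buffer[:-1] + RULES[(buffer[-1], key)]
    return buffer + key


def type_keys(keys, buffer=""):
    for key in keys:
        buffer = insert_key(buffer, key)
    return buffer


print(type_keys("franc, ca"))  # franc, ca
print(type_keys("c,a"))        # ça
```

The same keystrokes <c comma> thus give “ç” or “c,” depending on what follows, without any cancelation by the user.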
There is still something not completely satisfactory with this technique, namely the fact that the c cedilla will still be generated in wrong cases, if only temporarily. Although the user does not have to correct it by hand, it can still be annoying to see it appear. This problem can be partially solved by using more than a single character as the context. For instance, in French, there is no word containing the sequence <i r a ç>. Consequently, if we add the rule “{irac} + , → i r a c ,” to the input method specification, we will immediately get the proper characters in that case.
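Rules with longer contexts fit the same scheme if the longest matching context is tried first. The self-contained sketch below, again hypothetical (names and data layout are assumptions, not Quail's), adds the “irac” rule so that no temporary c cedilla appears in that case while other words still get one.

```python
# Hypothetical sketch: rules may carry a multi-character context, and the
# longest matching context wins, so "{irac} + ," yields "irac," directly
# instead of a temporary "iraç".
RULES = {
    ("c", ","): "ç",
    ("ç", " "): "c, ",
    ("irac", ","): "irac,",
}
MAX_CONTEXT = max(len(ctx) for ctx, _ in RULES)


def insert_key(buffer, key):
    # Try the longest possible context first, then shorter ones.
    for n in range(min(MAX_CONTEXT, len(buffer)), 0, -1):
        ctx = buffer[-n:]
        if (ctx, key) in RULES:
            return buffer[:-n] + RULES[(ctx, key)]
    return buffer + key


def type_keys(keys, buffer=""):
    for key in keys:
        buffer = insert_key(buffer, key)
    return buffer


print(insert_key("lac", ","))  # laç
print(type_keys("irac,"))      # irac,
```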
As we can see, by extending the concept of key sequence to the concept of character-key sequence, which should not be very hard to implement, the current input methods could be considerably improved with respect to the cancelation problems. However, several issues remain problematic:
- There exist cases that cannot be corrected with this method. The next subsection presents some of them.
- Specifying all the possible cases similar to the “irac” example would be an enormous task. We cannot afford it, especially because the user should still have the possibility to customize his key sequences.
3.2 Dynamic solution: relationship with spell checking
One of the problems that the preceding solution cannot solve is ambiguity. This means that at the time a character-key sequence is encountered, it is not necessarily possible to decide which action should be taken. For instance, in French, the sequence “L ’ e n i” cannot be resolved, because the quote can really be an apostrophe, but it could also be a french-prefix sequence for a word beginning with “L é n i” (there are some). As a consequence, it is only possible to decide what to do after some more characters are typed in. In our example, the next character can be sufficient. For instance, if it is an “f”, the key sequence necessarily represents an “é”, and if it is a “v”, it is necessarily a real apostrophe. Otherwise, the corresponding word does not exist in French.
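One way to picture this dynamic decision is to keep every reading of the ambiguous keystrokes and drop those that can no longer start a dictionary word. In this minimal Python sketch the word list is a tiny hypothetical sample, the elided article “l'” is simply stripped before the lookup, and the function names are illustrative only.

```python
# Hypothetical sketch: resolving the french-prefix ambiguity of "L'eni" by
# looking ahead in a dictionary. WORDS is a tiny illustrative sample,
# not a real French lexicon.
WORDS = {"enivrer", "enivrement", "lénifiant", "lénifier"}


def is_possible(text):
    """True if `text` can still be completed into a dictionary word.
    An elided article "l'" in front is accepted and stripped first."""
    text = text.lower()
    if text.startswith("l'"):
        text = text[2:]
    return any(word.startswith(text) for word in WORDS)


def readings(keys):
    """Both readings of prefix-method keystrokes containing "'e": a literal
    apostrophe, or an acute accent on the e (first occurrence only)."""
    result = {keys}
    if "'e" in keys:
        result.add(keys.replace("'e", "é", 1))
    return result


def possible_readings(keys):
    """Readings still compatible with the dictionary; a decision can be
    taken as soon as only one is left."""
    return {r for r in readings(keys) if is_possible(r)}


print(possible_readings("L'eni"))   # two readings: still ambiguous
print(possible_readings("L'enif"))  # {'Lénif'}
print(possible_readings("L'eniv"))  # {"L'eniv"}
```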
It is important to notice that when we speak of resolving ambiguity, what we have to do in terms of implementation is actually to look up a dictionary and see whether such or such a word exists. This process is closely related to the notion of spell checking... This idea of dynamic checking can also be applied to the cases (described in the previous subsection) where the decision could be made immediately. Take again the “i r a c” example described previously. Instead of explicitly specifying the key sequence as proposed in the first solution, we can also look up a dictionary to see if such a word is possible. This way, we don't have to specify every possible sequence in the input method: the decision is made by a generic mechanism.
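The same lookup can replace hand-written rules such as “{irac} + ,”: when a comma follows a c, a c cedilla is produced only if the current word can still continue with it. The word list and function names below are hypothetical, and a real implementation would presumably query a full spell-checking dictionary instead.

```python
# Hypothetical sketch: instead of hand-written rules such as "{irac} + ,",
# ask a dictionary whether the word being typed can contain "ç" at this
# point. WORDS is a tiny illustrative sample, not a real French lexicon.
WORDS = {"ça", "français", "garçon", "leçon", "franc", "lac"}


def can_continue(prefix):
    """True if some dictionary word starts with `prefix`."""
    return any(word.startswith(prefix) for word in WORDS)


def insert_comma_after_c(buffer):
    """Handle a comma typed in a postfix method: produce "ç" only if the
    buffer ends with "c" and the current word can still be completed."""
    if not buffer.endswith("c"):
        return buffer + ","
    start = buffer.rfind(" ") + 1        # first character of the current word
    candidate = buffer[start:-1] + "ç"   # current word with its final c replaced
    if can_continue(candidate):
        return buffer[:-1] + "ç"
    return buffer + ","


print(insert_comma_after_c("le garc"))  # le garç
print(insert_comma_after_c("irac"))     # irac,
```

Note that `franc` followed by a comma still gives `franç`, because “français” exists: as explained above, such cases can only be settled once more characters are typed.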
4 How far should we go?
If we push the relationship between input method and spell checking even further, we reach the concept of “adaptive” input method, and even the concept of word completion. However, going that far in the automation of character input is not necessarily a good thing.
4.1 Adaptive input methods
Consider the (already typed) sequence “C é r”. In French, this sequence cannot be followed by an “e”; only an “é” can follow. Knowing this, we can imagine that if the user actually types an “e”, it could be automatically transformed into an “é”. Here, we have just reached the concept of “adaptive” input method, that is, a method mixing different classes (see Section 1). Our method, which is normally of class 2, turns out to be a class 1 method (key bindings) in that case, since the “e” key directly produces an “é”.
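An adaptive step of this kind can be sketched as follows: when the typed key has accented variants, keep only the variants that still lead to a dictionary word, and substitute automatically when exactly one remains. The word list, the variant table and the function name are hypothetical; the sketch only illustrates the concept, which the paper argues below may not be desirable in practice.

```python
# Hypothetical sketch of an "adaptive" step: a plain "e" is silently turned
# into an accented variant when the dictionary allows only that variant.
# WORDS is a tiny illustrative sample, not a real French lexicon.
WORDS = {"céréale", "cérémonie", "cérébral", "cerise", "cercle"}

# Accented variants a key may stand for (illustrative subset).
VARIANTS = {"e": ["e", "é", "è", "ê"]}


def adaptive_key(prefix, key):
    """Return the character to insert when `key` is typed after `prefix`."""
    prefix = prefix.lower()
    candidates = VARIANTS.get(key, [key])
    possible = [c for c in candidates
                if any(word.startswith(prefix + c) for word in WORDS)]
    # Substitute only when the dictionary leaves exactly one choice.
    return possible[0] if len(possible) == 1 else key


print(adaptive_key("Cér", "e"))  # é
print(adaptive_key("c", "e"))    # e
```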
4.2 Word completion
While we are at it, why stop at the accent or cedilla level? Since the input method can sometimes decide what the next character in a word is, it can probably also know how to complete the whole word in some cases. There, we have gone from the concept of input method, through the notion of spell checking, and finally arrived at a word completion mechanism. This demonstrates how closely those notions are related to each other.
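Word completion then amounts to proposing the whole word once a single dictionary entry remains compatible with what has been typed. A minimal sketch, with a hypothetical word list and function name:

```python
# Hypothetical sketch of dictionary-based word completion.
# WORDS is a tiny illustrative sample, not a real French lexicon.
WORDS = {"céréale", "cérémonie", "cérébral", "cerise", "cercle"}


def complete(prefix):
    """Return the only dictionary word starting with `prefix`, or None when
    there are zero or several candidates."""
    matches = [word for word in WORDS if word.startswith(prefix)]
    return matches[0] if len(matches) == 1 else None


print(complete("céréa"))  # céréale
print(complete("cér"))    # None
```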
So, how far should we go? Although only experiments with Mule users could give us a definitive answer, adaptive input methods and word completion are probably not good things to implement. In Human Machine Interaction, we know that automation is good only if the user perceives it as a stable behavior. It is highly probable that if an input method sometimes transforms an “e” directly into an “é” and sometimes does not, the user will be annoyed rather than pleased, because he cannot be expected to remember exactly in which cases the first behavior happens and in which cases the second one takes place. The same consideration applies to word completion.
Conclusion
In this paper, we have briefly described the main classes of input methods, and given a more detailed view of the Quail French ones, which belong to the second class. The concept of key sequence cancelation, although necessary given the way Quail is designed, appears to be counter-intuitive and is at the origin of numerous breakages in the process of inputting characters. While examining possible solutions to eliminate the need for a key sequence cancelation process, we have finally shown that the notion of input method, at least in the case of French, and probably for all Latin languages, is deeply related to spell checking. If one day we can make spell checkers understand that “e'” is actually a misspelled version of “é”, then flyspell will probably be the most efficient input method ever written.
However, we should keep in mind that even spell checking input methods will not solve all the problems. In the example of “/us®local/sr©xemacs”, a broken pathname, the only way to correct it automatically would be for the machine to know that this is a pathname, and that the way it is currently written is meaningless. Meaning recognition, however, is still another story nowadays...