The following online article has been derived mechanically from an MS produced on the way towards conventional print publication. Many details are likely to deviate from the print version; figures and footnotes may even be missing altogether, and where negotiation with journal editors has led to improvements in the published wording, these will not be reflected in this online version. Shortage of time makes it impossible for me to offer a more careful rendering. I hope that placing this imperfect version online may be useful to some readers, but they should note that the print version is definitive. I shall not let myself be held to the precise wording of an online version, where this differs from the print version. Published in A. Rubio et al., eds., Proceedings of the First International Conference on Language Resources and Evaluation, Granada, 28–30 May 1998, pp. 1279–82.
Consistent Annotation of Speech-Repair Structures
Geoffrey Sampson
University of Sussex
Abstract
The CHRISTINE project is extending the
rigorously-specified SUSANNE scheme of grammatical annotation into the domain
of spontaneous spoken language.
The aim of the SUSANNE/CHRISTINE annotation scheme is to be fully
informative but at the same time fully predictable, so that for any real-life
linguistic usage the scheme permits one and only one analysis. Predictability is especially difficult
to achieve in the domain of speech-repair structures. The paper surveys a range of recurrent repair types, each of
which forces the analyst to make unmotivated choices between alternative
interpretations which would receive distinct representations in any annotation
scheme devised to date.
Introduction
As Jane
Edwards of the University of California has put it (Edwards, 1992: 139), “The
single most important property of any data base for purposes of
computer-assisted research is that similar instances be encoded in
predictably similar ways”. Statistics are meaningless if they are
drawn from a database in which the same phenomenon is coded now this way, now
that. In the field of grammatical
annotation, this principle has been neglected. Many alternative lists of grammatical categories are extant,
but often these are not backed up by detailed, rigorous specifications of
logical boundaries between the categories: an annotation scheme may specify how to draw a parse tree
with richly informative node-labels for clear, “textbook” example sentences,
while leaving it quite ambiguous how the scheme should be applied to messy
real-life language.
Our SUSANNE scheme (Sampson, 1995; www.grsampson.net/RSue.html)
attempted to fill this gap, focusing mainly on the grammar of edited written
English. The 500 pages of the
published scheme aim to define a uniquely predictable structural analysis for
anything occurring in real-life usage.
The SUSANNE scheme has been winning a measure of international
recognition: “the detail of [the
SUSANNE] annotation is unrivalled” (Langendoen, 1997: 600); “impressive … very
detailed and thorough” (Mason, 1997: 169, 170); “meticulous treatment of
detail” (Leech & Eyes, 1997: 38).
Other research groups may prefer to use different lists of grammatical
symbols in creating treebanks, but the usage statistics they compile will be of
questionable value unless the logical boundaries between those symbols are
defined with respect to the multifarious issues that are treated explicitly in
the SUSANNE scheme.
My current CHRISTINE project (http://www.grsampson.net/RChristine.html) is extending this undertaking to cover the
structural annotation of spoken English, particularly spontaneous, informal
spoken English (and, as a by-product, it is generating a 200,000-word annotated
corpus of English speech). We are
using several sources of data, the chief source being the speech subsection of
the British National Corpus (http://info.ox.ac.uk/bnc/), which to my knowledge is unrivalled as a
demographically-balanced “fair cross-section” of recent British speech (even if
it is not always above reproach with respect to quality of transcription). When complete, the CHRISTINE Corpus
will be made freely available by electronic means, legal constraints
permitting, as the SUSANNE Corpus already is.
In the domain of spontaneous
speech, one significant issue is that of rules for consistently annotating speech
repairs – stretches of
wording in which a speaker begins to realize one grammatical plan, but breaks
off and either starts afresh or continues in conformity to a different
plan. (The term speech
management phenomena is
sometimes used in a similar though broader sense.) Speech-mediated man-machine interaction is nowadays seen as
a key future technology, and for this purpose the automatic recognition and
analysis of speech repairs will be a crucial technique; so it is very desirable
to develop standardized, predictable schemes for registering repair structures
in corpus data.
The present paper surveys some of
the problems that confront any attempt to make such schemes rigorous. All examples are excerpted from the
British National Corpus, and are identified by the three-byte BNC file name
followed after a full stop by five digits representing the BNC “s-unit” number,
supplemented with leading zeros as necessary. Notations of the SUSANNE/CHRISTINE scheme are explained as
needed in the discussion of individual examples; and the notations are displayed
in only as much detail as is relevant to the points under discussion. (The full CHRISTINE grammatical labels
in the examples quoted below are often considerably more informative than is
shown in the present paper.)
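The identifier convention just described can be sketched in a few lines of Python. This is only an illustration: the function names are invented here and belong to no BNC or CHRISTINE software, and the assumption that a file name consists of three upper-case letters or digits is drawn solely from the identifiers quoted in this paper.

```python
import re
from typing import Tuple

# Assumed shape: three upper-case letters or digits (e.g. "KCJ", "KD6"),
# a full stop, then exactly five digits.
_EXAMPLE_ID = re.compile(r"([A-Z0-9]{3})\.([0-9]{5})")


def format_example_id(file_name: str, s_unit: int) -> str:
    """Build an identifier such as 'KCJ.01017' from its two parts."""
    if not re.fullmatch(r"[A-Z0-9]{3}", file_name):
        raise ValueError(f"unexpected file name: {file_name!r}")
    if not 0 <= s_unit <= 99999:
        raise ValueError(f"s-unit number out of range: {s_unit}")
    # Pad the s-unit number with leading zeros to five digits.
    return f"{file_name}.{s_unit:05d}"


def parse_example_id(example_id: str) -> Tuple[str, int]:
    """Split an identifier back into (file name, s-unit number)."""
    match = _EXAMPLE_ID.fullmatch(example_id)
    if match is None:
        raise ValueError(f"not a valid example identifier: {example_id!r}")
    return match.group(1), int(match.group(2))


print(format_example_id("KCJ", 1017))   # KCJ.01017
print(parse_example_id("KD6.03060"))    # ('KD6', 3060)
```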
Approaches to Repair Annotation
When the
SUSANNE annotation scheme was developed, the most fully-worked-out and
empirically-founded existing approach to annotating speech repairs was that of
Howell & Young (1990, 1991), based on Levelt (1983). This approach identified, within a
stretch of wording that includes a repair, a set of up to nine “repair
milestones”: for instance, the
point at which the original grammatical plan is abandoned (the interruption
point), and the (earlier)
point marking the beginning of the stretch of wording that is destined to be
replaced by new wording after the interruption point.
However, Sampson (1995: 448ff.)
found that the Howell & Young approach was in one respect too limited, and
in another respect too rich, to be satisfactory for NLP research purposes. A speech repair will typically be
embedded within a larger structure which in other respects may be grammatically
well-formed; the Howell & Young scheme contained no proposals for showing
the relationship between the repair and the structure containing it, an issue
that may not have been significant for their purposes but which is crucial in
an NLP context. In this respect,
then, the scheme was unduly limited as it stood. Arguably it was unduly rich, on the other hand, in terms of
the range of different milestones used to characterize the progress of a speech
repair. When a speaker abandons
one grammatical plan for another, the moment when the first plan is abandoned normally
corresponds to a real event – the speaker is likely to be conscious of making a
change, and he will often produce audible hesitation phenomena. On the other hand, even if the wording
after the interruption point replaces a specific stretch of the wording before
that point, as in e.g.:
so I’ll be I’ll come down
KCJ.01017
where the
wording from the second I’ll onwards clearly replaces I’ll be, the beginning of that stretch (in the example, the
transition between so
and I’ll) corresponds
to no actual event (when the speaker said so I’ll, he was speaking continuously and surely
did not anticipate that he was moving into a stretch of wording that was fated
to be replaced).
In this example, the transition
between so and I’ll can at any rate logically be identified as
initiating the reparandum,
in Levelt’s terminology, in the sense that excising I’ll be leaves a sequence so I’ll come down which is perfectly well-formed and
plausible as an expression of what the speaker ultimately wanted to say. But, in our experience, it is often an
artificial exercise to try to identify the beginning of a reparandum; wording
following the interruption point does not always replace a specific stretch of
wording preceding that point, such that cutting out that stretch would leave a
fully felicitous utterance. Howell
& Young (1990) claim that Levelt’s scheme shows promise when applied to
their data, which consist of dictaphone recordings; but this is an unusual
hybrid genre of language, in the sense that the speaker’s intention is not
merely to convey his message but to create good written prose for a secretary
to convert to typescript. Despite
the fact that this might be expected to make their data more “polished” than
spontaneous speech, Howell & Young’s use of Levelt’s scheme nevertheless
often seems to require them to make quite complex unmotivated decisions between
alternative analyses.
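The excision test applied above to KCJ.01017 can be made concrete with a short Python sketch. Representing the two milestones as token positions, and the function name itself, are assumptions made purely for illustration; they form no part of Levelt's or Howell & Young's notation.

```python
from typing import List


def excise_reparandum(tokens: List[str], start: int, interruption: int) -> List[str]:
    """Return the wording with tokens[start:interruption] removed.

    `start` is the position where the reparandum begins and `interruption`
    the position of the first token after the interruption point.
    """
    if not 0 <= start <= interruption <= len(tokens):
        raise ValueError("need 0 <= start <= interruption <= len(tokens)")
    return tokens[:start] + tokens[interruption:]


tokens = "so I'll be I'll come down".split()
# The reparandum "I'll be" occupies tokens[1:3]; the interruption point
# falls immediately before the second "I'll" (position 3).
print(" ".join(excise_reparandum(tokens, 1, 3)))  # so I'll come down
```

The sketch presupposes that a suitable value for `start` exists. The difficulty discussed above is precisely that, in many real examples, no choice of `start` leaves a fully felicitous utterance.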
Levelt developed his speech-repair
annotation system in the context of a psycholinguistic theory which attempted
to make predictions about what types of repair do and what types do not occur
in practice. It is accordingly no
criticism of his system that it makes strong assumptions about repair
structures; scientific theories ought to be strong, i.e. highly testable. But for the more engineering-oriented
purposes of NLP data compilation, it is desirable to use annotation schemes
which make weaker assumptions and hence can be readily adapted to represent
whatever repair structures turn out to be found in practice, so that data can
be registered here and now without waiting for the ultimate psycholinguistic
theory of speech repairs to be formulated; data annotated in terms of a general
scheme could be used, among other things, as evidence for and against such
psycholinguistic theories.
Accordingly, the SUSANNE scheme
proposed a modified approach for annotating speech repairs, based mainly on
identifying the interruption point, which is represented as a terminal symbol
(shown here as “#”) attached as a daughter of the lowest node in a parse tree
such that the wording before # and (if any) the wording after # can both be
interpreted as (partial or complete) attempts to realize that node. Illustrating via a straightforward
example, the sequence
when his <pause> he was with his daughter I said you give me my bloody keys … KCT.10662
is analysed
(showing relevant labelled bracketing only) as:
[S [Fa when [N his N] # <pause> [Nas he Nas] [Vsb was Vsb] [P with his daughter P] Fa] I said you give me my bloody keys … S]
– the # node
is shown as a daughter of the adverbial clause (Fa) tagma, because both when
his … and (when) … he
was with his daughter are
successive attempts to produce an adverbial clause, which as a whole fits
normally into the superordinate main clause (S) as its first constituent. (The symbols N, V, and P stand
respectively for noun phrase, verb group, and prepositional phrase; they are
supplemented in some cases by subcategory letters giving more detailed
classification, for instance he is labelled Nas, being morphologically marked as subject and
singular.) Although his appears to have been intended as the first
word of a noun phrase that was not completed, we do not attach the symbol # as
daughter of a single noun-phrase tagma [N his # he], because it is not plausible that his was the start of a noun phrase that would have been
followed by the predicate was with his daughter – the interruption point marks a change of tack for
the whole adverbial clause, not just for its subject.
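To show how annotation of this kind supports automatic retrieval, the following Python sketch reads a labelled bracketing of the kind displayed above and reports, for each interruption point, the label of the tagma to which # is attached together with the wording before and after it. The reader is a minimal one written for this illustration (the names `Node`, `parse` and `interruptions` are invented here); it is not the CHRISTINE project's software, and it assumes that brackets and labels are separated from words by white space. The trailing ellipsis of the original example is omitted.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


@dataclass
class Node:
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def parse(bracketing: str) -> Node:
    """Read '[Label ... Label]' bracketing into a tree under a dummy ROOT."""
    root = Node("ROOT")
    stack = [root]
    for token in bracketing.split():
        if token.startswith("["):            # opening bracket plus label
            node = Node(token[1:])
            stack[-1].children.append(node)
            stack.append(node)
        elif token.endswith("]"):            # closing bracket, label optional
            if len(stack) == 1:
                raise ValueError("closing bracket without an opening one")
            node = stack.pop()
            if token[:-1] and token[:-1] != node.label:
                raise ValueError(f"{token!r} does not close {node.label!r}")
        else:                                # a word, <pause>, or '#'
            stack[-1].children.append(token)
    if len(stack) != 1:
        raise ValueError("unclosed bracket")
    return root


def words(item: Union[Node, str]) -> List[str]:
    """Terminal wording under an item, leaving out '#' symbols."""
    if isinstance(item, str):
        return [] if item == "#" else [item]
    return [w for child in item.children for w in words(child)]


def interruptions(node: Node) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (mother label, wording before #, wording after #)."""
    for i, child in enumerate(node.children):
        if isinstance(child, Node):
            yield from interruptions(child)
        elif child == "#":
            before = [w for c in node.children[:i] for w in words(c)]
            after = [w for c in node.children[i + 1:] for w in words(c)]
            yield node.label, before, after


kct = ("[S [Fa when [N his N] # <pause> [Nas he Nas] [Vsb was Vsb] "
       "[P with his daughter P] Fa] I said you give me my bloody keys S]")
for label, before, after in interruptions(parse(kct)):
    print(label, before, after)
```

For the KCT example the only interruption point found is the daughter of Fa, with when his before it and <pause> he was with his daughter after it.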
Thus a speech database
incorporating this type of annotation does show how a repaired stretch fits
grammatically into a wider grammatical structure; and it permits automatic
retrieval of most properties that actually apply to a typical speech repair,
while so far as possible it avoids the need for analysts to create pseudo-facts
that correspond to no realities in the repairs themselves but only to
artificial demands of the notation.
The annotation could be described as “minimalist”; a notation which
slimmed down Levelt’s rich apparatus of repair milestones even further than
this might scarcely deserve to be called a system of repair annotation at all.
Problematic Issues in Repair Annotation
The
CHRISTINE project is testing and improving the definition of this approach by
confronting it with real-life data.
The remainder of the present paper uses BNC examples to survey and
classify various problematic issues that emerge from this undertaking. The paper is frankly concerned with the
examination and classification of data, rather than with propounding theories
or algorithms. At this stage of
our understanding of speech-repair phenomena, investigating the data in an
empirical spirit is what is needed.
The overall aim of the SUSANNE and
CHRISTINE annotation scheme is to indicate explicitly as much detail of the
structure of language as is practical, but to do so in a predictable way, so
that two analysts equipped with copies of the scheme and given the same
language sample must produce identical annotations. The scheme tries always to provide fallback notations which
avoid forcing the analyst to make decisions beyond the evidence available; as a
simple case, the noun phrases the man and the men are labelled Ns and Np, being morphologically marked
as singular and plural respectively, but the fish is labelled just N. Consequently, the speech-repair
annotation system is perceived as problematic in cases where it creates
alternative ways to annotate the same stretch of wording, and where there is no
convenient way to define a neutral fallback notation. Despite the “slimming down” of Levelt’s notation that has
been applied in creating the CHRISTINE scheme, quite a number of such problems
do still arise. The examples
listed below are classified by types of analytic ambiguity.
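The fallback principle illustrated above with Ns, Np and N can be stated as a small Python sketch. The function and its argument values are hypothetical, introduced only to make the logic explicit; the actual SUSANNE/CHRISTINE subcategory system is much richer than this.

```python
from typing import Optional


def noun_phrase_label(number_marking: Optional[str]) -> str:
    """Choose a noun-phrase label from the morphological evidence available.

    `number_marking` is "singular", "plural", or None when the wording
    carries no number marking (as with "the fish").
    """
    if number_marking == "singular":
        return "Ns"
    if number_marking == "plural":
        return "Np"
    if number_marking is None:
        # Fallback: the bare label commits the analyst to nothing
        # beyond what the words themselves show.
        return "N"
    raise ValueError(f"unknown number marking: {number_marking!r}")


print(noun_phrase_label("singular"))  # the man  -> Ns
print(noun_phrase_label("plural"))    # the men  -> Np
print(noun_phrase_label(None))        # the fish -> N
```

The problem cases surveyed below are those for which no counterpart of the third branch is readily available.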
Repair v. Co-ordination
Grammatical
annotation even of edited written prose must provide for various types of
co-ordinate structure, not all of which (apposition, asyndetic co-ordination)
are overtly marked by conjunctions.
It can then be difficult or impossible to distinguish between repairs in
which one tagma is replaced by another, and co-ordinations in which both tagmas
are intended. Thus:
yeah but he don’t get done he doesn’t
get done that is the problem KD6.03060
she can’t be much cop if she’d open her
legs to a first date to a Dutch s- sailor
KSS.05002
In the KD6
example, he doesn’t get done could be intended to replace the nonstandard verb form in he don’t
get done, implying a
structure [S+ but he don’t get done # he doesn’t get done S+]; alternatively the speaker may intend the
near-repetition as an appositional reiteration to add emphasis, implying (in
SUSANNE notation) a structure [S+ but he don’t get done [S@ he doesn’t get done S@] S+]. In the KSS case, to a Dutch s- sailor could be intended as additional
information in apposition to the preceding phrase to a first date, though this might be regarded as a rather
literary structure in the context of informal speech; alternatively, the
structure may be a speech repair in which to a Dutch s- sailor is substituted for to a first date as the true ground for criticism (though
this might raise questions about the speaker’s scale of relative social
heinousness).
Repair v. Interpolation
Interpolation,
whereby a structurally independent tagma is inserted into the middle of a
construction that would be complete without it, is normal in written prose; it
is often marked orthographically by brackets. Since speech exploits almost all constructions found in
writing, it would be arbitrary not to recognize interpolations in the spoken
language, but they can be very difficult to distinguish from repairs. E.g.:
that’s a certainty but I don’t know what
whether to make one or not I don’t know what to do
KSS.04846
but he he the thing is he just thumps
them KD6.03085
In the KSS
example, it is clear that the speaker makes two attempts to produce a clause
after but consisting of
I don’t know followed
by an indirect question (Fn?). The
whether clause might be
a well-formed interpolation (I) inserted between these attempts, or
alternatively what whether to … could be a repair: the
analysis might be either of the following:
[S+ but I don’t know [Fn? what ] # [I whether to make one or not ] I don’t know [Fn? what to do ] S+]
[S+ but I don’t know [Fn? what # whether to make one or not ] # I don’t know [Fn? what to do ] S+]
In the KD6
case, the thing is
might be treated as an interpolation inserted into a succession of attempts to
realize a subject pronoun, or alternatively the thing is he just thumps them might be a repaired version of a sentence
which was initially planned to run simply but he just thumps them.
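The two analyses given above for KSS.04846 illustrate what unpredictability means in practice: identical wording, different annotation. A Python sketch can make the comparison mechanical. The helper names are invented for this illustration, and the comparison of label sequences and # counts is a deliberately crude stand-in for full tree comparison.

```python
from typing import List, Tuple


def wording(bracketing: str) -> List[str]:
    """The words alone, with brackets, labels and '#' stripped out."""
    return [t for t in bracketing.split()
            if not t.startswith("[") and not t.endswith("]") and t != "#"]


def structure(bracketing: str) -> Tuple[List[str], int]:
    """Labels in order of opening, plus the number of interruption points."""
    tokens = bracketing.split()
    labels = [t[1:] for t in tokens if t.startswith("[")]
    return labels, tokens.count("#")


# Interpolation analysis: the whether clause is an interpolated tagma (I).
analysis_a = ("[S+ but I don't know [Fn? what ] # [I whether to make one or not ] "
              "I don't know [Fn? what to do ] S+]")
# Repair analysis: what # whether ... is a repair within the indirect question.
analysis_b = ("[S+ but I don't know [Fn? what # whether to make one or not ] # "
              "I don't know [Fn? what to do ] S+]")

print(wording(analysis_a) == wording(analysis_b))      # True
print(structure(analysis_a))   # (['S+', 'Fn?', 'I', 'Fn?'], 1)
print(structure(analysis_b))   # (['S+', 'Fn?', 'Fn?'], 2)
```

Two analysts who chose differently here would contribute conflicting counts of both interpolations and interruption points to any statistics drawn from the corpus.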
Repair v. Nonstandard Usage
Real-life
spontaneous speech of the kind assembled in the BNC contains a large variety of
nonstandard turns of phrase. In
the case of some well-established and widely-discussed regional variants – say,
the omission of definite articles in the speech of East Yorkshire – analysts
can reasonably be expected to understand the wording in terms of the dialect
rules governing a particular speaker’s usage. But the spectrum of English varieties contains far more
grammatical variants than any analyst will be aware of; and oddities of usage
may often be one-off performance deviations not referable to any rule,
standard or nonstandard. Consider:
I can remember on erm <pause> the other day <pause> I <pause> I accidentally dropped
the helicopter on the floor KCA.02691
like I done with the house <pause> I put different parts for
there KCA.02674
In standard
English, the other day,
or on Tuesday, can
function as Time adjuncts, but on the other day cannot.
Should erm in
the former example be taken as marking an interruption point where a planned
construction such as on Tuesday was abandoned in favour of the other day, or should on the other day be treated as a nonstandard but intended
phrase, without repair? In the
latter example, for there
seems not to represent any standard English usage, but it might consist of some
standard prepositional phrase abandoned after the preposition and replaced by there, or alternatively for there may be well-formed in the speaker’s
dialect.
Verb groups are one area
particularly rich in repair/nonstandard-usage ambiguities. In you’ve can’t make a a wedding
cake … KSS.04868, it is tempting to say that +’ve
can’t is so deviant a
sequence, and the regular patterns of English verb groups are so well
established, that the example must represent a repair, with an interruption
point falling between +’ve
and can’t. This may be correct; but in the same
dialogue alone, the same speaker (female, age 72, Salvation Army, Lancashire)
also says it’s be
KSS.04811 and don’t you pushing your nose KSS.05009, and another speaker (male, 45, unemployed,
Lancashire) says she wents into town with me KSS.04989.
Other BNC files contain a great diversity of nonstandard verb
groups. Not all of these forms can
plausibly be analysed as speech repairs, so perhaps none should be. (In the last example, wents may represent nonstandard phonology rather
than grammar, namely the affrication characteristic of Liverpool speech; but
the other examples seem to require an account in terms of wording rather than
pronunciation.)
A situation which does not involve
“repair” as such, but is conveniently included in this section, is exemplified
by:
oh she was shouting at him at dinner
time … oh god dinner time she was shouting him KD6.03154
The second
clause at first sight appears to contain an unrepaired omission of at, required in standard English with the
target of a shout (and included in the first clause); the present writer, for
one, was not previously aware of any English dialect in which the person
shouted at occurs as direct object of shout.
However, from other examples in this BNC file it becomes clear that this
particular speaker regularly uses shout in that way, so that no omission should be marked in
the second clause; the first clause presumably represented the speaker
temporarily shifting towards the standard.
How Much Omitted?
Repairs
often occur when a first attempt at expressing an idea accidentally omits
linguistic forms which are needed for the adequate or correct expression of the
idea. Even a relatively austere
repair annotation scheme, such as the CHRISTINE scheme, can then find itself
forced to answer unanswerable questions about how much has been omitted. Consider:
if that if it was the drama teacher that
said that I’m gonna write to her KD6.03122
it’s a big mistake when they let <anonymized name> in into that
school KD6.03100
In the
former case, there are clearly two attempts at an if clause, with an interruption point between
that and the second if; but that preceding this interruption point might have been
intended as subject of the first attempt (the role eventually filled by it), or it might be the same that which eventually occurs as object of said (implying that the first attempt at the if clause was abandoned because of a gross
omission). No doubt there are
other possibilities. In the second
example, in may be an
attempt at into which
was abandoned and restarted after the first syllable (a common repair pattern)
– in which case the repair is at word level; or in may be intended as a complete adverb, in
which case the repair is at phrase level:
[P [ in # into ] [N that school ] ], or [R#P in # into [N that school ] ].
(The label R#P indicates a repaired structure which begins as an
adverbial phrase and ends as a prepositional phrase.)
Intentional Discontinuity
Grammatical
discontinuity is normally taken to be a performance deviation from the
competence rules (of the standard language, or of a nonstandard dialect). Perhaps surprisingly, one not uncommon
pattern in CHRISTINE data is that discontinuity is used intentionally to achieve
a particular communicative effect.
In:
and he takes the mickey out of him which
okay then he called him … KD6.03054
the most
plausible interpretation of which okay has it saying, in effect, “I am not going to complete
the relative clause I have initiated with which, but, were I to do so, that clause would amount to a
concession of the issue just raised”.
It is not clear whether the concept “speech repair” can appropriately be
extended to intentional discontinuities; but, without that concept, it is
difficult to see how a structural annotation could indicate what is going on in
such an example.
Conclusion
The main
purpose of this paper is not to point out that patterns such as those listed
above occur frequently in real-life speech. In itself, that observation would be fairly trivial. The aim, rather, is to draw attention
to the problems that such repair phenomena create for the enterprise of
devising a structural annotation system which is both informative and
predictable.
Any grammatical annotation scheme,
even one devised for edited written language, will inevitably encounter
sporadic examples which force the analyst to guess between alternative
acceptable analyses. But a good
annotation scheme should not systematically require repeated guesswork with
respect to some aspect of structure:
it should permit the analyst to specify only what there is positive
evidence for in the words spoken.
This aim is particularly difficult to achieve in the area of speech
repairs. Yet consistent annotation
standards are as necessary in the domain of speech repairs as they are in the
domain of “grammar” in the strict sense, if NLP researchers are to develop
systems capable of handling spontaneous speech.
Acknowledgment
The research reported here was supported by grant R000 23 6443, “Analytic Standards for Spoken Grammatical Performance”, awarded by the Economic and Social Research Council (UK).
References
Edwards, Jane A. (1992). Design principles in the transcription of spoken discourse. In J. Svartvik (Ed.), Directions in Corpus Linguistics (pp. 129–144). Berlin: Mouton de Gruyter.
Howell, P. & Young, K. (1990). Speech repairs: report of work conducted October 1st 1989–March 31st 1990. Department of Psychology, University College London.
Howell, P. & Young, K. (1991). The use of prosody in highlighting alterations in repairs from unrestricted speech. Quarterly Journal of Experimental Psychology, 43A, 733–758.
Langendoen, D.T. (1997). Review of Sampson (1995). Language, 73, 600–603.
Leech, G.N. & Eyes, Elizabeth. (1997). Syntactic annotation: treebanks. In R.G. Garside et al. (Eds.), Corpus Annotation (pp. 35–52). Harlow: Longman.
Levelt, W.J.M. (1983). Monitoring and self-repair in speech. Cognition, 14, 41–104.
Mason, O. (1997). Review of Sampson (1995). International Journal of Corpus Linguistics, 2, 169–172.
Sampson, G.R. (1995). English for the Computer: the SUSANNE Corpus and Analytic Scheme. Oxford: Clarendon Press (Oxford University Press).