MATE Deliverable D1.1

Supported Coding Schemes

 

TEI

Chapter 11 of the Text Encoding Initiative Guidelines [Sperberg 94] discusses the transcription of spoken language. Since this standardization effort is mainly concerned with written texts, the guidelines in this chapter treat the transcription of speech as a text enriched with a set of conventions for phenomena that cannot be adequately described with standard spelling. The TEI Guidelines on spoken texts were mainly the result of work carried out within a subgroup composed of Stig Johansson (chair), Jane Edwards and Andrew Rosta [Johansson 95a, b].

More information on the TEI can be found at:

http://etext.virginia.edu/TEI.html

 

Coding book:

Chapter 11 of the Text Encoding Initiative Guidelines [Sperberg 94] is the basic reference manual for applying the TEI conventions to the transcription of prosody.

Information about the Text Encoding Initiative Guidelines can be found at http://www.uic.edu/orgs/tei/. There is also an FTP site where documents about the TEI are available: ftp-tei.uic.edu (in the "pub/tei" directory).

Applications:

The TEI web page includes a list of 63 projects that use the TEI Guidelines for text annotation. This list is available at:

http://www-tei.uic.edu/orgs/tei/app/index.html.

The list contains references to some projects involving the annotation of dialogues by means of TEI. Some of these are:

- Danish Spoken Language Dialogue Systems Project

(http://www.cog.ruc.dk/projects/Dialogue/user-95)

- Chiba Corpus of Map Task Dialogues in Japanese

(http://cogsci.L.chiba-u.ac.jp/MapTask)

- Edinburgh Map Task Corpus

(http://www.cogsci.ed.ac.uk/elsnet/Resources/Map-Task/mt_corpus.html)

Evaluation:

Information not available.

Purpose and underlying approach:

The scheme is intended to extend the TEI conventions for written text with labels for prosodic phenomena that cannot be adequately described with standard spelling.

List of phenomena annotated:

Prosodic boundaries:

The TEI conventions allow tone units or intonational phrase boundaries to be marked with the element <seg>: the start tag <seg> opens a unit and the end tag </seg> closes it.
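As an illustration only, the following Python sketch extracts the tone units delimited by <seg> from a short transcription fragment. The sample sentence, the enclosing <u> wrapper and the function name tone_units are assumptions made for this example and are not taken from the TEI Guidelines. The fragment is written as well-formed XML so that Python's standard parser can read it, whereas the TEI conventions themselves are defined in SGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: two tone units, each delimited by <seg> ... </seg>.
# The <u> wrapper only provides a single root element for the parser.
SAMPLE = "<u><seg>so you go left</seg> <seg>past the old mill</seg></u>"


def tone_units(fragment: str) -> list[str]:
    """Return the text of every <seg> unit, with whitespace normalised."""
    root = ET.fromstring(fragment)
    units = []
    for seg in root.iter("seg"):
        # itertext() also collects text nested inside child elements.
        text = "".join(seg.itertext())
        units.append(" ".join(text.split()))
    return units


if __name__ == "__main__":
    for number, unit in enumerate(tone_units(SAMPLE), start=1):
        print(number, unit)
```

A fragment without any <seg> element simply yields an empty list, so the function can be applied to untagged transcriptions as well.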

Prosodic phenomena:

1) Stress

A stressed syllable can be indicated by placing the label &stress immediately after it.

2) Rhythm

A set of labels is proposed to specify different types of rhythm:

rh      beatable rhythm
arrh    arrhythmic
spr     spiky rising
spf     spiky falling
glr     glissando rising
glf     glissando falling

Phonetic cues of prosody:

1) Duration

TEI includes one symbol to indicate the extra lengthening of syllables:

:       lengthened syllable

2) Pauses

The presence of a pause is indicated with the element <pause>.

3) Tempo (speech rate)

TEI proposes a set of symbols to transcribe tempo:

a       allegro (fast)
aa      very fast
acc     accelerando (getting faster)
l       lento (slow)
ll      very slow
rall    rallentando (getting slower)

4) Loudness

TEI also proposes a set of symbols to transcribe different degrees of loudness:

f       forte (loud)
ff      very loud
cresc   crescendo (getting louder)
p       piano (soft)
pp      very soft
dimin   diminuendo (getting softer)

5) F0 events

5.1. F0 contours

The following set of symbols is defined in the TEI conventions to transcribe pitch patterns (contours):

.       low fall intonation
,       fall rise intonation
?       low rise intonation
!       rise fall intonation

5.2. Global F0 events

Under the TEI conventions, variations in pitch range can be transcribed with the following set of labels:

high    high pitch range
low     low pitch range
wide    wide pitch range
narrow  narrow pitch range

Global falling or rising intonation can be transcribed using the following labels:

Asc     ascending
Desc    descending
Monot   monotonous
Scand   scandent (each succeeding syllable higher than the last, generally ending in a falling tone)

6) Voice quality

The following set of labels is proposed to indicate voice quality:

whisp   whisper
breath  breathy
husk    husky
creak   creaky
fals    falsetto
reson   resonant
giggle  unvoiced laugh or giggle
laugh   voiced laugh
trem    tremulous
sob     sobbing
yawn    yawning
sigh    sighing
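The label sets listed above lend themselves to a simple lookup table that an annotation script could use to check labels before they are written into a transcription. The Python sketch below is one possible arrangement under stated assumptions: the dictionary keys (rhythm, tempo, loudness and so on) and the function name describe are hypothetical names chosen for this example and are not TEI identifiers, while the labels and their glosses are those given in the lists above.

```python
# Labels and glosses as listed in this section. The outer keys are
# hypothetical feature names chosen for this sketch, not TEI identifiers.
LABELS = {
    "rhythm": {
        "rh": "beatable rhythm", "arrh": "arrhythmic",
        "spr": "spiky rising", "spf": "spiky falling",
        "glr": "glissando rising", "glf": "glissando falling",
    },
    "tempo": {
        "a": "allegro (fast)", "aa": "very fast",
        "acc": "accelerando (getting faster)",
        "l": "lento (slow)",
        "ll": "very slow",  # assumed to be a doubled "l", parallel to "aa"
        "rall": "rallentando (getting slower)",
    },
    "loudness": {
        "f": "forte (loud)", "ff": "very loud",
        "cresc": "crescendo (getting louder)",
        "p": "piano (soft)", "pp": "very soft",
        "dimin": "diminuendo (getting softer)",
    },
    "pitch_contour": {
        ".": "low fall intonation", ",": "fall rise intonation",
        "?": "low rise intonation", "!": "rise fall intonation",
    },
    "pitch_range": {
        "high": "high pitch range", "low": "low pitch range",
        "wide": "wide pitch range", "narrow": "narrow pitch range",
    },
    "global_intonation": {
        "Asc": "ascending", "Desc": "descending",
        "Monot": "monotonous", "Scand": "scandent",
    },
    "voice_quality": {
        "whisp": "whisper", "breath": "breathy", "husk": "husky",
        "creak": "creaky", "fals": "falsetto", "reson": "resonant",
        "giggle": "unvoiced laugh or giggle", "laugh": "voiced laugh",
        "trem": "tremulous", "sob": "sobbing",
        "yawn": "yawning", "sigh": "sighing",
    },
}


def describe(feature: str, label: str) -> str:
    """Return the gloss of a label, or raise ValueError if it is not listed."""
    try:
        return LABELS[feature][label]
    except KeyError:
        raise ValueError(f"unknown {feature!r} label: {label!r}") from None


if __name__ == "__main__":
    print(describe("tempo", "acc"))
    print(describe("voice_quality", "creak"))
```

Keeping each feature in its own dictionary means that a label is only accepted for the feature it belongs to, so that, for instance, a loudness label is rejected when it is supplied as a tempo label.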

Some critique

[Llisterri 96a]:

"[Payne 92:51ff] mentions the lack of development of guidelines for encoding prosody in the TEI scheme and discusses some inconsistencies of the statements about prosody in the TEI Guidelines. The favoured solution would be to incorporate basic prosodic information in the orthographic transcription and to use a fundamental frequency tracing aligned with the text in cases where a detailed prosodic analysis is needed.

Tone units: Although an easy conversion can be made between French's boundary markers and TEI tags delimiting tone units, [Payne 92] notes the difficulties of transcribing melodic contours with TEI conventions.

Tonic syllables: TEI Guidelines do not provide an indication of tonic syllables as straightforwardly as in French's system. As [Payne 92:55] points out if the tonic syllable is going to be marked, it should be marked in the orthographic transcription, and the TEI Guidelines should be extended to provide a way of doing this in a straightforward manner.

Tones: [Payne 92:56] suggests the extension of TEI Guidelines to allow distinguishing tones as in French's conventions; such an extension could be based in different specifications for the tag <syllable>.

Prominent non-tonic syllables: Prominent non-tonic syllables are marked in French's system, but no provision for such feature is found in the TEI Guidelines.

Speech management: TEI has no specific guidelines for the transcription of disfluency phenomena, recommending transcription using IPA or other systems of phonemic transcription. On the other hand, French's conventions, adopted by NERC, are much more specific and deal with different phenomena not covered by TEI, such as guessed or unintelligible fragments."

Examples:

Information not available.

Markup language:

The TEI conventions are defined using SGML as the markup language, which is one of the advantages of this transcription scheme.

Annotation tools:

Information not available.