
MATE Deliverable D1.1

Supported Coding Schemes

 

VERBMOBIL

Different coding schemes for prosody have been used in the VERBMOBIL Project. Here a Perceptual Scheme and a Syntactic-Prosodic Scheme are reviewed.

Information about prosodic labelling in VERBMOBIL can be found at:

http://sbvsrv.ifn.ing.tu-bs.de/prosody/verbmobil.html

Coding book:

M. Reyelt and A. Batliner, "Ein Inventar prosodischer Etiketten für VERBMOBIL", Verbmobil Memo 33, 1994 [Reyelt et al. 94]

Applications:

Perceptual Scheme:

33 dialogues (about 2 hours)

480 sentences (20% of PHONDAT database, read sentences) - 5 annotators

Syntactic-Prosodic Scheme:

7286 turns (about 150,000 words) - one annotator

Evaluations of scheme:

An evaluation has been performed on the PHONDAT material, showing that utterances were "rather consistently labelled even by untrained listeners" [Reyelt 93]. The material consisted of sentences by 8 speakers and was labelled by 5 labellers.

Inter-labeller agreement (on 8 different speakers, min-max):

Some opinions about the Perceptual Scheme (see http://www.ims.uni-stuttgart.de/phonetik/joerg/stockholm/bserlmu.html): "the labelling of intonation is still difficult for the transcribers". "Most partners in Verbmobil use only the functional and the break index tier". "The functional tier has several advantages: - it contains information usable for focus-analysis - it makes the two decisions (is a word accented? if, which pitch accent?) more transparent for the transcribers".

The Syntactic-Prosodic Scheme has been judged easier, faster and more reliable [Batliner et al. 96].
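Inter-labeller agreement of the kind mentioned above can be computed in several ways. The following is a minimal sketch assuming simple pairwise percent agreement over word-aligned labels; it is not necessarily the measure used in [Reyelt 93]. The function name, the labeller names and the example labels are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(labellings):
    """Mean fraction of words on which two labellers agree,
    averaged over all pairs of labellers.

    `labellings` maps a labeller name to a list of labels, one per word.
    All lists must have the same length.
    """
    lengths = {len(seq) for seq in labellings.values()}
    if len(lengths) != 1:
        raise ValueError("all labellers must label the same number of words")
    n_words = lengths.pop()
    if n_words == 0 or len(labellings) < 2:
        raise ValueError("need at least two labellers and one word")

    scores = []
    for a, b in combinations(sorted(labellings), 2):
        matches = sum(x == y for x, y in zip(labellings[a], labellings[b]))
        scores.append(matches / n_words)
    return sum(scores) / len(scores)


# Hypothetical break index labels for a four-word utterance.
example = {
    "labeller_1": ["B1", "B2", "B1", "B3"],
    "labeller_2": ["B1", "B2", "B1", "B3"],
    "labeller_3": ["B1", "B1", "B1", "B3"],
}
agreement = pairwise_agreement(example)
```

Percent agreement does not correct for chance; a chance-corrected coefficient such as kappa would be a natural alternative when label frequencies are very uneven.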

Purpose and underlying approach:

The aim of the German VERBMOBIL project (http://www.dfki.de/verbmobil/) is to develop a system for automatic speech-to-speech translation in appointment scheduling dialogues. Prosody is studied in the project both in its own right and as a cue for dialogue segmentation, syntactic parsing, dialogue act classification, etc.

Different studies and experiments involving prosody have been conducted in the VERBMOBIL framework, where prosody has been represented in its acoustic aspects [Batliner et al. 97] or in its syntactic function [Batliner et al. 96]. The reference coding scheme for (auditory) prosodic labelling was developed at Braunschweig University, with the aim of providing a scheme usable by several project partners for a variety of purposes and usable also by transcribers with only little experience in prosodic labelling.

List of phenomena annotated:

The Perceptual Prosodic Labelling scheme represents phrasing, accents and intonation contours at a phonological level. For intonation contours, it uses a ToBI-like inventory consisting of H and L tones. Its most distinctive feature is a more abstract functional tier, where prominence relations are marked explicitly so that they can be related more easily to focus and discourse structure.

The label inventory splits into three tiers:

functional tier: main accent, secondary accent, emphasized/contrastive accent, sentence modality

break index tier: (full) intonation phrase boundary, minor boundary, irregular boundary

tone tier: pitch accents and boundary tones

Functional tier:

?: Question mark (several question types are labelled)

PA: Main accent (in each intonational phrase the most prominent word is labelled)

NA: Secondary accent (all other accents are secondary accents)

EK: Emphasized or contrastive accent

Break index tier:

B1: Normal word boundary

B2: Minor (intermediate) phrase boundary. Weak intonational marking.

B3: Full intonational phrase boundary. Strong intonational marking with or without lengthening or change in speech tempo.

B9: Irregular boundary. Marks disfluencies at hesitations, repairs, etc.

Tone tier (ToBI-like, with additional distinctions for labelling spontaneous speech):

Accents

H*: normal peak accent

L+H*: medium (or 'raised') peak. Starting from a low tone before the accented syllable, the f0 rises to a high peak within the syllable.

L*+H: delayed peak, an H* accent that reaches high f0 in the syllable after the accented one

L*: trough accent. Can be rising when followed by an H-H% boundary.

!H*, L+!H*, L*+!H: downstepped accents

H+!H*: early peak. Fall before the accented syllable, often followed by a low boundary.

Boundaries

L-L%: terminal fall, boundary reaches the lower end of the speaker's pitch range

H-H%: question/continuation rise, high boundary reaches the high end of the speaker's pitch range

L-H%: low phrase tone with a pitch rise to mid or high level

H-L%: continuation fall (a fall from high to mid pitch, or a more level boundary at mid-high pitch)
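As an illustration of how the three tiers fit together, the label inventories listed above can be held as sets and used to check a word-level annotation. This is only a sketch: the `WordLabels` record, its field names and the default break index are assumptions made for the example, not the VERBMOBIL file format, and no cross-tier consistency rules are checked.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Label inventories copied from the three tiers listed above.
FUNCTIONAL = {"?", "PA", "NA", "EK"}
BREAK_INDICES = {"B1", "B2", "B3", "B9"}
ACCENTS = {"H*", "L+H*", "L*+H", "L*", "!H*", "L+!H*", "L*+!H", "H+!H*"}
BOUNDARY_TONES = {"L-L%", "H-H%", "L-H%", "H-L%"}


@dataclass
class WordLabels:
    """Hypothetical per-word record, one field per tier."""
    word: str
    functional: Tuple[str, ...] = ()     # e.g. ("PA",) or ("PA", "?")
    break_index: str = "B1"              # assumed: boundary after the word
    accent: Optional[str] = None         # tone tier: pitch accent
    boundary_tone: Optional[str] = None  # tone tier: boundary tone


def validate(w: WordLabels) -> List[str]:
    """Return one message per label that is not in the inventories."""
    problems = []
    for f in w.functional:
        if f not in FUNCTIONAL:
            problems.append(f"{w.word}: unknown functional label {f!r}")
    if w.break_index not in BREAK_INDICES:
        problems.append(f"{w.word}: unknown break index {w.break_index!r}")
    if w.accent is not None and w.accent not in ACCENTS:
        problems.append(f"{w.word}: unknown pitch accent {w.accent!r}")
    if w.boundary_tone is not None and w.boundary_tone not in BOUNDARY_TONES:
        problems.append(f"{w.word}: unknown boundary tone {w.boundary_tone!r}")
    return problems


# Hypothetical annotation of a phrase-final word carrying the main accent.
ok = WordLabels("Termin", functional=("PA",), break_index="B3",
                accent="H*", boundary_tone="L-L%")
# Two deliberate errors: "B7" and "H" are not in the inventories.
bad = WordLabels("aeh", break_index="B7", accent="H")
```

Here `validate(ok)` returns an empty list, while `validate(bad)` reports the unknown break index and the unknown pitch accent.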

Syntactic-Prosodic Scheme:

A second, more syntax-oriented coding scheme has also been developed and used in the VERBMOBIL Project [Batliner et al. 96].

 

M3S: main/subordinate clause

M3P: non-sentential free element/phrase, elliptic sentence

M3E: extraposition

M3I: embedded sentence/phrase

M3T: pre/post-sentential particle with pause/breathing

M3D: pre/post-sentential particle without pause/breathing

M3A: syntactically ambiguous

M2I: constituent, marked prosodically

M1I: constituent, not marked prosodically

M0I: every other word
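The label set above is small enough to be kept as a lookup table. The sketch below decodes labels and counts them by their first two characters (M3, M2, M1, M0). Grouping by that prefix is an assumption read off the label names, and the function names and the example turn are hypothetical.

```python
from collections import Counter

# Syntactic-prosodic labels and their descriptions, as listed above.
SYNTACTIC_PROSODIC = {
    "M3S": "main/subordinate clause",
    "M3P": "non-sentential free element/phrase, elliptic sentence",
    "M3E": "extraposition",
    "M3I": "embedded sentence/phrase",
    "M3T": "pre/post-sentential particle with pause/breathing",
    "M3D": "pre/post-sentential particle without pause/breathing",
    "M3A": "syntactically ambiguous",
    "M2I": "constituent, marked prosodically",
    "M1I": "constituent, not marked prosodically",
    "M0I": "every other word",
}


def describe(label):
    """Return the description of a label, or raise ValueError if unknown."""
    try:
        return SYNTACTIC_PROSODIC[label]
    except KeyError:
        raise ValueError(f"unknown syntactic-prosodic label: {label!r}") from None


def count_by_prefix(labels):
    """Count labels by their two-character prefix (assumed grouping)."""
    for label in labels:
        describe(label)  # rejects labels outside the inventory
    return Counter(label[:2] for label in labels)


# Hypothetical label sequence, one label per word of a short turn.
turn = ["M0I", "M0I", "M2I", "M0I", "M3S"]
counts = count_by_prefix(turn)
```

A table like this also makes it easy to reject mistyped labels before any statistics are computed over a large corpus.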

Examples:

Information not available.

Annotation tools:

A workstation for prosodic labelling has been developed at Braunschweig University. It includes software for visualization and labelling (fish) using Tcl/Tk, resynthesis of the original speech signal with variation of F0 (according to the labelled pitch accents, for acoustic verification of the transcription), evaluation of the labelling, automatic pre-segmentation of word boundaries, and prediction of potential phrase boundaries.

Automatic labelling systems based on Multi-Layer Perceptrons have been implemented both for the Perceptual Scheme and the Syntactic-Prosodic one [Batliner et al. 96, 97].