MATE Deliverable D1.1

Supported Coding Schemes



The TILT model was proposed by Taylor [Taylor et al. 94] as a way of representing intonation, oriented both to speech synthesis and to intonation analysis. The model provides linguistic labels and quantitative parameters.

Coding book:

There is no coding book; instructions were given orally.

But see [Taylor 97] at http://www.cstr.ed.ac.uk/~pault/papers.html.


TILT has been applied to the prosodic transcription of the Canadian DCIEM Maptask Corpus ([Bard et al. 95], http://www.cogsci.ed.ac.uk/hcrc/wgs/dialogue/dialog/maptask.html), the Boston Radio News Corpus [Ostendorf et al. 95] and the Switchboard Corpus [Godfrey et al. 92].

The annotators were five PhD students working on intonation. The labelled material consisted of:

Evaluations of scheme:

Labelling consistency between the five labellers was tested with pairwise comparisons of their transcriptions [Taylor 97]. For each ordered pair of transcriptions, the first was taken as the reference and the second was scored for correctness (the percentage of events correctly identified) and accuracy (correctness minus the percentage of false insertions). The average correctness and accuracy were 81.6% and 60.4%, respectively. When minor accents were ignored, the average scores were 88.6% and 74.8%.
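The pairwise scoring described above can be sketched as follows. This is a minimal illustration, not the evaluation code of [Taylor 97]: the function names, the normalisation of both scores by the number of reference events, and the counts in the usage note are assumptions made for illustration.

```python
from itertools import permutations


def correctness_and_accuracy(n_reference, n_correct, n_insertions):
    """Return (correctness, accuracy) in percent for one ordered pair.

    Assumption: both scores are normalised by the number of events in the
    reference transcription.
    """
    if n_reference <= 0:
        raise ValueError("reference transcription must contain events")
    correctness = 100.0 * n_correct / n_reference
    insertion_rate = 100.0 * n_insertions / n_reference
    # Accuracy penalises events the second labeller inserted spuriously.
    return correctness, correctness - insertion_rate


def ordered_pairs(labellers):
    """All (reference, test) pairs; each labeller serves as reference in turn."""
    return list(permutations(labellers, 2))


def average_scores(pair_counts):
    """Average correctness and accuracy over per-pair counts.

    pair_counts: list of (n_reference, n_correct, n_insertions) tuples.
    """
    scores = [correctness_and_accuracy(*counts) for counts in pair_counts]
    n = len(scores)
    return (sum(s[0] for s in scores) / n, sum(s[1] for s in scores) / n)
```

With five labellers, `ordered_pairs` yields 20 comparisons. As a hypothetical case, a reference with 50 events, of which the second labeller identifies 40 correctly while inserting 10 spurious events, scores 80% correctness and 60% accuracy.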

Manual labelling with TILT can be done using any suitable system which allows you to see a waveform and mark events at particular times. It has been noted that "labelling tilt events is much easier than labelling ToBI parameters" [Dusterhoff et al. 97].

Purpose and underlying approach:

The TILT model was proposed by Taylor [Taylor et al. 94] as a refinement of his earlier Rise/Fall/Connection model [Taylor 94], for a representation of intonation oriented both to speech synthesis and to intonation analysis. It defines a reversible function linking the f0 curve to its linguistic representation, providing a means to automatically derive the representation from the curve and vice versa.

The f0 curve is seen as a sequence of intonational events, each linked to a syllabic nucleus. Events can be pitch accents or boundary tones. Each event is a movement in the f0 curve - a rise, a fall or a combination of both - which is described by:

List of phenomena annotated:

The labelling scheme is intended to represent the f0 curve. Labels are associated with intonational events, which can be accents or boundary tones.

Events can be automatically detected on the basis of f0 and energy information, or can be manually labelled and aligned to the signal. The quantitative description of the event (starting f0 value, duration, amplitude, tilt) is automatically derived from the f0 curve.

For manual labelling the following labels are defined:






major pitch accent


falling boundary


rising boundary


accent+falling boundary


accent+rising boundary


minor accent


minor accent+falling boundary


minor accent+rising boundary


level accent


level accent+rising boundary


level accent+falling boundary


Example xlabel file (a segmentation file in the Entropics xwaves environment).

The positions of the accents and boundary tones were decided by human labellers, and the numbers after "tilt:" were calculated automatically from the F0 contour:

0.69333 26 c; tilt: 118.984

0.74000 26 sil; tilt: 0.000

0.86116 26 a; tilt: 118.984 17.020 0.121 1.000 0.000

1.13170 26 c; tilt: 136.004

1.28200 26 a; tilt: 112.858 0.311 0.150 -1.000 0.000

1.57008 26 c; tilt: 112.547

1.81056 26 afb; tilt: 107.899 13.914 0.240 -1.000 0.000

1.90001 26 sil; tilt: 93.985

Markup language:

Labelling takes the form of a segmentation file with one intonational event per line, each specified by a time coordinate, an ASCII label and a set of numeric values.
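A segmentation line of this kind can be read with a few lines of code. The sketch below assumes the layout seen in the example file (time, an integer column, a label, then ";" and the numbers after "tilt:"). The class and function names are hypothetical, the meaning of the integer column is not stated in this document and is kept verbatim, and any file header is not handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabelledEvent:
    time: float          # time coordinate in seconds
    code: int            # integer second column, kept as-is (meaning assumed unknown)
    label: str           # ASCII label, e.g. "a", "afb", "c", "sil"
    values: List[float]  # numbers following "tilt:", left uninterpreted


def parse_line(line: str) -> LabelledEvent:
    """Parse one line such as '0.86116 26 a; tilt: 118.984 17.020 0.121 1.000 0.000'."""
    head, _, tail = line.partition(";")
    time_str, code_str, label = head.split(None, 2)
    # Everything after the "tilt:" keyword is a whitespace-separated number list.
    _, _, numbers = tail.partition("tilt:")
    return LabelledEvent(
        time=float(time_str),
        code=int(code_str),
        label=label.strip(),
        values=[float(x) for x in numbers.split()],
    )


def parse_lines(text: str) -> List[LabelledEvent]:
    """Parse every non-blank line of a segmentation file body."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]
```

Keeping the numeric values as an uninterpreted list avoids committing to field names that this document does not define.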

Annotation tools:

Manual annotation can be done using Entropics xwaves or any similar system.

An automatic event detector, based on HMMs, is available [Taylor 97].

For each event, the quantitative parameters (duration, amplitude, tilt) are automatically computed from the f0 curve.
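This document does not spell out how the parameters are derived. The sketch below uses the formulation commonly given in the Tilt literature, stated here as an assumption rather than taken from this deliverable: an event is split into a rise and a fall, amplitude and duration are the sums of the two parts, and tilt is the average of an amplitude ratio and a duration ratio, ranging from -1 (pure fall) to +1 (pure rise). The function and parameter names are illustrative.

```python
def tilt_parameters(a_rise, a_fall, d_rise, d_fall):
    """Return (amplitude, duration, tilt) for one intonational event.

    a_rise, a_fall: sizes of the rise and fall parts in Hz (sign ignored)
    d_rise, d_fall: durations of the rise and fall parts in seconds
    """
    a_rise, a_fall = abs(a_rise), abs(a_fall)
    amplitude = a_rise + a_fall
    duration = d_rise + d_fall
    # Each ratio is +1 for a pure rise, -1 for a pure fall, 0 when balanced.
    tilt_amp = (a_rise - a_fall) / amplitude if amplitude else 0.0
    tilt_dur = (d_rise - d_fall) / duration if duration else 0.0
    return amplitude, duration, (tilt_amp + tilt_dur) / 2.0


def end_f0_pure(start_f0, amplitude, tilt):
    """End f0 of a pure rise (tilt = 1) or pure fall (tilt = -1)."""
    if tilt not in (1.0, -1.0):
        raise ValueError("only pure rises and pure falls are handled here")
    return start_f0 + tilt * amplitude
```

As a consistency check against the example xlabel file, and assuming the values after "tilt:" are read as starting f0, amplitude, duration and tilt (an inference from the numbers, with the fifth value left uninterpreted): the event at 0.86116 starts at 118.984 Hz with amplitude 17.020 and tilt 1.000, and `end_f0_pure(118.984, 17.020, 1.0)` gives approximately 136.004, the value on the following line.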