MATE Deliverable D1.1

Supported Coding Schemes



ToBI (Tones and Break Indices) was proposed in 1992 [Silverman et al. 92] by "a group of researchers with expertise in a variety of approaches to prosodic analysis and speech technology" with the aim of defining a notational system, "analogous to IPA for phonetic segmentation", that could become a "standard for prosodic transcription of most varieties of American English".


A description of the ToBI system is available at:


Coding book:

The creators of ToBI have developed two labelling guides (Beckman & Ayers, 1994; Beckman & Hirschberg, 1994).

They are available at:


Although ToBI was developed primarily for English, it has also been used to transcribe intonation events in other English dialects [Mayo et al. 97] and in other languages such as Italian [Grice et al. 95b] and German [Grice et al. 95a].

ToBI has also been integrated, with adjustments and enhancements, into other transcription systems, such as VERBMOBIL [Reyelt et al. 94] and the Stuttgart System [Mayer 95] (see http://www.ims.uni-stuttgart.de/phonetik/joerg/labman/STGTsystem.html).


An evaluation of the performance of ToBI is presented in [Pitrelli et al. 94].

The German version, GToBI, has been evaluated in [Grice et al. 96].

Purpose and underlying approach:

ToBI is an adaptation of Pierrehumbert's phonological model of English intonation [Pierrehumbert 80].

[Llisterri 96a]:

"In the domain of prosodic transcription systems to be used in speech research and in speech technology, ToBI (Tone and Break Index Tier) was developed to fulfill the need of a prosodic notation system providing a common core to which different researchers can add additional detail within the format of the system; it focuses on the structure of American English, but transcribes word grouping and prominence, two aspects which are considered to be rather universal [Price 92].

As described by [Silverman et al. 92] the system shows the following features: (1) it captures categories of prosodic phenomena; (2) it allows transcribers to represent some uncertainties in the transcription; (3) it can be adapted to different transcription requirements by using subsets or supersets of the notation system; (4) it has demonstrated high intertranscriber agreement; (5) it defines ASCII formats for machine-readable representations of the transcription; and (6) it is equipped with software to support transcription using Waves and UNIX programmes.

A ToBI transcription for an utterance consists of symbolic labels for events on four parallel tiers: (1) orthographic tier, (2) break-index tier, (3) tone tier and (4) miscellaneous tier. Each tier consists of symbols representing prosodic events, associated to the time in which they occur in the utterance. The conventions for annotation according to TOBI are defined for text-based transcriptions and for computer-based labeling systems such as Waves."
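To make the four-tier structure concrete, the following is a minimal Python sketch, not part of the ToBI software: a transcription is modelled as four lists of time-stamped labels, and timeline() merges them into one time-ordered stream. The names Label, ToBITranscription and timeline are hypothetical, and the sample labels are taken from the example files given later in this section.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Label:
    time: float  # position in the signal, in seconds
    text: str    # symbolic label, e.g. "show", "4" or "L-L%"


@dataclass
class ToBITranscription:
    # One list of time-stamped labels per tier (hypothetical model).
    orthographic: list[Label] = field(default_factory=list)
    break_index: list[Label] = field(default_factory=list)
    tone: list[Label] = field(default_factory=list)
    miscellaneous: list[Label] = field(default_factory=list)

    def timeline(self):
        """Yield (time, tier, text) for every label on every tier, in time order.

        Labels at the same time are ordered by tier name.
        """
        tiers = {
            "orthographic": self.orthographic,
            "break_index": self.break_index,
            "tone": self.tone,
            "miscellaneous": self.miscellaneous,
        }
        # heapq.merge expects each input to be sorted already.
        streams = [
            sorted((lab.time, name, lab.text) for lab in labels)
            for name, labels in tiers.items()
        ]
        yield from heapq.merge(*streams)


# Usage with the first labels of the example utterance.
t = ToBITranscription(
    orthographic=[Label(2.105, "show"), Label(2.245, "me")],
    break_index=[Label(2.105, "1"), Label(2.245, "1")],
    tone=[Label(2.052696, "H*")],
)
for event in t.timeline():
    print(event)
```

Keeping each tier as its own list mirrors the separation of tiers in ToBI; the merged view is only computed when needed.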

ToBI is based on a phonological model of English intonation, but several attempts have been made to extend it, by means of additions and adjustments, to other languages and to other English dialects. It has also been criticized; see, for example, [Nolan et al. 97].

List of phenomena annotated:

The ToBI system was conceived for the transcription of intonation phenomena and prosodic boundaries; it provides no symbols for the transcription of the phonetic cues of prosody. Boundaries and tones are represented in separate tiers, aligned with the text by means of temporal coordinates.

Prosodic boundaries

Prosodic boundaries are annotated in ToBI by means of break indices:

0       clitic group boundary
1       word boundary
2       boundary with no tonal mark
3       Intermediate Phrase boundary
4       Intonational Phrase boundary
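The break indices are the integers 0 to 4, in the order listed above. A minimal Python sketch of that mapping, with a hypothetical helper for reading labels such as "1p" from the example break-index tier later in this section (where "1p" coincides with the cut-off word "Da(llas)-"), could look like this. The helper only separates the digit from any trailing diacritic; it does not interpret the diacritic, which is an assumption of this sketch.

```python
from __future__ import annotations

# ToBI break indices, numbered 0-4 in the order of the list above.
BREAK_INDICES = {
    0: "clitic group boundary",
    1: "word boundary",
    2: "boundary with no tonal mark",
    3: "Intermediate Phrase boundary",
    4: "Intonational Phrase boundary",
}


def parse_break_index(label: str) -> tuple[int, str]:
    """Split a break-index label such as "4" or "1p" into (index, suffix).

    Assumption: everything after the leading digit is a diacritic; it is
    returned unchanged and is neither validated nor interpreted here.
    """
    label = label.strip()
    if not label or label[0] not in "01234":
        raise ValueError(f"not a ToBI break-index label: {label!r}")
    return int(label[0]), label[1:]


for raw in ["1", "1p", "3", "4"]:
    index, suffix = parse_break_index(raw)
    print(raw, "->", index, BREAK_INDICES[index], repr(suffix))
```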

Prosodic phenomena

ToBI provides a set of symbols for the transcription of intonation phenomena: pitch accents, phrase accents and boundary tones. Pitch accents are associated with accented syllables, while phrase accents and boundary tones are associated with phrases. The symbols can be time-aligned with f0 peaks and valleys.

1.1. Pitch accents

H*      peak accent (high pitch accent)
L*      low accent (low pitch accent)
L*+H    scooped accent
L+H*    rising peak accent
!H*     downstepped accent

1.2. Boundary tones

L%      final low boundary tone
H%      final high boundary tone
%H      initial high boundary tone

1.3. Phrase accents

L-      low phrase accent
H-      high phrase accent

ToBI also provides one symbol for the transcription of downstep:

!       downstep (as in !H*)

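The tone inventory above can be written down as a small lookup for processing tone-tier files mechanically. The sketch below is an illustration under stated assumptions, not part of the ToBI tools: classify_tone and is_downstepped are hypothetical names, the symbol sets contain only the symbols listed in this section, and combined labels such as L-L% (used in the example tone tier later in this section) are split into a phrase accent and a boundary tone. Any other label, for example the %r label in that tier, is reported as "other".

```python
from __future__ import annotations

import re

# Only the symbols listed in this section; the full ToBI inventory is larger.
PITCH_ACCENTS = {"H*", "L*", "L*+H", "L+H*", "!H*"}
PHRASE_ACCENTS = {"L-", "H-"}
BOUNDARY_TONES = {"L%", "H%", "%H"}

# A phrase accent written together with a final boundary tone, e.g. "L-L%".
_COMBINED = re.compile(r"([LH]-)([LH]%)")


def classify_tone(label: str) -> list[tuple[str, str]]:
    """Return (category, symbol) pairs for a single tone-tier label."""
    if label in PITCH_ACCENTS:
        return [("pitch accent", label)]
    if label in PHRASE_ACCENTS:
        return [("phrase accent", label)]
    if label in BOUNDARY_TONES:
        return [("boundary tone", label)]
    m = _COMBINED.fullmatch(label)
    if m:
        return [("phrase accent", m.group(1)), ("boundary tone", m.group(2))]
    return [("other", label)]


def is_downstepped(label: str) -> bool:
    # The downstep symbol "!" precedes the H tone it applies to, as in !H*.
    return "!H" in label


for label in ["H*", "L+H*", "!H*", "L-L%", "L-", "%r"]:
    print(label, classify_tone(label), is_downstepped(label))
```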
When the transcriber tool is used with xwaves, a series of files containing the information for the different tiers is created during the transcription process. The following are examples of such files for the utterance "Show me the cheapest fare from Philadelphia to Dallas excluding restriction" (obtained from the TOBI-TRAINING material):


Orthographic tier:

signal cheapest2
type 1
color 123
font -*-times-medium-r-*-*-17-*-*-*-*-*-*-*
separator ;
nfields 1

2.105000 123 show
2.245000 123 me
2.355000 123 the
2.935000 123 cheapest
3.315000 123 fare
3.565000 123 from
3.836919 123 Da(llas)-
4.325000 123 from
5.015000 123 Philadelphia
5.225000 123 to
5.855000 123 Dallas
7.399125 123 excluding
8.585000 123 restriction
8.825000 123 V
9.115000 123 U
9.595000 123 slash
9.880000 123 one

Break index tier:

signal cheapest2
type 0
color 123
comment created using xlabel Fri Sep 3 17:24:47 1993
font -*-times-medium-r-*-*-17-*-*-*-*-*-*-*
separator ;
nfields 1

2.105000 123 1
2.245000 123 1
2.355000 123 1
2.935000 123 1
3.315000 123 4
3.565000 123 1
3.836919 123 1p
4.325000 123 1
5.015000 123 3
5.225000 123 1
5.855000 123 4
7.399125 123 4
8.585000 123 4
8.825000 123 1
9.115000 123 3
9.595000 123 1
9.880000 123 4

Tone tier:

signal cheapest2
type 0
color 115
comment created using xlabel Fri Sep 3 17:24:48 1993
font -*-times-medium-r-*-*-17-*-*-*-*-*-*-*
separator ;
nfields 1

2.052696 115 H*
2.579923 115 L+H*
3.065052 115 !H*
3.315635 115 L-L%
4.149572 115 %r
4.470318 115 L+H*
4.771018 115 !H*
5.015584 115 L-
5.388451 115 H*
5.855538 115 L-L%
6.984159 115 L+H*
7.399114 115 L-L%
8.154402 115 H*
8.585841 115 L-L%
8.711954 115 H*
8.928780 115 !H*
9.114631 115 L-
9.353582 115 H*
9.694309 115 H*
9.880160 115 L-L%
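Each tier file above consists of a few header lines ("keyword value") followed by one line per label ("time color label"). A minimal Python sketch of a reader for this layout follows; parse_xlabel is a hypothetical name, not an xwaves function. Two assumptions: a line whose first field parses as a number is a label line, and a line consisting only of "#", which xwaves label files use to end the header, is skipped (no such line appears in the listings above, so the sketch works with or without it). Since the files declare "nfields 1", the rest of each label line is taken as a single label.

```python
from __future__ import annotations


def parse_xlabel(text: str) -> tuple[dict[str, str], list[tuple[float, int, str]]]:
    """Parse one tier file in the layout shown above.

    Returns (header, labels). Header lines such as "signal cheapest2"
    become keyword -> value entries; label lines such as
    "2.105000 123 show" become (time, color, label) tuples in file order.
    """
    header: dict[str, str] = {}
    labels: list[tuple[float, int, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#":
            continue  # skip blank lines and a bare "#" header terminator
        fields = line.split(None, 2)
        try:
            time = float(fields[0])
        except ValueError:
            # Not a number, so treat it as a header line "keyword value...".
            header[fields[0]] = line[len(fields[0]):].strip()
            continue
        if len(fields) < 3:
            raise ValueError(f"malformed label line: {raw!r}")
        # With "nfields 1" the whole rest of the line is the label.
        labels.append((time, int(fields[1]), fields[2]))
    return header, labels


sample = """signal cheapest2
type 0
color 123
separator ;
nfields 1

3.565000 123 1
3.836919 123 1p
4.325000 123 1
"""
header, labels = parse_xlabel(sample)
print(header["signal"], header["nfields"])
for time, color, label in labels:
    print(f"{time:.3f} {label}")
```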

The following picture gives an example of an xwaves visualization of a ToBI transcription, aligned with the waveform and the f0 curve.

Markup language:

Symbolic labels in separate tiers for each type of information (orthography, boundaries, tones, miscellaneous), time-aligned with the signal.

Annotation tools:

Two annotation tools have been developed for the xwaves environment: a transcriber and a checker. The transcriber is a UNIX script that simplifies the transcription task but does not produce the transcription automatically. The checker, also a UNIX script, validates the coherence of the transcribed sequences of symbols. Both are available via ftp at kiwi.nmt.edu.
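The rules applied by the checker are not described here. As a hypothetical illustration of one kind of coherence check across tiers, note that in the example files above every break index 3 lies within a millisecond of a phrase accent (L-), and every break index 4 within a millisecond of a combined label such as L-L%. The Python sketch below tests that pattern. The function name, the (time, label) input format and the 10 ms tolerance are assumptions of this sketch, not properties of the actual ToBI checker.

```python
from __future__ import annotations

# Assumed tolerance, in seconds, for treating a tone label as "at" a break.
TOLERANCE = 0.01


def check_boundaries(breaks, tones, tolerance=TOLERANCE):
    """Cross-check break-index and tone tiers given as lists of (time, label).

    Hypothetical rule set: break index 3 needs a phrase accent nearby,
    break index 4 needs a phrase accent and a boundary tone nearby.
    Returns a list of human-readable problem descriptions.
    """
    problems = []
    for b_time, b_label in breaks:
        index = b_label[:1]  # ignore diacritics such as the "p" in "1p"
        if index not in ("3", "4"):
            continue
        nearby = [t for t_time, t in tones if abs(t_time - b_time) <= tolerance]
        if not any("-" in t for t in nearby):
            problems.append(f"{b_time:.3f}: break {b_label} lacks a phrase accent")
        if index == "4" and not any(t.endswith("%") for t in nearby):
            problems.append(f"{b_time:.3f}: break {b_label} lacks a boundary tone")
    return problems


# Break indices 3 and 4 and nearby tones, copied from the example files.
breaks = [(3.315000, "4"), (5.015000, "3"), (5.855000, "4"), (9.115000, "3")]
tones = [(3.065052, "!H*"), (3.315635, "L-L%"), (5.015584, "L-"),
         (5.855538, "L-L%"), (9.114631, "L-")]
print(check_boundaries(breaks, tones))        # prints []
print(check_boundaries([(7.0, "4")], tones))  # no tones near 7.0 s: two problems
```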