MATE Deliverable D1.1

Supported Coding Schemes




"TSM: Tonetic Stress Marks System" is a coding scheme based on the British School style of auditory intonation analysis [O'Connor et al. 73]. It was applied to the transcription of the SEC ("Spoken English Corpus"), created in a joint project by Lancaster University and IBM. The corpus has since been digitized and time-aligned; the machine-readable version is called MARSEC.

Information is available at http://midwich.reading.ac.uk/research/speechlab/marsec/marsec.html

Coding book:

G. Knowles, A. Wichmann, P. Anderson, "Working with Speech: Perspectives on Research into the Lancaster/IBM Spoken English Corpus", London and New York, Longman, 1996 [Knowles et al. 96]


MARSEC corpus: more than 50 texts from the BBC (various speakers, 30% female, RP accent; commentary, news broadcasts, etc.), amounting to about 52,000 words.

The corpus has been transcribed by two annotators.

Evaluations of scheme:

Information not available.

Purpose and underlying approach:

Based on the British School auditory intonation analysis ([Crystal 69], [O'Connor et al. 73]).

List of phenomena annotated:

Labels represent phrase boundaries and intonation contours.

The signal is phonetically segmented. Energy and f0 are automatically computed.

Prosodic annotation is inserted in the orthographic representation and time-aligned with the signal at the beginning of accented syllables.

Two levels of intonation phrasing:

major tone unit

minor tone unit

Each accented syllable is marked with a diacritic classifying the accent according to the following characteristics (describing the tone contour from the beginning of the syllable up to the next accented syllable or the tone unit end):

high/low (refers to the starting point of the tone, higher or lower than the previous pitch)

level/fall/rise/fall-rise/rise-fall (the shape of the contour)
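The two characteristics above can be read as a small label inventory: each accent mark pairs a starting height with a contour shape. The following is a minimal sketch of that reading, not part of the TSM specification. The names Onset, Contour, ToneMark and ALL_MARKS are hypothetical, and whether every combination actually occurs in the corpus is not stated here.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Onset(Enum):
    """Starting point of the tone relative to the previous pitch."""
    HIGH = "high"
    LOW = "low"


class Contour(Enum):
    """Shape of the contour, from the accented syllable up to the next
    accented syllable or the end of the tone unit."""
    LEVEL = "level"
    FALL = "fall"
    RISE = "rise"
    FALL_RISE = "fall-rise"
    RISE_FALL = "rise-fall"


@dataclass(frozen=True)
class ToneMark:
    """Hypothetical container for one accent label (assumption, not TSM syntax)."""
    onset: Onset
    contour: Contour

    def describe(self) -> str:
        # Human-readable name such as "high fall-rise".
        return f"{self.onset.value} {self.contour.value}"


# Cartesian product of the two characteristics: 2 onsets x 5 contours.
ALL_MARKS = [ToneMark(o, c) for o, c in product(Onset, Contour)]
```

Under this reading the inventory has at most ten accent labels, for example ToneMark(Onset.LOW, Contour.RISE).describe() gives "low rise".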

A conversion between TSM and ToBI has been attempted [Roach 94].

Markup language:

Prosodic labels are diacritics inserted in the orthographic stream and time-aligned with the signal at the beginning of accented syllables.
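To illustrate the idea of labels carried inline in the orthographic stream with a time anchor, here is a minimal sketch. It does not reproduce the actual TSM diacritics or the MARSEC file format: the Token class, the render function, the angle-bracket notation, and the example times are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    """One orthographic word, optionally carrying an accent label (hypothetical)."""
    text: str
    accent_time: Optional[float] = None  # seconds; start of the accented syllable
    mark: Optional[str] = None           # e.g. "high fall"; None when unaccented


def render(tokens: List[Token]) -> str:
    """Insert each label immediately before its word in the orthographic stream.

    The "<label @time>" notation is a placeholder, not TSM syntax.
    """
    parts = []
    for tok in tokens:
        if tok.mark is not None and tok.accent_time is not None:
            parts.append(f"<{tok.mark} @{tok.accent_time:.2f}s>{tok.text}")
        else:
            parts.append(tok.text)
    return " ".join(parts)


# Invented example utterance with one accented word.
example = [Token("good"), Token("morning", accent_time=0.42, mark="high fall")]
line = render(example)
```

Keeping the time anchor on the token rather than in a separate tier mirrors the description above, where the label and its alignment point travel together with the orthographic word.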

Annotation tools:

Environment for (manual) labelling: Entropic waves+