
MATE Deliverable D1.1

Supported Coding Schemes

 

IPO

The methodology for the study of intonation proposed by IPO (Institute for Perception Research, Eindhoven) has inspired much experimental work and many synthesis implementations. The approach can be considered a reference in the field of intonation research, more for its general principles than for its representation scheme. The actual coding scheme has been applied by the authors to the modelling of Dutch intonation, and has also been adapted and applied to other languages.

Coding book:

The reference text both for general principles and notation is:

J. 't Hart, R. Collier, A. Cohen, "A perceptual study of intonation" [Hart 90].

Applications:

The actual coding scheme has been applied by the authors to the modelling of Dutch intonation, and has also been adapted and applied to other languages (English [Willems et al. 88], French [Beaugendre 94], Italian [Quazza 91], Mpur [Odé 97], German [Brindopke et al. 97]).

Evaluation:

Information not available.

Purpose and underlying approach:

The IPO approach provides a framework for studying both the physical and linguistic aspects of intonation and "supports language-independent pre-theoretical description of speech melody allowing the development of new melodic categories" [Brindopke et al. 97]. The idea is that a model of intonation for a given language should be extracted from raw acoustic f0 data in successive steps, first removing perceptually irrelevant details and finally arriving at meaningful patterns related to linguistic function. Speech synthesis has a twofold role in the process: as an analysis tool for assessing the perceptual plausibility of the models, and as an application in which the acoustic content of the representation can be directly implemented.

The underlying theory of intonation represents the f0 curve as a sequence of pitch movements, superimposed on a general declination line, gradually lowering the pitch range through the utterance, with possible resets at phrase boundaries. The linguistically relevant pitch patterns are discovered by a direct analysis of f0 curves.
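The decomposition described above can be illustrated with a minimal sketch. It assumes a linear declination in semitones that restarts completely at each reset, and linear pitch movements; the names (Movement, declination_st, contour_st, to_hz) and all numeric values are hypothetical and are not taken from the IPO model.

```python
from dataclasses import dataclass

@dataclass
class Movement:
    start_s: float       # onset time in seconds
    duration_s: float    # duration in seconds
    excursion_st: float  # signed size in semitones (positive = rise)

def declination_st(t, start_st=0.0, slope_st_per_s=-1.5, resets=()):
    """Baseline level (semitones) at time t.
    Assumption: declination restarts from start_st at each reset time."""
    last_reset = max((r for r in resets if r <= t), default=0.0)
    return start_st + slope_st_per_s * (t - last_reset)

def contour_st(t, movements, **decl):
    """Baseline plus the contribution of every movement begun by time t."""
    level = declination_st(t, **decl)
    for m in movements:
        # Fraction of the movement completed at time t, clamped to 0..1.
        progress = min(max((t - m.start_s) / m.duration_s, 0.0), 1.0)
        level += progress * m.excursion_st
    return level

def to_hz(level_st, ref_hz=100.0):
    """Convert a level in semitones relative to ref_hz into Hz."""
    return ref_hz * 2.0 ** (level_st / 12.0)

# A rise followed later by a fall of the same size.
movements = [Movement(0.2, 0.1, 6.0), Movement(0.5, 0.1, -6.0)]
levels = [contour_st(t / 10.0, movements) for t in range(11)]
```

Working in semitones keeps the movements additive; the conversion to Hz is applied only at the end.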

First the f0 curve is stylized with a sequence of straight lines representing a close copy of the original curve, perceptually identical when imposed on the original signal by means of resynthesis.

Then, segments in the stylized curves are classified and described according to four discrete parameters: direction (rise/fall), timing (early in the syllable/late/very late), rate of change (fast/slow), size (full/half). On this basis, a clustering of f0 segments identifies a standard set of pitch movements typical of the given language (speaker/domain/corpus). Pitch curves can thus be standardized, i.e. described as sequences of standard pitch movements. When resynthesized, the standardized curve should be perceptually equivalent to the original one.
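As an illustration of the four discrete parameters, the sketch below assigns them to a single stylized segment using fixed thresholds. The thresholds stand in for the clustering step and are purely hypothetical, as are the names Segment and classify; level segments (no pitch change) are assumed to have been filtered out beforehand.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    excursion_st: float       # signed pitch change in semitones (positive = rise)
    duration_ms: float        # duration of the straight-line segment
    onset_in_syllable: float  # onset position as a fraction (0..1) of the syllable

def classify(seg, full_st=6.0, fast_st_per_s=50.0,
             late_from=0.4, very_late_from=0.8):
    """Map one segment to the four discrete parameters.
    All threshold values are illustrative assumptions."""
    direction = "rise" if seg.excursion_st > 0 else "fall"

    if seg.onset_in_syllable >= very_late_from:
        timing = "very late"
    elif seg.onset_in_syllable >= late_from:
        timing = "late"
    else:
        timing = "early"

    rate_st_per_s = abs(seg.excursion_st) / (seg.duration_ms / 1000.0)
    rate = "fast" if rate_st_per_s >= fast_st_per_s else "slow"

    size = "full" if abs(seg.excursion_st) >= 0.75 * full_st else "half"

    return {"direction": direction, "timing": timing, "rate": rate, "size": size}

example = classify(Segment(excursion_st=6.0, duration_ms=100.0, onset_in_syllable=0.1))
```

In practice the category boundaries would be derived from the data for a given language, speaker or corpus rather than fixed in advance.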

Further analysis identifies the typical, recurring configurations of pitch movements that carry some linguistic function: for example a 'pointed hat' marking a pitch accent, or a 'flat hat' sounding like a sentence conclusion. A complete intonation model for a given language would uncover the grammar according to which configurations combine into full intonation contours, realizing the basic intonation patterns of the language.

List of phenomena annotated:

The IPO methodology focuses on intonation only, providing different representations for pitch. Each step has its own coding of the f0 curve, from a detailed acoustic description up to phonetic/phonological representations.

Such codings presuppose a phonetic segmentation of the speech signal, or at least a segmentation into syllables, with respect to which pitch movements are aligned.

 

Acoustic description (stylization):

The curve is represented as a sequence of straight f0 segments, each described by its pitch change in semitones, its duration in milliseconds, and its alignment with syllable boundaries.
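A minimal sketch of such an acoustic description follows. It uses the standard definition of the semitone interval, 12 * log2(f_end / f_start), and expresses alignment as the offset of the segment onset from the start of the syllable that contains it. That choice of alignment measure, the function name describe_segments and the field names are assumptions made for illustration.

```python
import bisect
import math

def describe_segments(points, syllable_starts_ms):
    """points: turning points of the stylized contour as (time_ms, f0_hz),
    in temporal order. syllable_starts_ms: sorted syllable onset times."""
    rows = []
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        # Index of the syllable in which the segment starts.
        idx = max(bisect.bisect_right(syllable_starts_ms, t0) - 1, 0)
        rows.append({
            "semitones": 12.0 * math.log2(f1 / f0),
            "duration_ms": t1 - t0,
            "syllable": idx,
            "offset_ms": t0 - syllable_starts_ms[idx],
        })
    return rows

# An octave rise over 100 ms followed by an octave fall over 200 ms.
rows = describe_segments([(0, 100.0), (100, 200.0), (300, 100.0)], [0, 150])
```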

 

Phonetic description (standardization):

The curve is represented as a sequence of pitch movements which can be considered standard for the given language; each standard movement is characterized by four parameters:

direction        rise/fall
timing           early in the syllable/late/very late
rate of change   fast/slow
size             full/half

Standard movements for Dutch are distinguished into five types of rises, labelled 1, 2, 3, 4 and 5, and five falls, labelled A, B, C, D and E. Segments corresponding to lower and upper declination lines are labelled O and 0 respectively. Each syllable is assigned at least one label. If two or more movements occur on the same syllable, their labels are joined by "&".
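The labelling convention can be sketched as follows. The function name label_syllables is hypothetical, and the rule used for syllables without a movement is a simplifying assumption: after a rise the pitch is taken to continue on the upper declination line, after a fall on the lower one, and half-sized movements are ignored.

```python
RISES = set("12345")
FALLS = set("ABCDE")

def label_syllables(n_syllables, movements, lower="O", upper="0"):
    """movements: (syllable_index, label) pairs in temporal order.
    Returns one label string per syllable."""
    per_syllable = [[] for _ in range(n_syllables)]
    for index, label in movements:
        per_syllable[index].append(label)

    labels = []
    level = lower  # assumption: the utterance starts on the lower line
    for found in per_syllable:
        if found:
            # Two or more movements on one syllable are joined by "&".
            labels.append("&".join(found))
            if found[-1] in RISES:
                level = upper
            elif found[-1] in FALLS:
                level = lower
        else:
            # No movement: the syllable lies on a declination line.
            labels.append(level)
    return labels

hat = label_syllables(4, [(1, "1"), (1, "A"), (3, "1")])
```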

Phonological description:

Once a 'grammar of intonation' has been defined for a given language, a more abstract labelling of the curve can be obtained in terms of pitch configurations and pitch contours.

In the same methodological framework, different notations have been adopted. For example, in the Dutch SPIN/ASSP Program [Heuven et al. 93] the following symbols have been used [Terken 93a]:

R     rise
F     fall
L     low level pitch
H     high level pitch
FF    gradual pitch fall
&     two movements on the same syllable
*     associated with accented syllable
%     associated with a boundary

The inventory of pitch contours for Dutch is represented as follows:

 

Transcription   IPO notation   Description
R*              1              prominence-lending rise (early in syllable)
F*              A              prominence-lending fall (early in syllable)
R*F*            1A             sequence of R and F associated with two successive accented syllables
R&*F            1&A            combination of prominence-lending rise and fall on the same syllable
R*FF            1D             prominence-lending rise followed by gradual falling pitch
LR%             O2             non-prominence lending rise (late in the syllable), starting from the baseline
R*H%            10             R* followed by high-level pitch until the end of the phrase
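The correspondence between the two notations in this inventory can be held in a simple lookup table, sketched below. The symbols are copied as printed in the inventory (note that "O2" contains the letter O and "10" the digit zero); the names CONTOURS, to_ipo and to_transcription are hypothetical.

```python
# Transcription -> IPO notation, as listed in the inventory for Dutch.
CONTOURS = {
    "R*": "1",
    "F*": "A",
    "R*F*": "1A",
    "R&*F": "1&A",
    "R*FF": "1D",
    "LR%": "O2",
    "R*H%": "10",
}

# Reverse mapping; valid because every IPO string in the table is unique.
IPO_TO_TRANSCRIPTION = {ipo: trans for trans, ipo in CONTOURS.items()}

def to_ipo(transcription):
    """Return the IPO notation for a contour, or None if it is not listed."""
    return CONTOURS.get(transcription)

def to_transcription(ipo):
    """Return the transcription for an IPO string, or None if not listed."""
    return IPO_TO_TRANSCRIPTION.get(ipo)
```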

Examples:

Information not available.

Markup language:

Symbolic labels time-aligned with the f0 curve.

Annotation tools:

IPO developed its own tools for (manual) perceptual stylization and resynthesis. In this environment labels can be assigned to flags in the speech waveform, but the choice of labels is completely free and there is no well-formedness checker.

Several tools have been developed for perceptual pitch stylization, related more or less closely to the IPO approach (e.g. WinPitch www.winpitch.com).

Automatic stylization has been implemented too, with approaches of varying sophistication. See for example [Coile et al. 94], where a piece-wise linear approximation of the f0 curve (in the log domain) is obtained starting from raw f0 data and, optionally, phonetic segmentation. See also [Mertens et al. 97], describing a sophisticated stylization algorithm based on a "tonal perception model".
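To make the idea of a piece-wise linear approximation in the log domain concrete, here is a minimal sketch. It is not the algorithm of [Coile et al. 94] or [Mertens et al. 97]: it uses a generic top-down splitting scheme with a fixed tolerance in semitones and no perceptual model. The function name stylize and the tolerance value are assumptions; the input is assumed to contain at least two voiced samples with strictly increasing times.

```python
import math

def stylize(times, f0_hz, tol_st=1.0):
    """Return the indices of the turning points of a piece-wise linear
    fit to the f0 samples, computed in the semitone (log) domain."""
    st = [12.0 * math.log2(f / f0_hz[0]) for f in f0_hz]

    def split(lo, hi):
        # Find the sample deviating most from the straight line lo -> hi.
        worst, worst_dev = None, tol_st
        for i in range(lo + 1, hi):
            frac = (times[i] - times[lo]) / (times[hi] - times[lo])
            dev = abs(st[i] - (st[lo] + frac * (st[hi] - st[lo])))
            if dev > worst_dev:
                worst, worst_dev = i, dev
        if worst is None:
            return [lo]  # the line is within tolerance: keep one segment
        return split(lo, worst) + split(worst, hi)

    return split(0, len(times) - 1) + [len(times) - 1]

# A symmetric 6-semitone rise and fall sampled every 10 ms.
times = [0, 10, 20, 30, 40]
f0 = [100.0 * 2.0 ** (s / 12.0) for s in (0, 3, 6, 3, 0)]
turning_points = stylize(times, f0)
```

A perceptually motivated stylization would replace the fixed tolerance with a criterion validated by listening to the resynthesized contour.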

A tool for (synthetic) speech manipulation explicitly implementing the IPO framework is Speech Maker [Leeuwen et al. 93], where the intonation contour can be represented in terms of IPO pitch movements, each controllable in its parameters (anchor, timing, duration, excursion).

A recent example of "an environment for labelling and testing of melodic aspects of spoken language" inspired by the IPO methodology is the one described in [Brindopke et al. 97], implemented in C and integrated in ESPS/Xwaves. The tool "relies on the method of approximating the original f0 contour with a minimum set of straight lines", provides "labelling facilities for model-based melodic description for German" and "supports language-independent pre-theoretical description of speech melody allowing the development of new melodic categories".

Tools for automatic labelling of pitch contours with IPO labels have also been implemented [Bosch 93a,b].

IPO is developing a system for automatic extraction and labelling of prosodic information. It takes the speech waveform as input, employs information about phoneme durations, identifies intonation phrases, accent locations, pitch accent types and boundary tones, and parameters for pitch range (baseline and topline parameters). The mapping between acoustic features and prosodic labels is defined in run-time readable files.