VILE, Estudio acústico y perceptivo de la variación inter e intralocutor en español
VILE-P, Estudio acústico y perceptivo de la variación prosódica inter e intralocutor en español

JESSEN, M.- KÖSTER, O.- GFROERER, S. (2005) "Influence of vocal effort on average and variability of fundamental frequency", The International Journal of Speech, Language and the Law 12, 2: 174-203.

Read and spontaneous speech was produced by 100 male adult speakers of German in a neutral setting and in a Lombard setting, where 80 dB noise was presented over headphones. Average f0 (‘f0mean’) and relative standard deviation of f0 (‘f0varco’) were determined for each speaker in each of the four conditions: spontaneous neutral, spontaneous loud, read neutral and read loud. The results confirm that when increasing vocal effort from neutral to loud speech, f0mean increases as well. None of the 100 speakers posed an exception to this effect, but the size of the effect differed between speakers, even after differences in amplitude level were accounted for. F0varco was significantly higher in loud than in normal speech for the reading task, whereas in spontaneous speech no significant difference occurred. These results are compared with the literature and discussed with respect to explanations and forensic-phonetic implications.
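The two measures named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that ‘f0varco’ is the standard deviation divided by the mean, that the sample standard deviation is used, and that f0 has already been extracted per voiced frame. The function name `f0_summary` and the f0 values are hypothetical.

```python
from statistics import mean, stdev


def f0_summary(f0_hz):
    """Return (f0mean, f0varco) for a list of f0 values in Hz.

    f0varco is computed here as sample standard deviation / mean,
    which is an assumption about the abstract's "relative standard deviation".
    """
    f0mean = mean(f0_hz)
    f0varco = stdev(f0_hz) / f0mean
    return f0mean, f0varco


# Invented per-frame f0 values for one speaker in two conditions.
neutral = [100.0, 110.0, 120.0]
loud = [130.0, 150.0, 170.0]

neutral_mean, neutral_varco = f0_summary(neutral)
loud_mean, loud_varco = f0_summary(loud)
print(neutral_mean, round(neutral_varco, 4))
print(loud_mean, round(loud_varco, 4))
```

Dividing by the mean makes the variability measure comparable across speakers and conditions whose average f0 differs.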

KÖSTER, O.- HESS, M.M.- SCHILLER, N.O.- KÜNZEL, J. (1998) "The correlation between auditory speech sensitivity and speaker recognition ability", The International Journal of Speech, Language and the Law 5, 1: 22-38.

In various applications of forensic phonetics the question arises as to how far aural-perceptual speaker recognition performance is reliable. Therefore, it is necessary to examine the relationship between speaker recognition results and human perception/production abilities like musicality or speech sensitivity. In this study, performance in a speaker recognition experiment and a speech sensitivity test are correlated. The results show a moderately significant positive correlation between the two tasks. Generally, performance in the speaker recognition task was better than in the speech sensitivity test. Professionals in speech and singing yielded a more homogeneous correlation than non-experts. Training in speech as well as choir-singing seems to have a positive effect on performance in speaker recognition. It may be concluded, firstly, that in cases where the reliability of voice line-up results or the credibility of a testimony have to be considered, the speech sensitivity test could be a useful indicator. Secondly, the speech sensitivity test might be integrated into the canon of possible procedures for the accreditation of forensic phoneticians. Both tests may also be used in combination.

KÜNZEL, H. (2000) "Effects of voice disguise on speaking fundamental frequency", The International Journal of Speech, Language and the Law 7, 2: 149-179.

Patterns of voice disguise in forensic cases involving speaker identification or speaker profiling may contain clues to features of the undisguised voice of a speaker. In a longitudinal and synchronous study, 100 subjects were asked to read a text on five occasions during a period of six months, first using their normal voices, and subsequently with two out of three modes of voice disguise: (1) raising fundamental frequency, (2) lowering fundamental frequency, (3) denasalization by firmly pinching their nose. The focus of this investigation is on fundamental frequency (F0). Results show that most subjects were in fact able to change their F0 consistently according to the mode of disguise they had selected. However, there were differences between the sexes with regard to their preference of disguise modes as well as to the individual articulatory ‘strategies’ which they employed to implement them. Results corroborate experience with forensic casework, that is, they show that there is a constant relation between the F0 of a speaker's natural speech behaviour and the kind of disguise he will use in an incriminating phone call. Speakers with higher-than-average F0 tend to increase their F0 levels. This process may or may not involve register changes from modal voice to falsetto. Speakers with lower-than-average F0 prefer to disguise their voices by lowering F0 even more and often end up with permanently creaky voice. The latter trend can be observed much more clearly in males. Females are generally more reluctant to make drastic changes to their fundamental frequency patterns.

MARKHAM, D. (1999) "Listeners and disguised voices: The imitation and perception of dialectal accent", The International Journal of Speech, Language and the Law 6, 2: 289-299.

This paper presents an experimental investigation into whether a group of speakers could produce convincing text readings in various dialectal accents of Swedish, and the performance of listeners in identifying the accents and determining whether the accents were natural or a disguise. It was observed that individual speakers vary greatly in their ability to produce plausible imitations of accents and to mask their own dialectal background. Examination of the listeners' perceptual strategies contributes an important dimension to the understanding of reasoning processes in earwitnesses. The linguistically trained listeners were found to use combinations of accent markers as cues to the degree of naturalness, although some of the judgements reflected misconceptions or preconceived ideas about the possible forms of specific accents. The hazards of using speakers with certain accent features in voice line-ups and the potential problems associated with earwitness accent identification are discussed.

McDOUGALL, K. (2006) "Dynamic features of speech and the characterization of speakers. Toward a new approach using formant frequencies", The International Journal of Speech, Language and the Law 13, 1: 89-126.

Previous research in speaker characteristics has tended to focus on static properties of the speech signal. Static features of speech generally display differences between individuals, but are insufficient to characterize a speaker. The present study considers reasons why dynamic features of speech should provide an important source of speaker-distinguishing information. It discusses this idea for formant frequencies in particular, and outlines a new direction for the development of a technique for characterizing individual speakers which uses regression to parameterize formant frequency contours. The technique is trialled and results reported for two data sets, one focusing on the formant dynamics of the rhyme /aIk/, the second on the formant dynamics of intervocalic /r/ sequences.
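The idea of using regression to parameterize a formant contour can be sketched as below. The abstract does not state the form of the regression, so the polynomial fit, its degree, the helper name `polyfit`, the time normalization to the interval 0 to 1, and the F2 values are all assumptions made for illustration. The fitted coefficients then serve as a compact description of the contour's shape.

```python
def polyfit(ts, ys, degree):
    """Least-squares polynomial coefficients, lowest order first."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(t^(i+j)) and b[i] = sum(y * t^i).
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


# Invented F2 contour (Hz) sampled at five normalized time points.
times = [0.0, 0.25, 0.5, 0.75, 1.0]
f2_hz = [1500.0, 1587.5, 1650.0, 1687.5, 1700.0]

# Quadratic fit: intercept, slope and curvature summarize the contour.
coefficients = polyfit(times, f2_hz, 2)
print([round(c, 3) for c in coefficients])
```

Representing each contour by a few coefficients rather than by raw measurements gives a fixed-length feature vector per token, whatever the number of measurement points.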

NEWBROOK, M.- CURTAIN, J.M. (1998) "Oates' theory of Reverse Speech: a critical examination", The International Journal of Speech, Language and the Law 5, 2.

David Oates claims to have discovered a language phenomenon which he has labelled Reverse Speech (RS). According to Oates, during speech two messages are communicated simultaneously: one forwards and heard and responded to consciously, and the other (RS) in reverse and heard and responded to unconsciously. RS can allegedly be heard as clear, grammatical statements mixed in amongst gibberish. The content of reversals is said to relate to the forward dialogue, and often accentuates the forward speech. It is also said to reveal unspoken thoughts which may be in contradiction to forward speech; therefore, it can be an effective tool to discover unspoken truths. Oates conducted an experiment which produced results suggesting that untrained listeners are able to hear RS. This experiment was replicated by the authors and the results – as well as Oates' many naive claims regarding language – suggest that RS is illusory.

NOLAN, F. (2002) "Intonation in speaker identification: An experiment on pitch alignment features", The International Journal of Speech, Language and the Law 9, 1: 1-21.

While long-term fundamental frequency statistics have been shown to be useful in discriminating speakers, relatively little attention has been paid in work on speaker characterization to intonation as a linguistically and phonetically structured phenomenon. To help redress the balance, this article presents the results of an experiment on between-speaker differences in linguistically specified intonational events. An autosegmental-metrical model of intonation is assumed which uses H (high) and L (low) targets as its primes. Since the pitch of the events corresponding to these targets is highly variable due to factors such as within-speaker variation in pitch range (or ‘pitch span’), this study investigates as a source of speaker discrimination the temporal alignment of these intonational events with segmental events. A limited degree of discrimination is achieved in highly controlled materials. Of theoretical interest is that definable pitch events lying between H and L targets show more potential for between-speaker discrimination than the targets themselves.

NOLAN, F.- GRIGORAS, C. (2005) "A case for formant analysis in forensic speaker identification", The International Journal of Speech, Language and the Law 12, 2: 144-173.

Views differ on the relative importance for forensic speaker identification of different aspects of the speech signal. It is argued here that formants, whose frequencies and dynamics are the product of the interaction of an individual vocal tract with the idiosyncratic articulatory gestures needed to achieve linguistically agreed targets, are so central to speaker identity that they must play a pivotal role in speaker identification. As a practical demonstration a case is described in which F1, F2 analysis of a vowel and F2 analysis of three diphthongs show a consistent separation between two recordings, thus effectively eliminating a suspect from having made obscene telephone calls. Subsequent additional analysis, based on the statistical distribution of formant frequency estimates throughout the samples, confirms the distinctness of the voice of the suspect and that of the obscene caller. The theoretical foundation for several kinds of formant-based analysis is then discussed.

ROGERS, H. (1998) "Foreign accent in voice discrimination: a case study", The International Journal of Speech, Language and the Law 5, 2: 203-208.

This article describes the phonetic analysis of a case of speaker identification involving a foreign accent. A taped telephone message in English with a Cantonese accent resulted in the arrest of Lo, a Cantonese speaker. The investigation compared the voice on the tape with that of Lo, particularly noting the accent in English. In several places, close auditory examination showed that Lo had a stronger accent in English than that of the voice on the tape. Acoustic analysis corroborated this view. The theoretical point underlying the conclusion is that non-native speakers can imitate a stronger accent than they normally use, but not a weaker accent.

ROSE, P.- OSANAI, T.- KINOSHITA, Y. (2003) "Strength of forensic speaker identification evidence: multispeaker formant- and cepstrum-based segmental discrimination with a Bayesian likelihood ratio as threshold", The International Journal of Speech, Language and the Law 10, 2.

A forensic-phonetic speaker identification experiment is described which tests to what extent same-speaker pairs from a 60-speaker Japanese database can be discriminated from different-speaker pairs using a Bayesian likelihood ratio (LR) as discriminant function. Non-contemporaneous telephone recordings are used, with comparison based on mean values from three segments only: a nasal, a voiceless fricative, and a vowel. It is shown that discrimination using the LR-based distance is better than with a conventional distance, and that the cepstrum outperforms the formants. An LR for the test of 50 is obtained for formant-based discrimination, compared to c. 900 for the cepstrum, and the tests are thus shown to be capable of yielding a probative strength of support for the prosecution hypothesis that is conventionally quantified as ‘moderate’ for formants but ‘moderately strong’ for the cepstrum. Comparisons are made with results from similar experiments.
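For orientation, a likelihood ratio compares how probable the evidence is under the same-speaker hypothesis with how probable it is under the different-speaker hypothesis. The sketch below only illustrates that ratio and its common log10 form; the function names and the probability values are hypothetical, and it does not reproduce the formant- or cepstrum-based computation described in the abstract.

```python
import math


def likelihood_ratio(p_evidence_same, p_evidence_diff):
    """LR = p(evidence | same speaker) / p(evidence | different speakers)."""
    return p_evidence_same / p_evidence_diff


def log10_lr(lr):
    """Log10 of the likelihood ratio; positive values favour the same-speaker hypothesis."""
    return math.log10(lr)


# Invented probabilities chosen so that the ratio is about 50.
example_lr = likelihood_ratio(0.05, 0.001)
print(round(example_lr, 6), round(log10_lr(example_lr), 3))

# The two LRs reported in the abstract, expressed on the log10 scale.
print(round(log10_lr(50), 3), round(log10_lr(900), 3))
```

On the log10 scale the reported values of 50 and c. 900 differ by a little more than one order of magnitude.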

SCHILLER, N.O.- KÖSTER, O. (1998) "The ability of expert witnesses to identify voices: a comparison between trained and untrained listeners", The International Journal of Speech, Language and the Law 5, 1: 1-9.

This study reports the results of a speaker identification experiment in which the performance of phonetic expert witnesses and untrained listeners was compared. In a direct identification task participants from both groups were asked to identify the voice of a target speaker among five foils. Results showed that expert witnesses, who were experienced in speaker identification, performed significantly better than untrained listeners, who had no experience in phonetic speaker identification.

YARMEY, A.D. (2001) "Earwitness descriptions and speaker identification", The International Journal of Speech, Language and the Law 8, 1.

Some 160 men and women selected from public locations agreed to participate in a voice identification experiment. Participants were instructed to listen carefully to the tape-recorded voice of a perpetrator committing a simulated armed robbery of a business establishment. Two minutes later they were asked to describe the voice characteristics of the perpetrator, to recall exactly what he said, and then attempt to identify the speaker from a six-person perpetrator-present or perpetrator-absent voice line-up. Half of the participants in each line-up heard a sample of identical phrases and the other half heard phrases non-identical to those used in the robbery. Accuracy of speaker identification was significantly better than chance; however, there were no significant differences in performance on either line-up as a function of the type of voice sample employed. The confidence-accuracy of identification correlation proved to be non-significant. No significant correlations were found between accuracy of speaker identification and completeness of voice descriptions, or speaker identification and percentage accuracy of recall of actual words used by the perpetrator, or speaker identification and percentage accuracy of recall of idea units contained in the perpetrator's monologue. It was concluded that voice lineups should be constructed of non-identical phrases rather than the identical phrases reportedly used by the perpetrator.

