Speech and Spoken Language Resources
Bibliography


General references on speech and spoken language resources
Standards in speech and spoken language resources
TEI, Text Encoding Initiative
EAGLES, (Expert Advisory Group on Language Engineering Standards) Spoken Language Working Group - ISLE (International Standards for Language Engineering) Natural Interactivity and Multimodality Working Group
SAM, Speech Assessment Methodologies
SpeechDat
COCOSDA, International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques
Validation of speech resources
Design and compilation of speech and spoken language resources
Design and compilation of speech corpora
Design and compilation of spoken language corpora
Recording techniques
Data elicitation techniques
Tools for speech corpora acquisition and management
Labelling and annotation of speech corpora
Levels of labelling
Labelling criteria
Validation of labelling
Multilingual labelling and annotation
Multimodal labelling and annotation
Labelling and annotation tools
Phonetic representation of speech corpora
Phonetic representation of speech corpora: segmental level
Phonetic representation of speech corpora: suprasegmental level
INTSINT, International Transcription System for Intonation
Transcription and encoding of spoken corpora
Speech and spoken language corpora
Catalan
Spanish
Applications of speech and spoken language resources
Research in phonetics
Speech technologies
Linguistic analysis
Research in second language acquisition
Research in clinical phonetics
Documentation and teaching of minority languages

Speech resources

Spoken language resources



General references on speech and spoken language resources



Alcántara Plá, M. (2008). Los retos en el análisis de los corpus de última generación. In R. Monroy, & A. Sánchez (Eds.), 25 años de Lingüística Aplicada en España: hitos y retos / 25 years of Applied Linguistics in Spain: Milestones and Challenges. (pp. 701-6). Murcia: Servicio de Publicaciones de la Universidad de Murcia.

Carré, R. (1991). Los bancos de sonidos. In J. Vidal Beneyto (Ed.), Las industrias de la lengua. (pp. 108-18). Madrid: Fundación Sánchez Ruipérez - Pirámide.

Carré, R. (1992). Speech databases. In W. A. Ainsworth (Ed.), Advances in speech, hearing and language processing. Volume 2. (pp. 199-216). London: JAI Press.

Compiling and processing spoken language corpora. LREC 2004, International Conference on Language Resources and Evaluation. Lisbon, Portugal. May 24, 2004.


Draxler, C. (2000). Speech databases. In F. van Eynde & D. Gibbon (Eds.), Lexicon development for speech and language processing. (pp. 169-206). Dordrecht: Kluwer.

Draxler, C. (2008). Korpusbasierte Sprachverarbeitung - eine Einführung. Tübingen: Gunter Narr.


Gibbon, D., Moore, R., & Winski, R. (Eds). (1997). Spoken language reference materials. Berlin - New York: Mouton de Gruyter.

1. User's guide; A. Character codes and computer readable alphabets; B. SAMPA computer readable phonetic alphabet; C. SAM file formats; D. SAM recording protocols; E. SAM software tools; F. EUROPEC recording tools; G. Digital storage media; H. Database management systems; I. Speech standards; J. EUROM-1 database overview; K. Polyphone project overview; L. European speech resources; M. Transcription and documentation conventions for Speechdat; N. The Bavarian Archive for Speech Signals.


Gibbon, D., Moore, R., & Winski, R. (Eds). (1998). Spoken language systems and corpus design. Berlin - New York: Mouton de Gruyter.

1. User's guide; 2. System design; 3. SL corpus design; 4. SL corpus collection; 5. SL corpus representation.

Höge, H. (1998). Spoken language resources for voice driven man machine interfaces. In LREC 1998. Proceedings of the 1st international conference on language resources and evaluation. Vol 1. (pp. 209-16). Granada, Spain. May 28-30, 1998.

Krauwer, S. (Ed). (2002). Proceedings of the workshop "Towards a roadmap for multimodal language resources and evaluation". LREC 2002. 3rd International Conference on Language Resources and Evaluation. Las Palmas de Gran Canaria, Spain. June 2, 2002. Retrieved from http://www.elsnet.org/roadmap-lrec2002.html


Lamel, L., & Cole, R. A. (1997). Spoken language corpora. In R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, & V. Zue (Eds.), Survey of the state of the art in human language technology. (pp. 450-4). Cambridge: Cambridge University Press. Retrieved from http://speech.bme.ogi.edu/HLTsurvey/ch12node5.html#SECTION123

Llisterri, J. (1996). Els corpus lingüístics orals. In L. Payrató, E. Boix, M. R. Lloret, & M. Lorente (Eds.), Corpus, corpora. Actes del 1er i 2on col·loquis lingüístics de la Universitat de Barcelona (CLUB-1, CLUB-2). (pp. 27-70). Barcelona: Promociones y Publicaciones Universitarias. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/UB_Corpus_93.pdf

Llisterri, J. (1999). Corpus orals per a la fonètica i les tecnologies de la parla. In Actes del I congrés de fonètica experimental. (pp. 27-38). Universitat Rovira i Virgili - Universitat de Barcelona. Tarragona, 22, 23 i 24 de febrer de 1999. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Resum_tarragona_99.html

Llisterri, J., Machuca, M. J., Mota, C., Riera, M., & Ríos, A. (2005). Corpus orales para el desarrollo de las tecnologías del habla en español. Oralia. Análisis del Discurso Oral, 8, 289-325. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Llisterri_Machuca_Mota_Riera_Rios_05_Corpus_Orales_Tecnologias_Habla_Espanol.pdf

Maybury, M. (Ed). (2002). Proceedings of the workshop on multimodal resources and multimodal systems evaluation. LREC 2002. 3rd International Conference on Language Resources and Evaluation. Las Palmas de Gran Canaria, Spain. June 1, 2002.

Melin, H. (1999). Databases for speaker recognition: Activities in the COST250 working group. In COST250 workshop on speaker recognition in telephony. Rome, Italy. 10-12 November, 1999. Retrieved from http://www.speech.kth.se/ctt/publications/papers/cost250-00_wg2fr.pdf

Pols, L. C. W. (1987). Speech technology and corpus linguistics. In W. Meijs (Ed.), Corpus linguistics and beyond. Proceedings of the seventh international conference on English language research on computerized corpora. (pp. 285-94). Amsterdam: Rodopi.

Pols, L. C. W. (Ed). (1990). Special issue on speech input / output assessment and speech databases. Speech Communication, 9(4).

Pols, L. C. W., & van Bezooijen, R. (1989). Proceedings of the ESCA tutorial day and workshop on speech input / output assessment and speech databases. Noordwijkerhout, the Netherlands. 20-23 September 1989. Retrieved from http://www.isca-speech.org/archive_open/sioa_89/index.html


Schiel, F., Draxler, C., Baumann, A., Ellbogen, T., & Steffen, A. (2003). The production of speech corpora. Institut für Phonetik und Sprachliche Kommunikation, Universität München. Retrieved from http://www.phonetik.uni-muenchen.de/forschung/BITS/TP1/Cookbook/


Schiel, F. (2003). The validation of speech corpora. Institut für Phonetik und Sprachliche Kommunikation, Universität München. Retrieved from http://www.phonetik.uni-muenchen.de/forschung/BITS/TP2/Cookbook/

Véronis, J. (Ed). (2004). Le traitement automatique des corpus oraux. Traitement Automatique des Langues, 45(2).


Wray, A., Trott, K., & Bloomer, A. (1998). Projects in linguistics. A practical guide to researching language. London - New York: Arnold - Oxford University Press.

III.- Tools for data analysis and project writing: 17.- Transcribing speech phonetically and phonemically; 18.- Transcribing speech orthographically.


Standards in speech and spoken language resources

Standards

TEI, Text Encoding Initiative

Johansson, S. (1995). The approach of the Text Encoding Initiative to the encoding of spoken discourse. In G. Leech, G. Myers, & J. Thomas (Eds.), Spoken English on computer: Transcription, markup and applications (pp. 82-98). Harlow: Longman.

Johansson, S. (1995). The encoding of spoken texts. In N. Ide & J. Véronis (Eds.), The Text Encoding Initiative. Background and context (pp. 149-158). Dordrecht: Kluwer.

TEI Consortium (Eds). (2013). 8 Transcription of speech. In TEI P5: Guidelines for electronic text encoding and interchange [Version 2.5.0. Last updated on 23 July 2013]. TEI Consortium. Retrieved from http://www.tei-c.org/release/doc/tei-p5-doc/en/html/TS.html


EAGLES, (Expert Advisory Group on Language Engineering Standards) Spoken Language Working Group - ISLE (International Standards for Language Engineering) Natural Interactivity and Multimodality Working Group

Dybkjaer, L., Berman, S., Bernsen, N. O., Carletta, J., Heid, U., & Llisterri, J. (2001). Requirements and specifications for a tool in support of annotation of natural interaction and multimodal data. Deliverable D11.2. Final Report. July 2001. ISLE Natural Interactivity and Multimodality Working Group. Retrieved from http://spokendialogue.dk/Publications/2001e/D11.2-ISLE-29.7.2001-F.pdf

Dybkjaer, L., Berman, S., Kipp, M., Wegener Olsen, M., Pirrelli, V., Reithinger, N., & Soria, C. (2001). Survey of existing tools, standards and user needs for annotation of natural interaction and multimodal data. Deliverable D11.1. Final Report. January 2001. ISLE Natural Interactivity and Multimodality Working Group. Retrieved from http://spokendialogue.dk/Publications/2001f/D11.1-14.2.2001-F.pdf

GIBBON, D. (1997) "Standards and Resources for Spoken Language Systems", in MARCINKEVICIENE, R.- VOLZ, N. (Eds.) TELRI. Trans-European Language Resources and Infrastructure. Proceedings of the Second European Seminar "Language Applications for Multilingual Europe". Kaunas, Lithuania, April 17-20, 1997. Mannheim - Kaunas: IDS - VDU. pp. 35-54.

GIBBON, D. - MOORE, R.- WINSKI, R. (Eds.) (1997) Handbook of Standards and Resources for Spoken Language Systems. Berlin: Mouton De Gruyter.

GIBBON, D. - MOORE, R.- WINSKI, R. (Eds.) (1998) Spoken Language Systems and Corpus Design. Berlin: Mouton De Gruyter. (Handbook of Standards and Resources for Spoken Language Systems, Volume I).

1. User's guide; 2. System design; 3. SL corpus design; 4. SL corpus collection; 5. SL corpus representation.

GIBBON, D. - MOORE, R.- WINSKI, R. (Eds.) (1998) Spoken Language Characterisation. Berlin: Mouton De Gruyter. (Handbook of Standards and Resources for Spoken Language Systems, Volume II).

1.- User's guide; 2.- Spoken language lexica; 3.- Language models; 4.- Physical characterisation and description.

GIBBON, D. - MOORE, R.- WINSKI, R. (Eds.) (1998) Spoken Language Reference Materials. Berlin: Mouton De Gruyter. (Handbook of Standards and Resources for Spoken Language Systems, Volume IV).

1. User's guide; A. Character codes and computer readable alphabets; B. SAMPA computer readable phonetic alphabet; C. SAM file formats; D. SAM recording protocols; E. SAM software tools; F. EUROPEC recording tools; G. Digital storage media; H. Database management systems; I. Speech standards; J. EUROM-1 database overview; K. Polyphone project overview; L. European speech resources; M. Transcription and documentation conventions for Speechdat; N. The Bavarian Archive for Speech Signals.

GIBBON, D.- MERTINS, I.- MOORE, R. (Eds.) (2000) Handbook of Multimodal and Spoken Dialogue Systems. Resources, Terminology and Product Evaluation. Dordrecht: Kluwer Academic Publishers (Kluwer International Series in Engineering and Computer Science, 565)

1.- Representation and annotation of dialogue. 2.- Audio-visual and multimodal speech-based systems. 3.- Consumer off-the-shelf (COTS) product and service evaluation. 4.- Terminology for spoken language systems. 5.- Reference materials.

Llisterri, J. (1996). Preliminary recommendations on spoken texts. EAGLES Documents EAG-TCWG-STP/P. May 1996. Retrieved from http://www.ilc.cnr.it/EAGLES96/spokentx/spokentx.html

WINSKI, R. - MOORE, R.- GIBBON, D. (1995) "EAGLES Spoken Language Working Group: Overview and Results", in Eurospeech'95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 1, pp. 841-844.
http://coral.lili.uni-bielefeld.de/EAGLES/rwpaper/rwpaper.html


SAM, Speech Assessment Methodologies

CHAN, D.- FOURCIN, A.- GIBBON, D.- GRANSTRÖM, B.- HUCKVALE, M.- KOKKINAKIS, G.- KVALE, K.- LAMEL, L.- LINDBERG, B.- MORENO, A.- MOUROPOULOS, J.- SENIA, F.- TRANCOSO, I.- VELD, C.- ZEILIGER, J. (1995) "EUROM- A Spoken Language Resource for the EU", in EUROSPEECH 1995. Proceedings of the 4th European Conference on Speech Communication and Technology. 18 - 21 September, 1995. Madrid, Spain. Vol 1, pp. 867-870.
http://www.phon.ucl.ac.uk/resource/eurom1/eurospeech95eurom.pdf

FOURCIN, A.- DOLMAZON, J.M. (on behalf of the SAM Project) (1991) "Speech knowledge, standards and assessment", in Actes du XIIème Congrès International des Sciences Phonétiques. 19-24 août 1991, Aix-en-Provence, France. Aix-en-Provence: Université de Provence, Service des Publications. Vol 5 pp. 430-433.

FOURCIN, A.- HARLAND, G.- BARRY, W. - HAZAN, V. (Eds.) (1989) Speech Input and Output Assessment. Multilingual Methods and Standards. Chichester: Ellis Horwood Ltd.

GRICE, M.- BARRY, W. (1989) "EUROM.0 technical description", in SAM (1989) Esprit Project 1541 (SAM) Multilingual Speech Input/Output: Assessment, Methodology and Standardisation. Extension Phase. Final Report. 1 April 1988 - 28 February 1989. pp. 179-193.

SHERWOOD, T.- FULLER, H. (1992) Guide to EUROM.1 Speech Database. Doc. No. SAM-NPL-102, Final, 21 April 1992. ESPRIT PROJECT 2589 (SAM)

SpeechDat

DRAXLER, C.- van den HEUVEL, H.- TROPF, H. (1998) "SpeechDat Experiences in Creating Large Multilingual Speech Databases for Teleservices", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. I. pp. 361-366.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.2745

HEUVEL, H. van den- BONAFONTE, A.- BOUDY, J.- DUFOUR, S.- LOCKWOOD, P.- MORENO, A.- RICHARD, G. (1999) "SpeechDat-Car: Towards a collection of speech databases for automotive environments", in Nokia-COST 249 Workshop. Tampere, Finland.
http://www.speechdat.org/SP-CAR/CONFEREN/ICAR99V1.PDF

HEUVEL, H. van den.- BOUDY, J.- COMEYNE, R.- EULER, S.- MORENO, A.- RICHARD, G. (1999) "The SpeechDat-Car multilingual speech databases for in-car applications: some first validation results", in EUROSPEECH 1999. Proceedings of the 6th European Conference on Speech Communication and Technology. 5 - 9 September, 1999. Budapest, Hungary.
http://www.speechdat.org/SP-CAR/CONFEREN/EURO99_0.PDF

HEUVEL, H. van den.- HALL, P.- HÖGE, H.- MORENO, A.- RINCÓN, A.- SENIA, F. (2004) "SALA II across the finish line: a large collection of mobile telephone speech databases from North & Latin America completed", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association.
http://gps-tsc.upc.es/veu/research/pubs/download/Heu_SAL_04.pdf

MORENO, A. (2000) "SALA: SpeechDat Across Latin America", in Proceedings of the 1st Workshop on Very Large Databases. May, 2000. Athens, Greece.
http://gps-tsc.upc.es/veu/research/pubs/download/Mor00a.ps

MORENO, A.- COMEYNE, R.- HASLAM, K.- van den HEUVEL, H.- HÖGE, H.- HORBACH, S.- MICCA, G. (2000) "SALA: SpeechDat across Latin America. Results of the First Phase", in LREC 2000. Proceedings of the Second International Conference on Language Resources and Evaluation. 31 May - 2 June, 2000, Athens, Greece. Paris: European Language Resources Association.
http://gps-tsc.upc.es/veu/research/pubs/download/Mor00c.pdf

MORENO, A.- GEDGE, O.- van den HEUVEL, H.- HÖGE, H.- HORBACH, S.- MARTIN, P.- PINTO, E.- RINCÓN, A.- SENIA, F.- SUKKAR, R. (2002) "SpeechDat across all America: SALA II" in LREC 2002. Proceedings of the Third International Conference on Language Resources and Evaluation. 27 May - 2 June, 2002, Las Palmas de Gran Canaria, Spain. Paris: European Language Resources Association.

MORENO, A.- HÖGE, H.- KÖLER, J. - MARIÑO, J.B. (1998) "SpeechDat Across Latin America. Project SALA", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. I. pp. 367-370.

MORENO, A.- LINDBERG, B.- DRAXLER, C.- RICHARD, G.- CHOUKRI, K.- EULER, S.- ALLEN, J. (2000) "SPEECHDAT-CAR. A Large Speech Database for Automotive Environments", in LREC 2000. Proceedings of the Second International Conference on Language Resources and Evaluation. 31 May - 2 June, 2000, Athens, Greece. Paris: European Language Resources Association.
http://gps-tsc.upc.es/veu/research/pubs/download/Mor00c.pdf
http://www.speechdat.org/SP-CAR/CONFEREN/LREC2000.PDF

VELDEN, J.G. van - LANGMANN, D.- PAWLEWSKI, M. (1996) Specification of speech data collection over mobile telephone networks. Version 2.3. SpeechDat LE2-401 Deliverable SD1.1.2/1.2.2. 14 October, 1996.
http://www.speechdat.org/speechdat/deliverables/public/SD112V23.DOC

WINSKI, R. (1997) Definition of corpus, scripts and standards for fixed networks. Version 4.1. SpeechDat LE2-401 Deliverable SD1.1.1. 22 January 1997.
http://www.speechdat.org/speechdat/deliverables/public/SD111V41.DOC

Working standards for speech databases directed towards short and medium term applications. LRE-63314 Report D3.1.1.1.
http://www.speechdat.org/speechdt/speechdat_m/deliverables/D3111.pdf

Reduced needs and specifications for the database. LRE-63314 SpeechDat Report D1.4.1.
http://www.speechdat.org/speechdt/speechdat_m/deliverables/D141.pdf

COCOSDA, International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques

COCOSDA 2000. Workshop of the International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques. 21st October 2000, Beijing, China.
http://www.cocosda.org/meet/beijing.html

COCOSDA 2001. Workshop of the International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques. 2nd September 2001, Aalborg, Denmark.

COCOSDA 2002. Workshop of the International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques. 16th September 2002, Denver, Colorado.
http://www.cocosda.org/meet/denver/present.html

COCOSDA 2003. Workshop of the International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques. 31st August 2003, Geneva, Switzerland.
http://www.cocosda.org/meet/2003/present.html


Validation of speech resources

Boves, L. (1998). ELRA validation manual for SLR (Spoken Language Resources) (ELRA - Deliverable 6.1.1). ELRA - Distribution Agency (ELDA). Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.4354

Schiel, F., Baumann, A., Draxler, C., Ellbogen, T., Hoole, P., & Steffen, A. (2004). The validation of speech corpora (Version 1.11). München: Bavarian Archive for Speech Signals. Retrieved from http://www.phonetik.uni-muenchen.de/forschung/BITS/TP2/Cookbook/

Schiel, F., Baumann, A., Draxler, C., Ellbogen, T., Hoole, P., & Steffen, A. (2012). The validation of speech corpora (Version 1.10). München: Bavarian Archive for Speech Signals. Retrieved from http://epub.ub.uni-muenchen.de/13698/1/schiel_13698.pdf

van den Heuvel, H., Iskra, D., Sanders, E., & de Vriend, F. (2008). Validation of spoken language resources: An overview of basic aspects. Language Resources and Evaluation, 42(1), 41-73. doi:10.1007/s10579-007-9049-1


Design and compilation of speech and spoken language resources

Design and compilation of speech corpora

ALCÁCER, N.- CASTRO, M.J.- GALIANO, I.- GRANELL, R.- GRAU, S.- GRIOL, D. (2004) "Adquisición de un corpus de diálogo: DIHANA", in SANCHIS ARNAL, E. (Ed.) Actas de las III Jornadas en Tecnología del Habla. Valencia, del 17-19 de noviembre de 2004. Organizadas por la Red Temática en Tecnología del Habla. Valencia: Departamento de Sistemas Informáticos y Computación, Facultad de Informática, Universidad Politécnica de Valencia. pp. 131-136.
http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/III/actas3JTH.pdf

ANDERNACH, T.- DEVILLE, G.- MORTIER, L. (1993) "The Design of a Real World Wizard of Oz Experiment for a Speech Driven Telephone Directory Information Service", in Eurospeech'93. 3rd European Conference on Speech Communication and Technology. Berlin, Germany, 21-23 September 1993. Vol. 2, pp. 1165-1168.

BOULIANNE, G.- KENNY, P.- LENNIG, M.- O'SHAUGHNESSY, D.- MERMELSTEIN, P. (1994) "Books on tape as training data for continuous speech recognition", Speech Communication 14,1: 61-70

CAMPBELL, N. (1998) "Design of Speech Corpora for use in Concatenative Synthesis Systems", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. II, pp. 1309-1312.

ESKÉNAZI, M.- FREDERKING, R. (1998) "Issues in Database design: recording and processing speech from new populations", in Proceedings of the First International Conference on Language Resources and Evaluation. May 28 - 30, 1998, Granada, Spain.

FRASER, N.- GILBERT, G.N. (1991) "Simulating speech systems", Computer Speech and Language 5,1: 81-99.

GIBBON, D. - MOORE, R.- WINSKI, R. (Eds.) (1998) Spoken Language Systems and Corpus Design. Berlin: Mouton De Gruyter. (Handbook of Standards and Resources for Spoken Language Systems, Volume I).

1. User's guide; 2. System design; 3. SL corpus design; 4. SL corpus collection; 5. SL corpus representation.

HOZJAN, V.- KACIC, Z.- MORENO, A.- BONAFONTE, A.- NOGUEIRAS, A. (2002) "Interface Databases: Design and Collection of a Multilingual Emotional Speech Database", in LREC 2002. Proceedings of the Third International Conference on Language Resources and Evaluation. 27 May - 2 June, 2002, Las Palmas de Gran Canaria, Spain. Paris: European Language Resources Association.
http://gps-tsc.upc.es/veu/research/pubs/download/hoz_int_02.pdf

KRSTULOVIC, S.- BIMBOT, F.- BOËFFARD, O.- CHARLET, D.- FOHR, D.- MELLA, O. (2006) "Optimizing the coverage of a speech database through a selection of representative speaker recordings", Speech Communication 48, 10: 1319-1348.
http://dx.doi.org/10.1016/j.specom.2006.07.002

LAMEL, L.- ROSSET, S.- BENNACEF, S.- BONNEAU-MAYNARD, H.- DEVILLERS, L.- GAUVAIN, J.L. (1995) "Development of Spoken Language Corpora for Travel Information", in EUROSPEECH 1995. Proceedings of the 4th European Conference on Speech Communication and Technology. 18 - 21 September, 1995. Madrid, Spain. Vol 3, pp. 1961-1964.

Llisterri, J., & Poch, D. (1991). Phonetic criteria for the development of a speech database in Spanish (the Albayzín project). In G. Castagneri (Ed.), Proceedings of the workshop on international cooperation and standardization of speech databases and speech I /O assessment methods. Chiavari, Italy. September 26-28, 1991. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Llisterri_Poch_91_Albayzin.pdf

Llisterri, J., & Poch, D. (1994). Proyecto de una base de datos acústicos de la lengua española. In Actas del congreso de la lengua española. Sevilla, del 7 al 10 de octubre de 1992. (pp. 278-92). Madrid: Instituto Cervantes. Retrieved from http://cvc.cervantes.es/obref/congresos/sevilla/tecnologias/ponenc_llisterripoch.htm

MACHUCA, M. J. (2006) "Corpus para el desarrollo de sistemas de diálogo", in LLISTERRI, J.- MACHUCA, M. J. (Eds.) Los sistemas de diálogo. Bellaterra - Soria: Universitat Autònoma de Barcelona, Servei de Publicacions - Fundación Duques de Soria (Manuals de la Universitat Autònoma de Barcelona, Lingüística, 45). pp. 61-79.

MARCHAL, A.- HARDCASTLE, W.- HOOLE, P. - FARNETANI, E.- NI CHASAIDE, A.- SCHMIDBAUER, O.- GALIANO-RONDA, I.- ENGSTRAND, O. - RECASENS, D. (EUR-ACCOR) (1991) "The design of a multichannel database", in Actes du XIIème Congrès International des Sciences Phonétiques. 19-24 août 1991, Aix-en-Provence, France. Aix-en-Provence: Université de Provence, Service des Publications. Vol 5, pp. 422-425

MILLAR, J.B.- HAWKINS, S.R. (1990) "Selecting representative speakers", in Proceedings of the Tutorial and Research Workshop on Speaker Characterization in Speech Technology. Edinburgh, 26-28 June. Edinburgh: Center for Speech Technology Research. pp. 161-166.

Moreno, A., Poch, D., Bonafonte, A., Lleida, E., Llisterri, J., Mariño, J. B., & Nadeu, C. (1993). ALBAYZIN speech database: Design of the phonetic corpus. In Eurospeech 1993. Proceedings of the 3rd European conference on speech communication and technology. Vol 1. (pp. 175-8). Berlin, Germany. 21-23 September, 1993. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Moreno_et_al_93_Albayzin_Phonetic_Corpus.pdf

PÉAN, V.- WILLIAMS, S.- ESKÉNAZI, M. (1993) "The Design and Recording of ICY, a Corpus for the Study of Intraspeaker Variability" in Eurospeech'93. 3rd European Conference on Speech Communication and Technology. Berlin, Germany, 21-23 September 1993. Vol. 1 pp. 627-630

SCHIEL, F. - TÜRK, U. (2006) "Wizard-of-Oz recordings", in WAHLSTER, W. (Ed.) SmartKom: Foundations of Multimodal Dialogue Systems. New York: Springer. pp. 541-570.

SWERTS, M.- COLLIER, R. (1992) "On the controlled elicitation of spontaneous speech", Speech Communication 11, 4-5: 463-468.

WINSKI, R.- SENIA, F.- CONNER, P.- HÄB-ÜMBACH, R.- CONSTANTINESCU, A.- NIEDERMAIR, G.- MORENO, A.- TRANCOSO, I. (1996) Specification of Telephone Speech Data Collection. LRE-63314 SPEECHDAT, Deliverable D1.4.1.
http://www.speechdat.org/speechdt/speechdat_m/deliverables/D141.pdf

ZANTEN, E. van- DAMEN, L.W.M. - HOUTEN, E. van "Collecting data for a speech database", in HEUVEN, V.J. van - POLS, L.C.W. (Eds) Analysis and synthesis of speech. Strategic research towards high quality text-to-speech generation. Berlin: Mouton de Gruyter (Speech Research Series)


Design and compilation of spoken language corpora



Adolphs, S. & Knight, D. (2010). Building a spoken corpus: What are the basics? In A. O'Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics. Oxford: Routledge.

BRIZ, A. et al. (1993) "La elaboración de un corpus de español coloquial. Problemas metodológicos previos", Cahiers du Centre Interdisciplinaire des Sciences du Langage, Actes du Colloque "Le Dialogue en question". Université de Toulouse - Le Mirail, Valencia, 1994. pp. 103-109.

BRIZ, A. et al. (1995) "La elaboración de un corpus de español coloquial. Problemas metodológicos previos", in Actas del I Congreso de Lingüística General. València: Universitat de València.

CID URIBE, M. E. - ROSS ARIAS, A. (2006) "La construcción de un corpus de habla pública de Chile: Criterios y procedimientos para la selección de una muestra representativa", Onomázein 13, 1: 21-33.
http://www.onomazein.net/Articulos/13/2_Cid.pdf

Čermák, F. (2009). Spoken corpora design: Their constitutive parameters. International Journal of Corpus Linguistics, 14(1), 113-123.

CROWDY, S. (1993) "Spoken Corpus Design and Transcription", Literary and Linguistic Computing, 8, 4: 259-265.

de KLERK, V. (2002) "Starting with Xhosa English... towards a spoken corpus", International Journal of Corpus Linguistics 7, 1: 21-42.

Freitas, T. (2008). Recolha e transcrição de corpora orais. In E. Fernández Rei & X. L. Regueira (Eds.), Perspectivas sobre a oralidade. (pp. 297-324). Santiago de Compostela: Consello da Cultura Galega - Instituto da Lingua Galega.
http://consellodacultura.org/mediateca/extras/simposio_oralidade.pdf

IZRE'EL, S.- RAHAV, G. (2004) "The Corpus of Spoken Israeli Hebrew (CoSIH); Phase I: The Pilot Study", in Compiling and Processing Spoken Language Corpora. LREC 2004, International Conference on Language Resources and Evaluation. 24th May 2004. Lisboa, Portugal. pp. 1-7.
http://www.tau.ac.il/humanities/semitic/lrec2004.pdf

IZRE'EL, S.- HARY, B.- RAHAV, G. (2002) "Designing CoSIH: The Corpus of Spoken Israeli Hebrew", International Journal of Corpus Linguistics 6, 2: 171-197.


MORENO FERNÁNDEZ, F. (1997) "La formación de corpus de lengua hablada", in MORENO FERNÁNDEZ, F. (Ed.) Trabajos de sociolingüística hispánica. Alcalá de Henares: Universidad de Alcalá, Servicio de Publicaciones (Ensayos y Documentos, 27) pp. 93-114.


MORENO FERNÁNDEZ, F. (1999) "La formación de corpus-corpora de lengua hablada", in DE LAS CUEVAS, J.- FASLA, D. (Eds.) Contribuciones al estudio de la lingüística aplicada. Castellón: Asociación Española de Lingüística Aplicada. pp. 447-464.

TAYLOR, L. (1996) "The compilation of the Spoken English Corpus", in KNOWLES, G.- WICHMANN, A.- ALDERSON, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London & New York: Longman. pp. 20-37.

HIDALGO NAVARRO, A. (1993) "El habla juvenil: Una propuesta metodológica para la extracción de un corpus oral representativo", in FERNANDEZ-BARRIENTOS MARTIN, J. (Ed.) Jornadas Internacionales de Lingüística Aplicada/International Conference of Applied Linguistics. Robert J. Di Pietro in Memoriam. Actas/Proceedings. Granada: Instituto de Ciencias de la Educación de la Universidad de Granada. Vol. 1, pp. 66-75.

SUMMERS, D. (1993) "Longman/Lancaster English Language Corpus - Criteria and Design", International Journal of Lexicography 6,3: 181-208

Torruella, J., & Llisterri, J. (1999). Diseño de corpus textuales y orales. In J. M. Blecua, G. Clavería, C. Sánchez, & J. Torruella (Eds.), Filología e informática. Nuevas tecnologías en los estudios lingüísticos. (pp. 45-77). Barcelona: Seminari de Filologia i Informàtica, Departament de Filologia Espanyola, Universitat Autònoma de Barcelona - Editorial Milenio. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Torruella_Llisterri_99.pdf

VÁZQUEZ VEIGA, N. (1995) "'Corpus de lengua hablada en la ciudad de A Coruña': el rol del entrevistador en la conversación semidirigida", Moenia, Revista Lucense de Lingüística & Literatura 1: 181-202.

Recording techniques

Data elicitation techniques


Tools for speech corpora acquisition and management



CHEVALIER, G.- KASPARIAN, S.- SILBERZTEIN, M. (2004) "Éléments de solution pour le traitement automatique d’un français oral régional", in VÉRONIS, J. (Ed.) Le traitement automatique des corpus oraux, Traitement automatique des langues 45, 2: 41-62.

Draxler, C., & Jänsch, K. (2004). Speechrecorder -- a universal platform independent multi-channel audio recording software. In LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. (pp. 559-62). Lisbon, Portugal. May 24-30, 2004.

Draxler, C., & Jänsch, K. (2008). Wikispeech -- A content management system for speech databases. In Interspeech 2008. Proceedings of the 9th Annual Conference of the International Speech Communication Association. (pp. 1646-9). Brisbane, Australia. September 22-26, 2008.

FONOLLOSA, J.A.R.- MORENO, A. (1998) "Automatic Database Acquisition Software for ISDN PC Cards and Analogic Boards", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. II. pp. 1325-1328.

FRYDA, P.- KOPECEK, I. (1998) "PHC Format for Managing Data in Phonetic Corpora", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. II. pp. 1283-1287.

GALIANO, M. I.- GRANELL, R.- HURTADO, Ll.F.- MIGUEL, A.- SÁNCHEZ, J.A.- SANCHIS, E. (2003) "La plataforma de adquisición de diálogos en el proyecto DIHANA", Procesamiento del Lenguaje Natural 31: 341-342.
http://www.sepln.org/revistaSEPLN/revista/31/31-Pag341.pdf

GAROFOLO, J.S. - PALLETT, D.S. (1989) "Use of CD-Rom for speech database storage and exchange" in TUBACH, J.P.- MARIANI, J.J. (Eds) Eurospeech 89. European Conference on Speech Communication and Technology. Paris, September 1989. Edinburgh: CEP Consultants Ltd. pp. 309-312

GIBBON, D. - MOORE, R.- WINSKI, R. (Eds.) (1998) Spoken Language Reference Materials. Berlin: Mouton De Gruyter. (Handbook of Standards and Resources for Spoken Language Systems, Volume IV).


Harrington, J. (2010). Phonetic analysis of speech corpora. Oxford: Wiley-Blackwell. Retrieved from http://phonetik.uni-muenchen.de/~jmh/research/pasc010808/pasc.pdf

NOGUEIRAS, A.- MORENO, A. (1998) "NaniBD: a Set of Tools for Transcribing and Validating Speech Databases", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. II. pp. 1359-1365.
http://gps-tsc.upc.es/veu/research/pubs/download/Nog98b.ps.gz

RIBEIRO, C.- TRANCOSO, I. - SERRALHEIRO, A. (1993) "A software tool for Speech Collection, Recognition and Reproduction" in Eurospeech'93. 3rd European Conference on Speech Communication and Technology. Berlin, Germany, 21-23 September 1993. Vol. 1 pp. 179-182

SORIA, C.- BERNSEN, N.O.- CADÉE, N.- CARLETTA, J.- DYBKJAER, L.- EVERT, S.- HEID, U.- ISARD, A.- KOLODNYTSKY, M.- LAUER, C.- LEZIUS, W.- NOLDUS, L.P.J.J.- PIRRELLI, V.- REITHINGER, N.- VÖGELE, A. (2002) "Advanced Tools for the Study of Natural Interactivity", in LREC 2002. Proceedings of the Third International Conference on Language Resources and Evaluation. Las Palmas de Gran Canaria, 27 May - 2 June, 2002. European Language Resources Association.
http://www.dfki.de/~bert/NITE-Pisa-F.pdf

VÉRONIS, J. (Ed.) (2004) Le traitement automatique des corpus oraux, Traitement automatique des langues 45, 2.

VÉRONIS, J. (2004) "Le traitement automatique des corpus oraux", in VÉRONIS, J. (Ed.) Le traitement automatique des corpus oraux, Traitement automatique des langues 45, 2: 7-14.


Labelling and annotation of speech corpora

Corpus annotation

Corpus labelling

Corpus segmentation

Llisterri, J. (1999). Transcripción, etiquetado y codificación de corpus orales. Revista Española de Lingüística Aplicada. Volumen Monográfico "Panorama de la Investigación en Lingüística Informática", 53-82. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/RESLA_99.pdf

Levels of labelling

BARRY, W.J.- FOURCIN, A.J. (1992) "Levels of Labelling", Computer Speech and Language 6: 1-14

MARCHAL, A.- NGUYEN, N.- HARDCASTLE, W. (1995) "Multitiered phonetic approach to speech labelling", in SORIN, C.- MARIANI, J.- MELONI, H.- SCHOENTGEN, J. (Eds.) Levels in Speech Communication. Relations and Interactions. A Tribute to Max Wajskop / Hommage à Max Wajskop. Amsterdam: Elsevier Science B.V. pp. 149-158

TILLMANN, H.G.- POMPINO-MARSCHALL, B. (1993) "Theoretical Principles Concerning Segmentation, Labelling Strategies and Levels of Categorical Annotation for Spoken Language Database Systems" in Eurospeech'93. 3rd European Conference on Speech Communication and Technology. Berlin, Germany, 21-23 September 1993. Vol. 3 pp. 1691-1694


Labelling criteria

AUTESSERRE, D.- PÉRENNOU, G.- ROSSI, M. (1989) "Methodology for the transcription and labeling of a speech corpus", Journal of the International Phonetic Association 19,1: 2-15

CAMPBELL, N. (2002) "Labelling natural conversational speech data". Fall Meeting of the Acoustical Society of Japan.
http://www.speech-data.jp/nick/pubs/lncsd.pdf

JOHNSON, K. (2003) "Aligning phonetic transcriptions with their citation forms", Acoustics Research Letters Online 5: 19-24.
http://corpus.linguistics.berkeley.edu/~kjohnson/papers/DTW_aligning.pdf

KEATING, P.- MacEACHERN, P.- SHRYOCK, A.- DOMINGUEZ, S. (1994) "A manual for phonetic transcription: Segmentation and labelling of words in spontaneous speech", UCLA Working Papers in Phonetics 88: 91-120.

CROOT, K.- TAYLOR, B. (1995) Criteria for Acoustic-Phonetic Segmentation and Word Labelling in the Australian National Database of Spoken Language.
http://andosl.anu.edu.au/andosl/general_info/aue_criteria.html

LANDER, T. (1997) The CSLU Labeling Guide. Center for Spoken Language Understanding, Oregon Graduate Institute.
http://www.cslu.ogi.edu/corpora/docs/labeling.pdf

LENNES, M. Labeling Finnish speech in Worldbet. Department of Speech Sciences, University of Helsinki.
http://www.helsinki.fi/%7Elennes/labeling.html

PITT, M.A.- JOHNSON, K.- HUME, E.- KIESLING, S.- RAYMOND, W. (2005) "The Buckeye Corpus of Conversational Speech: Labeling Conventions and a Test of Transcriber Reliability", Speech Communication 45, 1: 89-95.
http://corpus.linguistics.berkeley.edu/~kjohnson/papers/Pitt_et_al.pdf
http://dx.doi.org/10.1016/j.specom.2004.09.001

ROACH, P.- ROACH, H.- DEW, A.- ROWLANDS, P. (1990) "Phonetic analysis and the automatic segmentation and labeling of speech sounds", Journal of the International Phonetic Association 20,1: 15-21


Validation of labelling

BARRY, W.J.- GRICE, M. (1991) "Auditory and visual factors in speech database analysis", Speech, Hearing and Language. UCL Work in Progress 5: 9-32

COLE, R.A.- OSHIKA, B.T.- NOEL, M.- LANDER, T.- FANTY, M. (1994) "Labeler Agreement in Phonetic Labeling of Continuous Speech", in Proceedings of the 1994 International Conference on Spoken Language Processing, Yokohama, Japan, 18-22 September 1994.
http://www.isca-speech.org/archive/icslp_1994/i94_2131.html

EISEN, B. (1993) "Reliability of speech segmentation and labelling at different levels of transcription" in Eurospeech'93. 3rd European Conference on Speech Communication and Technology. Berlin, Germany, 21-23 September 1993. Vol. 1 pp. 673-676

GRICE, M.- BARRY, W. (1991) "Phonetic units by ear and eye" in Proceedings of the ESCA Workshop on Phonetics and Phonology of Speaking Styles. Barcelona, Spain, 30 September - 2 October, 1991. pp. 29.1-29.5

HOECKEL, C.J.M. van (1989) "The reliability of manual labelling of continuous speech", in Proceedings of ESCA Workshop Speech Input / Output Assessment and Speech Databases. Noordwijkerhout, the Netherlands, 20-23 September 1989. pp. 5.5.1-5.5.4

KVALE, K.- FOLDVICK, K. (1991) "Manual segmentation and labelling of continuous speech" in Proceedings of the ESCA Workshop on Phonetics and Phonology of Speaking Styles. Barcelona, Spain, 30 September - 2 October, 1991. pp. 37.1-37.5

PITT, M.A.- JOHNSON, K.- HUME, E.- KIESLING, S.- RAYMOND, W.D. (2005) "The Buckeye Corpus of Conversational Speech: Labeling Conventions and a Test of Transcriber Reliability", Speech Communication 45, 1: 89-95.
http://lpl.psy.ohio-state.edu/documents/Buckeye.Corpus-5.pdf
http://dx.doi.org/10.1016/j.specom.2004.09.001

RAYMOND, W.D.- PITT, M.- JOHNSON, K.- HUME, E.- MAKASHAY, M.- DAUTRICOURT, R.- HILTS, C. (2002) "An analysis of transcription consistency in spontaneous speech from the Buckeye corpus", in ICSLP 2002 - INTERSPEECH 2002. Proceedings of the 7th International Conference on Spoken Language Processing. 16 - 20 September, 2002. Denver, Colorado, USA.
http://buckeyecorpus.osu.edu/pubs/icslp02.pdf

STRANGERT, E.- HELDNER, M. (1995) "Labelling of boundaries and prominences by phonetically experienced and non-experienced transcribers", Phonum 3, Reports from the Department of Phonetics, Umeå University: 85-109.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.7.2357

Multilingual labelling and annotation

BARRY, W.J. (1989) "Proposals for body of labelling file", in SAM (1989) Esprit Project 1541 (SAM) Multilingual Speech Input/Output: Assessment, Methodology and Standardisation. Extension Phase. Final Report. 1 April 1988- 28 February 1989. pp.194-196

BARRY, W.J.- DALSGAARD, P. (1993) "Speech Database Annotation. The importance of a Multi-Lingual Approach" in Eurospeech'93. 3rd European Conference on Speech Communication and Technology. Berlin, Germany, 21-23 September 1993. Vol. 1 pp. 13-22

DALSGAARD, P.- ANDERSEN, O.- BARRY, W. (1991) "Multi-lingual acoustic-phonetic features for a number of European languages" in Eurospeech 91. 2nd European Conference on Speech Communication and Technology. Genova, Italy, 24-26 September 1991. vol 2 pp. 685-688

DALSGAARD, P.- ANDERSEN, O.- BARRY, W.J. (1991) "The cross-language validity of acoustic-phonetic features in label alignment" in Actes du XIIème Congrès International des Sciences Phonétiques. 19-24 août 1991, Aix-en-Provence, France. 5 vols. Aix-en-Provence: Université de Provence, Service des Publications.

ERP, A. van - GRICE, M.- BARRY, W. (1989) "Manual labelling of Danish, Dutch, English and French Speech Material on EUROM.0", in SAM (1989) Esprit Project 1541 (SAM) Multilingual Speech Input/Output: Assessment, Methodology and Standardisation. Extension Phase. Final Report. 1 April 1988- 28 February 1989. pp. 304-315

ERP, A. van- HOUBEN, C.- BARRY, B.- GRICE, M.- BOË, L.J.- BRAUN, G.- COSI, P.- DYHR, N.- PÉRENNOU, G.- VIGOUROUX, N.- AUTESSERRE, D. (1989) "A unified approach to the labelling of speech: First multilingual results" in TUBACH, J.P.- MARIANI, J.J. (Eds) Eurospeech 89. European Conference on Speech Communication and Technology. Paris, September 1989. Edinburgh: CEP Consultants Ltd. vol 2 pp. 88-91

Multimodal labelling and annotation

Multimodal annotation

ALCÁNTARA PLÁ, M. (2007) "La anotación del habla en un corpus de vídeo", Procesamiento del Lenguaje Natural 38: 131-139.
http://www.sepln.org/revistaSEPLN/revista/38/14.pdf

Baldry, A. & Thibault, P. J. (2006). Multimodal transcription and text analysis: A multimedia toolkit and coursebook with associated on-line course. London: Equinox.

Baldry, A. & Thibault, P. J. (2006). Multimodal corpus linguistics. In G. Thompson & S. Hunston (Eds.), System and corpus: Exploring connections. (pp. 164-83). London: Equinox.

Knight, D., Evans, D., Carter, R., & Adolphs, S. (2009). HeadTalk, HandTalk and the corpus: Towards a framework for multi-modal, multi-media corpus development. Corpora, 4, 1-32. Retrieved December 11, 2009, from http://dx.doi.org/10.3366/E1749503209000203

STEININGER, S. - SCHIEL, F. - RABOLD, S. (2006) "Annotation of multimodal data", in WAHLSTER, W. (Ed.) SmartKom: Foundations of Multimodal Dialogue Systems. New York: Springer. pp. 571-596.


Labelling and annotation tools

General works on labelling and annotation tools

Annotation tools

COSI, P. (2002) "Metodologie e sistemi per l’annotazione linguistica", Quaderni dell'Istituto di Fonetica e Dialettologia 4.
http://www.aisv.it/AISVScuolaEstiva2006/materials/P.Cosi/Software%20e%20Metodologie.ppt

DELLWO, V. (2003) "Tools for a combined analysis of speech & gesture", in Proceedings of the 15th International Congress of Phonetic Sciences. Barcelona, 3-9 August, 2003. CD-ROM Edition. Causal Productions.
http://www.phonetiklabor.de/Phonetiklabor/Inhalt/Ver%F6ffentlichungen/PDFs/Speech&Gesture.pdf

GARG, S.- MARTINOVSKI, B.- ROBINSON, S.- STEPHAN, J.- TETREAULT, J.- TRAUM, D.R. (2004) "Evaluation of transcription and annotation tools for a multi-modal multi-party dialogue corpus", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association. pp. 2163-2166.
http://www.lrec-conf.org/proceedings/lrec2004/

JACOBSON, M. (2004) "Gestion de corpus oraux annotés: Méthodes et outils", in JEP 2004. XXVes Journées d'Etudes sur la Parole. 19-22 avril 2004, Fès, Maroc.
http://aune.lpl-aix.fr/jep-taln04/proceed/actes/jep2004/Jacobson.pdf

JACOBSON, M. (2002) "Les outils modernes pour la transcription de corpus de parole", Revue Parole (Mons) 22-23-24: 213-230.

ROHLFING, K.- LOEHR, D.- DUNCAN, S.- BROWN, A.- FRANKLIN, A.- KIMBARA, I.- MILDE, J.-T.- PARRILL, F.- ROSE, T.- SCHMIDT, T.- SLOETJES, H.- THIES, A.- WELLINGHOFF, A. (2005) "Comparison of multimodal annotation tools - workshop report", in Tools Symposium. Second Congress of the International Society for Gesture Studies. 15-18 June 2005. Université de Lyon 2, France.
http://www.gespraechsforschung-ozs.de/heft2006/tb-rohlfing.pdf

VÉRONIS, J. (Ed.) (2004) Le traitement automatique des corpus oraux, Traitement automatique des langues 45, 2.

Specific works on labelling and annotation tools

ALLWOOD, J.- GRONQVIST, L.- AHLSEN, E.- GUNNARSON, M. (2003) "Annotations and tools for an activity based spoken language corpus", in KUPPEVELT, J. van - SMITH, R.W. (Eds.) Current and new directions in discourse and dialogue. Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology, 22). pp. 1-18.
http://aclweb.org/anthology//W/W01/W01-1601.pdf

BAILLY, G.- BARBE, T.- WANG, H. (1992) "Automatic labelling of large prosodic databases: Tools, methodology and links with a text-to-speech system", in BAILLY, G.- BENOIT, C. (Eds) Talking Machines. Theories, Models and Designs. Amsterdam: North-Holland - Elsevier Science Publishers. pp. 307-322

BARRAS, C.- GEOFFROIS, E.- WU, Z.- LIBERMAN, M. (1998) "Transcriber: a Free Tool for Segmenting, Labeling and Transcribing Speech", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. II. pp. 1373-1376.
http://trans.sourceforge.net/articles/Transcriber-LREC1998.pdf

BARRAS, C.- GEOFFROIS, E.- WU, Z.- LIBERMAN, M. (2001) "Transcriber: development and use of a tool for assisting speech corpora production", Speech Communication 33, 1-2: 5-22.
http://trans.sourceforge.net/articles/Transcriber-SpeechComm2000.ps

BERNSEN, N.O.- DYBKJAER, L.- KOLODNYTSKY, M. (2003) "An interface for annotating natural interactivity", in KUPPEVELT, J. van - SMITH, R.W. (Eds.) Current and new directions in discourse and dialogue. Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology, 22). pp. 35-62.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.58.149

BERNSEN, N.O.- DYBKJAER, L.- KOLODNYTSKY, M. (2002) "The NITE Workbench. A Tool for Annotation of Natural Interactivity and Multimodal Data", in LREC 2002. Proceedings of the Third International Conference on Language Resources and Evaluation. Las Palmas de Gran Canaria, 27 May - 2 June, 2002. European Language Resources Association.
http://spokendialogue.dk/Publications/2002d/NITE-LREC-paper.2.4.2002-F.pdf

BOËFFARD, O.- CHERBONNEL, B.- EMERARD, F.- WHITE, S. (1993) "Automatic Segmentation and Quality Evaluation of Speech Unit Inventories for Concatenation-Based PSOLA Text-to-Speech Systems" in Eurospeech'93. 3rd European Conference on Speech Communication and Technology. Berlin, Germany, 21-23 September 1993. Vol. 2 pp. 1449-1453

BOERSMA, P. (2001) "Praat, a system for doing phonetics by computer", Glot International 5, 9-10: 341-345.

COSI, P. (1993) "SLAM: Segmentation and Labelling Module" in Eurospeech'93. 3rd European Conference on Speech Communication and Technology. Berlin, Germany, 21-23 September 1993. Vol. 1 pp. 665-668.
http://www.isca-speech.org/archive/eurospeech_1993/e93_0665.html

COSI, P. (1995) "SLAM: a PC-Based Multi-Level Segmentation Tool", in RUBIO, A.- LÓPEZ, J.M. (Eds.) Speech Recognition and Coding. New Advances and Trends. Springer-Verlag (NATO ASI Series, Series F: Computer and Systems Sciences, 147) pp. 124-127.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.5234

CHAN, D.S.F.- FOURCIN, A.J. (1993) "Automatic annotation using multi-sensor data" in Eurospeech'93. 3rd European Conference on Speech Communication and Technology. Berlin, Germany, 21-23 September 1993. Vol. 1 pp. 187-190

DE GINESTEL-MAILLAND, A.- DE CALMÈS, M.- PÉRENNOU, G. (1993) "Multi-Level Transcription of Speech Corpora from Orthographic Forms" in Eurospeech'93. 3rd European Conference on Speech Communication and Technology. Berlin, Germany, 21-23 September 1993. Vol. 2 pp. 1441-1444

DOURS, C.- DE CALMÈS, M.- KABRÉ, H.- PÉCATTE, J.M.- PÉRENNOU, G.- VIGOUROUX, N. (1989) "A multilevel automatic segmentation system: SAPHO and VERIPHONE" in TUBACH, J.P.- MARIANI, J.J. (Eds) Eurospeech 89. European Conference on Speech Communication and Technology. Paris, September 1989. Edinburgh: CEP Consultants Ltd. vol 2 pp. 83-86

Dybkjaer, L., Berman, S., Bernsen, N. O., Carletta, J., Heid, U., & Llisterri, J. (2001). Requirements and specifications for a tool in support of annotation of natural interaction and multimodal data. Deliverable D11.2. Final Report. July 2001. ISLE Natural Interactivity and Modality Working Group. Retrieved from http://spokendialogue.dk/Publications/2001e/D11.2-ISLE-29.7.2001-F.pdf

Dybkjaer, L., Berman, S., Kipp, M., Wegener Olsen, M., Pirrelli, V., Reithinger, N., & Soria, C. (2001). Survey of existing tools, standards and user needs for annotation of natural interaction and multimodal data. Deliverable D11.1. Final Report. January 2001. ISLE Natural Interactivity and Multimodality Working Group. Retrieved from http://spokendialogue.dk/Publications/2001f/D11.1-14.2.2001-F.pdf

DYBKJAER, L.- BERNSEN, N.O. (2000) "The MATE Workbench", in ISLE/EAGLES Workshop "Meta-Descriptions and Annotation Schemes for Multimodal/Multimedia Language Resources and Data Architectures and Software Support for Large Corpora". LREC 2000 Workshop, Athens, Greece, 29-30 May 2000.
http://www.imdi.eu/documents/2000%20LREC/dybkjaer_paper.pdf

ESTÈVE, Y.- DELÉGLISE, P.- JACOB, B. (2004) "Système de transcription automatique de la parole et logiciels libres", in VÉRONIS, J. (Ed.) Le traitement automatique des corpus oraux, Traitement automatique des langues 45, 2: 15-40.

Garrido, J. M. (2010). A tool for automatic F0 stylisation, annotation and modelling of large corpora. In Speech Prosody 2010. Fifth International Conference on Speech Prosody. Chicago, Illinois, 11-14 May, 2010. Retrieved May 17, 2010, from http://speechprosody2010.illinois.edu/papers/100041.pdf

GUIRAO, J.- MORENO SANDOVAL, A. (2004) "A "toolbox" for tagging the Spanish C-ORAL-ROM corpus", in Compiling and Processing Spoken Language Corpora. LREC 2004, International Conference on Language Resources and Evaluation. 24th May 2004. Lisboa, Portugal. pp. 28-32.
http://lablita.dit.unifi.it/coralrom/papers/toolbox-final.pdf

HERNÁEZ, I.- BARANDIARÁN, J.- MONTE, E. (1993) "A segmentation algorithm based on acoustical features using a self organizing neural network" in Eurospeech'93. 3rd European Conference on Speech Communication and Technology. Berlin, Germany, 21-23 September 1993. Vol. 1 pp. 661-664

JEONG, C.-G. - JEONG, H. (1996) "Automatic phone segmentation and labeling of continuous speech", Speech Communication 20, 3-4: 291-312

KABRE, H.- PÉRENNOU, G.- VIGOUROUX, N. (1991) "Automatic labelling of speech signal into phonetic events" in Actes du XIIème Congrès International des Sciences Phonétiques. 19-24 août 1991, Aix-en-Provence, France. 5 vols. Aix-en-Provence: Université de Provence, Service des Publications.

KIPP, M. (2001) "Anvil - A generic annotation tool for multimodal dialogue", in EUROSPEECH 2001 - INTERSPEECH 2001. Proceedings of the 7th European Conference on Speech Communication and Technology. 3 - 7 September, 2001. Aalborg, Denmark. pp. 1367-1370.
http://www.dfki.de/~kipp/public_archive/kipp2001-eurospeech.pdf

MARTIN, P. (2004) "WinPitch Corpus: A text to speech analysis and alignment tool for large multimodal corpora", in Compiling and Processing Spoken Language Corpora. LREC 2004, International Conference on Language Resources and Evaluation. 24th May 2004. Lisboa, Portugal. pp. 48-52.
http://lablita.dit.unifi.it/coralrom/papers/Philippe_Martin.pdf

MERTENS, P. (2004) "The Prosogram: Semi-Automatic Transcription of Prosody based on a Tonal Perception Model", in BEL, B.- MARLIEN, I. (Eds.) Proceedings of Speech Prosody 2004. 23-26 March 2004. Nara, Japan.
http://bach.arts.kuleuven.be/pmertens/papers/sp2004.pdf

MERTENS, P. (2004) "Un outil pour la transcription de la prosodie dans les corpus oraux", in VÉRONIS, J. (Ed.) Le traitement automatique des corpus oraux, Traitement automatique des langues 45, 2: 109-130.
http://bach.arts.kuleuven.be/pmertens/papers/tal2004.pdf

MILDE, J.-T.- GUT, U. (2002) "The TASX-environment: an XML-based toolset for time aligned speech corpora", in LREC 2002. Proceedings of the Third International Conference on Language Resources and Evaluation. 27 May - 2 June, 2002. Las Palmas de Gran Canaria, Spain. Paris: ELRA, European Language Resources Association. pp. 1922-1927.

MORENO, A.- ARMAS, P.- MARIÑO, J.B.- MASGRAU, E. (1989) "Automatic segmentation of Spanish speech into syllables" in TUBACH, J.P.- MARIANI, J.J. (Eds) Eurospeech 89. European Conference on Speech Communication and Technology. Paris, September 1989. Edinburgh: CEP Consultants Ltd. vol 2 pp. 75-78.

NOGUEIRAS, A.- MORENO, A. (1998) "NaniBD: a Set of Tools for Transcribing and Validating Speech Databases", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. II. pp. 1359-1365.
http://gps-tsc.upc.es/veu/research/pubs/download/Nog98b.ps.gz

OLIVIER, A. - KIRSCHNING, I. (1999) "Evaluación de métodos de determinación automática de una transcripción fonética", in ENC'99. Segundo Encuentro Nacional de Computación. Pachuca, Hidalgo, México. Septiembre de 1999.
http://ict.udlap.mx/people/ingrid/ingrid/ENC99_409.pdf

PÉRENNOU, G.- DE CALMÈS, M.- FERRANE, I.- TIHONI, J. (1989) "Automated phonotypical transcription through the GEPH phonology expert system" in TUBACH, J.P.- MARIANI, J.J. (Eds) Eurospeech 89. European Conference on Speech Communication and Technology. Paris, September 1989. Edinburgh: CEP Consultants Ltd. vol 2 pp. 364-367

SAM (1992) User Guide to ETR Tools. ESPRIT PROJECT 2589 (SAM) Multilingual Speech Input/Output Assessment, Methodology and Standardisation. Ref. SAM-UCL-G007.

SCHMIDT, T. (2004) "Transcribing and annotating spoken language with EXMARaLDA", in Proceedings of the LREC-Workshop on XML based richly annotated corpora. LREC 2004, International Conference on Language Resources and Evaluation. 29 May 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association. pp. 69-74.
http://www1.uni-hamburg.de/exmaralda/Daten/4D-Literatur/Paper_LREC.pdf

SJÖLANDER, K.- BESKOW, J. (2000) "WaveSurfer - An open source speech tool", in ICSLP 2000 - INTERSPEECH 2000. Proceedings of the 6th International Conference on Spoken Language Processing. 16 - 20 October, 2000. Beijing, China. pp. 464-467.
http://www.speech.kth.se/wavesurfer/wsurf_icslp00.pdf

TAMBURINI, F.- CAINI, C. (2004) "Automatic annotation of speech corpora for prosodic prominence", in Compiling and Processing Spoken Language Corpora. LREC 2004, International Conference on Language Resources and Evaluation. 24th May 2004. Lisboa, Portugal. pp. 53-58.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.58.4592

TAMBURINI, F.- CAINI, C. (2005) "An automatic system for detecting prosodic prominence in American English continuous speech", International Journal of Speech Technology 8: 33-44.
http://dx.doi.org/10.1007/s10772-005-4760-z

Torre Toledano, D., Hernández Gómez, L. A., & Villarrubia Grande, L. (2003). Automatic phonetic segmentation. IEEE Transactions on Speech and Audio Processing, 11(6), 617-625.

VORSTERMANS, A.- MARTENS, J.-P.- VAN COILE, B. (1996) "Automatic segmentation and labelling of multilingual speech data", Speech Communication 19, 4: 271-294.

WEISSER, M. (2003) "SPAACy - A semi-automated tool for annotating dialogue acts", International Journal of Corpus Linguistics 8, 1: 63-74.

INTSINT (International Transcription System for Intonation) tools

Speech analysis and transcription software


Phonetic representation of speech corpora

Transcription

Llisterri, J. (1999). Transcripción, etiquetado y codificación de corpus orales. Revista Española de Lingüística Aplicada. Volumen Monográfico "Panorama de la Investigación en Lingüística Informática", 53-82. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/RESLA_99.pdf

Llisterri, J., Machuca, M. J., Mota, C., Riera, M., & Ríos, A. (2005). Corpus orales para el desarrollo de las tecnologías del habla en español. Oralia. Análisis del Discurso Oral, 8, 289-325. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Llisterri_Machuca_Mota_Riera_Rios_05_Corpus_Orales_Tecnologias_Habla_Espanol.pdf

Phonetic representation of speech corpora: segmental level

ALLEN, G. (1981) "PHONASCII", in MacWHINNEY, B. The Childes Project: Tools for Analyzing Talk. Hillsdale, N.J.: Lawrence Erlbaum. pp. 71-119

ALLEN, G.D. (1988) "The PHONASCII System", Journal of the International Phonetic Association 18, 1: 9-25.

CONSTABLE, P. (2000) "Phonetic Fonts and Phonetic Data Encoding", in Linguistic Exploration. Workshop on Web-Based Language Documentation and Description. 12 - 15 December 2000, Institute for Research in Cognitive Science, University of Pennsylvania, Philadelphia, Pennsylvania, US.
http://www.learningace.com/doc/3094806/1620bc1a7a052ed4e2afb59726709390/constable

CUÉTARA PRIEDE, J.O. (2004) Fonética de la ciudad de México. Aportaciones desde las tecnologías del habla. Tesis para obtener el título de Maestro en Lingüística Hispánica. Maestría en Lingüística Hispánica, Posgrado en Lingüística, Universidad Nacional Autónoma de México.

ESLING, J.H. (1988) "Computer coding of IPA symbols and detailed phonetic representations of computer databases", Journal of the International Phonetic Association 18,2: 99-106

ESLING, J.H. (1990) "Computer Coding of the IPA: Supplementary Report", Journal of the International Phonetic Association 20,1: 22-26

ESLING, J.H.- GAYLORD, H. (1993) "Computer Codes for Phonetic Symbols", Journal of the International Phonetic Association 23,2: 77-82

GAYLORD, H.E. (1995) "Character representation", Computers and the Humanities 29, 1: 51-73

GIBBON, D. - MOORE, R.- WINSKI, R. (Eds.) (1998) Spoken Language Reference Materials. Berlin: Mouton De Gruyter. (Handbook of Standards and Resources for Spoken Language Systems, Volume IV).

HIERONYMUS, J.L. (1994) ASCII phonetic symbols for the world's languages: Worldbet. AT&T Bell Labs, Technical Report.
http://dipaola.org/stanford/facade/lipsync/worldbet.pdf
http://www.ling.ohio-state.edu/~edwards/WorldBet/worldbet.pdf

IPA, Worldbet and OGIbet English Broad Phonetic Labels. Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology, 1995.
http://byuh.doncolton.com/courses/cs441/9504.refbet.pdf
http://dipaola.org/stanford/facade/lipsync/refbet.pdf

IPA, International Phonetic Alphabet. International Phonetic Association
http://www.langsci.ucl.ac.uk/ipa/ipachart.html

IPA (1989) "The IPA 1989 Kiel Convention Workgroup 9 report: Computer Coding of IPA symbols and Computer Representation of Individual Languages", Journal of the International Phonetic Association 19,2: 81-92

ISO 646 (1991) Information Processing - ISO 7-bit coded character set for information interchange. Geneva: International Organization for Standardization.

ISO 8859-1 (1987) Information processing - ISO 8-bit single byte coded graphic character set for information interchange - Part 1: Latin alphabet No. 1. Geneva: International Organization for Standardization.

LENNES, M. Labeling Finnish speech in Worldbet. Department of Speech Sciences, University of Helsinki.
http://www.helsinki.fi/%7Elennes/labeling.html

Llisterri, J., & Mariño, J. B. (1993). Spanish adaptation of SAMPA and automatic phonetic transcription. SAM-A/UPC/001/v1. ESPRIT project 6819 (SAM-A) Speech Technology Assessment in Multilingual Applications. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/SAMPA_Spanish_93.pdf

MARIÑO, J.B.- MORENO, A. (1998) "Spanish Dialects: Phonetic Transcription", in ICSLP 98 Conference Proceedings CD-ROM. The 5th International Conference on Spoken Language Processing. Sydney Convention Centre, Sydney, Australia, 30th November - 4th December 1998. Rundle Mall: Causal Productions, 1998.

MARIÑO, J.B.- MORENO, A. (2000) Spanish SAMPA set. SALA (SpeechDat across Latin America) Doc 2. February 2000.

QUILIS, A.- ENRÍQUEZ, E. Specification of the SAM phonetic alphabet for Spanish. ESPRIT 2104 Polyglot 1.

SCHMIDT, M.S.- SCOTT, C.- JACK, M.A. (1993) "Phonetic transcription standards for European names (ONOMASTICA)", in Eurospeech'93. 3rd European Conference on Speech Communication and Technology. Berlin, Germany, 21-23 September 1993. Vol. 1 pp. 279-282

UCL (1992) "Speech acquisition and Annotation Protocols and Index of Mnemonics (SAM-UCL-018) - Section IV: SAMPA" in SAM User Guide to ETR Tools. ESPRIT PROJECT 2589 (SAM) Multilingual Speech Input/Output Assessment, Methodology and Standardisation. Ref. SAM-UCL-G007.

URAGA, E. (1999) Modelado fonético para un sistema de reconocimiento de voz continua en español. Tesis de Maestría en Ciencias Computacionales. División de Ingeniería y Ciencias, Campus Morelos, Instituto Tecnológico y de Estudios Superiores de Monterrey. [3.4.4. Propuesta de un alfabeto fonético para el español hablado en México]

URAGA, E. - PINEDA, L. (2000) "A set of phonological rules for Mexican Spanish", in CICLing-2000 Conference on Intelligent Text Processing and Computational Linguistics. February 13-19, 2000. México City, México.

URAGA, E.- PINEDA, L. (2002) "Automatic Generation of Pronunciation Lexicons for Spanish", in GELBUKH, A. (Ed.) Computational Linguistics and Intelligent Text Processing. Third International Conference, CICLing 2002. Mexico City, Mexico, February 2002. Proceedings. Berlin: Springer (Lecture Notes in Computer Science, 2276). pp. 330-338. [3. The Phonetic Alphabet Mexbet].

WELLS, J.C., SAMPA: Computer Readable Phonetic Alphabet. Department of Phonetics and Linguistics, University College London
http://www.phon.ucl.ac.uk/home/sampa/

WELLS, J.C., X-SAMPA, Extended SAM Phonetic Alphabet. Department of Phonetics and Linguistics, University College London
http://www.phon.ucl.ac.uk/home/sampa/x-sampa.htm
http://www.phon.ucl.ac.uk/home/sampa/ipasam-x.pdf

WELLS, J.C. (1986) SAMPA for Spanish. Department of Phonetics and Linguistics, University College London
http://www.phon.ucl.ac.uk/home/sampa/spanish.htm

WELLS, J.C. (1986) "A Standardized Machine-Readable Phonetic Notation" in International Conference on Speech Input/Output; Techniques and Applications. London: IEE pp. 134-137

WELLS, J.C. (1987) "Computer Coded Phonetic Transcription", Journal of the International Phonetic Association 17,2: 94-114.

WELLS, J.C. (1989) "Computer-coded phonemic notation of individual languages of the European Community", Journal of the International Phonetic Association 19,1: 31-54

WELLS, J.C. (1994) "Computer-coding the IPA: a proposed extension of SAMPA", Speech, Hearing and Language, Work in Progress, 1994 (University College London, Department of Phonetics and Linguistics) 8: 271-289.
http://www.phon.ucl.ac.uk/home/sampa/ipasam-x.pdf

WELLS, J.C. (2003) "Phonetic symbols in word processing and on the web", in Proceedings of the 15th International Congress of Phonetic Sciences. Barcelona, 3-9 August, 2003. CD-ROM Edition. Casual Productions.
http://www.phon.ucl.ac.uk/home/wells/ICPhS_18.pdf

WELLS, J.C.- BARRY, W.- GRICE, M.- FOURCIN, A.- GIBBON, D. (1992) Standard Computer-Compatible Transcription. SAM Stage Report Sen.3 SAM UCL-037, 28 February 1992. In SAM (1992) ESPRIT PROJECT 2589 (SAM) Multilingual Speech Input/Output Assessment, Methodology and Standardisation. Final Report. Year Three: 1.III.91-28.II.1992. London: University College London.

Phonetic representation of speech corpora: suprasegmental level

ARVANITI, A.- BALTZANI, M. (2000) "Greek ToBI: A system for the annotation of Greek speech corpora", in LREC 2000. Proceedings of the Second International Conference on Language Resources and Evaluation. Athens, Greece, 31 May - 2 June 2000. European Language Resources Association. pp. 555-562.

BATLINER, A.- KOMPE, R.- KIESSLING, A.- MAST, M.- NIEMANN, H.- NÖTH, E. (1998) "M = Syntax + Prosody: A syntactic-prosodic labelling scheme for large spontaneous speech databases", Speech Communication 25,4: 193-222.
http://www.phonetik.uni-muenchen.de/Bas/BasBPFDokuPRO.pdf

BECKMAN, M.E. - AYERS, G.M. (1997) Guidelines for ToBI Labelling. Version 3, March 1997. Department of Linguistics, Ohio State University.
http://www.ling.ohio-state.edu/research/phonetics/E_ToBI/
http://www.ling.ohio-state.edu/~tobi/ame_tobi/labelling_guide_v3.pdf

BECKMAN, M.E.- HIRSCHBERG, J. The ToBI Annotation Conventions. Appendix A of BECKMAN, M.E. - AYERS, G.M. Guidelines for ToBI Labelling. Department of Linguistics, Ohio State University.
http://www.ling.ohio-state.edu/~tobi/ame_tobi/annotation_conventions.html

BECKMAN, M.E.- HIRSCHBERG, J.- SHATTUCK-HUFNAGEL, S. (2005) "The original ToBI system and the evolution of the ToBI framework", in JUN, S.-A. (Ed.) Prosodic Typology. The Phonology of Intonation and Phrasing. Oxford: Oxford University Press.
http://www.ling.ohio-state.edu/~tobi/JunBook/BeckHirschShattuckToBI.pdf

BECKMAN, M.E.- DÍAZ CAMPOS, M.- McGORY, J.T.- MORGAN, T.A. (2002) "Intonation across Spanish, in the Tones and Break Indices framework", Probus 14, 1: 9-36.
http://www.ling.ohio-state.edu/~mbeckman/Sp_ToBI/Sp_ToBI_Jul29.pdf

Dybkjaer, L., Bernsen, N. O., Wegener Knudsen, M., Llisterri, J., Machuca, M. J., Martin, J. C., . . . Wittenburg, P. (2003). Guidelines for the creation of NIMM annotation schemes. Deliverable D9.2. Final Report. 14 February 2003. ISLE Natural Interactivity and Multimodality Working Group. Retrieved from http://spokendialogue.dk/Publications/2003f/D9.2-13.2.2003-F.pdf

Estruch, M., Garrido, J. M., Llisterri, J., & Riera, M. (1996). Una aproximación fonética al estudio de la entonación. Philologia Hispalensis, 11, 218-293. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Sevilla_96.pdf

Estruch, M., Garrido, J. M., Llisterri, J., & Riera, M. (2007). Técnicas y procedimientos para la representación de las curvas melódicas. Revista de Lingüística Teórica y Aplicada, 45(2), 59-87. Retrieved January 27, 2009, from http://liceu.uab.cat/~joaquim/publicacions/Estruch_Garrido_Llisterri_Riera_Metodos_Entonacion_07.pdf

Fernández Rei, E., & Escourido, A. B. (2008). Problemas metodológicos en la adquisición de datos prosódicos a partir de corpora. Language Design. Journal of Theoretical and Experimental Linguistics. Special Issue 2: Experimental Prosody, 2, 249-258.

Garrido, J. M. (2010). A tool for automatic F0 stylisation, annotation and modelling of large corpora. In Speech Prosody 2010. Fifth international conference on speech prosody. Chicago, Illinois, 11-14 May, 2010. Retrieved May 17, 2010, from http://speechprosody2010.illinois.edu/papers/100041.pdf

GIBBON, D. (1989) Survey of Prosodic Labelling for EC Languages. SAM-UBI-1/90, 12 February 1989; Report e.6, in ESPRIT 2589 (SAM) Interim Report, Year 1. Ref. SAM-UCL G002. University College London, February 1990.

GIBBON, D. - MOORE, R.- WINSKI, R. (Eds.) (1998) Spoken Language Reference Materials. Berlin: Mouton De Gruyter. (Handbook of Standards and Resources for Spoken Language Systems, Volume IV).

GRØNNUM THORSEN, N. (1987) "Suprasegmental transcription", ARIPUC - Annual Report of the Institute of Phonetics University of Copenhagen 21: 1-27

GURLEKIAN, J.- RODRÍGUEZ, H.- COLANTONI, J.- TORRES, H. (2001) "Development of a prosodic database for an Argentine Spanish text to speech system", in Proceedings of the IRCS Workshop on Linguistic Databases. 11-13 December 2001, University of Pennsylvania, Philadelphia. [3.4. ToBI for Argentine Spanish]
http://www.researchgate.net/publication/2586101_Development_of_a_Prosodic_Database_for_an_Argentine_Spanish_Text_to_Speech_System

HUALDE, J.I. (2003) "El modelo métrico y autosegmental", in PRIETO, P. (Ed.) Teorías de la entonación. Barcelona: Ariel (Ariel Lingüística). pp. 155-184.

KNOWLES, G. (1991) "Prosodic labelling: the problem of tone group boundaries", in JOHANSSON, S.- STENSTRÖM, A. (Eds) English Computer Corpora. Selected Papers and Research Guide. Berlin: Mouton de Gruyter. pp. 149-163

KNOWLES, G. (1996) "The value of prosodic transcriptions", in KNOWLES, G.- WICHMANN, A.- ALDERSON, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London & New York: Longman. pp. 87-106.

KNOWLES, G.- LAWRENCE, L. (1987) "Automatic intonation assignment" in GARSIDE, R.- LEECH, G.- SAMPSON, G. (Eds) The Computational Analysis of English: A Corpus-based Approach. London: Longman. pp. 139-148

Llisterri, J. (1994). Prosody encoding survey. WP 1 Specifications and Standards. T1.5. Markup Specifications. Deliverable 1.5.3. Final version, 15 September 1994. LRE Project 62-050 MULTEXT. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Prosody_encoding_94.pdf

Llisterri, J. (1995). Spanish prosodic labelling. Workshop on prosodic labelling, ICPhS 1995. 13th international congress of phonetic sciences. Stockholm, Sweden, August 13-19, 1995. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Spanish_Prosodic_Labelling.pdf

McGORY, J.T.- DÍAZ CAMPOS, M. Sp-ToBI (Spanish Tones and Break Indices). Department of Linguistics, Ohio State University.
http://www.ling.ohio-state.edu/~tobi/sp-tobi/spanish.html

MERTENS, P. (1991) "Intonation" in BLANCHE-BENVENISTE, C.- BILGER, M.- ROUGET, Ch.- van den EYNDE, K. Le français parlé. Etudes grammaticales. Paris: Editions du Centre National de la Recherche Scientifique (Sciences du Langage) pp. 159-176.

PICKERING, B.- WILLIAMS, B.- KNOWLES, G. (1996) "Analysis of transcriber differences in the SEC", in KNOWLES, G.- WICHMANN, A.- ALDERSON, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London & New York: Longman. pp. 61-86.

PITRELLI, J. - BECKMAN, M. - HIRSCHBERG, J. (1994) "Evaluation of prosodic transcription labelling reliability in the ToBI framework", in Proceedings of the third International Conference on Spoken Language Processing, Yokohama, ICSLP, Vol. 2. pp. 123-126.
http://www.ling.ohio-state.edu/~tobi/ame_tobi/Pitrelli_etal1994.pdf

QUAZZA, S.- GARRIDO, J.M. (1998) "Prosody", in KLEIN, M. (Ed.) Supported Coding Schemes. MATE Deliverable D1.1. LE Telematics Project LE4 – 8370. July 1998.
http://liceu.uab.cat/publicacions/MATE_D1_1_6_Prosody/D11_6_Prosody.html

QUAZZA, S.- GARRIDO, J.M. (2000) "Prosody", in MENGEL, A.- DYBKJAER, L.- GARRIDO, J.M.- HEID, U.- KLEIN, M.- PIRRELLI, V.- POESIO, M.- QUAZZA, S.- SCHIFFRIN, A.- SORIA, C. MATE Dialogue Annotation Guidelines. MATE Deliverable D2.1. LE Telematics Project LE4 – 8370. 8 January 2000.
http://www.andreasmengel.de/pubs/mdag.pdf

RAMÍREZ VERDUGO, D. (2003) "A new approach to the analysis and annotation of speech and prosody based on computational cross-linguistic corpora", XIX Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Universidad de Alcalá, 10, 11 y 12 de septiembre de 2003. Procesamiento del Lenguaje Natural 31: 343-344.
http://www.sepln.org/revistaSEPLN/revista/31/31-Pag343.pdf

ROACH, P.- ARNFIELD, S. (1995) "Linking prosodic transcription to the time dimension", in LEECH, G.- MYERS, G.- THOMAS, J. (Eds) Spoken English on Computer: Transcription, Markup and Applications. Harlow: Longman. pp. 149-160

SELTING, M. (1987) "Descriptive categories for the auditive analysis of intonation in conversation", Journal of Pragmatics 11: 777-791

SELTING, M. (1988) "The role of intonation in the organisation of repair and problem handling sequences in conversation", Journal of Pragmatics 12: 293-322.

SILVERMAN, K.- BECKMAN, M.- PITRELLI, J.- OSTENDORF, M.- WIGHTMAN, C.- PRICE, P.- PIERREHUMBERT, J.- HIRSCHBERG, J. (1992) "TOBI: A standard for labelling English prosody", in ICSLP'92. Proceedings of the Second International Conference on Spoken Language Processing. Banff, October 1992. pp. 867-870.
http://www.ling.ohio-state.edu/~tobi/ame_tobi/Silverman_etal1992.pdf

SOSA, J.M. (2003) "La notación tonal del español en el modelo Sp-ToBI", in PRIETO, P. (Ed.) Teorías de la entonación. Barcelona: Ariel (Ariel Lingüística). pp. 185-208.

ToBI, Tones and Break Indices
http://www.ling.ohio-state.edu/~tobi/

English ToBI Homepage, Department of Linguistics, Ohio State University
http://www.ling.ohio-state.edu/~tobi/ame_tobi/

Spanish ToBI Labelling Guidelines: Word Tier.
http://www.ling.ohio-state.edu/~tobi/sp-tobi/word-tier.html

Spanish ToBI Labelling Guidelines: Tone Tier.
http://www.ling.ohio-state.edu/~tobi/sp-tobi/tonetier.html

WELLS, J.C. (1995) SAMPROSA, SAM Prosodic Transcription. Department of Phonetics and Linguistics, University College London.
http://www.phon.ucl.ac.uk/home/sampa/samprosa.htm

WILLIAMS, B. (1996) "The formulation of an intonation transcription system for British English", in KNOWLES, G.- WICHMANN, A.- ALDERSON, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London & New York: Longman. pp. 38-58.

INTSINT, International Transcription System for Intonation

INTSINT

Astésano, C., Espesser, R., Hirst, D., & Llisterri, J. (1997). Stylisation automatique de la fréquence fondamentale: Une évaluation multilingue. In Actes du 4ème congrès français d’acoustique. Vol 1. (pp. 441-3). Marseille, France. 14-18 avril, 1997. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Astesano_et_al_97.pdf

Baqué, L., & Estruch, M. (2003). Modelo de Aix-en-Provence. In P. Prieto (Ed.), Teorías de la entonación. (pp. 123-53). Barcelona: Ariel. Retrieved from http://sites.google.com/site/lorrainebaqueuab/publis/ModeloAix-en-ProvenceV3.pdf

CAELEN-HAUMONT, G. - AURAN, C. (2003) "INTSMEL: un outil pour l’analyse des contours proéminents de F0", in Colloque international PFC : Phonologie et Phonétique du français, données et théories. 11-13 décembre 2003. Paris, France.
http://halshs.archives-ouvertes.fr/hal-00256394/

CAMPIONE, E. - FLACHAIRE, E. - HIRST, D. - VÉRONIS, J. (1997) "Stylisation and symbolic coding of F0: A quantitative model", in ESCA Tutorial and Research Workshop on Intonation: Theory, Models and Applications. 18-20 September, 1997. Athens, Greece. pp. 71-74.
http://sites.univ-provence.fr/veronis/pdf/1997esca-campione.pdf

CAMPIONE, E. - FLACHAIRE, E. - HIRST, D. - VÉRONIS, J. (1998) "Evaluation de modèles d'étiquetage automatique de l’intonation", in Actes des 22èmes Journées d'Etude sur la Parole. Martigny (Suisse). pp. 99-102.
http://sites.univ-provence.fr/~veronis/pdf/1998jep-campione.pdf

CAMPIONE, E. - HIRST, D. - VÉRONIS, J. (2000) "Automatic stylisation and symbolic coding of F0: Implementations of the INTSINT model", in BOTINIS, A. (Ed.) Intonation: Analysis, Modelling and Technology. Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology, 15). pp. 185-208.
http://sites.univ-provence.fr/~veronis/pdf/2000Campione.pdf

CAMPIONE, E. - VÉRONIS, J. (1998) "A statistical study of pitch target points in five languages", in Proceedings of the 5th International Conference on Spoken Language Processing, ICSLP'98. 30 November - 4 December 1998, Sydney (Australia).
http://sites.univ-provence.fr/~veronis/pdf/1998icslp-stats.pdf

CAMPIONE, E. - VÉRONIS, J. (2000) "Une évaluation de l’algorithme de stylisation mélodique MOMEL", TIPA, Travaux Interdisciplinaires du Laboratoire Parole et Langage d'Aix-en-Provence 19: 27-44.
http://aune.lpl.univ-aix.fr/lpl/tipa/19/tipa19-campione.pdf

CAMPIONE E. - VÉRONIS, J. (2001) "Etiquetage prosodique semi-automatique des corpus oraux", in TALN'2001. Actes de la Conférence Traitement Automatique des Langues. Tours: ATALA. pp. 123-132.
http://sites.univ-provence.fr/~veronis/pdf/2001-taln.pdf

CAMPIONE, E. - VÉRONIS, J. (2001) "Semi-automatic tagging of intonation in French spoken corpora", in RAYSON, P.- WILSON, A.- McENERY, T.- HARDIE, A.- KHOJA, S. (Eds.) Proceedings of the Corpus Linguistics'2001 Conference. Lancaster, U.K.: Lancaster University, UCREL. pp. 90-99.
http://sites.univ-provence.fr/~veronis/pdf/2001-lancaster-intonation.pdf

DI CRISTO, A. - HIRST, D. - BOUDOURESQUES, N. - LOUIS, M. (2002) "Écrire l’intonation: le système INTSINT, fondements théoriques et illustrations", Revue PArole 22-23-24: 175-212.

ESTRUCH AXMACHER, M. (2000) "Évaluation de l’algorithme de stylisation mélodique MOMEL et du système de codage symbolique INTSINT avec un corpus de passages en catalan", TIPA -Travaux Interdisciplinaires du laboratoire Parole et Langage d'Aix-en-Provence 19. pp. 45-61.
http://aune.lpl.univ-aix.fr/lpl/tipa/19/tipa19-estruch.pdf

Giordano, R. (2005). Analisi prosodica e trascrizione intonativa in INTSINT. In F. Albano Leoni, & R. Giordano (Eds.), Italiano parlato. Analisi di un dialogo (con un CDROM contenente il materiale audio variamente elaborato e altri materiali). (pp. 231-56). Napoli: Liguori Editore.

HIRST, D. J. INTSINT: An International Transcription System
http://aune.lpl.univ-aix.fr/~hirst/intsint.html

HIRST, D. J. (1991) "Intonation models: towards a third generation", in Actes du XIIème Congrès International des Sciences Phonétiques, 19-24 août 1991, Aix-en-Provence, France. Aix-en-Provence, Université de Provence, Service des Publications, Vol 1, pp. 305-310

HIRST, D. J. (1999) "The symbolic coding of duration and timing: an extension to the INTSINT system", in Eurospeech'99. 6th European Conference on Speech Communication and Technology. September 5-9, 1999, Budapest, Hungary.
http://aune.lpl.univ-aix.fr/~hirst/articles/1999%20Hirst.pdf

HIRST, D. J. (2000) "ProZed: a multilingual prosody editor for speech synthesis", in Proceedings of the IEE Workshop "State of the Art in Speech Synthesis". London, March 2000.
http://aune.lpl.univ-aix.fr/~hirst/articles/2000%20Hirst_b.pdf

HIRST, D. J. (2002) "Automatic analysis of prosody for multilingual speech corpora", in KELLER, R. - BAILLY, G. - MONAGHAN, A. - TERKEN, J. - HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 320-327.
http://aune.lpl.univ-aix.fr/~hirst/articles/2001%20Hirst.pdf

HIRST, D. J. (2005) "Phonetic and phonological annotation of speech prosody", in SAVY, R. - CROCCO, C. (Eds.) AISV 2005. 2o Convegno Nazionale "Analisi prosodica: teorie, modelli e sistemi di annotazione". Associazione Italiana di Scienze della Voce, Salerno, 30 Novembre - 2 Dicembre 2005.

HIRST, D. J. (2007) "A Praat plugin for MOMEL and INTSINT with improved algorithms for modelling and coding intonation", in Proceedings of the 16th International Congress of Phonetic Sciences. 6-10 August, 2007, Saarbrücken, Germany. pp. 1233-1236.
http://www.icphs2007.de/conference/Papers/1443/1443.pdf

Hirst, D. (2011). The analysis by synthesis of speech melody: From data to models. Journal of Speech Sciences, 1(1), 55-83. Retrieved from http://www.journalofspeechsciences.org/index.php/journalofspeechsciences/article/view/21

HIRST, D.J. - Di CRISTO, A. (1998) "A survey of Intonation Systems", in HIRST, D.- Di CRISTO, A. (Eds.) Intonation Systems. A Survey of Twenty Languages. Cambridge: Cambridge University Press. pp. 1-44.
http://aune.lpl.univ-aix.fr/~hirst/articles/1998%20Hirst&DiCristo.pdf

HIRST, D. J. - DI CRISTO, A. - ESPESSER, R. (2000) "Levels of representation and levels of analysis for the description of intonation systems", in HORNE, M. (Ed.) Prosody: Theory and Experiment. Studies presented to Gösta Bruce. Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology, 14). pp. 51-88.
http://aune.lpl.univ-aix.fr/~hirst/articles/2000%20Hirst&al.pdf

HIRST, D. J. - DI CRISTO, A. - LE BESNERAIS, M. - NAJIM, Z. - NICOLAS, P.- ROMÉAS, P. (1993) "Multilingual modelling of intonation patterns", in HOUSE, D.- TOUATI, P. (Eds) Proceedings of an ESCA Workshop on Prosody. September 27-29, 1993, Lund, Sweden. Lund University Department of Linguistics and Phonetics, Working Papers 41. pp. 204-207

HIRST, D. J. - ESPESSER, R. (1993) "Automatic modelling of fundamental frequency using a quadratic spline function", Travaux de l'Institut de Phonétique d'Aix 15: 71-85.
http://aune.lpl.univ-aix.fr/~hirst/articles/1993%20Hirst&Espesser.pdf

HIRST, D. J. - ESPESSER, R. (1995) Prosodic labeling tools. LRE Project 62-050 MULTEXT. Task 2.5 Prosody Tools. Deliverable 2.6.1. Version B, March 1995.
http://aune.lpl.univ-aix.fr/pub/multext/docs/M2.6.1B.rtf.gz

HIRST, D. J. - ESPESSER, R. (1995) Prosodic labeling tools. Appendix 1: User's guide to Prosody Tools. LRE Project 62-050 MULTEXT. Task 2.5 Prosody Tools. Deliverable 2.6.1. Version B, March 1995.
http://aune.lpl.univ-aix.fr/pub/multext/docs/prosotut.rtf.gz

HIRST, D. J. - IDE, N. - VÉRONIS, J. (1994) "Coding fundamental frequency patterns for multi-lingual synthesis with INTSINT in the MULTEXT project", in Conference Proceedings of the Second ESCA/IEEE Workshop on Speech Synthesis. September 12-15, 1994. Mohonk Mountain House, New Paltz, New York, USA. pp. 77-80.
http://www.isca-speech.org/archive_open/ssw2/ssw2_077.html

HIRST, D. J. - NICOLAS, P. - ESPESSER, R. (1991) "Coding the Fo of a continuous text in French: An experimental approach" in Actes du XIIème Congrès International des Sciences Phonétiques, 19-24 août 1991, Aix-en-Provence, France. Aix-en-Provence, Université de Provence, Service des Publications, Vol 5, pp. 234-237

Llisterri, J. (Ed.). (1996). Prosody tools efficiency and failures. WP 4 Corpus. T4.6 Speech Markup and Validation. Deliverable 4.5.2. Final version. 15 October 1996. LRE Project 62-050 MULTEXT. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Prosody_tools_96.pdf

RIERA MASJOAN, M. (2001) Anàlisi acústica dels moviments tonals del grup accentual en català. Programa de Tercer Cicle "Lingüística: Tractament Informàtic del Llenguatge", Departament de Filologia Espanyola, Facultat de Filosofia i Lletres, Universitat Autònoma de Barcelona, September 2001.
http://liceu.uab.cat/publicacions/Riera2001.pdf

VÉRONIS, J. - CAMPIONE, E. (1998) "Towards a reversible symbolic coding of intonation", in Proceedings of the 5th International Conference on Spoken Language Processing, ICSLP'98. 30 November - 4 December 1998, Sydney (Australia). pp. 2899-2902.
http://sites.univ-provence.fr/~veronis/pdf/1998icslp-coding.pdf

Transcription and encoding of spoken corpora

ABOU HAIDAR, L. (Ed.) (2002) Transcription de la parole normale et pathologique. Revue PArole (Mons) 22-23-24.

ALBELDA MARCO, M. (2005) "Sistemas de transcripción de los corpus orales del español", in CARRIÓ PASTOR, M. (Ed.) Perspectivas interdisciplinares de la lingüística aplicada. València: Universitat Politècnica de València - AESLA, Asociación Española de Lingüística Aplicada. CD-ROM. Vol. 2. pp. 381-387.

ALBELDA MARCO, M.- FERNÁNDEZ COLOMER, M. J. (2005) "Análisis de los signos y convenciones del sistema de transcripción de Val.Es.Co", in CARRIÓ PASTOR, M. (Ed.) Perspectivas interdisciplinares de la lingüística aplicada. València: Universitat Politècnica de València - AESLA, Asociación Española de Lingüística Aplicada. CD-ROM. Vol. 2. pp. 65-74.

ALLWOOD, J.- GRONQVIST, L.- AHLSEN, E.- GUNNARSON, M. (2003) "Annotations and tools for an activity based spoken language corpus", in KUPPEVELT, J. van - SMITH, R.W. (Eds.) Current and new directions in discourse and dialogue. Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology, 22). pp. 1-18.
http://aclweb.org/anthology//W/W01/W01-1601.pdf

ÁVILA MUÑOZ, A.M. (1996) "Problemas prácticos en la realización de corpus orales. La transliteración del corpus oral del proyecto de investigación de las variedades vernáculas malagueñas (VUM)", in LUQUE DURÁN, J. de D.- PAMIES BERTRÁN, A. (Eds.) Actas del Primer Simposio de Historiografía Lingüística. Granada, 1996. Granada: Método Ediciones. pp. 103-112.

BALDRY, A. - THIBAULT, P. (2004) Multimodal Transcription and Text Analysis. Oakville, CT: The David Brown Book Company. (Equinox Textbooks and Surveys in Linguistics).

BILGER, M. et al. (1997) "Transcription de l’oral et interprétation: illustration de quelques difficultés", Recherches sur le français parlé 14

Bilger, M. (coord.). (2009). Données orales. Les enjeux de la transcription. Perpignan: Presses Universitaires de Perpignan.

BLANCHE-BENVENISTE, C. (1997) "Transcription et technologie", Recherches sur le français parlé 14

BLANCHE-BENVENISTE, C. (2002) "Réflexions sur les transcriptions du corpus de français parlé", Revue PArole (Mons) 22-23-24: 91-118.

BLANCHE-BENVENISTE, C.- COLETTE, J.J. (1987) Le français parlé: Transcription et Edition. Paris: Didier Erudition.

BLOOM, L. (1993) "Transcription and Coding for Child Language Research: The Parts are More than the Whole", in EDWARDS, J.A.- LAMPERT, M.D. (Eds) Talking Data: Transcription and Coding in Discourse Research. Hillsdale, N.J.: Lawrence Erlbaum Associates. pp. 149-168

BOUDOURESQUES, N.- DI CRISTO, A.- HIRST, D. (2002) "Élaboration d’un cadre de recherche pour l'évaluation subjective et la notation objective des troubles prosodiques du patient traumatisé crânien avec lésions cérébrales définies", Revue PArole (Mons) 22-23-24: 119-142.

BRAZIL, D. (1987) "Representing Pronunciation" in SINCLAIR, J. (Ed) Looking Up, An Account of the COBUILD Project. London: Collins. pp. 160-166

BRIZ, E.A.- ALBELDA, M.- GREÑO, A.- HIDALGO, A.- PADILLA, X.A.- PONS, S.- RUIZ, L.- SANMARTÍN, J. (2002) "La transcripción de la lengua hablada: el sistema del grupo Val.Es.Co", Español Actual. Revista de español vivo 77: 57-86.

BRIZ, A. (Coord.) (1995) La conversación coloquial (Materiales para su estudio). València: Universitat de València, Facultad de Filología, Departamento de Filología Española (Lengua Española) (Cuadernos de Filología, Anejo XVI) [5. La transcripción. Signos y convenciones. 5.1. Sistema de transcripción. 5.2. Precisiones sobre los signos prosódicos: pausas y entonación. 5.2.1. Tonemas. 5.2.2. Pausas. 5.3. Los filtros].

BRIZ, A.- GÓMEZ, J.R. (1992) "Scheme of Study of Colloquial Spanish: Some Methodological Considerations", in MORENO FERNÁNDEZ, F. (Ed.) Sociolinguistics and Stylistic Variation, LynX 3: 111-124.

BROEDER, D.- CUNNINGHAM, H.- IDE, N.- ROY, D.- THOMPSON, H.- WITTENBURG, P. (Eds.) (2000) Meta-Descriptions and Annotation Schemes for Multimodal/Multimedia Language Resources. Workshop Proceedings. LREC 2000, Second International Conference on Language Resources and Evaluation. Athens, Greece, 29-30 May 2000.

BROEDER, D.- OFFENGA, F.- WILLEMS, D.- WITTENBURG, P. (2002) EAGLES/ISLE Metadata Set for Multimedia/Multimodal Language Resources. ISLE Natural Interactivity and Multimodality Working Group. Deliverable D10.2. August 2002.

BROEDER, D.- OFFENGA, F.- WITTENBURG, P. (2002) EAGLES / ISLE Overview of Metadata Initiatives and Corpus Metadata in Language Engineering and Linguistics. ISLE Natural Interactivity and Multimodality Working Group. Deliverable D10.1. February 2001.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.4271&rep=rep1&type=pdf

BÜRKI, Y. - De STEFANI, E. (Eds.) (2006) Trascrivere la lingua / Transcribir la lengua. Dalla filologia all’analisi conversazionale / De la Filología al Análisis Conversacional. Frankfurt a. M.: Peter Lang.

CAPPEAU, P. (1997) "Données erronées: quelles erreurs commettent les transcripteurs?", Recherches sur le français parlé 14

CERCLE LINGUISTIQUE D'AIX-EN-PROVENCE (1995) Langue orale: ses unités descriptives. Travaux, 13. Aix-en-Provence: Publications de l'Université de Provence.

CERDÁN, L.- LLOBERA, M. (1997) "Actuación de los profesores en el aula. Desarrollo de un modelo semiótico de transcripción", RESLA, Revista Española de Lingüística Aplicada 12: 115-140.

COOK, G. (1995) "Theoretical issues: transcribing the untranscribable", in LEECH, G.- MYERS, G.- THOMAS, J. (Eds) Spoken English on Computer: Transcription, Markup and Applications. Harlow: Longman. pp. 35-53

COULTHARD, M.- MONTGOMERY, M. (Eds) (1981) Studies in Discourse Analysis. London: Routledge and Kegan Paul.

CROWDY, S. (1994) "Spoken corpus transcription", Literary & Linguistic Computing 9,1: 25-28.

CROWDY, S. (1995) "The BNC spoken corpus", in LEECH, G.- MYERS, G.- THOMAS, J. (Eds) Spoken English on Computer: Transcription, Markup and Applications. Harlow: Longman. pp. 224-234

CHAFE, W. (1995) "Adequacy, user-friendliness, and practicality in transcribing", in LEECH, G.- MYERS, G.- THOMAS, J. (Eds) Spoken English on Computer: Transcription, Markup and Applications. Harlow: Longman. pp. 54-61

CHAFE, W.L. (1993) "Prosodic and Functional Units of Language", in EDWARDS, J.A.- LAMPERT, M.D. (Eds) Talking Data: Transcription and Coding in Discourse Research. Hillsdale, N.J.: Lawrence Erlbaum Associates. pp. 33-44

CREER, S.- THOMPSON, P. (2004) "Processing spoken language data: The BASE experience", in Compiling and Processing Spoken Language Corpora. LREC 2004, International Conference on Language Resources and Evaluation. 24th May 2004. Lisboa, Portugal. pp. 20-27.
http://www.reading.ac.uk/AcaDepts/ll/base_corpus/creer_thompson_final.pdf

DU BOIS, J.W. (1991) "Transcription design principles for spoken discourse research", Pragmatics 1: 71-106

DU BOIS, J.W.- SCHUETZE-COBURN, S. (1993) "Representing Hierarchy: Constituent Structure for Discourse Databases", in EDWARDS, J.A.- LAMPERT, M.D. (Eds) Talking Data: Transcription and Coding in Discourse Research. Hillsdale, N.J.: Lawrence Erlbaum Associates. pp. 221-262

DU BOIS, J.W.- SCHUETZE-COBURN, S.- CUMMING, S.- PAOLINO, D. (1993) "Outline of discourse transcription", in EDWARDS, J.A.- LAMPERT, M.D. (Eds.) Talking Data: Transcription and Coding in Discourse Research. Hillsdale, N.J.: Lawrence Erlbaum Associates. pp. 45-90

DYBKJAER, L.- BERMAN, S.- KIPP, M.- WEGENER OLSEN, M.- PIRRELLI, V.- REITHINGER, N.- SORIA, C. (2001) Survey of existing tools, standards and user needs for annotation of natural interaction and multimodal data. ISLE Natural Interactivity and Multimodality Working Group. Deliverable D11.1. January 2001.
http://spokendialogue.dk/Publications/2001f/D11.1-14.2.2001-F.pdf

DYBKJAER, L.- BERNSEN, N. O. (2002) "Natural Interactivity Resources - Data, Annotation Schemes and Tools", in LREC 2002. Proceedings of the Third International Conference on Language Resources and Evaluation. Las Palmas de Gran Canaria, 27 May - 2 June, 2002. European Language Resources Association.

DYBKJAER, L.- BERNSEN, N.O.- BROEDER, D.- WITTENBURG, P. (2003) Introduction to and Summary of the Final NIMM WG Guidelines. ISLE Natural Interactivity and Multimodality Working Group Deliverable D7.1. February 2003.
http://spokendialogue.dk/Publications/2003e/D7.1-14.2.2003-F.pdf

DYBKJAER, L.- BERNSEN, N.O.- DYBKJAER, H.- McKELVIE, D.- MENGEL, A. (1998) The MATE Markup Framework. MATE Deliverable D1.2. 30 November 1998.
http://www.aclweb.org/anthology-new/W/W00/W00-1003.pdf

Dybkjaer, L., Bernsen, N. O., Wegener Knudsen, M., Llisterri, J., Machuca, M. J., Martin, J. C., . . . Wittenburg, P. (2003). Guidelines for the creation of NIMM annotation schemes. Deliverable D9.2. Final Report. 14 February 2003. ISLE Natural Interactivity and Multimodality Working Group. Retrieved from http://spokendialogue.dk/Publications/2003f/D9.2-13.2.2003-F.pdf

EDWARDS, J.A. (1991) "Transcription in discourse" in BRIGHT, W. (Ed) Oxford International Encyclopedia of Linguistics. Oxford: Oxford University Press. Vol 1 pp. 367-371

EDWARDS, J.A. (1992) "Design principles in the transcription of spoken discourse" in SVARTVIK, J. (Ed) Directions in Corpus Linguistics. Proceedings of Nobel Symposium 82. Stockholm, 4-8 August, 1991. Berlin: Mouton de Gruyter. pp. 129-147

EDWARDS, J.A. (1993) "Principles and Contrasting Systems of Discourse Transcription", in EDWARDS, J.A.- LAMPERT, M.D. (Eds) Talking Data: Transcription and Coding in Discourse Research. Hillsdale, N.J.: Lawrence Erlbaum Associates. pp. 3-32

EDWARDS, J.A. (1995) "Principles and alternative systems in the transcription, coding and mark-up of spoken discourse", in LEECH, G.- MYERS, G.- THOMAS, J. (Eds) Spoken English on Computer: Transcription, Markup and Applications. Harlow: Longman. pp. 19-34

EDWARDS, J.A.- LAMPERT, M.D. (Eds) (1993) Talking Data: Transcription and Coding in Discourse Research. Hillsdale, N.J.: Lawrence Erlbaum Associates.

EHLICH, K. (1993) "HIAT: A Transcription System for Discourse Data", in EDWARDS, J.A.- LAMPERT, M.D. (Eds) Talking Data: Transcription and Coding in Discourse Research. Hillsdale, N.J.: Lawrence Erlbaum Associates. pp. 123-148

FINK, G.A.- JOHANNTOKRAX, M.- SCHAFFRANIETZ, B. (1995) "A flexible formal language for the orthographic transcription of spontaneous spoken dialogues", in Eurospeech'95. Proceedings of the 4th European Conference on Speech Communication and Speech Technology. Madrid, Spain, 18-21 September, 1995. Vol 1, pp. 871-874

Freitas, T. (2008). Recolha e transcrição de corpora orais. In E. Fernández Rei & X. L. Regueira (Eds.), Perspectivas sobre a oralidade. (pp. 297-324). Santiago de Compostela: Consello da Cultura Galega - Instituto da Lingua Galega.
http://consellodacultura.org/mediateca/extras/simposio_oralidade.pdf

GALLARDO PAÚLS, B. (2004) "La transcripción del lenguaje afásico", in GALLARDO, B. - VEYRAT, M. (Eds.) Estudios de lingüística clínica: Lingüística y patología. València: Universitat de València - Asociación Valenciana de Lenguaje, Comunicación y Cultura. pp. 83-114.
http://www.uv.es/~pauls/BGallardo2004a.pdf

GIBBON, D. - MOORE, R.- WINSKI, R. (Eds.) (1998) Spoken Language Systems and Corpus Design. Berlin: Mouton De Gruyter. (Handbook of Standards and Resources for Spoken Language Systems, Volume I).

GONZÁLEZ LEDESMA, A.- de la MADRID HEITZMANN, G.- ALCÁNTARA PLA, M.- de la TORRE CUESTA, R.- MORENO SANDOVAL, A. (2004) "Orality and difficulties in the transcription of a spoken corpus", in Compiling and Processing Spoken Language Corpora. LREC 2004, International Conference on Language Resources and Evaluation. 24th May 2004. Lisboa, Portugal. pp. 12-19.
http://lablita.dit.unifi.it/coralrom/papers/formato_oralid_final.pdf

GUMPERZ, J.J.- BERENZ, N. (1993) "Transcribing Conversational Exchanges", in EDWARDS, J.A.- LAMPERT, M.D. (Eds) Talking Data: Transcription and Coding in Discourse Research. Hillsdale, N.J.: Lawrence Erlbaum Associates. pp. 91-122 Interaction and Language Use. Human Studies 9 (1986):109-110

HENNOSTE, T.- KOIT, M.- RÄÄBIS, A.- VALDISOO, M. (2004) "Developing a dialogue act coding scheme: An experience of annotating the Estonian Dialogue Corpus", in Compiling and Processing Spoken Language Corpora. LREC 2004, International Conference on Language Resources and Evaluation. 24th May 2004. Lisboa, Portugal. pp. 40-47.
http://www.cs.ut.ee/~koit/Dialoog/Artikkel/FinalVersions/THLrecWSFinal.pdf

ISLE/EAGLES Workshop "Meta-Descriptions and Annotation Schemes for Multimodal/Multimedia Language Resources and Data Architectures and Software Support for Large Corpora". LREC 2000 Workshop, Athens, Greece, 29-30 May 2000.
http://www.mpi.nl/ISLE/events/LREC%202000/LREC2000.htm

JOHANSSON, S. (1995) "The approach of the Text Encoding Initiative to the encoding of spoken discourse", in LEECH, G.- MYERS, G.- THOMAS, J. (Eds) Spoken English on Computer: Transcription, Markup and Applications. Harlow: Longman. pp. 82-98

JOHANSSON, S. (1995) "The Encoding of Spoken Texts", Computers and the Humanities 29,1: 149-158; in IDE, N.- VÉRONIS, J. (Eds) (1995) The Text Encoding Initiative. Background and Context. Dordrecht: Kluwer Academic Publishers. pp. 149-158.

KIPP, M.- REITHINGER, N.- BERNSEN, N.O.- DYBKJAER, L.- WEGENER, M.- MACHUCA, M.- RIERA, M. (2002) Best practice gesture, facial expression, and cross-modality coding schemes for inclusion in the workbench. NITE, Natural Interactivity Tools Engineering. Deliverable D2.3. December 2002.
http://www.nislab.dk/Publications/NITE-D2.3-final.pdf

KÖHLER, K.- LEX, G.- PÄTZOLD, M.- SCHEFFERS, M.- SIMPSON, A.P.- THON, W., in collaboration with DRAXLER, C.- JOHNE, B.- SCHIEL, F.- FAUST, L. (1994) Handbuch zur Datenaufnahme und Transliteration, in TP14 from VERBMOBIL - 3.0 Verbmobil Technisches Dokument 11, Kiel: IPDS, March 1994.
http://coral.lili.uni-bielefeld.de/Classes/Winter96/Morphlex/kohlerdoc/

LAMPERT, M.D.- ERVIN-TRIPP, S.M. (1993) "Structured Coding for the Study of Language and Social Interaction", in EDWARDS, J.A.- LAMPERT, M.D. (Eds) Talking Data: Transcription and Coding in Discourse Research. Hillsdale, N.J.: Lawrence Erlbaum Associates. pp. 169-206

LEBAUPIN, A.- LEROY, M. (2002) "Transcription des indices segmentaux, suprasegmentaux et posturo-mimo-gestuels chez le jeune enfant", Revue PArole (Mons) 22-23-24: 231-244.

LEECH, G.- MYERS, G.- THOMAS, J. (Eds) (1995) Spoken English on Computer: Transcription, Markup and Applications. Harlow: Longman.

LEECH, G.- WEISSER, M.- WILSON, A.- GRICE, M. (1998) Survey and Guidelines for the representation and annotation of dialogue. LE-EAGLES- WP4-4 Integrated Resources Working Group. 16 October 1998.
http://www.tu-chemnitz.de/phil/english/chairs/linguist/documents/mw/publications/wp4final.pdf

LEECH, G.- WEISSER, M.- WILSON, A.- GRICE, M. (2000) "Survey and Guidelines for the Representation and Annotation of Dialogue", in GIBBON, D.- MERTINS, I.- MOORE, R. (Eds.) Handbook of Multimodal and Spoken Dialogue Systems Resources, Terminology and Product Evaluation. Dordrecht: Kluwer Academic Publishers.

LINDSAY, J.- O'CONNELL, D. (1995) "How do transcribers deal with audio recordings of spoken discourse?", Journal of Psycholinguistic Research 24,2: 101-116

Llisterri, J. (1996). Preliminary recommendations on spoken texts. EAGLES Documents EAG-TCWG-STP/P. May 1996. Retrieved from http://www.ilc.cnr.it/EAGLES96/spokentx/spokentx.html

Llisterri, J. (1999). Transcripción, etiquetado y codificación de corpus orales. Revista Española de Lingüística Aplicada. Volumen Monográfico "Panorama de la Investigación en Lingüística Informática", 53-82. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/RESLA_99.pdf

LOUIS, M.- DI CRISTO, A.- HABIB, M.- HIRST, D. (2002) "Transcription segmentale et suprasegmentale d’un jargon phonémique", Revue PArole (Mons) 22-23-24: 245-266.

MacWHINNEY, B. (1991) The CHILDES Project: Tools for Analyzing Talk. Hillsdale, N.J.: Lawrence Erlbaum.

MARCOS MARÍN, F.- BALLESTER, A.- SANTAMARÍA, C. (1993) "Transcription Conventions used for the Corpus of Spoken Contemporary Spanish", Literary & Linguistic Computing 8, 4: 283-292
http://www.lllf.uam.es/~fmarcos/articulo/93LLCspoken.pdf

MENGEL, A. - DYBKJAER, L. - GARRIDO, J.M. - HEID, U. - KLEIN, M. - PIRRELLI, V. - POESIO, M. - QUAZZA, S. - SCHIFFRIN, A. - SORIA, C. (2000) MATE Dialogue Annotation Guidelines. MATE Deliverable D2.1. 8 January 2000.
http://www.andreasmengel.de/pubs/mdag.pdf

NELSON, G. (1995) "The International Corpus of English: mark-up for spoken language", in LEECH, G.- MYERS, G.- THOMAS, J. (Eds) Spoken English on Computer: Transcription, Markup and Applications. Harlow: Longman. pp. 220-223

NELSON, G. (1997) "Standardizing wordforms in a spoken corpus", Literary and Linguistic Computing 12, 2: 79-93.

NERC (1994) NERC-1. Network of European Reference Corpora. Final Report. Pisa. ("Spoken Language", "Phonetic/Phonemic and Prosodic Annotation")

O'CONNELL, D.C.- KOWAL, S. (1994) "Some Current Transcription Systems for Spoken Discourse: A critical Analysis", Pragmatics 4: 81-107

OCHS, E. (1979) "Transcription as Theory", in OCHS, E.- SCHIEFFELIN, B.B. (Eds.) (1979) Developmental Pragmatics. New York: Academic Press. pp. 43-72

PALLAUD, B. (2002) "Erreurs d'écoute dans la transcription de données orales", Revue PArole (Mons) 22-23-24: 267-294.

PAYNE, J. (1995) "The COBUILD spoken corpus: transcription conventions", in LEECH, G.- MYERS, G.- THOMAS, J. (Eds) Spoken English on Computer: Transcription, Markup and Applications. Harlow: Longman. pp. 203-207

PAYRATÓ, Ll. (1995) "Transcripción del discurso coloquial", in CORTÉS RODRÍGUEZ, L. (Ed.) El español coloquial. Actas del I Simposio sobre Análisis del Discurso Oral. Almería, 23-25 de noviembre de 1994. Almería: Universidad de Almería, Servicio de Publicaciones. pp. 43-70.

PAYRATÓ, Ll. (1996) "Transcripció del discurs col·loquial", in PAYRATÓ, Ll.- BOIX, E.- LLORET, M.-R.- LORENTE, M. (Eds.) Corpus, Corpora. Actes del 1er i 2on Col·loquis Lingüístics de la Universitat de Barcelona (CLUB-1, CLUB-2). Barcelona: Promociones y Publicaciones Universitarias SA. pp. 181-216.

PEPPÉ, S. (1995) "The Survey of English Usage and the London-Lund Corpus: computerizing manual prosodic transcription", in LEECH, G.- MYERS, G.- THOMAS, J. (Eds) Spoken English on Computer: Transcription, Markup and Applications. Harlow: Longman. pp. 187-202

PINO MORENO, M. (1997) Transcripción, codificación y almacenamiento de los textos orales del corpus CREA. Versión 2.0. Instituto de Lexicografía, Real Academia Española. In: Macrocorpus de la norma lingüística culta de las principales ciudades del mundo hispánico (MC-NLCH). Prepared by José Antonio Samper Padilla, Clara Eugenia Hernández Cabrera and Magnolia Troya Déniz. CD-ROM edition. Las Palmas de Gran Canaria: Servicio de Publicaciones de la Universidad de Las Palmas de Gran Canaria, 1998.

PINO MORENO, M.- SÁNCHEZ SÁNCHEZ, M. (1999) "El subcorpus oral del banco de datos CREA-CORDE (Real Academia Española): Procedimientos de transcripción y codificación", Oralia. Análisis del discurso oral 2: 83-138.

SANMARTÍN SAEZ, J. (2006) "Datos conversacionales y su transcripción: el corpus Val.Es.Co y el corpus PerLA", in BÜRKI, Y. - DE STEFANI, E. (Eds.) Transcrivere la lingua. Dalla filologia all’analisi conversazionale / Transcribir la lengua. De la filología al análisis conversacional. Bern: Peter Lang. pp. 257-283.

SENIA, F.- van VELDEN, J.G. (1997) Specifications of orthographic transcription and lexicon conventions. LRE-4001 SpeechDat Technical Report SD1.3.2, Final version, 10 January 1997.
http://www.speechdat.org/speechdat/deliverables/public/SD132V24.PDF

SERENARI, M.- DYBKJAER, L.- HEID, U.- KIPP, M.- REITHINGER, N. (2002) Survey of existing gesture, facial expression and cross-modality coding schemes. NITE, Natural Interactivity Tools Engineering. Deliverable D2.1. September 2002.

SINCLAIR, J. (1995) "From theory to practice", in LEECH, G.- MYERS, G.- THOMAS, J. (Eds) Spoken English on Computer: Transcription, Markup and Applications. Harlow: Longman. pp. 99-112

SLOBIN, D. (1993) "Coding Child Language Data for Crosslinguistic Analysis", in EDWARDS, J.A.- LAMPERT, M.D. (Eds) Talking Data: Transcription and Coding in Discourse Research. Hillsdale, N.J.: Lawrence Erlbaum Associates. pp. 207-220

SPERBERG-McQUEEN, C.M. - BURNARD, L. (Eds.) (2002) TEI P4: Guidelines for Electronic Text Encoding and Interchange. Chapter 11: Transcriptions of Speech. Text Encoding Initiative Consortium. XML Version: Oxford, Providence, Charlottesville, Bergen.
http://www.tei-c.org/release/doc/tei-p4-doc/html/TS.html

STEININGER, S. (2000) "Transliteration of Language and Labeling of Emotion and Gestures in SmartKom", in ISLE/EAGLES Workshop "Meta-Descriptions and Annotation Schemes for Multimodal/Multimedia Language Resources and Data Architectures and Software Support for Large Corpora". LREC 2000 Workshop, Athens, Greece, 29-30 May 2000.
http://www.mpi.nl/ISLE/documents/papers/Steininger_paper.pdf

Transcriber. Manuel du transcripteur. Conventions de transcription pour les enregistrements radio-télédiffusés. Version 1.21, 24 mars 2004.
http://trans.sourceforge.net/en/transguidFR.php

VILLASEÑOR, L.- MASSÉ, A.- PINEDA, L.A. (2000) "A Multimodal Dialogue Contribution Coding Scheme", in ISLE/EAGLES Workshop "Meta-Descriptions and Annotation Schemes for Multimodal/Multimedia Language Resources and Data Architectures and Software Support for Large Corpora". LREC 2000 Workshop, Athens, Greece, 29-30 May 2000.
http://www.mpi.nl/ISLE/documents/papers/villasenor_paper.pdf

VILLENA PONSODA, J.A. (1994) "Pautas y procedimientos de representación del corpus oral de la Universidad de Málaga. Informe preliminar", in ALVAR EZQUERRA, M.- VILLENA PONSODA, J.A. (Coord) Estudios para un corpus del español. Málaga: Universidad de Málaga. pp. 73-102.

Villena Ponsoda, J. A., Ávila Muñoz, A. M., Sánchez Bohorques, J. M., & Lasarte Cervantes, M. C. (2010). Problemas de anotación e intercambio en los corpus orales: Estrategias para la transformación de textos etiquetados en documentos XML. El caso de los corpus PRESEEA. Oralia. Análisis del Discurso Oral, 13, 261-323.

WEGENER KNUDSEN, M.- BERNSEN, N.O.- DYBKJAER, L.- HANSEN, T.- MARTIN, J.C.- MAPELLI, V.- PAULSSON, N.- PELACHAUD, C.- WITTENBURG, P. (2003) Guidelines for the Creation of NIMM Data Resources. ISLE Natural Interactivity and Multimodality Working Group Deliverable 8.2. February 2003.
http://spokendialogue.dk/Publications/2003g/D8.2-17.2.2003-F.pdf

Wegener Knudsen, M., Martin, J. C., Dybkjaer, L., Machuca, M. J., Bernsen, N. O., Carletta, J., . . . Wittenburg, P. (2002). Survey of multimodal annotation schemes and best practice. Deliverable D9.1. Final Report. February 2002. ISLE Natural Interactivity and Multimodality Working Group. Retrieved from http://spokendialogue.dk/Publications/2002o/D9.1-7.3.2002-F.pdf

WRAY, A.- TROTT, K.- BLOOMER, A. (1998) Projects in Linguistics. A Practical Guide to Researching Language. London - New York: Arnold - Oxford University Press. [Ch. 17.- Transcribing speech phonetically and phonemically, Ch. 18.- Transcribing speech orthographically]

Speech and spoken language corpora

Catalan

Catalan resources

ALTURO, N.- BLADAS, O.- PAYÀ, M.- PAYRATÓ, Ll. (Eds.) (2004) Corpus oral de registres. Materials de treball. Barcelona: Publicacions i Edicions de la Universitat de Barcelona (Universitat, 13) [+CD-ROM].

ARRANZ, V.- CASTELL, N.- GIMÉNEZ, J. (2004) "Creació de recursos lingüístics per a la traducció automàtica", in CELC 04, II Congrés d'Enginyeria en Llengua Catalana. 19-21 de novembre de 2004. Andorra la Vella, Andorra.
http://www.lc-star.com/vancjg-article.pdf

BOIX, E. (1996) "Els materials de llengua oral dels corpus de català contemporani de la UB (CUB)", in PAYRATÓ, Ll.- BOIX, E.- LLORET, M.-R.- LORENTE, M. (Eds.) Corpus, Corpora. Actes del 1er i 2on Col·loquis Lingüístics de la Universitat de Barcelona (CLUB-1, CLUB-2). Barcelona: Promociones y Publicaciones Universitarias SA. pp. 93-114.

ESQUERRA, I.- BONAFONTE, A.- VALLVERDÚ, F.- FEBRER, A. (1998) "A bilingual Spanish-Catalan database of units for concatenative synthesis", Workshop on Language Resources for European Minority Languages, May 27 1998, Granada, Spain. pp. 39-42.
http://www.lsi.upc.edu/~nlp/papers/esquerra98c.pdf

ESQUERRA, I.- NADEU, C.- VILLARRUBIA, L. (1998) "Design of a phonetic corpus for speech recognition in Catalan", in Workshop on Language Resources for European Minority Languages, May 27 1998, Granada, Spain.

Moreno, A., Febrer, A., & Márquez, L. (2006). Generation of language resources for the development of speech technologies in Catalan. In LREC 2006. Proceedings of the 5th International Conference on Language Resources and Evaluation. (pp. 1632-5). Genoa, Italy, May 24-26, 2006.

PAYRATÓ, Ll.- ALTURO, N. (Eds.) (2002) Corpus oral de conversa col·loquial. Materials de treball. Barcelona: Publicacions de la Universitat de Barcelona (Universitat, 11) [+CD-ROM].

Viaplana, J., Lloret, M. R., Perea, M. P., & Clua, E. (2007). COD. Corpus Oral Dialectal. [CD-ROM] Barcelona: Promociones y Publicaciones Universitarias.

Vila, M., González, S., Martí, M. A., Llisterri, J., & Machuca, M. J. (2010). ClInt: A bilingual Spanish-Catalan spoken corpus of clinical interviews. Procesamiento del Lenguaje Natural, 45, 105-111. Retrieved from http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/796

VILLARRUBIA, L.- LEÓN, P.- HERNÁNDEZ, L.- NADEU, C. - ESQUERRA, I.- HERNANDO, J.- GARCÍA MATEO, C.- DOCIO, L. (1998) "VOCATEL and VOGATEL: Two Telephone Speech Databases of Spanish Minority Languages (Catalan and Galician)", Workshop on Language Resources for European Minority Languages, May 27 1998, Granada, Spain.


Spanish

Spanish resources

Speech corpora for phonetic studies

CAMPIONE, E.- VÉRONIS, J. (1998) "A Multilingual Prosodic Database", in ICSLP 1998. Proceedings of the 5th International Conference on Spoken Language Processing. 30 November - 4 December, 1998. Sydney, Australia. Vol. 7, pp. 3163-3166.
http://sites.univ-provence.fr/~veronis/pdf/1998icslp-database.pdf

CID, M.- FERNÁNDEZ CORUGEDO, S.G. (1991) "The construction of a corpus of spoken Spanish: Phonetic and phonological parameters", in Proceedings of the ESCA Workshop 'Phonetics and Phonology of Speaking Styles: Reduction and Elaboration in Speech Communication'. Barcelona, Catalonia, Spain, 30 September - 2 October 1991. pp. 17-1 - 17-5.

Multext Prosodic database (ELDA-S0060). Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://www.elda.org/catalogue/en/speech/S0060.html

RENATO, A.C.- ÁLVAREZ, J.A. (2004) "Corpora of Latin American Spanish for research in prosody and synthesis", in SSW5 2004. Proceedings of the 5th ISCA Tutorial and Research Workshop on Speech Synthesis. 14 - 16 June, 2004. Oakland, Pittsburgh, PA, USA.
http://www.isca-speech.org/archive_open/ssw5/ssw5_221.html

Speech corpora for speech technology applications

Llisterri, J., Machuca, M. J., Mota, C., Riera, M., & Ríos, A. (2005). Corpus orales para el desarrollo de las tecnologías del habla en español. Oralia. Análisis del Discurso Oral, 8, 289-325. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Llisterri_Machuca_Mota_Riera_Rios_05_Corpus_Orales_Tecnologias_Habla_Espanol.pdf

Albayzín
Albayzín corpus (ELDA-S0089). Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://www.elda.org/catalogue/en/speech/S0089.html

Base de datos oral del español Albayzín. Universitat Politècnica de València, Universidad Politécnica de Madrid, Universidad de Granada, Universitat Autònoma de Barcelona, Universitat Politècnica de Catalunya. 5 CD-ROMs. 1999.

Casacuberta, F., García, R., Llisterri, J., Nadeu, C., Pardo, J. M., & Rubio, A. (1991). Development of Spanish corpora for speech research (Albayzín). In G. Castagneri (Ed.), Proceedings of the workshop on international cooperation and standardization of speech databases and speech I /O assessment methods. Chiavari, Italy. September 26-28, 1991. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Casacuberta_et_al_91.pdf

Casacuberta, F., García, R., Llisterri, J., Nadeu, C., Pardo, J. M., & Rubio, A. (1992). Desarrollo de corpus para la investigación en tecnologías del habla (Albayzín). Procesamiento del Lenguaje Natural, 12, 35-42. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Casacuberta_et_al_92_Corpus_Albayzin.pdf

DÍAZ, J.- RUBIO, A.- PEINADO, A.- SEGARRA, E.- PRIETO, N.- CASACUBERTA, F. (1993) "Development of task-oriented Spanish speech corpora" in EUROSPEECH 1993. Proceedings of the 3rd European Conference on Speech Communication and Technology. 21 - 23 September, 1993. Berlin, Germany.

DÍAZ VERDEJO, J.E.- PEINADO, A.M.- RUBIO, A.J.- SEGARRA, E.- PRIETO, N.- CASACUBERTA, F. (1998) "Albayzín: a task-oriented Spanish speech corpus", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. I. pp. 497-502.

Llisterri, J., & Poch, D. (1991). Phonetic criteria for the development of a speech database in Spanish (the Albayzín project). In G. Castagneri (Ed.), Proceedings of the workshop on international cooperation and standardization of speech databases and speech I /O assessment methods. Chiavari, Italy. September 26-28, 1991. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Llisterri_Poch_91_Albayzin.pdf

Moreno, A., Poch, D., Bonafonte, A., Lleida, E., Llisterri, J., Mariño, J. B., & Nadeu, C. (1993). Albayzín speech database: Design of the phonetic corpus. In Eurospeech 1993. Proceedings of the 3rd European conference on speech communication and technology. Vol 1. (pp. 175-8). Berlin, Germany. 21- 23 September, 1993. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Moreno_et_al_93_Albayzin_Phonetic_Corpus.pdf

Ahumada
ORTEGA GARCÍA, J.- GONZÁLEZ RODRÍGUEZ, J.- MARRERO AGUIAR, V.- DÍAZ GÓMEZ, J.J.- GARCÍA JIMÉNEZ, R.- LUCENA MOLINA, J.- SÁNCHEZ MOLERO, J.A.G. (1998) "Speaker recognition-oriented 'Ahumada' large speech corpus", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. II. pp. 1101 - 1106.

ORTEGA GARCÍA, J.- GONZÁLEZ RODRÍGUEZ, J. - MARRERO AGUIAR, V.- DÍAZ GÓMEZ, J.J.- GARCÍA JIMÉNEZ, R.- LUCENA MOLINA, J.- SÁNCHEZ MOLERO, J.A.G. (1998) "AHUMADA: A Large Speech Corpus in Spanish for Speaker Identification and Verification", in ICASSP 1998. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing. 12 -15 May, 1998. Seattle, Washington, USA. pp. 773-776.

ORTEGA GARCÍA, J.- GONZÁLEZ RODRÍGUEZ, J.- MARRERO AGUIAR, V. (2000) "AHUMADA: A large corpus in Spanish for speaker characterization and identification", Speech Communication 31, 2-3: 255-264.
http://dx.doi.org/10.1016/S0167-6393(99)00081-3

EUROM
EUROM1 (The multilingual European speech database) (ELDA-S0014). Paris: ELDA, Evaluations and Language Resources Distribution Agency.
http://www.elda.org/catalogue/en/speech/S0014.html

CHAN, D.- FOURCIN, A.- GIBBON, D.- GRANSTRÖM, B.- HUCKVALE, M.- KOKKINAKIS, G.- KVALE, K.- LAMEL, L.- LINDBERG, B.- MORENO, A.- MOUROPOULOS, J.- SENIA, F.- TRANCOSO, I.- VELD, C.- ZEILIGER, J. (1995) "EUROM- A Spoken Language Resource for the EU", in EUROSPEECH 1995. Proceedings of the 4th European Conference on Speech Communication and Technology. 18 - 21 September, 1995. Madrid, Spain. Vol 1, pp. 867-870.
http://www.phon.ucl.ac.uk/resource/eurom1/eurospeech95eurom.pdf

FOURCIN, A.- DOLMAZON, J.M. (on behalf of the SAM Project) (1991) "Speech knowledge, standards and assessment", in Actes du XIIème Congrès International des Sciences Phonétiques. 19-24 août 1991, Aix-en-Provence, France. Aix-en-Provence: Université de Provence, Service des Publications. Vol 5 pp. 430-433.

Llisterri, J., Aguilar, L., Blecua, B., Machuca, M. J., de la Mota, C., Ríos, A., . . . Salavedra, J. (1993). Spanish EUROM.1: Phonetic contents. Report D 6. SAM-A/UPC/002. ESPRIT Project 6819 (SAM-A) Speech Technology Assessment in Multilingual Applications. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Llisterri_et_al_1993_Spanish_EUROM1_Phonetic_contents.pdf

MORENO, A. (1993) EUROM-1 Spanish Database. Report D6, SAM-A/UPC/003. September 1993

LC-STAR, Lexica and Corpora for Speech-to-Speech Translation Components
ARRANZ, V.- CASTELL, N.- CREGO, J.M.- GIMÉNEZ, J.- de GISPERT, A.- LAMBERT, P. (2004) "Bilingual connections for trilingual corpora: An XML approach", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004. Lisbon, Portugal. Paris: ELRA, European Language Resources Association.
http://www.lc-star.com/Actas-LREC.ps

ARRANZ, V.- CASTELL, N.- GIMÉNEZ, J. (2003) "Development of language resources for speech-to-speech translation", in RANLP 2003. International Conference on Recent Advances in Natural Language Processing. 10-12 September 2003. Borovets, Bulgaria.
http://www.lc-star.com/ranlp2003-acg.pdf

ARRANZ, V.- CASTELL, N.- GIMÉNEZ, J. (2004) "Creación de recursos lingüísticos para la traducción automática", in SANCHIS ARNAL, E. (Ed.) Actas de las III Jornadas en Tecnología del Habla. Valencia, del 17-19 de noviembre de 2004. Organizadas por la Red Temática en Tecnología del Habla. Valencia: Departamento de Sistemas Informáticos y Computación, Facultad de Informática, Universidad Politécnica de Valencia.
http://www.lc-star.com/3jth2004-acg.pdf

BISANI, M.- BONAFONTE, A.- CASTELL, N.- HARTIKAINEN, E.- MALTESE, G.- MORENO, A.- SHAMMASS, S.- ZIEGENHAIN, U. (2003) "Lexicon and corpora for speech to speech translation (LC-STAR)", XIX Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Universidad de Alcalá, 10, 11 y 12 de septiembre de 2003. Procesamiento del Lenguaje Natural 31: 317-319.
http://www.sepln.org/revistaSEPLN/revista/31/31-Pag317.pdf

CONEJERO, D.- GIMÉNEZ, J.- ARRANZ, V.- BONAFONTE, A.- PASCUAL, N.- CASTELL, N.- MORENO, A. (2003) "Lexica and corpora for speech-to-speech translation: A trilingual approach", in EUROSPEECH 2003 - INTERSPEECH 2003. Proceedings of the 8th European Conference on Speech Communication and Technology. 1 - 4 September, 2003. Geneva, Switzerland. pp. 1593-1596.
http://gps-tsc.upc.es/veu/research/pubs/download/Con_lex_03.pdf
http://www.lc-star.com/Con_lex_03.pdf

de VRIEND, F.- CASTELL, N.- GIMÉNEZ, J. - MALTESE, G. (2004) "LC-STAR: XML-coded phonetic lexica and bilingual corpora for speech-to-speech translation", in Papillon 2004. 5th Workshop on Multilingual Lexical Databases. 30 August - 1 September 2004. Grenoble, France.
http://www.lc-star.com/LC-STAR_papillon_2004.PDF

FERSØE, H.- HARTIKAINEN, E.- van den HEUVEL, H.- MALTESE, G.- MORENO, A.- SHAMMASS, S.- ZIEGENHAIN, U. (2004) "Creation and validation of large lexica for speech-to-speech translation purposes", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004. Lisbon, Portugal. Paris: ELRA, European Language Resources Association.
http://www.lc-star.com/LREC2004Paper_1.1.doc

HARTIKAINEN, E.- MALTESE, G.- MORENO, A.- SHAMMASS, S.- ZIEGENHAIN, U. (2003) "Large lexica for speech-to-speech translation: From specification to creation", in EUROSPEECH 2003 - INTERSPEECH 2003. Proceedings of the 8th European Conference on Speech Communication and Technology. 1 - 4 September, 2003. Geneva, Switzerland.
http://www.lc-star.com/Eurospeech2003_1.pdf

SpeechDat - SALA, SpeechDat Across Latin America / SpeechDat Across All America
AURORA Project Database - Subset of SpeechDat-Car Spanish database (AURORA/CD0003-02). Paris: ELDA, Evaluations and Language Resources Distribution Agency.
http://www.elda.org/article20.html#spanish

Chilean Spanish FDB-500 (ELDA-S0054). Universitat Politècnica de Catalunya, 1998. Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://www.elda.org/catalogue/en/speech/S0054.html

Colombian Spanish Speech Database (ELDA-S0064). Universitat Politècnica de Catalunya, 1998. Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://catalog.elra.info/product_info.php?products_id=720

DRAXLER, C.- van den HEUVEL, H.- TROPF, H. (1998) "SpeechDat Experiences in Creating Large Multilingual Speech Databases for Teleservices", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. I. pp. 361-366.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.2745

GURLEKIAN, J.- COLANTONI, L.- TORRES, H.- RINCÓN, A.- MORENO, A.- MARIÑO, J. (2001) "Database for an automatic speech recognition system for Argentine Spanish", in Proceedings of the IRCS Workshop on Linguistic Databases. 11-13 December 2001, University of Pennsylvania, Philadelphia, PA, USA. pp. 92-98.
http://www.researchgate.net/publication/2413819_Database_for_an_Automatic_Speech_Recognition_System_for_Argentine_Spanish

HEUVEL, H. van den- BONAFONTE, A.- BOUDY, J.- DUFOUR, S.- LOCKWOOD, P.- MORENO, A.- RICHARD, G. (1999) "SpeechDat-Car: Towards a collection of speech databases for automotive environments", in Nokia-COST 249 Workshop. Tampere, Finland.
http://www.speechdat.org/SP-CAR/CONFEREN/ICAR99V1.PDF

HEUVEL, H. van den.- BOUDY, J.- COMEYNE, R.- EULER, S.- MORENO, A.- RICHARD, G. (1999) "The SpeechDat-Car multilingual speech databases for in-car applications: some first validation results", in EUROSPEECH 1999. Proceedings of the 6th European Conference on Speech Communication and Technology. 5 - 9 September, 1999. Budapest, Hungary.
http://www.speechdat.org/SP-CAR/CONFEREN/EURO99_0.PDF

HEUVEL, H. van den- HALL, P.- HÖGE, H.- MORENO, A.- RINCÓN, A.- SENIA, F. (2004) "SALA II across the finish line: a large collection of mobile telephone speech databases from North & Latin America completed", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association.
http://www.lrec-conf.org/proceedings/lrec2004/

MORENO, A. (2000) "SALA: SpeechDat Across Latin America", in Proceedings of the 1st Workshop on Very Large Databases. May, 2000. Athens, Greece.
http://gps-tsc.upc.es/veu/research/pubs/download/Mor00a.ps

MORENO, A.- COMEYNE, R.- HASLAM, K.- van den HEUVEL, H.- HÖGE, H.- HORBACH, S.- MICCA, G. (2000) "SALA: SpeechDat across Latin America. Results of the First Phase", in LREC 2000. Proceedings of the Second International Conference on Language Resources and Evaluation. 31 May - 2 June, 2000, Athens, Greece. Paris: European Language Resources Association. pp. 877-882.
http://gps-tsc.upc.es/veu/research/pubs/download/Mor00c.pdf

MORENO, A.- GEDGE, O.- van den HEUVEL, H.- HÖGE, H.- HORBACH, S.- MARTIN, P.- PINTO, E.- RINCÓN, A.- SENIA, F.- SUKKAR, R. (2002) "SpeechDat across all America: SALA II", in LREC 2002. Proceedings of the Third International Conference on Language Resources and Evaluation. 27 May - 2 June, 2002, Las Palmas de Gran Canaria, Spain. Paris: European Language Resources Association.

MORENO, A.- HÖGE, H.- KÖHLER, J.- MARIÑO, J.B. (1998) "SpeechDat Across Latin America. Project SALA", in RUBIO, A.- GALLARDO, N.- CASTRO, R.- TEJADA, A. (Eds.) LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. I. pp. 367-370.

MORENO, A.- LINDBERG, B.- DRAXLER, C.- RICHARD, G.- CHOUKRI, K.- EULER, S.- ALLEN, J. (2000) "SPEECHDAT-CAR. A Large Speech Database for Automotive Environments", in LREC 2000. Proceedings of the Second International Conference on Language Resources and Evaluation. 31 May - 2 June, 2000, Athens, Greece. Paris: European Language Resources Association.
http://gps-tsc.upc.es/veu/research/pubs/download/Mor00c.pdf

MORENO, A.- SENIA, F.- RINCÓN, A. (2002) The complete SALA II project specifications. Version 1.6. SALA II Technical Report. November 29, 2002.

SALA II Spanish from Mexico database (ELDA-S0171). Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://www.elda.org/catalogue/en/speech/S0171.html

SALA II Spanish Mobile Network Database collected in Venezuela (ELDA-S0167). ATLAS, Applied Technologies on Language and Speech, Barcelona. Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://www.elda.org/catalogue/en/speech/S0167.html

SALA Spanish Colombian Database (ELDA-S0084). Universitat Politècnica de Catalunya, 2000. Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://www.elda.org/catalogue/en/speech/S0084.html

SALA Spanish Venezuelan Database (ELDA-S0141). Universitat Politècnica de Catalunya, 2000. Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://www.elda.org/catalogue/en/speech/S0141.html

Spanish SpeechDat (M) DB1 (ELDA-S0065). Universitat Politècnica de Catalunya, 1999. Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://catalog.elra.info/product_info.php?products_id=721

Spanish SpeechDat (M) DB2 (ELDA-S0066). Universitat Politècnica de Catalunya, 1999. Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://catalog.elra.info/product_info.php?products_id=722

Spanish SpeechDat Database for the Mobile Telephone Network (ELDA-S0119). Universitat Politècnica de Catalunya, 2003. Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://catalog.elra.info/product_info.php?products_id=635

Spanish SpeechDat-Car Database (ELDA-S0140). Universitat Politècnica de Catalunya, 2001. Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://catalog.elra.info/product_info.php?products_id=690

Spanish SpeechDat(II) FDB-1000 (ELDA-S0101). Universitat Politècnica de Catalunya, 1997. Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://catalog.elra.info/product_info.php?products_id=726

Other resources
1997 HUB-4 Broadcast News Evaluation Non English Test Material (LDC2001S91). Philadelphia, PA: Linguistic Data Consortium.
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2001S91

1997 HUB-5 Spanish Evaluation (LDC2002S25). Philadelphia, PA: Linguistic Data Consortium.
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002S25

1997 HUB-5 Spanish Transcripts (LDC2003T04). Philadelphia, PA: Linguistic Data Consortium.
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T04

1997 Spanish Broadcast News Speech (Hub-4NE) (LDC98S74). Philadelphia, PA: Linguistic Data Consortium.
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC98S74

1997 Spanish Broadcast News Transcripts (Hub-4NE) (LDC98T29). Philadelphia, PA: Linguistic Data Consortium.
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC98T29

22 Language Corpus v1.2. Center for Spoken Language Understanding, Oregon Graduate Institute Science University.
http://www.cslu.ogi.edu/corpora/22lang/

ALCÁCER, N.- CASTRO, M.J.- GALIANO, I.- GRANELL, R.- GRAU, S.- GRIOL, D. (2004) "Adquisición de un corpus de diálogo: DIHANA", in SANCHIS ARNAL, E. (Ed.) Actas de las III Jornadas en Tecnología del Habla. Valencia, del 17-19 de noviembre de 2004. Organizadas por la Red Temática en Tecnología del Habla. Valencia: Departamento de Sistemas Informáticos y Computación, Facultad de Informática, Universidad Politécnica de Valencia. pp. 131-136.
http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/III/actas3JTH.pdf

ANITA (Audio eNhancement In Telecom Applications) (ELDA-S0156) EADS Telecom. Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://www.elda.org/catalogue/en/speech/S0156.html

BORDEL, G.- EZEIZA, A.- LÓPEZ de IPIÑA, K.- MÉNDEZ, M.- PEÑAGARIKANO, M.- RICO, T.- TOVAR, C.- ZULUETA, E. (2004) "Development of Resources for a Bilingual Automatic Index System of Broadcast News in Basque & Spanish", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association. pp. 881-884.

CALLFRIEND Spanish-Caribbean Dialect (LDC96S57). Philadelphia, PA: Linguistic Data Consortium.
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC96S57

CALLFRIEND Spanish-Non-Caribbean Dialect (LDC96S58). Philadelphia, PA: Linguistic Data Consortium.
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC96S58

CALLHOME Spanish Dialogue Act Annotation (LDC2001T61). Philadelphia, PA: Linguistic Data Consortium.
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2001T61

CALLHOME Spanish Lexicon (LDC96L16). Philadelphia, PA: Linguistic Data Consortium.
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC96L16

CALLHOME Spanish Speech (LDC96S35). Philadelphia, PA: Linguistic Data Consortium.
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC96S35

CALLHOME Spanish Transcripts (LDC96T17). Philadelphia, PA: Linguistic Data Consortium.
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC96T17

CIERI, C.- CAMPBELL, J.P.- NAKASONE, H.- MILLER, D.- WALKER, K. (2004) "The Mixer corpus of multilingual, multichannel speaker recognition data", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association. pp. 627-630.

DE LA TORRE MUNILLA, C.- HERNÁNDEZ-GÓMEZ, L.A.- TAPIAS, D. (1995) "CEUDEX: a Data Base Oriented to Context-Dependent Units Training in Spanish for Continuous Speech Recognition", in EUROSPEECH 1995. Proceedings of the 4th European Conference on Speech Communication and Technology. 18 - 21 September, 1995. Madrid, Spain. Vol 1, pp. 845-848.

ESQUERRA, I.- BONAFONTE, A.- VALLVERDÚ, F.- FEBRER, A. (1998) "A bilingual Spanish-Catalan database of units for concatenative synthesis", Workshop on Language Resources for European Minority Languages, May 27 1998, Granada, Spain. pp. 39-42.
http://www.lsi.upc.edu/~nlp/papers/esquerra98c.pdf

ESTEVE, J.- TAPIAS, D.- TORRECILLA, J.C. (1994) "La base de datos VESTEL", Comunicaciones de Telefónica I+D 5, 2: 44-54.

GALIANO, M. I.- GRANELL, R.- HURTADO, Ll.F.- MIGUEL, A.- SÁNCHEZ, J.A.- SANCHIS, E. (2003) "La plataforma de adquisición de diálogos en el proyecto DIHANA", Procesamiento del Lenguaje Natural 31: 341-342.
http://www.sepln.org/revistaSEPLN/revista/31/31-Pag341.pdf

GARCÍA MATEO, C.- DIÉGUEZ, J.- DOCÍO, C.- CARDENAL, A. (2004) "Transcrigal: A bilingual system for automatic indexing of broadcast news", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association. pp. 2061-2064.

GURLEKIAN, J.- RODRÍGUEZ, H.- COLANTONI, J.- TORRES, H. (2001) "Development of a prosodic database for an Argentine Spanish text to speech system", in Proceedings of the IRCS Workshop on Linguistic Databases. 11-13 December 2001, University of Pennsylvania, Philadelphia, PA, USA. pp. 99-104.
http://www.researchgate.net/publication/2586101_Development_of_a_Prosodic_Database_for_an_Argentine_Spanish_Text_to_Speech_System

HENNEBERT, J.- MELIN, H.- PETROVSKA, D.- GENOUD, S. (2000) "POLYCOST: A telephone-speech database for speaker recognition", Speech Communication 31, 2-3: 265-270.
http://dx.doi.org/10.1016/S0167-6393(99)00082-5

HOZJAN, V.- KACIC, Z.- MORENO, A.- BONAFONTE, A.- NOGUEIRAS, A. (2002) "Interface Databases: Design and Collection of a Multilingual Emotional Speech Database", in LREC 2002. Proceedings of the Third International Conference on Language Resources and Evaluation. 27 May - 2 June, 2002, Las Palmas de Gran Canaria, Spain. Paris: European Language Resources Association. pp. 2024-2028.
http://gps-tsc.upc.es/veu/research/pubs/download/hoz_int_02.pdf

Hub-5 Spanish Telephone Speech Corpus (LDC98S70). Philadelphia, PA: Linguistic Data Consortium.
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC98S70

Hub-5 Spanish Transcripts (LDC98T27). Philadelphia, PA: Linguistic Data Consortium.
http://catalog.ldc.upenn.edu/LDC98T27

ISKRA, D.- GROSSKOPF, B.- MARASEK, K.- van den HEUVEL, H.- DIEHL, F.- KIESSLING, A. (2002) "SPEECON Speech Databases for Consumer Devices: Database Specification and Validation", in LREC 2002. Proceedings of the Third International Conference on Language Resources and Evaluation. 27 May - 2 June, 2002. Las Palmas de Gran Canaria, Spain. Paris: ELRA, European Language Resources Association. pp. 329-333.
http://gps-tsc.upc.es/veu/research/pubs/download/Die_Spe_02.pdf

LAMEL, L.F.- ADDA, G.- ADDA-DECKER, M.- CORREDOR-ARDOY, C.- GANGOLF, J.J.- GAUVAIN, J.L. (1998) "A Multilingual Corpus for Language Identification", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. 2, pp. 1115-1122.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.5217

LANDER, T.L.- COLE, R.A.- OSHIKA, B.- NOEL, M. (1995) "The OGI 22 Language Telephone Speech Corpus", in EUROSPEECH 1995. Proceedings of the 4th European Conference on Speech Communication and Technology. 18 - 21 September, 1995. Madrid, Spain. Vol 1, pp. 817-820.

LATINO-40 Spanish Read News (LDC95S28). Philadelphia, PA: Linguistic Data Consortium.
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95S28

LÓPEZ CÓZAR, R.- RUBIO, A.J.- GARCÍA, P.- SEGURA, J.C. (1998) "A Spoken Dialogue System based on Dialogue Corpus Analysis", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. I. pp. 55-58.

MARTIN, A.- MILLER, D.- PRZYBOCKI, M.- CAMPBELL, J.- NAKASONE, H. (2004) "Conversational telephone speech corpus collection for the NIST speaker recognition evaluation 2004", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004. Lisbon, Portugal. Paris: ELRA, European Language Resources Association.
http://www.itl.nist.gov/iad/IADpapers/2004/542.pdf

MICROADES, ATLAS Spanish Microphone Database (ELDA-S0165). ATLAS, Applied Technologies on Language and Speech, Barcelona. Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://www.elda.org/catalogue/en/speech/S0165.html

MONTERO, J.M.- GUTIÉRREZ, J.- COLÁS, J.- MACÍAS, J.- ENRÍQUEZ, E.- PARDO, J.M. (1999) "Development of an emotional speech synthesiser in Spanish", in EUROSPEECH 1999. Proceedings of the 6th European Conference on Speech Communication and Technology. 5 - 9 September, 1999. Budapest, Hungary. pp. 2099-2102.
http://www-gth.die.upm.es/~macias/doc/pubs/eurosp99/submitted/m058.pdf

MONTERO, J.M.- GUTIÉRREZ, J.- PALAZUELOS, S.- ENRÍQUEZ, E.- AGUILERA, S.- PARDO, J.M. (1998) "Emotional speech synthesis: From speech database to TTS", in ICSLP 1998. Proceedings of the 5th International Conference on Spoken Language Processing. 30 November - 4 December, 1998. Sydney, Australia. Rundle Mall: Causal Productions, 1998.
http://www-gth.die.upm.es/research/documentation/AI-45Emo-98.pdf

Multilanguage Telephone Speech Corpus v1.2. Center for Spoken Language Understanding, Oregon Graduate Institute.
http://www.cslu.ogi.edu/corpora/mlts/

MUTHUSAMY, Y.- HOLLIMAN, E.- WHEATLEY, B.- PICONE, J.- GODFREY, J. (1995) "Voice Across Hispanic America: A Telephone Speech Corpus of American Spanish", in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. May, 1995. Detroit, Michigan, USA. pp. 85-88.

MUTHUSAMY, Y.K.- COLE, R.A.- OSHIKA, B.T. (1992) "The OGI multi-language telephone speech corpus", in ICSLP 1992. Proceedings of the 2nd International Conference on Spoken Language Processing. 12 - 16 October, 1992. Banff, Alberta, Canada. Edmonton: The University of Alberta. pp. 895-898.

OGI Multilanguage Corpus (LDC94S17). Philadelphia, PA: Linguistic Data Consortium.
http://catalog.ldc.upenn.edu/LDC94S17

ORTEGA GIMÉNEZ, A.- SUKNO, F.- LLEIDA SOLANO, E.- FRANGI CAREGNATO, A.- MIGUEL ARTIAGA, A.- BUERA RODRÍGUEZ, L.- ZACUR, E. (2004) "Base de Datos Audiovisual y Multicanal en Castellano para Reconocimiento Automático del Habla Multimodal en el Automóvil", in SANCHIS ARNAL, E. (Ed.) Actas de las III Jornadas en Tecnología del Habla. Valencia, del 17-19 de noviembre de 2004. Organizadas por la Red Temática en Tecnología del Habla. Valencia: Departamento de Sistemas Informáticos y Computación, Facultad de Informática, Universidad Politécnica de Valencia. pp. 125-130.
http://diec.unizar.es/intranet/articulos/uploads/Base%20de%20Datos%20Audiovisual%20y%20Multicanal%20en%20Castellano%20para%20Reconocimiento%20Automatico%20del%20Habla%20Multimodal%20en%20el%20Automovil.pdf

ORTEGA GIMÉNEZ, A.- SUKNO, F.- LLEIDA SOLANO, E.- FRANGI CAREGNATO, A.- MIGUEL ARTIAGA, A.- BUERA RODRÍGUEZ, L.- ZACUR, E. (2004) "AV@CAR: A Spanish Multichannel Multimodal Corpus for In-Vehicle Automatic Audio-Visual Speech Recognition", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association. pp. 763-766.
http://diec.unizar.es/intranet/articulos/uploads/lrec04def2.pdf.pdf

PINEDA, L.A.- VILLASEÑOR, L.- CUÉTARA, J.- CASTELLANOS, H.- LÓPEZ, I. (2004) "DIMEx100: A new phonetic and speech corpus for Mexican Spanish", in LEMAITRE, C.- REYES, C.A.- GONZÁLEZ, J.A. (Eds.) Iberamia 2004. Proceedings of the 9th Iberoamerican Conference on Artificial Intelligence. 22-26 de noviembre de 2004, Puebla, México. Berlin - Heidelberg: Springer (Lecture Notes in Computer Science, Lecture Notes in Artificial Intelligence, 3315). pp. 974-983.
http://springerlink.com/link.asp?id=dvgeabtmcb5rlgg1

Pineda, L. A., Castellanos, H., Cuétara, J., Galescu, L., Juárez, J., Llisterri, J., . . . Villaseñor, L. (2009). The corpus DIMEx100: Transcription and evaluation. Language Resources and Evaluation, 44(4), 347-370. doi:10.1007/s10579-009-9109-9

RENATO, A.C.- ÁLVAREZ, J.A. (2004) "Corpora of Latin American Spanish for research in prosody and synthesis", in SSW5 2004. Proceedings of the 5th ISCA Tutorial and Research Workshop on Speech Synthesis. 14-16 June, 2004. Oakland, Pittsburgh, PA, USA.
http://www.isca-speech.org/archive_open/ssw5/ssw5_221.html

SIEMUND, R.- HÖGE, H.- KUNZMANN, S.- MARASEK, K. (2000) "SPEECON - Speech Data for Consumer Devices", in LREC 2000. Proceedings of the Second International Conference on Language Resources and Evaluation. 31 May - 2 June, 2000, Athens, Greece. Paris: ELRA, European Language Resources Association. Vol. 2, pp. 883-886.
http://www.speechdat.org/speecon/public_docs/lrec2000.pdf

Spanish Speecon database (ELDA-S0160). Siemens AG - Universitat Politècnica de Catalunya. Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://www.elda.org/catalogue/en/speech/S0160.html

Spanish Speech Corpus 1 (Appen) (ELDA-S0149). Appen, Australia. Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://www.elda.org/catalogue/en/speech/S0149.html

Spanish TTS Speech Corpus (Appen) (ELDA-S0150). Appen, Australia. Paris: ELDA, Evaluations and Language resources Distribution Agency.
http://www.elda.org/catalogue/en/speech/S0150.html

TAPIAS, A.- ACERO, A.- ESTEVE, J. - TORRECILLA, J.C. (1994) "The VESTEL Telephone Speech Database", in ICSLP 1994. Proceedings of the 3rd International Conference on Spoken Language Processing. 18 - 22 September, 1994. Yokohama, Japan. pp. 1811-1814.

Tlatoa Common Questions Corpus. Tlatoa, Grupo de Investigación en Tecnologías del Habla. Centro de Investigación en Tecnologías de Información y Automatización, Universidad de las Américas. Puebla, México.

Tlatoa/OGI Spanish TTS Corpus. Tlatoa, Grupo de Investigación en Tecnologías del Habla. Centro de Investigación en Tecnologías de Información y Automatización, Universidad de las Américas. Puebla, México.

TRANCOSO, I. (1995) "The ONOMASTICA Interlanguage Pronunciation Lexicon", in Eurospeech'95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 1. pp. 829-832.

URAGA, E.- GAMBOA, C. (2004) "VOXMEX Speech Database: Design of a Phonetically Balanced Corpus", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association. pp. 1471-1474.

VAHA, Voice Across Hispanic America (Polyphone II) (LDC96S41). Philadelphia, PA: Linguistic Data Consortium.
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC96S41

VILLASEÑOR, L.- MONTES, M.- VAUFREYDAZ, D.- SERIGNAT, J.-F. (2003) "Elaboración de un corpus balanceado para el cálculo de modelos acústicos usando la web", in CIC 2003. XII Congreso Internacional de Computación. 13-17 de octubre de 2003. Ciudad de México, México.
http://www-prima.inrialpes.fr/Vaufreydaz/Telechargement/Villasenor03b.pdf

VILLASEÑOR, L.- MONTES, M.- VAUFREYDAZ, D.- SERIGNAT, J.-F. (2004) "Experiments on the construction of a phonetically balanced corpus from the web", in GELBUKH, A. (Ed.) CICLing-2004. Proceedings of the 5th International Conference on Intelligent Text Processing and Computational Linguistics. 15-21 February, 2004. Seoul, Korea. Berlin - Heidelberg: Springer (Lecture Notes in Computer Science, 2945). pp. 416-419.
http://ccc.inaoep.mx/~mmontesg/publicaciones/2004/PhoneticallyBalancedCorpus-cicling04.pdf

Spoken Language Corpora

C-ORAL-ROM, Corpus integrado de referencia en lenguas romances
ALCÁNTARA PLÁ, M.- MORENO SANDOVAL, A.- de la MADRID HEITZMANN, G.- GONZÁLEZ LEDESMA, A.- ARES CHICOTE, F. (2003) "C-ORAL-ROM. Corpus integrado de referencia en lenguas romances", XIX Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Universidad de Alcalá, 10, 11 y 12 de septiembre de 2003. Procesamiento del Lenguaje Natural 31: 301-302.
http://www.sepln.org/revistaSEPLN/revista/31/31-Pag301.pdf

CRESTI, E.- BACELAR do NASCIMENTO, F.- MORENO SANDOVAL, A.- VÉRONIS, J.- MARTIN, P.- CHOUKRI, K. (2004) "The C-ORAL-ROM CORPUS. A Multilingual Resource of Spontaneous Speech for Romance Languages", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association.
http://lablita.dit.unifi.it/coralrom/papers/coralrom_lrec2004.pdf

CRESTI, E.- MONEGLIA, M. (Eds.) (2005) C-ORAL-ROM. Integrated Reference Corpora for Spoken Romance Languages. Amsterdam: John Benjamins (Studies in Corpus Linguistics 15) (including DVD).

CRESTI, E.- MONEGLIA, M.- BACELAR do NASCIMENTO, F.- MORENO SANDOVAL, A.- VÉRONIS, J.- MARTIN, P.- CHOUKRI, K.- MAPELLI, V.- FALAVIGNA, D.- CID, A.- BLUM, C. (2002) "The C-ORAL-ROM Project. New methods for spoken language archives in a multilingual romance corpus", in LREC 2002. Proceedings of the Third International Conference on Language Resources and Evaluation. 27 May - 2 June, 2002, Las Palmas de Gran Canaria, Spain. Paris: European Language Resources Association.
http://lablita.dit.unifi.it/coralrom/papers/Coralrom%20lrec.pdf

MORENO SANDOVAL, A. (2002) "La evolución de los corpus de habla espontánea: la experiencia del LLI-UAM", in RUBIO AYUSO, A. (Ed.) Actas de las II Jornadas en Tecnologías del Habla. Granada, del 16 al 18 de diciembre de 2002. Organizadas por la Red Temática en Tecnologías del Habla. Granada: Universidad de Granada, Departamento de Electrónica y Tecnología de Computadores.
http://www.lllf.uam.es/~sandoval/papers/corpus%20LLI.pdf

Corpus de conversación coloquial - Grupo Val.Es.Co
BRIZ, A. (Coord.) (1995) La conversación coloquial (Materiales para su estudio). València: Universitat de València, Facultad de Filología, Departamento de Filología Española (Lengua Española) (Cuadernos de Filología, Anejo XVI).

BRIZ, A. (Coord.) (2001) Corpus de conversaciones coloquiales. Anejo 1 de Oralia. Madrid: ArcoLibros.

BRIZ, A. et al. (1993) "La elaboración de un corpus de español coloquial. Problemas metodológicos previos", Cahiers du Centre Interdisciplinaire des Sciences du Langage, Actes du Colloque "Le Dialogue en question". Université de Toulouse-Le Mirail, Valencia, 1994. pp. 103-109.

BRIZ, A. (1996) "El corpus de conversación coloquial del grupo Val.Es.Co", in PAYRATÓ, Ll.- BOIX, E.- LLORET, M.-R.- LORENTE, M. (Eds.) Corpus, Corpora. Actes del 1er i 2on Col·loquis Lingüístics de la Universitat de Barcelona (CLUB-1, CLUB-2). Barcelona: Promociones y Publicaciones Universitarias SA. pp. 255-296.

BRIZ, A. et al. (1995) "La elaboración de un corpus de español coloquial. Problemas metodológicos previos", in Actas del I Congreso de Lingüística General. València: Universitat de València.

BRIZ, A.- GÓMEZ MOLINA, J.R. (1992) "Scheme of Study of Colloquial Spanish: Some Methodological Considerations", LynX, A Monographic Series in Linguistics and World Perception 3: 111-124.

CREA, Corpus de Referencia del Español Actual - Subcorpus Oral
PINO MORENO, M.- SÁNCHEZ SÁNCHEZ, M. (1999) "El subcorpus oral del banco de datos CREA-CORDE (Real Academia Española): Procedimientos de transcripción y codificación", Oralia 2: 83-138.

Corpus Oral de Referencia del Español Contemporáneo
MARCOS MARÍN, F. (1991) "Corpus lingüístico de referencia de la lengua española", Boletín de la Academia Argentina de Letras 56: 129-155.

MARCOS MARÍN, F.- ZUMÁRRAGA, V. (1991) "El corpus de referencia de la lengua española", Razón y Fe 223/1, 109, Marzo 1991: 285-293.

MARCOS MARÍN, F.- BALLESTER, A.- SANTAMARÍA, C. (1993) "Transcription Conventions used for the Corpus of Spoken Contemporary Spanish", Literary & Linguistic Computing 8, 4: 283-292.
https://rowdyspace.utsa.edu/users/qkk563/public/FMMGraficos/Escritos/articulo/93LLCspoken.pdf

MARCOS MARÍN, F.- NICOLÁS MARTÍNEZ, M.C. (2003) "El etiquetado del Corpus Oral de Referencia del Español Contemporáneo", in SCARANO, A. (Ed.) Macro-syntaxe et Pragmatique. L’analyse linguistique de l’oral. Roma: Bulzoni, 2003, 321-328.

MORENO SANDOVAL, A. (2002) "La evolución de los corpus de habla espontánea: la experiencia del LLI-UAM", in RUBIO AYUSO, A. (Ed.) Actas de las II Jornadas en Tecnologías del Habla. Granada, del 16 al 18 de diciembre de 2002. Organizadas por la Red Temática en Tecnologías del Habla. Granada: Universidad de Granada, Departamento de Electrónica y Tecnología de Computadores.
http://www.lllf.uam.es/~sandoval/papers/corpus%20LLI.pdf

Norma lingüística culta de las ciudades del mundo hispánico
Macrocorpus de la norma lingüística culta de las principales ciudades del mundo hispánico (MC-NLCH). Preparado por José Antonio Samper Padilla, Clara Eugenia Hernández Cabrera y Magnolia Troya Déniz. Edición en CD-ROM. Las Palmas de Gran Canaria: Servicio de Publicaciones de la Universidad de Las Palmas de Gran Canaria, 1998.

Cuestionario para el estudio coordinado de la norma lingüística culta de las principales ciudades de Iberoamérica y de la Península Ibérica. I Fonética y Fonología. Madrid: PILEI - CSIC (Departamento de Geografía Lingüística I ), 1973.

ESGUEVA, M.- CANTARERO, M. (1981) El habla de la ciudad de Madrid. Materiales para su estudio. Madrid: CSIC.

LOPE BLANCH, J.M. (1986) El estudio del español hablado culto. Historia de un proyecto. México: Universidad Nacional Autónoma de México (Publicaciones del Centro de Lingüística Hispánica, 22)

LOPE BLANCH, J.M. (Coord.) (1971) El habla de la ciudad de México. Materiales para su estudio. México: Universidad Nacional Autónoma de México.

LOPE BLANCH, J.M. (Coord.) (1976) El habla popular de la ciudad de México. Materiales para su estudio. México: Universidad Nacional Autónoma de México.

LOPE BLANCH, J.M. (Coord.) (1995) El habla popular de la República Mexicana. Materiales para su estudio. México: Universidad Nacional Autónoma de México - El Colegio de México (Publicaciones del Centro de Lingüística Hispánica, 43).

SAMPER PADILLA, J.A. (1995) "Macrocorpus de la norma lingüística culta de las principales ciudades de España y América", Lingüística (Publicación de la Asociación de Lingüística y Filología de la América Latina) 7: 263-293.

PRESEEA, Proyecto para el Estudio Sociolingüístico del Español de España y de América
Ávila Muñoz, A. M., Lasarte Cervantes, M. C., & Villena Ponsoda, J. A. (Eds). (2008). El español hablado en Málaga II. Corpus oral para su estudio sociolingüístico. Nivel de estudios medio (incluye un CD-ROM). Málaga: Editorial Sarriá.

I. Introducción; 1.- PRESEEA y la investigación del español en el siglo XXI; 2.- El proyecto PRESEEA-Málaga. Estudio Sociolingüístico del Español Urbano de Málaga (ESESUMA); II. Corpus y lingüística de corpus; 3.- La lingüística de corpus. Una herramienta necesaria en la metodología (socio)lingüística actual; 4.- Niveles de acceso a los corpus orales transcritos. Aplicación al macrocorpus PRESEEA; 5.- Corpus PRESEEA-Málaga: nivel de estudios medio. Transcripción y etiquetado. Referencias bibliográficas. Transliteraciones. Entrevista 25. Entrevista 28. Entrevista 40.
Briceño, D. L., Fernández, M. F., Maldonado, J., Velazco, J., & Palm, P. (2010). Un nuevo corpus sociolingüístico del habla de Mérida: PRESEEA-MÉRIDA-VE. Lengua y Habla, 14, 1-11. Retrieved from http://erevistas.saber.ula.ve/index.php/lenguayhabla/article/view/1080

Lasarte Cervantes, M. C., Sánchez Sáez, J. M., Ávila Muñoz, A. M., & Villena Ponsoda, J. A. (Eds.). (2008). El español hablado en Málaga III. Corpus oral para su estudio sociolingüístico. Nivel de estudios superior (incluye un CD-ROM). Málaga: Editorial Sarriá.

Martín Butragueño, P., & Lastra, Y. (2011). Corpus sociolingüístico de la ciudad de México. Materiales de PRESEEA-México. Volumen I. Hablantes de instrucción superior. México, D.F.: El Colegio de México.

Parte primera. Introducción; I.- Metodología; 1.- El proyecto PRESEEA; 2.- El proyecto PRESEEA-Málaga. Estudio Sociolingüístico del Español Urbano de Málaga (ESESUMA); II.- Etiquetado del corpus. Problemas de anotación e intercambio; 0.- Objetivo; 1.- Niveles de acceso a los corpus orales transcritos y generación de tipos; 2.- Intercambio de documentos de distinto nivel; 3.- Transformación y validación de documentos a XML; 4.- Conclusiones; 5.- Apéndices; III.- Referencias bibliográficas; Parte segunda. Muestra de transliteración; Entrevista 46; Entrevista 65.
MORENO FERNÁNDEZ, F. (1997) "Metodología del 'Proyecto para el Estudio Sociolingüístico del Español de España y de América'", in MORENO FERNÁNDEZ, F. (Ed.) Trabajos de sociolingüística hispánica. Alcalá de Henares: Universidad de Alcalá, Servicio de Publicaciones (Ensayos y Documentos, 27). pp. 137-167.

MORENO FERNÁNDEZ, F. (2003) Metodología del "Proyecto para el estudio sociolingüístico del español de España y de América" (Preseea). Versión revisada, Octubre de 2003.
http://www.linguas.net/LinkClick.aspx?fileticket=%2fthWeHX0AyY%3d&tabid=474&mid=928&language=es-ES

Vida Castro, M. (Ed.). (2007). El español hablado en Málaga I. Corpus oral para su estudio sociolingüístico. Nivel de estudios bajo (incluye un CD-ROM). Málaga: Editorial Sarriá.

Villena Ponsoda, J. A., Ávila Muñoz, A. M., Sánchez Bohorques, J. M., & Lasarte Cervantes, M. C. (2010). Problemas de anotación e intercambio en los corpus orales: Estrategias para la transformación de textos etiquetados en documentos XML. El caso de los corpus PRESEEA. Oralia. Análisis del Discurso Oral, 13, 261-323.

Convenciones de transcripción - Marcas y etiquetas PRESEEA - SGML
http://www.linguas.net/LinkClick.aspx?fileticket=NAyZ9Es5nC8%3d&tabid=474&mid=928&language=es-ES

Other resources
ALVAR EZQUERRA, M.- VILLENA PONSODA, J.A. (Coord.) (1994) Estudios para un corpus del español. Málaga: Universidad de Málaga (Analecta Malacitana, Anejo 7)

AZORÍN FERNÁNDEZ, D.- MARTÍNEZ LINARES, M.A.- SANTAMARÍA PÉREZ, M.I. (1999) "Léxico y creación léxica en un corpus oral de lenguaje juvenil", in FERNÁNDEZ GONZÁLEZ, J.- FERNÁNDEZ JUNCAL, C.- MARCOS SÁNCHEZ, M. - PRIETO DE LOS MOZOS, E.- SANTOS RÍO, L. (Eds.) Lingüística para el siglo XXI. III Congreso de Lingüística General (CLG3). Salamanca: Ediciones de la Universidad de Salamanca (Aquilafuente, 9). vol 1, pp. 217-228.

DOMÍNGUEZ, C.L. (1997) "El habla de Mérida: un corpus de estudio", Lengua y Habla 2.

DOMÍNGUEZ, C.L.- MORA, E. (Coords.) (1998) El habla de Mérida. Mérida (Venezuela): Universidad de Los Andes.

GALLARDO PAÚLS, B. - SANMARTÍN SÁEZ, J. (2005) Afasia fluente. Materiales para su estudio (Volumen 1 del corpus PerLA). València: Universitat de València.

GALLARDO PAÚLS, B. - MORENO CAMPOS, V. (2005) Afasia no fluente. Materiales y análisis pragmático (Volumen 2 del corpus PerLA). València: Universitat de València.

HERNÁNDEZ SACRISTÁN, C.- FERNÁNDEZ PEÑA, L. (1992) Conversación infantil. Materiales para su estudio en niños desde los cinco a los nueve años. Valencia: Promolibro.

MARTÍN ZORRAQUINO, M.A. (1991) "Estudio sociolingüístico del habla de Zaragoza: problemas y primeros resultados", in Actas del Congreso de Lingüistas Aragoneses, Zaragoza, 1991. pp. 169-200.

RODRÍGUEZ YÁÑEZ, J.P.- LORENZO, A.- RAMALLO, F.- ACUÑA FERREIRA, V.- ÁLVAREZ LÓPEZ, S.- AMEAL GUERRA, A.- CASARES BERG, H.- VALVERDE JUNCAL, M. (2001) "El Corpus Informatizado de Fala Bilingüe Galego/Castelán de la Universidad de Vigo: presentación y problemas de identificación y etiquetado de los códigos gallego y castellano", in MORENO, A.I.- COLWELL, V. (Eds.) Perspectivas recientes sobre el discurso. Recent perspectives on discourse. León: Secretariado de Publicaciones y Medios Audiovisuales, Universidad de León - AESLA, Asociación Española de Lingüística Aplicada. (+ CD-ROM). p. 188.

VANN, R.E. (2003) "Digitizing and transcribing field recordings of Catalonian Spanish", in 3rd E-MELD (Electronic Metastructure for Endangered Languages Data) Workshop on Digitizing and Annotating Texts and Field Recordings. 11-13 July 2003. LSA Institute, Michigan State University.
http://emeld.org/workshop/2003/paper-Vann.html

VÁZQUEZ VEIGA, N. (1995) "'Corpus de lengua hablada en la ciudad de A Coruña': el rol del entrevistador en la conversación semidirigida", Moenia, Revista Lucense de Lingüística & Literatura 1: 181-202.

VERA LUJÁN, A. (1998) "Los medios de comunicación como recurso lingüístico (proyecto de acopio y distribución de materiales lingüísticos. Instituto Cervantes, España)", in La lengua española y los medios de comunicación. México: Siglo XXI Editores en coedición con la Secretaría de Educación Pública (México) y el Instituto Cervantes (España). Vol 2. pp. 1331-1338.
http://congresosdelalengua.es/zacatecas/ponencias/tecnologias/proyectos/vera.htm

Vann, R. E. (2009). Materials for the sociolinguistic description and corpus-based study of Spanish in Barcelona: Toward a documentation of colloquial Spanish in naturally occurring groups. Lewiston, NY: The Edwin Mellen Press.

Vila, M., González, S., Martí, M. A., Llisterri, J., & Machuca, M. J. (2010). ClInt: A bilingual Spanish-Catalan spoken corpus of clinical interviews. Procesamiento del Lenguaje Natural, 45, 105-111. Retrieved from http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/796


Applications of speech and spoken language resources

Research in phonetics


= Recommended introductory/general reading


= Recommended advanced reading

ALTENBERG, B. (1987) "Predicting text segmentation into tone units", in MEIJS, W. (Ed.) Corpus Linguistics and Beyond. Proceedings on English Language Research on Computerized Corpora. Amsterdam: Rodopi. pp. 49-60; in SAMPSON, G.- McCARTHY, D. (Eds.) (2004) Corpus Linguistics: readings in a widening discipline. London - New York: Continuum International.

CAMPBELL, N. (1990) "Measuring Speech-Rate in the Spoken English Corpus", in AARTS, J.- MEIJS, W. (Eds.) Theory and Practice in Corpus Linguistics. Amsterdam: Rodopi (Language & Computers, Studies in Practical Linguistics 4). pp. 61-81.

CAMPBELL, N. (1996) "Speech timing in the SEC", in KNOWLES, G.- WICHMANN, A.- ALDERSON, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London & New York: Longman. pp. 214-232.

CARRERA, J. (1998) "Estudi del comportament dels segments /bl/, /gl/ i /r/", in PAYRATÓ, Ll. (Ed.) Oralment. Estudis de variació funcional. Barcelona: Publicacions de l'Abadia de Montserrat (Biblioteca Milà i Fontanals, 29). pp. 57-74.

CASTELLANOS, A.- BENEDÍ, J.-M.- CASACUBERTA, F. (1996) "An analysis of general acoustic-phonetic features for Spanish speech produced with the Lombard effect", Speech Communication 20, 1-2: 23-36.

CUÉTARA PRIEDE, J.O. (2004) Fonética de la ciudad de México. Aportaciones desde las tecnologías del habla. Tesis para obtener el título de Maestro en Lingüística Hispánica. Maestría en Lingüística Hispánica, Posgrado en Lingüística, Universidad Nacional Autónoma de México.


Harrington, J. (2010). Phonetic analysis of speech corpora. Chichester: Wiley-Blackwell. Retrieved from http://phonetik.uni-muenchen.de/~jmh/research/pasc010808/pasc.pdf

HIDALGO NAVARRO, A. (1997) La entonación coloquial. Función demarcativa y unidades de habla. Cuadernos de Filología (Anejo XXI). Valencia: Departamento de Filología Española (Lengua Española), Facultat de Filologia, Universitat de València.

KEATING, P.A. - BLANKENSHIP, B.- BYRD, D.- FLEMMING, E.- TODAKA, Y. (1992) "Phonetic analysis of the TIMIT corpus of American English at UCLA", UCLA Working Papers in Phonetics 81: 1-16.

KEATING, P.A.- BYRD, D.- FLEMMING, E.- TODAKA, Y. (1994) "Phonetic analysis of word and segment variation using the TIMIT corpus of American English", Speech Communication 14, 1: 131-142.

KNOWLES, G. (1992) "Pitch contours and tones in the Lancaster/IBM spoken English corpus", in LEITNER, G. (Ed.) New Directions in English Language Corpora. Methodology, Results, Software Development. Berlin: Mouton de Gruyter. pp. 289-300.

KNOWLES, G. (1996) "From text structure to prosodic structure", in KNOWLES, G.- WICHMANN, A.- ALDERSON, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London & New York: Longman. pp. 146-167.

KNOWLES, G.- WICHMANN, A.- ALDERSON, P. (Eds.) (1996) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London & New York: Longman.

LÓPEZ ESCOBEDO, F. (2004) El estudio de los diptongos del español de México para su aplicación en un reconocedor de habla. Tesis de Licenciatura en Lengua y Literaturas Hispánicas. Facultad de Filosofía y Letras, Universidad Nacional Autónoma de México.

MADDIESON, I. (1991) "Testing the universality of phonological generalizations with a phonetically specified segment database: results and limitations", UCLA Working Papers in Phonetics 78: 11-25.

MARTÍN BUTRAGUEÑO, P. (2003) "Hacia una descripción prosódica de los marcadores discursivos. Datos del español de México", in MARTÍN BUTRAGUEÑO, P.- HERRERA Z. E. (Eds.) La tonía. Dimensiones fonéticas y fonológicas. México: El Colegio de México, Centro de Estudios Lingüísticos y Literarios (Cátedra Jaime Torres Bodet, Estudios de Lingüística 4). pp. 375-402.
http://lef.colmex.mx/Sociolinguistica/Entonacion%20del%20espanol%20mexicano/Marcadores%20discursivos.pdf

Mora, E., Pietrosemoli, L., Cavé, C., Obediente, E., & La Cruz, E. (2005). Un corpus de pares mínimos para el español de Venezuela. Lengua y Habla, 9, 117-121.

MORA, J.C. (1998) "L’elisió i la intrusió contextual en la llengua oral: una anàlisi fonètica del català", in PAYRATÓ, Ll. (Ed.) Oralment. Estudis de variació funcional. Barcelona: Publicacions de l'Abadia de Montserrat (Biblioteca Milà i Fontanals, 29). pp. 75-90.

ORTIZ LIRA, H. (2003) "Los acentos tonales en un corpus de español de Santiago de Chile: su distribución y realización", in MARTÍN BUTRAGUEÑO, P.- HERRERA Z. E. (Eds.) La tonía. Dimensiones fonéticas y fonológicas. México: El Colegio de México, Centro de Estudios Lingüísticos y Literarios (Cátedra Jaime Torres Bodet, Estudios de Lingüística 4). pp. 303-318.

PICKERING, B. (1996) "Distributional features of TSMs in the SEC", in KNOWLES, G.- WICHMANN, A.- ALDERSON, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London & New York: Longman. pp. 109-128.

ROSADO ROBLEDO, L. (2003) El contacto dialectal: el caso de los inmigrantes yucatecos en la ciudad de México. Tesis de licenciatura. México: Universidad Nacional Autónoma de México.

RUDIN, E. - ELMER, W. (1993) "The 'Survey of English Dialects' as a phonetic database for research in areal and variationist linguistics" in FERNANDEZ-BARRIENTOS MARTÍN, J. (Ed.) Jornadas Internacionales de Lingüística Aplicada/International Conference of Applied Linguistics. Robert J. Di Pietro in Memoriam. Actas/Proceedings. Granada: Instituto de Ciencias de la Educación de la Universidad de Granada. Vol. 2 pp. 666-673.

SAMPER, J.A. (1996) "El debilitamiento de /d/ en la norma culta de Las Palmas de Gran Canaria", in ARJONA, M.- LÓPEZ, J.- ENRÍQUEZ, A.- LÓPEZ, G.- NOVELLA, M.A. (Eds.) Actas del X Congreso Internacional de la Asociación de Filología y Lingüística de la América Latina. Veracruz, México, 11-16 de abril de 1993. México: Universidad Nacional Autónoma de México. pp. 791-796.

SAMPER, J.A.- TROYA, M. (2001) "Valores formánticos de la /e/ en sílaba abierta en la norma culta de Las Palmas de Gran Canaria", Estudios de Fonética Experimental (Universitat de Barcelona) 11: 41-66.

STENSTRÖM, A.-B. (1986) "A Study of Pauses as Demarcators in Discourse and Syntax", in AARTS, J.- MEIJS, W. (Eds.) Corpus Linguistics II. New Studies in the Analysis and Exploitation of Computer Corpora. Amsterdam: Rodopi. pp. 203-218.

STENSTRÖM, A.-B. (1988) "Adverbial Commas and Prosodic Segmentation", in KYTÖ, M.- IHALAINEN, M.- RISSANEN, M. (Eds.) Corpus Linguistics. Hard and Soft. Proceedings of the Eighth International Conference on English Language Research on Computerized Corpora. Amsterdam: Rodopi. pp. 15-34.

TAYLOR, L. (1996) "The correlation between punctuation and tone group boundaries", in KNOWLES, G.- WICHMANN, A.- ALDERSON, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London & New York: Longman. pp. 129-145.

WICHMANN, A. (1991) "A study of up-arrows in the Lancaster/IBM Spoken English Corpus", in JOHANSSON, S.- STENSTRÖM, A. (Eds) English Computer Corpora. Selected Papers and Research Guide. Berlin: Mouton de Gruyter. pp. 165-178

WICHMANN, A. (1996) "Prosodic style: a corpus-based approach", in KNOWLES, G.- WICHMANN, A.- ALDERSON, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London & New York: Longman. pp. 168-188.

WILLIAMS, B. (1996) "The status of corpora as linguistic data", in KNOWLES, G.- WICHMANN, A.- ALDERSON, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London & New York: Longman. pp. 3-19.

Speech technologies

ATWELL, E. (1996) "Machine learning from corpus resources for speech and handwriting recognition", in THOMAS, J.- SHORT, M. (Eds.) Using Corpora for Language Research. Studies in Honour of Geoffrey Leech. London: Longman. pp. 151-166

BAKER, J.M. (1993) "Dictation, Directories and Data Bases. Emerging PC Applications for Large Vocabulary Speech Recognition" in EUROSPEECH 1993. Proceedings of the 3rd European Conference on Speech Communication and Technology. 21 - 23 September, 1993. Berlin, Germany. Vol. 1 pp. 3-12

BOULIANNE, G.- KENNY, P.- LENNIG, M.- O'SHAUGHNESSY, D.- MERMELSTEIN, P. (1994) "Books on tape as training data for continuous speech recognition", Speech Communication 14, 1: 61-70.

BERTENSTAM, J.- BLOMBERG, M.- CARLSON, R.- ELENIUS, K.- GRANSTRÖM, B.- GUSTAFSON, J.- HUNNICUTT, S.- HÖGBERG, J.- LINDELL, R.- NEOVIUS, L.- NORD, L.- SERPA-LEITAO, A.- STRÖM, N. (1995) "The Waxholm Application DataBase", in EUROSPEECH 1995. Proceedings of the 4th European Conference on Speech Communication and Technology. 18 - 21 September, 1995. Madrid, Spain. Vol 1, pp. 833-836.

BURGER, S.- DRAXLER, C. (1998) "Identifying Dialects of German from Digit Strings", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. II. pp. 1053-1057.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.3384

DRAXLER, C. (Ed.) (2000) Proceedings of the Workshop on Very Large Telephone Speech Databases. LREC 2000, Second International Conference on Language Resources and Evaluation. Athens, Greece, 29 May 2000. European Language Resources Association.

DRAXLER, C.- VAN DEN HEUVEL, H.- TROPF, H. (1998) "SpeechDat Experiences in Creating Large Multilingual Speech Databases for Teleservices", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. I. pp. 361-366.

KENNY, P.- BOULIANNE, G.- GARUDADRI, H.- TRUDELLE, S.- HOLLAN, R.- LENNIG, M.- O'SHAUGHNESSY, D. (1994) "Experiments in continuous speech recognition using books on tape", Speech Communication 14, 1: 49-60.

LAMEL, L.- ROSSET, S.- BENNACEF, S.- BONNEAU-MAYNARD, H.- DEVILLERS, L.- GAUVAIN, J.L. (1995) "Development of Spoken Language Corpora for Travel Information", in Eurospeech'95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 3, pp. 1961-1964.

Llisterri, J., Machuca, M. J., Mota, C., Riera, M., & Ríos, A. (2005). Corpus orales para el desarrollo de las tecnologías del habla en español. Oralia. Análisis del Discurso Oral, 8, 289-325. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Llisterri_Machuca_Mota_Riera_Rios_05_Corpus_Orales_Tecnologias_Habla_Espanol.pdf

MACHUCA, M. J. (2006) "Corpus para el desarrollo de sistemas de diálogo", in LLISTERRI, J.- MACHUCA, M. J. (Eds.) Los sistemas de diálogo. Bellaterra - Soria: Universitat Autònoma de Barcelona, Servei de Publicacions - Fundación Duques de Soria (Manuals de la Universitat Autònoma de Barcelona, Lingüística, 45). pp. 61-79.

MARIÑO, J.B.- PADRELL, J.- MORENO, A.- NADEU, C. (2000) "Monolingual and bilingual Spanish-Catalan speech recognizers developed from SpeechDat databases", in DRAXLER, C. (Ed.) Proceedings of the Workshop on Very Large Telephone Speech Databases. LREC 2000, Second International Conference on Language Resources and Evaluation. Athens, Greece, 29 May 2000. European Language Resources Association. pp. 57-61.

PICKERING, B. (1996) "Synthesising fundamental frequency contours: experimental results", in KNOWLES, G.- WICHMANN, A.- ALDERSON, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London & New York: Longman. pp. 203-213.

POLS, L. C. W. (1987) "Speech Technology and Corpus Linguistics", in W. MEIJS (Ed.) Corpus Linguistics and Beyond. Proceedings of the Seventh International Conference on English Language Research on Computerized Corpora. Amsterdam: Rodopi.

POLS, L.C.W. (1990) "How useful are speech databases for rule synthesis development and assessment?", in ICSLP 1990. Proceedings of the 1st International Conference on Spoken Language Processing. 19 - 22 November, 1990. Kobe, Japan. Vol 2, pp. 1289-1292.

POLS, L.C.W.- van SANTEN, J.P.H.- ABE, M.- KAHN, D.- KELLER, R. (1998) "The use of large text corpora for evaluating text-to-speech systems", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. I, pp. 637-640; in SAMPSON, G.- McCARTHY, D. (Eds.) (2004) Corpus Linguistics: readings in a widening discipline. London - New York: Continuum International.

WESENICK, M.-B.- SCHIEL, F. (1995) Feasibility of Automatic Annotation and Building Pronunciation Lexica from Corpus Material. LRE-63314 SpeechDat, Report D3.1.2.3., Final version, 10 October 1995.
http://www.speechdat.org/speechdt/speechdat_m/deliverables/D3123.pdf

WILLIAMS, B.- ALDERSON, P. (1996) "Synthesizing British English intonation", in KNOWLES, G.- WICHMANN, A.- ALDERSON, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London & New York: Longman. pp. 191-202.

Linguistic analysis

Adolphs, S. (2009). Using a corpus to study spoken language. In S. Hunston & D. Oakey (Eds.), Doing applied linguistics: Key concepts and skills for postgraduate study. Oxford: Routledge.

AZORÍN, D.- MARTÍNEZ, M.A.- SANTAMARÍA, M.I. (1999) "Léxico y creación léxica en un corpus oral de lenguaje juvenil", in FERNÁNDEZ, J.- FERNÁNDEZ, C.- MARCOS, M. - PRIETO, E.- SANTOS, L. (Eds.) Lingüística para el siglo XXI. III Congreso de Lingüística General (CLG3). Salamanca: Ediciones de la Universidad de Salamanca (Aquilafuente, 9). vol 1, pp. 217-228.

BENDAZZOLI, C.- MONTI, C.- SANDRELLI, A.- RUSSO, M.- BARONI, M.- BERNARDINI, S.- MACK, G.- BALLARDINI, E.- MEAD, P. (2004) "Towards the creation of an electronic corpus to study directionality in simultaneous interpreting", in Compiling and Processing Spoken Language Corpora. LREC 2004, International Conference on Language Resources and Evaluation. 24th May 2004. Lisboa, Portugal. pp. 33-39.

BENTIVOGLIO, P.- SEDANO, M. (1993) "Investigación sociolingüística: sus métodos aplicados a una experiencia venezolana", Boletín de Lingüística 8: 3-35.

BERGLUND, Y. (1999) "Exploiting a large spoken corpus: An end-user's way to the BNC", International Journal of Corpus Linguistics 4,1: 29-52.

BIBER, D.- JOHANSSON, S.- LEECH, G.- CONRAD, S.- FINEGAN, E. (1999) Longman Grammar of Spoken and Written English. London: Pearson Education.

BLANCHE-BENVENISTE, C. (1997) Approches de la langue parlée en français. Paris: OPHRYS (Collection L'Essentiel Français)

BLANCHE-BENVENISTE, C.- BILGER, M.- ROUGET, Ch.- van den EYNDE, K. (1991) Le français parlé. Etudes grammaticales. Paris: Editions du Centre National de la Recherche Scientifique (Sciences du Langage)

BORTOLINI, U. (1997) "L’uso del sistema CHILDES nell’analisi fonologica del linguaggio infantile", in BORTOLINI, U.- PIZZUTO, E. (Eds.) Il Progetto CHILDES-Italia. Contributi di ricerca sulla lingua italiana. Tirrenia: Edizioni del Cerro. pp. 13-42; in Quaderni del Centro di Studio per le Ricerche di Fonetica 16 (1997): 3-34.

CARTER, R.- McCARTHY, M. (1997) Exploring Spoken English. Cambridge: Cambridge University Press.

GARCÉS GÓMEZ, M. P. (1994) "Elementos de cohesión en el español hablado: 'pues'", in ALVAR EZQUERRA, M.- VILLENA PONSODA, J.A. (Coord.) Estudios para un corpus del español. Málaga: Universidad de Málaga (Analecta Malacitana, Anejo 7).

GARCÉS GÓMEZ, M. P. (1994) "Funciones y valores de 'entonces' en el español hablado", in ALVAR EZQUERRA, M.- VILLENA PONSODA, J.A. (Coord.) Estudios para un corpus del español. Málaga: Universidad de Málaga (Analecta Malacitana, Anejo 7).

GONZÁLEZ SALGADO, J.A. (2005) "Los corpus sonoros en la investigación de la lengua hablada", CLAC, Círculo de Lingüística Aplicada a la Comunicación 24.
http://www.ucm.es/info/circulo/no24/gsalgado.htm

HIDALGO NAVARRO, A. (1997) La entonación coloquial. Función demarcativa y unidades de habla. Cuadernos de Filología (Anejo XXI). Valencia: Departamento de Filología Española (Lengua Española), Facultat de Filologia, Universitat de València.

HIDALGO NAVARRO, A. (1998) "Alternancia de turnos y conversación. Sobre el papel regulador de los segmentos en el habla simultánea", Lingüística Española Actual 22, 2: 217-238.

HIDALGO NAVARRO, A. (1998) "Expresividad y función pragmática de la entonación en la conversación coloquial", Oralia. Análisis del discurso oral 1: 69-92.

HIDALGO NAVARRO, A. (2001) "Entonación y conversación: sucesión de turnos y superposiciones de habla", in de BUSTOS, J.J.- CHARADEAU, P.- GIRÓN, J.L.- IGLESIAS, S.- LÓPEZ ALONSO, C. (coord.) Lengua, discurso, texto. I Simposio Internacional de Análisis del Discurso. Madrid: Visor. pp. 1597-1609.

HIDALGO NAVARRO, A. (2001) "Modalidad oracional y entonación. Notas sobre el funcionamiento pragmático de los rasgos suprasegmentales en la conversación", Moenia. Revista Lucense de Lingüística & Literatura 7: 271-292.

HIDALGO NAVARRO, A. (2003) "Microestructura discursiva y segmentación informativa en la conversación coloquial", ELUA, Estudios de Lingüística Aplicada, Universidad de Alicante 17: 367-385.

JIMÉNEZ RUIZ, J.L. (1999) "Campo de realización de la preposición "hasta" en el Corpus de la Variedad Juvenil Universitaria Alicantina", in FERNÁNDEZ, J.- FERNÁNDEZ, C.- MARCOS, M. - PRIETO, E.- SANTOS, L. (Eds.) Lingüística para el siglo XXI. III Congreso de Lingüística General (CLG3). Salamanca: Ediciones de la Universidad de Salamanca (Aquilafuente, 9). vol 2, pp. 963-972.

LORENZO SUÁREZ, A.M.- GÓMEZ GUINOVART, J. (1996) "Aspectos de análise lingüístico-cuantitativa automática do galego oral", in GÓMEZ GUINOVART, J.- LORENZO SUÁREZ, A. (Eds.) Lingüística e informática. Santiago de Compostela: Tórculo Edicións. pp. 57-86.

MARTÍN BUTRAGUEÑO, P. (2003) "Hacia una descripción prosódica de los marcadores discursivos. Datos del español de México", in MARTÍN BUTRAGUEÑO, P.- HERRERA Z. E. (Eds.) La tonía. Dimensiones fonéticas y fonológicas. México: El Colegio de México, Centro de Estudios Lingüísticos y Literarios (Cátedra Jaime Torres Bodet, Estudios de Lingüística 4). pp. 375-402.
http://lef.colmex.mx/Sociolinguistica/Entonacion%20del%20espanol%20mexicano/Marcadores%20discursivos.pdf

McCARTHY, M. (1999) Spoken Language and Applied Linguistics. Cambridge: Cambridge University Press.

RUDIN, E. - ELMER, W. (1993) "The 'Survey of English Dialects' as a phonetic database for research in areal and variationist linguistics" in FERNANDEZ-BARRIENTOS MARTÍN, J. (Ed.) Jornadas Internacionales de Lingüística Aplicada/International Conference of Applied Linguistics. Robert J. Di Pietro in Memoriam. Actas/Proceedings. Granada: Instituto de Ciencias de la Educación de la Universidad de Granada. Vol. 2 pp. 666-673.

STENSTRÖM, A.-B. - SVARTVIK, J. (1994) "Imparsable speech: Repeats and other nonfluencies in spoken English", in OOSTDIJK, N.- DE HAAN, P. (Eds) Corpus-based Research into Language. Amsterdam: Rodopi. pp. 241-254

WILLIAMS, B. (1996) "The status of corpora as linguistic data", in KNOWLES, G.- WICHMANN, A.- ALDERSON, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London & New York: Longman. pp. 3-19.

Research in second language acquisition

Non-native spoken corpora

Cylwik, N., Wagner, A., & Demenko, G. (2009). The EURONOUNCE corpus of non-native Polish for ASR-based pronunciation tutoring system. In SLaTE 2009. ISCA tutorial and research workshop on speech and language technology in education. Wroxall Abbey Estate, Warwickshire, England. 3-5 September, 2009.

Delais-Roussarie, E., & Yoo, H. -Y. (2011). Learner corpora and prosody: From the COREIL corpus to principles on data collection and corpus design. Poznań Studies in Contemporary Linguistics, 47(1), 26-39.

Detey, S., & Racine, I. (2010). Interphonologie, corpus et français langue étrangère: Le projet IPFC. Journée IPFC2010 : Interphonologie, corpus et français langue étrangère. Première journée du projet InterPhonologie du Français Contemporain. Fondation Maison des Sciences de l’homme, Paris, 8 décembre 2010. Retrieved from http://cblle.tufs.ac.jp/ipfc/assets/files/1-IPFC2010_Detey%26Racine_Interphonologie%20corpus%20et%20francais%20langue%20etrangere%20le%20projet%20IPFC.pdf

Racine, I., Zay, F., Detey, S., & Kawaguchi, Y. (2011). De la transcription de corpus à l’analyse interphonologique: Enjeux méthodologiques en FLE. In G. Col & N. Osu (Eds.), Transcrire, écrire, formaliser. Travaux linguistiques du CerLICO, 24. (pp. 13-30). Rennes: Publications de l'Université de Rennes.

Research in clinical phonetics

Rudzicz, F., Namasivayam, A., & Wolff, T. (2011). The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Language Resources and Evaluation, Online first, 1-19. doi:10.1007/s10579-011-9145-0

Documentation and teaching of minority languages

GRAAF, T. de (2002) "The use of archives and fieldwork for the study of the endangered languages of Russia", in Proceedings of the International LREC Workshop on Resources and Tools in Field Linguistics. LREC 2002. Third International Conference on Language Resources and Evaluation. Las Palmas, 26-27 May 2002. pp. 29-1 - 29-4.

I-wen SU, L. (2002) "Documentation of Formosan languages", in Proceedings of the International LREC Workshop on Resources and Tools in Field Linguistics. LREC 2002. Third International Conference on Language Resources and Evaluation. Las Palmas, 26-27 May 2002. pp. 32-1 - 32-8.

JACOBSON, M. (2004) "Corpus oraux en linguistique de terrain", in VÉRONIS, J. (Ed.) Le traitement automatique des corpus oraux, Traitement automatique des langues 45, 2: 63-88.

KOKKINAKIS, G.- COUTSOGEORGOPOULOS, H.- DERMATAS, H.- KAITSAS, G. (2000) "Electronic dictionary of pronunciation and usage of the Graecanic dialect of Southern Italy", in Ó CRÓINÍN, D. (Ed.) Proceedings of the Workshop on Developing Language Resources for Minority Languages: Reusability and Strategic Priorities. LREC 2000, Second International Conference on Language Resources and Evaluation. Athens, Greece, 30 May 2000. European Language Resources Association. pp. 30-40.

LEVIN, L.- VEGA, R.- CARBONELL, J.- BROWN, R.- LAVIE, A.- CAÑULEF, E.- HUENCHULLAN, C. (2002) "Data collection and language technologies for Mapudungun", in Proceedings of the International LREC Workshop on Resources and Tools in Field Linguistics. LREC 2002. Third International Conference on Language Resources and Evaluation. Las Palmas, 26-27 May 2002. pp. 18-1 - 18-4.

LJUBLINSKAJA, M.- SHERSTINOVA, T.- KUZNETSOVA, E. (2000) "Digital sounded lexicon of Nenets", in Ó CRÓINÍN, D. (Ed.) Proceedings of the Workshop on Developing Language Resources for Minority Languages: Reusability and Strategic Priorities. LREC 2000, Second International Conference on Language Resources and Evaluation. Athens, Greece, 30 May 2000. European Language Resources Association. pp. 71-74.

MERCIER, G.- SIROUX, J.- FAVEREAU, F.- LOUIS, F. (2000) "Courseware based on speech technology for Breton language pronunciation learning: Speech data bases and bilingual spoken dictionary", in Ó CRÓINÍN, D. (Ed.) Proceedings of the Workshop on Developing Language Resources for Minority Languages: Reusability and Strategic Priorities. LREC 2000, Second International Conference on Language Resources and Evaluation. Athens, Greece, 30 May 2000. European Language Resources Association. pp. 11-18.

Speech and Spoken Language Resources - Bibliography
Joaquim Llisterri, Departament de Filologia Espanyola, Universitat Autònoma de Barcelona
http://liceu.uab.cat/~joaquim/language_resources/spoken_res/biblio_corpus_orals.html
Last updated: 6/9/14 23:46

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.