Speech Recognition
Bibliography


Speech recognition


General overviews



Ainsworth, W. A. (1997). Some approaches to automatic speech recognition. In W. J. Hardcastle & J. Laver (Eds.), The handbook of phonetic sciences (pp. 721-743). Oxford: Blackwell.

Anusuya, M. A., & Katti, S. K. (2009). Speech recognition by machine: A review. International Journal of Computer Science and Information Security, 6(3), 181-205. Retrieved from http://arxiv.org/pdf/1001.2267.pdf

Baker, J. M. (1987). State-of-the-art speech recognition, US research and business update. In J. Laver & M. Jack (Eds.), European Conference on Speech Technology (pp. 440-446). Edinburgh: CEP Consultants.

Bernstein, J., & Franco, H. (1996). Speech recognition by computer. In N. J. Lass (Ed.), Principles of experimental phonetics (pp. 408-434). St. Louis: Mosby.

Bristow, G. (1986). The speech recognition problem. In G. Bristow (Ed.), Electronic speech recognition. Techniques, technology and applications (pp. 3-17). London: Collins.

Casacuberta, F., & Vidal, E. (1987). Reconocimiento automático del habla: metodologías y arquitecturas. In J. Mompín (Ed.), Inteligencia artificial: conceptos, técnicas y aplicaciones (pp. 167-177). Barcelona: Marcombo - Boixareu.

Casacuberta, F., & Vidal, E. (1990). Reconocimiento automático del habla. Estudios de Fonética Experimental, 4, 169-180. Retrieved from http://www.raco.cat/index.php/EFE/article/view/144289

Casacuberta, F. (1991). Aprendizaje automático en reconocimiento del habla. In Simposio de la lengua española. Ciencia y tecnología. Pabellón de España, Barcelona, 7-11 October 1991.

Chollet, G. (1994). Automatic speech and speaker recognition: Overview, current issues and perspectives. In E. Keller (Ed.), Fundamentals of speech synthesis and speech recognition. Basic concepts, state of the art and future challenges (pp. 129-148). Chichester: John Wiley & Sons.

Cole, R. A., & Zue, V. (1997). Spoken language input. In R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, & V. Zue (Eds.), Survey of the state of the art in Human Language Technology (pp. 1-70). Cambridge: Cambridge University Press. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.366.9300

Deroo, O. (n.d.). A short introduction to speech recognition. Mons: TCTS Lab, Théorie des Circuits et Traitement du Signal, Faculté Polytechnique de Mons. Retrieved from http://tcts.fpms.ac.be/asr/intro.php

Elphick, M. (1984). Speech recognition. In G. Bristow (Ed.), Electronic speech synthesis. Techniques, technology and applications (pp. 114-128). London: Granada.

Furui, S. (1991). Recent advances in speech recognition. In Eurospeech 1991. Proceedings of the 2nd European Conference on Speech Communication and Technology (Vol. 1, pp. 3-12). Genova, Italy. 24-26 September, 1991.

García Mateo, C., & Cardenal, A. (2008). Recoñecemento automático da fala: Ideas básicas e algúns exemplos. In E. Fernández Rei & X. L. Regueira (Eds.), Perspectivas sobre a oralidade (pp. 249-272). Santiago de Compostela: Consello da Cultura Galega - Instituto da Lingua Galega. Retrieved from http://consellodacultura.gal/mediateca/documento.php?id=10

Gauvain, J. L., & Lamel, L. (2002). Systèmes de reconnaissance, de compréhension et de dialogue. In J. Mariani (Ed.), Reconnaissance de la parole. Traitement automatique du langage parlé (Vol. 2, pp. 47-83). Paris: Hermès - Lavoisier.

Golderos, A., Martínez, R., Nombela, J. R., Pardo, J. M., Santos, J., & Muñoz, E. (1980). Comunicación hombre máquina por voz (y IV): El reconocimiento de la voz. Mundo Electrónico, 99, 131-134.

Grabianowski, E. (2006). How speech recognition works. HowStuffWorks.com. Retrieved from http://electronics.howstuffworks.com/gadgets/high-tech-gadgets/speech-recognition.htm

Huang, X., Baker, J., & Reddy, R. (2014). A historical perspective of speech recognition. Communications of the ACM, 57(1), 94-103. doi:10.1145/2500887

Huang, X., & Deng, L. (2010). An overview of modern speech recognition. In N. Indurkhya & F. J. Damerau (Eds.), Handbook of natural language processing (2nd ed.). Boca Raton, FL: CRC Press, Taylor and Francis. Retrieved from https://www.microsoft.com/en-us/research/publication/an-overview-of-modern-speech-recognition/

Juang, B.-H., & Rabiner, L. R. (2006). Speech recognition, Automatic: History. In K. Brown (Ed.), Encyclopedia of Language & Linguistics (2nd ed., pp. 806-819). Amsterdam: Elsevier. doi:10.1016/B0-08-044854-2/00906-8

Klatt, D. H. (1983). Human and automatic speech recognition. In M. P. R. van den Broecke & A. Cohen (Eds.), Proceedings of the 10th International Congress of Phonetic Sciences (pp. 183-186). Dordrecht: Foris.

Kurzweil, R. (1997). When will HAL understand what we are saying? Computer speech recognition and understanding. In D. G. Stork (Ed.), HAL’s Legacy. 2001’s computer as dream and reality (pp. 131-170). Cambridge, MA: The MIT Press. Retrieved from http://mitpress2.mit.edu/e-books/HAL/chap7/seven1.html

Lamel, L., & Gauvain, J. L. (2003). Speech recognition. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 305-322). Oxford: Oxford University Press.

Lea, W. A. (1974). Computer recognition of speech. In T. A. Sebeok (Ed.), Current Trends in Linguistics, 12: Linguistics and Adjacent Arts and Sciences (Vol. 4, pp. 2765-2824). The Hague: Mouton.

Lea, W. A. (1986). The elements of speech recognition. In G. Bristow (Ed.), Electronic speech recognition. Techniques, technology and applications (pp. 49-129). London: Collins.

Levinson, S. E., & Liberman, M. Y. (1981). Speech recognition by computer. Scientific American, 244(4), 64-76.

Levinson, S. E., & Liberman, M. Y. (1981). Reconocimiento del habla por medio de ordenadores. Investigación y Ciencia, 57, 38-51.

Levinson, S. E., & Liberman, M. Y. (1989). Reconocimiento del habla por medio de ordenadores. In J. Agulló (Ed.), Acústica musical (pp. 106-121). Barcelona: Prensa Científica. (Original work published 1981)


Lleida, E., & Ortega, A. (2016). Reconocimiento del lenguaje hablado. In Á. L. Gonzalo (Ed.), Tecnologías del lenguaje en España. Comunicación inteligente entre personas y máquinas (pp. 1-18). Madrid - Barcelona: Fundación Telefónica - Ariel. Retrieved from https://www.fundaciontelefonica.com/arte_cultura/publicaciones-listado/pagina-item-publicaciones/itempubli/565/

Mariño, J. B., & Nadeu, C. (2004). La representación de la voz para el reconocimiento del habla. In M. A. Martí & J. Llisterri (Eds.), Tecnologías del texto y del habla (pp. 187-224). Barcelona - Soria: Edicions de la Universitat de Barcelona - Fundación Duques de Soria.

Moore, R. K. (1984). Overview of speech input. In J. N. Holmes (Ed.), Proceedings of the First International Conference on Speech Technology. Brighton, UK, October 23-25, 1984 (pp. 25-38). Amsterdam: North Holland.

Nadeu, C. (2001). Representación de la voz en el reconocimiento del habla. Quark. Ciencia, Medicina, Comunicación y Cultura, 21, 63-71. Retrieved from http://quark.prbb.org/21/021063.htm

Pardo, J. M. (1988). Reconocimiento del habla: una introducción. Procesamiento del Lenguaje Natural, 6, 3-16.

Peckham, J. B. (1984). Speech recognition - What is it worth? In J. N. Holmes (Ed.), Proceedings of the First International Conference on Speech Technology. Brighton, UK, October 23-25, 1984 (pp. 39-48). Amsterdam: North Holland.

Rabiner, L. R., & Juang, B.-H. (2010). Speech recognition by machine. In V. K. Madisetti (Ed.), The digital signal processing handbook. Video, speech and audio signal processing and associated standards (2nd ed., pp. 9-15). Boca Raton, FL: CRC Press.

Reddy, D. R. (1976). Speech recognition by machine: A review. Proceedings of the IEEE, 64(4), 501-531. doi:10.1109/PROC.1976.10158

Renals, S., & King, S. (2010). Automatic speech recognition. In W. J. Hardcastle, J. Laver, & F. E. Gibbon (Eds.), The handbook of phonetic sciences (2nd ed., pp. 804-838). Oxford: Wiley-Blackwell.

Roach, P., Miller, D., & Emslie, J. (1992). Speech analysis and recognition. In P. Roach (Ed.), Computing in linguistics and phonetics. Introductory readings (pp. 35-50). London: Academic Press.

Sopeña, L. de. (1993). Conversando con el ordenador. Reconocimiento automático del habla. Investigación y Ciencia, 200, 76-78.

Tapias, D. (2002). Interfaces de voz con lenguaje natural. In M. A. Martí & J. Llisterri (Eds.), Tratamiento del lenguaje natural. Tecnología de la lengua oral y escrita (pp. 189-207). Barcelona - Soria: Edicions Universitat de Barcelona - Fundación Duques de Soria.

Torres, M. I. (2006). El reconocimiento del habla. In J. Llisterri & M. J. Machuca (Eds.), Los sistemas de diálogo (pp. 81-98). Bellaterra - Soria: Universitat Autònoma de Barcelona - Fundación Duques de Soria.

Vaissière, J. (1985). Speech recognition: A tutorial. In F. Fallside & W. A. Woods (Eds.), Computer speech processing (pp. 191-242). Englewood Cliffs, NJ: Prentice Hall International.

Viglione, S. (1986). Recognition past and future. In G. Bristow (Ed.), Electronic speech recognition. Techniques, technology and applications (pp. 373-387). London: Collins.

Zue, V., Cole, R. A., & Ward, W. (1997). Speech recognition. In R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, & V. Zue (Eds.), Survey of the state of the art in human language technology (pp. 4-10). Cambridge: Cambridge University Press. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.366.9300


Textbooks

Ainsworth, W. A. (1988). Speech recognition by machine. London: P. Peregrinus on behalf of the Institution of Electrical Engineers.

1.- Introduction; 2.- Speech production and perception; 3.- Problems of automatic speech recognition; 4.- Techniques for signal processing; 5.- Speech-recognition algorithms; 6.- Architectures; 7.- Performance assessment; 8.- Applications; 9.- The future.

Casacuberta, F., & Vidal, E. (1987). Reconocimiento automático del habla. Barcelona: Marcombo - Boixareu.

1.- Introducción; 2.- Conceptos básicos en reconocimiento de formas; 3.- Preproceso y segmentación de la señal vocal; 4.- Aproximación global al reconocimiento del habla: (I) Palabras aisladas; 5.- Aproximación global al reconocimiento del habla: (II) Palabras conectadas; 6.- Aproximación analítica al reconocimiento del habla: (I) Métodos simbólicos; 7.- Aproximación analítica al reconocimiento del habla: (II) Métodos estocásticos; 8.- Aproximación analítica al reconocimiento del habla: (III) Métodos difusos; 9.- Métodos de inteligencia artificial; Apéndice: Síntesis del habla: Evolución histórica y situación actual.

Cater, J. P. (1984). Electronically hearing: Computer speech recognition. Indianapolis, IN: Howard W. Sams & Co.

1.- Introduction to voice processing; 2.- Characteristics of speech acoustics; 3.- Syntax and semantic interpretation; 4.- Speech signal-acquisition techniques; 5.- Methods of speech analysis; 6.- Feature extraction and pattern recognition; 7.- From words to actions (Interpreting commands); 8.- Applications of listening computers; 9.- A review of available voice-recognition systems; 10.- Building a working voice recognizer; 11.- Future directions of voice input systems. Appendix A: Glossary; Appendix B: Suggested readings and references; Appendix C: Manufacturers of speech-associated products.


Duxans, H., & Ruiz Costa-Jussà, M. (2012). Reconocimiento automático del habla. Barcelona: Universitat Oberta de Catalunya. Retrieved from https://www.exabyteinformatica.com/uoc/Audio/Procesamiento_de_audio/Procesamiento_de_audio_(Modulo_7).pdf

1.- Introducción al reconocimiento automático del habla; 2.- Aplicaciones de los reconocedores automáticos del habla; 3.- Funcionamiento básico de los reconocedores; 4.- El módulo de extracción de características; 5.- El módulo de descodificación; 6.- Técnicas de adaptación; 7.- Evaluación de la transcripción automática.

Haton, J.-P., Cerisara, C., Fohr, D., Laprie, Y., & Smaïli, K. (2006). Reconnaissance automatique de la parole. Du signal à son interprétation. Paris: Dunod. Book companion site: http://parole.loria.fr/livreParole/index.php

1.- Introduction à la reconnaissance automatique de la parole; 2.- La communication parlée; 3.- Analyse du signal vocal; 4.- Modèles acoustiques pour la reconnaissance automatique de la parole; 5.- Techniques avancées; 6.- La modélisation statistique du langage: application à la reconnaissance de la parole; 7.- La compréhension automatique de la parole; 8.- Robustesse de la reconnaissance de la parole; 9.- Mise en oeuvre d’un système; 10.- Un cadre articulatoire pour la reconnaissance automatique de la parole; 11.- Applications de la reconnaissance automatique de la parole.

Haton, J.-P., Pierrel, J.-M., Guy, P., Caelen, J., & Gauvain, J.-L. (1991). Reconnaissance automatique de la parole. Paris: Dunod.

1.- Présentation générale; 2.- Décodage acoustico-phonétique; 3.- Reconnaissance de mots isolés et de mots enchaînés; 4.- Lexique et phonologie. Parole et texte; 5.- Reconnaissance et compréhension de la parole continue.

Holmes, J. N. (1988). Speech synthesis and recognition. Wokingham: Van Nostrand Reinhold.

1.- Human speech communication; 2.- Mechanisms and models of human speech production; 3.- Mechanisms and models of the human auditory system; 4.- Digital coding of speech; 5.- Message synthesis from stored human speech components; 6.- Speech synthesis by rule; 7.- Speech recognition by pattern matching of whole words; 8.- Stochastic models for word recognition; 9.- Speech recognition for very large vocabularies; 10.- Possible future research directions for speech synthesis and recognition.

Holmes, J. N., & Holmes, W. (2001). Speech synthesis and recognition (2nd ed.). London - New York: Taylor and Francis. (Original work published 1988)

1.- Human speech communication; 2.- Mechanisms and models of human speech production; 3.- Mechanisms and models of the human auditory system; 4.- Digital coding of speech; 5.- Message synthesis from stored human speech components; 6.- Phonetic synthesis by rule; 7.- Speech synthesis from textual or conceptual input; 8.- Introduction to automatic speech recognition: template matching; 9.- Introduction to stochastic modelling; 10.- Introduction to front-end analysis for automatic speech recognition; 11.- Practical techniques for improving speech recognition performance; 12.- Automatic speech recognition for large vocabularies; 13.- Neural networks for speech recognition; 14.- Recognition of speaker characteristics; 15.- Applications and performance of current technology; 16.- Future research directions in speech synthesis and recognition; 17.- Further reading.

Jelinek, F. (1998). Statistical methods for speech recognition. Cambridge, MA: The MIT Press.

Llamas, C., & Cardeñoso, V. (1997). Reconocimiento automático del habla. Técnicas y aplicación. Valladolid: Secretariado de Publicaciones de la Universidad de Valladolid.

1.- Introducción; 2.- Reconocimiento del habla; 3.- Fundamentos lingüísticos del habla; 4.- Modelo físico del aparato fonador; 5.- Técnicas de alineamiento temporal no lineal; 6.- Distancias de medida local; 7.- Aplicación de redes neuronales artificiales; 8.- Reconocimiento de patrones temporales con MOM; 9.- Adquisición de la base de datos de voz; 10.- Clasificación de dígitos aislados utilizando DTW; 11.- Mejora de la clasificación con una RNA; 12.- Clasificación de dígitos utilizando MOM; 13.- Notas finales; A.- Caracterización en rasgos de los fonemas del español; B.- Fonemas usuales; C.- Cálculo de los coeficientes LPC; D.- Fuentes C; E.- Coeficientes cepstrales de MEL.

O’Shaughnessy, D. (2000). Speech communication. Human and machine (2nd ed.). New York: IEEE Press. (Original work published 1987)

1.- Introduction; 2.- Review of mathematics for speech processing; 3.- Speech production and acoustic phonetics; 4.- Hearing; 5.- Speech perception; 6.- Speech analysis; 7.- Coding of speech signals; 8.- Speech enhancement; 9.- Speech synthesis; 10.- Automatic speech recognition; 11.- Speaker recognition; Appendix: Computer sites for help on speech communication; References.

Poulton, A. S. (1983). Microcomputer speech synthesis and recognition. Wilmslow: Sigma Technical Press.

1.- Introduction; 2.- Human speech; 3.- Hearing; 4.- Signal processing techniques; 5.- Practical speech synthesis systems; 6.- Practical speech recognition systems; 7.- Present applications, future prospects.

Rabiner, L. R., & Juang, B.-H. (1993). Fundamentals of speech recognition. Englewood Cliffs, NJ: Prentice Hall.

1.- Fundamentals of speech recognition; 2.- The speech signal: production, perception and acoustic-phonetic characterization; 3.- Signal processing and analysis methods for speech recognition; 4.- Pattern-comparison techniques; 5.- Speech recognition system design and implementation issues; 6.- Theory and implementation of Hidden Markov Models; 7.- Speech recognition based on connected word models; 8.- Large vocabulary continuous speech recognition; 9.- Task oriented applications of automatic speech recognition.

Compilations and conference proceedings

Bristow, G. (Ed.). (1986). Electronic speech recognition. Techniques, technology and applications. London: Collins.

Haton, J.-P. (Ed.). (1982). Automatic speech analysis and recognition. Proceedings of the NATO Advanced Studies Institute held at Bonas, France, June 29 - July 10, 1981. Dordrecht: Reidel.

House, A. S. (1988). The recognition of speech by machine - A bibliography. New York, NY: Academic Press.

Keller, E. (Ed.). (1994). Fundamentals of speech synthesis and speech recognition. Basic concepts, state of the art and future challenges. Chichester: John Wiley & Sons.

Laface, P., & De Mori, R. (Eds.). (1992). Speech recognition and understanding. Recent advances, trends and applications. Proceedings of the NATO Advanced Studies Institute held in Cetraro, Italy, July 1-13, 1990. Berlin - Heidelberg: Springer.

Lea, W. A. (Ed.). (1980). Trends in speech recognition. Englewood Cliffs, NJ: Prentice Hall.

Reddy, D. R. (Ed.). (1975). Speech recognition. Invited papers presented at the 1974 IEEE Symposium. New York, NY: Academic Press.

Schroeder, M. R. (Ed.). (1985). Speech and speaker recognition. Basel: Karger.

Schwab, E. E., & Nusbaum, H. C. (Eds.). (1986). Pattern recognition by humans and machines. Volume 1: Speech perception. Orlando, FL: Academic Press.

Suen, C. Y., & De Mori, R. (Eds.). (1982). Computer analysis and perception. Volume 2: Auditory signals. Boca Raton, FL: CRC Press.

Waibel, A., & Lee, K.-F. (Eds.). (1990). Readings in speech recognition. San Mateo, CA: Morgan Kaufmann.

Speech technologies: conference proceedings


Speech recognition techniques

Anusuya, M. A., & Katti, S. K. (2011). Front end analysis of speech recognition: a review. International Journal of Speech Technology, 14(2), 99-145. doi:10.1007/s10772-010-9088-7

Woszczyna, M. (2001). Técnicas de reconocimiento del habla: entre la precisión y la velocidad. Quark. Ciencia, Medicina, Comunicación y Cultura, 21, 72-78. Retrieved from http://quark.prbb.org/21/021062.htm

Statistical methods

Bellegarda, J. (1997). Statistical techniques for robust ASR: Review and perspectives. In Eurospeech 1997. Proceedings of the 5th European Conference on Speech Communication and Technology (Vol. 1, pp. 33-36). Rhodes, Greece. 22-25 September, 1997.

Cox, S. J. (1990). Hidden Markov Models for automatic speech recognition: Theory and application. In C. Wheddon & R. Linggard (Eds.), Speech and language processing (pp. 209-230). London: Chapman and Hall.

Huang, X., Ariki, Y., & Jack, M. (1990). Hidden Markov Models for speech recognition. Edinburgh: Edinburgh University Press.

Jelinek, F. (1998). Statistical methods for speech recognition. Cambridge, MA: The MIT Press.

Jouvet, D. (1996). Modèles de Markov pour la reconnaissance de la parole. In H. Méloni (Ed.), Fondements et perspectives en traitement automatique de la parole (pp. 255-238). Paris: Éditions AUPELF-UREF.

Knill, K., & Young, S. (1997). Hidden Markov Models in speech and language processing. In S. Young & G. Bloothooft (Eds.), Corpus-based methods in language and speech processing (pp. 27-68). Dordrecht: Kluwer.

Rabiner, L. R., & Juang, B.-H. (2006). Speech recognition: Statistical methods. In K. Brown (Ed.), Encyclopedia of Language & Linguistics (2nd ed., pp. 1-18). Amsterdam: Elsevier. doi:10.1016/B0-08-044854-2/00907-X


Phonetic and linguistic knowledge in speech recognition

Scharenborg, O. (2007). Reaching over the gap: A review of efforts to link human and automatic speech recognition research. Speech Communication, 49(5), 336-347. doi:10.1016/j.specom.2007.01.009

Phonetic knowledge in speech technology

Phonetic knowledge



Adda-Decker, M., de Mareüil, P. B., Adda, G., & Lamel, L. (2005). Investigating syllabic structures and their variation in spontaneous French. Speech Communication, 46(2), 119-139. doi:10.1016/j.specom.2005.03.006

Adda-Decker, M., & Lamel, L. (1998). Pronunciation variants across systems, languages and speaking style. In H. Strik, J. Kessens, & M. Wester (Eds.), Proceedings of the Workshop Modeling Pronunciation Variation for Automatic Speech Recognition (pp. 1-6). Rolduc, 4-6 May 1998.

Adda-Decker, M., & Lamel, L. (2000). The use of lexica in automatic speech recognition. In F. Van Eynde & D. Gibbon (Eds.), Lexicon development for speech and language processing (pp. 235-266). Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology, 12).

Ainsworth, W. A. (2005). Can phonetic knowledge be used to improve the performance of speech recognisers and synthesisers? In W. J. Barry & W. A. van Dommelen (Eds.), The integration of phonetic knowledge in speech technology (pp. 13-20). Dordrecht: Springer (Text, Speech and Language Technology, 25).

Aubanel, V., & Nguyen, N. (2010). Automatic recognition of regional phonological variation in conversational interaction. Speech Communication, In Press, Accepted Manuscript. doi:10.1016/j.specom.2010.02.008

Bates, R. A., Ostendorf, M., & Wright, R. A. (2007). Symbolic phonetic features for modeling of pronunciation variation. Speech Communication, 49(2), 83-97. doi:10.1016/j.specom.2006.10.007

Becker, R. W., & Poza, F. (1975). Acoustic phonetic research in speech understanding. IEEE Transactions on Acoustics, Speech and Signal Processing, 23(5), 416-426.

Benzeghiba, M., De Mori, R., Deroo, O., Dupont, S., Erbes, T., Jouvet, D., Fissore, L., Laface, P., Mertins, A., Ris, C., Rose, R., Tyagi, V., & Wellekens, C. (2007). Automatic speech recognition and speech variability: A review. Speech Communication, 49(10-11), 763-786. doi:10.1016/j.specom.2007.02.006

Bladon, A. (1985). Acoustic phonetics, auditory phonetics, speaker sex and speech recognition: A thread. In F. Fallside & W. A. Woods (Eds.), Computer speech processing (pp. 29-38). Englewood Cliffs, NJ: Prentice Hall International.

Broad, D. J., & Shoup, J. E. (1975). Concepts for acoustic phonetic recognition. In D. R. Reddy (Ed.), Speech recognition. Invited papers presented at the 1974 IEEE Symposium (pp. 243-274). New York, NY: Academic Press.

Caballero, M., Moreno, A., & Nogueiras, A. (2009). Multidialectal Spanish acoustic modeling for speech recognition. Speech Communication, 51(3), 217-229. doi:10.1016/j.specom.2008.08.003

Christensen, H., Lindgren, B., & Andersen, O. (2005). Introducing phonetically motivated, heterogeneous information into automatic speech recognition. In W. J. Barry & W. A. van Dommelen (Eds.), The integration of phonetic knowledge in speech technology (pp. 67-86). Dordrecht: Springer (Text, Speech and Language Technology, 25).

Dusan, S., & Rabiner, L. R. (2005). On integrating insights from human speech perception into automatic speech recognition. In Eurospeech 2005 - Interspeech 2005. Proceedings of the 9th European Conference on Speech Communication and Technology (pp. 1233-1236). Lisbon, Portugal. 4-8 September, 2005. Retrieved from http://www.isca-speech.org/archive/interspeech_2005/i05_1233.html

Ferreiros, J., Macías Guarasa, J., Pardo, J. M., & Villarrubia, L. (1998). Introducing multiple pronunciations in Spanish speech recognition systems. In H. Strik, J. Kessens, & M. Wester (Eds.), Proceedings of the Workshop Modeling Pronunciation Variation for Automatic Speech Recognition (pp. 29-34). Rolduc, 4-6 May 1998.

Fosler-Lussier, E., Greenberg, S., & Morgan, N. (1999). Incorporating contextual phonetics into automatic speech recognition. In J. J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, & A. C. Bailey (Eds.), Proceedings of the 14th International Congress of Phonetic Sciences. San Francisco, 1-7 August 1999. Retrieved from https://www.icsi.berkeley.edu/icsi/node/3033

Fosler-Lussier, E., Byrne, W., & Jurafsky, D. (Eds.). (2005). Pronunciation modeling and lexicon adaptation [Special issue]. Speech Communication, 46(2).

Goldwater, S., Jurafsky, D., & Manning, C. D. (2010). Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates. Speech Communication, 52(3), 181-200. doi:10.1016/j.specom.2009.10.001

Gravier, G., Yvon, F., Jacob, B., & Bimbot, F. (2005). Introducing contextual transcription rules in large vocabulary speech recognition. In W. J. Barry & W. A. van Dommelen (Eds.), The integration of phonetic knowledge in speech technology (pp. 87-106). Dordrecht: Springer (Text, Speech and Language Technology, 25).

Greenberg, S. (1998). Recognition in a new key - Towards a science of spoken language. In ICASSP 1998. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1401-1405). Seattle, Washington, USA. 12-15 May, 1998. Retrieved from http://www1.icsi.berkeley.edu/~steveng/PDF/Recognition_in_a_New_Key.pdf

Hain, T. (2005). Implicit modelling of pronunciation variation in automatic speech recognition. Speech Communication, 46(2), 171-188. doi:10.1016/j.specom.2005.03.008

Harrington, J. (1988). Acoustic cues for automatic recognition of English consonants. In M. Jack & J. Laver (Eds.), Aspects of speech technology (pp. 69-143). Edinburgh: Edinburgh University Press.

Klatt, D. H. (1985). The problem of variability in speech recognition and in models of speech perception. In J. S. Perkell & D. H. Klatt (Eds.), Invariance and variability in speech processes (pp. 300-324). Hillsdale, NJ: Lawrence Erlbaum.

Koreman, J., & Andreeva, B. (2000). Can we use the linguistic information in the signal? Phonus (Institute of Phonetics, University of the Saarland), 5, 47-58. Retrieved from http://www.coli.uni-saarland.de/groups/WB/Phonetics/contents/phonus-pdf/phonus5/Koreman_PHONUS5.pdf

Deng, L., Yu, D., & Acero, A. (2006). A bidirectional target-filtering model of speech coarticulation and reduction: Two-stage implementation for phonetic recognition. IEEE Transactions on Audio, Speech and Language Processing, 14(1), 256-265. doi:10.1109/TSA.2005.854107

Nolan, F. (1986). The nature of speech. In G. Bristow (Ed.), Electronic speech recognition. Techniques, technology and applications (pp. 18-48). London: Collins.

Ostendorf, M. (2000). Incorporating linguistic theories of pronunciation variation into speech recognition models. In K. Sparck Jones, G. Gazdar, & R. Needham (Eds.), Computers, language and speech: Formal theories and statistical data. Papers from a Royal Society / British Academy Discussion Meeting, September 1999. London: The Royal Society (Philosophical Transactions of the Royal Society, Series A: Mathematical, Physical and Engineering Sciences, Vol. 358, Issue 1769).

Pastor, M., & Casacuberta, F. (2005). Pronunciation modeling. In W. J. Barry & W. A. van Dommelen (Eds.), The integration of phonetic knowledge in speech technology (pp. 133-148). Dordrecht: Springer (Text, Speech and Language Technology, 25).

Pols, L. C. W. (1997). Flexible, robust, and efficient human speech recognition. Proceedings of the Institute of Phonetic Sciences, University of Amsterdam, 21, 1-10. Retrieved from http://www.fon.hum.uva.nl/archive/1997/1997-Proc21-Pols.pdf

Pruthi, T., & Espy-Wilson, C. Y. (2004). Acoustic parameters for the automatic detection of nasal manner. Speech Communication, 43(3), 241-266. doi:10.1016/j.specom.2004.06.001

Scharenborg, O., Wan, V., & Moore, R. K. (2007). Towards capturing fine phonetic variation in speech using articulatory features. Speech Communication, 49(10-11), 811-826. doi:10.1016/j.specom.2007.01.005

Schramm, H., Aubert, X., Bakker, B., Meyer, C., & Ney, H. (2006). Modeling spontaneous speech variability in professional dictation. Speech Communication, 48(5), 493-515. doi:10.1016/j.specom.2005.08.003

Sroka, J. J., & Braida, L. D. (2005). Human and machine consonant recognition. Speech Communication, 45(4), 401-423. doi:10.1016/j.specom.2004.11.009


Strik, H., & Cucchiarini, C. (1999). Modeling pronunciation variation for ASR: A survey of the literature. In H. Strik (Ed.), Modeling pronunciation variation for automatic speech recognition [Special issue]. Speech Communication, 29(2-4), 225-246. Retrieved from http://hstrik.ruhosting.nl/wordpress/wp-content/uploads/2013/04/a64.pdf

Strik, H. (Ed.). (1999). Modeling pronunciation variation for automatic speech recognition [Special issue]. Speech Communication, 29(2-4).

Strik, H., Kessens, J., & Wester, M. (Eds.). (1998). Proceedings of the Workshop Modeling Pronunciation Variation for Automatic Speech Recognition. Rolduc, 4-6 May 1998. ESCA, European Speech Communication Association; COST Action 249, Continuous Speech over the Telephone; A2RT, Automatic Acoustic Recognition Technologie.

Suomi, K. (1987). On spectral coarticulation in stop-vowel-stop syllables: Implications for automatic speech recognition. Journal of Phonetics, 15(1), 85-100.

Uraga, E., & Pineda, L. (2002). Automatic generation of pronunciation lexicons for Spanish. In A. Gelbukh (Ed.), Computational linguistics and intelligent text processing. Proceedings of the Third International Conference, CICLing 2002. Mexico City, Mexico, February 17-23, 2002 (pp. 330-338). Heidelberg: Springer (Lecture Notes in Computer Science, 2276).

Zolnay, A., Kocharov, D., Schlüter, R., & Ney, H. (2007). Using multiple acoustic feature sets for speech recognition. Speech Communication, 49(6), 514-525. doi:10.1016/j.specom.2007.04.005

Zue, V. W. (1983). The use of phonetic rules in automatic speech recognition. Speech Communication, 2(2-3), 181-186.

Zue, V. W. (1985). The use of speech knowledge in automatic speech recognition. Proceedings of the IEEE, 73(11), 1602-1615.

Zue, V. W., & Schwartz, R. M. (1980). Acoustic processing and phonetic analysis. In W. A. Lea (Ed.), Trends in speech recognition (pp. 101-124). Englewood Cliffs, NJ: Prentice Hall (Prentice Hall Signal Processing Series).

Spectrogram reading and speech recognition

Cole, R. A., Rudnicky, A. I., Zue, V., & Reddy, D. R. (1980). Speech as patterns on paper. In R. A. Cole (Ed.), Perception and production of fluent speech (pp. 3-50). Hillsdale, NJ: Lawrence Erlbaum.

Connolly, J. H., Edmonds, E. A., Guzy, J. J., Johnson, S. R., & Woodcock, A. (1986). Automatic speech recognition based on spectrogram reading. International Journal of Man-Machine Studies, 24(6), 611-621. doi:10.1016/S0020-7373(86)80012-8

Gabrys, G. (1990). Difficulty in learning to read speech spectrograms: The role of visual segmentation (Technical Report LRDC/PITT/IMP-1. Cognitive Science Program. Office of Naval Research). Pittsburgh: Learning Research and Development Center, University of Pittsburgh. Retrieved from http://www.dtic.mil/docs/citations/ADA218827

Greene, B. G., Pisoni, D. B., & Carrell, T. D. (1984). Recognition of speech spectrograms. The Journal of the Acoustical Society of America, 76(1), 32-43. doi:10.1121/1.391035

Hatazaki, K., Komori, Y., Kawabata, T., & Shikano, K. (1990). Phoneme segmentation expert system using spectrogram reading knowledge. Systems and Computers in Japan, 21(12), 90-100. doi:10.1002/scj.4690211210

Ingemann, F., & Mermelstein, P. (1975). Speech recognition through spectrogram matching. The Journal of the Acoustical Society of America, 57(1), 253-255. Retrieved from http://www.haskins.yale.edu/Reprints/HL0166.pdf

Johannsen, J., MacAllister, J., Michalek, T., & Ross, S. (1983). A speech spectrogram expert. In ICASSP 1983. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 746-9). Boston, Massachusetts, USA. April 14-16, 1983. doi:10.1109/ICASSP.1983.1172057

Katagiri, S., & Yokota, M. (1987). Phoneme recognition using visual features on speech spectrograms. In European conference on speech technology. (pp. 1365-8). Edinburgh, Scotland, UK. September 1987. Retrieved from http://www.isca-speech.org/archive/ecst_1987/e87_1365.html

Klatt, D. H., & Stevens, K. N. (1972). Sentence recognition from visual examination of spectrograms and machine-aided lexical searching. In 1972 Conference on speech communication and processing. (pp. 315-8). New York: IEEE Press.

Klatt, D. H., & Stevens, K. N. (1973). On the automatic recognition of continuous speech: Implications from a spectrogram-reading experiment. IEEE Transactions on Audio and Electroacoustics, 21(3), 210-217. doi:10.1109/TAU.1973.1162453

Lamel, L. (1988). Formalizing knowledge used in spectrogram reading: Acoustic and perceptual evidence from stops (RLE Technical Report 537). Cambridge, MA: Research Laboratory of Electronics, Massachusetts Institute of Technology. Retrieved from http://dspace.mit.edu/bitstream/handle/1721.1/4955/RLE-TR-537-20137092.pdf

Lamel, L. (1993). A knowledge-based system for stop consonant identification based on spectrogram reading. Computer Speech and Language, 7(2), 169-191. Retrieved from ftp://tlp.limsi.fr/public/lamel_csl_93.pdf

Leung, H., & Zue, V. (1986). Visual characterization of speech spectrograms. In ICASSP 1986. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 2751-4). Tokyo, Japan. April 8 - 11, 1986. doi:10.1109/ICASSP.1986.1168558

Memmi, D., Eskenazi, M., Mariani, J., & Nguyen-Xuan, A. (1983). Un système expert pour la lecture de sonagrammes. Speech Communication, 2(2-3), 234-236. doi:10.1016/0167-6393(83)90037-7

Stern, P. E., Eskenazi, M., & Memmi, D. (1986). An expert system for speech spectrogram reading. In ICASSP 1986. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 1193-6). Tokyo, Japan. April 8 - 11, 1986. doi:10.1109/ICASSP.1986.1168793

Zue, V., & Cole, R. (1979). Experiments on spectrogram reading. In ICASSP 1979. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 116-9). Washington, District of Columbia, USA. April 2 - 4, 1979. doi:10.1109/ICASSP.1979.1170735

Zue, V., & Lamel, L. (1986). An expert spectrogram reader: A knowledge-based approach to speech recognition. In ICASSP 1986. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 1197-200). Tokyo, Japan. April 8 - 11, 1986. doi:10.1109/ICASSP.1986.1168798

Spectrographic analysis of speech

Prosodic knowledge

BATLINER, A.- MÖBIUS, B. (2005) "Prosodic models, automatic speech understanding, and speech synthesis: Towards the common ground?", in BARRY, W.J.- van DOMMELEN, W.A. (Eds.) The Integration of Phonetic Knowledge in Speech Technology. Dordrecht: Springer (Text, Speech and Language Technology, 25). pp. 21-44.

BARTKOVA, K. (1997) "Some experiments about the use of prosodic parameters in a speech recognition system", in Proceedings of the ESCA Workshop on Intonation. Athens, 18-20 September 1997. pp. 33-36.

BARTKOVA, K.- JOUVET, D. (1999) "Selective prosodic post-processing for improving recognition of French telephone numbers", in Eurospeech’99, 6th European Conference on Speech Communication and Technology. Budapest, Hungary, 5-10 September 1999. Vol. 1, pp. 267-270.

BASSI, A.- BECERRA YOMA, N.- LONCOMILLA, P. (2006) "Estimating tonal prosodic discontinuities in Spanish using HMM", Speech Communication 48, 9: 1112-1125.
http://dx.doi.org/10.1016/j.specom.2006.03.006

CAMPBELL, N. (1993) "Automatic detection of prosodic boundaries in speech", Speech Communication 13, 3-4: 343-354.

CHEN, K. - HASEGAWA-JOHNSON, M. - COHEN, A. - BORYS, S. - KIM, S.-S. - COLE, J. - CHOI, J.-Y. (2006) "Prosody dependent speech recognition on radio news corpus of American English", IEEE Transactions on Audio, Speech and Language Processing 14, 1: 232-245.
http://dx.doi.org/10.1109/TSA.2005.853208

ESCUDERO, D.- CARDEÑOSO, V. (2002) "Una experiencia en reconocimiento automático de tipos de unidades melódicas a partir de su perfil de entonación", in DÍAZ GARCÍA, J. (Ed.) Actas del II Congreso de Fonética Experimental. Seville, 5-7 March 2001. Seville: Laboratorio de Fonética, Facultad de Filología, Universidad de Sevilla. pp. 161-166.

GARCÍA, C.- TAPIAS, D. (2000) "La frecuencia fundamental de la voz y sus efectos en reconocimiento de habla continua", Procesamiento del Lenguaje Natural, no. 26: 163-167.

Goldwater, S., Jurafsky, D., & Manning, C. D. (2010). Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates. Speech Communication, 52(3), 181-200. doi:10.1016/j.specom.2009.10.001

HASEGAWA-JOHNSON, M.- CHEN, K.- COLE, J.- BORYS, S.- KIM, S.-S.- COHEN, A.- ZHANG, T.- CHOI, J.-Y.- KIM, H.- YOON, T.- CHAVARRIA, S. (2005) "Simultaneous recognition of words and prosody in the Boston University Radio Speech Corpus", Speech Communication 46: 418-439.
http://dx.doi.org/10.1016/j.specom.2005.01.009

KOMPE, R. (1997) Prosody in Speech Understanding Systems. Berlin - New York: Springer (Lecture Notes in Artificial Intelligence, 1307; subseries of Lecture Notes in Computer Science).

LEA, W.A. (1980) "Prosodic aids in speech recognition" in LEA, W.A. (Ed.) Trends in Speech Recognition. Englewood Cliffs, N.J.: Prentice-Hall. pp. 166-205.

LONGUET-HIGGINS, C. (1985) "Tones of Voice: The Role of Intonation in Computer Speech Understanding", in FALLSIDE, F.- WOODS, W.A. (Eds.) Computer Speech Processing. Englewood Cliffs, N.J.: Prentice Hall International. pp. 293-302.

MÉLONI, H.- LANGLAIS, P. (1996) "Prosodie et reconnaissance de la parole", in MÉLONI, H. (Coord.) Fondements et Perspectives en Traitement Automatique de la Parole. Paris: Éditions AUPELF-UREF (Collection Universités Francophones). pp. 205-224.

PAGEL, V. (1999) De l’utilisation d’informations acoustiques suprasegmentales en reconnaissance de la parole continue. Doctoral thesis. Université Henri Poincaré, Nancy.
http://vincent.pagel.free.fr/THESE/

RUBIO AYUSO, A.J. - MILONE, D.H. (2002) "Información prosódica y acentual para el reconocimiento automático del habla", in DÍAZ GARCÍA, J. (Ed.) Actas del II Congreso de Fonética Experimental. Seville, 5-7 March 2001. Seville: Laboratorio de Fonética, Facultad de Filología, Universidad de Sevilla. pp. 56-77.

SHRIBERG, E.- STOLCKE, A.- HAKKANI-TÜR, D.- TÜR, G. (2000) "Prosody-based automatic segmentation of speech into sentences and topics", Speech Communication 32, 1-2: 127-154.

Vicsi, K., & Szaszák, G. (2010). Using prosody to improve automatic speech recognition. Speech Communication, 52(5), 413-426. doi:10.1016/j.specom.2010.01.003

VICSI, K. - SZASZÁK, G. (2005) "Automatic Segmentation of Continuous Speech on Word Level Based on Supra-segmental Features", International Journal of Speech Technology 8, 4: 363-370.
http://dx.doi.org/10.1007/s10772-006-8534-z

WAIBEL, A. (1986) "Suprasegmentals in very large vocabulary word recognition", in SCHWAB, E.C.- NUSBAUM, H.C. (Eds.) Pattern Recognition by Humans and Machines. Volume 1: Speech Perception. Orlando: Academic Press, Inc. pp. 159-186.

WAIBEL, A. (1988) Prosody and Speech Recognition. San Mateo, CA: Morgan Kaufmann.

ZEISSLER, V. - ADELHARDT, J. - BATLINER, A. - FRANK, C. - NÖTH, E. - SHI, R.P. - NIEMANN, H. (2006) "The prosody module", in WAHLSTER, W. (Ed.) SmartKom: Foundations of Multimodal Dialogue Systems. New York: Springer. pp. 139-152.

Phonological knowledge

CARSON-BERNDSEN, J. (1998) Time Map Phonology. Finite State Models and Event Logics in Speech Recognition. Dordrecht - Boston - London: Kluwer Academic Publishers (Text, Speech and Language Technology, 5).

COHEN, P.S.- MERCER, R.L. (1975) "The Phonological Component of an Automatic Speech-Recognition System", in REDDY, D.R. (Ed.) Speech Recognition. Invited Papers Presented at the 1974 IEEE Symposium. New York: Academic Press. pp. 275-319.

CHURCH, K.W. (1987) Phonological parsing in speech recognition. Boston: Kluwer Academic Publishers (Kluwer International Series in Engineering and Computer Science, SECS 38).

DENG, L. (1997) "Speech recognition using autosegmental representation of phonological units with interface to the trended HMM", Speech Communication 23, 3: 211-222.

GIACHIN, E.- ROSENBERG, A.E.- LEE, C.-H. (1991) "Word juncture modeling using phonological rules for HMM-based continuous speech recognition", Computer Speech and Language 5,2: 155-168.

HOEQUIST Jr., C.- NOLAN, F. (1991) "On an application of phonological knowledge in automatic speech recognition", Computer Speech and Language 5,2: 133-153.

OSHIKA, B.- ZUE, V.W.- WEEKS, R.V. - NEU, H.- AURBACH, J. (1975) "The Role of Phonological Rules in Speech Understanding Research", IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-23: 104-112.

PERENNOU, G.- BRIEUSSEL-POUSSE, L. (1998) "Phonological component in automatic speech recognition", in STRIK, H.- KESSENS, J.- WESTER, M. (Eds.) Proceedings of the Workshop Modeling Pronunciation Variation for Automatic Speech Recognition. Rolduc, 4-6 May 1998. pp. 91-96.

SENEFF, S.- WANG, C. (2005) "Statistical modeling of phonological rules through linguistic hierarchies", Speech Communication 46, 2: 204-216.
http://dx.doi.org/10.1016/j.specom.2005.03.005

SHOUP, J. E. (1980) "Phonological Aspects of Speech Recognition", in LEA, W.A. (Ed.) Trends in Speech Recognition. Englewood Cliffs: Prentice Hall. pp. 125-138.

Recognition of emotional speech

Albornoz, E. M., Milone, D. H., & Rufiner, H. L. (2010a). Multiple feature extraction and hierarchical classifiers for emotions recognition. In A. Esposito, N. Campbell, C. Vogel, A. Hussain, & A. Nijholt (Eds.), Development of multimodal interfaces: Active listening and synchrony. Second COST 2102 International Training School. Dublin, Ireland, March 23-27, 2009. Revised selected papers. (pp. 242-54). Berlin - Heidelberg: Springer. doi:10.1007/978-3-642-12397-9_20. Retrieved from http://fich.unl.edu.ar/test/sinc/sinc/sinc-publications/2010/AMR10a/sinc_AMR10a.pdf

Albornoz, E. M., Milone, D. H., & Rufiner, H. L. (2010b). Spoken emotion recognition using hierarchical classifiers. Computer Speech & Language, In Press, Accepted Manuscript. doi:10.1016/j.csl.2010.10.001

Ang, J., Dhillon, R., Krupski, A., Shriberg, E., & Stolcke, A. (2002). Prosody-Based automatic detection of annoyance and frustration in human-computer dialog. In ICSLP 2002 - interspeech 2002. Proceedings of the 7th international conference on spoken language processing. (pp. 2037-40). Denver, Colorado, USA, September 16-20, 2002. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.4027

Baber, C., Mellor, B., Graham, R., Noyes, J. M., & Tunley, C. (1996). Workload and the use of automatic speech recognition: The effects of time and resource demands. Speech Communication, 20(1-2), 37-54. doi:10.1016/S0167-6393(96)00043-X

Barra, R., Montero, J. M., Macías, J., D’Haro, L. F., San-Segundo, R., & Córdoba, R. (2006). Prosodic and segmental rubrics in emotion identification. In ICASSP 2006. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 1085-8). Toulouse, France, 14-19 May 2006. Retrieved from http://www-gth.die.upm.es/research/documentation/AG-39Pro-06.pdf

Burkhardt, F., Ajmera, J., Englert, R., Stegmann, J., & Burleson, W. (2006). Detecting anger in automated voice portal dialogs. In Interspeech 2006 - ICSLP. Proceedings of the 9th international conference on spoken language processing. Pittsburgh, PA, USA. September 17-21, 2006. Retrieved from http://felix.syntheticspeech.de/publications/recognitionOfAnger.pdf

El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 44(3). doi:10.1016/j.patcog.2010.09.020

Grimm, M., Kroschel, K., Mower, E., & Narayanan, S. (2007). Primitives-Based evaluation and estimation of emotions in speech. Speech Communication, 49(10-11), 787-800. doi:10.1016/j.specom.2007.01.010. Retrieved from http://asimov.usc.edu/~mower/Papers/GrimmSpeechComm2007.pdf

Hansen, J. H. L. (1996). Analysis and compensation of speech under stress and noise for environmental robustness in speech recognition. Speech Communication, 20(1-2), 151-173. doi:10.1016/S0167-6393(96)00050-7

Huber, R., Batliner, A., Buckow, J., Nöth, E., Warnke, V., & Niemann, H. (2000). Recognition of emotion in a realistic dialogue scenario. In ICSLP 2000. Proceedings of the 6th international conference on spoken language processing. (pp. 665-8). Beijing, China, October 16-20, 2000. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.6965

Kessous, L., Castellano, G., & Caridakis, G. (2010). Multimodal emotion recognition in speech-based interaction using facial expression, body gesture and acoustic analysis. Journal on Multimodal User Interfaces, 3(1), 33-48. doi:10.1007/s12193-009-0025-5. Retrieved from http://www.image.ntua.gr/papers/638.pdf

Kotti, M., Paternò, F., & Kotropoulos, C. (2010). Speaker-Independent negative emotion recognition. In CIP 2010. 2nd International workshop on cognitive information processing. (pp. 417-22). Elba, Italy. June 14-16, 2010. doi:10.1109/CIP.2010.5604091. Retrieved from http://giove.isti.cnr.it/attachments/publications/2010-A2-041.pdf

Laukka, P., Neiberg, D., Forsell, M., Karlsson, I., & Elenius, K. (2011). Expression of affect in spontaneous speech: Acoustic correlates and automatic detection of irritation and resignation. Computer Speech & Language, 25(1), 84-104. doi:10.1016/j.csl.2010.03.004

Litman, D. J., & Forbes-Riley, K. (2006). Recognizing student emotions and attitudes on the basis of utterances in spoken tutoring dialogues with both human and computer tutors. Speech Communication, 48(5), 559-590. doi:10.1016/j.specom.2005.09.008

López-Cózar, R., Silovsky, J., & Griol, D. (2010). Mejora del funcionamiento de sistemas de diálogo hablado mediante reconocimiento del estado emocional de usuarios. Procesamiento del Lenguaje Natural, 45, 191-198. Retrieved from http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/802

Luengo, I., & Navas, E. (2010). Feature analysis and evaluation for automatic emotion identification in speech. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 267-70). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VI/pdfs/Proceedings_FALA2010.pdf

Luengo, I., Navas, E., & Hernáez, I. (2010). Feature analysis and evaluation for automatic emotion identification in speech. IEEE Transactions on Multimedia, 12(6), 490-501. doi:10.1109/TMM.2010.2051872

Luengo, I., Navas, E., Hernáez, I., & Sánchez, J. (2005). Reconocimiento automático de emociones utilizando parámetros prosódicos. Procesamiento del Lenguaje Natural, 35, 13-20. Retrieved from http://www.sepln.org/revistaSEPLN/revista/35/02.pdf

Morrison, D., Wang, R., & De Silva, L. C. (2007). Ensemble methods for spoken emotion recognition in call-centres. Speech Communication, 49(2), 98-112. doi:10.1016/j.specom.2006.11.004

Neiberg, D., & Elenius, K. (2008). Automatic recognition of anger in spontaneous speech. In Interspeech 2008. Proceedings of the 9th annual conference of the international speech communication association. (pp. 2755-8). Brisbane, Australia. September 22-26, 2008. Retrieved from http://www.speech.kth.se/prod/publications/files/3189.pdf

Nogueiras, A., Moreno, A., Bonafonte, A., & Mariño, J. B. (2001). Speech emotion recognition using hidden Markov models. In Eurospeech 2001 Scandinavia. Proceedings of the 7th European conference on speech communication and technology, 2nd Interspeech event. (pp. 2679-82). Aalborg, Denmark, September 3-7, 2001. Retrieved from http://www.isca-speech.org/archive/eurospeech_2001/e01_2679.html

Origlia, A., Galatà, V., & Ludusan, B. (2010). Automatic classification of emotions via global and local prosodic features on a multilingual emotional database. In Speech prosody 2010. Fifth international conference on speech prosody. Chicago, Illinois, USA. May 11-14, 2010. Retrieved from http://speechprosody2010.illinois.edu/papers/100213.pdf

Oudeyer, P. Y. (2003). The production and recognition of emotions in speech: Features and algorithms. International Journal of Human-Computer Studies, 59, 157-183. doi:10.1016/S1071-5819(02)00141-6. Retrieved from http://pyoudeyer.com/emotionsIJHCS.pdf

Polzehl, T., Schmitt, A., & Metze, F. (2010). Approaching multi-lingual emotion recognition from speech - on language dependency of acoustic/prosodic features for anger recognition. In Speech prosody 2010. Fifth international conference on speech prosody. Chicago, Illinois, USA. May 11-14, 2010. Retrieved from http://speechprosody2010.illinois.edu/papers/100442.pdf

Ranganath, R., Jurafsky, D., & McFarland, D. A. (2013). Detecting friendly, flirtatious, awkward, and assertive speech in speed-dates. Computer Speech & Language, 27(1), 89-115. doi:10.1016/j.csl.2012.01.005

Sidorova, J. (2009). Optimization techniques for speech emotion recognition. PhD Thesis, Departament de Traducció i Ciències del Llenguatge, Universitat Pompeu Fabra. Retrieved from http://hdl.handle.net/10803/7575

Sidorova, J., & Badia, T. (2008). ESEDA: Tool for enhanced speech emotion detection and analysis. Procesamiento del Lenguaje Natural, 41, 307-308. Retrieved from http://www.sepln.org/revistaSEPLN/revista/41/demo9.pdf

ten Bosch, L. (2003). Emotions, speech and the ASR framework. Speech Communication, 40(1-2), 213-225. doi:10.1016/S0167-6393(02)00083-3. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.132.4047&rep=rep1&type=pdf

Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features, and methods. Speech Communication, 48(9), 1162-1181. doi:10.1016/j.specom.2006.04.003. Retrieved from http://poseidon.csd.auth.gr/papers/PUBLISHED/JOURNAL/pdf/Ververidis06a.pdf

Womack, B. D., & Hansen, J. H. L. (1996). Classification of speech under stress using target driven features. Speech Communication, 20(1-2), 131-150. doi:10.1016/S0167-6393(96)00049-0

Prosody and emotions

Synthesis of emotional speech

Emotions in spoken language systems

Speech recognition products and applications

ALIPRANDI, C. - VERRUSO, F. (2006) "Tecnologie del Linguaggio Naturale e sottotitolazione multilingue diretta. Lo stato dell’arte in Italia e l’esperienza dei Campionati Intersteno", inTRAlinea. Special issue on Respeaking.
http://www.intralinea.it/specials/respeaking/eng_more.php?id=453_0_41_0_M

Barnard, E., Schalkwyk, J., van Heerden, C., & Moreno, P. J. (2010). Voice search for development. In Interspeech 2010. Proceedings of the 11th annual conference of the international speech communication association. Makuhari, Chiba, Japan. September 26-30, 2010. Retrieved from http://www.isca-speech.org/archive/interspeech_2010/i10_0282.html

BERTON, A. - KALTENMEIER, A. - HAIBER, U. - SCHREINER, O. (2006) "Speech recognition", in WAHLSTER, W. (Ed.) SmartKom: Foundations of Multimodal Dialogue Systems. New York: Springer. pp. 85-108.

Cardenal, A., Peso, P., Bueno, M., Espiña, A., Rodríguez Silva, D. A., Adkinson, L., & Pellitero, A. (2010). TACOMA: On-Line transcription of audiovisual material. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 239-42). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VI/pdfs/Proceedings_FALA2010.pdf

CERF-DANON, H.- DeGENNARO, S.- FERRETI, M.- GONZÁLEZ, J.- KEPPEL, E. (1991) "Tangora - a large vocabulary speech recognition system for five languages", in Eurospeech’91. 2nd European Conference on Speech Communication and Technology. Genova, Italy, 24-26 September 1991. Vol. 1, pp. 183-192.

CHELBA, C. - SILVA, J. - ACERO, A. (2007) "Soft indexing of speech content for search in spoken documents", Computer Speech and Language 21, 3: 458-478.
http://dx.doi.org/10.1016/j.csl.2006.09.001

CÓRDOBA, R.- MACÍAS, J.- SAMA, V.- BARRA, R.- PARDO, J.M. (2005) "New advances in cross-task and speaker adaptation for air traffic control tasks", Procesamiento del Lenguaje Natural (Actas del XXI Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Universidad de Granada, 14-16 September 2005), no. 35: 21-28.
http://www-gth.die.upm.es/research/documentation/AI-90New-05.pdf

Delgado, H., Serrano, J., & Carrabina, J. (2010). Automatic metadata extraction from spoken content using speech and speaker recognition techniques. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 201-4). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VI/pdfs/Proceedings_FALA2010.pdf

DEMEDTS, A. (1993) "Un sistema de reconocimiento del español con un léxico de 30.000 unidades", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 13: 435-437.

DIÉGUEZ, F.J.- GARCÍA, C.- CARDENAL, A. (2005) "Comparación de modelos de lenguaje para la transcripción automática de noticiarios televisivos", Procesamiento del Lenguaje Natural (Actas del XXI Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Universidad de Granada, 14-16 September 2005), no. 35: 269-276.

DUGAST, Ch.- AUBERT, X.- KNESER, R. (1995) "The Philips Large-Vocabulary Recognition System for American English, French and German", in Eurospeech’95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol. 1, pp. 197-200.

FLETCHER, R. (1997) "First Impressions of ViaVoice, Continuous Dictation Software from IBM", Translation Journal 2, 1.
http://translationjournal.net/journal//02dict1.htm

García Mateo, C., Diéguez, J., Docío, L., & Cardenal, A. (2004). Transcrigal: A bilingual system for automatic indexing of broadcast news. In LREC 2004. Proceedings of the 4th international conference on language resources and evaluation. Lisbon, Portugal. May 24-30, 2004. Retrieved from http://www.lrec-conf.org/proceedings/lrec2004/summaries/382.htm

GONZÁLEZ, J.- MACÍAS, J.- PALMA, M.A.- PALOU, F.- TROS DE ILARDUYA, M. (1992) "Tangora/E, un reconocedor del habla para el castellano", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 12.

GRIMES, B. (1997) "Voice Recognition Software: Naturally Speaking from Dragon Systems", Translation Journal 2, 1.
http://translationjournal.net/journal//02dict2.htm

HAEB-UMBACH, R.- GAMM, S. (1995) "Human Factors of a Voice-Controlled Car Stereo", in Eurospeech’95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol. 2, pp. 1453-1456.

HAIN, T.- WOODLAND, P.C.- EVERMANN, G.- GALES, M.J.F.- LIU, X.- MOORE, G.L.- POVEY, D. (2005) "Automatic transcription of conversational telephone speech", IEEE Transactions on Speech and Audio Processing 13, 6: 1173-1185.
http://dx.doi.org/10.1109/TSA.2005.852999

HAUPTMANN, A. (2006) "Automatic spoken document retrieval", in BROWN, K. (Ed.) Encyclopedia of Language & Linguistics. Amsterdam: Elsevier. pp. 95-103.
http://dx.doi.org/10.1016/B0-08-044854-2/00922-6

HUANG, X.- ALLEVA, F.- HON, H.-W.- HWANG, M.-Y.- LEE, K.-F.- ROSENFELD, R. (1993) "The SPHINX-II speech recognition system: an overview", Computer Speech and Language 7,2: 137-148.

Hughes, T., Nakajima, K., Ha, L., Vasu, A., Moreno, P., & LeBeau, M. (2010). Building transcribed speech corpora quickly and cheaply for many languages. In Interspeech 2010. Proceedings of the 11th annual conference of the international speech communication association. (pp. 1914-7). Makuhari, Chiba, Japan. September 26-30, 2010. Retrieved from http://www.isca-speech.org/archive/interspeech_2010/i10_1914.html

HUNT, M.J. (1998) "Practical Automatic Dictation Systems", The ELRA Newsletter 3,1: 4-7

LAMBERT, E. (1991) "La máquina de escribir con entrada vocal", in VIDAL BENEYTO, J. (Ed.) Las industrias de la lengua. Translated by M. Alvar et al. Salamanca / Madrid: Fundación Sánchez Ruipérez / Pirámide (Biblioteca del Libro, 5). pp. 455-461.

LAMBOURNE, A.- HEWITT, J.- LYON, C.- WARREN, S. (2004) "Speech-based real-time subtitling services", International Journal of Speech Technology 7, 4: 269-279.
http://dx.doi.org/10.1023/B:IJST.0000037071.39044.cc

LEE, K.F. (1989) Automatic Speech Recognition. The Development of the SPHINX System. Dordrecht: Kluwer.

MANDEL, M.A. (1992) "A commercial large-vocabulary discrete speech recognition system: Dragon Dictate", Language and Speech 35, 1-2: 237-246.

MEISEL, W.S. (1986) "Towards the ‘Talkwriter’", in BRISTOW, G. (Ed.) Electronic Speech Recognition. Techniques, Technology and Applications. London: Collins. pp. 338-348.

Moreno, A. (2010). Information search engine for multilingual audiovisual content: BUCEADOR. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 259-62). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VI/pdfs/Proceedings_FALA2010.pdf

NÉEL, F.- CHOLLET, G.- LAMEL, L.- MINKER, W.- CONSTANTINESCU, A. (1996) "Reconnaissance et compréhension de la parole: évaluation et applications", in MÉLONI, H. (Coord.) Fondements et Perspectives en Traitement Automatique de la Parole. Paris: Éditions AUPELF-UREF (Collection Universités Francophones). pp. 331-367.

PEA, E. - CANNAROZZO, L. (2006) "Considerazione sull’uso del Via Voice alla RTSI", inTRAlinea. Special issue on Respeaking.
http://www.intralinea.it/specials/respeaking/eng_more.php?id=486_0_41_0_M

POZA LARA, M.J.- VILLARRUBIA GRANDE, L.- SILES SÁNCHEZ, J.A. (1991) "Teoría y aplicaciones del reconocimiento automático del habla", Comunicaciones de Telefónica I+D 3.

Schalkwyk, J., Beeferman, D., Beaufays, F., Byrne, B., Chelba, C., Cohen, M., . . . Strope, B. (2010). “Your word is my command”: Google search by voice: A case study. In A. Neustein (Ed.), Advances in speech recognition. Mobile environments, call centers and clinics. (pp. 61-90). New York: Springer. doi:10.1007/978-1-4419-5951-5_4. Retrieved from http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36340.pdf

Schmitt, A., Zaykovskiy, D., & Minker, W. (2009). Speech recognition for mobile devices. International Journal of Speech Technology, 11(2), 63-72.

Schuster, M. (2010). Speech recognition for mobile devices at Google. In B. T. Zhang & M. Orgun (Eds.), PRICAI 2010: Trends in artificial intelligence. 11th Pacific Rim international conference on artificial intelligence, Daegu, Korea, August 30 - September 2, 2010. Proceedings. (pp. 8-10). Berlin - Heidelberg: Springer. doi:10.1007/978-3-642-15246-7_3. Retrieved from

STEINBISS, V.- NEY, H.- ESSEN, U.- TRAN, B.-H.- AUBERT, X.- DUGAST, C.- KNESER, R.- MEIER, H.-G.- OERDER, R.- HAEB-UMBACH, R.- GELLER, D.- HÖLLERBAUER, W.- BARTOSIK, H. (1995) "Continuous speech dictation - From theory to practice", Speech Communication 17, 1-2: 19-38.

TAPIAS MERINO, D. (1999) "Sistemas de reconocimiento de voz en las telecomunicaciones", in GÓMEZ GUINOVART, J.- LORENZO SUÁREZ, A.- PÉREZ GUERRA, J.- ÁLVAREZ LUGRÍS, A. (Eds.) Panorama de la investigación en lingüística informática. RESLA, Revista Española de Lingüística Aplicada, special issue. pp. 83-102.

Varona, A., Rodríguez Fuentes, L. J., Penagarikano, M., Nieto, S., Diez, M., & Bordel, G. (2010). Search and access to information contained in the speech of multimedia resources. Procesamiento del Lenguaje Natural, 45, 317-318. Retrieved from http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/787

VILLARRUBIA GRANDE, L.- CORTÁZAR MÚGICA, I.- HERNÁNDEZ GÓMEZ, L.- LÓPEZ GONZALO, E. (2001) "Reconocimiento de voz en el entorno de las nuevas redes de comunicación UMTS e Internet", Comunicaciones de Telefónica I+D 23: 99-112.

VIVER, X. (2005) "Philips: Intelligent Speech Interpretation - la tecnología inteligente de reconocimiento de voz", Procesamiento del Lenguaje Natural (Actas del XXI Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Universidad de Granada, 14-16 September 2005), no. 35: 459-460.

Xiong, W., Wu, L., Alleva, F., Droppo, J., Huang, X., & Stolcke, A. (2017). The Microsoft 2017 conversational speech recognition system (Microsoft Technical Report No. MSR-TR-2017-39). Redmond, WA. Retrieved from https://arxiv.org/abs/1708.06073

Zelenák, M., Schulz, H., & Hernando, J. (2010). Albayzín 2010 evaluation campaign: Speaker diarization. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 301-4). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VI/pdfs/Proceedings_FALA2010.pdf

Speech recognition evaluation and assessment

Burger, S., Sloane, Z. A., & Yang, J. (2006). Competitive evaluation of commercially available speech recognizers in multiple languages. In LREC 2006. Proceedings of the 5th International Conference on Language Resources and Evaluation. Genoa, Italy. May 24-26, 2006. Retrieved from http://www.cs.brandeis.edu//~marc/misc/proceedings/lrec-2006/pdf/802_pdf.pdf

Castillo Condado, O. (1999). Evaluación de un reconocedor fonético para el español hablado en México. Undergraduate (Licenciatura) thesis, Universidad de Las Américas, Puebla, Mexico. Retrieved from http://catarina.udlap.mx/u_dl_a/tales/documentos/lis/castillo_c_o/

de Yzaguirre, L. (2000). Evaluación comparativa de dos sistemas comerciales de reconocimiento de voz. In I jornadas en tecnología del habla. Sevilla: Universidad de Sevilla - Universidad de Granada - Red Temática en Tecnologías del Habla. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/I/ACTAS.zip

Devine, E. G., Gaehde, S. A., & Curtis, A. C. (2000). Comparative evaluation of three continuous speech recognition software packages in the generation of medical reports. Journal of the American Medical Informatics Association, 7(5), 462-468. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC79041/pdf/0070462.pdf

Doddington, G. R., Liggett, W., Martin, A. F., Przybocki, M., & Reynolds, D. A. (1998). SHEEP, GOATS, LAMBS and WOLVES. A statistical analysis of speaker performance in the NIST 1998 Speaker Recognition Evaluation. In ICSLP 1998. Proceedings of the 5th International Conference on Spoken Language Processing (pp. 1351-1354). Sydney Convention Centre, Sydney, Australia, 30 November - 4 December, 1998. Retrieved from http://www.isca-speech.org/archive/icslp_1998/i98_0608.html

Furui, S. (2007). Speech and speaker recognition evaluation. In L. Dybkjaer, H. Hemsen, & W. Minker (Eds.), Evaluation of text and speech systems. (pp. 1-28). Dordrecht: Springer. doi:10.1007/978-1-4020-5817-2_1

Gibbon, D., Moore, R., & Winski, R. (Eds.). (1998). Assessment of recognition systems. In Spoken language system assessment. (pp. 67-93). Berlin - New York: Mouton de Gruyter.

Goldwater, S., Jurafsky, D., & Manning, C. D. (2010). Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates. Speech Communication, 52(3), 181-200. doi:10.1016/j.specom.2009.10.001

Hutchinson, B. (2001). A functional approach to speech recognition evaluation. In Eurospeech 2001 Scandinavia. Proceedings of the 7th European Conference on Speech Communication and Technology, 2nd Interspeech Event. (pp. 1683-6). Aalborg, Denmark, September 3-7, 2001. Retrieved from http://perso.telecom-paristech.fr/~chollet/Biblio/Congres/Audio/Eurospeech01/CDROM/papers/page1683.pdf

Lamel, L., Minker, W., & Paroubek, P. (2000). Toward best practice in the development and evaluation of speech recognition components of a spoken language dialogue system. Natural Language Engineering, 6(3-4), 305-322. Retrieved from https://perso.limsi.fr/Individu/pap/nle99.ps

López Gambino, M. S. (2012). An evaluation of automatic speech recognition in the Spanish version of Windows 7: Effects of language variety, speaking style and gender. BULAG, Bulletin de Linguistique Appliquée et Générale, 37, 97-116.

Mangold, H. (1989). Assessment of speech recognizers in public information and ordering systems. In Proceedings of the ESCA tutorial and research workshop on Speech Input / Output Assessment and Speech Databases. (pp. 37-58). Noordwijkerhout, The Netherlands. September 20-23, 1989. Retrieved from http://www.isca-speech.org/archive_open/sioa_89/sia_1037.html

Moore, R. K. (1989). Assessment of speech input systems. In Proceedings of the ESCA tutorial and research workshop on Speech Input / Output Assessment and Speech Databases. (pp. 27-32). Noordwijkerhout, The Netherlands. September 20-23, 1989. Retrieved from http://www.isca-speech.org/archive_open/sioa_89/sia_1027.html

Néel, F., Chollet, G., Lamel, L., Minker, W., & Constantinescu, A. (1996). Reconnaissance et compréhension de la parole: Évaluation et applications. In H. Méloni (Ed.), Fondements et perspectives en traitement automatique de la parole. (pp. 331-67). Paris: Éditions AUPELF-UREF.

Pallett, D. S. (1985). Performance assessment of automatic speech recognizers. Journal of Research of the National Bureau of Standards, 90(5), 371-387. Retrieved from http://nvlpubs.nist.gov/nistpubs/jres/090/5/V90-5.pdf

Pallett, D. S. (1986). Assessing the performance of speech recognisers. In G. Bristow (Ed.), Electronic speech recognition. Techniques, technology and applications. (pp. 277-306). London: Collins.

Pallett, D. S. (1989). Speech input assessment using benchmark tests: Procedures, advantages and limitations. In Proceedings of the ESCA tutorial and research workshop on Speech Input / Output Assessment and Speech Databases. (pp. 33-6). Noordwijkerhout, The Netherlands. September 20-23, 1989. Retrieved from http://www.isca-speech.org/archive_open/sioa_89/sia_1033.html

Pallett, D. S., & Fourcin, A. (1996). Speech input: Assessment and evaluation. In R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, & V. Zue (Eds.), Survey of the state of the art in human language technology. (pp. 495-9). Cambridge: Cambridge University Press. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.366.9300

Paulus, E. (2000). Some guidelines for the evaluation of approaches to automatic speech recognition. In W. F. Sendlmeier (Ed.), Speech and signals. Aspects of speech synthesis and automatic speech recognition. Dedicated to Wolfgang Hess on his 60th birthday. (pp. 129-39). Frankfurt am Main: Hector.

Serrahima, L. (2009). Reconocimiento de voz de Windows Vista: ¿Mejor, igual o peor que Dragon Naturally Speaking? Panace@, 10(29), 76-79. Retrieved from http://www.medtrad.org/panacea/IndiceGeneral/n29_tribuna-Serrahima2.pdf

Steeneken, H. J. M., & Varga, A. (1993). Assessment for automatic speech recognition: I. Comparison of assessment methods. Speech Communication, 12(3), 241-246. doi:10.1016/0167-6393(93)90094-2

Tatman, R., & Kasten, C. (2017). Effects of talker dialect, gender & race on accuracy of Bing Speech and YouTube automatic captions. In Interspeech 2017. Proceedings of the 18th Annual Conference of the International Speech Communication Association (pp. 934-938). Stockholm, Sweden. 20-24 August, 2017. doi:10.21437/Interspeech.2017-1746

Yao, X., Bhutada, P., Georgila, K., Sagae, K., Artstein, R., & Traum, D. (2010). Practical evaluation of speech recognisers for virtual human dialogue systems. In LREC 2010. Proceedings of the 7th International Conference on Language Resources and Evaluation. Valletta, Malta. 17-23 May, 2010. Retrieved from http://www.lrec-conf.org/proceedings/lrec2010/pdf/675_Paper.pdf

Young, S. J., & Chase, L. L. (1998). Speech recognition evaluation: A review of the U.S. CSR and LVCSR programmes. Computer Speech & Language, 12(4), 263-279. doi:10.1006/csla.1998.0101

Young, S. J., Adda-Dekker, M., Aubert, X., Dugast, C., Gauvain, J. L., Kershaw, D. J., . . . Woodland, P. C. (1997). Multilingual large vocabulary speech recognition: The European SQALE project. Computer Speech & Language, 11(1), 73-89. doi:10.1006/csla.1996.0023

Speaker recognition

Adami, A. G. (2007). Modeling prosodic differences for speaker recognition. Speech Communication, 49(4), 277-291. doi:10.1016/j.specom.2007.02.005

André-Obrecht, R. (Ed.). (2000). Speaker recognition and its commercial and forensic applications. Speech Communication, 31(2-3).

Beigi, H. (2011). Fundamentals of speaker recognition. Boston, MA: Springer US. doi:10.1007/978-0-387-77592-0

Beigi, H. (2012). Speaker recognition: Advancements and challenges. In J. Yang & S. J. Xie (Eds.), New trends and developments in biometrics (pp. 3-29). Rijeka - New York - Shanghai: InTech. doi:10.5772/52023

Bimbot, F., Hutter, H. P., Jaboulet, C., Koolwaaij, J., Lindberg, J., & Pierrot, J. B. (1998). An overview of the CAVE project research activities in speaker verification. In Proceedings of RLA2C, Speaker Recognition and its Commercial and Forensic Applications (pp. 215-220). Avignon, France. April, 1998. Retrieved from https://www.ubilab.org/publications/print_versions/pdf/bim98.pdf

Bimbot, F., Blomberg, M., Boves, L., Chollet, G., Jaboulet, C., Jacob, B., . . . Mokbel, H. (1999). An overview of the Picasso project research activities in speaker verification for telephone application. In Eurospeech 1999. Proceedings of the 6th European Conference on Speech Communication and Technology. Budapest, Hungary. 5-9 September, 1999. Retrieved from http://www.speech.kth.se/prod/publications/files/1185.pdf

Bimbot, F., Blomberg, M., Boves, L., Genoud, D., Hutter, H.-P., Jaboulet, C., . . . Pierrot, J.-B. (2000). An overview of the CAVE project research activities in speaker verification. Speech Communication, 31(2-3), 155-180. doi:10.1016/S0167-6393(99)00076-X

Bimbot, F., Hutter, H. P., Jaboulet, C., Koolwaaij, J., Lindberg, J., & Pierrot, J. B. (1997). Speaker verification in the telephone network: Research activities in the CAVE project. In Eurospeech 1997. Proceedings of the 5th European Conference on Speech Communication and Technology (pp. 971-974). Rhodes, Greece. 22-25 September, 1997. Retrieved from https://infoscience.epfl.ch/record/82425/files/jaboulet97.pdf

Bonastre, J.-F., Bimbot, F., Boë, L.-J., Campbell, J. P., Reynolds, D. A., & Magrin-Chagnolleau, I. (2003). Person authentication by voice: A need for caution. In Eurospeech 2003 - Interspeech 2003. Proceedings of the 8th European Conference on Speech Communication and Technology (pp. 33-36). Geneva, Switzerland. 1-4 September, 2003. Retrieved from http://www.isca-speech.org/archive/eurospeech_2003/e03_0033.html

Bourlard, H., & Morgan, N. (1998). Speaker verification. A quick overview. Martigny: Dalle Molle Institute for Perceptual Artificial Intelligence. Retrieved from http://publications.idiap.ch/downloads/reports/1998/98-12.pdf

Bricker, P. D., & Pruzansky, S. (1976). Speaker recognition. In N. J. Lass (Ed.), Contemporary issues in experimental phonetics (pp. 295-326). New York: Academic Press.

Campbell, J. P., Mason, J., & Ortega, J. (Eds.). (2006). Odyssey 2004: The Speaker and Language Recognition Workshop (Odyssey-04). Toledo, Spain, 31 May - 3 June 2004. Computer Speech & Language, 20(2-3).

Cerdà, R., Farrús, M., & Hernando, J. (2005). Hacia una sinergia metodológica en la identificación de locutores. In Filología y lingüística. Estudios ofrecidos a Antonio Quilis (pp. 1515-1528). Madrid: Consejo Superior de Investigaciones Científicas - Universidad Nacional de Educación a Distancia - Universidad de Valladolid.

Chollet, G. (1994). Automatic speech and speaker recognition: Overview, current issues and perspectives. In E. Keller (Ed.), Fundamentals of speech synthesis and speech recognition. Basic concepts, state of the art and future challenges (pp. 129-148). Chichester: John Wiley & Sons.

Cosi, P. (1982). Speaker recognition: A survey. In J. P. Haton (Ed.), Automatic speech analysis and recognition (pp. 277-308). Dordrecht: Reidel.

Dankovičová, J., & Nolan, F. (1999). Some acoustic effects of speaking style on utterances for automatic speaker verification. Journal of the International Phonetic Association, 29(2), 115-128. doi:10.1017/S0025100300006496

Doddington, G. R. (1985). Speaker recognition - Identifying people by their voices. Proceedings of the IEEE, 73(11), 1651-1664. doi:10.1109/PROC.1985.13345

Escudero, D., Cardeñoso, V., Sánchez, J. M., Navas, E., & Hernáez, I. (2003). Uso de entonación en reconocimiento automático de locutor: Resultados preliminares. In J. Hernando (Ed.), SEAF 2003. Actas del II Congreso de la Sociedad Española de Acústica Forense (pp. 167-174). Barcelona: Universitat Politècnica de Catalunya - Sociedad Española de Acústica Forense. Retrieved from http://www.infor.uva.es/~descuder/investig/pdfs/SEAF2003.pdf

Farrús, M., Hernando, J., & Ejarque, P. (2007). Jitter and shimmer measurements for speaker recognition. In Interspeech 2007. Proceedings of the 8th Annual Conference of the International Speech Communication Association (pp. 778-781). Antwerp, Belgium. 27-31 August, 2007. Retrieved from http://www.cs.upc.edu/~nlp/papers/far_jit_07.pdf

Faúndez, M., & Satué, A. (2006). Speaker recognition experiments on a bilingual database. In L. Buera, E. Lleida, A. Miguel, & A. Ortega (Eds.), IV Jornadas en Tecnología del Habla (pp. 261-264). Zaragoza: Universidad de Zaragoza - Red Temática en Tecnologías del Habla. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/IV/4jth_cdrom.html

Fernández Pozo, R., Fombella, C., Torre, D., López Gonzalo, E., & Hernández Gómez, L. A. (2006). Estudio del uso de información prosódica en reconocimiento de locutor en ámbito forense. In L. Buera, E. Lleida, A. Miguel, & A. Ortega (Eds.), IV Jornadas en Tecnología del Habla (pp. 343-348). Zaragoza: Universidad de Zaragoza - Red Temática en Tecnologías del Habla. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/IV/4jth_cdrom.html

Furui, S. (1986). Research of individuality features in speech waves and automatic speaker recognition techniques. Speech Communication, 5(2), 183-197. doi:10.1016/0167-6393(86)90007-5

Furui, S. (1996). An overview of speaker recognition technology. In C.-H. Lee, F. K. Soong, & K. K. Paliwal (Eds.), Automatic speech and speaker recognition (pp. 31-56). Dordrecht: Kluwer.

Furui, S. (1997). Recent advances in speaker recognition. Pattern Recognition Letters, 18(9), 859-872. doi:10.1016/S0167-8655(97)00073-1

Furui, S. (1997). Speaker recognition. In R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, & V. Zue (Eds.), Survey of the state of the art in Human Language Technology (pp. 42-48). Cambridge: Cambridge University Press. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.366.9300

Furui, S. (2006). Speaker recognition and verification, Automatic. In K. Brown (Ed.), Encyclopedia of language & linguistics (pp. 619-628). Amsterdam: Elsevier. doi:10.1016/B0-08-044854-2/00919-6

Furui, S. (2007). Speech and speaker recognition evaluation. In L. Dybkjaer, H. Hemsen, & W. Minker (Eds.), Evaluation of text and speech systems (pp. 1-27). Dordrecht: Springer Netherlands. doi:10.1007/978-1-4020-5817-2_1

Furui, S., & Rosenberg, A. E. (2010). Speaker verification. In V. K. Madisetti (Ed.), The digital signal processing handbook. Video, speech and audio signal processing and associated standards (2nd ed., pp. 10-23). Boca Raton, FL: CRC Press.

Garvin, P. L., & Ladefoged, P. (1963). Speaker identification and message identification in speech recognition. Phonetica, 9(4), 193-199. doi:10.1159/000258404

Greenberg, C. S., Martin, A. F., Doddington, G. R., & Godfrey, J. J. (2011). Including human expertise in speaker recognition systems: report on a pilot evaluation. In ICASSP 2011. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 5896-5899). Prague, Czech Republic. 22-27 May, 2011. doi:10.1109/ICASSP.2011.5947703

Hernández, L. A., Casajús, F. J., & García Gómez, R. (1984). Identificación de personas por sus voces. Mundo Electrónico, 146, 83-91.

Hernando, J., García Mateo, C., Rodríguez Liñares, L., González Rodríguez, J., & Ortega, J. (2000). Reconocimiento del locutor en telefonía: actividades del proyecto europeo COST 250. In SEAF 2000. Actas del I Congreso de la Sociedad Española de Acústica Forense (pp. 145-148). Madrid: Escuela Universitaria de Ingenieros Técnicos de Telecomunicación - Sociedad Española de Acústica Forense. Retrieved from http://nlp.lsi.upc.edu/papers/her00d.pdf

Karlsson, I., Bänziger, T., Dankovičová, J., Johnstone, T., Lindberg, J., Melin, H., . . . Scherer, K. (2000). Speaker verification with elicited speaking styles in the VeriVox project. Speech Communication, 31(2-3), 121-129. doi:10.1016/S0167-6393(99)00073-4

Laver, J., Jack, M., & Gardiner, A. (Eds.). (1990). Proceedings of the tutorial and research workshop on Speaker Characterization in Speech Technology. Edinburgh: Centre for Speech Technology Research, University of Edinburgh - ESCA, European Speech Communication Association.

Leung, K. Y., Mak, M. W., Siu, M. H., & Kung, S. Y. (2006). Adaptive articulatory feature-based conditional pronunciation modeling for speaker verification. Speech Communication, 48(1), 71-84. doi:10.1016/j.specom.2005.05.013

Lindberg, J., Blomberg, M., & Melin, H. (1997). CAVE - Speaker verification in bank and telecom services. Phonum (Proceedings of Fonetik -97, Department of Phonetics, Umeå University), 4, 65-68. Retrieved from http://www.speech.kth.se/prod/publications/files/543.ps

Minematsu, N., Sekiguchi, M., & Hirose, K. (2002). Automatic estimation of one’s age with his/her speech based upon acoustic modeling techniques of speakers. In ICASSP 2002. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (Vol. 1, pp. I-137-I-140). Orlando, FL, USA. 13-17 May, 2002. doi:10.1109/ICASSP.2002.5743673

Minematsu, N., Sekiguchi, M., & Hirose, K. (2002). Performance improvement in estimating subjective ageness with prosodic features. In Speech Prosody 2002. Proceedings of the 1st International Conference on Speech Prosody. Aix-en-Provence, France. 11-13 April, 2002. Retrieved from http://www.isca-speech.org/archive/sp2002/sp02_507.html

Montero, A., González Domínguez, J., Ramos, D., López Moreno, I., Torre, D., & González Rodríguez, J. (2006). On the use of high-level information in speaker and language recognition. In L. Buera, E. Lleida, A. Miguel, & A. Ortega (Eds.), IV Jornadas en Tecnología del Habla (pp. 355-360). Zaragoza: Universidad de Zaragoza - Red Temática en Tecnologías del Habla. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/IV/4jth_cdrom.html

Ortega, J., Cruz, S., & González Rodríguez, J. (1998). Quantitative influence of speech variability factors for automatic speaker verification in forensic tasks. In ICSLP 1998. Proceedings of the 5th International Conference on Spoken Language Processing. Sydney, Australia. 30 November - 4 December, 1998.

Ortega, J., González Rodríguez, J., Marrero, V., Díaz, J. J., García, R., Lucena, J., & Sánchez Molero, J. A. G. (1998). AHUMADA: a large speech corpus in Spanish for speaker identification and verification. In ICASSP 1998. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (Vol. 2, pp. 773-776). Seattle, WA, USA. 12-15 May, 1998. doi:10.1109/ICASSP.1998.675379

Ortega, J., González Rodríguez, J., Marrero, V., Díaz, J., García, R., Lucena, J., & Sánchez, J. A. G. (1998). Speaker recognition-oriented "Ahumada" large speech corpus. In LREC 1998. Proceedings of the 1st International Conference on Language Resources and Evaluation (Vol. 2, pp. 1101-1106). Granada, Spain. 28-30 May, 1998.

Ortega, J., González Rodríguez, J., & Marrero, V. (2000). AHUMADA: A large speech corpus in Spanish for speaker characterization and identification. Speech Communication, 31(2-3), 255-264. doi:10.1016/S0167-6393(99)00081-3

Ortega, J., González Rodríguez, J., & Tapias, D. (2000). Consistencia fonética del español en sistemas de verificación de locutor sobre locuciones de corta duración tipo PIN. In SEAF 2000. Actas del I Congreso de la Sociedad Española de Acústica Forense (pp. 199-206). Madrid: Escuela Universitaria de Ingenieros Técnicos de Telecomunicación - Sociedad Española de Acústica Forense.

Reynolds, D. A. (2002). An overview of automatic speaker recognition technology. In ICASSP 2002. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4072-4075). Orlando, FL, USA. 13-17 May, 2002. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86.3003&rep=rep1&type=pdf

Rodríguez, L., Docío, L., & García Mateo, C. (1998). Panorámica de la tecnología en reconocimiento automático de locutores. Novática. Revista de la Asociación de Técnicos de Informática, 133, 36-40.

Rose, P. (2006). Technical forensic speaker recognition: Evaluation, types and testing of evidence. Computer Speech & Language, 20(2-3), 159-191. doi:10.1016/j.csl.2005.07.003

Rosenberg, A. E. (1976). Automatic speaker verification: A review. Proceedings of the IEEE, 64(4), 475-487. doi:10.1109/PROC.1976.10156

Sharma, V., & Bansal, P. K. (2013). A review on speaker recognition approaches and challenges. International Journal of Engineering Research and Technology, 2(5), 1581-1588. Retrieved from http://www.ijert.org/view-pdf/3594/a-review-on-speaker-recognition-approaches-and-challenges

Shriberg, E. (2007). Higher level features in speaker recognition. In C. Müller (Ed.), Speaker classification I. Fundamentals, features, and methods (pp. 241-259). Berlin - Heidelberg - New York: Springer. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.68.504

Shriberg, E., Ferrer, L., Kajarekar, S., Venkataraman, A., & Stolcke, A. (2005). Modeling prosodic feature sequences for speaker recognition. Speech Communication, 46(3-4), 455-472. doi:10.1016/j.specom.2005.02.018

Sutherland, A., & Jack, M. (1988). Speaker verification. In M. Jack & J. Laver (Eds.), Aspects of speech technology (pp. 184-215). Edinburgh: Edinburgh University Press.

Weber, F., Manganaro, L., Peskin, B., & Shriberg, E. (2002). Using prosodic and lexical information for speaker identification. In ICASSP 2002. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. I-141-I-144). Orlando, FL, USA. 13-17 May, 2002. doi:10.1109/ICASSP.2002.5743674

Wolf, J. J. (1972). Efficient acoustic parameters for speaker recognition. The Journal of the Acoustical Society of America, 51(6B), 2044-2056. doi:10.1121/1.1913065

Language identification

Adda-Decker, M., Antoine, F., Boula de Mareüil, P., Vasilescu, I., Lamel, L., Vaissière, J., . . . Liénard, J.-S. (2003). Phonetic knowledge, phonotactics and perceptual validation for automatic language identification. In ICPhS 2003. Proceedings of the 15th International Congress of Phonetic Sciences (pp. 747-750). Barcelona, Spain. 3-9 August 2003. Retrieved from https://perso.limsi.fr/Individu/mareuil/publi/PS021162.pdf

Antoine, F., Zhu, D., Boula de Mareüil, P., & Adda-Decker, M. (2004). Approches segmentales multilingues pour l’identification automatique de la langue: phones et syllabes. In JEP 2004. 25èmes Journées d’Etudes sur la Parole. Fès, Maroc. 19-22 avril, 2004. Retrieved from https://perso.limsi.fr/Individu/mareuil/publi/Antoine-Zhu-etal.pdf

Barkat-Defradas, M., Vasilescu, I., & Pellegrino, F. (2003). Stratégies perceptuelles et identification automatique des langues: application au continuum dialectal arabe. Revue Parole, 25-26, 1-44. Retrieved from http://www.ddl.ish-lyon.cnrs.fr/fulltext/pellegrino/barkat-defradas_2003_parole.pdf

Muthusamy, Y. K., & Spitz, L. (1997). Automatic language identification. In R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, & V. Zue (Eds.), Survey of the state of the art in Human Language Technology (pp. 314-317). Cambridge: Cambridge University Press. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.366.9300

Muthusamy, Y. K., Barnard, E., & Cole, R. A. (1994). Reviewing automatic language identification. IEEE Signal Processing Magazine, 11(4), 33-41. doi:10.1109/79.317925

Navrátil, J. (2006). Automatic language identification. In T. Schultz & K. Kirchhoff (Eds.), Multilingual speech processing (pp. 233-272). Burlington, MA: Elsevier Academic Press.

Rodríguez Fuentes, L. J., Penagarikano, M., Varona, A., Díez, M., & Bordel, G. (2010). Overview of the Albayzín 2010 language recognition evaluation: Database design, evaluation plan and preliminary analysis of the results. In FALA 2010. VI Jornadas en Tecnología del Habla - II Iberian SLTech Workshop (pp. 309-315). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VI/pdfs/0070.html

Zissman, M. A., & Berkling, K. M. (2001). Automatic language identification. Speech Communication, 35(1-2), 115-124. doi:10.1016/S0167-6393(00)00099-6

Spoken language understanding

Gauvain, J.-L., & Lamel, L. (2002). Systèmes de reconnaissance, de compréhension et de dialogue. In J. Mariani (Ed.), Reconnaissance de la parole. Traitement automatique du langage parlé (Vol. 2, pp. 47-83). Paris: Hermès - Lavoisier.

Gupta, N., Tur, G., Hakkani-Tur, D., Bangalore, S., Riccardi, G., & Gilbert, M. (2006). The AT&T spoken language understanding system. IEEE Transactions on Audio, Speech and Language Processing, 14(1), 213-222. doi:10.1109/TSA.2005.854085

Kompe, R. (1997). Prosody in speech understanding systems. Berlin - Heidelberg: Springer.

Minker, W. (1999). Compréhension automatique de la parole spontanée. Paris: L’Harmattan.

Price, P. (1997). Spoken language understanding. In R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, & V. Zue (Eds.), Survey of the state of the art in Human Language Technology (pp. 49-56). Cambridge: Cambridge University Press. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.366.9300

Segarra, E. (2006). La interpretación semántica. In J. Llisterri & M. J. Machuca (Eds.), Los sistemas de diálogo (pp. 99-118). Bellaterra - Soria: Universitat Autònoma de Barcelona - Fundación Duques de Soria.

Tur, G., & de Mori, R. (Eds.). (2011). Spoken language understanding: Systems for extracting semantic information from speech. Oxford - New York: John Wiley & Sons.

Wang, Y.-Y., Deng, L., & Acero, A. (2005). Spoken language understanding. IEEE Signal Processing Magazine, 22(5), 16-31. doi:10.1109/MSP.2005.1511821

Zue, V. (1991). From signals to symbols to meaning: On machine understanding of spoken language. In ICPhS 1991. Actes du 12ème Congrès International de Sciences Phonétiques (Vol. 1, pp. 74-83). Aix-en-Provence, France. 19-24 August, 1991.

Speech Recognition - Bibliography
Joaquim Llisterri, Departament de Filologia Espanyola, Universitat Autònoma de Barcelona
