Speech Synthesis
Bibliography




General overviews


= Recommended introductory/general reading

Bristow, G. (1984). Overview of speech output devices. In J. N. Holmes (Ed.), Proceedings of the First International Conference on Speech Technology. Brighton, UK, October 23-25, 1984. (pp. 17-24). Amsterdam: North Holland.

Bristow, G. (1984). Towards the future. In G. Bristow (Ed.), Electronic speech synthesis. Techniques, technology and applications. (pp. 336-41). London: Granada.

Carlson, R. & Granström, B. (1997). Speech synthesis. In W. J. Hardcastle & J. Laver (Eds.), The handbook of phonetic sciences. (pp. 768-88). Oxford: Blackwell.

Carlson, R. & Granström, B. (2010). Speech synthesis. In W. J. Hardcastle, J. Laver, & F. E. Gibbon (Eds.), The handbook of phonetic sciences (2nd ed.). (pp. 781-803). Oxford: Wiley-Blackwell.

Carré, R. (1971). La synthèse de la parole. Bulletin d’Audiophonologie, 3, 501-527.

Casacuberta, F., Vidal, E., Vicens, M., & Benedí, J. (1981). Sistemas informáticos para el análisis y síntesis del habla. Revista de Informática y Automática, 50, 9-27.

Cole, R. A. (1997). Spoken output technologies. In R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, & V. Zue (Eds.), Survey of the state of the art in Human Language Technology. Cambridge: Cambridge University Press. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.366.9300

Cooper, F. S. (1962). Speech synthesizers. In A. Sovijarvi & P. Aalto (Eds.), ICPhS 1962. Proceedings of the 4th International Congress of Phonetic Sciences. (pp. 3-13). New York: Humanities Press.

Cosi, P. (2002). Presente e futuro della sintesi vocale. Quaderni dell’Istituto di Fonetica e Dialettologia, 4.

Delattre, P. C. (1964). La synthèse acoustique de la parole. Bulletin de la Société des Professeurs de Français en Amérique, 18, 13-26.

Docherty, G. & Shockey, L. (1988). Speech synthesis. In M. Jack & J. Laver (Eds.), Aspects of speech technology. (pp. 144-83). Edinburgh: Edinburgh University Press.

Fant, G. (1968). Analysis and synthesis of the speech processes. In B. Malmberg (Ed.), Manual of phonetics. (pp. 173-277). Amsterdam: North Holland.

Fant, G., Granström, B., & Carlson, R. (1991). La síntesis del habla como componente de la tecnología del habla y de los sistemas de información. In J. Vidal Beneyto (Ed.), Las industrias de la lengua. (pp. 313-25). Madrid: Fundación Sánchez Ruipérez - Pirámide.

Flanagan, J. L. (1972). The synthesis of speech. Scientific American, 226(2), 45-58.

Flanagan, J. L. (1972). Voices of men and machines. The Journal of the Acoustical Society of America, 51(5), 1375-1386.

Flanagan, J. L. (1973). Voices of men and machines. In J. L. Flanagan & L. R. Rabiner (Eds.), Speech synthesis. (pp. 9-21). Stroudsburg: Dowden, Hutchinson & Ross. (Original work published 1972)

Flanagan, J. L. (1984). Voices of men and machines. In G. Bristow (Ed.), Electronic speech synthesis. Techniques, technology and applications. (pp. 48-69). London: Granada. (Original work published 1972)

Flanagan, J. L., Coker, C., Rabiner, L., Schafer, R. W., & Umeda, N. (1970). Synthetic voices for computers. IEEE Spectrum, 7(10), 22-45.

Gagnon, R., Fons, K., & Gargagliano, T. (1984). Phonetic synthesis. In G. Bristow (Ed.), Electronic speech synthesis. Techniques, technology and applications. (pp. 177-91). London: Granada.

Javkin, H. R. (1996). Speech analysis and synthesis. In N. J. Lass (Ed.), Principles of experimental phonetics. (pp. 245-76). St. Louis: Mosby.

Keller, E. (2002). Future challenges. In E. Keller, G. Bailly, A. Monaghan, J. Terken, & M. Huckvale (Eds.), Improvements in speech synthesis. COST 258: The naturalness of synthetic speech. (pp. 351-2). Chichester: John Wiley & Sons.

Keller, E. (2002). Towards greater naturalness: Future directions of research in speech synthesis. In E. Keller, G. Bailly, A. Monaghan, J. Terken, & M. Huckvale (Eds.), Improvements in speech synthesis. COST 258: The naturalness of synthetic speech. (pp. 3-17). Chichester: John Wiley & Sons.

Lancia, R. (1970). Analyse et synthèse électronique de la parole. In Nouvelles perspectives en phonétique. (pp. 37-57). Bruxelles: Presses Universitaires de Bruxelles.

Lemmetty, S. (1999). Review of speech synthesis technology. Master’s Thesis. Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Retrieved from http://research.spa.aalto.fi/publications/theses/lemmetty_mst/index.html

Llisterri, J. (1985). Sobre màquines parlants. Papers de Batxillerat, 3(8), 216-220. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Llisterri_85_Maquines_Parlants.pdf

Llisterri, J. (1988). La síntesis del habla: Estado de la cuestión. Procesamiento del Lenguaje Natural, 6, 17-41. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Llisterri_88_Sintesis.pdf

Mariño, J. B., Nadeu, C., & Llisterri, J. (1987). Síntesis automática del habla. In Inteligencia artificial: Conceptos, técnicas y aplicaciones. (pp. 157-65). Barcelona: Marcombo. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Marino_Nadeu_Llisterri_87_Sintesis_Automatica_Habla.pdf

Martí, J. (1987). Síntesis del habla: Evolución histórica y situación actual. In F. Casacuberta & E. Vidal (Eds.), Reconocimiento automático del habla. (pp. 197-205). Barcelona: Marcombo - Boixareu Editores.

Martí, J. (1988). Síntesis del habla (evolución histórica y situación actual). In C. Martín Vide (Ed.), Lenguajes naturales y lenguajes formales III.1. Actas del III Congreso de lenguajes naturales y lenguajes formales. (pp. 213-137). Barcelona: PPU.

Martí, J. (1990). Estado actual de la síntesis de voz. Estudios de Fonética Experimental, 4, 147-168.


Montero, J. M. (2016). Generación de lenguaje hablado. In Á. L. Gonzalo (Ed.), Tecnologías del lenguaje en España. Comunicación inteligente entre personas y máquinas (pp. 19-44). Madrid - Barcelona: Fundación Telefónica - Ariel. Retrieved from https://www.fundaciontelefonica.com/arte_cultura/publicaciones-listado/pagina-item-publicaciones/itempubli/565/

Nusbaum, H. C. & Shintel, H. (2006). Speech synthesis. In K. Brown (Ed.), Encyclopedia of language & linguistics. (pp. 19-31). Amsterdam: Elsevier. doi:10.1016/B0-08-044854-2/00913-5

O’Shaughnessy, D. (1983). Automatic speech synthesis. IEEE Communications Magazine, 21(9), 26-34.

Rigault, A. (1962). La synthèse de la parole. Études de Linguistique Appliquée, 1, 55-69.

Rodríguez Banga, E. (2008). Síntese de voz: Estado da cuestión e perspectivas. In E. Fernández Rei & X. L. Regueira (Eds.), Perspectivas sobre a oralidade. (pp. 235-48). Santiago de Compostela: Consello da Cultura Galega - Instituto da Lingua Galega. Retrieved from http://consellodacultura.gal/publicacion.php?id=10

Sagisaka, Y. (1997). Overview - spoken output technologies. In R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, & V. Zue (Eds.), Survey of the state of the art in Human Language Technology. Cambridge: Cambridge University Press. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.366.9300

Scully, C. & Whiteside, C. (1992). Speech production modelling and speech synthesis. In Computing in linguistics and phonetics. (pp. 73-84). London: Academic Press.

Stella, M. (1985). Speech synthesis. In F. Fallside & W. A. Woods (Eds.), Computer speech processing. (pp. 421-60). Englewood Cliffs, NJ: Prentice Hall International.

Tatham, M. (1970). Speech synthesis - A critical review of the state of the art. International Journal of Man-Machine Studies, 2, 303-308. Retrieved from http://www.morton-tatham.co.uk/publications/to_1994/speech%20synthesis%20-%20state%20of%20the%20art.pdf

Trancoso, I. & Viana, M. C. (1995). Síntese e reconhecimento de fala. In M. H. Mateus & A. Branco (Eds.), Engenharia da linguagem. Lisboa: Colibri.

Viana, M. C. (2001). Síntese de fala. In E. Ranchhod (Ed.), Tratamento das línguas por computador. Uma introdução à linguística computacional e suas aplicações. (pp. 133-93). Lisboa: Caminho.

Zue, V. (1982). Computer voice response and speech synthesis. Trends and Perspectives in Signal Processing, 2(4), 7-9.

Text-to-speech synthesis


Textbooks



Cater, J. P. (1983). Electronically speaking: Computer speech generation. Indianapolis: Howard W. Sams & Co.

Flanagan, J. L. (1972). Speech analysis, synthesis and perception (2nd expanded ed.). Berlin: Springer.

Furui, S. (2001). Digital speech processing, synthesis and recognition (2nd ed.). New York: Marcel Dekker.

Holmes, J. N. (1972). Speech synthesis. London: Mills & Boon.

Holmes, J. N. (1988). Speech synthesis and recognition. Wokingham: Van Nostrand Reinhold.

Holmes, J. N. & Holmes, W. (2001). Speech synthesis and recognition (2nd ed.). London: Taylor & Francis.

Liénard, J. S. (1977). Les processus de la communication parlée. Introduction à l’analyse et à la synthèse de la parole. Paris: Masson.

Linggard, R. (1985). Electronic synthesis of speech. Cambridge: Cambridge University Press.

Magno, E., Abatti, A., & Dossi, L. (1971). Introduzione all’analisi ed alla sintesi strumentale della parola. Bologna: Pàtron.

Morgan, N. (1984). Talking chips. IC speech synthesis. New York: McGraw-Hill.

Poulton, A. S. (1983). Microcomputer speech synthesis and recognition. Wilmslow: Signa Technical Press.

Schroeder, M. R. (2004). Computer speech. Recognition, compression, synthesis. With introductions to hearing and signal analysis and a glossary of speech and computer terms (2nd ed.). Berlin - Heidelberg - New York: Springer. (Original work published 1999)

Sclater, N. (1983). Introduction to electronic speech synthesis. Indianapolis: Howard W. Sams & Co.


Tatham, M. & Morton, K. (2005). Developments in speech synthesis. Chichester: John Wiley & Sons.

Witten, I. H. (1982). Principles of computer speech. London: Academic Press.

Witten, I. (1986). Making computers talk. An introduction to speech synthesis. Hemel Hempstead: Prentice Hall.

Text-to-speech synthesis


Compilations and conference proceedings



Bailly, G. & Benoît, C. (Eds). (1992). Talking machines. Theories, models and designs. Amsterdam: North Holland - Elsevier.

Bristow, G. (Ed). (1984). Electronic speech synthesis. Techniques, technology and applications. London: Granada.

Cole, R. A. (1997). Spoken output technologies. In R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, & V. Zue (Eds.), Survey of the state of the art in Human Language Technology. Cambridge: Cambridge University Press. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.366.9300

Flanagan, J. L. & Rabiner, L. R. (Eds). (1973). Speech synthesis. Stroudsburg: Dowden, Hutchinson & Ross.

Keller, E. (Ed). (1994). Fundamentals of speech synthesis and speech recognition. Basic concepts, state of the art and future challenges. Chichester: John Wiley & Sons.

Keller, E., Bailly, G., Monaghan, A., Terken, J., & Huckvale, M. (Eds). (2002). Improvements in speech synthesis. COST 258: The naturalness of synthetic speech. Chichester: John Wiley & Sons.

Kleijn, W. B. & Paliwal, K. K. (Eds). (1995). Speech coding and synthesis. Amsterdam: Elsevier.


SSW1-1990. Proceedings of the ESCA Workshop on Speech Synthesis. Autrans, France, September 25-28, 1990. Retrieved from http://www.isca-speech.org/archive_open/ssw1/index.html


SSW2-1994. Proceedings of the Second ESCA/IEEE Workshop on Speech Synthesis. Mohonk Mountain House, New Paltz, NY, USA, September 12-15, 1994. Retrieved from http://www.isca-speech.org/archive_open/ssw2/index.html


SSW3-1998. Proceedings of the Third ESCA/COCOSDA Workshop on Speech Synthesis. Jenolan Caves House, Blue Mountains, Australia, November 26-29, 1998. Retrieved from http://www.isca-speech.org/archive_open/ssw3/index.html


SSW4-2001. Proceedings of the Fourth ISCA tutorial and research workshop on Speech Synthesis. Perthshire, Scotland, August 29 - September 1, 2001. Retrieved from http://www.isca-speech.org/archive_open/ssw4/index.html


SSW5-2004. Proceedings of the Fifth ISCA tutorial and research workshop on Speech Synthesis. Pittsburgh, PA, USA, June 14-16, 2004. Retrieved from http://www.isca-speech.org/archive_open/ssw5/index.html


SSW6-2007. Proceedings of the Sixth ISCA tutorial and research workshop on Speech Synthesis. Bonn, Germany, August 22-24, 2007. Retrieved from http://www.isca-speech.org/archive_open/ssw6/index.html


SSW7-2010. Proceedings of the Seventh ISCA tutorial and research workshop on Speech Synthesis. Kyoto, Japan, September 22-24, 2010. Retrieved from http://www.isca-speech.org/archive/ssw7/


SSW8-2013. Proceedings of the Eighth ISCA tutorial and research workshop on Speech Synthesis. Barcelona, Spain. August 31 - September 2, 2013. Retrieved from http://ssw8.talp.cat


SSW9-2016. Proceedings of the Ninth ISCA tutorial and research workshop on Speech Synthesis. Sunnyvale, USA. 13-15 September, 2016. Retrieved from http://www.isca-speech.org/archive/SSW_2016/

van Santen, J. P. H., Sproat, R., Olive, J., & Hirschberg, J. (Eds). (1996). Progress in speech synthesis. New York: Springer.

Speech technologies: conference proceedings

Text-to-speech synthesis


Speech synthesis techniques

Formant synthesis



ALLEN, J.- HUNNICUTT, M.S.- KLATT, D.H. (with R.C. ARMSTRONG and D. PISONI) (1987) From Text to Speech: The MITalk System. Cambridge: Cambridge University Press (Cambridge Studies in Speech Science and Communication). [ch. 12 "The Klatt formant synthesizer"]

CARLSON, R.- SIGVARDSON, T.- SJÖLANDER, A. (2002) "Data-driven formant synthesis", TMH-QPSR, Speech, Music and Hearing Quarterly Progress and Status Report 44: 121-124.
http://www.speech.kth.se/prod/publications/files/qpsr/2002/2002_44_1_121-124.pdf

FANT, G.- MÁRTONY, J. (1962) "Instrumentation for parametric synthesis (OVE II). Synthesis strategy, and quantization of synthesis parameters", STL-QPSR, Speech Transmission Laboratory - Quarterly Progress and Status Report 2/1962: 18-24; in FANT, G. (2004) Speech Acoustics and Phonetics. Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology, 24). pp. 68-76.

HANSON, H.M.- STEVENS, K.N. (2002) "A quasiarticulatory approach to controlling acoustic source parameters in a Klatt-type formant synthesizer using HLsyn", Journal of the Acoustical Society of America 112: 1158-1182.

HEID, S.- HAWKINS, S. (1998) "PROCSY: A Hybrid Approach to High-Quality Formant Synthesis using HLsyn", in SSW3 1998. Proceedings of the Third ESCA/COCOSDA Workshop on Speech Synthesis. 26 - 29 November, 1998. Jenolan Caves, Blue Mountains, Australia. pp. 219-224.
http://www.ling.cam.ac.uk/procsy/procsy.ps
http://www.ling.cam.ac.uk/procsy/procsy-ext.ps

HOLMES, J. N. (1979) "Synthesis of Natural-Sounding Speech Using a Formant Synthesizer" in LINDBLOM, B. - OHMAN, S. (Eds.) Frontiers of Speech Communication Research. London: Academic Press. pp. 275-85.

HOLMES, J. N. (1985) "A Parallel-Formant Synthesizer for Voice-Machine Output", in FALLSIDE, F. - WOODS, W.A. (Eds.) Computer Speech Processing. Englewood Cliffs, N.J. : Prentice Hall International. pp. 163-189.

HOLMES, J.N. (1983) "Formant Synthesizers: Cascade or Parallel?", Speech Communication 2: 251-273; in ATAL, B.S.- MILLER, J.L.- KENT, R.D. (Eds.) (1991) Papers in Speech Communication: Speech Processing. New York: Acoustical Society of America. pp. 33-56.

HUGHES, P.M. (1990) "Formant based speech synthesis", in WHEDDON, C.- LINGGARD, R. (Eds.) Speech and Language Processing. London: Chapman and Hall. pp. 145-156.

HUNT, A.- HOWARD, D.M.- MORRISON, G.- WORSDALL, A. (2000) "A real-time interface for a formant speech synthesizer", Logopedics, Phoniatrics, Vocology 25, 4: 169-175.

JESUS, L.M.T. de - VAZ, F.- PRINCIPE, J.C. (1997) "An implementation of the Klatt speech synthesizer", Electrónica e Telecomunicações 2, 1: 141-146.

KLATT, D.H. (1980) "Software for a Cascade/Parallel Formant Synthesizer", Journal of the Acoustical Society of America 67, 3: 971-995; in KENT, R.D.- ATAL, B.S.- MILLER, J.L. (Eds.) (1991) Papers in Speech Communication: Speech Production. New York: Acoustical Society of America. pp. 765-789.

KLATT, D.H.- KLATT, L.C. (1990) "Analysis, synthesis and perception of voice quality variations among female and male talkers", Journal of the Acoustical Society of America 87, 2: 820-857; in KENT, R.D.- ATAL, B.S.- MILLER, J.L. (Eds) (1991) Papers in Speech Communication: Speech Production. New York: Acoustical Society of America. pp. 791-828.

KLATT, D.H. KLSYN: A Formant Synthesis Program. Revised for the IBM-PC implementation by Keith Johnson. Department of Linguistics, Ohio State University.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.558.5767


LADEFOGED, P. (1985) "The Phonetic Basis for Computer Speech Generation" in F. FALLSIDE - W.A. WOODS (Eds) Computer Speech Processing. Englewood Cliffs, N.J. : Prentice Hall International. pp. 3-27.

LILJENCRANTS, J.C.W.A. (1968) "The OVE III Speech Synthesiser", IEEE Transactions on Audio and Electroacoustics AU-16, 1: 137-140.

RUTLEDGE, J.- CUMMINGS, K.- LAMBERT, D.- CLEMENTS, M. (1995) "Synthesizing styled speech using the Klatt synthesizer", in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. Detroit, Michigan. Vol. 1, pp. 648-651.

STEVENS, K.N.- BICKLEY, C.A. (1991) "Constraints among parameters simplify control of Klatt formant synthesizer", Journal of Phonetics 19, 1: 161-174.


Styger, T., & Keller, E. (1994). Formant synthesis. In E. Keller (Ed.), Fundamentals of speech synthesis and speech recognition. Basic concepts, state of the art and future challenges. (pp. 109-28). Chichester: John Wiley & Sons. Retrieved from https://es.scribd.com/document/38696787/Formant-Synthesis


Corpus-based speech synthesis

Adell, J., & Bonafonte, A. (2004). Towards phone segmentation for concatenative speech synthesis. In SSW5-2004. Proceedings of the fifth ISCA tutorial and research workshop on speech synthesis. (pp. 139-44). Pittsburgh, PA, USA, June 14-16, 2004. Retrieved from http://www.isca-speech.org/archive_open/ssw5/ssw5_139.html

Armenta, A., Escalada, J. G., Garrido, J. M., & Rodríguez Crespo, M. A. (2003). Conversor texto a voz multilingüe de Telefónica I+D. Procesamiento del Lenguaje Natural, 31, 331-332. Retrieved from http://www.sepln.org/revistaSEPLN/revista/31/31-Pag331.pdf

Barra, R., Yamagishi, J., King, S., Montero, J. M., & Macías, J. (2010). Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech. Speech Communication, 52(5), 394-404. doi:10.1016/j.specom.2009.12.007

Campbell, N. (2005). Developments in corpus-based speech synthesis: Approaching natural conversational speech. IEICE Transactions on Information and Systems, E88-D(3), 376-383. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.99.8099&rep=rep1&type=pdf

Campillo, F., & Rodríguez Banga, E. (2006). A method for combining intonation modelling and speech unit selection in corpus-based speech synthesis systems. Speech Communication, 48(8), 941-956. doi:10.1016/j.specom.2005.12.004

Campillo, F., van Santen, J. P. H., & Rodríguez Banga, E. (2006). A model for the f0 reset in corpus-based intonation approaches. In Interspeech 2006 - ICSLP. Proceedings of the 9th international conference on spoken language processing. (pp. 2362-5). Pittsburgh, PA, USA, September 17-21, 2006. Retrieved from http://www.isca-speech.org/archive/interspeech_2006/i06_1404.html

Cardeñoso, V., & Escudero, D. (2002). Statistical modelling of stress groups in Spanish. In Speech prosody 2002. First international conference on speech prosody. (pp. 207-10). Aix-en-Provence, France, 11-13 April, 2002. Retrieved from http://www.isca-speech.org/archive/sp2002/sp02_207.html

Carvalho, P. M., Oliveira, L., Trancoso, I., & Viana, M. C. (1998). Concatenative speech synthesis for European Portuguese. In SSW3-1998. Proceedings of the third ESCA/COCOSDA workshop on speech synthesis. (pp. 159-64). Jenolan Caves House, Blue Mountains, Australia, November 26-29, 1998. Retrieved from http://www.inesc-id.pt/pt/indicadores/Ficheiros/3261.pdf

Clark, R. A. J., Richmond, K., & King, S. (2007). Multisyn: Open-Domain unit selection for the Festival speech synthesis system. Speech Communication, 49(4), 317-330. doi:10.1016/j.specom.2007.01.014

Damper, R. I. (Ed). (2001). Data-Driven techniques in speech synthesis. Dordrecht: Kluwer.

Dutoit, T. (2008). Corpus-Based speech synthesis. In J. E. Benesty, M. M. Sondhi, & Y. Huang (Eds.), Springer handbook of speech processing. (pp. 437-520). Berlin - Heidelberg: Springer.

Escudero, D., & Cardeñoso, V. (2006). Visualization of prosodic knowledge using corpus driven MEMOInt intonation modelling. In P. Sojka, I. Kopecek, & K. Pala (Eds.), Lecture Notes in Computer Science: TSD 2006. 9th international conference on text, speech and dialogue. (pp. 645-52). Berlin - Heidelberg: Springer. Retrieved from http://www.infor.uva.es/~descuder/investig/pdfs/tsd2006.pdf

Escudero, D., & Cardeñoso, V. (2007). Applying data mining techniques to corpus based prosodic modeling. Speech Communication, 49(3), 213-229. doi:10.1016/j.specom.2007.01.008

Escudero, D., Cardeñoso, V., & Bonafonte, A. (2002). Corpus based extraction of quantitative prosodic parameters of stress groups in Spanish. In ICASSP 2002. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 481-4). Orlando, Florida, May 13-17, 2002. Retrieved from https://pdfs.semanticscholar.org/c275/aefa6a1bb0e0086f3f6a4156425601e5bc59.pdf

Esquerra, I., & Bonafonte, A. (2004). Habla emocional mediante métodos de re-síntesis y selección de unidades. In URSI 2004. Actas del XIX simposium nacional de la unión científica internacional de radio. Universitat Ramon Llull, Barcelona, 8-10 de septiembre de 2004. Retrieved from http://www.lsi.upc.edu/~nlp/papers/esquerra04.pdf

Iida, A., Campbell, N., Higuchi, F., & Yasumura, M. (2003). A corpus-based speech synthesis system with emotion. Speech Communication, 40(1-2), 161-187. doi:10.1016/S0167-6393(02)00081-X

Möbius, B. (2000a). Corpus-Based speech synthesis: Methods and challenges. Arbeitspapiere des Instituts für Maschinelle Sprachverarbeitung, 6(4), 87-116. Retrieved from http://www.ims.uni-stuttgart.de/institut/mitarbeiter/moebius/papers/unitsel.pdf

Möbius, B. (2000b). Corpus-Based speech synthesis: Methods and challenges. In W. F. Sendlmeier (Ed.), Forum Phoneticum: Speech and signals. Aspects of speech synthesis and automatic speech recognition. Dedicated to Wolfgang Hess on his 60th birthday. (pp. 79-96). Frankfurt am Main: Hector.

Paulo, S., Oliveira, L., Mendes, C., Figueira, L., Cassaca, R., Viana, M. C., & Moniz, H. (2008). DIXI - A generic text-to-speech system for European Portuguese. In Lecture Notes in Artificial Intelligence: PROPOR 2008. Computational processing of the Portuguese language. Eighth international conference, Aveiro, Portugal, September 8-10, 2008, proceedings. (pp. 91-100). Heidelberg: Springer. Retrieved from http://www.inesc-id.pt/pt/indicadores/Ficheiros/5009.pdf

Rodríguez Banga, E., Campillo, F., Fernández Rei, E., & Méndez, F. (2002). Sistema de conversión texto-voz en lengua gallega basado en la selección combinada de unidades acústicas y prosódicas. Procesamiento del Lenguaje Natural, 29, 153-158. Retrieved from http://www.sepln.org/revistaSEPLN/revista/29/29-Pag153.pdf

Syrdal, A. K., Wightman, C. W., Conkie, A., Stylianou, Y., Beutnagel, M., Schroeter, J., . . . Makashay, M. J. (2000). Corpus based techniques in the AT&T nextgen synthesis system. In Interspeech 2000 - ICSLP. Proceedings of the 6th international conference on spoken language processing. (pp. 410-5). Beijing, China, October 16-20, 2000. Retrieved from http://www.isca-speech.org/archive/icslp_2000/i00_3410.html

Toda, T., Kawai, H., Tsuzaki, M., & Shikano, K. (2006). An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis. Speech Communication, 48(1), 45-56. doi:10.1016/j.specom.2005.05.011

Torres, H., & Gurlekian, J. A. (2008). Acoustic speech unit segmentation for concatenative synthesis. Computer Speech and Language, 22(2), 196-206. doi:10.1016/j.csl.2007.07.002

Triviño, M. P., & Alías, F. (2008). Predicción estadística de discontinuidades espectrales del habla para síntesis concatenativa. Procesamiento del Lenguaje Natural, 40, 67-74. Retrieved from http://www.sepln.org/revistaSEPLN/revista/40/11p15.pdf

Text-to-speech synthesis systems: Loquendo


Statistical techniques in speech synthesis

Anumanchipalli, G. K., Cheng, Y. C., Fernandez, J., Huang, X., Mao, Q., & Black, A. W. (2010). KLATTSTAT: Knowledge-Based parametric speech synthesis. In SSW7-2010. Proceedings of the seventh ISCA tutorial and research workshop on speech synthesis. (pp. 206-10). Kyoto, Japan, September 22-24, 2010. Retrieved from http://www.cs.cmu.edu/~gopalakr/publications/klattstat_anumanchipalli.pdf

Barra, R., Yamagishi, J., King, S., Montero, J. M., & Macías, J. (2010). Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech. Speech Communication, 52(5), 394-404. doi:10.1016/j.specom.2009.12.007

Barros, M. J., Maia, R., Tokuda, K., Resende Jr., F., & Freitas, D. (2005). HMM-based European Portuguese TTS system. In Interspeech 2005 - Eurospeech. Proceedings of the 9th European conference on speech communication and technology. (pp. 2581-4). Lisbon, Portugal. September 4-8, 2005. Retrieved from http://www.isca-speech.org/archive/interspeech_2005/i05_2581.html

Drugman, T., Moinet, M., & Dutoit, T. (2008). On the use of machine learning in statistical parametric speech synthesis. In Benelearn 2008. Spa, Belgium. Retrieved from http://tcts.fpms.ac.be/publications/papers/2008/benelearn_tdamtd.pdf

Gonzalvo, J. (2010). Síntesi basada en Models Ocults de Markov aplicada a l’espanyol i a l’anglès, les seves aplicacions i una proposta híbrida. Tesi doctoral, Departament de Comunicacions i Teoria del Senyal, Universitat Ramon Llull, Barcelona. Retrieved from http://hdl.handle.net/10803/9146

Maia, R., Zen, H., Tokuda, K., Kitamura, T., & Resende Jr., F. (2003). Towards the development of a Brazilian Portuguese text-to-speech system based on HMM. In Eurospeech 2003. Proceedings of the 8th European conference on speech communication and technology. (pp. 2645-8). Geneva, Switzerland, September 1-4, 2003. Retrieved from http://www.sp.nitech.ac.jp/~zen/publications/maia-euro03.pdf

Siri Team (2017). Deep learning for Siri’s voice: On-device Deep Mixture Density Networks for Hybrid Unit Selection Synthesis. Apple Machine Learning Journal, 1(4). Retrieved from https://machinelearning.apple.com/2017/08/06/siri-voices.html

Zen, H., Tokuda, K., & Black, A. W. (2009). Statistical parametric speech synthesis. Speech Communication, 51(11), 1039-1064. doi:10.1016/j.specom.2009.04.004


Synthesis of speaking styles

ABE, M. (1997) "Speaking Styles: Statistical Analysis and Synthesis by a Text-to-Speech System", in van SANTEN, J.P.H. - SPROAT, R.W.- OLIVE, J.P.- HIRSCHBERG, J. (Eds.) Progress in Speech Synthesis. New York: Springer. pp. 495-510.

CARLSON, R. (1991) "Synthesis: Modelling variability and constraints", in Eurospeech 91. 2nd European Conference on Speech Communication and Technology. Genova, Italy, 24-26 September 1991. Vol. 3, pp. 1043-1048.

CARLSON, R. (1992) "Synthesis: modelling variability and constraints", Speech Communication 11, 2-3: 159-166.

CARLSON, R.- GRANSTRÖM, B.- KARLSSON, I. (1990) "Experiments with voice modelling in speech synthesis", in LAVER, J.- JACK, M.- GARDINER, A. (Eds.) ESCA Workshop on Speaker Characterization in Speech Technology. CSTR: Edinburgh. pp. 28-39.

DUEZ, D. (2002) "Reduction and assimilatory processes in conversational French speech: implications for speech synthesis", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 228-236.

GRANSTRÖM, B.- NORD, L. (1991) "Ways of exploring speaker characteristics and speaking styles", in Actes du XIIème Congrès International des Sciences Phonétiques. 19-24 août 1991, Aix-en-Provence, France. Aix-en-Provence: Université de Provence, Service des Publications. Vol 4. pp. 278-281.

GRANSTRÖM, B.- NORD, L. (1992) "Neglected dimensions in speech synthesis", Speech Communication 11, 4-5: 459-462.

GUSTAFSON, K.- HOUSE, D. (2002) "Prosodic parameters of a "fun" speaking style", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 264-272.

HENTON, C. (1999) "Where is Female Synthetic Speech?", Journal of the International Phonetic Association 29,1: 51-62.

PÉAN, V.- LACHERET-DUJOUR, A. (1995) "Phonological Rule Modelling Style Variations of ’E’ caduc in French Parisian Spontaneous Speech for Text-to-Speech Synthesis", in Eurospeech’95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 3, pp. 1835-1838.

TERKEN, J. (2002) "Variability and speaking styles in speech synthesis", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 199-203.

YARRINGTON, D.- FOULDS, R. (1993) "Personalizing synthesized voices", in GRANSTRÖM, B.- HUNNICUTT, S.- SPENS, K.-E. (Eds.) Speech and Language Technology for Disabled Persons. Proceedings of an ESCA Workshop. Stockholm, Sweden, May 31-June 2, 1993. pp. 169-172.


Synthesis of emotional speech

Abadjieva, E., Murray, I. R., & Arnott, J. L. (1992). Methodological aspects of the implementation of emotional characteristics in synthesized speech. Proceedings of the Institute of Acoustics, 14(6), 487-494.

Abadjieva, E., Murray, I. R., & Arnott, J. L. (1993). Applying analysis of human emotional speech to enhance synthetic speech. In Eurospeech 1993. Proceedings of the 3rd european conference on speech communication and technology. (pp. 909-12). Berlin, Germany, September 21-23, 1993. Retrieved from http://www.isca-speech.org/archive/eurospeech_1993/e93_0909.html

Adell, J., Bonafonte, A., & Escudero, D. (2005). Analysis of prosodic features: Towards modelling of emotional and pragmatic attributes of speech. Procesamiento del Lenguaje Natural, 35, 277-283. Retrieved from http://www.sepln.org/revistaSEPLN/revista/35/34.pdf

Arnott, J. L., Alm, N., & Murray, I. (1993). Enhancing a communication prosthesis with vocal emotion effects. In Speech and language technology for disabled persons. Proceedings of an ESCA workshop. (pp. 165-8). Stockholm, Sweden, May 31 - June 2, 1993. Retrieved from http://www.isca-speech.org/archive_open/sltdp_93/sdp3_165.html

Barra, R., Montero, J. M., Macías, J., Gutiérrez Arriola, J. M., Ferreiros, J., & Pardo, J. M. (2007). On the limitations of voice conversion techniques in emotion identification tasks. In Interspeech 2007. Proceedings of the 8th annual conference of the international speech communication association. (pp. 2233-6). Antwerp, Belgium, August 27-31, 2007. Retrieved from http://www-gth.die.upm.es/research/documentation/AI-104Ont-07.pdf

Barra, R., Yamagishi, J., King, S., Montero, J. M., & Macías, J. (2010). Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech. Speech Communication, 52(5), 394-404. doi:10.1016/j.specom.2009.12.007. Retrieved from http://www-gth.die.upm.es/research/documentation/AG-084Ana-10.pdf

Bou-Ghazale, S. E., & Hansen, J. H. L. (1996). Generating stressed speech from neutral speech using a modified CELP vocoder. Speech Communication, 20(1-2), 93-110. doi:10.1016/S0167-6393(96)00047-7

Boula de Mareüil, P., Célérier, P., & Toen, J. (2002). Generation of emotions by a morphing technique in English, French and Spanish. In Speech prosody 2002. First international conference on speech prosody. Aix-en-Provence, France, 11-13 April, 2002. Retrieved from http://www.isca-speech.org/archive/sp2002/sp02_187.html

Bulut, M., Narayanan, S. S., & Syrdal, A. K. (2002). Expressive speech synthesis using a concatenative synthesizer. In ICSLP 2002 - interspeech 2002. Proceedings of the 7th international conference on spoken language processing. (pp. 1265-8). Denver, Colorado, USA, September 16-20, 2002. Retrieved from http://www.isca-speech.org/archive/icslp_2002/i02_1265.html

Burkhardt, F., & Sendlmeier, W. F. (2000). Verification of acoustical correlates of emotional speech using formant synthesis. In Speech and emotion. ISCA tutorial and research workshop. (pp. 151-6). Belfast, Northern Ireland, UK. September 5-7, 2000. Retrieved from http://felix.syntheticspeech.de/publications/ISCAbelfast.pdf

Burkhardt, F., & Sendlmeier, W. F. (2000). Verification of acoustical correlates of emotional speech using formant synthesis. In W. F. Sendlmeier (Ed.), Speech and signals. Aspects of speech synthesis and automatic speech recognition. Dedicated to Wolfgang Hess on his 60th birthday. (pp. 27-39). Frankfurt am Main: Hector.

Cabral, J., & Oliveira, L. (2006). Emovoice: A system to generate emotions in speech. In Interspeech 2006 - ICSLP. Proceedings of the 9th international conference on spoken language processing. (pp. 1798-801). Pittsburgh, PA, USA, September 17-21, 2006. Retrieved from http://homepages.inf.ed.ac.uk/jscabral/artigos/jpc_interspeech_2006.pdf

Cahn, J. E. (1989). Generating expression in synthesized speech. Master’s Thesis, MIT Media Laboratory, Massachusetts Institute of Technology. Retrieved from http://alumni.media.mit.edu/~cahn/masters-thesis.html

Cahn, J. E. (1990). The generation of affect in synthesized speech. Journal of the American Voice I/O Society, 8, 1-19. Retrieved from http://alumni.media.mit.edu/~cahn/papers/avios90-article.pdf

Castelazo Luna, Y. (2007). Desarrollo de una aplicación de síntesis de voz con características prosódicas. Tesis de Licenciatura en Computación, División de Ciencias Básicas e Ingeniería, Universidad Autónoma Metropolitana, Unidad de Iztapalapa. Retrieved from http://www.cedip.edu.mx/tesinas/tesis_uam/Modelado%20de%20voz%20con%20caracteristicas%20prosodicas_UAMI13524.pdf

Drioli, C., Tisato, G., Cosi, P., & Tesser, F. (2003). Emotions and voice quality: Experiments with sinusoidal modeling. In Proceedings of VOQUAL’03. (pp. 127-32). Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.133.5631

Drioli, C., Tisato, G., Cosi, P., & Tesser, F. (2004). Emotions and voice quality: Experiments with sinusoidal modelling. Quaderni della Sezione di Fonetica e Dialettologia dell’ISTC, 6, 149-154.

Drioli, C., Tisato, G., Cosi, P., & Tesser, F. (2004). Emozioni e "qualità vocalica": Esperimenti con modelli di sintesi sinusoidale. Quaderni della Sezione di Fonetica e Dialettologia dell’ISTC, 6, 155-160.

Esquerra, I. (2006). Síntesis de habla emocional por selección de unidades. In L. Buera, E. Lleida, A. Miguel, & A. Ortega (Eds.), IV jornadas en tecnología del habla. (pp. 161-5). Zaragoza: Universidad de Zaragoza - Red Temática en Tecnologías del Habla. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/IV/4jth_cdrom.html

Esquerra, I., & Bonafonte, A. (2004). Habla emocional mediante métodos de re-síntesis y selección de unidades. In URSI 2004. Actas del XIX simposium nacional de la unión científica internacional de radio. Universitat Ramon Llull, Barcelona, 8-10 de septiembre de 2004. Retrieved from http://www.lsi.upc.edu/~nlp/papers/esquerra04.pdf

Francisco, V., Gervás, P., & Hervás, R. (2005). Análisis y síntesis de expresión emocional en cuentos leídos en voz alta. Procesamiento del Lenguaje Natural, 35, 293-300. Retrieved from http://www.sepln.org/revistaSEPLN/revista/35/36.pdf

Henton, C., & Litwinowicz, P. (1994). Saying and seeing it with feeling: Techniques for synthesizing visible, emotional speech. In SSW2-1994. Proceedings of the second ESCA/IEEE workshop on speech synthesis. (pp. 73-6). Mohonk Mountain House, New Paltz, NY, USA, September 12-15, 1994. Retrieved from http://www.isca-speech.org/archive_open/ssw2/ssw2_073.html

Higuchi, N., Hirai, T., & Sagisaka, Y. (1994). Effect of speaking style on parameters of fundamental frequency contour. In SSW2-1994. Proceedings of the second ESCA/IEEE workshop on speech synthesis. (pp. 135-8). Mohonk Mountain House, New Paltz, NY, USA, September 12-15, 1994. Retrieved from http://www.isca-speech.org/archive_open/ssw2/ssw2_135.html

Hirose, K., Sato, K., Asano, Y., & Minematsu, N. (2005). Synthesis of F0 contours using generation process model parameters predicted from unlabeled corpora: Application to emotional speech synthesis. Speech Communication, 46(3-4), 385-404. doi:10.1016/j.specom.2005.03.014

Iida, A., Campbell, N., Higuchi, F., & Yasumura, M. (2003). A corpus-based speech synthesis system with emotion. Speech Communication, 40(1-2), 161-187. doi:10.1016/S0167-6393(02)00081-X. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.5734&rep=rep1&type=pdf

Iriondo, I. (2008). Producción de un corpus oral y modelado prosódico para la síntesis del habla expresiva. Tesis doctoral, Departament de Comunicacions i Teoria del Senyal, Universitat Ramon Llull, Barcelona. Retrieved from http://www.tdx.cat/TDX-0627108-123102

Iriondo, I., Alías, F., Melenchón, J., & Llorca, M. A. (2004). Modeling and synthesizing emotional speech for Catalan text-to-speech synthesis. In ADS 2004. Affective dialogue systems. ISCA tutorial and research workshop. (pp. 197-208). Kloster Irsee, Germany. June 14-16, 2004. Retrieved from http://www-gth.die.upm.es/research/documentation/referencias/Iriondo_Modeling.pdf

Iriondo, I., Guaus, R., Rodríguez, A., Lázaro, P., Montoya, N., Blanco, J. M., . . . Longhi, L. (2000). Validation of an acoustical modelling of emotional expression in Spanish using speech synthesis techniques. In Speech and emotion. ISCA tutorial and research workshop. (pp. 161-6). Belfast, Northern Ireland, UK. September 5-7, 2000. Retrieved from http://www.isca-speech.org/archive_open/speech_emotion/spem_161.html

Iriondo, I., Planet, S., Socoró, J. C., Martínez, E., Alías, F., & Monzó, C. (2009). Automatic refinement of an expressive speech corpus assembling subjective perception and automatic classification. Speech Communication, 51(9), 744-758. doi:10.1016/j.specom.2008.12.001

Iriondo, I., Socoró, J. C., & Alías, F. (2007). Prosody modelling of Spanish for expressive speech synthesis. In ICASSP 2007. Proceedings of the 32nd IEEE international conference on acoustics, speech and signal processing. (pp. 821-4). Honolulu, HI, USA. Retrieved from https://pdfs.semanticscholar.org/72aa/deb4444ff35622f8a3a8ab02e631a97dfc90.pdf

Iriondo, I., Socoró, J. C., Formiga, L., Gonzalvo, X., Alías, F., & Miralles, P. (2006). Modelado y estimación de la prosodia mediante razonamiento basado en casos. In L. Buera, E. Lleida, A. Miguel, & A. Ortega (Eds.), IV jornadas en tecnología del habla. (pp. 183-8). Zaragoza: Universidad de Zaragoza - Red Temática en Tecnologías del Habla. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/IV/4jth_cdrom.html

Lucas, J. M., Alcázar, R., Montero, J. M., Fernández Martínez, F., Barra, R., D’Haro, L. F., . . . Pardo, J. M. (2008). Desarrollo de un robot-guía con integración de un sistema de diálogo y expresión de emociones: Proyecto ROBINT. Procesamiento del Lenguaje Natural, 40, 51-58. Retrieved from http://www.sepln.org/revistaSEPLN/revista/40/09p12.pdf

Magno Caldognetto, E., Cosi, P., Drioli, C., Tisato, G., & Cavicchio, F. (2004). Modifications of phonetic labial targets in emotive speech: Effects of the co-production of speech and emotions. Speech Communication, 44(1-4), 173-185. doi:10.1016/j.specom.2004.10.012

Montero, J. M. (2003). Estrategias para la mejora de la naturalidad y la incorporación de variedad emocional a la conversión texto a voz en castellano. Tesis doctoral, Departamento de Ingeniería Electrónica, Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid. Retrieved from http://oa.upm.es/300/2/Juan_Manuel_Montero.pdf

Montero, J. M., Gutiérrez, J., Enríquez, E., & Pardo, J. M. (1999). Analysis and modelling of emotional speech in Spanish. In ICPhS 1999. Proceedings of the 14th international congress of phonetic sciences. (pp. 957-60). University of California, San Francisco, August 1-7, 1999. Retrieved from http://www-gth.die.upm.es/research/documentation/AI-55Ana-99.pdf

Montero, J. M., Gutiérrez Arriola, J., de Córdoba Herralde, R., Enríquez Carrasco, E., & Pardo Muñoz, J. M. (2002). The role of pitch and tempo in Spanish emotional speech: Towards concatenative synthesis. In E. Keller, G. Bailly, A. Monaghan, J. Terken, & M. Huckvale (Eds.), Improvements in speech synthesis. COST 258: The naturalness of synthetic speech. (pp. 246-51). Chichester: John Wiley & Sons.

Montero, J. M., Gutiérrez Arriola, J. M., Colás, J., Macías, J., Enríquez, E., & Pardo, J. M. (1999). Development of an emotional speech synthesiser in Spanish. In Eurospeech 1999. Proceedings of the 6th european conference on speech communication and technology. (pp. 2099-102). Budapest, Hungary, September 5-9, 1999. Retrieved from http://www-gth.die.upm.es/research/documentation/AI-53Dev-99.pdf

Montero, J. M., Gutiérrez Arriola, J. M., Palazuelos, S., Enríquez, E., Aguilera, S., & Pardo, J. M. (1998). Emotional speech synthesis: From speech database to TTS. In ICSLP 1998. Proceedings of the 5th international conference on spoken language processing. (pp. 923-6). Sydney Convention Centre, Sydney, Australia, 30 November - 4 December, 1998. Retrieved from http://www-gth.die.upm.es/research/documentation/AI-45Emo-98.pdf

Monzó, C., Calzada, A., Iriondo, I., & Socoró, J. C. (2010). Expressive speech style transformation: Voice quality and prosody modification using a harmonic plus noise model. In Speech prosody 2010. Fifth international conference on speech prosody. Chicago, Illinois USA. May 11-14, 2010. Retrieved from http://speechprosody2010.illinois.edu/papers/100985.pdf

Murray, I. R., & Arnott, J. L. (1993). Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion. The Journal of the Acoustical Society of America, 93(2), 1097-1108. doi:10.1121/1.405558

Murray, I. R., & Arnott, J. L. (1995). Implementation and testing of a system for producing emotion-by-rule in synthetic speech. Speech Communication, 16(4), 369-390. doi:10.1016/0167-6393(95)00005-9

Murray, I. R., Arnott, J. L., & Rohwer, E. A. (1996). Emotional stress in synthetic speech: Progress and future directions. Speech Communication, 20(1-2), 85-91. doi:10.1016/S0167-6393(96)00046-5

Ní Chasaide, A., & Gobl, C. (2002). Voice quality and the synthesis of affect. In E. Keller, G. Bailly, A. Monaghan, J. Terken, & M. Huckvale (Eds.), Improvements in speech synthesis. COST 258: The naturalness of synthetic speech. (pp. 252-63). Chichester: John Wiley & Sons.

Nordstrand, M., Svanfeldt, G., Granström, B., & House, D. (2004). Measurements of articulatory variation in expressive speech for a set of Swedish vowels. Speech Communication, 44(1-4), 187-196. doi:10.1016/j.specom.2004.09.003

Oudeyer, P. Y. (2003). The production and recognition of emotions in speech: Features and algorithms. International Journal of Human-Computer Studies, 59, 157-183. doi:10.1016/S1071-5819(02)00141-6. Retrieved from http://pyoudeyer.com/emotionsIJHCS.pdf

Schröder, M. (2001). Emotional speech synthesis: A review. In Eurospeech 2001 Scandinavia. Proceedings of the 7th european conference on speech communication and technology, 2nd interspeech event. (pp. 561-4). Aalborg, Denmark, September 3-7, 2001. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.9050

Schröder, M. (2004). Speech and emotion research: An overview of research frameworks and a dimensional approach to emotional speech synthesis. PhD Thesis, Institute of Phonetics, Saarland University. Retrieved from http://www.dfki.de/dfkibib/publications/docs/schroeder_phd_2004.pdf

Schröder, M. (2008). Expressive speech synthesis: Past, present, and possible futures. In J. Tao & T. Tan (Eds.), Affective information processing. (pp. 111-26). London: Springer. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.158.8141&rep=rep1&type=pdf#page=123

Shaikh, M. A. M., Rebordao, A. R. F., & Hirose, K. (2010). Improving TTS synthesis for emotional expressivity by a prosodic parametrization of affect based on linguistic analysis. In Speech prosody 2010. Fifth international conference on speech prosody. Chicago, Illinois, USA. May 11-14, 2010. Retrieved from http://speechprosody2010.illinois.edu/papers/100970.pdf

Tatham, M., & Morton, K. (2004). Expression in speech: Analysis and synthesis. Oxford: Oxford University Press.

Prosody and emotions

Recognition of emotional speech

Emotion in spoken language systems

Multimodal synthesis

BENOÎT, C. (1990) "Synthesis of talking faces: Why and how?", in Proceedings of the ESCA Tutorial Day on Speech Synthesis. Autrans, France, 25-28 September 1990. pp. 49-54.

BENOÎT, C.- LE GOFF, B. (1998) "Audio-visual speech synthesis from French text: Eight years of models, designs and evaluation at the ICP", Speech Communication 26, 1-2: 105-115.

DOHEN, M.- LOEVENBRUCK, H.- CATHIARD, M.-A.- SCHWARTZ, J.-L. (2004) "Visual perception of contrastive focus in reiterant French speech", Speech Communication 44, 1-4: 155-172.
http://dx.doi.org/10.1016/j.specom.2004.10.009

FAGEL, S.- CLEMENS, C. (2004) "An articulation model for audiovisual speech synthesis—Determination, adjustment, evaluation", Speech Communication 44, 1-4: 141-154.
http://dx.doi.org/10.1016/j.specom.2004.10.006

GRANSTRÖM, B.- HOUSE, D.- BESKOW, J. (2002) "Speech and Gestures for Talking Faces in Conversational Dialogue Systems", in GRANSTRÖM, B.- HOUSE, D.- KARLSSON, I. (Eds.) Multimodality in Language and Speech Systems. Dordrecht: Kluwer (Text, Speech and Language Technology, 19). pp. 209-241.

MAGNO CALDOGNETTO, E.- COSI, P.- DRIOLI, C.- TISATO, G.- CAVICCHIO, F. (2004) "Modifications of phonetic labial targets in emotive speech: effects of the co-production of speech and emotions", Speech Communication 44, 1-4: 173-185.
http://dx.doi.org/10.1016/j.specom.2004.10.012

MASSARO, D.W. (1998) Perceiving Talking Faces. From Speech Perception to Behavioral Principle. Cambridge, MA: The MIT Press (Bradford Books).

NORDSTRAND, M.- SVANFELDT, G.- GRANSTRÖM, B.- HOUSE, D. (2004) "Measurements of articulatory variation in expressive speech for a set of Swedish vowels", Speech Communication 44, 1-4: 187-196.
http://dx.doi.org/10.1016/j.specom.2004.09.003

OUNI, S.- COHEN, M.M.- MASSARO, D.W. (2005) "Training Baldi to be multilingual: A case study for an Arabic Badr", Speech Communication 45, 2: 115-138.
http://dx.doi.org/10.1016/j.specom.2004.11.008

SCHWEITZER, A. - BRAUNSCHWEILER, N. - DOGIL, G. - KLANKERT, T. - MÖBIUS, B. - MÖHLER, G. - MORAIS, E. - SÄUBERLICH, B. - THOMAE, M. (2006) "Multimodal speech synthesis", in WAHLSTER, W. (Ed.) SmartKom: Foundations of Multimodal Dialogue Systems. New York: Springer. pp. 411-438.

Spoken language generation


= Recommended introductory/general reading

ALTER, K.- PIRKER, H.- FINKLER, W. (Eds.) (1997) Concept to Speech Generation Systems. Proceedings of a Workshop Sponsored by the Association for Computational Linguistics. 11 July 1997, Universidad Nacional de Educación a Distancia, Madrid, Spain.

BESKOW, J.- GRANSTRÖM, B.- HOUSE, D. (2002) "A multi-modal speech synthesis tool applied to audio-visual prosody", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 372-382.

DEEMTER, K. van - ODIJK, J. (1997) "Context modeling and the generation of spoken discourse", Speech Communication 21, 1-2: 101-122.

FALLSIDE, F.- YOUNG, S. (1984) "Speech Output from Complex Systems", in BRISTOW, G. (Ed.) Electronic Speech Synthesis. Techniques, Technology and Applications. London: Granada. pp. 275-287.


LAVID, J. (2006) "La generación del lenguaje en los sistemas de diálogo", in LLISTERRI, J.- MACHUCA, M. J. (Eds.) Los sistemas de diálogo. Bellaterra - Soria: Universitat Autònoma de Barcelona, Servei de Publicacions - Fundación Duques de Soria (Manuals de la Universitat Autònoma de Barcelona, Lingüística, 45). pp. 153-176.

McKEOWN, K.R.- MOORE, J.D. (1997) "Spoken Language Generation", in COLE, R.A.- MARIANI, J.- USZKOREIT, H.- ZAENEN, A.- ZUE, V. (Eds.) Survey of the State of the Art in Human Language Technology. Cambridge: Cambridge University Press.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.366.9300

PAN, S.- McKEOWN, K. (1997) "Integrating Language Generation with Speech Synthesis in a Concept to Speech System", in ALTER, K.- PIRKER, H.- FINKLER, W. (Eds.) Concept to Speech Generation Systems. Proceedings of a Workshop Sponsored by the Association for Computational Linguistics. 11 July 1997, Universidad Nacional de Educación a Distancia, Madrid, Spain. pp. 23-28.

SPYNS, P. - DEPREZ, F.- van TICHELEN, L.- van COILE, B. (1997) "Message-to-Speech: High Quality Speech Generation for Messaging and Dialogue Systems", in ALTER, K.- PIRKER, H.- FINKLER, W. (Eds.) Concept to Speech Generation Systems. Proceedings of a Workshop Sponsored by the Association for Computational Linguistics. 11 July 1997, Universidad Nacional de Educación a Distancia, Madrid, Spain. pp. 11-16.

TEICH, E.- HAGEN, E.- GROTE, B.- BATEMAN, J. (1997) "From communicative context to speech: Integrating dialogue processing, speech production and natural language generation", Speech Communication 21, 1-2: 73-100.

Markup languages for speech synthesis

HUCKVALE, M. (2002) "The use and potential of Extensible Mark-Up (XML) in speech generation", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 297-306.

MONAGHAN, A. (2002) "Mark-up for speech synthesis: A review and some suggestions", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 307-319.

TAYLOR, P.- ISARD, A. (1997) "SSML: A speech synthesis markup language", Speech Communication 21, 1-2: 123-133.

Text-to-Speech Synthesis

General overviews, compilations and textbooks


= Recommended introductory/general reading

Aida-Zade, K. R., Ardil, C., & Sharifova, A. M. (2010). The main principles of text-to-speech synthesis system. International Journal of Electrical, Computer, Electronics and Communication Engineering, 4(1), 13-19. Retrieved from http://waset.org/publications/8303/the-main-principles-of-text-to-speech-synthesis-system

Allen, J. (1973). Synthesis of speech from unrestricted text. In J. L. Flanagan & L. R. Rabiner (Eds.), Speech synthesis. (pp. 416-28). Stroudsburg: Dowden, Hutchinson & Ross. (Original work published 1976)

Allen, J. (1976). Synthesis of speech from unrestricted text. Proceedings of the IEEE, 64(4), 433-442.

Allen, J. (1979). Speech synthesis from text. In J. C. Simon (Ed.), Spoken language generation and understanding. Proceedings of the NATO Advanced Study Institute held at Bonas, France, June 26-July 7, 1979. (pp. 383-96). Dordrecht: Reidel.

Allen, J. (1991). Synthesis of speech from unrestricted text. In B. S. Atal, L. J. Miller, & R. D. Kent (Eds.), Papers in speech communication: Speech production. (pp. 3-12). New York: Acoustical Society of America. (Original work published 1976)

Allen, J. (1992). Overview of text-to-speech systems. In S. Furui & M. Sondhi (Eds.), Advances in speech signal processing. New York: M. Dekker.

Barbosa, P. A. (1999). Revelar a estrutura rítmica de uma língua construindo máquinas falantes: Pela integração entre ciência e tecnologia de fala. In E. M. Scarpa (Ed.), Estudos de prosódia. (pp. 21-52). Campinas: Editora da Unicamp. Retrieved from http://www.unicamp.br/iel/site/docentes/plinio/EstudosProsodia.pdf

Barbosa, P. A. (2002). A construção de máquinas falantes como lugar de integração entre ciências e tecnologias de fala. Intercâmbio, 11, 189-194. Retrieved from http://www.unicamp.br/iel/site/docentes/plinio/intercambio11.pdf

Black, A. W. (2006). Multilingual speech synthesis. In T. Schultz & K. Kirchhoff (Eds.), Multilingual speech processing. (pp. 207-31). Burlington, MA: Elsevier Academic Press.


Bonafonte, A., Escudero, D., & Riera, M. (2006). La conversión de texto en habla. In J. Llisterri & M. J. Machuca (Eds.), Los sistemas de diálogo. (pp. 177-208). Bellaterra - Soria: Universitat Autònoma de Barcelona - Fundación Duques de Soria.

Carlson, R., Granström, B., & Hunnicutt, S. (1990). Multilingual text-to-speech development and applications. In W. A. Ainsworth (Ed.), Advances in speech, hearing and language processing. Volume I. (pp. 269-96). London: JAI Press.

d’Alessandro, C., Garnier, M., & Boula de Mareüil, P. (1996). Synthèse de la parole à partir du texte. In H. Méloni (Ed.), Fondements et perspectives en traitement automatique de la parole. (pp. 81-96). Paris: Éditions AUPELF-UREF.

d’Alessandro, C. & Tzoukermann, E. (Eds.). (2002). Synthèse de la parole à partir du texte. Traitement Automatique des Langues, 14(1).

Dilts, M. (1984). Text to speech. In G. Bristow (Ed.), Electronic speech synthesis. Techniques, technology and applications. (pp. 94-113). London: Granada.


Dutoit, T. & Stylianou, Y. (2003). Text-to-Speech synthesis. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics. (pp. 323-38). Oxford: Oxford University Press.

Dutoit, T. (1997). High-Quality text-to-speech synthesis: An overview. Journal of Electrical & Electronics Engineering, 17(1), 25-37. Retrieved from http://tcts.fpms.ac.be/publications/regpapers/1997/ieeea97_td.zip


Dutoit, T. (1997). An introduction to text-to-speech synthesis. Dordrecht: Kluwer.

Dutoit, T. (1999). A short introduction to text-to-speech synthesis [Web page]. Mons: TCTS Lab, Faculté Polytechnique de Mons. Retrieved from http://tcts.fpms.ac.be/synthesis/introtts_old.html

Edgington, M., Lowry, A., Jackson, P., Breen, A. P., & Minnis, S. (1996a). Overview of current text-to-speech techniques. Part I: Text and linguistic analysis. BT Technology Journal, 14(1).

Edgington, M., Lowry, A., Jackson, P., Breen, A. P., & Minnis, S. (1996b). Overview of current text-to-speech techniques. Part II: Prosody and speech generation. BT Technology Journal, 14(1).

Gagnon, R., Fons, K., & Gargagliano, T. (1984). Phonetic synthesis. In G. Bristow (Ed.), Electronic speech synthesis. Techniques, technology and applications. (pp. 177-91). London: Granada.


Henton, C. (2012). Text-to-Speech synthesis development. In C. A. Chapelle (Ed.), The Encyclopedia of Applied Linguistics. Oxford: Blackwell.

Llisterri, J. & West, M. (1987). Los sistemas de conversión de texto a voz mediante síntesis por reglas: Una aproximación interdisciplinar. In C. Martín Vide (Ed.), Lenguajes naturales y lenguajes formales II. Actas del II Congreso de lenguajes naturales y lenguajes formales. (pp. 183-96). Barcelona: Promociones y Publicaciones Universitarias. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Llisterri_West_87_Conversion_Texto_Voz.pdf

Llisterri, J. (2001). La conversión de texto en habla. Quark. Ciencia, Medicina, Comunicación y Cultura, 21, 79-89. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/CTH_Quark_01.pdf


Llisterri, J., Carbó, C., Machuca, M. J., Mota, C., Riera, M., & Ríos, A. (2004). La conversión de texto en habla: Aspectos lingüísticos. In M. A. Martí & J. Llisterri (Eds.), Tecnologías del texto y del habla. (pp. 145-86). Barcelona: Edicions de la Universitat de Barcelona - Fundación Duques de Soria. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Llisterri_Carbo_Machuca_Mota_Riera_Rios_04_Conversion_Texto_Habla.pdf

Narayanan, S. & Alwan, A. (Eds.). (2005). Text-to-speech synthesis. New paradigms and advances. Indianapolis: Prentice Hall.

Pfister, B. & Traber, C. (1994). Text-to-Speech synthesis: An introduction and a case study. In E. Keller (Ed.), Fundamentals of speech synthesis and speech recognition. Basic concepts, state of the art and future challenges. (pp. 87-108). Chichester: John Wiley & Sons.

Rodríguez Crespo, M. A. (1997). Introducción a la conversión texto-voz. Philologia Hispalensis, 11(2), 177-192.

Rodríguez Crespo, M. A., Escalada, J. G., & Monzón, L. (1991). Teoría y aplicaciones de la conversión texto-voz. Comunicaciones de Telefónica I+D, 2(4).

Schroeter, J. (2006). Text-to-speech (TTS) synthesis. In The electrical engineering handbook. (pp. 16.1-16.13). Boca Raton, FL: CRC Press.


Taylor, P. (2009). Text-to-Speech synthesis. Cambridge: Cambridge University Press. [Draft version: http://svr-www.eng.cam.ac.uk/~pat40/ttsbook_draft_2.pdf]

1. Introduction; 2. Communication and language; 3. The text-to-speech problem; 4. Text segmentation and organisation; 5. Text decoding; 6. Prosody prediction from text; 7. Phonetics and phonology; 8. Pronunciation; 9. Synthesis of prosody; 10. Signals and filters; 11. Acoustic models of speech production; 12. Analysis of speech signals; 13. Synthesis techniques based on vocal tract models; 14. Synthesis by concatenation and signal processing modification; 15. Hidden Markov model synthesis; 16. Unit selection synthesis; 17. Further issues; 18. Conclusions.

van Santen, J. H. P., Sproat, R., Olive, J., & Hirschberg, J. (Eds.). (1996). Progress in speech synthesis. New York: Springer.

Corpus-based speech synthesis

Text-to-Speech Systems

Multilingual

BLACK, A.- TAYLOR, P. (1997) The Festival Speech Synthesis System: system documentation. Technical Report HCRC/TR-83, Human Communications Research Centre, University of Edinburgh, Scotland UK, January 1997.
http://www.cstr.ed.ac.uk/projects/festival/

HOFFMANN, R. (1999) "A Multilingual Text-to-Speech System", The Phonetician 80: 5-10.

Méndez Pazó, F., Docío, L., Arza, M., & Campillo, F. (2010). The Albayzín 2010 text-to-speech evaluation. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 317-21). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VI/pdfs/Proceedings_FALA2010.pdf

OLASZY, G. (1989) "MULTIVOX - A flexible text-to-speech synthesis for Hungarian, Finnish, German, Esperanto, Italian and other languages for IBM - PC", in TUBACH, J.P.- MARIANI, J.J. (Eds.) Eurospeech 89. European Conference on Speech Communication and Technology. Paris, September 1989. Edinburgh: CEP Consultants Ltd. vol. 2, pp. 525-29.

OLASZY, G.- GORDOS, G.- NÉMETH, G. (1992) "The MULTIVOX multilingual text-to-speech converter", in BAILLY, G.- BENOÎT, C. (Eds.) Talking Machines. Theories, Models and Designs. Amsterdam: North-Holland / Elsevier Science Publishers. pp. 385-412.

RODRÍGUEZ, M.A.- ESCALADA, J.G.- TORRE, D. (1998) "Conversor texto-voz multilingüe para español, catalán, gallego y euskera", Procesamiento del Lenguaje Natural, Revista n. 23: 16-23.

SPROAT, R. (Ed.) (1997) Multilingual Text-to-Speech Synthesis: The Bell Labs Approach. Dordrecht: Kluwer Academic Publishers.

1.- Introduction: J. van Santen, R. Sproat; 2.- Methods and Tools: J. van Santen, R. Sproat; 3.- Multilingual Text Analysis: R. Sproat, et al.; 4.- Further Issues in Text Analysis: R. Sproat; 5.- Timing: J. van Santen; 6.- Intonation: J. van Santen, et al.; 7.- Synthesis: J. Olive, et al.; 8.- Evaluation: J. van Santen; 9.- Further Issues: R. Sproat, et al.; A: Character Set Encodings; B: Glossary of Grammatical Labels.

SPROAT, R.- OLIVE, J. (1995) "An approach to Text-to-Speech Synthesis", in KLEIJN, W.B.- PALIWAL, K.K. (Eds.) Speech Coding and Synthesis. Amsterdam: Elsevier Science.

SPROAT, R. W.- OLIVE, J.P. (1997) "A Modular Architecture for Multilingual Text-to-Speech", in van SANTEN, J.P.H. - SPROAT, R.W.- OLIVE, J.P.- HIRSCHBERG, J. (Eds.) Progress in Speech Synthesis. New York: Springer. pp. 565-574.

Loquendo

Badino, L., Barolo, C., & Quazza, S. (2004). A general approach to TTS reading of mixed-language texts. In Interspeech 2004 - ICSLP. Proceedings of the 8th international conference on spoken language processing. (pp. 849-52). Jeju Island, Korea. October 4-8, 2004. Retrieved from http://www.isca-speech.org/archive/interspeech_2004/i04_0849.html

Baggia, P., Badino, L., Bonardo, D., & Massimino, P. (2006). Achieving perfect TTS intelligibility. In AVIOS speech technology symposium. SpeechTek West 2006. San Francisco, USA. January 30 - February 1, 2006. Retrieved from http://www.tmcnet.com/channels/speech-recognition-and-text-to-speech-technology/articles/3301-achieving-perfect-tts-intelligibility.htm

Balestri, M., Pacchiotti, A., Quazza, S., Salza, P. L., & Sandri, S. (1999). Choose the best to modify the least: A new generation concatenative synthesis system. In Eurospeech 1999. Proceedings of the 6th European conference on speech communication and technology. (pp. 2291-4). Budapest, Hungary. September 5-9, 1999. Retrieved from http://www.mirlab.org/conference_papers/International_Conference/Eurospeech%201999/PAPERS/S11P2/B059.PDF

Bonaventura, P., Giuliani, F., Garrido, J. M., & Ortín, I. (1998). Grapheme-to-phoneme transcription rules for Spanish, with application to automatic speech recognition and synthesis. In S. Bergler (Ed.), Partially automated techniques for transcribing naturally occurring continuous speech. Proceedings of the workshop (COLING-ACL 98. 36th annual meeting of the Association for Computational Linguistics and 17th international conference on Computational Linguistics) (pp. 33-39). Montreal, Quebec, Canada. 16 August, 1998. Retrieved from http://aclweb.org/anthology/W/W98/W98-0804.pdf

Garrido, J. M., Ortín, I., Quazza, S., Salza, P. L., & Mancini, F. (2000). Desarrollo de un módulo de asignación de parámetros prosódicos para la versión en español del sistema de conversión texto-habla ACTOR®. Procesamiento del Lenguaje Natural, 26, 183-190. Retrieved from http://www.sepln.org/revistaSEPLN/revista/26/garrido-alminana.pdf

Gili Fivela, B., & Quazza, S. (1996). A prosodic parser for an Italian text-to-speech system. Procesamiento del Lenguaje Natural, 19, 189-200. Retrieved from http://www.sepln.org/revistaSEPLN/revista/19/19-Todo.pdf

Llisterri, J., Machuca, M. J., Madrigal, N., Mancini, F., Massimino, P., Mota, C., . . . Ríos, A. (2004). Aspectos lingüísticos en el diseño de un conversor de texto en habla en castellano y en catalán: El sistema Loquendo TTS®. In VI congreso de lingüística general. (pp. 521-2). Santiago de Compostela: Universidade de Santiago de Compostela, Facultade de Filoloxía, Área de Lingüística Xeral. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Llisterri_et_al_04_Conversor_Texto_Habla_Castellano_Catalan_Loquendo.pdf

Quazza, S., Donetti, L., Moisa, L., & Salza, P. L. (2001). Actor®: A multilingual unit-selection speech synthesis system. In SSW4-2001. Proceedings of the fourth ISCA tutorial and research workshop on speech synthesis. (pp. 217-8). Perthshire, Scotland. August 29 - September 1, 2001. Retrieved from http://www.isca-speech.org/archive_open/ssw4/ssw4_209.html

Rello, L., & Llisterri, J. (2010). Naturalidad y expresividad en la conversión de texto en habla: Las consonantes róticas en coda silábica en español. In IX congreso internacional de lingüística general. Universidad de Valladolid, 21 de junio de 2010. Retrieved from http://liceu.uab.cat/~joaquim/speech_technology/CLG_10/CLG_10.html

Salza, P. L., Barolo, C., & Dobrzynska, B. (2009). Etichettatura fonetica e TTS. In L. Romito, V. Galatà, & R. Lio (Eds.), AISV 2007. La fonetica sperimentale. Metodo e applicazioni. Atti del 4o convegno nazionale AISV - Associazione Italiana di Scienze della Voce. Università della Calabria, Arcavacata di Rende (CS). 3-5 dicembre 2007. [CD-ROM] (pp. 337-47). Torriana: EDK Editore.

Zovato, E., Salza, P. L., & Quazza, S. (2006). La valutazione diagnostica come ausilio per lo sviluppo dei sistemi di sintesi vocale. In V. Giordani, V. Bruseghini, & P. Cosi (Eds.), AISV 2006. Scienze vocali e del linguaggio. Metodologie di valutazione e risorse linguistiche. Atti del 3o convegno nazionale AISV - Associazione Italiana di Scienze della Voce. ITC-IRST, Povo di Trento. 29-30 novembre - 1 dicembre 2006. [CD-ROM] (pp. 243-50). Torriana: EDK Editore.

Spanish

BONAFONTE, A.- ESQUERRA, I.- FEBRER, A.- VALLVERDÚ, F. (1997) "A Bilingual Text-to-Speech System in Spanish and Catalan", in KOKKINAKIS, G.- FAKOTAKIS, N.- DERMATAS, E. (Eds.) Eurospeech’97. 5th European Conference on Speech Communication and Technology. Rhodes, Greece, 22-25 September 1997. Vol. 5. pp. 2455-2458.

BONAFONTE, A.- ESQUERRA, I.- FEBRER, A.- FONOLLOSA, J.A.- VALLVERDÚ, F. (1998) "The UPC Text-to-Speech System for Spanish and Catalan", in Proceedings of the 5th international conference on spoken language processing, ICSLP’98, Sydney, Australia, 30th November-4th December 1998
http://www.isca-speech.org/archive/icslp_1998/i98_1146.html

BULLÓN, J.L.- PÉREZ, J.C. (1994) "Conversión de texto a voz en castellano aplicando el algoritmo PSOLA", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 14: 217-232.

CASTEJÓN LAPEYRA, F.- ESCALADA SARDINA, G.- MONZÓN SERRANO, L.- RODRÍGUEZ CRESPO, M.A.- SANZ VELASCO, P. (1994) "Un conversor texto-voz para el español", Comunicaciones de Telefónica I+D, 5, 2: 114-131.

CONEJO, J.M.- VAN COILE, B. (1991) "Desarrollo de un conversor de texto a voz en español dentro de una arquitectura multilingüe", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 11: 221-230.

FLORES TOSCANO, L. (2001) Síntesis de voz mediante la implementación de Unit Selection. Tesis de Licenciatura. Departamento de Ingeniería en Sistemas de Computación, Universidad de las Américas, Puebla, México.

LÓPEZ GONZALO, E.- RODRÍGUEZ BANGA, E.- GARCÍA MATEO, C.- HERNÁNDEZ GÓMEZ, L. (1994) "Modelado lingüístico y acústico para un sistema de conversión de texto a habla", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 14: 257-272.

LÓPEZ-GONZALO, E. - OLASZY, G.- NÉMETH, G. (1993) "Improvement of the Spanish version of the Multivox text-to-speech system", in Eurospeech’93. 3rd European Conference on Speech Communication and Technology. Berlin, Germany, 21-23 September 1993. Vol. 2 pp. 869-872.

MARTÍ, J.- NIÑEROLA, D. (1987) "SINCAS: un conversor texto-voz en castellano", Procesamiento del Lenguaje Natural, Boletín n. 5: 111-122.

MONTERO, J.M.- GUTIÉRREZ ARRIOLA, J.- COLÁS, J.- MACÍAS GUARASA, J.- ENRÍQUEZ, E.- PARDO, J.M. (1999) "Development of an emotional speech synthesiser in Spanish", in Eurospeech99, 6th european conference on speech communication and technology. September 5-9, 1999, Budapest, Hungary. pp. 2099-2102.
http://www.isca-speech.org/archive/eurospeech_1999/e99_2099.html

MONTERO MARTÍNEZ, J.M.- GUTIÉRREZ ARRIOLA, J.- de CÓRDOBA HERRALDE, R.- ENRÍQUEZ CARRASCO, E.V.- PARDO MUÑOZ, J.M. (2002) "The role of pitch and tempo in Spanish emotional speech: Towards concatenative synthesis", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 246-251.

MORA, E.- HIRST, D.- CAVÉ, C. (2000) "Développement et évaluation d’un système de synthèse pour l’espagnol vénézuélien: projet et état d’avancement", Travaux Interdisciplinaires du Laboratoire Parole et Langage d’Aix-en-Provence 19: 91-98.

PÉREZ, J.C.- VIDAL, E. (1991) "Un sistema de conversión de texto a voz para el castellano", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 11: 197-208.

RODRÍGUEZ CRESPO, M.A. (1997) "Introducción a la conversión texto-voz", Philologia Hispalensis 11, 2: 177-192.

RODRÍGUEZ CRESPO, M.A.- ESCALADA SARDINA, J.G.- MACARRÓN LARUMBE, A.- MONZÓN SERRANO, L. (1993) "AMIGO: Un conversor texto-voz para el español", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 13: 389-400.

RODRÍGUEZ CRESPO, M.A.- ESCALADA SARDINA, J.G.- MONZÓN SERRANO, L.- MACARRÓN LARUMBE, A. (1991) "Teoría y aplicaciones de la conversión texto-voz", Comunicaciones de Telefónica I+D, 2, 4.

SANTOS, A.- OLABE, J.C.- MUÑOZ MERINO, E.- LÓPEZ BARRIOS, C.- QUILIS, A.- MARTÍNEZ, M. (1985) "Sistema de conversión de texto a voz en español", Procesamiento del Lenguaje Natural, Boletín n. 3: 21-28.

SANTOS, J.M.- NOMBELA, J.R. (1982) "Text to Speech Conversion in Spanish. A Complete Rule-based Synthesis System", in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Paris, 1982. pp. 1593-1596.

VIEREGGE, W.H.- KERKOF, P.A.M.- BOVES, L.- VAN GERWEN, R. (1987) "Automatic Text-to-speech Conversion for Spanish", Proceedings of the Institute of Phonetics, Catholic University of Nijmegen 11: 29-30.

Catalan

BONAFONTE, A.- ESQUERRA, I.- FEBRER, A.- VALLVERDÚ, F. (1997) "A Bilingual Text-to-Speech System in Spanish and Catalan", in KOKKINAKIS, G.- FAKOTAKIS, N.- DERMATAS, E. (Eds.) Eurospeech’97. 5th European Conference on Speech Communication and Technology. Rhodes, Greece, 22-25 September 1997. Vol. 5. pp. 2455-2458.

BONAFONTE, A.- ESQUERRA, I.- FEBRER, A.- FONOLLOSA, J.A.- VALLVERDÚ, F. (1998) "The UPC Text-to-Speech System for Spanish and Catalan", in Proceedings of the 5th international conference on spoken language processing, ICSLP’98, Sydney, Australia, 30th November-4th December 1998
http://www.isca-speech.org/archive/icslp_1998/i98_1146.html

BONAFONTE, A. - FEBRER, A. (2000) Eines de conversió text-parla, Jornades del Centre de Referència en Enginyeria Lingüística (CREL), Institut d’Estudis Catalans, Barcelona, 4 i 5 d’abril de 2000.

CAMPS, J.- BAILLY, G.- MARTÍ, J. (1992) "Synthèse à partir du texte pour le catalan", Actes des 19èmes Journées du GEP, Bruxelles, 19-22 mai 1992. pp. 329-333.

LLISTERRI, J. (2001) "Las tecnologías del habla en lengua catalana", Simposio "As linguas minoritarias e as tecnoloxías da fala", VIII Conferencia Internacional de Linguas Minoritarias, Santiago de Compostela, 22 de novembro 2001.
http://liceu.uab.cat/~joaquim/publicacions/Llisterri_02_TecnolHabla_Catalan.pdf

MARTÍ, J. (1986) "SINCAT. El sintetitzador català de veu", Quaderns Tècnics 7: 13-20.

MARTÍ, J. (1987) "Un conversor text-veu en català: Sistema SINCAT (SINtetitzador de CATalà)", in MARTÍN VIDE, C. (Ed.) Lenguajes naturales y lenguajes formales II. Barcelona: PPU. pp. 197-209.

Galician

Armenta, A., Escalada, J. G., Garrido, J. M., & Rodríguez Crespo, M. A. (2003). Conversor texto a voz multilingüe de Telefónica I+D. Procesamiento del Lenguaje Natural, 31, 331-332.

López Gonzalo, E., Villar, J. M., & Hernández Gómez, L. A. (2002). Automatic prosody modelling of Galician and its applications to Spanish. In E. Keller, G. Bailly, A. Monaghan, J. Terken, & M. Huckvale (Eds.), Improvements in speech synthesis. Cost 258: The naturalness of synthetic speech. (pp. 218-27). Chichester: John Wiley & Sons.

Rodríguez Crespo, M. A., Escalada, J. G., & Torre, D. (1998). Conversor texto-voz multilingüe para el español, catalán, gallego y euskera. Procesamiento del Lenguaje Natural, 23, 16-23.

Cotovía

Campillo, F., & Rodríguez Banga, E. (2005). Evaluación del modelado acústico y prosódico del sistema de conversión texto-voz Cotovía. Procesamiento del Lenguaje Natural, 35, 5-12.

Campillo, F., & Rodríguez Banga, E. (2006). A method for combining intonation modelling and speech unit selection in corpus-based speech synthesis systems. Speech Communication, 48(8), 941-956.

Campillo, F., van Santen, J. P. H., & Rodríguez Banga, E. (2006). A model for the f0 reset in corpus-based intonation approaches. In Interspeech 2006 - ICSLP. Proceedings of the 9th international conference on spoken language processing. (pp. 2362-5). Pittsburgh, PA, USA, September 17-21, 2006.

Fernández Rei, E. (1999). Tecnologías del habla y síntesis de voz en gallego. In X. Gómez Guinovart, A. Lorenzo, J. Pérez Guerra, & A. Álvarez Lugrís (Eds.), Panorama de la investigación en lingüística informática. Volumen monográfico de la Revista Española de Lingüística Aplicada. (pp. 103-16). Asociación Española de Lingüística Aplicada.

Fernández Rei, E., & González González, M. (1998). Un sintetizador de voz para el gallego. In G. Luquet (Ed.), Travaux de linguistique hispanique. (pp. 65-76). Paris: Presses de la Sorbonne Nouvelle.

Fernández Salgado, X., & Rodríguez Banga, E. (1999). Segmental duration modelling in a text-to-speech system for the Galician language. In Eurospeech 1999. Proceedings of the 6th european conference on speech communication and technology. (pp. 1635-8). Budapest, Hungary, September 5-9, 1999.

Fernández Salgado, X., & Rodríguez Banga, E. (2000). A hierarchical intonation model for synthesising F0 contours in Galician language. In ICSLP - Interspeech 2000. Proceedings of the 6th international conference on spoken language processing. (pp. 625-8). Beijing, China, October 16-20, 2000.

Fernández Salgado, X., & Rodríguez Banga, E. (2000). Proposición de un marco adecuado para el estudio de contornos de F0 para síntesis de voz. Procesamiento del Lenguaje Natural, 24, 175-182.

González González, M. (2004). A síntese de voz en lingua galega: o proxecto Cotovía. Revista Galega do Ensino, 44, 199-215.

González González, M., Losada, R., & Fernández Rei, E. (1999). O galego e as tecnoloxías da fala: o caso do sintetizador de voz. In Actas do V Congreso Internacional de Estudios Galegos. (pp. 703-16). Trier: Edicións do Castro - Galicien-Zentrum der Universität Trier.

Rodríguez Banga, E., Campillo, F., Fernández Rei, E., & Méndez, F. (2002). Sistema de conversión texto-voz en lengua gallega basado en la selección combinada de unidades acústicas y prosódicas. Procesamiento del Lenguaje Natural, 29, 153-158.

Rodríguez Banga, E., Fernández Salgado, X., Fernández Rei, E., & González González, M. (1998). Análisis lingüístico para un conversor texto-voz en lengua gallega. Novática. Revista de la Asociación de Técnicos de Informática, 133(Mayo - Junio), 40-45.

Rodríguez Banga, E., García Mateo, C., & Fernández Salgado, X. (2001). Concatenative text-to-speech synthesis based on sinusoidal modelling. In E. Keller, G. Bailly, A. Monaghan, J. Terken, & M. Huckvale (Eds.), Improvements in Speech Synthesis. COST 258: The Naturalness of Synthetic Speech. (pp. 52-63). Chichester: John Wiley & Sons.

Rodríguez Banga, E., Méndez, F., Campillo, F., Iglesias, G., & Docío, L. (2008). Descripción del sintetizador de voz Cotovía para la evaluación Albayzín TTS 2008. In I. Hernáez (Ed.), V Jornadas en Tecnología del Habla. (pp. 100-3). Bilbao: Universidad del País Vasco - Red Temática en Tecnologías del Habla.

Basque

HERNÁEZ, I.- OLABE, J.C.- ETXEBERRIA, P.- ETXEBERRIA, B.- CUESTA, A. (1994) "Ahozka. Un sistema de conversión de texto a voz para el Euskara", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 14: 241-256.

Portuguese

Armenta, A., Escalada, J. G., Garrido, J. M., & Rodríguez Crespo, M. A. (2003). Conversor texto a voz multilingüe de Telefónica I+D. Procesamiento del Lenguaje Natural, 31, 331-332. Retrieved November 9, 2008, from http://www.sepln.org/revistaSEPLN/revista/31/31-Pag331.pdf

Barbosa, P. A., Violaro, F., Albano, E., Simões, F., Aquino, P., Madureira, S., et al. (1999). Aiuruetê: A high-quality concatenative text-to-speech system for Brazilian Portuguese with demisyllabic analysis-based units and a hierarchical model of rhythm production. In Eurospeech 1999. Proceedings of the 6th european conference on speech communication and technology. (pp. 2059-62). Budapest, Hungary, September 5-9, 1999.

Barros, M. J., Maia, R., Tokuda, K., Resende Jr., F., & Freitas, D. (2005). HMM-Based European Portuguese TTS system. In Interspeech 2005 - Eurospeech. Proceedings of the 9th european conference on speech communication and technology. (pp. 2581-4). Lisbon, Portugal, September 4-8, 2005. Retrieved November 18, 2008, from http://www.isca-speech.org/archive/interspeech_2005/i05_2581.html

Campos, G. L. (1980). Síntese de voz para o idioma português. Tese de Doutorado. Escola Politécnica, Universidade de São Paulo.

Egashira, F. (1992). Síntese de voz a partir de texto. Dissertação de Mestrado. Universidade Estadual de Campinas.

Maia, R., Zen, H., Tokuda, K., Kitamura, T., & Resende Jr., F. (2003). Towards the development of a Brazilian Portuguese text-to-speech system based on HMM. In Eurospeech 2003. Proceedings of the 8th european conference on speech communication and technology. (pp. 2645-8). Geneva, Switzerland, September 1-4, 2003. Retrieved December 9, 2008, from http://www.sp.nitech.ac.jp/~zen/publications/maia-euro03.pdf

Oliveira, L., Viana, M. C., & Trancoso, I. (1991). DIXI - Portuguese text-to-speech system. In Eurospeech 1991. Proceedings of the 2nd european conference on speech communication and technology. (pp. 1239-42). Genova, Italy, September 24-26, 1991. Retrieved November 6, 2008, from http://www.inesc-id.pt/pt/indicadores/Ficheiros/3195.pdf

Oliveira, L., Viana, M. C., & Trancoso, I. (1993). DIXI: Sistema de síntese de fala a partir de texto para o português. In EPLP 1993. Actas do 1o encontro de processamento da língua portuguesa escrita e falada. (pp. 153-8). Lisboa: INESC.

Paulo, S., Oliveira, L., Mendes, C., Figueira, L., Cassaca, R., Viana, M. C., et al. (2008). DIXI - A generic text-to-speech system for European Portuguese. In PROPOR 2008. Computational processing of the Portuguese language. Eighth International Conference, Aveiro, Portugal, September 8-10, 2008. (pp. 91-100). Heidelberg: Springer. Retrieved November 6, 2008, from http://www.inesc-id.pt/pt/indicadores/Ficheiros/5009.pdf

Silva, S., Resende Jr., F., & Netto, S. (2001). A text-to-speech system for the Brazilian Portuguese based on syllabic units. In Proceedings of the second IEEE South-American Workshop on Circuits and Systems. Retrieved November 8, 2008, from http://www02.smt.ufrj.br/~sergioln/papers/BC08.pdf

Simões, F. (1999). Implementação de um sistema de conversão texto-fala para o português do Brasil. Dissertação de Mestrado. Programa de Pós-Graduação em Engenharia Elétrica, Faculdade de Engenharia Elétrica e de Computação, Universidade Estadual de Campinas.

Teixeira, J. P., Freitas, D., Gouveia, P., Olászy, G., & Németh, G. (1998). MULTIVOX - conversor texto fala para português. In II Encontro para o Processamento Computacional da Língua Portuguesa Escrita e Falada. Porto Alegre, Brasil, Novembro de 1998. Retrieved December 6, 2008, from http://www.ipb.pt/~joaopt/publicacoes/artigos/MULTIVOX%20Conversor%20_%20PROPOR%2098.pdf

Torres, R., Seixas, J. M., Netto, S., Freitas, D., & Brasil, E. (2008). Portable implementation of a text-to-speech system for Portuguese. In Proceedings of the European Signal Processing Conference. Retrieved from https://www.eurasip.org/Proceedings/Eusipco/Eusipco2008/papers/1569105420.pdf

Violaro, F., Barbosa, P., Albano, E., & Françozo, E. (1996). Um conversor texto-fala para o português Brasileiro com processamento lingüístico de alta qualidade. In Anais do VII Simpósio Brasileiro de Microondas e Optoeletrônica, XV Simpósio Brasileiro de Telecomunicações. (pp. 361-6).

Phonetic and linguistic modelling for text-to-speech systems


= Recommended introductory/general reading

Ainsworth, W. A. (2005). Can phonetic knowledge be used to improve the performance of speech recognisers and synthesisers? In W. J. Barry & W. A. van Dommelen (Eds.), The integration of phonetic knowledge in speech technology. (pp. 13-20). Dordrecht: Springer.

Barbosa, P. A. (2002). A construção de máquinas falantes como lugar de integração entre ciências e tecnologias de fala. Intercâmbio, 11, 189-194. Retrieved from http://www.unicamp.br/iel/site/docentes/plinio/intercambio11.pdf

Carlson, R. & Granström, B. (1991). Speech synthesis development and phonetic research - a personal introduction. Journal of Phonetics, 19(1), 3-8. Retrieved from http://www.speech.kth.se/~rolf/papers/wwjphonint.pdf

Carlson, R. (Ed.). (1991). Speech synthesis and phonetics. Special issue. Journal of Phonetics, 19(1).

Fant, G. (1991). What can basic research contribute to speech synthesis? Journal of Phonetics, 19(1), 75-90.

Huckvale, M. (2002). Speech synthesis, speech simulation and speech science. In Interspeech 2002 - ICSLP. Proceedings of the 7th international conference on spoken language processing. (pp. 1261-4). Denver, Colorado, USA, September 16-20, 2002. Retrieved from http://www.phon.ucl.ac.uk/home/mark/papers/icslp02synth.pdf


Llisterri, J., Carbó, C., Machuca, M. J., Mota, C., Riera, M., & Ríos, A. (2004). La conversión de texto en habla: Aspectos lingüísticos. In M. A. Martí & J. Llisterri (Eds.), Tecnologías del texto y del habla. (pp. 145-86). Barcelona: Edicions de la Universitat de Barcelona - Fundación Duques de Soria. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Llisterri_Carbo_Machuca_Mota_Riera_Rios_04_Conversion_Texto_Habla.pdf

Mattingly, I. G. (1974). Speech synthesis for phonetic and phonological models. In T. A. Sebeok (Ed.), Current trends in linguistics. Vol 12: Linguistics and adjacent arts and sciences. Part 4. (pp. 2451-87). The Hague: Mouton.

Pols, L. C. W. & van Bezooijen, R. (1991). Gaining phonetic knowledge whilst improving synthetic speech quality? Journal of Phonetics, 19(1), 139-146.

Tatham, M. (1972). The role of phonetic synthesis in the development of phonetic theory. Occasional Papers, Department of Language and Linguistics, University of Essex, 12, 28-31. Retrieved from http://www.morton-tatham.co.uk/publications/to_1994/role%20of%20speech%20synthesis.pdf

van Santen, J. P. H. (2005). Phonetic knowledge in text-to-speech synthesis. In W. J. Barry & W. A. van Dommelen (Eds.), The integration of phonetic knowledge in speech technology. (pp. 149-66). Dordrecht: Springer.

Phonetic knowledge in speech technology

Automatic phonetic transcription


= Recommended introductory/general reading

ALLEN, J.- HUNNICUTT, M.S.- KLATT, D.H. (with R.C. ARMSTRONG and D. PISONI) (1987) From Text to Speech: The MITalk System. Cambridge: Cambridge University Press (Cambridge Studies in Speech Science and Communication). [Ch. 6: Letter-to-Sound and Lexical Stress]

ANDERSEN, O.- DALSGAARD, P. (1995) "Multi-lingual testing of a self-learning approach to phonemic transcription of orthography", in Eurospeech’95. Proceedings of the 4th european conference on speech communication and technology. Madrid, Spain, 18-21 September, 1995. Vol 2. pp. 1117-1120.

BAGSHAW, P. (1998) "Phonemic transcription by analogy in text-to-speech synthesis: Novel word pronunciation and lexicon compression", Computer Speech and Language 12, 2: 119-142.

BAKAMIDIS, S.- CARAYANNIS, G. (1987) "PHONEMIA. A Phoneme Transcription System for Speech Synthesis in Modern Greek", Speech Communication 6,2: 159-170.

BELLEGARDA, J.R. (2005) "Unsupervised, language-independent grapheme-to-phoneme conversion by latent analogy", Speech Communication 46, 2: 140-152.
http://dx.doi.org/10.1016/j.specom.2005.03.002

BERENDSEN, E. (1986) "Phoneme to Grapheme Assignment for Various Purposes", Progress Report of the Institute of Phonetics, University of Utrecht 11,1: 17-24.

BERENDSEN, E. (1987) "Extensions in the UEL Grapheme to Phoneme Conversion System", Progress Report of the Institute of Phonetics, University of Utrecht 12,1: 16-22.

BERENDSEN, E.- DON, J. (1987) "Morphology and Stress in a Rule-Based Grapheme-to-Phoneme Conversion System for Dutch", in LAVER, J.- JACK, M.A. (Eds.) European Conference on Speech Technology. Edinburgh, September, 1987. pp. 239-242.

BOULA DE MAREÜIL, P.- YVON, F.- D’ALESSANDRO, C.- AUBERGÉ, V.- BAGEIN, M.- BAILLY, G.- BÉCHET, F.- FOUKIA, S.- GOLDMAN, J.-P.- KELLER, E.- O’SHAUGHNESSY, D.- PAGEL, V.- SANNIER, F.- VÉRONIS, J.- ZELLNER, B. (1998) "Evaluation of Grapheme-to-Phoneme Conversion for Text-to-Speech Synthesis in French", in RUBIO, A.- GALLARDO, N.- CASTRO, R.- TEJADA, A. (Eds.) Proceedings of the First International Conference on Language Resources and Evaluation. May 28 - 30, 1998, Granada, Spain. European Language Resources Association. Vol. I. pp. 641-646.

BOVES, L.- BLOEMBERG, W.- SENDERS, W.- WILLEMSE, R. (1987) "Phoneme to Grapheme Conversion", Proceedings of the Institute of Phonetics, Catholic University of Nijmegen 11: 27-28.

BOVES, L.- SENDERS, W.- WESTER, J.- WILLEMSE, R. (1987) "Phoneme to Grapheme Conversion by Rules", in LAVER, J.- JACK, M.A. (Eds.) European Conference on Speech Technology. Edinburgh, September, 1987. pp. 150-153.

COSI, P. (1987) "A Graph-Oriented Approach to the Grapheme-to-Phoneme Transcription of Italian Written Texts", in LAVER, J.- JACK, M.A. (Eds.) European Conference on Speech Technology. Edinburgh, September, 1987. pp. 273-276.

COSI, P. (1987) "A Graph-Oriented Implementation of a Grapheme-to-Phoneme Transcriber for Italian", Speech Communication 6: 203-216.

DAELEMANS, W.M.P.- van den BOSCH, A.P.J. (1997) "Language-Independent Data-Oriented Grapheme-to-Phoneme Conversion", in van SANTEN, J.P.H.- SPROAT, R.W.- OLIVE, J.P.- HIRSCHBERG, J. (Eds.) Progress in Speech Synthesis. New York: Springer. pp. 77-90.

DAMPER, R.I.- EASTMOND, J.F.G. (1997) "Pronunciation by analogy: Impact of implementational choices on performance", Language and Speech 40,1: 1-23.

DEDINA, M.J.- NUSBAUM, H.C. (1991) "PRONOUNCE: a program for pronunciation by analogy", Computer Speech and Language 5,1: 55-64.

DIVAY, M.- VITALE, A.J. (1997) "Algorithms for grapheme-phoneme translation for English and French: Applications for database searches and speech synthesis", Computational Linguistics 23,4: 495-523.


DUTOIT, T. (1997) An Introduction to Text-to-Speech Synthesis. Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology, 3). [Ch. 5: Automatic Phonetization]

ELOVITZ, H.S.- JOHNSON, R.- McHUGH, A.- SHORE, J.E. (1976) "Letter-to-Sound Rules for Automatic Translation of English Text to Phonetics", IEEE Transactions on Acoustics, Speech & Signal Processing ASSP-24: 446-459.

FERRI, G.- PIERUCCI, P.- SANZONE, D. (1994) "An integrated morpho-syntactic analysis with phonetic transcription for an Italian text-to-speech system", in Conference Proceedings of the Second ESCA/IEEE Workshop on Speech Synthesis. September 12-15, 1994. Mohonk Mountain House, New Paltz, New York, USA. pp. 183-186.

HUNNICUTT, S. (1980) "Grapheme-to-Phoneme Rules: A Review", Speech Transmission Laboratory -Quarterly Progress and Status Report 2-3.

JEKOSCH, U. (1987) "Phoneme to Grapheme Conversion System of Unrestricted German Text", in LAVER, J.- JACK, M.A. (Eds.) European Conference on Speech Technology. Edinburgh, September, 1987. pp. 154-157.

KERKHOFF, J.- WESTER, J.- BOVES, L. (1984) "A Compiler for Implementing the Linguistic Phase of a Text-to-Speech Conversion System", Proceedings of the Institute of Phonetics of the Catholic University of Nijmegen 8: 60-69.

LAMMENS, J.M.G. (1987) "A Lexicon-Based Grapheme-to-Phoneme Conversion System", in LAVER, J.- JACK, M.A. (Eds.) European Conference on Speech Technology. Edinburgh, September, 1987. pp. 281-285.

LAMMENS, J.M.G. (1987) "A Lexicon-Based Grapheme-to-Phoneme Conversion System for Dutch", Progress Report of the Institute of Phonetics, University of Utrecht 12,1: 23-31.

LAPORTE, E. (1988) Méthodes algorithmiques et lexicales de phonétisation de textes. Thèse doctorale. Centre d’études et de recherches en informatique linguistique, Université Paris 7.

LEEUWEN, H. van - BERENDSEN, E. - LANGEWEG, S. (1986) "Linguistics as an Input for a Flexible Grapheme-to-Phoneme Conversion System in Dutch", in International Conference on Speech Input/Output; Techniques and Applications. London: IEE. pp. 200-205.

LUCAS, S.M.- DAMPER, R.I. (1992) "Syntactic neural networks for bi-directional text-phonetics translation", in BAILLY, G.- BENOÎT, C. (Eds.) (1992) Talking Machines. Theories, Models and Designs. Amsterdam: North-Holland / Elsevier Science Publishers. pp. 127-142.

MENG, H.- HUNNICUTT, S.- SENEFF, S.- ZUE, V. (1996) "Reversible Letter-to-Sound / Sound-to-Letter Generation Based on Parsing Word Morphology", Speech Communication 18,1: 47-64.

MOLBAEK HANSEN, P. (1982) "The Construction of a Grapheme-to-Phone Algorithm for Danish", Annual Report of the Institute of Phonetics, University of Copenhagen 16: 127-136.

MOLBAEK HANSEN, P. (1983) "An Orthography Normalizing Program for Danish", Annual Report of the Institute of Phonetics, University of Copenhagen 17: 87-109.

PAGEL, V.- LENZO, K.- BLACK, A. (1998) Letter to Sound Rules for Accented Lexicon Compression. In Computation and Language E-Print Archive, Paper cmp-lg/9808010 (21 August 1998).

PRASAD, K.V.K.K.- LAMBA, T.S. (1986) "Experiments on Automatic Transliteration of Text into Phonetic Symbols", in International Conference on Speech Input/Output; Techniques and Applications. London: IEE. pp. 206-209.

RENTZEPOPOULOS, P.A.- KOKKINAKIS, G. (1996) "Efficient multilingual phoneme-to-grapheme conversion based on HMM", Computational Linguistics 22,3: 351-376.

SULLIVAN, K.P.H.- DAMPER, R.I. (1992) "Novel word pronunciation within a text-to-speech system", in BAILLY, G.- BENOÎT, C. (Eds.) (1992) Talking Machines. Theories, Models and Designs. Amsterdam: North-Holland / Elsevier Science Publishers. pp. 183-196.

SULLIVAN, K.P.H.- DAMPER, R.I. (1993) "Novel-word pronunciation: a cross-language study", Speech Communication 13, 3-4: 441-452.

VAN BAEL, C. - BOVES, L. - van den HEUVEL, H. - STRIK, H. (2007) "Automatic phonetic transcription of large speech corpora", in Computer Speech and Language 21, 4: 652-668.
http://dx.doi.org/10.1016/j.csl.2007.03.003

VENEZKY, R.L. (1966) "Automatic Spelling-to-Sound Conversion", in GARVIN, P.L.- SPOLSKY, B. (Eds.) Computation in Linguistics. Bloomington: Indiana University Press. pp. 146-161.

VITALE, T. (1991) "An algorithm for high accuracy name pronunciation by parametric speech synthesizer", Computational Linguistics 17: 257-276.

WILLIAMS, B. (1994) "Welsh letter-to-sound rules: rewrite rules and two-level rules compared", Computer Speech and Language 8,3: 261-277.

Spanish automatic phonetic transcription

Almela, R. (1982). Division automatique des syllabes en espagnol. Cahiers de Lexicologie, 40, 77-94.

Bonaventura, P., Giuliani, F., Garrido, J. M., & Ortín, I. (1998). Grapheme-to-phoneme transcription rules for Spanish, with application to automatic speech recognition and synthesis. In S. Bergler (Ed.), Partially automated techniques for transcribing naturally occurring continuous speech. Proceedings of the workshop (COLING-ACL 98. 36th annual meeting of the Association for Computational Linguistics and 17th international conference on Computational Linguistics) (pp. 33-39). Montreal, Quebec, Canada. 16 August, 1998. Retrieved from http://aclweb.org/anthology/W/W98/W98-0804.pdf

Cabrera, C., Contini, M., & Boë, L. J. (1991). La phonétisation du castillan. In ICPhS 1991. Actes du 12ème congrès international de sciences phonétiques. (pp. 114-7). Aix-en-Provence: Université de Provence, Service des Publications.

Castro, M. J., España, S., Marzal, A., & Salvador, I. (2001). Transcriptor ortográfico-fonético para el castellano. Procesamiento del Lenguaje Natural, 27, 241-246. Retrieved from http://www.sepln.org/revistaSEPLN/revista/27/27-articulo28.pdf

Cuétara, J. O. (2004). Fonética de la ciudad de México. Aportaciones desde las tecnologías del habla. Tesis para obtener el título de Maestro en Lingüística Hispánica. Maestría en Lingüística Hispánica, Posgrado en Lingüística, Universidad Nacional Autónoma de México. Retrieved from http://turing.iimas.unam.mx/~luis/DIME/publicaciones/tesis/Cuetara_Tesis_MLH-UNAM.pdf

Enríquez, E. (1991). El problema de las ambigüedades fonéticas y su tratamiento automático. Boletín de la Real Academia Española, 71(252), 157-183.

Enríquez, E., & Casado, C. (1991). Hacia un algoritmo para la conversión automática de fonema en grafema en español. Anuario de Lingüística Hispánica, 7, 151-204.

Espinoza, P. A. (2007). Sistematización del fenómeno de silabicación en el corpus DIME para su aplicación en las tecnologías del habla. Tesis para obtener el título de Licenciada en Lenguas y Literaturas Hispánicas. Licenciatura en Lenguas y Literaturas Hispánicas, Universidad Nacional Autónoma de México. Retrieved from http://turing.iimas.unam.mx/~luis/DIME/publicaciones/tesis/tesis_pdf_ale.pdf

Garrido, J. M., Laplaza, Y., Marquina, M., Schoenfelder, C., & Rustullet, S. (2012). TexAFon: A multilingual text processing tool for text-to-speech applications. In IberSpeech 2012. VII jornadas en Tecnología del Habla and III Iberian SLTech Workshop (pp. 281-289). Escuela Politécnica Superior, Universidad Autónoma de Madrid. 21-23 November, 2012. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VII/IberSPEECH2012_OnlineProceedings.pdf

Howard, H., & Goldman, R. P. (1994). From text to syllable in Castilian. Procesamiento del Lenguaje Natural, 15. Retrieved from http://www.sepln.org/revistaSEPLN/revista/15/grupo3-3.pdf

Llisterri, J., & Mariño, J. B. (1993). Spanish adaptation of SAMPA and automatic phonetic transcription. SAM-A/UPC/001/v1. ESPRIT project 6819 (SAM-A Speech Technology Assessment in Multilingual Applications). Retrieved from http://liceu.uab.cat/~joaquim/publicacions/SAMPA_Spanish_93.pdf

López Morràs, X. (2004). Transcriptor fonético automático del español [Web page]. Retrieved from http://www.aucel.com/pln/transbase.html

Monzo, C., Alías, F., Morán, A., & Gonzalvo, X. (2006). Transcripción fonética de acrónimos en castellano utilizando el algoritmo C4.5. Procesamiento del Lenguaje Natural, 37, 275-282. Retrieved from http://www.sepln.org/revistaSEPLN/revista/37/34.pdf

Moreno, A., & Mariño, J. B. (1998). Spanish dialects: Phonetic transcription. In ICSLP 1998. Proceedings of the 5th international conference on spoken language processing. Sydney Convention Centre, Sydney, Australia, 30 November - 4 December, 1998. Retrieved from http://www.isca-speech.org/archive/icslp_1998/i98_0598.html

Olivier, A., & Kirschning, I. (1999). Evaluación de métodos de determinación automática de una transcripción fonética. In ENC 1999. Segundo encuentro nacional de computación. Pachuca, Hidalgo, México. Retrieved from http://ict.udlap.mx/people/ingrid/ingrid/ENC99_409.pdf

Pérez, H. E., & Armstrong, T. (1998). Diseño e implementación de un transcriptor fonético automático de textos generales del español. Onomázein, 3, 315-324. Retrieved from http://onomazein.letras.uc.cl/Articulos/3/N6_Perez.pdf

Pérez Gutiérrez, J. A., & Guerrero Pérez, J. L. (1993). Transfon: Transcriptor fonético para el castellano. In C. Martín Vide (Ed.), Lenguajes naturales y lenguajes formales IX. Actas del IX congreso de lenguajes naturales y lenguajes formales. (pp. 227-36). Barcelona: Promociones y Publicaciones Universitarias.

Polyákova, T., & Bonafonte, A. (2008). Transcripción fonética en un entorno plurilingüe. In V Jornadas en tecnología del habla. (pp. 207-10). Bilbao: Universidad del País Vasco - Red Temática en Tecnologías del Habla. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/V/pdfs/articulo/art_51.pdf

Ríos, A. (1993). La información lingüística en la transcripción fonética automática del español. Procesamiento del Lenguaje Natural, 13, 381-387. Retrieved from http://liceu.uab.cat/publicacions/Rios_93_Transcripcion_Fonetica_Automatica_Espanol.pdf

Ríos, A. (1994). El contenido fónico en el sistema de diccionarios electrónicos del español. In J. Llisterri & D. Poch (Eds.), Nuevos horizontes de la lingüística aplicada. Actas del XII congreso nacional de la Asociación Española de Lingüística Aplicada. (pp. 333-40). Barcelona.

Ríos, A. (1996). Un alfabeto fonético del español para usos informáticos. Lingüística. Publicación Anual de la Asociación de Lingüística y Filología de la América Latina, 8, 237-244. Retrieved from http://elies.rediris.es/elies16/Rios96.html

Ríos, A. (1999). La transcripción fonética automática del diccionario electrónico de formas simples flexivas del español: Un estudio fonológico en el léxico. Estudios de Lingüística Española, 4. Retrieved from http://elies.rediris.es/elies4/

Ríos Mestre, A. (1998). La transcripción fonética automática del diccionario electrónico de formas simples flexivas del español: Un estudio fonológico en el léxico. Tesis doctoral, Departament de Filologia Espanyola, Universitat Autònoma de Barcelona.

Rodríguez Crespo, M. A., & Escalada, J. G. (1990). Text analysis system with automatic letter to allophone conversion for a Spanish text to speech synthesizer. In SSW1-1990. Proceedings of the ESCA workshop on speech synthesis. (pp. 105-8). Autrans, France, September 25-28, 1990. Retrieved from http://www.isca-speech.org/archive_open/ssw1/ssw1_105.html

San-Segundo, R., Montero, J. M., Córdoba, R., & Gutiérrez Arriola, J. M. (2000). Stress assignment in Spanish proper names. In Interspeech 2000 - ICSLP. Proceedings of the 6th international conference on spoken language processing. (pp. 346-9). Beijing, China, October 16-20, 2000. Retrieved from http://www-gth.die.upm.es/research/documentation/AG-06Str-00.pdf

Subirats, C., Llisterri, J., & Poch, D. (1988). El diccionario electrónico del español con un conversor de texto a voz. In C. Martín Vide (Ed.), Lenguajes naturales y lenguajes formales III.1. Actas del III congreso de lenguajes naturales y lenguajes formales. (pp. 341-56). Barcelona: Promociones y Publicaciones Universitarias. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Subirats_Llisterri_Poch_88_Diccionario_Conversor.pdf

Uraga, E., & Pineda, L. A. (2002). Automatic generation of pronunciation lexicons for Spanish. In A. Gelbukh (Ed.), CICLing 2002. Computational linguistics and intelligent text processing. Proceedings of the third international conference. (pp. 330-8). Heidelberg: Springer. Retrieved from https://link.springer.com/chapter/10.1007/3-540-45715-1_34

Catalan automatic phonetic transcription

Garrido, J. M., Laplaza, Y., Marquina, M., Schoenfelder, C., & Rustullet, S. (2012). TexAFon: A multilingual text processing tool for text-to-speech applications. In IberSpeech 2012. VII jornadas en Tecnología del Habla and III Iberian SLTech Workshop (pp. 281-289). Escuela Politécnica Superior, Universidad Autónoma de Madrid. 21-23 November, 2012. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VII/IberSPEECH2012_OnlineProceedings.pdf

PACHÉS LEAL, P. (1999) Improved Modelling for Robust Speech Recognition. Tesi Doctoral. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. [Ch. 6: The Segre Automatic Transcriber]

PACHÈS, P.- DE LA MOTA, C.- RIERA, M.- PEREA, M.P.- FEBRER, A.- ESTRUCH, M.- GARRIDO, J.M.- MACHUCA, M.J.- RÍOS, A.- LLISTERRI, J.- ESQUERRA, I.- HERNANDO, J.- PADRELL, J.- NADEU, C. (2000) "Segre: An automatic tool for grapheme-to-allophone transcription in Catalan", in Proceedings of the Workshop on Developing Language Resources for Minority Languages: Reusability and Strategic Priorities (LREC-2000 Second International Conference on Language Resources and Evaluation). Athens, Greece, 30 May 2000. pp. 52-61.
http://liceu.uab.cat/~joaquim/publicacions/Paches_et_al_00_SEGRE_Phonetic_Transcription_Catalan.pdf

Portuguese automatic phonetic transcription

Albano, E. & Moreira, A. (1996). Archisegment-based letter-to-phone conversion for concatenative speech synthesis in Portuguese. In ICSLP 1996. Proceedings of the 4th international conference on spoken language processing. (pp. 1708-11). Philadelphia, PA, USA, October, 3-6, 1996.

Barbosa, F., Pinto, G., Resende Jr., F., Gonçalves, C. A., Monserrat, R., & Rosa, M. C. (2003). Grapheme-Phone transcription algorithm for a Brazilian Portuguese TTS. In N. Mamede, J. Baptista, I. Trancoso, & M. G. Nunes (Eds.), PROPOR 2003. Computational Processing of the Portuguese language. Sixth international workshop, Faro, Portugal, June 26-27, 2003. Proceedings. (pp. 23-30). Heidelberg: Springer.

Barros, M. J. & Weiss, C. (2006). Maximum entropy motivated grapheme-to-phoneme, stress and syllable boundary prediction for Portuguese text-to-speech. In L. Buera, E. Lleida, A. Miguel, & A. Ortega (Eds.), IV Jornadas en Tecnología del Habla. (pp. 177-82). Zaragoza: Universidad de Zaragoza - Red Temática en Tecnologías del Habla. Retrieved November 18, 2008, from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/IV/4jth_cdrom.html

Braga, D., Coelho, L., & Resende Jr., F. (2007). Homograph ambiguity resolution in front-end design for Portuguese TTS systems. In Interspeech 2007. Proceedings of the 8th Annual Conference of the International Speech Communication Association. (pp. 1761-4). Antwerp, Belgium, August 27-31, 2007. Retrieved November 28, 2008, from http://www.isca-speech.org/archive/interspeech_2007/i07_1761.html

Candeias, S. & Perdigão, F. (2008). Conversor de grafemas para fones baseado em regras para português. In L. Costa, N. Cardoso, & D. Santos (Eds.), Linguateca: 10 anos / Actas do Encontro na Curia. Linguateca. Retrieved November 6, 2008, from http://www.linguateca.pt/Linguateca10anos/ResumosAlargados/CandeiasPerdigaoL10.pdf

Oliveira, C., Moutinho, L., & Teixeira, A. (2004). Um novo sistema de conversão grafema-fone para PE baseado em transdutores. In Actas do III Congresso Internacional de Fonética e Fonologia. Maranhão, Brasil.

Oliveira, C., Moutinho, L., & Teixeira, A. (2005). On European Portuguese automatic syllabification. In Interspeech 2005 - Eurospeech. Proceedings of the 9th European conference on speech communication and technology. (pp. 2933-6). Lisbon, Portugal, September 4-8, 2005. Retrieved November 18, 2008, from https://pdfs.semanticscholar.org/6576/1ccd8cadc876524dcd5c5461cb2c32a57899.pdf

Trancoso, I., Viana, M. C., Silva, F., Marques, G., & Oliveira, L. (1994). Rule-Based vs neural-network based approaches to letter-to-phone conversion for Portuguese common and proper names. In ICSLP 1994. Proceedings of the 3rd international conference on spoken language processing. (pp. 1767-70). Yokohama, Japan, September 18-22, 1994. Retrieved December 9, 2008, from the ISCA Archive database, http://www.isca-speech.org/archive/icslp_1994/i94_1767.html

Segmental acoustic phonetic information

FANT, C.G. (1991) "What can basic research contribute to speech synthesis?", Journal of Phonetics 19,1: 75-90.

HERTZ, S. R. (1991) "Streams, Phones and transitions: toward a new phonological and phonetic model of formant timing", Journal of Phonetics 19,1: 91-110.

HESS, W.J. (1995) "Improving the quality of speech synthesis systems at segmental level", in SORIN, C.- MARIANI, J.- MELONI, H.- SCHOENTGEN, J. (Eds.) Levels in Speech Communication. Relations and Interactions. A Tribute to Max Wajskop / Hommage à Max Wajskop. Amsterdam: Elsevier Science B.V. pp. 239-248.

HUCKVALE, M.A. (1986) "Modelling acoustic and phonetic variability of Speech", in International Conference on Speech Input/Output; Techniques and Applications. London: IEE Conference Publication 258. pp. 54-58.

LADEFOGED, P. (1985) "The Phonetic Basis for Computer Speech Generation", in F. FALLSIDE - W.A. WOODS (Eds.) (1985) Computer Speech Processing. Englewood Cliffs, N.J. : Prentice Hall International. pp. 3-27.

MARRERO, V.- DE SANTOS, A. (1994) "Estudios de fonética acústica y síntesis del habla", in Actas del X Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural, Córdoba, 20-22 de julio de 1994.

MATTINGLY, I. G. (1974) "Speech Synthesis for Phonetic and Phonological Models", in T.A. SEBEOK (Ed.) Current Trends in Linguistics, vol 12, Linguistics and Adjacent Arts and Sciences, vol 4. The Hague: Mouton. pp. 2451-2487.

MILLER, C. (1998) Pronunciation modeling in speech synthesis. PhD Thesis. Department of Linguistics, University of Pennsylvania.
http://repository.upenn.edu/ircs_reports/55/

MILLER, C.- KARAALI, O.- MASSEY, N. (1997) Variation and Synthetic Speech. Report-no: Motorola-SSML-1. In Computation and Language E-Print Archive, Paper cmp-lg/9711004 (17 November 1997).
http://xxx.lanl.gov/abs/cmp-lg/9711004

POLS, L.C.W. - VAN BEZOOIJEN, R. (1991) "Gaining Phonetic Knowledge whilst improving synthetic speech quality ?", Journal of Phonetics 19,1: 139-146.

SORIN, C. (1991) "Some observations on the processing of mute "e" in a French diphone-based speech synthesis system", Journal of Phonetics 19,1: 147-160.

VAGGES, K.- COSI, P. (1990) "Coarticolazione e sintesi della voce", Quaderni del Centro di Studio per le Ricerche di Fonetica 9: 639-647.

Prosodic modelling

BAILLY, G.- HOLM, B. (2005) "SFC: A trainable prosodic model", Speech Communication 46: 348-364.
http://dx.doi.org/10.1016/j.specom.2005.04.008

BATLINER, A.- MÖBIUS, B. (2005) "Prosodic models, automatic speech understanding, and speech synthesis: Towards the common ground?", in BARRY, W.J.- van DOMMELEN, W.A. (Eds.) The Integration of Phonetic Knowledge in Speech Technology. Dordrecht: Springer (Text, Speech and Language Technology, 25). pp. 21-44.

BRUCE, G.- GRANSTRÖM, B. (1993) "Prosodic modelling in Swedish speech synthesis", Speech Communication 13, 1-2: 63-74.

CAELEN-HAUMONT, G. (1994) "Semantic and Pragmatic Prediction of Prosodic Structures", in KELLER, E. (Ed.) Fundamentals of Speech Synthesis and Speech Recognition. Basic Concepts, State of the Art and Future Challenges. Chichester: John Wiley & Sons. pp. 271-296.

CAMPILLO, F.- RODRÍGUEZ, E. (2005) "Evaluación del modelado acústico y prosódico del sistema de conversión texto-voz Cotovía", Procesamiento del Lenguaje Natural (Actas del XXI Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Universidad de Granada, 14-16 de septiembre de 2005), Revista nº 35: 5-12.

CHU, M.- ZHAO, Y.- CHANG, E. (2006) "Modeling stylized invariance and local variability of prosody in text-to-speech synthesis", Speech Communication 48, 6: 716-726.
http://dx.doi.org/10.1016/j.specom.2005.10.003

DI CRISTO, A.- DI CRISTO, Ph.- CAMPIONE, E.- VÉRONIS, J. (2000) "A prosodic model for text-to-speech synthesis in French", in BOTINIS, A. (Ed.) Intonation: Analysis, Modelling and Technology. Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology, 15). pp. 321-356.

DIRKSEN, A.- COLEMAN, J.S. (1997) "All-Prosodic Speech Synthesis", in van SANTEN, J.P.H. - SPROAT, R.W.- OLIVE, J.P.- HIRSCHBERG. J. (Eds.) Progress in Speech Synthesis. New York: Springer. pp. 91-108.

ESCUDERO, D. - CARDEÑOSO, V. (2007) "Applying data mining techniques to corpus based prosodic modeling", Speech Communication 49, 3: 213-229.
http://dx.doi.org/10.1016/j.specom.2007.01.008

ESCUDERO, D.- CARDEÑOSO, V.- BONAFONTE, A. (2003) "Experimental evaluation of the relevance of prosodic features in Spanish using machine learning techniques", in Eurospeech 2003. 8th European Conference on Speech Communication and Technology. 1-4 September 2003, Geneva, Switzerland.
http://www.isca-speech.org/archive/eurospeech_2003/e03_2309.html

ESCUDERO, D.- GONZÁLEZ, C.- CARDEÑOSO, V. (2002) "Quantitative evaluation of relevant prosodic factors for text-to-speech synthesis in Spanish", in ICSLP 2002, Proceedings of the International Conference on Spoken Language Processing. Casual Productions. pp. 1165-1168.
http://www.infor.uva.es/~descuder/investig/pdfs/icslp2002.pdf

EMERARD, F.- MORTAMET, L.- COZANNET, A. (1992) "Prosodic processing in a text-to-speech system using a database and learning procedures", in BAILLY, G.- BENOÎT, C. (Eds.) Talking Machines. Theories, Models and Designs. Amsterdam: North-Holland / Elsevier Science Publishers. pp. 225-254.

FACKRELL, J.- VEREECKEN, H.- GROVER, C.- MARTENS, J.P.- van COILE, B. (2002) "Corpus-based development of prosodic models across six languages", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 176-185.

GARRIDO, J.M.- ORTÍN, I.- QUAZZA, S.- SALZA, P.L.- MANCINI, F. (2000) "Desarrollo de un módulo de asignación de parámetros prosódicos para la versión en español del sistema de conversión texto-habla ACTOR®", Procesamiento del Lenguaje Natural, Revista n. 26: 183-190.
http://liceu.uab.cat/publicacions/Garrido_et_al_00_Prosodia_Sintesis_Actor.pdf

GUAÏTELLA, I.- SANTI, S. (1992) "The punctuation and perception of read and spontaneous prosody: An application to speech synthesis", in BAILLY, G.- BENOÎT, C. (Eds.) Talking Machines. Theories, Models and Designs. Amsterdam: North-Holland / Elsevier Science Publishers. pp. 351-366.

HIROSE, K.- HIRST, D.- SAGISAKA, Y. (Eds.) (2005) Quantitative Prosody Modelling for Natural Speech Description and Generation. Special issue. Speech Communication 46.

HORNE, M.- FILIPSON, M. (1995) "Developing the Prosodic Component for Swedish Speech Synthesis", in Eurospeech’95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 1, pp. 611-614.

HORNE, M.- FILIPSON, M.- JOHANSSON, Ch.- LJUNGQVIST, M.- LINDSTRÖM, A. (1993) "Improving the Prosody in TTS Systems: Morphological and Lexical-Semantic Methods for Tracking ’New’ vs. ’Given’ Information", in HOUSE, D.- TOUATI, P. (Eds.) Proceedings of an ESCA Workshop on Prosody. September 27-29, 1993, Lund, Sweden. Lund University Department of Linguistics and Phonetics, Working Papers 41. pp. 298-211.

ISARD, S.D. (1985) "Speech Synthesis and the Rhythm of English", in FALLSIDE, F. - WOODS, W.A. (Eds.) Computer Speech Processing. Englewood Cliffs, N.J. : Prentice Hall International. pp. 479-489.

KERKHOFF, J.- RIETVELD, T. (1995) "The Generation of Prosody in the Nijmegen Rule Oriented Speech Synthesis System", in Eurospeech’95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 3, pp. 1831-1834.

KOCHANSKI, G.- SHIH, C. (2003) "Prosodic modeling with soft templates", Speech Communication 39, 3-4: 311-352.

KOHLER, K.J. (1991) "Prosody in speech synthesis: the interplay between basic research and TTS application", Journal of Phonetics 19,1: 121-138.

KOHLER, K.J. (1997) "Parametric Control of Prosodic Variables by Symbolic Input in TTS Synthesis", in van SANTEN, J.P.H. - SPROAT, R.W.- OLIVE, J.P.- HIRSCHBERG. J. (Eds.) Progress in Speech Synthesis. New York: Springer. pp. 459-476.

LARREUR, D.- EMERARD, F.- MARTY, F. (1989) "Linguistic and prosodic processing for a text-to-speech synthesis system", in TUBACH, J.P.- MARIANI, J.J. (Eds.) Eurospeech 89. European Conference on Speech Communication and Technology. Paris, September 1989. Edinburgh: CEP Consultants Ltd. vol 1 pp. 510-513.

LAVER, J. (1993) "Repetition and re-start strategies for prosody in text-to-speech conversion systems", Speech Communication 13, 1-2: 75-85.

LLISTERRI, J.- MACHUCA, M. J.- de la MOTA, C.- RIERA, M.- RÍOS, A. (2003) "Entonación y tecnologías del habla", in PRIETO, P. (Ed.) Teorías de la entonación. Barcelona: Ariel (Ariel Lingüística). pp. 209-243.
http://liceu.uab.cat/~joaquim/publicacions/Llisterri_Machuca_Mota_Riera_Rios_03_Entonacion_Tecnologias_Habla.pdf

LÓPEZ, E. (1993) Estudio de técnicas de procesado lingüístico y acústico para sistemas de conversión texto-voz en español basados en concatenación de unidades. Tesis doctoral. E.T.S. I. de Telecomunicación, Universidad Politécnica de Madrid.
http://www.gaps.ssr.upm.es/images/eduardo/TesisEdu.ps.rar

LÓPEZ, E.- ÁLVAREZ, J.- HERNÁNDEZ, L. (1994) "Metodología para el modelado prosódico de un sistema de conversión de texto a habla en castellano", Actas del X Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural, Córdoba, 20-22 de julio de 1994.
http://www.sepln.org/revistaSEPLN/revista/15/gen.pdf

LÓPEZ, E.- HERNÁNDEZ, L.A. (1993) "Prosodic Modelling for a Text-to-Speech System in Spanish", in HOUSE, D.- TOUATI, P. (Eds.) Proceedings of an ESCA Workshop on Prosody. September 27-29, 1993, Lund, Sweden. Lund University Department of Linguistics and Phonetics, Working Papers 41. pp. 74-77.

LÓPEZ, E.- HERNÁNDEZ, L.A. (1995) "Automatic Data-Driven Prosodic Modeling for Text to Speech", in Eurospeech’95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 1, pp. 585-588.
http://www.gaps.ssr.upm.es/images/docs/EUROS95.ps

LÓPEZ, E.- RODRÍGUEZ, E.- GARCÍA, C.- HERNÁNDEZ, L. (1994) "Modelado lingüístico y acústico para un sistema de conversión de texto a habla", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 14: 257-272.
http://www.sepln.org/revistaSEPLN/revista/14/14-Pag257.pdf

LÓPEZ, E.- RODRÍGUEZ, J.M. (1996) "Statistical Methods in Data Driven Modeling of Spanish Prosody for Text-to-Speech", in ICSLP 96, The Fourth International Conference on Spoken Language Processing. October 3 - 6, Wyndham Franklin Plaza Hotel, Philadelphia, PA, USA.
http://www.gaps.ssr.upm.es/images/docs/ICSLP96.ps

LÓPEZ, E.- RODRÍGUEZ, J.M.- HERNÁNDEZ, L.- VILLAR, J.M. (1997) "Automatic Corpus-Based Training of Rules for Prosodic Generation in Text-to-Speech", in Eurospeech’97. Proceedings of the 5th European Conference on Speech Communication and Technology. Rhodes, Greece, 22-25 September 1997.
http://www.gaps.ssr.upm.es/images/docs/eurospeech97.ps

LÓPEZ, E.- RODRÍGUEZ, J.M.- HERNÁNDEZ, L.- VILLAR, J-M. (1997) "Automatic Prosodic Modeling for Speaker and Task Adaptation in Text-to-Speech", in ICASSP 97, International Conference on Acoustics, Speech and Signal Processing.
http://www.gaps.ssr.upm.es/images/docs/ICASSP97.ps

LÓPEZ GONZALO, E.- VILLAR NAVARRO, J.M.- HERNÁNDEZ GÓMEZ, L.A. (2002) "Automatic prosody modelling of Galician and its applications to Spanish", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 218-227.

MARTÍ, J.- GUDAYOL, F. (1994) "Ritmo y entonación en la lectura del castellano", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 14: 273-290.
http://www.sepln.org/revistaSEPLN/revista/14/14-Pag273.pdf

MATTINGLY, I. G. (1966) "Synthesis by Rule of Prosodic Features", Language and Speech 9 : 1-13.

McKEOWN, K.- PAN, S. (2000) "Prosodic modelling in concept-to-speech generation: methodological issues", in SPARCK JONES, K.- GAZDAR, G.- NEEDHAM, R. (Eds.) Computers, language and speech: Formal theories and statistical Data. Papers from a Royal Society / British Academy Discussion Meeting, September 1999. London: The Royal Society (Philosophical Transactions of the Royal Society, Series A: Mathematical, Physical and Engineering Sciences, Vol. 358, Issue 1769).

MIXDORFF, H. (2002) "MFGI, a linguistically motivated quantitative model of German prosody", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 134-143.

MONAGHAN, A. (1990) "Rhythm and stress in speech synthesis", Computer Speech and Language 4: 71-78.

MONAGHAN, A. (2002) "State-of-the art summary of European synthetic prosody R&D", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 93-103.

MONAGHAN, A. (2002) "Prosody in synthetic speech: Problems, solutions and challenges", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 89-92.

PARDO, J.M.- GIMÉNEZ DE LOS GALANES, F.M.- VALLEJO, J.A.- BERROJO, M.A.- MONTERO, J.M.- ENRÍQUEZ, E.- ROMERO, A. (1995) "Spanish text-to-speech, from prosody to acoustics", in Proceedings of the International Congress of Acoustics. Trondheim, Norway, 1995. pp. 133-136.

PARDO, J.M.- MARTÍNEZ, M.- QUILIS, A.- MUÑOZ, E. (1987) "Improving Text to Speech Conversion in Spanish: Linguistic Analysis and Prosody ", in LAVER, J.- JACK, M.A. (Eds.) European Conference on Speech Technology. Edinburgh, September 1987. Edinburgh: CEP Consultants Ltd. pp. 173-178.

PIERREHUMBERT, J. (2006) "Prosody, intonation and speech technology", in BATES, M. - WEISCHEDEL, R. M. (Eds.) Challenges in Natural Language Processing. Cambridge: Cambridge University Press (Studies in Natural Language Processing). pp. 257-282.

PORTELE, T.- HEUFT, B. (1997) "Towards a prominence-based synthesis system", Speech Communication 21, 1-2: 61-72.

QUAZZA, S.- SALZA, P.L.- SANDRI, S.- SPINI, A. (1993) "Prosodic Control in a Text-to-Speech System for Italian", in HOUSE, D.- TOUATI, P. (Eds.) Proceedings of an ESCA Workshop on Prosody. September 27-29, 1993, Lund, Sweden. Lund University Department of Linguistics and Phonetics, Working Papers 41. pp. 78-81.

SAGISAKA, Y.- YAMASHITA, T.- KOKENAWA, Y. (2005) "Generation and perception of F0 markedness for communicative speech synthesis", Speech Communication 46: 376-384.
http://dx.doi.org/10.1016/j.specom.2005.03.017

SANTEN, J.P.H. van (1997) "Prosodic modelling in text-to-speech synthesis", in KOKKINAKIS, G.- FAKOTAKIS, N.- DERMATAS, E. (Eds.) Eurospeech’97. 5th European Conference on Speech Communication and Technology. Rhodes, Greece, 22-25 September 1997. Vol. 1. pp. KN-18 - KN-28.

SANTEN, J. van - KAIN, A.- KLABBERS, E.- MISHRA, T. (2005) "Synthesis of prosody using multi-level unit sequences", Speech Communication 46: 365-375.
http://dx.doi.org/10.1016/j.specom.2005.01.008

SIEBENHAAR, B.- ZELLNER KELLER, B.- KELLER, R. (2002) "Phonetic and timing considerations in a Swiss High German TTS system", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 165-175.

STENSBY, S.- HORVEI, B.- OTTESEN, G.E. (1993) "Lexicon and prosodic structure in a text-to-speech system", in HOUSE, D.- TOUATI, P. (Eds.) Proceedings of an ESCA Workshop on Prosody. September 27-29, 1993, Lund, Sweden. Lund University Department of Linguistics and Phonetics, Working Papers 41. pp. 82-85.

TERKEN, J.- COLLIER, R. (1995) "The Generation of Prosodic Structure and Intonation in Speech Synthesis", in KLEIJN, W.B.- PALIWAL, K.K. (Eds.) Speech Coding and Synthesis. Amsterdam: Elsevier Science.

van COILE, B.- de ZITTER, A.- van TICHELEN, L.- VORSTERMANS, A. (1994) "Prosody transplantation in text-to-speech: applications and tools", in Conference Proceedings of the Second ESCA/IEEE Workshop on Speech Synthesis. September 12-15, 1994. Mohonk Mountain House, New Paltz, New York, USA. pp. 105-108.

XYDAS, G.- KOUROUPETROGLOU, G. (2006) "Tone-Group F0 selection for modeling focus prominence in small-footprint speech synthesis", Speech Communication 48, 9: 1057-1078.
http://dx.doi.org/10.1016/j.specom.2006.02.002

YOUNG, N.J.- FALLSIDE, F. (1987) "Generating Words and Prosody for Use in Speech Synthesis", in LAVER, J.- JACK, M.A. (Eds.) European Conference on Speech Technology. Edinburgh, September 1987. Edinburgh: CEP Consultants Ltd. pp. 17-10.

ZELLNER KELLER, B.- KELLER, E. (2002) "Representing speech rhythm", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 154-164.

Segmental duration

BARTKOVA, K.- SORIN, C. (1987) "A Model of Segmental Duration for Speech Synthesis in French", Speech Communication 6 : 245-260.

CAMPBELL, W.N. (1992) "Syllable-based segmental duration", in BAILLY, G.- BENOÎT, C. (Eds.) Talking Machines. Theories, Models and Designs. Amsterdam: North-Holland / Elsevier Science Publishers. pp. 211-224.

CÓRDOBA, R.- VALLEJO, J.A.- MONTERO, J.M.- GUTIÉRREZ ARRIOLA, J.M.- LÓPEZ, M.A.- PARDO, J.M. (1999) "Automatic modeling of duration in a Spanish text-to-speech system using neural networks", in Eurospeech’99, 6th European Conference on Speech Communication and Technology. September 5-9, 1999, Budapest, Hungary. pp. 1619-1622.
http://www-gth.die.upm.es/research/documentation/AI-52Aut-99.pdf

DOHALSKÁ, M.- MEJALDOVÁ, J.- DUBEDA, T. (2002) "Prosodic parameters of synthetic Czech: Developing rules for duration and intensity", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 129-133.

FEBRER, A.- PADRELL, J.- BONAFONTE, A. (1998) "Modeling Phone Duration: Application to Catalan TTS", in Proceedings of the 3rd International Workshop on Speech Synthesis. Jenolan Caves, Australia, 27th - 29th November 1998.
http://www.isca-speech.org/archive_open/ssw3/ssw3_043.html

GARRIDO, J.M.- RÍOS, A.- JIMÉNEZ, E.- LLISTERRI, J. (2000) Models prosòdics per a la conversió de text a parla, Jornades del Centre de Referència en Enginyeria Lingüística (CREL), Institut d’Estudis Catalans, Barcelona, 4 i 5 d’abril de 2000.
http://liceu.uab.cat/~joaquim/publicacions/SFI_UAB_Models_prosodics.pdf

KAIKI, N.- TAKEDA, K.- SAGISAKA, Y. (1992) "Linguistic properties in the control of segmental duration for speech synthesis", in BAILLY, G.- BENOÎT, C. (Eds.) Talking Machines. Theories, Models and Designs. Amsterdam: North-Holland / Elsevier Science Publishers. pp. 255-264.

KELLER, E.- ZELLNER-KELLER, B.- LOCAL, J. (2000) "A serial prediction component for speech timing", in SENDLMEIER, W.F. (Ed.) Speech and Signals. Aspects of Speech Synthesis and Automatic Speech Recognition. Dedicated to Wolfgang Hess on his 60th Birthday. Frankfurt: Hector (Forum Phoneticum, 69). pp. 41-49.

KLATT, D.H. (1979) "Synthesis by Rule of Segmental Durations in English Sentences", in B. LINDBLOM -S. OHMAN (Eds.) Frontiers of Speech Communication Research. New York: Academic Press. pp. 287-300.

MARÍN GÁLVEZ, R. (1994) "Diseño y evaluación de un modelo de duración vocálica del español para la síntesis del habla", Actas del X Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural, Córdoba, 20-22 de julio de 1994.
http://liceu.uab.cat/publicacions/Marin_94_Duracion_Vocales_Sintesis_Espanol.pdf

SANTEN, J.P.H. van (1992) "Deriving text-to-speech durations from natural speech", in BAILLY, G.- BENOÎT, C. (Eds.) Talking Machines. Theories, Models and Designs. Amsterdam: North-Holland / Elsevier Science Publishers. pp. 265-274.

SANTEN, J.P.H. van (1994) "Assignment of Segmental Duration in Text-to-Speech Synthesis", Computer Speech and Language 8, 2: 95-128.

SANTEN, J.P.H. van (1995) "Computation of Timing in Text-to-Speech Synthesis", in KLEIJN, W.B.- PALIWAL, K.K. (Eds.) Speech Coding and Synthesis. Amsterdam: Elsevier Science.

SANTOS, A.- MUÑOZ, P.- MARTÍNEZ, M. (1988) "Diseño y evaluación de reglas de duración en la conversión de texto a voz", Procesamiento del Lenguaje Natural, Boletín n. 6: 69-92.

SHIH, Ch.- AO, B. (1997) "Duration Study for the Bell Laboratories Mandarin Text-to-Speech System", in van SANTEN, J.P.H. - SPROAT, R.W.- OLIVE, J.P.- HIRSCHBERG. J. (Eds.) Progress in Speech Synthesis. New York: Springer. pp. 383-400.

TZOUKERMANN, E.- SOUMOY, O. (1995) "Segmental Duration in French Text-to-Speech Synthesis", in Eurospeech’95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 1, pp. 607-610.

Intonation modelling

AKERS, G.- LENNIG, M. (1985) "Intonation in Text-to-Speech Synthesis: Evaluation of Algorithms", Journal of the Acoustical Society of America 77,6: 2157-2165.

ALTENBERG, B. (1987) Prosodic Patterns in Spoken English: Studies in the Correlation between Prosody and Grammar for Text-to-Speech Conversion. Lund: Lund University Press (Lund Studies in English, 76).

ALLEN, J.- O’SHAUGHNESSY, D. (1976) "A Comprehensive Model for Fundamental Frequency Generation", 1976 IEEE International Conference on Acoustics, Speech and Signal Processing. Rome, N.Y. : Canterbury Press. pp. 701-704.

AUBERGÉ, V. (1992) "Developing a structured lexicon for synthesis of prosody", in BAILLY, G.- BENOÎT, C. (Eds.) Talking Machines. Theories, Models and Designs. Amsterdam: North-Holland / Elsevier Science Publishers. pp. 307-322.

AUBERGÉ, V. (1993) "Prosody modeling with a dynamic lexicon of intonative forms: Application for text-to-speech synthesis", in HOUSE, D.- TOUATI, P. (Eds.) Proceedings of an ESCA Workshop on Prosody. September 27-29, 1993, Lund, Sweden. Lund University Department of Linguistics and Phonetics, Working Papers 41. pp. 62-65.

AUBERGÉ, V.- BAILLY, G. (1995) "Generation of intonation: a global approach", in Eurospeech’95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 3. pp. 2065-2068.

BAILLY, G.- AUBERGÉ, V. (1997) "Section Introduction. Phonetic Representations for Intonation", in van SANTEN, J.P.H. - SPROAT, R.W.- OLIVE, J.P.- HIRSCHBERG. J. (Eds.) Progress in Speech Synthesis. New York: Springer. pp. 435-442.

BAILLY, G.- BARBE, T.- WANG, H. (1992) "Automatic labelling of large prosodic databases: Tools, methodology and links with a text-to-speech system", in BAILLY, G.- BENOÎT, C. (Eds.) Talking Machines. Theories, Models and Designs. Amsterdam: North-Holland / Elsevier Science Publishers. pp. 323-334.

BEAUGENDRE, F. (1995) "Generating French Intonation at Different Speaking Rates", in Eurospeech’95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 1, pp. 603-606.

BEAUGENDRE, F. (1996) "Modèles de l’intonation pour la synthèse de la parole", in MÉLONI, H. (Coord.) Fondements et Perspectives en Traitement Automatique de la Parole. Paris: Éditions AUPELF-UREF (Collection Universités Francophones). pp. 97-198.

BEAUGENDRE, F.- LACHERET-DUJOUR, A. (1993) "Automatic generation of French intonation based on a perceptual study and morphosyntactic information", in Eurospeech’93. 3rd European Conference on Speech Communication and Technology. Berlin, Germany, 21-23 September 1993. Vol 2 pp. 1219-1222.

BRUCE, G.- FILIPSON, M.- FRID, J.- GRANSTRÖM, B.- GUSTAFSON, K.- HORNE, M.- HOUSE, D. (2000) "Modelling of Swedish text and discourse intonation in a speech synthesis framework", in BOTINIS, A. (Ed.) Intonation: Analysis, Modelling and Technology. Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology, 15). pp. 291-320.

BUHMANN, J.- VEREECKEN, H.- FACKRELL, J.- MARTENS, J.-P. - van COILE, B. (2000) "Data driven intonation modelling of 6 languages", in ICSLP 2000. Proceedings of the Sixth international conference on spoken language processing. Beijing, China.

Campillo, F., van Santen, J., & Rodríguez Banga, E. (2009). Integrating phrasing and intonation modelling using syntactic and morphosyntactic information. Speech Communication, 51(5), 452-465.

CARDEÑOSO, V.- ESCUDERO, D. (2002) "Statistical modelling of stress groups in Spanish", in Proceedings of Speech Prosody 2002, an International Conference. Aix-en-Provence, France, 11-13 April 2002. pp. 207-210.
http://www.isca-speech.org/archive/sp2002/sp02_207.html

COLLIER, R. (1991) "Multi-language intonation synthesis", Journal of Phonetics 19,1: 61-74.

COLLIER, R.- TERKEN, J. (1987) "Intonation by Rule in Text-to-Speech Applications", in LAVER, J.- JACK, M.A. (Eds.) European Conference on Speech Technology. Edinburgh, September 1987. Edinburgh: CEP Consultants Ltd. pp. 165-168.

D’ALESSANDRO, C.- MERTENS, P.- BEAUGENDRE, F. (1994) "Automatic stylization of intonation: application to speech synthesis", in Conference Proceedings of the Second ESCA/IEEE Workshop on Speech Synthesis. September 12-15, 1994. Mohonk Mountain House, New Paltz, New York, USA. pp. 155-158.

DE TOURNEMIRE, S. (1997) "Identification and automatic generation of prosodic contours for a text-to-speech synthesis system in French", in Proceedings of Eurospeech’97, 5th European Conference on Speech Communication and Technology. Rhodes, Greece, 22-25 September 1997, Vol. I, pp. 191-194

DE TOURNEMIRE, S. (1998) "Automatic transcription of intonation using an identified prosodic alphabet", in Proceedings ICSLP’98, 5th International Conference on Spoken Language Processing. Sydney, 30 November - 4 December 1998. Vol 5. pp. 1955-1958.

DOBNIKAR, A. (2002) "Improvements in modelling the F0 contour for different types of intonation units in Slovene", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 144-153.

EPITROPAKIS, G.- YIOURGALIS, N.- KOKKINAKIS, G. (1993) "High Quality Intonation Algorithm for the Greek TTS-System", in HOUSE, D.- TOUATI, P. (Eds.) Proceedings of an ESCA Workshop on Prosody. September 27-29, 1993, Lund, Sweden. Lund University Department of Linguistics and Phonetics, Working Papers 41. pp. 70-73.

ESCUDERO, D. (2002) Modelado estadístico de entonación con funciones de Bézier: Aplicaciones a la conversión texto-voz en español. Tesis doctoral. Departamento de Informática, Universidad de Valladolid.
http://www.infor.uva.es/~descuder/investig/tesis/master.ps

ESCUDERO, D. (2003) "Modelado estadístico de entonación con funciones de Bézier: aplicaciones a la conversión texto-voz en español", Procesamiento del Lenguaje Natural, Revista nº 30: 125-126.

ESCUDERO, D.- BONAFONTE, A.- CARDEÑOSO, V. (2002) "Corpus based extraction of quantitative prosodic parameters of stress groups in Spanish", in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2002. pp. 481-484.
https://pdfs.semanticscholar.org/c275/aefa6a1bb0e0086f3f6a4156425601e5bc59.pdf
http://www.infor.uva.es/~descuder/investig/pdfs/icassp2002.pdf

ESCUDERO, D.- CARDEÑOSO, V. (2000) "Obtención automática de modelos de entonación a partir de un corpus empleando splines y patrones estadísticos: primeros resultados", in Actas de las I Jornadas en Tecnologías del Habla. Sevilla, 6-10 noviembre de 2000.

ESCUDERO, D.- CARDEÑOSO, V. (2001) "Modelo cuantitativo de entonación del español", Procesamiento del Lenguaje Natural (Actas del XVII Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural, Universidad de Jaén, 12-14 de septiembre de 2001), Revista n. 27: 233-240.
http://www.sepln.org/revistaSEPLN/revista/27/27-articulo27.pdf
http://www.infor.uva.es/~descuder/investig/pdfs/sepln2001.ps

ESCUDERO, D.- GONZÁLEZ, C.- CARDEÑOSO, V. (2002) "Evaluación objetiva y subjetiva de entonación sintética", in RUBIO AYUSO, A. (Ed.) (2002) Actas de las II Jornadas en Tecnologías del Habla. Granada, del 16 al 18 de diciembre de 2002. Organizadas por la Red Temática en Tecnologías del Habla. Granada: Universidad de Granada, Departamento de Electrónica y Tecnología de Computadores.
http://www.infor.uva.es/~descuder/investig/pdfs/jth2002.ps

Estrada, M., & Baqué, L. (2008). Modélisation prosodique des interrogatives en français pour une application de synthèse de la parole. In B. Lépinette & B. Gómez (Eds.), Linguistique plurielle. VI congrés international de linguistique française. València: Universitat Politècnica de València. Retrieved from http://sites.google.com/site/lorrainebaqueuab/publis/Valencia2006.pdf

FERNÁNDEZ, X.- RODRÍGUEZ, E. (2000) "Proposición de un marco adecuado para el estudio de contornos de F0 para síntesis de voz", Procesamiento del Lenguaje Natural, Revista nº 26: 175-182.
http://www.sepln.org/revistaSEPLN/revista/26/fernandez-salgado.pdf

GARRIDO ALMIÑANA, J.M. (1996) Modelling Spanish Intonation for Text-to-Speech Applications. Ph.D. Thesis. Departament de Filologia Espanyola, Facultat de Lletres, Universitat Autònoma de Barcelona.
http://hdl.handle.net/10803/4885

GARRIDO, J.M. (1991) "Estilización de patrones melódicos del español para sistemas de conversión texto-habla", Procesamiento del Lenguaje Natural, Boletín nº 11: 209-220.
http://liceu.uab.cat/publicacions/Garrido_91_Estilizacion_Patrones_Melodicos_Sintesis.pdf

GARRIDO, J.M. (1991) Modelización de patrones melódicos del español para la síntesis y el reconocimiento. Bellaterra: Departament de Filologia Espanyola, Universitat Autònoma de Barcelona.
http://liceu.uab.cat/publicacions/Garrido_91_Modelizacion.pdf

GARRIDO, J.M. (1996) Modelling Spanish Intonation for Text-to-Speech Applications. Ph.D. Thesis. Departament de Filologia Espanyola, Facultat de Lletres, Universitat Autònoma de Barcelona. 2 vols.

GARRIDO, J.M.- RÍOS, A.- JIMÉNEZ, E.- LLISTERRI, J. (2000) Models prosòdics per a la conversió de text a parla, Jornades del Centre de Referència en Enginyeria Lingüística (CREL), Institut d’Estudis Catalans, Barcelona, 4 i 5 d’abril de 2000.
http://liceu.uab.cat/~joaquim/publicacions/SFI_UAB_Models_prosodics.pdf

GUTIÉRREZ, J.M.- GIMÉNEZ DE LOS GALANES, F.M.- SAVOJI, M.H.- PARDO, J.M. (1997) "Speech synthesis and prosody modification using segmentation and modeling of the excitation signal", in Eurospeech’97. Proceedings of the 5th European Conference on Speech Communication and Technology. Rhodes, Greece, 22-25 September 1997. pp. 1059-1062.

GUTIÉRREZ, J.M.- MONTERO, J.M.- SAIZ, D.- PARDO, J.M. (2001) "New rule-based and data-driven strategy to incorporate Fujisaki’s F0 model to a text-to-speech system in Castilian Spanish", in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Salt Lake City, 2001. pp. 821-824.

HERNÁEZ, I.- GAMINDE, I.- ETXEBARRIA, B.- ETXEBARRIA, P.- GANDARIAS, R. (1995) "Curvas de F0 en euskara: Primera aproximación a la obtención de modelos para conversión de texto a voz", Procesamiento del Lenguaje Natural, Revista n. 17: 272-288.
http://www.sepln.org/revistaSEPLN/revista/17/17-Pag272.pdf

HIGUCHI, N.- HIRAI, T.- SAGISAKA, Y. (1997) "Effect of Speaking Style on Parameters of Fundamental Frequency Contour", in van SANTEN, J.P.H. - SPROAT, R.W.- OLIVE, J.P.- HIRSCHBERG, J. (Eds.) Progress in Speech Synthesis. New York: Springer. pp. 417-428.

HIRAI, T.- IWAHASHI, N.- HIGUCHI, N.- SAGISAKA, Y. (1997) "Automatic Extraction of F0 Control Rules Using Statistical Analysis", in van SANTEN, J.P.H. - SPROAT, R.W.- OLIVE, J.P.- HIRSCHBERG, J. (Eds.) Progress in Speech Synthesis. New York: Springer. pp. 333-346.

HIYAKUMOTO, L.- PREVOST, S.- CASSELL, J. (1997) "Semantic and Discourse Information for Text-to-Speech Intonation", in ALTER, K.- PIRKER, H.- FINKLER, W. (Eds.) Concept to Speech Generation Systems. Proceedings of a Workshop Sponsored by the Association for Computational Linguistics. 11 July 1997, Universidad Nacional de Educación a Distancia, Madrid, Spain. pp. 47-56.

HORNE, M.A.- FILIPSON, M.K.D. (1997) "Computational Extraction of Lexico-Grammatical Information for Generation of Swedish Intonation", in van SANTEN, J.P.H. - SPROAT, R.W.- OLIVE, J.P.- HIRSCHBERG, J. (Eds.) Progress in Speech Synthesis. New York: Springer. pp. 443-458.

HOUSE, J. (1990) "A revised model for intonation for synthesis by rule", Speech, Hearing and Language: Work in Progress UCL 4: 123-135.

JILKA, M.- MÖHLER, G.- DOGIL, G. (1999) "Rules for the generation of ToBI-based American English intonation", Speech Communication 28, 2: 83-108.

JOHNSON, M. (1990) "Implementation of an intonation algorithm for synthesis-by-rule", Speech, Hearing and Language: Work in Progress UCL 4: 197-

KOHLER, K.J. (1990) "Macro and micro Fo in the synthesis of intonation", in KINGSTON, J.- BECKMAN, M.E. (Eds.) Papers in Laboratory Phonology I: Between Grammar and Physics of Speech. Cambridge: Cambridge University Press. pp. 115-138.

KUGLER-KRUSE, M.- POSMYK, R. (1987) "Methods for the Simulation of Natural Intonation in the ’Syrub’ Text-to-Speech System for Unrestricted German Text", in LAVER, J.- JACK, M.A. (Eds.) European Conference on Speech Technology. Edinburgh, September 1987. Edinburgh: CEP Consultants Ltd. pp. 177-180.

LADD, D.R. (1987) "A Model of Intonational Phonology for Use in Speech Synthesis by Rule", in LAVER, J.- JACK, M.A. (Eds.) European Conference on Speech Technology. Edinburgh, September 1987. Edinburgh: CEP Consultants Ltd. pp. 21-24.

MARTÍ, J.- GUDAYOL, F. (1994) "Ritmo y entonación en la lectura del castellano", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 14: 273-290.
http://www.sepln.org/revistaSEPLN/revista/14/14-Pag273.pdf

MARTIN, Ph. (2002) "Modelling F0 in various Romance languages: implementation in some TTS systems", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 104-119.

MERTENS, P.- BEAUGENDRE, F.- d’ALESSANDRO, Ch. R. (1997) "Comparing Approaches to Pitch Contour Stylization for Speech Synthesis", in van SANTEN, J.P.H. - SPROAT, R.W.- OLIVE, J.P.- HIRSCHBERG, J. (Eds.) Progress in Speech Synthesis. New York: Springer. pp. 347-364.

MIXDORFF, H. (1998) Intonation Patterns of German - Quantitative Analysis and Synthesis of F0 Contours. PhD Thesis. Technical University of Berlin.
http://public.beuth-hochschule.de/~mixdorff/thesis/index.html

MIXDORFF, H.- FUJISAKI, H. (1995) "A Scheme for a Model-Based Synthesis by Rule of F0 contours of German Utterances", in Eurospeech’95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 3, pp. 1823-1826.

MÖBIUS, B. (1994) "A quantitative model of German intonation and its application to speech synthesis", in Conference Proceedings of the Second ESCA/IEEE Workshop on Speech Synthesis. September 12-15, 1994. Mohonk Mountain House, New Paltz, New York, USA. pp. 139-142.

MÖBIUS, B. (1997) "Synthesizing German Intonation Contours", in van SANTEN, J.P.H. - SPROAT, R.W.- OLIVE, J.P.- HIRSCHBERG, J. (Eds.) Progress in Speech Synthesis. New York: Springer. pp. 401-416.

MÖBIUS, B.- PÄTZOLD, M.- HESS, W. (1993) "Analysis and synthesis of German Fo contours by means of Fujisaki’s model", Speech Communication 13, 1-2: 53-62.

MONAGHAN, A.I.C. (1993) "The intonation of textual anomalies in text-to-speech", Speech Communication 12,4: 371-382.

MONTERO, J.M.- D’HARO, L.F.- CÓRDOBA, J.A.- VALLEJO, J.- GUTIÉRREZ, J.- PARDO, J.M. (2003) "ANN F0 Modeling for Female-Voice Synthesis in Spanish: Restricted and Non-Restricted Domains", in Proceedings of the 15th International Congress of Phonetic Sciences. Barcelona, 3-9 August 2003. pp. 563-566.

MONTERO, J.M.- de CÓRDOBA, R.- VALLEJO, J.A.- GUTIÉRREZ, J.- ENRÍQUEZ, E.- PARDO, J.M. (2000) "Restricted-domain female-voice synthesis in Spanish: From database design to ANN prosodic modeling", in ICSLP’00. Proceedings of the 6th International Conference on Spoken Language Processing. Beijing, China, 2000.
http://www-gth.die.upm.es/research/documentation/AI-57Res-00.pdf

MORLEC, Y.- BAILLY, G.- AUBERGÉ, V. (1995) "Synthesis and evaluation of intonation with a superposition model", in Eurospeech’95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 3. pp. 2043-2046.

OLASZY, G.- NÉMETH, G. (1997) "Prosody generation for German CST/TTS systems (from theoretical intonation patterns to practical realisation)", Speech Communication 21, 1-2: 37-60.

PICKERING, B. (1996) "Synthesising fundamental frequency contours: experimental results", in KNOWLES, G.- WICHMANN, A.- ALDERSON, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London & New York: Longman. pp. 203-213.

PIERREHUMBERT, J. (1981) "Synthesizing Intonation", Journal of the Acoustical Society of America 70,4: 985-995.

PREVOST, S.- STEEDMAN, M. (1994) "Specifying intonation from context for speech synthesis", Speech Communication 15, 1-2: 139-153.

QUENÉ, H.- KAGER, R. (1992) "The derivation of prosody for text-to-speech from prosodic sentence structure", Computer Speech and Language 6: 77-98.

ROSS, K.- OSTENDORF, M. (1994) "A dynamical system model for generating Fo for synthesis", in Conference Proceedings of the Second ESCA/IEEE Workshop on Speech Synthesis. September 12-15, 1994. Mohonk Mountain House, New Paltz, New York, USA. pp. 131-134.

TAYLOR, P. (1993) "Synthesizing Intonation using the RFC Model", in HOUSE, D.- TOUATI, P. (Eds.) Proceedings of an ESCA Workshop on Prosody. September 27-29, 1993, Lund, Sweden. Lund University Department of Linguistics and Phonetics, Working Papers 41. pp. 86-89.

TAYLOR, P. (2000) "Analysis and synthesis of intonation using the Tilt model", Journal of the Acoustical Society of America 107, 3: 1697-1714.

TRABER, C. (1992) "Fo generation with a database of natural Fo patterns and with a neural network", in BAILLY, G.- BENOÎT, C. (Eds.) Talking Machines. Theories, Models and Designs. Amsterdam: North-Holland / Elsevier Science Publishers. pp. 287-304.

VÉRONIS, J.- DI CRISTO, Ph.- COURTOIS, F.- CHAUMETTE, C. (1998) "A stochastic model of intonation for text-to-speech synthesis", Speech Communication 26,4: 233-244.
https://pdfs.semanticscholar.org/286c/ae845a00442960bbbf435bd2883dc0e3938f.pdf

WILLEMS, N.- COLLIER, R.- ’t HART, J. (1988) "A synthesis scheme for British English intonation", Journal of the Acoustical Society of America 84,4: 1281-1291.

WILLIAMS, B.- ALDERSON, P. (1996) "Synthesizing British English intonation", in KNOWLES, G.- WICHMANN, A.- ALDERSON, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London & New York: Longman. pp. 191-202.

Prosodic parsing

AGÜERO, P.D.- BONAFONTE, A. (2003) "Phrase break prediction: a comparative study", XIX Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Universidad de Alcalá, 10, 11 y 12 de septiembre de 2003. Procesamiento del Lenguaje Natural 31: 107-114.
http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/3156/1647

BACHENKO, J.- FITZPATRICK, E. (1990) "A computational grammar of discourse-neutral prosodic phrasing in English", Computational Linguistics 16 (3): 155-170.

BRUCE, G.- GRANSTRÖM, B.- HOUSE, D. (1992) "Prosodic phrasing in Swedish speech synthesis", in BAILLY, G.- BENOÎT, C. (Eds.) Talking Machines. Theories, Models and Designs. Amsterdam: North-Holland / Elsevier Science Publishers. pp. 113-126.

Campillo, F., van Santen, J., & Rodríguez Banga, E. (2009). Integrating phrasing and intonation modelling using syntactic and morphosyntactic information. Speech Communication, 51(5), 452-465.

GILI FIVELA, B.- QUAZZA, S. (1996) "A Prosodic Parser for an Italian Text-to-Speech System", Actas del XII Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural, Sevilla, septiembre de 1996. Procesamiento del Lenguaje Natural, Revista 19: 189-200.

HIRSCHBERG, J.- PRIETO, P. (1996) "Training intonational phrasing rules automatically for English and Spanish text-to-speech", Speech Communication 18,3: 283-292.

LEE, S.- OH, Y.-H. (1999) "Tree-based modeling of prosodic phrasing and segmental duration for Korean TTS systems", Speech Communication 28, 4: 283-300.

MARSI, E.C.- COPPEN, P.-A. J.M.- GUSSENHOVEN, C.H.M.- RIETVELD, T.C.M. (1997) "Prosodic and Intonational Domains in Speech Synthesis", in van SANTEN, J.P.H. - SPROAT, R.W.- OLIVE, J.P.- HIRSCHBERG, J. (Eds.) Progress in Speech Synthesis. New York: Springer. pp. 477-494.

OSTENDORF, M.- VEILLEUX, N. (1994) "A Hierarchical Stochastic Model for Automatic Prediction of Prosodic Boundary Location", Computational Linguistics 20,1: 27-54.

OSTENDORF, M.- WIGHTMAN, C.W.- VEILLEUX, N.M. (1993) "Parse scoring with prosodic information: an analysis/synthesis approach", Computer Speech and Language 7,3: 193-210.

QUENÉ, H.- KAGER, R. (1992) "The derivation of prosody for text-to-speech from prosodic sentence structure", Computer Speech and Language 6: 77-98.

SANDERMAN, A. (1994) "How can prosody segment the flow of (synthetic) speech?", in Conference Proceedings of the Second ESCA/IEEE Workshop on Speech Synthesis. September 12-15, 1994. Mohonk Mountain House, New Paltz, New York, USA. pp. 147-150.

SANDERMAN, A.A.- COLLIER, R. (1996) "Prosodic rules for the implementation of phrase boundaries in synthetic speech", Journal of the Acoustical Society of America 100, 5: 3390-3397.

SANDERS, P.- TAYLOR, P. (1995) "Using Statistical Models to Predict Phrase Boundaries for Speech Synthesis", in Eurospeech’95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 3, pp. 1811-1814.

TAYLOR, P.- BLACK, A.W. (1998) "Assigning phrase breaks from part-of-speech sequences", Computer Speech and Language 12, 2: 99-117.

VEILLEUX, N. M. (1997) "Probabilistic model of acoustic / prosody / concept relationships for speech synthesis", in ALTER, K.- PIRKER, H.- FINKLER, W. (Eds.) Concept to Speech Generation Systems. Proceedings of a Workshop Sponsored by the Association for Computational Linguistics. 11 July 1997, Universidad Nacional de Educación a Distancia, Madrid, Spain. pp. 1-10.

WANG, M.Q. - HIRSCHBERG, J. (1992) "Automatic classification of intonational phrase boundaries", Computer Speech and Language 6: 175-196.

Pause assignment

ALMEIDA BARBOSA, P.- BAILLY, G. (1997) "Generation of Pauses Within the z-score Model", in van SANTEN, J.P.H. - SPROAT, R.W.- OLIVE, J.P.- HIRSCHBERG, J. (Eds.) Progress in Speech Synthesis. New York: Springer. pp. 365-382.

PUIGVÍ, D.- JIMÉNEZ, D.- FERNÁNDEZ, J.M. (1994) "Parametrización de las pausas ortográficas en castellano. Aplicación a un conversor de texto a habla", Actas del X Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Córdoba, 20-22 de julio de 1994.
http://liceu.uab.cat/publicacions/Puigvi_Jimenez_Fernandez_94_Pausas_Sintesis_Castellano.pdf

Intensity modelling

BAGSHAW, P. (1998) "Unsupervised training of phone duration and energy models for text-to-speech synthesis", in Proceedings ICSLP’98, 5th International Conference on Spoken Language Processing. Sydney, 30 November - 4 December 1998. Vol 2. pp. 17-20.

BARTKOVA, K.- HAFFNER, P.- LARREUR, D. (1993) "Intensity Prediction for Speech Synthesis in French", in HOUSE, D.- TOUATI, P. (Eds.) Proceedings of an ESCA Workshop on Prosody. September 27-29, 1993, Lund, Sweden. Lund University Department of Linguistics and Phonetics, Working Papers 41. pp. 280-283.

BLECUA, B.- ACÍN, V. (1995) "Propuesta de un modelo de intensidad vocálica del castellano y el catalán aplicable a un sistema de conversión de texto a habla", Procesamiento del Lenguaje Natural, Revista n.17: 257-271.
http://www.sepln.org/revistaSEPLN/revista/17/17-Pag257.pdf

DOHALSKÁ, M.- MEJALDOVÁ, J.- DUBEDA, T. (2002) "Prosodic parameters of synthetic Czech: Developing rules for duration and intensity", in KELLER, E. - BAILLY, G.- MONAGHAN, A.- TERKEN, J.- HUCKVALE, M. (Eds.) Improvements in Speech Synthesis. Cost 258: The Naturalness of Synthetic Speech. Chichester: John Wiley & Sons. pp. 129-133.

Stress assignment

BAART, J.L.G. (1989) "Focus and accent in a Dutch text-to-speech system", Proceedings of the European Chapter of the Association for Computational Linguistics, Cambridge, 1989. pp. 111-114

CARLSON, R.- GRANSTRÖM, B. (1973) "Word Accent, Emphatic Stress and Syntax in a Synthesis by Rule Scheme for Swedish", Speech Transmission Laboratory - Quarterly Progress and Status Report 2-3: 31-36.

HIRSCHBERG, J. (1992) "Using discourse context to guide pitch accent decisions in synthetic speech", in BAILLY, G.- BENOÎT, C. (Eds.) Talking Machines. Theories, Models and Designs. Amsterdam: North-Holland / Elsevier Science Publishers. pp. 367-376.

MONAGHAN, A.I. (1991) "Accentuation and speech rate in the CSRT TTS system", in Proceedings of the ESCA Workshop on Phonetics and Phonology of Speaking Styles. Barcelona, Spain, 30 September - 2 October, 1991. pp. 41.1-41.5.

SPROAT, R. (1994) "English noun-phrase accent prediction for text-to-speech", Computer Speech and Language 8,2: 79-94.

THEUNE, M. (1997) "Contrastive accent in a data-to-speech system", in Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics. 7-12 July 1997, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain. pp. 519-521.

WILLIAMS, B. (1987) "Word stress assignment in a text-to-speech synthesis system for British English", Computer Speech and Language 2,1: 235-272.

Phonological information

HERTZ, S.R. (1990) "The Delta programming language: an integrated approach to nonlinear phonology, phonetics and speech synthesis", in KINGSTON, J.- BECKMAN, M.E. (Eds.) Papers in Laboratory Phonology I: Between Grammar and Physics of Speech. Cambridge: Cambridge University Press.

KLATT, D.H. (1976) "Structure of a Phonological Rule Component for a Synthesis-by-Rule Program", IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-24: 291-298.

LOCAL, J. (1994) "Phonological Structure, Parametric Phonetic Interpretation and Natural-Sounding Synthesis", in KELLER, E. (Ed.) Fundamentals of Speech Synthesis and Speech Recognition. Basic Concepts, State of the Art and Future Challenges. Chichester: John Wiley & Sons. pp. 253-270.

MATTINGLY, I. G. (1971) "Synthesis by Rule as a Tool for Phonological Research", Language and Speech 14:

MATTINGLY, I. G. (1974) "Speech Synthesis for Phonetic and Phonological Models", in T.A. SEBEOK (Ed.) Current Trends in Linguistics, vol 12, Linguistics and Adjacent Arts and Sciences, vol 4. The Hague: Mouton. pp. 2451-2487.

TAYLOR, P. (2000) "Concept-to-speech synthesis by phonological structure matching", in SPARCK JONES, K.- GAZDAR, G.- NEEDHAM, R. (Eds.) Computers, language and speech: Formal theories and statistical Data. Papers from a Royal Society / British Academy Discussion Meeting, September 1999. London: The Royal Society (Philosophical Transactions of the Royal Society, Series A: Mathematical, Physical and Engineering Sciences, Vol. 358, Issue 1769).

Text processing and linguistic analysis


= Recommended introductory/general reading

Alhonen, J. (2009). Multilingual number expansion for TTS. In COCOSDA 2009. 12th International Oriental conference on speech databases and assessment. (pp. 110-5). Urumqi, China. August 10-12, 2009. doi:10.1109/ICSDA.2009.5278365

Allen, J., Hunnicutt, M. S., & Klatt, D. (1987). Morphological analysis. In From text to speech: The MITalk system. (pp. 23-39). Cambridge: Cambridge University Press.

Allen, J., Hunnicutt, M. S., & Klatt, D. (1987). The phrase-level parser. In From text to speech: The MITalk system. (pp. 40-51). Cambridge: Cambridge University Press.

Allen, J., Hunnicutt, M. S., & Klatt, D. (1987). Text preprocessing. In From text to speech: The MITalk system. (pp. 16-22). Cambridge: Cambridge University Press.

Batůšek, R., & Dvořák, J. (1999). Text preprocessing for Czech speech synthesis. In V. Matoušek, P. Mautner, J. Ocelíková, & P. Sojka (Eds.), TSD 1999. Text, speech and dialogue. Second international workshop. Plzen, Czech Republic, September 13–17, 1999. Proceedings. (pp. 845-). Berlin: Springer. doi:10.1007/3-540-48239-3_38

Boëffard, O., Bigorgne, D., Cherbonnel, B., Emerard, F., Roussarie, L., Bagshaw, P., . . . Traber, C. (1996). Utilisation de techniques d’apprentissage automatique pour les traitements linguistiques et prosodiques en synthèse de la parole: Quelques résultats en anglais, allemand et français. In JEP 1996. Actes des XXèmes journées d’études sur la parole. (pp. 383-6). Avignon, France. 10-14 juin 1996.

Boves, L., Refice, M., Martínez, M., Casado, C., & Pardo, J. M. (1988). El procesador lingüístico para un sistema multilingüe de conversión texto-habla y habla-texto. Procesamiento del Lenguaje Natural, 6, 53-68.

Breen, A., Eggleton, B., Dion, P., & Minnis, S. (2002). Refocussing on the text normalisation process in text-to-speech synthesis. In ICSLP 2002 - Interspeech 2002. Proceedings of the 7th international conference on spoken language processing. (pp. 153-6). Denver, Colorado, USA. September 16-20, 2002. Retrieved from http://www.isca-speech.org/archive/icslp_2002/i02_0153.html

Burileanu, D., Dan, C., Sima, M., & Burileanu, C. (1999). A parser-based text preprocessor for Romanian language TTS synthesis. In Eurospeech 1999. Proceedings of the 6th European conference on speech communication and technology. (pp. 2063-6). Budapest, Hungary, September 5-9, 1999. Retrieved from http://www.ece.uvic.ca/~msima/PAPERS/Eurospeech_1999/Eurospeech_1999_B044.pdf

Carlson, R., & Granström, B. (1986). Linguistic processing in the KTH multilingual text-to-speech system. In ICASSP 1986. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 2403-6). Tokyo, Japan. April 8 - 11, 1986.

Cotto, D. (1993). Improvement of unrestricted text synthesis by the linguistic preprocessing tool: TEXOR. In ESCA - NATO/RSG10 Workshop on applications of speech technology. (pp. 199-202). Lautrach, Bavaria, Germany. September 16-17, 1993. Retrieved from http://www.isca-speech.org/archive_open/ast_93/ast3_199.html

Coughlin, D. A. (1999). Leveraging syntactic information for text normalization. In V. Matoušek, P. Mautner, J. Ocelíková, & P. Sojka (Eds.), TSD 1999. Text, speech and dialogue. Second international workshop. Plzen, Czech Republic, September 13–17, 1999. Proceedings. (pp. 842-). Berlin - Heidelberg: Springer. doi:10.1007/3-540-48239-3_17

Dutoit, T. (1997a). Morpho-Syntactic analysis. In An introduction to text-to-speech synthesis. (pp. 71-104). Dordrecht: Kluwer.

Dutoit, T. (1997b). Preprocessing. In An introduction to text-to-speech synthesis. (pp. 73-7). Dordrecht: Kluwer.

Edgington, M., Lowry, A., Jackson, P., Breen, A. P., & Minnis, S. (1996). Overview of current text-to-speech techniques: Part I - text and linguistic analysis. BT Technology Journal, 14(1), 68-83.

Ferri, G., Pierucci, P., & Sanzone, D. (1997). A complete linguistic analysis for an Italian text-to-speech system. In J. P. H. van Santen, R. Sproat, J. P. Olive, & J. Hirschberg (Eds.), Progress in speech synthesis. (pp. 123-38). New York: Springer.

Garrido, J. M., Laplaza, Y., Marquina, M., Schoenfelder, C., & Rustullet, S. (2012). TexAFon: A multilingual text processing tool for text-to-speech applications. In IberSpeech 2012. VII jornadas en Tecnología del Habla and III Iberian SLTech Workshop (pp. 281-289). Escuela Politécnica Superior, Universidad Autónoma de Madrid. 21-23 November, 2012. Retrieved from http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VII/IberSPEECH2012_OnlineProceedings.pdf

Gaudinat, A., & Wehrli, E. (1997). Analyse syntaxique et synthèse de la parole: Le projet FipsVox. Traitement Automatique des Langues, 38(1), 121-134.

Hiyakumoto, L., Prevost, S., & Cassell, J. (1997). Semantic and discourse information for text-to-speech intonation. In K. Alter, H. Pirker, & W. Finkler (Eds.), Concept to speech generation systems. Proceedings of a workshop sponsored by the Association for Computational Linguistics. (pp. 47-56). Universidad Nacional de Educación a Distancia, Madrid, Spain. July 11, 1997. Retrieved from http://aclweb.org/anthology/W/W97/W97-1207.pdf

Jansen, R., van Hessen, A., & Pols, L. C. W. (1998). Pre-processing of input text: Improving pronunciation for the Fluent Dutch text-to-speech synthesiser. Proceedings of the Institute of Phonetic Sciences, University of Amsterdam, 22, 125-134. Retrieved from http://www.fon.hum.uva.nl/archive/1998/1998-Proc22-JansenHessenPols.pdf

Kanis, J., Zelinka, J., & Müller, L. (2005). Automatic numbers normalization in inflectional languages. In SPECOM 2005. International workshop “Speech and Computer”. (pp. 663-6). Moscow State Linguistic University, Moscow, Russia. October 17-19, 2005. Retrieved from http://www.kky.zcu.cz/cs/publications/1/KanisJ_2005_Automaticnumbers.pdf

Liberman, M. Y., & Church, K. W. (1992). Text analysis and word pronunciation in text-to-speech synthesis. In S. Furui & M. M. Sondhi (Eds.), Advances in speech signal processing. New York: Marcel Dekker. Retrieved from https://pdfs.semanticscholar.org/a5be/0177798bdf8186d3a89664cf4de4de2e31c9.pdf?_ga=2.42161504.415191213.1506290754-1521038290.1468345841

Lindström, A., & Ljunqvist, M. (1992). Text processing within a speech synthesis system. In ICSLP 1992. Proceedings of the 2nd international conference on spoken language processing. Banff, Alberta, Canada, October 13-16, 1992. Retrieved from http://www.isca-speech.org/archive/icslp_1994/i94_1683.html

Miranda, L. A., & Rodríguez-Sánchez Torres, L. (1994). Analizador morfosintáctico de nombres propios y siglas. Procesamiento del Lenguaje Natural, 15. Retrieved from http://www.sepln.org/revistaSEPLN/revista/15/grupo3-2.pdf

Moberg, M., & Pärssinen, K. (2007). Multilingual rule-based approach to number expansion: Framework, extensions and application. International Journal of Speech Technology, 9(1), 29-40. doi:10.1007/s10772-006-9002-5

Monaghan, A. I. C. (1992). Heuristic strategies for higher level analysis of unrestricted text. In G. Bailly & C. Benoît (Eds.), Talking machines. Theories, models and designs. (pp. 143-6). Amsterdam: North Holland - Elsevier.

Monzón, L., Rodríguez Crespo, M. A., & Escalada, G. (1993). Módulo de análisis sintáctico para un sistema de conversión texto-voz en castellano. Procesamiento del Lenguaje Natural, 13, 367-379. Retrieved from http://www.sepln.org/revistaSEPLN/revista/13/13-Todo.pdf

Morton, K., & Tatham, M. (1995). Pragmatic effects in speech synthesis. In Eurospeech 1995. Proceedings of the 4th European conference on speech communication and technology. Vol 3. (pp. 1819-22). Madrid, Spain. September 18-21, 1995. Retrieved from http://www.isca-speech.org/archive/eurospeech_1995/e95_1819.html

Nakatani, C. H. (1997). Discourse structural constraints on accent in narrative. In J. P. H. van Santen, R. Sproat, J. P. Olive, & J. Hirschberg (Eds.), Progress in speech synthesis. (pp. 139-56). New York: Springer.

Nolan, F. (1984). Applying linguistics to synthesis. In G. Bristow (Ed.), Electronic speech synthesis. Techniques, technology and applications. (pp. 320-35). London: Granada.

Olaszi, P. (2000). Analysis of written and spoken form of Hungarian numbers for TTS applications. International Journal of Speech Technology, 3(3), 177-186. doi:10.1023/A:1026506930945

O’Shaughnessy, D. (1992). Text processing for text-to-speech synthesis. In G. Bailly & C. Benoît (Eds.), Talking machines. Theories, models and designs. (pp. 109-12). Amsterdam: North Holland - Elsevier.

Quazza, S., & van den Heuvel, H. (2000). The use of lexica in text-to-speech systems. In F. van Eynde & D. Gibbon (Eds.), Lexicon development for speech and language processing. (pp. 207-34). Dordrecht: Kluwer.

Reichel, U. D., & Pfitzinger, H. R. (2006). Text preprocessing for speech synthesis. In Proceedings of the TC-STAR speech to speech translation workshop. (pp. 207-12). Barcelona, Spain. June 19-21, 2006. Retrieved from http://www.phonetik.uni-muenchen.de/forschung/publikationen/ReichelPfitzingerTCS06.pdf

Ribeiro, R., Oliveira, L., & Trancoso, I. (2003). Using morphossyntactic information in TTS systems: Comparing strategies for European Portuguese. In N. Mamede, I. Trancoso, J. Baptista, & M. G. Volpe Nunes (Eds.), PROPOR 2003. Computational processing of the Portuguese language, 6th international workshop. Faro, Portugal, June 26–27, 2003. Proceedings. (pp. 195-). Berlin - Heidelberg: Springer. doi:10.1007/3-540-45011-4_21. Retrieved from http://www.l2f.inesc.pt/documents/papers/2003RibeiroA.pdf

Rodríguez, E., Fernández, X., Fernández, E., & González, M. (1998). Análisis lingüístico para un conversor de texto a voz en lengua gallega. Novática. Revista de la Asociación de Técnicos de Informática, 133, 41-45.

Rodríguez Crespo, M. A., & Escalada, J. G. (1990). Text analysis system with automatic letter to allophone conversion for a Spanish text to speech synthesizer. In SSW1-1990. Proceedings of the ESCA workshop on speech synthesis. (pp. 105-8). Autrans, France, September 25-28, 1990. Retrieved from http://www.isca-speech.org/archive_open/ssw1/ssw1_105.html

Romsdorfer, H., & Pfister, B. (2007). Text analysis and language identification for polyglot text-to-speech synthesis. Speech Communication, 49(9), 697-724. doi:10.1016/j.specom.2007.04.006. Retrieved from http://www.tik.ee.ethz.ch/~spr/publications/Romsdorfer:07.pdf

Russi, T. (1992). A framework for morphological and syntactic analysis and its application in a text-to-speech system for German. In G. Bailly & C. Benoît (Eds.), Talking machines. Theories, models and designs. (pp. 163-82). Amsterdam: North Holland - Elsevier.

Şaupe, A., Teodorescu, L., Ordean, M., Boldizsar, R., Ordean, M., & Silaghi, G. (2009). Efficient parsing of Romanian language for text-to-speech purposes. In V. Matoušek & P. Mautner (Eds.), TSD 2009. Text, speech and dialogue. 12th International conference. Pilsen, Czech Republic, September 13-17, 2009. Proceedings. (pp. 323-30). Berlin - Heidelberg: Springer. doi:10.1007/978-3-642-04208-9_45. Retrieved from https://www.researchgate.net/publication/226174955_Efficient_Parsing_of_Romanian_Language_for_Text-to-Speech_Purposes

Sproat, R. (1996). Multilingual text analysis for text-to-speech synthesis. In W. Wahlster (Ed.), ECAI 96. Proceedings of the 12th European conference on artificial intelligence. Chichester: John Wiley & Sons. Retrieved from http://lanl.arxiv.org/pdf/cmp-lg/9608012.pdf

Sproat, R. (1997). The analysis of text in text-to-speech synthesis. In J. P. H. van Santen, R. Sproat, J. P. Olive, & J. Hirschberg (Eds.), Progress in speech synthesis. (pp. 73-6). New York: Springer.

Sproat, R. (1997). Text interpretation for TtS synthesis. In R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, & V. Zue (Eds.), Survey of the state of the art in human language technology. (pp. 202-9). Cambridge: Cambridge University Press. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.366.9300


Sproat, R. (2008). Linguistic processing for speech synthesis. In J. E. Benesty, M. M. Sondhi, & Y. Huang (Eds.), Springer handbook of speech processing. (pp. 457-69). Berlin - Heidelberg: Springer. doi:10.1007/978-3-540-49127-9_22

Sproat, R., Black, A. W., Chen, S., Kumar, S., Ostendorf, M., & Richards, C. (1999). Normalization of non-standard words. WS’99 final report. Baltimore, MD: The Center for Language and Speech Processing, The Johns Hopkins University. WS’99 Summer Workshop. Retrieved from https://www.clsp.jhu.edu/vfsrv/ws99/projects/normal/report/report.pdf

Sproat, R., Black, A. W., Chen, S., Kumar, S., Ostendorf, M., & Richards, C. (2001). Normalization of non-standard words. Computer Speech & Language, 15(3), 287-333. doi:10.1006/csla.2001.0169


Taylor, P. (2009). Text decoding: Finding the words from the text. In Text-to-Speech synthesis. (pp. 79-111). Cambridge: Cambridge University Press. Retrieved from http://svr-www.eng.cam.ac.uk/~pat40/ttsbook_draft_2.pdf


Taylor, P. (2009). Text segmentation and organisation. In Text-to-Speech synthesis. (pp. 52-78). Cambridge: Cambridge University Press. Retrieved from http://svr-www.eng.cam.ac.uk/~pat40/ttsbook_draft_2.pdf

Trilla, A. (2009). Natural language processing techniques in text-to-speech synthesis and automatic speech recognition. Working paper, Barcelona: Departament de Tecnologies Mèdia, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull. Retrieved from http://atrilla.net/data/files/micnlp09.pdf

Trilla, A. (2010). Natural language processing techniques applied to speech technologies. DEA, Diploma d’Estudis Avançats, Grup de Recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull. Retrieved from http://atrilla.net/data/files/dea_atrilla.pdf

Xydas, G., Karberis, G., & Kouroupetroglou, G. (2004). Text normalization for the pronunciation of non-standard words in an inflected language. In G. A. Vouros & T. Panayiotopoulos (Eds.), SETN 2004. Methods and applications of artificial intelligence. Third Hellenic conference on AI. Samos, Greece, May 5-8, 2004. Proceedings. (pp. 390-9). Berlin - Heidelberg: Springer. doi:10.1007/978-3-540-24674-9_41. Retrieved from http://speech.di.uoa.gr/sppages/spppdf/web-xydas_setn04.pdf

Yarowsky, D. (1997). Homograph disambiguation in text-to-speech synthesis. In J. P. H. van Santen, R. Sproat, J. P. Olive, & J. Hirschberg (Eds.), Progress in speech synthesis. (pp. 157-72). New York: Springer. Retrieved from http://www.cs.jhu.edu/~yarowsky/pubs/ssynth.ps

Yarowsky, D. (1999). A comparison of corpus-based techniques for restoring accents in Spanish and French text. In S. Armstrong, K. Church, P. Isabelle, S. Manzi, R. Tzoukermann, & D. Yarowsky (Eds.), Natural language processing using very large corpora. (pp. 99-120). Dordrecht: Kluwer. Retrieved from http://www.cs.jhu.edu/~yarowsky/pubs/kluwerbook.ps

Zelinka, J., Kanis, J., & Müller, L. (2005). Automatic transcription of numerals in inflectional languages. In V. Matoušek, P. Mautner, & T. Pavelka (Eds.), TSD 2005. Text, speech and dialogue. 8th International conference. Karlovy Vary, Czech Republic, September 12-15, 2005. Proceedings. (pp. 326-33). Berlin - Heidelberg: Springer. doi:10.1007/11551874_42

Assessment of text-to-speech systems

Aguilar, L., Fernández, J. M., Garrido, J. M., Llisterri, J., Macarrón, A., Monzón, L., & Rodríguez Crespo, M. Á. (1994). Diseño de pruebas para la evaluación de habla sintetizada en español y su aplicación a un sistema de conversión de texto a habla. Procesamiento del Lenguaje Natural (Actas del X Congreso de la SEPNL, Córdoba, 20-22 de Julio de 1994), 15. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Aguilar_et_al_94_Evaluacion_Texto_Habla_Espanol.pdf

Aguilar, L., Fernández, J. M., Garrido, J. M., Llisterri, J., Macarrón, A., Monzón, L., & Rodríguez Crespo, M. Á. (1994). Evaluation of a Spanish text-to-speech system. In SSW2-1994. Proceedings of the Second ESCA/IEEE Workshop on Speech Synthesis (pp. 207-210). Mohonk Mountain House, New Paltz, NY, USA. 12-15 September, 1994. Retrieved from http://liceu.uab.cat/~joaquim/publicacions/Aguilar_et_al_94_Evaluation_Spanish_TTS.pdf

Applications

HESS, W.J. (1997) "Section Introduction. A Brief History of Applications", in van SANTEN, J.P.H.- SPROAT, R.W.- OLIVE, J.P.- HIRSCHBERG, J. (Eds.) Progress in Speech Synthesis. New York: Springer. pp. 563-564.

KELLER, E.- ZELLNER KELLER, B. (2000) "New Uses for Speech Synthesis", The Phonetician 81, 1: 35-40.

Pucher, M., Schuchmann, G., & Fröhlich, P. (2009). Regionalized text-to-speech systems: Persona design and application scenarios. In A. Esposito, A. Hussain, M. Marinaro, & R. Martone (Eds.), Multimodal signals: Cognitive and algorithmic issues. (pp. 216-22). Berlin - Heidelberg: Springer. doi:10.1007/978-3-642-00525-1_21

REOPATH, R. (1984) "Specific Applications of Speech Synthesis", in HOLMES, J. (Ed.) Proceedings of the 1st International Conference on Speech Technology. 23-25 October 1984, Brighton, UK. Bedford & Amsterdam: IFS Publications Ltd UK & North-Holland. pp. 145-159.

SORIN, C.- EMERARD, F. (1996) "Domaines d’application et évaluation de la synthèse de la parole à partir du texte", in MÉLONI, H. (Ed.) Fondements et perspectives en traitement automatique de la parole. Paris: Éditions AUPELF-UREF (Collection Universités Francophones). pp. 123-132.

TALBOTT, M. (1984) "A Cookbook of Application Ideas", in BRISTOW, G. (Ed.) Electronic Speech Synthesis. Techniques, Technology and Applications. London: Granada. pp. 303-319.

Telecommunications

GAGNOULET, C.- SORIN, C. (1993) "CNET Speech Recognition and Text-to-Speech for Telecommunications Applications", in Applications of Speech Technology. Proceedings of Joint ESCA-NATO/RSG 10 Tutorial and Workshop. Lautrach Conference Center, Bavaria, Germany, 16-17 September 1993. pp. 31-34.

RODRÍGUEZ CRESPO, M.A.- ESCALADA SARDINA, J.G.- MONZÓN SERRANO, L.- MACARRÓN LARUMBE, A. (1991) "Conversión texto-voz para el español en Telefónica I+D", in Simposio de la Lengua Española. Ciencia y Tecnología. Pabellón de España, Barcelona 7-11 de octubre de 1991.

ROE, P. (1984) "Speech Synthesis in Telecommunications", in BRISTOW, G. (Ed.) Electronic Speech Synthesis. Techniques, Technology and Applications. London: Granada. pp. 260-273.

SORIN, C. (1990) "Text-to-speech synthesis and telephone network experiments in France", in Proceedings of the ESCA Tutorial Day on Speech Synthesis. Autrans, France, 25-28 September 1990. pp. 83-89.

Information systems

LAMEL, L.F.- GAUVAIN, J.L.- PROUTS, B.- BOUHIER, C.- BOESCH, R. (1993) "Generation and synthesis of broadcast messages", in Applications of Speech Technology. Proceedings of Joint ESCA-NATO/RSG 10 Tutorial and Workshop. Lautrach Conference Center, Bavaria, Germany, 16-17 September 1993. pp. 207-210.

van COILE, B.- RUHL, H.W.- VOGTEN, L.- THOONE, M.- GOSS, S.- DELAEY, D.- MOONS, E.- TERKEN, J.M.B.- DE PIJPER, J.R.- KUGLER, M.- KAUFHOLZ, P.- KRUGER, R.- LEYS, S.- WILLEMS, S. (1997) "Speech synthesis for the new Pan-European traffic message control system RDS-TMC", Speech Communication 23, 4: 297-305.

Voice disabilities

ABADJIEVA, E.- MURRAY, I.R.- ARNOTT, J.L. (1993) "An enhanced development system for emotional speech synthesis for use in vocal prostheses", Proceedings of ECART2, the 2nd European Conference on the Advancement of Rehabilitation Technology. Stockholm, Sweden, 26-28 May 1993. Paper 1.2.

ALM, N.- ARNOTT, J.L.- MURRAY, I.R. (1992) "Bypassing communication difficulties to allow satisfying conversational participation by a non-speaking person", Proceedings of the Institute of Acoustics 14,6: 637-644.

CARLSON, R.- GALYAS, K.- GRANSTRÖM, B.- HUNNICUTT, S.- LARSSON, B.- NEOVIUS, L. (1981) "A Multi-Language, Portable Text-to-Speech System for the Disabled", Speech Transmission Laboratory - Quarterly Progress and Status Report 2-3: 8-16.

CARLSON, R.- GALYAS, K.- GRANSTRÖM, B.- PETTERSSON, M.- ZACHRISSON, G. (1980) "Speech Synthesis for the Non-Vocal in Training and Communication", Speech Transmission Laboratory - Quarterly Progress and Status Report 1: 13-27.

CARLSON, R.- GRANSTRÖM, B.- HUNNICUTT, S. (1981) "Bliss Communication with Speech or Text Output", Speech Transmission Laboratory - Quarterly Progress and Status Report 4: 29-38.

DAMPER, R.I. (1990) "Speech aids for the handicapped", in AINSWORTH, W.A. (Ed.) Advances in speech, hearing and language processing. Vol 1. London: JAI Press. pp. 297-332.

DELIEGE, R.J.H. (1989) "An experimental Dutch keyboard-to-speech system for the speech impaired", Speech Communication 8,1: 81-90.

GIMÉNEZ DE LOS GALANES, F. (1995) "Conversión texto-voz como ayuda a la comunicación", in AGUILERA NAVARRO, S. (Coord.) Nuevas tecnologías aplicadas a la discapacidad. Proyectos y experiencias. Madrid: Instituto Nacional de Servicios Sociales, Ministerio de Asuntos Sociales. pp. 39-46.

GRANSTRÖM, B.- HUNNICUTT, S.- SPENS, K.E. (Eds.) (1993) Speech and Language Technology for Disabled Persons. Proceedings of an ESCA Workshop. Stockholm, Sweden, May 31-June 2, 1993. Stockholm: KTH-ESCA.

GUZMÁN, A. (2000) "Text Assist: síntesis de voz", Boletín de AELFA (Asociación Española de Logopedia, Foniatría y Audiología) 3: 9-10.

HUNNICUTT, S. (1987) "La síntesis de voz como ayuda técnica", Mundo electrónico 170: 63-

MARRERO, V.- DE SANTOS, A.- AGUILERA, S. (1991) "ASELA: Análisis, síntesis y evaluación del lenguaje y la audición", Procesamiento del Lenguaje Natural 10: 87-93.

NÉMETH, G.- OLASZY, G.- PATAKI, L.- HERNÁNDEZ-GÓMEZ, L.A.- FREITAS, D. (1995) "Improvement, Evaluation and Testing of a Low Cost Multilingual Portable Speaking Aid for the Speech Impaired", in Eurospeech’95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 3, pp. 1887-1890.

TODMAN, J.- RANKIN, D.- FILE, P. (1999) "The use of stored text in computer-aided conversation: A single-case experiment", Journal of Language and Social Psychology 18, 3: 287-309.

Analysis by synthesis of pathological voices

ALWAN, A.- BANGAYAN, P.- KREIMAN, J.- LONG, C. (1995) "Time and Frequency Synthesis Parameters for Severe Pathological Voice Qualities", in ELENIUS, K.- BRANDERUD, P. (Eds.) Proceedings of the XIIIth International Congress of Phonetic Sciences. Stockholm, Sweden, 13-19 August, 1995. Vol. 2, pp. 250-253.
http://www.seas.ucla.edu/spapl/paper/alwan_icphs95.pdf

BANGAYAN, P.- ALWAN, A.- KREIMAN, J.- LONG, C. (1994) "Synthesis of Severely Pathological Voices", Journal of the Acoustical Society of America 95, 5: 1pSP5.

BANGAYAN, P.- LONG, C.- ALWAN, A.A.- KREIMAN, J.- GERRATT, B.R. (1997) "Analysis by synthesis of pathological voices using the Klatt synthesizer", Speech Communication 22, 4: 343-368.
http://www.seas.ucla.edu/spapl/paper/bangayan_speechcomm97.pdf

LONG, C.- BANGAYAN, P.- ALWAN, A. (1993) "Acoustic Analysis and Synthesis of Pathological Voice Qualities", Journal of the Acoustical Society of America 93, 3, 2: 2aSP9.

Visual disabilities

ALLEN, J. (1973) "Reading Machines for the Blind: The Technical Problems and the Methods Adopted for Their Solution", IEEE Transactions on Audio and Electroacoustics AU-21,3: 259-264.

BELLIK, Y. (1997) "Multimodal text editor interface including speech for the blind", Speech Communication 23, 4: 319-332.

BEZOOIJEN, R. van (1989) "Evaluation of the suitability of Dutch text-to-speech conversion for application in a digital daily newspaper", in Proceedings of the ESCA Tutorial Day and Workshop on Speech Input/Output Assessment and Speech Databases. Noordwijkerhout, the Netherlands, 20-23 September 1989. pp. 6.3.1-6.3.4.

CARLSON, R.- GRANSTRÖM, B. (1986) "Applications of a Multi-Lingual Text-to-Speech System for the Visually Impaired", in EMILIANI, P.L. (Ed.) Development of Electronic Aids for the Visually Impaired. Dordrecht: Martinus Nijhoff. pp. 87-96.

CARLSON, R.- GRANSTRÖM, B.- LARSSON, K. (1976) "Evaluation of a Text-to-Speech System as a Reading Machine for the Blind", Speech Transmission Laboratory - Quarterly Progress and Status Report 2-3: 9-13.

COOPER, F.S.- GAITENBY, J.H.- MATTINGLY, I.G.- UMEDA, N. (1969) "Reading Aids for the Blind: A Special Case of Machine-to-Man Communication", IEEE Transactions on Audio and Electroacoustics AU-17,4: 266-270.

GOLDEROS, A.- MARTINEZ, R.- NOMBELA, J.R. - PARDO, M.- SANTOS, J.- MUÑOZ, E. (1980) "Comunicación hombre-máquina por voz (II) Calculadora parlante en español para invidentes", Mundo electrónico 97: 95-98.

GRANSTRÖM, B.- HUNNICUTT, S.- SPENS, K.E. (Eds.) (1993) Speech and Language Technology for Disabled Persons. Proceedings of an ESCA Workshop. Stockholm, Sweden, May 31-June 2, 1993. Stockholm: KTH-ESCA.

HJELMQUIST, E. (1989) "Spoken newspaper for the blind", in Proceedings of the ESCA Tutorial Day and Workshop on Speech Input/Output Assessment and Speech Databases. Noordwijkerhout, the Netherlands, 20-23 September 1989. pp. 13-19.

KURZWEIL, R. (1976) "The Kurzweil Reading Machine: a technical overview", in REDDEN, M.R.- SCHWANDT, W. (Eds.) Science, Technology and the Disabled. Washington DC: American Association for the Advancement of Science, Report 76-R-11. pp. 3-11.

LLISTERRI, J.- FERNÁNDEZ, N.- GUDAYOL, F.- POYATOS, J.J.- MARTÍ, J. (1993) "Testing user’s acceptance of Ciber232, a text to speech system used by blind persons", in GRANSTRÖM, B.- HUNNICUTT, S.- SPENS, K.-E. (Eds.) Speech and Language Technology for Disabled Persons. Proceedings of an ESCA Workshop. Stockholm, Sweden, May 31-June 2, 1993. pp. 203-206.
http://liceu.uab.cat/~joaquim/publicacions/Stockholm_93/stockholm_93.html

Language teaching

COHEN, R. (Dir.) (1992) Quand l’ordinateur parle... Utilisation de la synthèse vocale dans l’apprentissage et le perfectionnement de la langue écrite. Paris: Presses Universitaires de France (L’éducateur).

de PIJPER, J.R. (1997) "High-Quality Message-to-Speech Generation in a Practical Application", in van SANTEN, J.P.H.- SPROAT, R.W.- OLIVE, J.P.- HIRSCHBERG, J. (Eds.) Progress in Speech Synthesis. New York: Springer. pp. 575-590.

GRAY, T. (1984) "Talking Computers in the Classroom", in BRISTOW, G. (Ed.) Electronic Speech Synthesis. Techniques, Technology and Applications. London: Granada. pp. 243-259.

GONZÁLEZ, M. (2002) "Laverca: diccionario de verbos gallegos con voz sintetizada", in DÍAZ GARCÍA, J. (Ed.) Actas del II Congreso de Fonética Experimental. Sevilla 5, 6 y 7 de marzo de 2001. Sevilla: Laboratorio de Fonética, Facultad de Filología, Universidad de Sevilla. pp. 209-214.

Handley, Z. (2009). Is text-to-speech synthesis ready for use in computer-assisted language learning? Speech Communication, 51(10), 906-919.

STRATIL, M.- WESTON, G.- BURKHARDT, D. (1987) "Exploration of foreign languages speech synthesis", Literary and Linguistic Computing 2,2: 116-119.

TAMBAKAS, D.- EPITROPAKIS, N.- FAKOTAKIS, N.- KOKKINAKIS, G. (1993) "A voice interactive educational system", in Applications of Speech Technology. Proceedings of Joint ESCA-NATO/RSG 10 Tutorial and Workshop. Lautrach Conference Center, Bavaria, Germany, 16-17 September 1993. pp. 187-190.

Speech Synthesis - Bibliography
Joaquim Llisterri, Departament de Filologia Espanyola, Universitat Autònoma de Barcelona