Project detail

New Trends in Research and Application of Voice Technology

Project period: 01.01.2005 - 31.12.2007

Funding sources

Czech Science Foundation (GA ČR) - Standard projects

- full funding (2005-01-01 - 2007-12-31)

About the project


The project builds on previous successful speech processing research supported by GA ČR, which began with a complex project (1996 to 2001) and continued with the current task (2002-2004). These projects brought together all leading Czech groups working on speech analysis, recognition, and synthesis, and fostered cooperation on research tasks that go beyond the focus of the individual teams. The proposed project is built on this cooperation. It draws on existing results in signal processing, on the teams' own large databases for building acoustic models usable in both recognition and synthesis, on well-developed methods of probabilistic language modeling, and on experience gained from designing working prototypes. In view of current worldwide trends, the main focus will be on developing methods and algorithms applicable in distributed and autonomous mobile devices, in recognition systems with very large vocabularies, in speech synthesizers for interactive communication services, and in automatic transcription of audio recordings such as news broadcasts and interviews. The project will also address multimodal speech processing supported by visual information, as well as issues related to recognizing people by their voice. The main priority will be to apply all new findings to Czech, with regard to its specific needs.

Description in English

The proposed project follows up on previous research in speech processing carried out by a team that integrates all Czech research groups currently active in speech analysis, synthesis and recognition. The team was established in 1996 to participate in an ambitious 6-year project supported by the GACR, and later continued in another speech-oriented project ending in 2002. Each of the groups involved has its own expertise in a specific domain, which allows the consortium to work on integrated and complex tasks. In previous years the team has created large databases of annotated speech recordings, which are now available for both training and testing in speech recognition as well as for speech synthesis. In addition, a set of powerful tools and platforms for developing its own recognition and synthesis systems has been built, together with several working prototypes that serve for evaluation and demonstration purposes. Building on this and with respect to recent trends in voice technologies, the project will focus on the investigation and implementation of algorithms applicable in distributed, embedded and mobile systems, in recognition engines working with very large vocabularies, in TTS modules for interactive communication and information services, in automatic transcription of broadcast news, and in multimodal audio-visual interfaces. Primarily, the research will address the specific needs of Czech.

Keywords
voice technology;automatic speech recognition;multilingual systems;speaker verification and recognition;continuous speech recognition;audio-visual speech processing;large speech databases;dialogue systems;prosody optimization

Keywords in English
voice technology;automatic speech recognition;multi-lingual systems;speaker recognition and verification;spontaneous speech recognition;acoustic-visual speech processing;automatic transcription;large speech databases;dialogue systems;prosody optimization

Project code

GA102/05/0278

Original language

Czech

Investigators

Departments

Department of Computer Graphics and Multimedia
- co-beneficiary (01.01.2005 - 31.12.2007)

Results

SUMEC, S., KADLEC, J. Event Editor - The Multi-Modal Annotation Tool. In Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI). Edinburgh: 2005. p. 1.

SZŐKE, I., SCHWARZ, P., BURGET, L., FAPŠO, M., KARAFIÁT, M., ČERNOCKÝ, J., MATĚJKA, P. Comparison of Keyword Spotting Approaches for Informal Continuous Speech. In Interspeech'2005 - Eurospeech - 9th European Conference on Speech Communication and Technology. European Conference EUROSPEECH. Lisbon: 2005. p. 633-636. ISSN: 1018-4074.

MIKOLOV, T.; OPARIN, I.; GLEMBEK, O.; BURGET, L.; KARAFIÁT, M.; ČERNOCKÝ, J. Použití mluvených korpusů ve vývoji systému pro rozpoznávání českých přednášek [Use of spoken corpora in the development of a system for recognizing Czech lectures]. Prague: Charles University in Prague, 2007. p. 1-5.

SZŐKE, I.; BURGET, L.; KARAFIÁT, M. Combination of Word and Phoneme Approach for Spoken Term Detection. Brno: 2007. p. 1 (1 p.).

SZŐKE, I.; FAPŠO, M.; KARAFIÁT, M.; BURGET, L.; GRÉZL, F.; SCHWARZ, P.; GLEMBEK, O.; MATĚJKA, P.; KOPECKÝ, J.; ČERNOCKÝ, J. Spoken Term Detection System Based on a Combination of LVCSR and Phonetic Search. Brno: 2007. p. 1 (1 p.).

GRÉZL, F.; KARAFIÁT, M.; ČERNOCKÝ, J. Neural network topologies and bottle neck features in speech recognition. Brno: 2007. p. 78-82.

FAPŠO, M., SCHWARZ, P., SZŐKE, I., ČERNOCKÝ, J., SMRŽ, P., BURGET, L., KARAFIÁT, M. Search Engine for Information Retrieval from Multi-modal Records. Edinburgh: 2005.

SZŐKE, I., SCHWARZ, P., MATĚJKA, P., BURGET, L., FAPŠO, M., KARAFIÁT, M., ČERNOCKÝ, J. Comparison of Keyword Spotting Approaches for Informal Continuous Speech. In 2nd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms. Edinburgh: 2005. p. 1.

ZHU, Q., CHEN, B., GRÉZL, F., MORGAN, N. Improved MLP Structures for Data-Driven Feature Extraction for ASR. In Interspeech'2005 - Eurospeech - 9th European Conference on Speech Communication and Technology. European Conference EUROSPEECH. Lisbon: 2005. p. 2129. ISSN: 1018-4074.

STOLCKE, A., ANGUERA, X., BOAKYE, K., CETIN, Ö., GRÉZL, F., JANIN, A., MANDAL, A., PESKIN, B., WOOTERS, C., ZHENG, J. Further Progress in Meeting Recognition: The ICSI-SRI Spring 2005 Speech-to-Text Evaluation System. In Machine Learning for Multimodal Interaction, Second International Workshop, MLMI 2005, Edinburgh, UK, July 11-13, 2005, Revised Selected Papers. Lecture Notes in Computer Science 3869, Springer 2006. Edinburgh, Scotland: University of Edinburgh, 2005. p. 463-475. ISBN: 978-3-540-32549-9.

FAPŠO, M., SMRŽ, P., SCHWARZ, P., SZŐKE, I., BURGET, L., KARAFIÁT, M., ČERNOCKÝ, J. Systém pre efektívne vyhľadávanie v rečových databázach [A system for efficient search in speech databases]. In Sborník databázové konference DATAKON 2005. Brno: Masaryk University, 2005. p. 323-333. ISBN: 80-210-3813-6.

KARAFIÁT, M., BURGET, L., ČERNOCKÝ, J. Using Smoothed Heteroscedastic Linear Discriminant Analysis in Large Vocabulary Continuous Speech Recognition System. In 2nd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms. Edinburgh: 2005. p. 1.

KARAFIÁT, M. The Development of the AMI System for the Transcription of Speech in Meetings. In Machine Learning for Multimodal Interaction, Second International Workshop, MLMI 2005, Edinburgh, UK, July 11-13, 2005, Revised Selected Papers. Lecture Notes in Computer Science Volume 3869, Springer 2006. Edinburgh: University of Edinburgh, 2005. p. 344-356. ISBN: 978-3-540-32549-9.

KARAFIÁT, M. Transcription of Conference Room Meetings: an Investigation. In Interspeech'2005 - Eurospeech - 9th European Conference on Speech Communication and Technology. European Conference EUROSPEECH. Lisbon: International Speech Communication Association, 2005. p. 1. ISSN: 1018-4074.

HAIN, T.; BURGET, L.; DINES, J.; GARAU, G.; KARAFIÁT, M.; LINCOLN, M.; MCCOWAN, I.; MOORE, D.; WAN, V.; ORDELMAN, R.; RENALS, S. The 2005 AMI System for the Transcription of Speech in Meetings. In Machine Learning for Multimodal Interaction, Second International Workshop, MLMI 2005, Edinburgh, UK, July 11-13, 2005, Revised Selected Papers. Lecture Notes in Computer Science Volume 3869, Springer 2006. Edinburgh: University of Edinburgh, 2005. p. 450-462. ISBN: 978-3-540-32549-9.

ASHBY, S., BOURBAN, S., CARLETTA, J., FLYNN, M., GUILLEMOT, M., HAIN, T., KARAISKOS, V., KRAAIJ, W., KRONENTHAL, M., LATHOUD, G., LINCOLN, M., LISOWSKA, A., MCCOWAN, I., POST, W., REIDSMA, D., WELLNER, P., KADLEC, J. The AMI Meeting Corpus: A Pre-Announcement. In Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI). Edinburgh: 2005. p. 1.

MOTLÍČEK, P., BURGET, L., ČERNOCKÝ, J. Non-parametric Speaker Turn Segmentation of Meeting Data. In Interspeech'2005 - Eurospeech - 9th European Conference on Speech Communication and Technology. European Conference EUROSPEECH. Lisbon: International Speech Communication Association, 2005. p. 657-660. ISSN: 1018-4074.

SZŐKE, I. Smooth Pitch Tracker Based on Harmonic and Noise Model. In STUDENT EEICT 2005. Brno: Faculty of Information Technology BUT, 2005. p. 673-677. ISBN: 80-214-2890-2.

MATĚJKA, P. Phoneme Recognition Tuning for Language Identification System. In Proceedings of the 11th conference STUDENT EEICT 2005. Brno: Faculty of Electrical Engineering and Communication BUT, 2005. p. 658-653. ISBN: 80-214-2890-2.

MATĚJKA, P., SCHWARZ, P., ČERNOCKÝ, J., CHYTIL, P. Phonotactic Language Identification. In Proceedings of Radioelektronika 2005. Brno: Faculty of Electrical Engineering and Communication BUT, 2005. p. 140-143. ISBN: 80-214-2904-6.

MATĚJKA, P., SCHWARZ, P., ČERNOCKÝ, J., CHYTIL, P. Phonotactic Language Identification using High Quality Phoneme Recognition. In Interspeech'2005 - Eurospeech - 9th European Conference on Speech Communication and Technology. European Conference EUROSPEECH. Lisbon: International Speech Communication Association, 2005. p. 2237-2240. ISSN: 1018-4074.

ASHBY, S., BOURBAN, S., CARLETTA, J., FLYNN, M., GUILLEMOT, M., HAIN, T., KARAISKOS, V., KRAAIJ, W., KRONENTHAL, M., LATHOUD, G., LINCOLN, M., LISOWSKA, A., MCCOWAN, I., POST, W., REIDSMA, D., WELLNER, P., KADLEC, J. The AMI Meeting Corpus. In Measuring Behavior 2005 Proceedings Book. Wageningen: 2005. p. 1.

FAPŠO, M.; SMRŽ, P.; SCHWARZ, P.; SZŐKE, I.; SCHWARZ, M.; ČERNOCKÝ, J.; KARAFIÁT, M.; BURGET, L. Information Retrieval from Spoken Documents. In Proceedings of the Seventh International Conference on Intelligent Text Processing and Computational Linguistics (CICLING 2006). Mexico City: Springer Verlag, 2006. p. 410-416. ISBN: 3-540-32205-1.

FAPŠO, M.; SCHWARZ, P.; SZŐKE, I.; SMRŽ, P.; SCHWARZ, M.; ČERNOCKÝ, J.; KARAFIÁT, M.; BURGET, L. Search Engine for Information Retrieval from Speech Records. In Proceedings of the Third International Seminar on Computer Treatment of Slavic and East European Languages. Bratislava: 2006. p. 100-101.

BURGET, L.; FAPŠO, M.; MATĚJKA, P.; SMRŽ, P.; ČERNOCKÝ, J.; KARAFIÁT, M.; SCHWARZ, P.; SZŐKE, I. Indexing and search methods for spoken documents. In Proceedings of the Ninth International Conference on Text, Speech and Dialogue, TSD 2006. Lecture Notes in Computer Science. LNCS. Berlin: Springer Verlag, 2006. p. 351-358. ISSN: 0302-9743.

MATĚJKA, P.; SCHWARZ, P.; BURGET, L.; ČERNOCKÝ, J. Use of anti-models to further improve state-of-the-art PRLM language recognition system. In Proceedings of ICASSP 2006. Toulouse: 2006. p. 197-200.

BURGET, L.; MATĚJKA, P.; ČERNOCKÝ, J. Discriminative Training Techniques for Acoustic Language Identification. In Proceedings of ICASSP 2006. Toulouse: 2006. p. 209-212.

SCHWARZ, P.; MATĚJKA, P.; ČERNOCKÝ, J. Hierarchical structures of neural networks for phoneme recognition. In Proceedings of ICASSP 2006. Toulouse: 2006. p. 325-328.

MATĚJKA, P.; BURGET, L.; SCHWARZ, P.; ČERNOCKÝ, J. Brno University of Technology System for NIST 2005 Language Recognition Evaluation. In Proceedings of Odyssey 2006: The Speaker and Language Recognition Workshop. San Juan: 2006. p. 57-64. ISBN: 1-4244-0472-X.

MATĚJKA, P., BURGET, L., SCHWARZ, P., ČERNOCKÝ, J. NIST 2005 Language Recognition Evaluation. In Proceedings of NIST LRE 2005. Washington DC: 2006. p. 1-37.

ČERNOCKÝ, J.; MATĚJKA, P.; BURGET, L.; SCHWARZ, P. Automatic Language Identification System. In Sborník příspěvků z odborného semináře "Nové technologie v radiokomunikacích". Brno: Brno University of Defence, 2006. p. 1-6.

HUBEIKA, V. Estimation of Gender and Age from Recorded Speech. In Proc. ACM Student Research competition 2006. Prague: Czech Technical University, 2006. p. 25-32. ISBN: 80-01-03595-6.

KARAFIÁT, M.; GRÉZL, F.; SCHWARZ, P.; BURGET, L.; ČERNOCKÝ, J. Robust heteroscedastic linear discriminant analysis and LCRC posterior features in large vocabulary continuous speech recognition. In Proc. Fifth Slovenian and First International Language Technologies Conference. Ljubljana: 2006. p. 1-4.

KOPECKÝ, J.; SZŐKE, I.; FAPŠO, M.; KARAFIÁT, M.; BURGET, L.; OPARIN, I.; SCHWARZ, P.; MATĚJKA, P.; ČERNOCKÝ, J.; GLEMBEK, O. BUT System for NIST STD 2006 - Arabic. In Proc. NIST Spoken Term Detection Evaluation workshop (STD 2006). Washington D.C.: National Institute of Standards and Technology, 2006. p. 1.

KONTÁR, S. Parallel training of neural networks for speech recognition. In Proc. 12th International Conference on Soft Computing MENDEL'06. Brno: Brno University of Technology, 2006. p. 6037. ISBN: 80-214-3195-4.

GLEMBEK, O.; KARAFIÁT, M.; BURGET, L.; ČERNOCKÝ, J. Czech Speech Recognizer for Multiple Environments. In Radioelektronika 2006. Bratislava: 2006. p. 1-4.

VONDRA, M.; VÍCH, R. Can ASR be Used for Evaluating Speech Quality? In 17th Czech-German Workshop Speech Processing. 1. Prague: UFE AVČR, 2007. p. 115-121. ISBN: 978-80-86269-00-9.

KARAFIÁT, M.; BURGET, L.; ČERNOCKÝ, J.; HAIN, T. Real-Time ASR from Meetings. In Proc. INTERSPEECH 2007. Proceedings of Interspeech. Antwerpen: International Speech Communication Association, 2007. p. 1260-1263. ISSN: 1990-9772.

ČERNOCKÝ, J.; SZŐKE, I.; FAPŠO, M.; KARAFIÁT, M.; BURGET, L.; KOPECKÝ, J.; GRÉZL, F.; SCHWARZ, P.; GLEMBEK, O.; OPARIN, I.; et al. Search in speech for public security and defense. In Proc. IEEE Workshop on Signal Processing Applications for Public Security and Forensics, 2007 (SAFE '07). Washington D.C.: IEEE Signal Processing Society, 2007. p. 1-7. ISBN: 1-4244-1226-9.

ČERNOCKÝ, J.; BURGET, L.; SCHWARZ, P.; MATĚJKA, P.; KARAFIÁT, M.; GLEMBEK, O.; KOPECKÝ, J.; SZŐKE, I.; FAPŠO, M.; GRÉZL, F.; et al. Search in speech, language identification and speaker recognition in Speech@FIT. In Proc. 17th International Conference Radioelektronika, 2007. Brno: Department of Radioelectronics FEEC BUT, 2007. p. 1-6. ISBN: 978-80-214-3390-8.

FAPŠO, M. Search in speech records. In Proc. 13th Conference STUDENT EEICT 2007. Brno: Faculty of Electrical Engineering and Communication BUT, 2007. p. 1-3. ISBN: 978-80-214-3410-3.

MATĚJKA, P.; BURGET, L.; GLEMBEK, O.; SCHWARZ, P.; HUBEIKA, V.; FAPŠO, M.; MIKOLOV, T.; PLCHOT, O. BUT system description for NIST LRE 2007. In Proc. 2007 NIST Language Recognition Evaluation Workshop. Orlando: National Institute of Standards and Technology, 2007. p. 1-5.

AL-HAMES, M.; HAIN, T.; ČERNOCKÝ, J.; SCHREIBER, S.; POEL, M.; MÜLLER, R.; MARCEL, S.; VAN LEEUWEN, D.; ODOBEZ, J.; BA, S.; et al. Audio-Visual Processing in Meetings: Seven Questions and Current AMI Answers. In Proc. 3rd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI 2006). Washington D.C.: 2006. p. 1.

SZŐKE, I.; FAPŠO, M.; KARAFIÁT, M.; BURGET, L.; GRÉZL, F.; SCHWARZ, P.; GLEMBEK, O.; MATĚJKA, P.; KONTÁR, S.; ČERNOCKÝ, J. BUT System for NIST STD 2006 - English. In Proc. NIST Spoken Term Detection Evaluation workshop (STD 2006). Washington D.C.: National Institute of Standards and Technology, 2006. p. 1.

GRÉZL, F.; KARAFIÁT, M.; KONTÁR, S.; ČERNOCKÝ, J. Probabilistic and bottle-neck features for LVCSR of meetings. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007). Honolulu: IEEE Signal Processing Society, 2007. p. 757-760. ISBN: 1-4244-0728-1.

MATĚJKA, P.; BURGET, L.; SCHWARZ, P.; GLEMBEK, O.; KARAFIÁT, M.; GRÉZL, F.; ČERNOCKÝ, J.; VAN LEEUWEN, D.; BRÜMMER, N.; STRASHEIM, A. STBU system for the NIST 2006 speaker recognition evaluation. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007). Honolulu: IEEE Signal Processing Society, 2007. p. 221-224. ISBN: 1-4244-0728-1.

GRÉZL, F.; ČERNOCKÝ, J. TRAP-based Techniques for Recognition of Noisy Speech. In Proc. 10th International Conference on Text Speech and Dialogue (TSD 2007). LNCS. Berlin: Springer Verlag, 2007. p. 270-277. ISBN: 978-3-540-74627-0.

KARAFIÁT, M.; GRÉZL, F.; SCHWARZ, P.; BURGET, L.; ČERNOCKÝ, J. Robust heteroscedastic linear discriminant analysis and LCRC posterior features in meeting data recognition. In Proc. 3rd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI 2006). LNCS 4299. Berlin: Springer Verlag, 2006. p. 275-284. ISBN: 3-540-69267-3.

HUBEIKA, V.; SZŐKE, I.; BURGET, L.; ČERNOCKÝ, J. Maximum Likelihood and Maximum Mutual Information Training in Gender and Age Recognition System. In Proc. 10th International Conference on Text Speech and Dialogue (TSD 2007). Pilsen: Springer Verlag, 2007. p. 1-6. ISBN: 978-3-540-74627-0.

SZŐKE, I., SCHWARZ, P., BURGET, L., KARAFIÁT, M., MATĚJKA, P., ČERNOCKÝ, J. Phoneme Based Acoustics Keyword Spotting in Informal Continuous Speech. Lecture Notes in Computer Science, 2005, vol. 2005, no. 3658, p. 302. ISSN: 0302-9743.

BURGET, L.; MATĚJKA, P.; SCHWARZ, P.; GLEMBEK, O.; ČERNOCKÝ, J. Analysis of feature extraction and channel compensation in GMM speaker recognition system. IEEE Transactions on Audio, Speech, and Language Processing, 2007, vol. 15, no. 7, p. 1979-1986. ISSN: 1558-7916.

BRÜMMER, N.; BURGET, L.; ČERNOCKÝ, J.; GLEMBEK, O.; GRÉZL, F.; KARAFIÁT, M.; VAN LEEUWEN, D.; MATĚJKA, P.; SCHWARZ, P.; STRASHEIM, A. Fusion of heterogeneous speaker recognition systems in the STBU submission for the NIST speaker recognition evaluation 2006. IEEE Transactions on Audio, Speech, and Language Processing, 2007, vol. 15, no. 7, p. 2072-2084. ISSN: 1558-7916.

MATĚJKA, P., SCHWARZ, P., ČERNOCKÝ, J., CHYTIL, P. Tuning Phonotactic Language Identification System. Brno: Faculty of Information Technology BUT, 2005. p. 1.

GRÉZL, F. Spectral plane investigation for probabilistic features for ASR. Edinburgh: 2005. p. 82.

HAIN, T.; BURGET, L.; KARAFIÁT, M.: AMI Large vocabulary continuous speech recognizer. URL: https://www.fit.vut.cz/research/product/25/. (software)

SCHWARZ, P.; MATĚJKA, P.; ČERNOCKÝ, J.; SZŐKE, I.: System for on-line keyword spotting. URL: https://www.fit.vut.cz/research/product/22/. (software)

FAPŠO, M.; SZŐKE, I.; SCHWARZ, P.; ČERNOCKÝ, J.: Indexation and search engine for multimodal data. URL: https://www.fit.vut.cz/research/product/24/. (software)

CHALUPNÍČEK, K.; ČERNOCKÝ, J.; KAŠPÁREK, T.: Web-based system for semi-automatic checks of speech annotations. URL: https://www.fit.vut.cz/research/product/27/. (software)

BURGET, L.; GLEMBEK, O.; KARAFIÁT, M.; KONTÁR, S.; SCHWARZ, P.; ČERNOCKÝ, J.: STK Toolkit. URL: https://www.fit.vut.cz/research/product/26/. (software)

SCHWARZ, P.; MATĚJKA, P.; BURGET, L.; GLEMBEK, O.: VUT-SW-Search; Phoneme recognizer based on long temporal context. URL: http://speech.fit.vutbr.cz/en/software/phoneme-recognizer-based-long-temporal-context. (software)