Project detail

Multilingual speech recognition and search for electronic dictionaries

Project period: 01.09.2009 - 31.08.2013

About the project

The project aims at research, development and assessment of technologies for prototyping speech recognition and speech search systems with only a few hours of transcribed training data, without the need for phonetic or linguistic expertise. These technologies will be tested in the domain of electronic dictionaries.

Keywords
multilinguality, speech recognition, keyword spotting, electronic dictionaries

Project code

FR-TI1/034

Original language

Czech

Investigators

Participating units

Department of Computer Graphics and Multimedia (Ústav počítačové grafiky a multimédií)
- co-recipient (01.09.2009 - 30.08.2013)
Lingea s.r.o.
- recipient (01.09.2009 - 30.08.2013)

Funding sources

Ministry of Industry and Trade of the Czech Republic - TIP

- partial funding (01.09.2009 - 30.08.2013)

Results

POVEY, D.; BURGET, L.; AGARWAL, M.; AKYAZI, P.; FENG, K.; GHOSHAL, A.; GLEMBEK, O.; GOEL, N.; KARAFIÁT, M.; RASTROW, A.; ROSE, R.; SCHWARZ, P.; THOMAS, S. Subspace Gaussian mixture models for speech recognition. In Proc. International Conference on Acoustics, Speech, and Signal Processing. Dallas: IEEE Signal Processing Society, 2010. p. 4330-4333. ISBN: 978-1-4244-4296-6. ISSN: 1520-6149.

POVEY, D.; GHOSHAL, A.; BOULIANNE, G.; BURGET, L.; GLEMBEK, O.; GOEL, N.; HANNEMANN, M.; MOTLÍČEK, P.; QIAN, Y.; SCHWARZ, P.; SILOVSKÝ, J.; STEMMER, G.; VESELÝ, K. The Kaldi Speech Recognition Toolkit. In Proceedings of ASRU 2011. Hilton Waikoloa Village Resort, Hawaii: IEEE Signal Processing Society, 2011. p. 1-4. ISBN: 978-1-4673-0366-8.

GHOSHAL, A.; POVEY, D.; AGARWAL, M.; AKYAZI, P.; BURGET, L.; FENG, K.; GLEMBEK, O.; GOEL, N.; KARAFIÁT, M.; RASTROW, A.; ROSE, R.; SCHWARZ, P.; THOMAS, S. A novel estimation of feature-space MLLR for full-covariance models. In Proc. International Conference on Acoustics, Speech, and Signal Processing. Dallas: IEEE Signal Processing Society, 2010. p. 4310-4313. ISBN: 978-1-4244-4296-6. ISSN: 1520-6149.

GOEL, N.; THOMAS, S.; AGARWAL, M.; AKYAZI, P.; BURGET, L.; FENG, K.; GHOSHAL, A.; GLEMBEK, O.; KARAFIÁT, M.; POVEY, D.; RASTROW, A.; ROSE, R.; SCHWARZ, P. Approaches to automatic lexicon learning with limited training examples. In Proc. International Conference on Acoustics, Speech, and Signal Processing. Dallas: IEEE Signal Processing Society, 2010. p. 5094-5097. ISBN: 978-1-4244-4296-6. ISSN: 1520-6149.

POVEY, D.; KARAFIÁT, M.; GHOSHAL, A.; SCHWARZ, P. A Symmetrization of the Subspace Gaussian Mixture Model. In Proceedings of 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing. Prague: IEEE Signal Processing Society, 2011. p. 4504-4507. ISBN: 978-1-4577-0537-3.

POVEY, D.; BURGET, L.; AGARWAL, M.; AKYAZI, P.; GHOSHAL, A.; GLEMBEK, O.; GOEL, N.; KARAFIÁT, M.; RASTROW, A.; ROSE, R.; SCHWARZ, P.; THOMAS, S. The subspace Gaussian mixture model - A structured model for speech recognition. Computer Speech and Language, 2011, vol. 25, no. 2, p. 404-439. ISSN: 0885-2308.

KARAFIÁT, M.; BURGET, L.; MATĚJKA, P.; GLEMBEK, O.; ČERNOCKÝ, J. iVector-Based Discriminative Adaptation for Automatic Speech Recognition. In Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 152-157. ISBN: 978-1-4673-0366-8.

VESELÝ, K.; KARAFIÁT, M.; GRÉZL, F. Convolutive Bottleneck Network Features for LVCSR. In Proceedings of ASRU 2011. Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 42-47. ISBN: 978-1-4673-0366-8.

MIKOLOV, T.; DEORAS, A.; POVEY, D.; BURGET, L.; ČERNOCKÝ, J. Strategies for Training Large Scale Neural Network Language Models. In Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 196-201. ISBN: 978-1-4673-0366-8.

GRÉZL, F.; KARAFIÁT, M.; JANDA, M. Study of Probabilistic and Bottle-Neck Features in Multilingual Environment. In Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 359-364. ISBN: 978-1-4673-0366-8.

KOMBRINK, S.; MIKOLOV, T.; KARAFIÁT, M.; BURGET, L. Improving Language Models for ASR Using Translated In-domain Data. In Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto: IEEE Signal Processing Society, 2012. p. 4405-4408. ISBN: 978-1-4673-0044-5.

KARAFIÁT, M.; JANDA, M.; ČERNOCKÝ, J.; BURGET, L. Region Dependent Linear Transforms in Multilingual Speech Recognition. In Proc. International Conference on Acoustics, Speech, and Signal Processing 2012. Kyoto: IEEE Signal Processing Society, 2012. p. 4885-4888. ISBN: 978-1-4673-0044-5.

JANDA, M. Grapheme Based Speech Recognition. In Proceedings of the 18th Conference STUDENT EEICT 2012. Brno: Brno University of Technology, 2012. p. 441-445. ISBN: 978-80-214-4460-7.

BRUMMER, N.; CUMANI, S.; GLEMBEK, O.; KARAFIÁT, M.; MATĚJKA, P.; PEŠÁN, J.; PLCHOT, O.; SOUFIFAR, M.; DE VILLIERS, E.; ČERNOCKÝ, J. Description and analysis of the Brno276 system for LRE2011. In Proceedings of Odyssey 2012: The Speaker and Language Recognition Workshop. Singapore: International Speech Communication Association, 2012. p. 216-223. ISBN: 978-981-07-3093-2.

PLCHOT, O.; KARAFIÁT, M.; BRUMMER, N.; GLEMBEK, O.; MATĚJKA, P.; DE VILLIERS, E.; ČERNOCKÝ, J. Speaker vectors from Subspace Gaussian Mixture Model as complementary features for Language Identification. In Proceedings of Odyssey 2012, The Speaker and Language Recognition Workshop. Singapore: International Speech Communication Association, 2012. p. 330-333. ISBN: 978-981-07-3093-2.

MIKOLOV, T.; KOMBRINK, S.; DEORAS, A.; BURGET, L.; ČERNOCKÝ, J. RNNLM - Recurrent Neural Network Language Modeling Toolkit. In Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 1-4. ISBN: 978-1-4673-0366-8.

VESELÝ, K.; KARAFIÁT, M.; GRÉZL, F.; JANDA, M.; EGOROVA, E. The Language-Independent Bottleneck Features. In Proceedings of IEEE 2012 Workshop on Spoken Language Technology. Miami: IEEE Signal Processing Society, 2012. p. 336-341. ISBN: 978-1-4673-5124-9.

JANDA, M.; KARAFIÁT, M.; ČERNOCKÝ, J. Dealing with Numbers in Grapheme-Based Speech Recognition. In Proceedings of 15th International Conference on Text, Speech and Dialogue. Lecture Notes in Computer Science, vol. 7499. Berlin Heidelberg: Springer-Verlag, 2012. p. 438-445. ISBN: 978-3-642-32789-6. ISSN: 0302-9743.

SZŐKE, I.; FAPŠO, M.; VESELÝ, K. BUT2012 approach to the Spoken Web Search task at MediaEval 2012 (BUT2012 přístup pro Spoken Web Search úkol na MediaEval2012). In Working Notes Proceedings of the MediaEval 2012 Workshop. CEUR Workshop Proceedings. Pisa: CEUR-WS.org, 2012. p. 1-2. ISSN: 1613-0073.

TEJEDOR, J.; FAPŠO, M.; SZŐKE, I.; ČERNOCKÝ, J.; GRÉZL, F. Comparison of methods for language-dependent and language-independent query-by-example spoken term detection. ACM Transactions on Information Systems, 2012, vol. 2012, no. 30, p. 1-34. ISSN: 1046-8188.

JANDA, M. Automatic Generation Of Pronunciation Dictionaries Based On Diarization. In Proceedings of the 19th Conference Student EEICT 2013. Brno: Brno University of Technology, 2013. p. 228-232. ISBN: 978-80-214-4695-3.

EGOROVA, E.; VESELÝ, K.; KARAFIÁT, M.; JANDA, M.; ČERNOCKÝ, J. Manual and Semi-Automatic Approaches to Building a Multilingual Phoneme Set. In Proceedings of ICASSP 2013. Vancouver: IEEE Signal Processing Society, 2013. p. 7324-7328. ISBN: 978-1-4799-0355-9.

SOUFIFAR, M.; BURGET, L.; PLCHOT, O.; CUMANI, S.; ČERNOCKÝ, J. Regularized Subspace n-Gram Model for Phonotactic iVector Extraction. In Proceedings of Interspeech 2013, the 14th Annual Conference of the International Speech Communication Association. Lyon: International Speech Communication Association, 2013. p. 74-78. ISBN: 978-1-62993-443-3. ISSN: 2308-457X.

BURGET, L.; SCHWARZ, P.; AGARWAL, M.; AKYAZI, P.; FENG, K.; GHOSHAL, A.; GLEMBEK, O.; GOEL, N.; KARAFIÁT, M.; POVEY, D.; RASTROW, A.; ROSE, R.; THOMAS, S. Multilingual acoustic modeling for speech recognition based on Subspace Gaussian Mixture Models. In Proc. International Conference on Acoustics, Speech, and Signal Processing. Dallas: IEEE Signal Processing Society, 2010. p. 4334-4337. ISBN: 978-1-4244-4296-6. ISSN: 1520-6149.

KARAFIÁT, M.; GRÉZL, F.; EGOROVA, E.; JANDA, M.; ČERNOCKÝ, J.; KAŠPAR, M.: ZB - FR-TI1/034; Prototyping speech recognizers for new languages (Prototypování rozpoznávačů řeči pro nové jazyky). URL: http://www.fit.vutbr.cz/research/groups/speech/publi/2013/Overena_technologie_2013_Projekt_FR_TI1_034.pdf. (verified technology)

KARAFIÁT, M.; GRÉZL, F.; EGOROVA, E.; JANDA, M.; ČERNOCKÝ, J.: R - MPO TIP FR-TI1/034; Multilingual models for speech recognition (Multilingvální modely pro rozpoznávání řeči). The product is hosted on the servers of ÚPGM FIT VUT v Brně. URL: https://www.fit.vut.cz/research/product/375/. (software)