System for Automatic Transcription of Sessions of the Polish Senate

Krzysztof MARASEK; Danijel KORŽINEK; Łukasz BROCKI

doi:10.2478/aoa-2014-0054

Authors

Krzysztof MARASEK Polish-Japanese Institute of Information Technology, Poland
Danijel KORŽINEK Polish-Japanese Institute of Information Technology, Poland
Łukasz BROCKI Polish-Japanese Institute of Information Technology, Poland

Abstract

This paper describes the research behind a Large-Vocabulary Continuous Speech Recognition (LVCSR) system for transcribing Polish-language Senate speeches. The system comprises several components: a phonetic transcription system, language and acoustic model training systems, a Voice Activity Detector (VAD), an LVCSR decoder, and a subtitle generator and presentation system. Some modules relied on existing tools, while others had to be built from scratch; in both cases the authors used the most advanced techniques available to them at the time. Finally, several experiments were performed to compare the performance of more modern and more conventional technologies.

Keywords:

large vocabulary speech recognition, language modelling, transcription, transliteration, subtitles.
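
The last component named in the abstract, the subtitle generator, can be illustrated with a minimal sketch that groups time-stamped recognizer output into cues and serializes them as WebVTT. This is not the authors' implementation: the abstract does not state the subtitle format or segmentation rules, so the choice of WebVTT, the `Word` type, the helper names, and the 42-character and 0.8 s pause limits are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)


@dataclass
class Word:
    """One recognized word with its time span (hypothetical decoder output)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def format_timestamp(seconds: float) -> str:
    """Render seconds as a WebVTT timestamp of the form hh:mm:ss.ttt."""
    millis = int(round(seconds * 1000))
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def words_to_cues(words: List[Word], max_chars: int = 42,
                  max_pause: float = 0.8) -> List[Cue]:
    """Start a new cue when the text gets too long or the speaker pauses.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    cues: List[Cue] = []
    current: List[Word] = []
    for word in words:
        if current:
            candidate = " ".join(w.text for w in current) + " " + word.text
            pause = word.start - current[-1].end
            if len(candidate) > max_chars or pause > max_pause:
                text = " ".join(w.text for w in current)
                cues.append((current[0].start, current[-1].end, text))
                current = []
        current.append(word)
    if current:
        text = " ".join(w.text for w in current)
        cues.append((current[0].start, current[-1].end, text))
    return cues


def to_webvtt(cues: List[Cue]) -> str:
    """Serialize cues as a WebVTT document (header, then one block per cue)."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        Word("Wysoka", 0.0, 0.4),
        Word("Izbo", 0.45, 0.9),
        Word("otwieram", 2.0, 2.6),
        Word("posiedzenie", 2.65, 3.4),
    ]
    print(to_webvtt(words_to_cues(demo)))
```

Splitting on pauses as well as on length keeps cue boundaries close to natural phrase breaks; a production system would more likely segment on the boundaries reported by the VAD and decoder.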
