A SYSTEM COMBINATION FOR MALAY BROADCAST NEWS TRANSCRIPTION
DOI:
https://doi.org/10.11113/jt.v77.6511
Keywords:
System Combination, ROVER, Bahasa Malayu, Broadcast News
Abstract
In this paper, we propose a post-decoding system combination approach for automatically transcribing Malay broadcast news. The approach combines the hypotheses produced by parallel automatic speech recognition (ASR) systems. The ASR systems use different language models: one is a generic-domain model and the other is a domain-specific model. The main idea is to take advantage of the different knowledge held by each ASR system to improve the decoding result. The approach uses the language score and time information to produce a 1-best lattice, and then rescores the 1-best lattice to obtain the most likely word sequence as the final output. The proposed approach was compared with a conventional combination approach, recognizer output voting error reduction (ROVER). It reduced the word error rate (WER) from 33.9% to 30.6%, an average relative WER improvement of 9.74%, and outperformed the conventional ROVER approach.
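The WER figures above can be made concrete with a short sketch. The following Python is an illustration only, not the scoring procedure used in the paper: it assumes WER is computed as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and that relative improvement is the WER reduction divided by the baseline WER. The function names and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical reference and ASR output; one word is deleted.
    print(word_error_rate("saya suka makan nasi", "saya suka nasi"))
    # Overall figures quoted in the abstract.
    print(f"{relative_improvement(33.9, 30.6):.1%}")
```

Applied directly to the overall figures, the relative improvement comes to roughly 9.7%. The abstract reports 9.74% as an average, so a small difference from this single calculation is to be expected if the average was taken over several test conditions.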
License
Copyright of articles that appear in Jurnal Teknologi belongs exclusively to Penerbit Universiti Teknologi Malaysia (Penerbit UTM Press). This copyright covers the rights to reproduce the article, including reprints, electronic reproductions, or any other reproductions of similar nature.