Journal of Shandong University (Engineering Science) ›› 2016, Vol. 46 ›› Issue (2): 57-63. doi: 10.6040/j.issn.1672-3961.2.2015.147
郭逾,张二华*,刘驰
GUO Yu, ZHANG Erhua*, LIU Chi
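Abstract: To improve the accuracy of speech endpoint detection and the robustness of endpoint detection algorithms in noisy environments, two new endpoint detection parameters are proposed. The critical-band-based spectral entropy parameter takes into account both the perceptual characteristics of the human ear and the difference between the frequency-domain distributions of speech and noise signals. The difference frequency-domain energy parameter takes into account the difference in frequency-domain energy between speech frames and silent frames. The advantages of the two parameters are combined to form a robust endpoint detection parameter. In addition, to avoid the misjudgments that arise from relying on a single threshold decision, a transition-segment decision based on feature distribution statistics is added to the endpoint detection process. Experimental results show that the proposed algorithm discriminates well between speech frames and silent frames. Under different noise types and at low signal-to-noise ratios, its endpoint detection accuracy is higher than that of traditional noise-robust endpoint detection algorithms; under non-stationary noise in particular, accuracy improves by more than 5%.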
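The abstract does not give the formula for the critical-band spectral entropy, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes Zwicker's Bark band edges, a Hamming window, and the Shannon entropy of normalized band energies; the names `BARK_EDGES_HZ` and `band_spectral_entropy` are hypothetical.

```python
import numpy as np

# Assumption: approximate Bark critical-band edges in Hz (Zwicker's table).
# The band layout actually used in the paper is not stated in the abstract.
BARK_EDGES_HZ = np.array([
    0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
], dtype=float)


def band_spectral_entropy(frame, sample_rate):
    """Shannon entropy (in nats) of one frame's critical-band energies."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Keep the band edges below the Nyquist frequency, then close the last band.
    nyquist = sample_rate / 2.0
    edges = np.append(BARK_EDGES_HZ[BARK_EDGES_HZ < nyquist], nyquist)

    # Sum the power inside each half-open band [lo, hi).
    band_energy = np.array([
        power[(freqs >= lo) & (freqs < hi)].sum()
        for lo, hi in zip(edges[:-1], edges[1:])
    ])

    total = band_energy.sum()
    if total <= 0.0:
        return 0.0  # all-zero frame: no energy distribution to measure

    p = band_energy / total
    p = p[p > 0.0]  # skip empty bands so log() is defined
    return float(-(p * np.log(p)).sum())


# Hypothetical usage: a pure tone concentrates energy in one band, while
# white noise spreads it over many bands.
sample_rate = 8000
n = 256
t = np.arange(n) / sample_rate
tone_entropy = band_spectral_entropy(np.sin(2 * np.pi * 1000.0 * t), sample_rate)
noise_entropy = band_spectral_entropy(
    np.random.default_rng(0).standard_normal(n), sample_rate
)
```

In this sketch the tone yields a lower entropy than the noise frame, which illustrates why an entropy-style measure can help separate structured spectra from broadband noise. How the paper thresholds this value or combines it with the difference frequency-domain energy is not specified in the abstract.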