An endpoint detection algorithm based on frequency-domain characteristics and transition fragment judgment

doi:10.6040/j.issn.1672-3961.2.2015.147

Abstract: To improve the accuracy of speech endpoint detection and the robustness of endpoint detection algorithms in noisy environments, two new endpoint detection parameters were proposed. The critical-band spectral entropy parameter accounts for both the perceptual characteristics of the human auditory system and the difference between the frequency-domain distributions of speech and noise. The minus (difference) frequency-domain energy parameter captures the energy difference between speech frames and silent frames in the frequency domain. The two parameters were combined to exploit the strengths of each, forming a robust endpoint detection parameter. In addition, to avoid misjudgments caused by relying on a single threshold decision, a transition fragment judgment based on statistics of the feature distribution was added to the detection process. Experimental results showed that the proposed algorithm discriminates well between speech frames and silent frames. Under different noise types at low signal-to-noise ratios, its endpoint detection accuracy was higher than that of conventional noise-robust endpoint detection algorithms, with an improvement of more than 5% under non-stationary noise.

Key words: transition fragment judgment, energy entropy, frequency-domain energy, spectral entropy, endpoint detection, critical band
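The abstract names the ingredients of the method but not their exact formulas. The following is a minimal NumPy sketch of the general ideas, not the authors' implementation. Only the Bark-scale mapping (the Zwicker and Terhardt 1980 analytical approximation) and the standard spectral-entropy definition are well established. Everything else is a hypothetical choice made for illustration: the definition of the difference energy (frame energy minus the mean energy of the leading frames, which are assumed to be silence), the combination of energy and entropy as the ratio sqrt(1 + |E/H|), the frame length of 256 and hop of 128, the threshold multipliers, and the nearest-class-mean rule used to resolve transition frames.

```python
import numpy as np


def bark(f_hz):
    """Critical-band rate in Bark (Zwicker and Terhardt, 1980 approximation)."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)


def frame_power_spectra(x, frame_len=256, hop=128):
    """Split x into Hamming-windowed frames; return power spectra (frames x bins)."""
    n = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    frames = x[idx] * np.hamming(frame_len)
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def critical_band_entropy(spec, fs, eps=1e-12):
    """Spectral entropy computed over integer Bark bands instead of FFT bins."""
    freqs = np.fft.rfftfreq(2 * (spec.shape[1] - 1), d=1.0 / fs)
    band = np.floor(bark(freqs)).astype(int)
    energy = np.stack([spec[:, band == b].sum(axis=1)
                       for b in range(band.max() + 1)], axis=1)
    p = energy / (energy.sum(axis=1, keepdims=True) + eps)  # band "probabilities"
    return -(p * np.log(p + eps)).sum(axis=1)


def difference_energy(spec, n_noise=10):
    """HYPOTHETICAL definition: frame spectral energy minus the mean energy
    of the first n_noise frames (assumed to be silence), floored at zero."""
    e = spec.sum(axis=1)
    return np.maximum(e - e[:n_noise].mean(), 0.0)


def combined_feature(spec, fs, n_noise=10):
    """HYPOTHETICAL combination: energy-entropy ratio sqrt(1 + |E / H|)."""
    h = critical_band_entropy(spec, fs)
    e = difference_energy(spec, n_noise)
    return np.sqrt(1.0 + np.abs(e / (h + 1e-12)))


def detect(feature, n_noise=10, low_k=3.0, high_k=6.0):
    """Return 1 (speech) or 0 (silence) per frame.

    Frames at or above the high threshold are speech, frames at or below the
    low threshold are silence, and frames in between form a transition class.
    HYPOTHETICAL rule: each transition frame joins whichever class (speech or
    silence) has the closer mean feature value."""
    mu, sd = feature[:n_noise].mean(), feature[:n_noise].std() + 1e-12
    low, high = mu + low_k * sd, mu + high_k * sd
    labels = np.where(feature >= high, 1, np.where(feature <= low, 0, -1))
    trans = labels == -1
    if (labels == 1).any() and (labels == 0).any():
        mu_s = feature[labels == 1].mean()
        mu_n = feature[labels == 0].mean()
        closer_to_speech = np.abs(feature - mu_s) < np.abs(feature - mu_n)
        labels[trans] = closer_to_speech[trans].astype(int)
    else:
        labels[trans] = 0  # no reference statistics: treat transitions as silence
    return labels
```

For example, `detect(combined_feature(frame_power_spectra(x), 8000))` labels each frame of an 8 kHz signal `x`. The three-way split keeps frames that fall between the two thresholds from being forced into a class by one threshold alone, which is the motivation the abstract gives for the transition judgment.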

CLC number: TP391.42

GUO Yu, ZHANG Erhua, LIU Chi. An endpoint detection algorithm based on frequency-domain characteristics and transition fragment judgment[J]. Journal of Shandong University (Engineering Science), 2016, 46(2): 57-63.
