
Journal of Shandong University (Engineering Science) ›› 2016, Vol. 46 ›› Issue (2): 57-63. doi: 10.6040/j.issn.1672-3961.2.2015.147

• Machine Learning and Data Mining •

An endpoint detection algorithm based on frequency-domain characteristics and transition fragment judgment

GUO Yu, ZHANG Erhua*, LIU Chi

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China
  • Received: 2015-05-12 Online: 2016-04-20 Published: 2015-05-12
  • Corresponding author: ZHANG Erhua (born 1967), male, from Hanchuan, Hubei; associate professor, Ph.D.; research interests: digital signal processing and 3D data visualization. E-mail: zherhua@163.com E-mail: westkash@qq.com
  • About the author: GUO Yu (born 1990), male, from Longyan, Fujian; master's student; research interests: speech signal processing and speaker recognition. E-mail: villa32@qq.com

Abstract: To improve the accuracy of speech endpoint detection and the robustness of endpoint detection in noisy environments, two new endpoint detection parameters are proposed. The critical-band spectral entropy accounts for both the perceptual characteristics of the human auditory system and the differences between the frequency-domain distributions of speech and noise, while the frequency-domain energy difference parameter captures the energy difference between speech frames and silence frames in the frequency domain. The strengths of the two parameters are combined into a single robust endpoint detection parameter. In addition, to avoid misjudgments caused by relying on a single threshold decision, a transition fragment judgment based on feature distribution statistics is added to the detection process. Experimental results show that the proposed algorithm discriminates well between speech frames and silence frames. Under different noise types at low signal-to-noise ratios, its endpoint detection accuracy is higher than that of conventional noise-robust endpoint detection algorithms; under non-stationary noise in particular, accuracy improves by more than 5%.

Key words: transition fragment judgment, energy entropy, frequency-domain energy, critical band, endpoint detection, spectrum entropy
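
The abstract names the two parameters but gives no formulas, so the following is only a minimal sketch of how a band-wise spectral entropy and a frame energy difference could be computed. It assumes the standard Bark critical-band edges, a Hamming window, and a noise-only reference frame; the function names `band_energies`, `band_spectral_entropy`, and `energy_difference` are hypothetical and are not taken from the paper.

```python
import numpy as np

# Standard Bark critical-band edges in Hz (assumption: the paper's exact
# band layout is not stated in the abstract).
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700]


def band_energies(frame, sample_rate):
    """Sum the power spectrum of one frame inside each critical band."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # Keep only the edges below Nyquist and close the last band at Nyquist.
    nyquist = sample_rate / 2
    edges = [e for e in BARK_EDGES_HZ if e < nyquist] + [nyquist]
    return np.array([power[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])


def band_spectral_entropy(frame, sample_rate, eps=1e-12):
    """Shannon entropy of the normalized critical-band energy distribution."""
    energy = band_energies(frame, sample_rate)
    prob = energy / (energy.sum() + eps)
    return float(-np.sum(prob * np.log(prob + eps)))


def energy_difference(frame, noise_frame, sample_rate):
    """Hypothetical stand-in: total band energy minus a noise-frame estimate."""
    return float(band_energies(frame, sample_rate).sum()
                 - band_energies(noise_frame, sample_rate).sum())
```

In this sketch a frame whose energy is concentrated in a few bands (as in voiced speech) yields a low entropy, while broadband noise yields a high one; how the paper weights, combines, and thresholds the two parameters is not specified in the abstract and is left out here.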

CLC Number: 

  • TP391.42