Journal of Shandong University (Engineering Science), 2022, Vol. 52, Issue (3): 1-8. doi: 10.6040/j.issn.1672-3961.0.2021.314

• Machine Learning and Data Mining •

Recognition learning based on multivariate functional principal component representation

Yinfeng MENG, Qingfang LI

  1. School of Mathematical Science, Shanxi University, Taiyuan 030006, Shanxi, China
  • Received: 2021-06-11 Online: 2022-06-20 Published: 2022-06-23
  • About the author: Yinfeng MENG (born 1979), female, from Datong, Shanxi; associate professor, Ph.D.; research interests: machine learning, data mining, and functional data analysis. E-mail: mengyf@sxu.edu.cn
  • Funding:
    National Natural Science Foundation of China (61807022); National Natural Science Foundation of China (61976184); Research Project of the Shanxi Provincial Bureau of Statistics (KY[2021]169)

Abstract:

To address the problem of fusing multi-dimensional information in recognition learning, a recognition method based on multivariate functional principal component representation was proposed. A numerical method for computing multivariate functional principal components was given: the eigenvalues and eigenvectors of the joint covariance operator were computed, and the key discriminative features were extracted. Based on these combined features, the random forest method was applied to recognize multivariate functional data. The recognition performance of the multivariate functional principal component representation was compared with that of several other representation methods on simulated data and real data. The experimental results showed that the accuracy was 1 on the simulated dataset, the English handwriting dataset, and the Chinese handwriting dataset, and 0.9544 on the motion dataset. Compared with the other methods, feature extraction by multivariate functional principal component analysis (MFPCA) achieved better recognition performance and effectively improved recognition accuracy.

Key words: functional data, basis function, multivariate functional principal component analysis (MFPCA), random forest, accuracy

CLC number:

  • TP391
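
The feature extraction step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the curves are sampled on a common, equally spaced grid and the joint covariance is discretized directly (the paper instead works with basis expansions), the data are synthetic, and the function name `mfpca_scores` is hypothetical. The resulting scores are the kind of features that could then be passed to a classifier such as a random forest.

```python
import numpy as np

def mfpca_scores(curves, n_components=2):
    """Discretized multivariate functional PCA (illustrative sketch only).

    curves: array of shape (n_samples, n_dims, n_points), all coordinate
    functions sampled on one common, equally spaced grid (an assumption).
    Returns (scores, leading eigenvalues, contribution ratios).
    """
    n, p, m = curves.shape
    # Concatenate the p coordinate functions of each observation into one vector,
    # so that one covariance matrix captures within- and between-dimension covariation.
    stacked = curves.reshape(n, p * m)
    centered = stacked - stacked.mean(axis=0)
    # Empirical joint covariance; on an equally spaced grid the quadrature weight
    # is a constant factor and does not change the contribution ratios.
    cov = centered.T @ centered / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs[:, :n_components]
    ratio = eigvals[:n_components] / eigvals.sum()
    return scores, eigvals[:n_components], ratio

# Synthetic two-class bivariate curves (hypothetical data, not the paper's).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
labels = np.repeat([0, 1], 20)
curves = np.empty((40, 2, 50))
for i, y in enumerate(labels):
    amp = 1.0 + 0.5 * y                         # classes differ in amplitude
    curves[i, 0] = amp * np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(50)
    curves[i, 1] = amp * np.cos(2 * np.pi * t) + 0.1 * rng.standard_normal(50)

scores, eigvals, ratio = mfpca_scores(curves, n_components=2)
print(scores.shape)  # (40, 2)
```

In this toy setting the first score alone separates the two classes, because the amplitude difference dominates the joint covariance; real data would generally need more components and a trained classifier.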

Fig. 1 Simulated dataset

Table 1 Mean squared errors of spline fits of different orders

Order    EMS1      EMS2
2        0.1040    0.1035
3        0.0974    0.0974
4        0.0968    0.0969
5        0.0579    0.0592
6        0.0574    0.0586
7        0.0384    0.0387
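
A comparison like the one in Table 1 can be sketched as follows. This is only an illustration under stated assumptions: the data are a synthetic noisy sine curve, the breakpoints are equally spaced with 8 interior points, and the helper names `clamped_knots`, `bspline_design`, and `spline_fit_mse` are hypothetical. None of these choices are taken from the paper, so the printed values are not expected to match the table.

```python
import numpy as np

def clamped_knots(lo, hi, n_interior, order):
    """Equally spaced breakpoints with boundary knots repeated `order` times in total."""
    breakpoints = np.linspace(lo, hi, n_interior + 2)
    return np.concatenate([np.full(order - 1, lo), breakpoints, np.full(order - 1, hi)])

def bspline_design(x, knots, order):
    """B-spline design matrix (degree = order - 1) via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Order 1: indicator functions of the half-open knot intervals.
    B = np.zeros((x.size, t.size - 1))
    for j in range(t.size - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    # Close the last non-empty interval so that x == t[-1] is covered.
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == t[-1], last] = 1.0
    # Raise the order step by step; terms with a zero-length support are skipped.
    for k in range(2, order + 1):
        B_new = np.zeros((x.size, t.size - k))
        for j in range(t.size - k):
            left_den = t[j + k - 1] - t[j]
            right_den = t[j + k] - t[j + 1]
            if left_den > 0:
                B_new[:, j] += (x - t[j]) / left_den * B[:, j]
            if right_den > 0:
                B_new[:, j] += (t[j + k] - x) / right_den * B[:, j + 1]
        B = B_new
    return B

def spline_fit_mse(x, y, order, n_interior=8):
    """Least-squares spline fit of the given order and its mean squared error."""
    knots = clamped_knots(x.min(), x.max(), n_interior, order)
    B = bspline_design(x, knots, order)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return float(np.mean((y - B @ coef) ** 2))

# Synthetic noisy curve (hypothetical data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

mse_by_order = {order: spline_fit_mse(x, y, order) for order in range(2, 8)}
for order, mse in mse_by_order.items():
    print(order, round(mse, 4))
```

Because spline spaces of different orders on the same breakpoints are not nested, the error need not decrease strictly with the order; the comparison is an empirical one.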

Fig. 2 Spline basis

Fig. 3 Data generation and fitting

Fig. 4 Curves of the first two principal components

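Table 2 Contribution rate of each principal component

Principal component    Contribution rate (%)    Cumulative contribution rate (%)
Z1                     64.76                    64.76
Z2                     24.86                    89.62
Z3                     4.58                     94.20
Z4                     2.27                     96.47
Z5                     1.34                     97.81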
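
The cumulative column of Table 2 is the running sum of the individual contribution rates. The short sketch below recomputes it from the table's values and shows how a number of components could be chosen from it; the 85% threshold is a hypothetical choice for illustration and is not stated in the source.

```python
from itertools import accumulate

# Individual contribution rates (%) of Z1..Z5, copied from Table 2.
contribution = [64.76, 24.86, 4.58, 2.27, 1.34]

# Running sum, rounded to the table's two decimals.
cumulative = [round(c, 2) for c in accumulate(contribution)]

# Smallest number of components whose cumulative rate reaches the threshold.
threshold = 85.0  # hypothetical cut-off, not taken from the paper
n_needed = next(i + 1 for i, c in enumerate(cumulative) if c >= threshold)
print(n_needed)  # 2
```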

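Fig. 5 Principal component scores of each observation

Fig. 6 English handwriting dataset

Fig. 7 The “fda” dataset after splitting

Fig. 8 The “d” dataset

Fig. 9 Plot of the new dataset

Fig. 10 Function curves of the three classes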

Table 3 Classification accuracy on the English handwriting dataset

Feature extraction method    Mean accuracy    Accuracy variance
B-spline                     1.0000           0
Fourier                      1.0000           0
Polygonal                    1.0000           0
Exponential                  1.0000           0
Monomial                     1.0000           0
Power                        1.0000           0
MFPCA                        1.0000           0

Fig. 11 3D plot of the Chinese handwritten characters “统计学” (statistics)

Fig. 12 Plot of the new dataset

Table 4 Classification accuracy on the Chinese handwriting dataset

Feature extraction method    Mean accuracy    Accuracy variance
B-spline                     0.9991           0.0061
Fourier                      1.0000           0
Polygonal                    1.0000           0
Exponential                  1.0000           0
Monomial                     0.9861           0.0246
Power                        0.9891           0.0199
MFPCA                        1.0000           0

Fig. 13 Curves of the three classes of data in four dimensions

Table 5 Classification accuracy on the motion dataset

Feature extraction method    Mean accuracy    Accuracy variance
B-spline                     0.9228           0.0616
Fourier                      0.9283           0.0554
Polygonal                    0.9328           0.0608
Exponential                  0.9072           0.0647
Monomial                     0.8622           0.0837
Power                        0.8589           0.0787
MFPCA                        0.9544           0.0532