Journal of Shandong University (Engineering Science) ›› 2019, Vol. 49 ›› Issue (2): 107-115. doi: 10.6040/j.issn.1672-3961.0.2018.458

• Machine Learning and Data Mining •

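Transfer fuzzy clustering based on self-constraint of multiple medoids

Jun QIN1, Yuanpeng ZHANG1,2,*, Yizhang JIANG2, Wenlong HANG3

  1. Department of Medical Informatics, Nantong University, Nantong 226001, Jiangsu, China
  2. School of Digital Media, Jiangnan University, Wuxi 214122, Jiangsu, China
  3. School of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, Jiangsu, China
  • Received: 2018-10-29 Online: 2019-04-20 Published: 2019-04-19
  • Corresponding author: Yuanpeng ZHANG E-mail: 2432533512@qq.com; 155297131@qq.com
  • About the author: Jun QIN (b. 1996), male, from Taizhou, Jiangsu, is a master's student whose main research interests are artificial intelligence and pattern recognition. E-mail: 2432533512@qq.com
  • Supported by:
    National Natural Science Foundation of China (81701793); Nantong Science and Technology Program (MS12017016-2); Jiangsu Provincial Social Science Foundation (18YSC009)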

Abstract:

Transfer clustering approaches built on the fuzzy C-means (FCM) framework, which use virtual cluster centers from the source domain as transfer knowledge, inherit the shortcomings of FCM: they are sensitive to outliers and noise, and a single cluster center is not sufficient to describe the inner structure of a cluster. To address these problems, a transfer fuzzy clustering approach based on the self-constraint of multiple medoids is proposed. A prototype weight is assigned to every object in a cluster to characterize the cluster structure; this weighting mechanism captures the inner structure of clusters more fully and makes the clustering more robust to outliers and noise. Furthermore, the source-domain samples are used to reconstruct the cluster structure of the target domain, and the reconstructed structure serves as the transfer knowledge that guides the clustering of the target-domain samples. Compared with using a single virtual center per cluster as transfer knowledge, the reconstructed target-domain cluster structure carries richer transfer knowledge. Experimental results on synthetic and real-world datasets show that, compared with the benchmark approaches, the proposed approach improves NMI and ARI by up to 0.6745 and 0.6084, respectively. This indicates that, with the self-constraint of multiple medoids as the knowledge-transfer rule, the proposed approach clusters effectively in transfer settings.

Key words: fuzzy clustering, transfer clustering, multiple exemplars, transfer learning, unsupervised learning
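
The FCM framework that the abstract builds on alternates between a membership update and a center update. The following is a minimal sketch of standard FCM only, not of the proposed MMSC-TFC algorithm, whose update rules are not given on this page. The function name `fcm`, the evenly spread initialization, and the toy data are assumptions made for illustration.

```python
import math

def fcm(points, k, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy C-means on a list of equal-length tuples.

    Returns (centers, u), where u[i][j] is the membership of points[i]
    in cluster j and each row of u sums to 1.
    """
    n, dim = len(points), len(points[0])
    # Assumption: initialise centers with k points spread evenly through the data.
    centers = [list(points[(i * (n - 1)) // max(k - 1, 1)]) for i in range(k)]
    u = [[0.0] * k for _ in range(n)]
    for _ in range(iters):
        # Membership update: u_ij = d_ij^(-2/(m-1)) / sum_l d_il^(-2/(m-1))
        for i, x in enumerate(points):
            d = [math.dist(x, c) for c in centers]
            if min(d) == 0.0:
                # The point coincides with a center: full membership there.
                hit = d.index(0.0)
                u[i] = [1.0 if j == hit else 0.0 for j in range(k)]
            else:
                inv = [dj ** (-2.0 / (m - 1.0)) for dj in d]
                total = sum(inv)
                u[i] = [v / total for v in inv]
        # Center update: mean of the points weighted by u_ij^m
        new_centers = []
        for j in range(k):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            new_centers.append(
                [sum(w[i] * points[i][a] for i in range(n)) / sw for a in range(dim)]
            )
        shift = max(math.dist(c, nc) for c, nc in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u

# Toy data (hypothetical): two well-separated groups of three points each.
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
centers, u = fcm(data, k=2)
labels = [row.index(max(row)) for row in u]
```

Because every cluster is summarized by one weighted mean, a distant outlier pulls that mean toward itself, which is the weakness the abstract attributes to single-center transfer knowledge.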

CLC number: TP391

Figure 1. Transfer process of the PPKTFCM algorithm

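Table 1. Experimental parameter settings of each algorithm

| Algorithm | Description | Parameter settings |
|---|---|---|
| FCM | Fuzzy C-means clustering based on virtual cluster centers and "soft" partitioning | Fuzzy exponent m: [1.1, 1.2, …, 3] |
| AP and TAP | AP is a clustering algorithm based on "message passing" between data points. TAP generalizes AP: it achieves transfer clustering by using the statistical characteristics of the source-domain data (distribution-matching transfer strategy) and the geometric relationship between source-domain and target-domain data (instance-preserving transfer strategy) | Number of nearest neighbors: [1, 2, …, 7]; λ1: [0.1, 0.2, …, 1]; λ2: [1, 2, …, 10] |
| PPKTFCM | A fuzzy transfer clustering algorithm under the FCM framework, based on minimizing the sum of distances between samples and historical cluster centers and on minimizing the change in memberships | Fuzzy exponent m: [1.1, 1.2, …, 3]; λ1: [0.1, 0.2, …, 10]; λ2: [1, 2, …, 10] |
| Degenerate MMSC-TFC and MMSC-TFC | The algorithms proposed in this study | λ1: [0.1, 0.2, …, 10]; λ2: [1, 2, …, 10]; $\epsilon = 10^{-4}$ |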
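
The parameter ranges in Table 1 can be enumerated programmatically for a parameter sweep. The sketch below builds the grids listed for PPKTFCM. The helper name `frange` and the exhaustive Cartesian sweep are assumptions, since the page lists the ranges but not how they were searched.

```python
from itertools import product

def frange(start, stop, step, ndigits=1):
    """Inclusive arithmetic grid, rounded to avoid floating-point drift."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, ndigits) for i in range(count)]

m_grid = frange(1.1, 3.0, 0.1)         # fuzzy exponent m: [1.1, 1.2, ..., 3]
lambda1_grid = frange(0.1, 10.0, 0.1)  # lambda1: [0.1, 0.2, ..., 10]
lambda2_grid = list(range(1, 11))      # lambda2: [1, 2, ..., 10]

# Assumed exhaustive sweep: every (m, lambda1, lambda2) combination.
ppktfcm_grid = list(product(m_grid, lambda1_grid, lambda2_grid))
```

Under these assumptions the PPKTFCM sweep covers 20 × 100 × 10 = 20,000 combinations, which gives a sense of the tuning cost implied by the table.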

Figure 2. Synthetic datasets

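Table 2. Clustering results of each algorithm on the target domain D-T

| Algorithm | NMI (K=2) | ARI (K=2) | NMI (K=3) | ARI (K=3) |
|---|---|---|---|---|
| FCM | 0.2462 | 0.0862 | 0.7967 | 0.8221 |
| AP | 0.3373 | 0.4087 | 0.6218 | 0.6256 |
| Degenerate MMSC-TFC | 0.7357 | 0.8282 | 0.8479 | 0.8862 |
| PPKTFCM | 0.2961 | 0.1503 | 0.7610 | 0.7845 |
| TAP | 0.6139 | 0.7098 | 0.7948 | 0.8052 |
| MMSC-TFC | 0.8674 | 0.9121 | 1.0000 | 1.0000 |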
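
NMI and ARI, the two indices reported in Table 2, compare a predicted partition with ground-truth labels and do not depend on how the clusters are numbered. The sketch below computes both from the contingency counts. Normalizing NMI by the geometric mean of the two entropies is an assumption (other normalizations exist, and this page does not say which one the authors used), and the label vectors are toy data.

```python
from collections import Counter
from math import comb, log, sqrt

def ari(true, pred):
    """Adjusted Rand index of two labelings (needs at least 2 samples)."""
    n = len(true)
    pair = Counter(zip(true, pred))
    a, b = Counter(true), Counter(pred)
    sum_ij = sum(comb(c, 2) for c in pair.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

def nmi(true, pred):
    """Mutual information normalised by sqrt(H(true) * H(pred)) (assumed variant)."""
    n = len(true)
    pair = Counter(zip(true, pred))
    a, b = Counter(true), Counter(pred)
    mi = sum(c / n * log(c * n / (a[t] * b[p])) for (t, p), c in pair.items())
    h_a = -sum(c / n * log(c / n) for c in a.values())
    h_b = -sum(c / n * log(c / n) for c in b.values())
    if h_a == 0 or h_b == 0:
        return 1.0 if h_a == h_b else 0.0
    return mi / sqrt(h_a * h_b)

# Toy labels (hypothetical): a relabelled perfect match and an imperfect one.
truth = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]
imperfect = [0, 0, 1, 1, 1, 1]
scores = {
    "perfect": (nmi(truth, perfect), ari(truth, perfect)),
    "imperfect": (nmi(truth, imperfect), ari(truth, imperfect)),
}
```

Both indices reach 1 for a perfect match, which is how the MMSC-TFC entries of 1.0000 at K=3 should be read.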

Figure 3. Clustering results on the synthetic datasets

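Table 3. Composition of the source-domain and target-domain samples in the 2 transfer scenarios

| Scenario | Dataset | Size | Dimensions | Number of clusters |
|---|---|---|---|---|
| comp VS sci | Source domain | 1500 | 350 | 2 |
| comp VS sci | Target domain | 150 | 350 | 2 |
| rec VS talk | Source domain | 1500 | 350 | 2 |
| rec VS talk | Target domain | 150 | 350 | 2 |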

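Table 4. Experimental results on the real-world datasets

| Algorithm | NMI (comp VS sci) | ARI (comp VS sci) | NMI (rec VS talk) | ARI (rec VS talk) |
|---|---|---|---|---|
| FCM | 0.2214 | 0.1897 | 0.2357 | 0.2574 |
| AP | 0.4892 | 0.5011 | 0.2020 | 0.6885 |
| Degenerate MMSC-TFC | 0.7322 | 0.8012 | 0.7125 | 0.8752 |
| PPKTFCM | 0.7236 | 0.7896 | 0.8922 | 0.9114 |
| TAP | 0.7951 | 0.7754 | 0.8230 | 0.9387 |
| MMSC-TFC | 0.7852 | 0.7981 | 0.9102 | 0.9226 |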
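
The headline improvements quoted in the abstract (0.6745 for NMI and 0.6084 for ARI) can be matched against Table 4. One reading that reproduces both figures is the gap between MMSC-TFC and FCM on the real-world scenarios. This is an observation from the table, not the authors' stated derivation, and the dictionary names below are hypothetical.

```python
# Values copied from Table 4 (real-world datasets).
fcm_scores = {
    "comp VS sci": {"NMI": 0.2214, "ARI": 0.1897},
    "rec VS talk": {"NMI": 0.2357, "ARI": 0.2574},
}
mmsc_scores = {
    "comp VS sci": {"NMI": 0.7852, "ARI": 0.7981},
    "rec VS talk": {"NMI": 0.9102, "ARI": 0.9226},
}

# Gain of MMSC-TFC over FCM for each scenario and index.
gains = {
    scenario: {
        index: round(mmsc_scores[scenario][index] - fcm_scores[scenario][index], 4)
        for index in ("NMI", "ARI")
    }
    for scenario in fcm_scores
}

nmi_gain = gains["rec VS talk"]["NMI"]  # 0.9102 - 0.2357
ari_gain = gains["comp VS sci"]["ARI"]  # 0.7981 - 0.1897
print(f"{nmi_gain:.4f} {ari_gain:.4f}")  # prints "0.6745 0.6084"
```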