
Journal of Shandong University (Engineering Science), 2010, Vol. 40, Issue (2): 1-10.

• Machine Learning and Data Mining •

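Knowledge preserving embedding

ZHANG Dao-qiang

  1. College of Information Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016, China
  • Received: 2010-02-10; Publication date: 2010-04-16; Release date: 2010-02-10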

  • About the author: ZHANG Dao-qiang (1978-), male, born in Shandong, China; Ph.D., professor. His research interests include machine learning, pattern recognition, data mining, and image processing. E-mail: dqzhang@nuaa.edu.cn
  • Supported by: the National Natural Science Foundation of China (60875030)


Abstract:

This paper considers the problem of dimensionality reduction when some domain knowledge about the data is available. Here, domain knowledge denotes supervision information beyond the data themselves, e.g., class labels or, more weakly, pairwise similarity and dissimilarity constraints. The focus is on the latter, because pairwise constraints are more general than labels: given the class labels of the data, the corresponding pairwise similarity and dissimilarity constraints can be generated, but not vice versa. Moreover, in real-world applications such as image retrieval, pairwise constraints are much easier to obtain than labels. A simple algorithm called constraint preserving embedding (COPE) is presented, which effectively uses pairwise constraints to obtain a better embedding. The algorithm is formulated within a unified spectral graph embedding framework, and its relationship to existing related methods is discussed. COPE is further extended to semi-supervised and kernel versions, which respectively incorporate unlabeled data and capture nonlinear relationships in the data. The proposed algorithms are evaluated in a series of experiments, including face image recognition and retrieval and semi-supervised clustering. The experimental results show that the algorithms are effective and promising for learning from pairwise constraints.

Key words: pairwise constraints; domain knowledge; semi-supervised dimensionality reduction
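The abstract describes COPE only at a high level, so its exact objective is not reproduced here. The following is a minimal, hypothetical Python sketch of the two ideas the abstract relies on, under assumed details. First, pairwise similarity and dissimilarity constraints can be generated mechanically from class labels, while the labels cannot be recovered from the constraints. Second, a linear embedding can be learned from constraints as a spectral problem. For the second part it assumes a generic constraint-based objective, maximizing the sum over dissimilar pairs of (w^T(x_i - x_j))^2 minus the same sum over similar pairs subject to ||w|| = 1, whose solution is the leading eigenvector of a constraint scatter matrix. This is a stand-in chosen for illustration, not necessarily the COPE criterion; the function names and the toy data are hypothetical.

```python
from itertools import combinations
import math


def constraints_from_labels(labels):
    """Derive pairwise constraints from class labels (hypothetical helper).

    Same label -> similarity pair; different labels -> dissimilarity pair.
    The reverse is not possible in general: constraints never name the classes.
    """
    similar, dissimilar = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            similar.append((i, j))
        else:
            dissimilar.append((i, j))
    return similar, dissimilar


def constraint_scatter(X, similar, dissimilar):
    """C = sum of d d^T over dissimilar pairs minus sum over similar pairs, d = x_i - x_j.

    For a unit vector w, w^T C w is the assumed objective: spread of dissimilar
    pairs after projection minus spread of similar pairs.
    """
    dim = len(X[0])
    C = [[0.0] * dim for _ in range(dim)]
    for pairs, sign in ((dissimilar, 1.0), (similar, -1.0)):
        for i, j in pairs:
            d = [a - b for a, b in zip(X[i], X[j])]
            for r in range(dim):
                for c in range(dim):
                    C[r][c] += sign * d[r] * d[c]
    return C


def top_eigenvector(C, iters=500):
    """Eigenvector of the largest eigenvalue of symmetric C, by power iteration.

    C may be indefinite, so iterate on C + shift*I; a shift above the
    Gershgorin bound makes that matrix positive definite, so its dominant
    eigenvector is C's top eigenvector. Assumes the all-ones start vector
    is not orthogonal to it.
    """
    dim = len(C)
    shift = max(sum(abs(v) for v in row) for row in C) + 1.0
    w = [1.0] * dim
    for _ in range(iters):
        w_new = [sum(C[r][c] * w[c] for c in range(dim)) + shift * w[r]
                 for r in range(dim)]
        norm = math.sqrt(sum(v * v for v in w_new))
        w = [v / norm for v in w_new]
    return w


def project(X, w):
    """One-dimensional embedding z_i = w^T x_i."""
    return [sum(a * b for a, b in zip(x, w)) for x in X]


if __name__ == "__main__":
    # Toy data (hypothetical): classes differ along x; y is nuisance variance.
    X = [(0.0, 3.0), (0.0, -3.0), (2.0, 3.0), (2.0, -3.0)]
    similar, dissimilar = constraints_from_labels([0, 0, 1, 1])
    print(similar)     # [(0, 1), (2, 3)]
    print(dissimilar)  # [(0, 2), (0, 3), (1, 2), (1, 3)]
    w = top_eigenvector(constraint_scatter(X, similar, dissimilar))
    print([round(v, 3) for v in w])  # [1.0, 0.0]
    print(project(X, w))  # similar pairs coincide, dissimilar pairs are ~2 apart
```

Writing S_ij = +1 for dissimilar pairs and S_ij = -1 for similar pairs, the same objective equals w^T X L X^T w with the signed graph Laplacian L = D - S (D_ii = sum_j S_ij, data points as the columns of X), which is the kind of quadratic form a spectral graph embedding framework works with. Plain power iteration is used only to keep the sketch dependency-free; a practical version would call a symmetric eigensolver and keep several eigenvectors. The semi-supervised and kernel variants mentioned in the abstract are not shown.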
